Centos-kernel-stream-9

Commit Graph

Author	SHA1	Message	Date
Viktor Malik	565c35b3f1	module, bpf: Store BTF base pointer in struct module JIRA: https://issues.redhat.com/browse/RHEL-30774 commit d4e48e3dd45017abdd69a19285d197de897ef44f Author: Alan Maguire <alan.maguire@oracle.com> Date: Thu Jun 20 10:17:29 2024 +0100 module, bpf: Store BTF base pointer in struct module ...as this will allow split BTF modules with a base BTF representation (rather than the full vmlinux BTF at time of BTF encoding) to resolve their references to kernel types in a way that is more resilient to small changes in kernel types. This will allow modules that are not built every time the kernel is to provide more resilient BTF, rather than have it invalidated every time BTF ids for core kernel types change. Fields are ordered to avoid holes in struct module. Signed-off-by: Alan Maguire <alan.maguire@oracle.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Reviewed-by: Luis Chamberlain <mcgrof@kernel.org> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20240620091733.1967885-3-alan.maguire@oracle.com Signed-off-by: Viktor Malik <vmalik@redhat.com>	2024-11-26 15:55:10 +01:00
Nico Pache	d6b2c538d9	kunit: add KUNIT_INIT_TABLE to init linker section commit d81f0d7b8b23ec79f80be602ed6129ded27862e8 Author: Rae Moar <rmoar@google.com> Date: Wed Dec 13 19:44:17 2023 +0000 kunit: add KUNIT_INIT_TABLE to init linker section Add KUNIT_INIT_TABLE to the INIT_DATA linker section. Alter the KUnit macros to create init tests: kunit_test_init_section_suites Update lib/kunit/executor.c to run both the suites in KUNIT_TABLE and KUNIT_INIT_TABLE. Reviewed-by: David Gow <davidgow@google.com> Signed-off-by: Rae Moar <rmoar@google.com> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org> JIRA: https://issues.redhat.com/browse/RHEL-39303 Signed-off-by: Nico Pache <npache@redhat.com>	2024-07-31 20:32:28 -06:00
Donald Dutile	aaaa438fc7	modules: wait do_free_init correctly JIRA: https://issues.redhat.com/browse/RHEL-28063 commit 8f8cd6c0a43ed637e620bbe45a8d0e0c2f4d5130 Author: Changbin Du <changbin.du@huawei.com> Date: Tue Feb 27 10:35:46 2024 +0800 modules: wait do_free_init correctly The synchronization here is to ensure the ordering of freeing of a module init so that it happens before W+X checking. It is worth noting it is not that the freeing was not happening, it is just that our sanity checkers raced against the permission checkers which assume init memory is already gone. Commit `1a7b7d9220` ("modules: Use vmalloc special flag") moved calling do_free_init() into a global workqueue instead of relying on it being called through call_rcu(..., do_free_init), which used to allow us to call do_free_init() asynchronously after the end of a subsequent grace period. The move to a global workqueue broke the guarantees for code which needed to be sure the do_free_init() would complete with rcu_barrier(). To fix this, callers which used to rely on rcu_barrier() must now instead use flush_work(&init_free_wq). Without this fix, we still could encounter false positive reports in W+X checking since the rcu_barrier() here cannot ensure the ordering now. Even worse, the rcu_barrier() can introduce significant delay. Eric Chanudet reported that the rcu_barrier introduces ~0.1s delay on a PREEMPT_RT kernel. [ 0.291444] Freeing unused kernel memory: 5568K [ 0.402442] Run /sbin/init as init process With this fix, the above delay can be eliminated. 
Link: https://lkml.kernel.org/r/20240227023546.2490667-1-changbin.du@huawei.com Fixes: `1a7b7d9220` ("modules: Use vmalloc special flag") Signed-off-by: Changbin Du <changbin.du@huawei.com> Tested-by: Eric Chanudet <echanude@redhat.com> Acked-by: Luis Chamberlain <mcgrof@kernel.org> Cc: Xiaoyi Su <suxiaoyi@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Donald Dutile <ddutile@redhat.com>	2024-06-17 14:17:30 -04:00
Donald Dutile	7aa2fa676f	Subject: revert of revert KEYS: Make use of platform keyring for module signature verify Put back the RHEL-only module-signing patch so distinguishable in RHEL after move of kernel/module-signing.c to kernel/module/signing.c . JIRA: https://issues.redhat.com/browse/RHEL-28063 Upstream Status: RHEL-only Signed-off-by: Donald Dutile <ddutile@redhat.com>	2024-06-17 14:17:30 -04:00
Donald Dutile	e6f4187276	module: Remove redundant TASK_UNINTERRUPTIBLE JIRA: https://issues.redhat.com/browse/RHEL-28063 commit f17f2c13d613cbeef529b03ca17ae2581b2e6cb8 Author: Kevin Hao <haokexin@gmail.com> Date: Fri Dec 8 16:29:34 2023 +0800 module: Remove redundant TASK_UNINTERRUPTIBLE TASK_KILLABLE already includes TASK_UNINTERRUPTIBLE, so there is no need to add a separate TASK_UNINTERRUPTIBLE. Signed-off-by: Kevin Hao <haokexin@gmail.com> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> Signed-off-by: Donald Dutile <ddutile@redhat.com>	2024-06-17 14:17:29 -04:00
Donald Dutile	573fa8ea71	module/decompress: use kvmalloc() consistently JIRA: https://issues.redhat.com/browse/RHEL-28063 commit 17fc8084aa8f9d5235f252fc3978db657dd77e92 Author: Andrea Righi <andrea.righi@canonical.com> Date: Thu Nov 2 09:19:14 2023 +0100 module/decompress: use kvmalloc() consistently We consistently switched from kmalloc() to vmalloc() in module decompression to prevent potential memory allocation failures with large modules, however vmalloc() is not as memory-efficient and fast as kmalloc(). Since we don't know in general the size of the workspace required by the decompression algorithm, it is more reasonable to use kvmalloc() consistently, also considering that we don't have special memory requirements here. Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Tested-by: Andrea Righi <andrea.righi@canonical.com> Signed-off-by: Andrea Righi <andrea.righi@canonical.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Donald Dutile <ddutile@redhat.com>	2024-06-17 14:17:29 -04:00
Donald Dutile	5806df5e42	module: Annotate struct module_notes_attrs with __counted_by JIRA: https://issues.redhat.com/browse/RHEL-28063 commit ea0b0bcef4917a2640ecc100c768b8e785784834 Author: Kees Cook <keescook@chromium.org> Date: Fri Sep 22 10:52:53 2023 -0700 module: Annotate struct module_notes_attrs with __counted_by Prepare for the coming implementation by GCC and Clang of the __counted_by attribute. Flexible array members annotated with __counted_by can have their accesses bounds-checked at run-time via CONFIG_UBSAN_BOUNDS (for array indexing) and CONFIG_FORTIFY_SOURCE (for strcpy/memcpy-family functions). As found with Coccinelle[1], add __counted_by for struct module_notes_attrs. [1] https://github.com/kees/kernel-tools/blob/trunk/coccinelle/examples/counted_by.cocci Cc: Luis Chamberlain <mcgrof@kernel.org> Cc: linux-modules@vger.kernel.org Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> Signed-off-by: Donald Dutile <ddutile@redhat.com>	2024-06-17 14:17:29 -04:00
Donald Dutile	4f912b873e	module: Fix comment typo JIRA: https://issues.redhat.com/browse/RHEL-28063 commit fd06da776130ec2611c30272a0868f6a54cdf9d2 Author: Zhu Mao <zhumao001@208suo.com> Date: Wed Sep 20 17:13:09 2023 -0700 module: Fix comment typo Delete duplicated word in comment. Signed-off-by: Zhu Mao <zhumao001@208suo.com> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> Signed-off-by: Donald Dutile <ddutile@redhat.com>	2024-06-17 14:17:29 -04:00
Donald Dutile	64d79afe53	module/decompress: use vmalloc() for gzip decompression workspace JIRA: https://issues.redhat.com/browse/RHEL-28063 commit 3737df782c740b944912ed93420c57344b1cf864 Author: Andrea Righi <andrea.righi@canonical.com> Date: Wed Aug 30 17:58:20 2023 +0200 module/decompress: use vmalloc() for gzip decompression workspace Use a similar approach as commit a419beac4a07 ("module/decompress: use vmalloc() for zstd decompression workspace") and replace kmalloc() with vmalloc() also for the gzip module decompression workspace. In this case the workspace is represented by struct inflate_workspace that can be fairly large for kmalloc() and it can potentially lead to allocation errors on certain systems: $ pahole inflate_workspace struct inflate_workspace { struct inflate_state inflate_state; /* 0 9544 */ /* --- cacheline 149 boundary (9536 bytes) was 8 bytes ago --- */ unsigned char working_window[32768]; /* 9544 32768 */ /* size: 42312, cachelines: 662, members: 2 */ /* last cacheline: 8 bytes */ }; Considering that there is no need to use contiguous physical memory, simply switch to vmalloc() to provide a more reliable in-kernel module decompression. Fixes: b1ae6dc41eaa ("module: add in-kernel support for decompressing") Signed-off-by: Andrea Righi <andrea.righi@canonical.com> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> Signed-off-by: Donald Dutile <ddutile@redhat.com>	2024-06-17 14:17:29 -04:00
Donald Dutile	ca8a2f786d	module/decompress: use vmalloc() for zstd decompression workspace JIRA: https://issues.redhat.com/browse/RHEL-28063 commit a419beac4a070aff63c520f36ebf7cb8a76a8ae5 Author: Andrea Righi <andrea.righi@canonical.com> Date: Tue Aug 29 14:05:08 2023 +0200 module/decompress: use vmalloc() for zstd decompression workspace Using kmalloc() to allocate the decompression workspace for zstd may trigger the following warning when large modules are loaded (e.g., xfs): [ 2.961884] WARNING: CPU: 1 PID: 254 at mm/page_alloc.c:4453 __alloc_pages+0x2c3/0x350 ... [ 2.989033] Call Trace: [ 2.989841] <TASK> [ 2.990614] ? show_regs+0x6d/0x80 [ 2.991573] ? __warn+0x89/0x160 [ 2.992485] ? __alloc_pages+0x2c3/0x350 [ 2.993520] ? report_bug+0x17e/0x1b0 [ 2.994506] ? handle_bug+0x51/0xa0 [ 2.995474] ? exc_invalid_op+0x18/0x80 [ 2.996469] ? asm_exc_invalid_op+0x1b/0x20 [ 2.997530] ? module_zstd_decompress+0xdc/0x2a0 [ 2.998665] ? __alloc_pages+0x2c3/0x350 [ 2.999695] ? module_zstd_decompress+0xdc/0x2a0 [ 3.000821] __kmalloc_large_node+0x7a/0x150 [ 3.001920] __kmalloc+0xdb/0x170 [ 3.002824] module_zstd_decompress+0xdc/0x2a0 [ 3.003857] module_decompress+0x37/0xc0 [ 3.004688] init_module_from_file+0xd0/0x100 [ 3.005668] idempotent_init_module+0x11c/0x2b0 [ 3.006632] __x64_sys_finit_module+0x64/0xd0 [ 3.007568] do_syscall_64+0x59/0x90 [ 3.008373] ? ksys_read+0x73/0x100 [ 3.009395] ? exit_to_user_mode_prepare+0x30/0xb0 [ 3.010531] ? syscall_exit_to_user_mode+0x37/0x60 [ 3.011662] ? do_syscall_64+0x68/0x90 [ 3.012511] ? do_syscall_64+0x68/0x90 [ 3.013364] entry_SYSCALL_64_after_hwframe+0x6e/0xd8 However, contiguous physical memory does not seem to be required in module_zstd_decompress(), so use vmalloc() instead, to prevent the warning and avoid potential failures at loading compressed modules. 
Fixes: 169a58ad824d ("module/decompress: Support zstd in-kernel decompression") Signed-off-by: Andrea Righi <andrea.righi@canonical.com> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> Signed-off-by: Donald Dutile <ddutile@redhat.com>	2024-06-17 14:17:29 -04:00
Donald Dutile	b17d15cce3	module: Expose module_init_layout_section() JIRA: https://issues.redhat.com/browse/RHEL-28063 commit 2abcc4b5a64a65a2d2287ba0be5c2871c1552416 Author: James Morse <james.morse@arm.com> Date: Tue Aug 1 14:54:07 2023 +0000 module: Expose module_init_layout_section() module_init_layout_section() chooses whether the core module loader considers a section as init or not. This affects the placement of the exit section when module unloading is disabled. This code will never run, so it can be free()d once the module has been initialised. arm and arm64 need to count the number of PLTs they need before applying relocations based on the section name. The init PLTs are stored separately so they can be free()d. arm and arm64 both use within_module_init() to decide which list of PLTs to use when applying the relocation. Because within_module_init()'s behaviour changes when module unloading is disabled, both architectures would need to take this into account when counting the PLTs. Today neither architecture does this, meaning when module unloading is disabled there are insufficient PLTs in the init section to load some modules, resulting in warnings: | WARNING: CPU: 2 PID: 51 at arch/arm64/kernel/module-plts.c:99 module_emit_plt_entry+0x184/0x1cc | Modules linked in: crct10dif_common | CPU: 2 PID: 51 Comm: modprobe Not tainted 6.5.0-rc4-yocto-standard-dirty #15208 | Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015 | pstate: 20400005 (nzCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) | pc : module_emit_plt_entry+0x184/0x1cc | lr : module_emit_plt_entry+0x94/0x1cc | sp : ffffffc0803bba60 [...] 
| Call trace: | module_emit_plt_entry+0x184/0x1cc | apply_relocate_add+0x2bc/0x8e4 | load_module+0xe34/0x1bd4 | init_module_from_file+0x84/0xc0 | __arm64_sys_finit_module+0x1b8/0x27c | invoke_syscall.constprop.0+0x5c/0x104 | do_el0_svc+0x58/0x160 | el0_svc+0x38/0x110 | el0t_64_sync_handler+0xc0/0xc4 | el0t_64_sync+0x190/0x194 Instead of duplicating module_init_layout_section()'s logic, expose it. Reported-by: Adam Johnston <adam.johnston@arm.com> Fixes: `055f23b74b` ("module: check for exit sections in layout_sections() instead of module_init_section()") Cc: stable@vger.kernel.org Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> Signed-off-by: Donald Dutile <ddutile@redhat.com>	2024-06-17 14:17:29 -04:00
Donald Dutile	cf7679e379	modpost, kallsyms: Treat add '$'-prefixed symbols as mapping symbols JIRA: https://issues.redhat.com/browse/RHEL-28063 commit ff09f6fd297293175eaa0ed492495e36b3eb1a8e Author: Palmer Dabbelt <palmer@rivosinc.com> Date: Fri Jul 21 08:01:48 2023 -0700 modpost, kallsyms: Treat add '$'-prefixed symbols as mapping symbols Trying to restrict the '$'-prefix change to RISC-V caused some fallout, so let's just treat all those symbols as special. Fixes: c05780ef3c190 ("module: Ignore RISC-V mapping symbols too") Link: https://lore.kernel.org/all/20230712015747.77263-1-wangkefeng.wang@huawei.com/ Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com> Reviewed-by: Masahiro Yamada <masahiroy@kernel.org> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> Signed-off-by: Donald Dutile <ddutile@redhat.com>	2024-06-17 14:17:28 -04:00
Donald Dutile	346c4d39ff	module: Ignore RISC-V mapping symbols too JIRA: https://issues.redhat.com/browse/RHEL-28063 commit c05780ef3c190c2dafbf0be8e65d4f01103ad577 Author: Palmer Dabbelt <palmer@rivosinc.com> Date: Fri Jul 7 09:00:51 2023 -0700 module: Ignore RISC-V mapping symbols too RISC-V has an extended form of mapping symbols that we use to encode the ISA when it changes in the middle of an ELF. This trips up modpost as a build failure, I haven't verified it yet but I believe the kallsyms difference should result in stacks looking sane again. Reported-by: Randy Dunlap <rdunlap@infradead.org> Closes: https://lore.kernel.org/all/9d9e2902-5489-4bf0-d9cb-556c8e5d71c2@infradead.org/ Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com> Reviewed-by: Randy Dunlap <rdunlap@infradead.org> Tested-by: Randy Dunlap <rdunlap@infradead.org> # build-tested Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> Signed-off-by: Donald Dutile <ddutile@redhat.com>	2024-06-17 14:17:28 -04:00
Donald Dutile	70f40ed8e2	module: fix init_module_from_file() error handling JIRA: https://issues.redhat.com/browse/RHEL-28063 commit f1962207150c8b602e980616f04b37ea4e64bb9f Author: Linus Torvalds <torvalds@linux-foundation.org> Date: Tue Jul 4 06:37:32 2023 -0700 module: fix init_module_from_file() error handling Vegard Nossum pointed out two different problems with the error handling in init_module_from_file(): (a) the idempotent loading code didn't clean up properly in some error cases, leaving the on-stack 'struct idempotent' element still in the hash table (b) failure to read the module file would nonsensically update the 'invalid_kread_bytes' stat counter with the error value The first error is quite nasty, in that it can then cause subsequent idempotent loads of that same file to access stale stack contents of the previous failure. The case may not happen in any normal situation (explaining all the "Tested-by"s on the original change), and requires admin privileges, but syzkaller triggers random bad behavior as a result: BUG: soft lockup in sys_finit_module BUG: unable to handle kernel paging request in init_module_from_file general protection fault in init_module_from_file INFO: task hung in init_module_from_file KASAN: out-of-bounds Read in init_module_from_file KASAN: slab-out-of-bounds Read in init_module_from_file ... The second error is fairly benign and just leads to nonsensical stats (and has been around since the debug stats were added). Vegard also provided a patch for the idempotent loading issue, but I'd rather re-organize the code and make it more legible using another level of helper functions than add the usual "goto out" error handling. 
Link: https://lore.kernel.org/lkml/20230704100852.23452-1-vegard.nossum@oracle.com/ Fixes: 9b9879fc0327 ("modules: catch concurrent module loads, treat them as idempotent") Reported-by: Vegard Nossum <vegard.nossum@oracle.com> Reported-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com> Reported-by: syzbot+9c2bdc9d24e4a7abe741@syzkaller.appspotmail.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Donald Dutile <ddutile@redhat.com>	2024-06-17 14:17:28 -04:00
Donald Dutile	d8420b4b83	modules: catch concurrent module loads, treat them as idempotent JIRA: https://issues.redhat.com/browse/RHEL-28063 commit 9b9879fc03275ffe0da328cf5b864d9e694167c8 Author: Linus Torvalds <torvalds@linux-foundation.org> Date: Mon May 29 21:39:51 2023 -0400 modules: catch concurrent module loads, treat them as idempotent This is the new-and-improved attempt at avoiding huge memory load spikes when the user space boot sequence tries to load hundreds (or even thousands) of redundant duplicate modules in parallel. See commit 9828ed3f695a ("module: error out early on concurrent load of the same module file") for background and an earlier failed attempt that was reverted. That earlier attempt just said "concurrently loading the same module is silly, just open the module file exclusively and return -ETXTBSY if somebody else is already loading it". While it is true that concurrent module loads of the same module is silly, the reason that earlier attempt then failed was that the concurrently loaded module would often be a prerequisite for another module. Thus failing to load the prerequisite would then cause cascading failures of the other modules, rather than just short-circuiting that one unnecessary module load. At the same time, we still really don't want to load the contents of the same module file hundreds of times, only to then wait for an eventually successful load, and have everybody else return -EEXIST. As a result, this takes another approach, and treats concurrent module loads from the same file as "idempotent" in the inode. So if one module load is ongoing, we don't start a new one, but instead just wait for the first one to complete and return the same return value as it did. So unlike the first attempt, this does not return early: the intent is not to speed up the boot, but to avoid a thundering herd problem in allocating memory (both physical and virtual) for a module more than once. 
Also note that this does change behavior: it used to be that when you had concurrent loads, you'd have one "winner" that would return success, and everybody else would return -EEXIST. In contrast, this idempotent logic goes all Oprah on the problem, and says "You are a winner! And you are a winner! We are ALL winners". But since there's no possible actual real semantic difference between "you loaded the module" and "somebody else already loaded the module", this is more of a feel-good change than an actual honest-to-goodness semantic change. Of course, any true Johnny-come-latelies that don't get caught in the concurrency filter will still return -EEXIST. It's no different from not even getting a seat at an Oprah taping. That's life. See the long thread on the kernel mailing list about this all, which includes some numbers for memory use before and after the patch. Link: https://lore.kernel.org/lkml/20230524213620.3509138-1-mcgrof@kernel.org/ Reviewed-by: Johan Hovold <johan@kernel.org> Tested-by: Johan Hovold <johan@kernel.org> Tested-by: Luis Chamberlain <mcgrof@kernel.org> Tested-by: Dan Williams <dan.j.williams@intel.com> Tested-by: Rudi Heitbaum <rudi@heitbaum.com> Tested-by: David Hildenbrand <david@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Donald Dutile <ddutile@redhat.com>	2024-06-17 14:17:28 -04:00
Donald Dutile	9e9e6cbdd2	module: split up 'finit_module()' into init_module_from_file() helper JIRA: https://issues.redhat.com/browse/RHEL-28063 commit 054a73009c22a5fb8bbeee5394980809276bc9fe Author: Linus Torvalds <torvalds@linux-foundation.org> Date: Mon May 29 20:55:13 2023 -0400 module: split up 'finit_module()' into init_module_from_file() helper This will simplify the next step, where we can then key off the inode to do one idempotent module load. Let's do the obvious re-organization in one step, and then the new code in another. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Donald Dutile <ddutile@redhat.com>	2024-06-17 14:17:28 -04:00
Donald Dutile	efc5790fc4	kbuild: generate KSYMTAB entries by modpost JIRA: https://issues.redhat.com/browse/RHEL-28063 Conflicts: (1) Dropped patches for check-local-export; that script was temporarily replacing modpost, but abandoned and modpost resumed with simpler addition and-or bug fixes, so skip it here. (2) Drop ia64 patches since RHEL doesn't support ia64, and didn't apply cleanly. (3) Made cmd_gensymversions genksyms exec same as cmd_gensymtypes; cmd_gensymversions appears to be a rhel-ism, and it has no callers/users under script hierarchy. commit ddb5cdbafaaad6b99d7007ae1740403124502d03 Author: Masahiro Yamada <masahiroy@kernel.org> Date: Mon Jun 12 00:50:52 2023 +0900 kbuild: generate KSYMTAB entries by modpost Commit 7b4537199a4a ("kbuild: link symbol CRCs at final link, removing CONFIG_MODULE_REL_CRCS") made modpost output CRCs in the same way whether the EXPORT_SYMBOL() is placed in .c or .S. For further cleanups, this commit applies a similar approach to the entire data structure of EXPORT_SYMBOL(). The EXPORT_SYMBOL() compilation is split into two stages. When a source file is compiled, EXPORT_SYMBOL() will be converted into a dummy symbol in the .export_symbol section. For example, EXPORT_SYMBOL(foo); EXPORT_SYMBOL_NS_GPL(bar, BAR_NAMESPACE); will be encoded into the following assembly code: .section ".export_symbol","a" __export_symbol_foo: .asciz "" /* license */ .asciz "" /* name space */ .balign 8 .quad foo /* symbol reference */ .previous .section ".export_symbol","a" __export_symbol_bar: .asciz "GPL" /* license */ .asciz "BAR_NAMESPACE" /* name space */ .balign 8 .quad bar /* symbol reference */ .previous They are mere markers to tell modpost the name, license, and namespace of the symbols. They will be dropped from the final vmlinux and modules because the *(.export_symbol) will go into /DISCARD/ in the linker script. 
Then, modpost extracts all the information about EXPORT_SYMBOL() from the .export_symbol section, and generates the final C code: KSYMTAB_FUNC(foo, "", ""); KSYMTAB_FUNC(bar, "_gpl", "BAR_NAMESPACE"); KSYMTAB_FUNC() (or KSYMTAB_DATA() if it is data) is expanded to struct kernel_symbol that will be linked to the vmlinux or a module. With this change, EXPORT_SYMBOL() works in the same way for .c and .S files, providing the following benefits. [1] Deprecate EXPORT_DATA_SYMBOL() In the old days, EXPORT_SYMBOL() was only available in C files. To export a symbol in .S, EXPORT_SYMBOL() was placed in a separate .c file. arch/arm/kernel/armksyms.c is one example written in the classic manner. Commit `22823ab419` ("EXPORT_SYMBOL() for asm") removed this limitation. Since then, EXPORT_SYMBOL() can be placed close to the symbol definition in .S files. It was a nice improvement. However, as that commit mentioned, you need to use EXPORT_DATA_SYMBOL() for data objects on some architectures. In the new approach, modpost checks symbol's type (STT_FUNC or not), and outputs KSYMTAB_FUNC() or KSYMTAB_DATA() accordingly. There are only two users of EXPORT_DATA_SYMBOL: EXPORT_DATA_SYMBOL_GPL(empty_zero_page) (arch/ia64/kernel/head.S) EXPORT_DATA_SYMBOL(ia64_ivt) (arch/ia64/kernel/ivt.S) They are transformed as follows and output into .vmlinux.export.c KSYMTAB_DATA(empty_zero_page, "_gpl", ""); KSYMTAB_DATA(ia64_ivt, "", ""); The other EXPORT_SYMBOL users in ia64 assembly are output as KSYMTAB_FUNC(). EXPORT_DATA_SYMBOL() is now deprecated. [2] merge <linux/export.h> and <asm-generic/export.h> There are two similar header implementations: include/linux/export.h for .c files include/asm-generic/export.h for .S files Ideally, the functionality should be consistent between them, but they tend to diverge. Commit `8651ec01da` ("module: add support for symbol namespaces.") did not support the namespace for .S files. 
This commit shifts the essential implementation part to C, which supports EXPORT_SYMBOL_NS() for *.S files. <asm/export.h> and <asm-generic/export.h> will remain as a wrapper of <linux/export.h> for a while. They will be removed after #include <asm/export.h> directives are all replaced with #include <linux/export.h>. [3] Implement CONFIG_TRIM_UNUSED_KSYMS in one-pass algorithm (by a later commit) When CONFIG_TRIM_UNUSED_KSYMS is enabled, Kbuild recursively traverses the directory tree to determine which EXPORT_SYMBOL to trim. If an EXPORT_SYMBOL turns out to be unused by anyone, Kbuild begins the second traversal, where some source files are recompiled with their EXPORT_SYMBOL() turned into a no-op. We can do this better now; modpost can selectively emit KSYMTAB entries that are really used by modules. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Donald Dutile <ddutile@redhat.com>	2024-06-17 14:17:28 -04:00
Donald Dutile	a5ba435c43	module/decompress: Fix error checking on zstd decompression JIRA: https://issues.redhat.com/browse/RHEL-28063 commit fadb74f9f2f609238070c7ca1b04933dc9400e4a Author: Lucas De Marchi <lucas.demarchi@intel.com> Date: Thu Jun 1 14:23:31 2023 -0700 module/decompress: Fix error checking on zstd decompression While implementing support for in-kernel decompression in kmod, finit_module() was returning a very suspicious value: finit_module(3, "", MODULE_INIT_COMPRESSED_FILE) = 18446744072717407296 It turns out the check for module_get_next_page() failing is wrong, and hence the decompression was not really taking place. Invert the condition to fix it. Fixes: 169a58ad824d ("module/decompress: Support zstd in-kernel decompression") Cc: stable@kernel.org Cc: Luis Chamberlain <mcgrof@kernel.org> Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com> Cc: Stephen Boyd <swboyd@chromium.org> Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> Signed-off-by: Donald Dutile <ddutile@redhat.com>	2024-06-17 14:17:27 -04:00
Donald Dutile	a2e3e61a15	module: fix module load for ia64 JIRA: https://issues.redhat.com/browse/RHEL-28063 commit db3e33dd8bd956f165436afdbdbf1c653fb3c8e6 Author: Song Liu <song@kernel.org> Date: Sun May 28 16:00:41 2023 -0700 module: fix module load for ia64 Frank reported boot regression in ia64 as: ELILO v3.16 for EFI/IA-64 .. Uncompressing Linux... done Loading file AC100221.initrd.img...done [ 0.000000] Linux version 6.4.0-rc3 (root@x4270) (ia64-linux-gcc (GCC) 12.2.0, GNU ld (GNU Binutils) 2.39) #1 SMP Thu May 25 15:52:20 CEST 2023 [ 0.000000] efi: EFI v1.1 by HP [ 0.000000] efi: SALsystab=0x3ee7a000 ACPI 2.0=0x3fe2a000 ESI=0x3ee7b000 SMBIOS=0x3ee7c000 HCDP=0x3fe28000 [ 0.000000] PCDP: v3 at 0x3fe28000 [ 0.000000] earlycon: uart8250 at MMIO 0x00000000f4050000 (options '9600n8') [ 0.000000] printk: bootconsole [uart8250] enabled [ 0.000000] ACPI: Early table checksum verification disabled [ 0.000000] ACPI: RSDP 0x000000003FE2A000 000028 (v02 HP ) [ 0.000000] ACPI: XSDT 0x000000003FE2A02C 0000CC (v01 HP rx2620 00000000 HP 00000000) [...] [ 3.793350] Run /init as init process Loading, please wait... Starting systemd-udevd version 252.6-1 [ 3.951100] ------------[ cut here ]------------ [ 3.951100] WARNING: CPU: 6 PID: 140 at kernel/module/main.c:1547 __layout_sections+0x370/0x3c0 [ 3.949512] Unable to handle kernel paging request at virtual address 1000000000000000 [ 3.951100] Modules linked in: [ 3.951100] CPU: 6 PID: 140 Comm: (udev-worker) Not tainted 6.4.0-rc3 #1 [ 3.956161] (udev-worker)[142]: Oops 11003706212352 [1] [ 3.951774] Hardware name: hp server rx2620 , BIOS 04.29 11/30/2007 [ 3.951774] [ 3.951774] Call Trace: [ 3.958339] Unable to handle kernel paging request at virtual address 1000000000000000 [ 3.956161] Modules linked in: [ 3.951774] [<a0000001000156d0>] show_stack.part.0+0x30/0x60 [ 3.951774] sp=e000000183a67b20 bsp=e000000183a61628 [ 3.956161] [ 3.956161] which bisect to module_memory change [1]. 
Debug showed that ia64 uses some special sections: __layout_sections: section .got (sh_flags 10000002) matched to MOD_INVALID __layout_sections: section .sdata (sh_flags 10000003) matched to MOD_INVALID __layout_sections: section .sbss (sh_flags 10000003) matched to MOD_INVALID All these sections are loaded to module core memory before [1]. Fix ia64 boot by loading these sections to MOD_DATA (core rw data). [1] commit ac3b43283923 ("module: replace module_layout with module_memory") Fixes: ac3b43283923 ("module: replace module_layout with module_memory") Reported-by: Frank Scheiner <frank.scheiner@web.de> Closes: https://lists.debian.org/debian-ia64/2023/05/msg00010.html Closes: https://marc.info/?l=linux-ia64&m=168509859125505 Cc: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Song Liu <song@kernel.org> Tested-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> Signed-off-by: Donald Dutile <ddutile@redhat.com>	2024-06-17 14:17:27 -04:00
Donald Dutile	31b1bf449a	kallsyms: remove unsed API lookup_symbol_attrs JIRA: https://issues.redhat.com/browse/RHEL-28063 commit 4f521bab5bfc854ec0dab7ef560dfa75247e615d Author: Maninder Singh <maninder1.s@samsung.com> Date: Fri May 26 12:51:23 2023 +0530 kallsyms: remove unsed API lookup_symbol_attrs with commit '7878c231dae0 ("slab: remove /proc/slab_allocators")' lookup_symbol_attrs usage is removed. Thus removing redundant API. Signed-off-by: Maninder Singh <maninder1.s@samsung.com> Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> Signed-off-by: Donald Dutile <ddutile@redhat.com>	2024-06-17 14:17:26 -04:00
Donald Dutile	b30cf49e7c	module: Remove preempt_disable() from module reference counting. JIRA: https://issues.redhat.com/browse/RHEL-28063 commit cb0b50b813f6198b7d44ae8e169803440333577a Author: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Date: Tue May 9 15:49:02 2023 +0200 module: Remove preempt_disable() from module reference counting. The preempt_disable() section in module_put() was added in commit `e1783a240f` ("module: Use this_cpu_xx to dynamically allocate counters") while the per-CPU counters were switched to another API. The API requires that the CPU remain the same during the RMW operation. This counting API was later replaced with atomic_t in commit `2f35c41f58` ("module: Replace module_ref with atomic_t refcnt"). Since this atomic_t replacement, there is no need to keep preemption disabled while the reference counter is modified. Remove preempt_disable() from module_put(), __module_get() and try_module_get(). Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> Signed-off-by: Donald Dutile <ddutile@redhat.com>	2024-06-17 14:17:26 -04:00
Donald Dutile	c19ad53194	module: Fix use-after-free bug in read_file_mod_stats() JIRA: https://issues.redhat.com/browse/RHEL-28063 commit d36f6efbe0cb422fe1e4475717d75f3737088832 Author: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com> Date: Thu Apr 27 22:59:33 2023 -0700 module: Fix use-after-free bug in read_file_mod_stats() Smatch warns: kernel/module/stats.c:394 read_file_mod_stats() warn: passing freed memory 'buf' We are passing 'buf' to simple_read_from_buffer() after freeing it. Fix this by changing the order of 'simple_read_from_buffer' and 'kfree'. Fixes: df3e764d8e5c ("module: add debug stats to help identify memory pressure") Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> Signed-off-by: Donald Dutile <ddutile@redhat.com>	2024-06-17 14:17:26 -04:00
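Note on c19ad53194: the fix described there is purely an ordering change, copy out of the buffer first and free it afterwards. A small user-space sketch of the corrected order follows; `copy_from_buffer()` is a hypothetical stand-in for the kernel's simple_read_from_buffer(), and `read_stats()` and its contents are invented for illustration.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical user-space stand-in for simple_read_from_buffer(): copy up
 * to count bytes from src (avail bytes long), starting at *pos, into dst,
 * and advance *pos by the number of bytes copied.
 */
static size_t copy_from_buffer(char *dst, size_t count, size_t *pos,
			       const char *src, size_t avail)
{
	size_t n;

	if (*pos >= avail)
		return 0;
	n = avail - *pos;
	if (n > count)
		n = count;
	memcpy(dst, src + *pos, n);
	*pos += n;
	return n;
}

/* Build a temporary buffer, hand its contents to the reader, then free it. */
static size_t read_stats(char *dst, size_t count, size_t *pos)
{
	static const char text[] = "Mods ever loaded 67\n";
	size_t len = sizeof(text) - 1;
	size_t ret;
	char *buf = malloc(len);

	if (!buf)
		return 0;
	memcpy(buf, text, len);

	/* Correct order: read from buf while it is still valid ... */
	ret = copy_from_buffer(dst, count, pos, buf, len);
	/* ... and only then release it. The bug had these two swapped. */
	free(buf);
	return ret;
}
```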
Donald Dutile	df6febf734	module: include internal.h in module/dups.c JIRA: https://issues.redhat.com/browse/RHEL-28063 commit 0b891c83d8c54cb70e186456c2191adb5fd98c56 Author: Arnd Bergmann <arnd@arndb.de> Date: Sat Apr 29 22:36:04 2023 +0200 module: include internal.h in module/dups.c Two newly introduced functions are declared in a header that is not included before the definition, causing a warning with sparse or 'make W=1': kernel/module/dups.c:118:6: error: no previous prototype for 'kmod_dup_request_exists_wait' [-Werror=missing-prototypes] 118 \| bool kmod_dup_request_exists_wait(char *module_name, bool wait, int *dup_ret) \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~ kernel/module/dups.c:220:6: error: no previous prototype for 'kmod_dup_request_announce' [-Werror=missing-prototypes] 220 \| void kmod_dup_request_announce(char *module_name, int ret) \| ^~~~~~~~~~~~~~~~~~~~~~~~~ Add an explicit include to ensure the prototypes match. Fixes: 8660484ed1cf ("module: add debugging auto-load duplicate module support") Reported-by: kernel test robot <lkp@intel.com> Link: https://lore.kernel.org/oe-kbuild-all/202304141440.DYO4NAzp-lkp@intel.com/ Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> Signed-off-by: Donald Dutile <ddutile@redhat.com>	2024-06-17 14:17:26 -04:00
Donald Dutile	a27f75beb4	module: add debugging auto-load duplicate module support JIRA: https://issues.redhat.com/browse/RHEL-28063 commit 8660484ed1cf3261e89e0bad94c6395597e87599 Author: Luis Chamberlain <mcgrof@kernel.org> Date: Thu Apr 13 22:28:39 2023 -0700 module: add debugging auto-load duplicate module support The finit_module() system call can in the worst case use more than twice a module's size in virtual memory. Duplicate finit_module() system calls are non-fatal, however they unnecessarily strain virtual memory during bootup and in the worst case can cause a system to fail to boot. This is only known to currently be an issue on systems with a larger number of CPUs. To help debug this situation we need to consider the different sources for finit_module(). Requests from the kernel that rely on module auto-loading, i.e., the kernel's request_module() API, are one source of calls. Although modprobe checks to see if a module is already loaded prior to calling finit_module(), there is a small race possible allowing userspace to trigger multiple modprobe calls racing against modprobe and thus not seeing the module yet loaded. This adds debugging support to the kernel module auto-loader (request_module() calls) to easily detect duplicate module requests. To aid with possible bootup failure issues incurred by this, it will converge duplicate requests to a single request. This avoids any possible strain on virtual memory during bootup which could be incurred by duplicate module autoloading requests. Folks debugging virtual memory abuse on bootup can and should enable this to see what pr_warn()s come on, to see if module auto-loading is to blame for their woes. If they see duplicates they can further debug this by enabling the module.enable_dups_trace kernel parameter or by enabling CONFIG_MODULE_DEBUG_AUTOLOAD_DUPS_TRACE. Current evidence seems to point to only a few duplicates for module auto-loading. 
And so the source for other duplicates creating heavy virtual memory pressure due to larger number of CPUs should be coming from another place (likely udev). Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> Signed-off-by: Donald Dutile <ddutile@redhat.com>	2024-06-17 14:17:26 -04:00
Donald Dutile	c35bdd4008	module: stats: fix invalid_mod_bytes typo JIRA: https://issues.redhat.com/browse/RHEL-28063 commit a81b1fc8ea639e03326c1d0dcde041986bc11500 Author: Arnd Bergmann <arnd@arndb.de> Date: Tue Apr 18 09:17:51 2023 +0200 module: stats: fix invalid_mod_bytes typo This was caught by randconfig builds but does not show up in build testing without CONFIG_MODULE_DECOMPRESS: kernel/module/stats.c: In function 'mod_stat_bump_invalid': kernel/module/stats.c:229:42: error: 'invalid_mod_byte' undeclared (first use in this function); did you mean 'invalid_mod_bytes'? 229 \| atomic_long_add(info->compressed_len, &invalid_mod_byte); \| ^~~~~~~~~~~~~~~~ \| invalid_mod_bytes Fixes: df3e764d8e5c ("module: add debug stats to help identify memory pressure") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Randy Dunlap <rdunlap@infradead.org> Tested-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> Signed-off-by: Donald Dutile <ddutile@redhat.com>	2024-06-17 14:17:25 -04:00
Donald Dutile	1a4227bc49	module: remove use of uninitialized variable len JIRA: https://issues.redhat.com/browse/RHEL-28063 commit 9f5cab173e19201eebeaca853ff664a9a269fed0 Author: Tom Rix <trix@redhat.com> Date: Mon Apr 17 19:09:57 2023 -0400 module: remove use of uninitialized variable len clang build reports kernel/module/stats.c:307:34: error: variable 'len' is uninitialized when used here [-Werror,-Wuninitialized] len = scnprintf(buf + 0, size - len, ^~~ At the start of this sequence, neither the '+ 0' nor the '- len' is needed. So remove them and fix using 'len' uninitialized. Fixes: df3e764d8e5c ("module: add debug stats to help identify memory pressure") Signed-off-by: Tom Rix <trix@redhat.com> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> Signed-off-by: Donald Dutile <ddutile@redhat.com>	2024-06-17 14:17:25 -04:00
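Note on 1a4227bc49: the pattern being fixed is the usual "first call assigns len, later calls append at buf + len". The sketch below shows that pattern in user space. `scn_printf()` is a hypothetical helper that imitates the kernel's scnprintf() contract (it returns the number of characters actually stored, excluding the terminating NUL), and `format_stats()` is an invented example, not the code in kernel/module/stats.c.

```c
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical user-space imitation of scnprintf(): like snprintf(), but
 * the return value is what was really written, never more than size - 1.
 */
static int scn_printf(char *buf, size_t size, const char *fmt, ...)
{
	va_list args;
	int n;

	if (size == 0)
		return 0;
	va_start(args, fmt);
	n = vsnprintf(buf, size, fmt, args);
	va_end(args);
	if (n < 0)
		return 0;
	if ((size_t)n >= size)
		return (int)(size - 1);
	return n;
}

static int format_stats(char *buf, size_t size,
			unsigned long loaded, unsigned long failed)
{
	int len;

	/* First call: write at the start of buf with the full size. */
	len = scn_printf(buf, size, "Mods ever loaded %lu\n", loaded);
	/* Later calls: len is now initialized, so append after it. */
	len += scn_printf(buf + len, size - len,
			  "Mods failed on load %lu\n", failed);
	return len;
}
```

Because the helper never reports more than it stored, `buf + len` stays inside the buffer even when the output is truncated.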
Donald Dutile	9bfdfec27a	module: fix building stats for 32-bit targets JIRA: https://issues.redhat.com/browse/RHEL-28063 commit 719ccd803ed5bd1ad92b0b46fc095b8fe266827e Author: Arnd Bergmann <arnd@arndb.de> Date: Tue Apr 18 00:48:04 2023 +0200 module: fix building stats for 32-bit targets The new module statistics code mixes 64-bit types and wordsized 'long' variables, which leads to build failures on 32-bit architectures: kernel/module/stats.c: In function 'read_file_mod_stats': kernel/module/stats.c:291:29: error: passing argument 1 of 'atomic64_read' from incompatible pointer type [-Werror=incompatible-pointer-types] 291 \| total_size = atomic64_read(&total_mod_size); x86_64-linux-ld: kernel/module/stats.o: in function `read_file_mod_stats': stats.c:(.text+0x2b2): undefined reference to `__udivdi3' To fix this, the code has to use one of the two types consistently. Change them all to word-size types here. Fixes: df3e764d8e5c ("module: add debug stats to help identify memory pressure") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> Signed-off-by: Donald Dutile <ddutile@redhat.com>	2024-06-17 14:17:25 -04:00
Donald Dutile	c9be6e3679	module: stats: include uapi/linux/module.h JIRA: https://issues.redhat.com/browse/RHEL-28063 commit 635dc38314c75c5727711d896d4c71ec92f6f20b Author: Arnd Bergmann <arnd@arndb.de> Date: Tue Apr 18 00:02:46 2023 +0200 module: stats: include uapi/linux/module.h MODULE_INIT_COMPRESSED_FILE is defined in the uapi header, which is not included indirectly from the normal linux/module.h, but has to be pulled in explicitly: kernel/module/stats.c: In function 'mod_stat_bump_invalid': kernel/module/stats.c:227:14: error: 'MODULE_INIT_COMPRESSED_FILE' undeclared (first use in this function) 227 \| if (flags & MODULE_INIT_COMPRESSED_FILE) \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~ Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> Signed-off-by: Donald Dutile <ddutile@redhat.com>	2024-06-17 14:17:25 -04:00
Donald Dutile	2988d369f3	module: avoid allocation if module is already present and ready JIRA: https://issues.redhat.com/browse/RHEL-28063 commit 064f4536d13939b6e8cdb71298ff5d657f4f8caa Author: Luis Chamberlain <mcgrof@kernel.org> Date: Fri Mar 10 20:48:03 2023 -0800 module: avoid allocation if module is already present and ready The finit_module() system call can create unnecessary virtual memory pressure for duplicate modules. This is because load_module() can in the worst case allocate more than twice the size of a module in virtual memory. This patch saves at least a full size of the module in wasted vmalloc space memory by trying to avoid duplicates as soon as we can validate the module name in the read module structure. This can only be an issue if a system is getting hammered with userspace loading modules. There are two ways to load modules typically on systems, one is the kernel module auto-loading (request_module() calls in-kernel) and the other is things like udev. The auto-loading is in-kernel, but that pings back to userspace to just call modprobe. We already have a way to restrict the amount of concurrent kernel auto-loads in a given time, however that still allows multiple requests for the same module to go through and force two threads in userspace racing to call modprobe for the same exact module. Even though libkmod, which both modprobe and udev use, does check if a module is already loaded prior to calling finit_module(), races are still possible and this is clearly evident today when you have multiple CPUs. To avoid memory pressure for such stupid cases put a stop gap for them. The earliest we can detect duplicates from the modules side of things is once we have blessed the module name, sadly after the first vmalloc allocation. We can check for the module being present before a secondary vmalloc() allocation. There is a linear relationship between wasted virtual memory bytes and the CPU count. 
The reason is that udev ends up racing to call tons of the same modules for each of the CPUs. We can see the different linear relationships between wasted virtual memory and CPU count during boot in the following graph: [ASCII graph omitted: wasted virtual memory on the y-axis (0 to 14 GB) against CPU count on the x-axis (0 to 300), with one series for "Before" (*) and one for "After" (#)] On the y-axis we can see gigabytes of wasted virtual memory during boot due to duplicate module requests which just end up failing. Trying to infer the slope, this ends up being about 463 MiB per CPU lost prior to this patch. After this patch we only lose about 230 MiB per CPU, for a total savings of about 233 MiB per CPU. This is all just on bootup! On an 8vcpu 8 GiB RAM system using kdevops and testing against selftests kmod.sh -t 0008 I see a saving in the highest side of memory consumption of up to ~84 MiB with the Linux kernel selftests kmod test 0008. With the new stress-ng module test I see a 145 MiB difference in max memory consumption with 100 ops. The stress-ng module ops tests can be pretty pathological -- it is not realistic, however it was used to finally successfully reproduce issues which are only reported to happen on systems with over 400 CPUs [0] by just using 100 ops on an 8vcpu 8 GiB RAM system. Running out of virtual memory space is no surprise given the above graph, since at least on x86_64 we're capped at 128 MiB, eventually we'd hit a series of errors and one can use the above graph to guesstimate when. 
This of course will vary depending on the features you have enabled. So for instance, enabling KASAN seems to make this much worse. The results with kmod and stress-ng can be observed and visualized below. The time it takes to run the test is also not affected. The kmod tests 0008: The gnuplot is set to a range from 400000 KiB (390 MiB) - 580000 KiB (566 MiB) given the tests peak around that range. cat kmod.plot set term dumb set output fileout set yrange [400000:580000] plot filein with linespoints title "Memory usage (KiB)" Before: root@kmod ~ # /data/linux-next/tools/testing/selftests/kmod/kmod.sh -t 0008 root@kmod ~ # free -k -s 1 -c 40 \| grep Mem \| awk '{print $3}' > log-0008-before.txt ^C root@kmod ~ # sort -n -r log-0008-before.txt \| head -1 528732 So ~516.33 MiB After: root@kmod ~ # /data/linux-next/tools/testing/selftests/kmod/kmod.sh -t 0008 root@kmod ~ # free -k -s 1 -c 40 \| grep Mem \| awk '{print $3}' > log-0008-after.txt ^C root@kmod ~ # sort -n -r log-0008-after.txt \| head -1 442516 So ~432.14 MiB That's about 84 MiB in savings in the worst case. 
The graphs: root@kmod ~ # gnuplot -e "filein='log-0008-before.txt'; fileout='graph-0008-before.txt'" kmod.plot root@kmod ~ # gnuplot -e "filein='log-0008-after.txt'; fileout='graph-0008-after.txt'" kmod.plot root@kmod ~ # cat graph-0008-before.txt [ASCII plot omitted: "Memory usage (KiB)" on the y-axis (400000 to 580000) against sample number on the x-axis (0 to 40)] root@kmod ~ # cat graph-0008-after.txt [ASCII plot omitted: "Memory usage (KiB)" on the y-axis (400000 to 580000) against sample number on the x-axis (0 to 40)] The stress-ng module tests: This is used to run the test to try to reproduce the vmap issues reported by David: echo 0 > /proc/sys/vm/oom_dump_tasks ./stress-ng --module 100 --module-name xfs Prior to this commit: root@kmod ~ # free -k -s 1 -c 40 \| grep Mem \| awk '{print $3}' > baseline-stress-ng.txt root@kmod ~ # sort -n -r baseline-stress-ng.txt \| head -1 5046456 After this commit: root@kmod ~ # free -k -s 1 -c 40 \| grep Mem \| awk '{print $3}' > after-stress-ng.txt root@kmod ~ # sort -n -r after-stress-ng.txt \| head -1 4896972 The difference is 5046456 - 4896972 = 149484 KiB, and 149484 / 1024 = 145.98 MiB. So this commit using stress-ng reveals saving about 145 MiB in memory using 100 ops from stress-ng which reproduced the vmap issue reported. 
cat kmod.plot set term dumb set output fileout set yrange [4700000:5070000] plot filein with linespoints title "Memory usage (KiB)" root@kmod ~ # gnuplot -e "filein='baseline-stress-ng.txt'; fileout='graph-stress-ng-before.txt'" kmod-simple-stress-ng.plot root@kmod ~ # gnuplot -e "filein='after-stress-ng.txt'; fileout='graph-stress-ng-after.txt'" kmod-simple-stress-ng.plot root@kmod ~ # cat graph-stress-ng-before.txt [ASCII plot omitted: "Memory usage (KiB)" on the y-axis (4.7e+06 to 5.05e+06) against sample number on the x-axis (0 to 40)] root@kmod ~ # cat graph-stress-ng-after.txt [ASCII plot omitted: "Memory usage (KiB)" on the y-axis (4.7e+06 to 5.05e+06) against sample number on the x-axis (0 to 40)] [0] https://lkml.kernel.org/r/20221013180518.217405-1-david@redhat.com Reported-by: David Hildenbrand <david@redhat.com> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> Signed-off-by: Donald Dutile <ddutile@redhat.com>	2024-06-17 14:17:25 -04:00
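Note on 2988d369f3: the savings quoted in that message come from taking the peak of each `free -k` log and converting the difference from KiB to MiB. The sketch below reproduces that arithmetic; the function names are hypothetical, and only the peak values (528732, 442516, 5046456, 4896972) come from the commit message.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Largest sample in a log of "used memory" readings, in KiB. For a
 * non-empty log this matches `sort -n -r log.txt | head -1`.
 */
static unsigned long peak_kib(const unsigned long *samples, size_t n)
{
	unsigned long peak = 0;
	size_t i;

	for (i = 0; i < n; i++)
		if (samples[i] > peak)
			peak = samples[i];
	return peak;
}

/* 1 MiB is 1024 KiB. */
static double kib_to_mib(unsigned long kib)
{
	return (double)kib / 1024.0;
}

/* Difference between two peaks, expressed in MiB. */
static double saving_mib(unsigned long before_kib, unsigned long after_kib)
{
	assert(before_kib >= after_kib);
	return kib_to_mib(before_kib - after_kib);
}
```

With the peaks from the message, `saving_mib(528732, 442516)` gives roughly 84.2 MiB for kmod test 0008 and `saving_mib(5046456, 4896972)` gives roughly 145.98 MiB for the stress-ng run.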
Donald Dutile	e41f7154cf	module: add debug stats to help identify memory pressure JIRA: https://issues.redhat.com/browse/RHEL-28063 Conflicts: Adding RHEL-only MODULE_STATS set to n for now. Possibly add in future -debug kernels, to be determined. Add rest of commit(s) to enable clean backport for further commits. commit df3e764d8e5cd416efee29e0de3c93917dff5d33 Author: Luis Chamberlain <mcgrof@kernel.org> Date: Tue Mar 28 20:03:19 2023 -0700 module: add debug stats to help identify memory pressure Loading modules with finit_module() can end up using vmalloc(), vmap() and vmalloc() again, for a total of up to 3 separate allocations in the worst case for a single module. We always kernel_read() the module, that's a vmalloc(). Then vmap() is used for the module decompression, and if so the last read buffer is freed as we use the now decompressed module buffer to stuff data into our copy module. The last allocation is specific to each architecture but pretty much that's generally a series of vmalloc() calls or a variation of vmalloc to handle ELF sections with special permissions. Evaluation with new stress-ng module support [1] with just 100 ops is proving that you can end up using GiBs of data easily even with all care we have in the kernel and userspace today in trying to not load modules which are already loaded. 100 ops seems to resemble the sort of pressure a system with about 400 CPUs can create on module loading. Although issues relating to duplicate module requests due to each CPU incurring a new module request are silly and some of these are being fixed, we currently lack proper tooling to help diagnose easily what happened, when it happened and who likely is to blame -- userspace or kernel module autoloading. Provide an initial set of stats which use debugfs to let us easily scrape post-boot information about failed loads. This sort of information can be used on production workloads to try to avoid redundant memory pressure from finit_module(). 
There are a few examples that can be provided: A 255 vCPU system without the next patch in this series applied: Startup finished in 19.143s (kernel) + 7.078s (userspace) = 26.221s graphical.target reached after 6.988s in userspace And 13.58 GiB of virtual memory space lost due to failed module loading: root@big ~ # cat /sys/kernel/debug/modules/stats Mods ever loaded 67 Mods failed on kread 0 Mods failed on decompress 0 Mods failed on becoming 0 Mods failed on load 1411 Total module size 11464704 Total mod text size 4194304 Failed kread bytes 0 Failed decompress bytes 0 Failed becoming bytes 0 Failed kmod bytes 14588526272 Virtual mem wasted bytes 14588526272 Average mod size 171115 Average mod text size 62602 Average fail load bytes 10339140 Duplicate failed modules: module-name How-many-times Reason kvm_intel 249 Load kvm 249 Load irqbypass 8 Load crct10dif_pclmul 128 Load ghash_clmulni_intel 27 Load sha512_ssse3 50 Load sha512_generic 200 Load aesni_intel 249 Load crypto_simd 41 Load cryptd 131 Load evdev 2 Load serio_raw 1 Load virtio_pci 3 Load nvme 3 Load nvme_core 3 Load virtio_pci_legacy_dev 3 Load virtio_pci_modern_dev 3 Load t10_pi 3 Load virtio 3 Load crc32_pclmul 6 Load crc64_rocksoft 3 Load crc32c_intel 40 Load virtio_ring 3 Load crc64 3 Load The following screen shot, of a simple 8vcpu 8 GiB KVM guest with the next patch in this series applied, shows 226.53 MiB are wasted in virtual memory allocations due to duplicate module requests during boot. It also shows an average module memory size of 167.10 KiB and an average module .text + .init.text size of 61.13 KiB. The end shows all modules which were detected as duplicate requests and whether or not they failed early after just the first kernel_read*() call or late after we've already allocated the private space for the module in layout_and_allocate(). A system with module decompression would reveal more wasted virtual memory space. 
We should put effort now into identifying the source of these duplicate module requests and trimming these down as much as possible. Larger systems will obviously show much more wasted virtual memory allocations. root@kmod ~ # cat /sys/kernel/debug/modules/stats Mods ever loaded 67 Mods failed on kread 0 Mods failed on decompress 0 Mods failed on becoming 83 Mods failed on load 16 Total module size 11464704 Total mod text size 4194304 Failed kread bytes 0 Failed decompress bytes 0 Failed becoming bytes 228959096 Failed kmod bytes 8578080 Virtual mem wasted bytes 237537176 Average mod size 171115 Average mod text size 62602 Avg fail becoming bytes 2758544 Average fail load bytes 536130 Duplicate failed modules: module-name How-many-times Reason kvm_intel 7 Becoming kvm 7 Becoming irqbypass 6 Becoming & Load crct10dif_pclmul 7 Becoming & Load ghash_clmulni_intel 7 Becoming & Load sha512_ssse3 6 Becoming & Load sha512_generic 7 Becoming & Load aesni_intel 7 Becoming crypto_simd 7 Becoming & Load cryptd 3 Becoming & Load evdev 1 Becoming serio_raw 1 Becoming nvme 3 Becoming nvme_core 3 Becoming t10_pi 3 Becoming virtio_pci 3 Becoming crc32_pclmul 6 Becoming & Load crc64_rocksoft 3 Becoming crc32c_intel 3 Becoming virtio_pci_modern_dev 2 Becoming virtio_pci_legacy_dev 1 Becoming crc64 2 Becoming virtio 2 Becoming virtio_ring 2 Becoming [0] https://github.com/ColinIanKing/stress-ng.git [1] echo 0 > /proc/sys/vm/oom_dump_tasks ./stress-ng --module 100 --module-name xfs Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> Signed-off-by: Donald Dutile <ddutile@redhat.com>	2024-06-17 14:17:25 -04:00
Donald Dutile	5d28ce51db	module: extract patient module check into helper JIRA: https://issues.redhat.com/browse/RHEL-28063 commit f71afa6a420111da90657fe999a8e32c42d5c7d6 Author: Luis Chamberlain <mcgrof@kernel.org> Date: Fri Mar 10 20:05:52 2023 -0800 module: extract patient module check into helper The patient module check inside add_unformed_module() is large enough as it is. It is a bit hard to read too, so just move it to a helper and do the inverse checks first to help shift the code and make it easier to read. The new helper then is module_patient_check_exists(). To make this work we need to move finished_loading() up; we do that without making any functional changes to that routine. Reviewed-by: David Hildenbrand <david@redhat.com> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> Signed-off-by: Donald Dutile <ddutile@redhat.com>	2024-06-17 14:17:25 -04:00
Donald Dutile	a1da4498e3	modules/kmod: replace implementation with a semaphore JIRA: https://issues.redhat.com/browse/RHEL-28063 Conflicts: RHEL9 does not have upstream 48380368dec148 which changed DEFINE_SEMAPHORE to take a number argument. Implement the RHEL9 single arg version that open-codes the initialization. commit 25a1b5b518f4336bff934ac8348da6c57158363a Author: Luis Chamberlain <mcgrof@kernel.org> Date: Fri Mar 24 18:38:00 2023 -0700 modules/kmod: replace implementation with a semaphore Simplify the concurrency delimiter we use for kmod with the semaphore. I had used the kmod strategy to try to implement a similar concurrency delimiter for the kernel_read() calls from the finit_module() path so to reduce vmalloc() memory pressure. That effort didn't yet provide conclusive results, but one thing that became clear is we can use the suggested alternative solution with semaphores which Linus hinted at instead of using the atomic / wait strategy. I've stress tested this with kmod test 0008: time /data/linux-next/tools/testing/selftests/kmod/kmod.sh -t 0008 And I get only a slight delay. That delay however is small, a few seconds for a full test loop run that runs 150 times, for about ~30-40 seconds. The small delay is worth the simplification IMHO. Reviewed-by: Davidlohr Bueso <dave@stgolabs.net> Reviewed-by: Miroslav Benes <mbenes@suse.cz> Reviewed-by: David Hildenbrand <david@redhat.com> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> Signed-off-by: Donald Dutile <ddutile@redhat.com>	2024-06-17 14:17:25 -04:00
Donald Dutile	0e519b399a	module: fix kmemleak annotations for non init ELF sections JIRA: https://issues.redhat.com/browse/RHEL-28063 commit 430bb0d1c3376c988982f14bcbe71f917c89e1ab Author: Luis Chamberlain <mcgrof@kernel.org> Date: Tue Apr 4 18:52:47 2023 -0700 module: fix kmemleak annotations for non init ELF sections Commit ac3b43283923 ("module: replace module_layout with module_memory") reworked the way to handle memory allocations to make it clearer. But it lost in translation how we handled kmemleak_ignore() or kmemleak_not_leak() for different ELF sections. Fix this and clarify the comments a bit more. Contrary to the old way of using kmemleak_ignore() for init.* ELF sections we stick now only to kmemleak_not_leak() as per suggestion by Catalin Marinas so to avoid any false positives and simplify the code. Fixes: ac3b43283923 ("module: replace module_layout with module_memory") Reported-by: Jim Cromie <jim.cromie@gmail.com> Acked-by: Song Liu <song@kernel.org> Suggested-by: Catalin Marinas <catalin.marinas@arm.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> Signed-off-by: Donald Dutile <ddutile@redhat.com>	2024-06-17 14:17:25 -04:00
Donald Dutile	f776cbeb9c	module: Ignore L0 and rename is_arm_mapping_symbol() JIRA: https://issues.redhat.com/browse/RHEL-28063 commit 0a3bf86092c38f7b72c56c6901c78dd302411307 Author: Tiezhu Yang <yangtiezhu@loongson.cn> Date: Fri Mar 31 17:15:53 2023 +0800 module: Ignore L0 and rename is_arm_mapping_symbol() The L0 symbol is generated when build module on LoongArch, ignore it in modpost and when looking at module symbols, otherwise we can not see the expected call trace. Now is_arm_mapping_symbol() is not only for ARM, in order to reflect the reality, rename is_arm_mapping_symbol() to is_mapping_symbol(). This is related with commit c17a2538704f ("mksysmap: Fix the mismatch of 'L0' symbols in System.map"). (1) Simple test case [loongson@linux hello]$ cat hello.c #include <linux/init.h> #include <linux/module.h> #include <linux/printk.h> static void test_func(void) { pr_info("This is a test\n"); dump_stack(); } static int __init hello_init(void) { pr_warn("Hello, world\n"); test_func(); return 0; } static void __exit hello_exit(void) { pr_warn("Goodbye\n"); } module_init(hello_init); module_exit(hello_exit); MODULE_LICENSE("GPL"); [loongson@linux hello]$ cat Makefile obj-m:=hello.o ccflags-y += -g -Og all: make -C /lib/modules/$(shell uname -r)/build/ M=$(PWD) modules clean: make -C /lib/modules/$(shell uname -r)/build/ M=$(PWD) clean (2) Test environment system: LoongArch CLFS 5.5 https://github.com/sunhaiyong1978/CLFS-for-LoongArch/releases/tag/5.0 It needs to update grub to avoid booting error "invalid magic number". kernel: 6.3-rc1 with loongson3_defconfig + CONFIG_DYNAMIC_FTRACE=y (3) Test result Without this patch: [root@linux hello]# insmod hello.ko [root@linux hello]# dmesg ... Hello, world This is a test ... 
Call Trace: [<9000000000223728>] show_stack+0x68/0x18c [<90000000013374cc>] dump_stack_lvl+0x60/0x88 [<ffff800002050028>] L0\x01+0x20/0x2c [hello] [<ffff800002058028>] L0\x01+0x20/0x30 [hello] [<900000000022097c>] do_one_initcall+0x88/0x288 [<90000000002df890>] do_init_module+0x54/0x200 [<90000000002e1e18>] __do_sys_finit_module+0xc4/0x114 [<90000000013382e8>] do_syscall+0x7c/0x94 [<9000000000221e3c>] handle_syscall+0xbc/0x158 With this patch: [root@linux hello]# insmod hello.ko [root@linux hello]# dmesg ... Hello, world This is a test ... Call Trace: [<9000000000223728>] show_stack+0x68/0x18c [<90000000013374cc>] dump_stack_lvl+0x60/0x88 [<ffff800002050028>] test_func+0x28/0x34 [hello] [<ffff800002058028>] hello_init+0x28/0x38 [hello] [<900000000022097c>] do_one_initcall+0x88/0x288 [<90000000002df890>] do_init_module+0x54/0x200 [<90000000002e1e18>] __do_sys_finit_module+0xc4/0x114 [<90000000013382e8>] do_syscall+0x7c/0x94 [<9000000000221e3c>] handle_syscall+0xbc/0x158 Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn> Tested-by: Youling Tang <tangyouling@loongson.cn> # for LoongArch Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> Signed-off-by: Donald Dutile <ddutile@redhat.com>	2024-06-17 14:17:25 -04:00
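Note on f776cbeb9c: the traces above change because symbols such as "L0\x01" are treated as mapping symbols and skipped during symbol lookup. The sketch below illustrates such a predicate in user space. Only the "L0" rule is taken from the commit message; the `.L` and `$a`/`$d`/`$t`/`$x` rules are written from memory of the ARM-era helper and should be read as assumptions, so the upstream body of is_mapping_symbol() may differ.

```c
#include <stdbool.h>

/*
 * Illustrative version of a mapping-symbol check. Returns true for names
 * that a symbol lookup should ignore.
 */
static bool is_mapping_symbol(const char *str)
{
	/* Assumption: compiler-local labels such as ".L123". */
	if (str[0] == '.' && str[1] == 'L')
		return true;
	/* From the commit message: "L0" symbols seen on LoongArch. */
	if (str[0] == 'L' && str[1] == '0')
		return true;
	/* Assumption: ARM mapping symbols $a, $d, $t, $x, optionally
	 * followed by ".suffix". */
	return str[0] == '$' &&
	       (str[1] == 'a' || str[1] == 'd' ||
		str[1] == 't' || str[1] == 'x') &&
	       (str[2] == '\0' || str[2] == '.');
}
```

With a check like this, "L0\x01" is skipped, so the lookup falls through to a real function name such as test_func or hello_init, which is what the "With this patch" trace shows.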
Donald Dutile	7dbb7e3cc8	module: Move is_arm_mapping_symbol() to module_symbol.h JIRA: https://issues.redhat.com/browse/RHEL-28063 Conflict: Slight context diff but patch changes are the same. commit 987d2e0aaa55de40938435be760aa96428470fd6 Author: Tiezhu Yang <yangtiezhu@loongson.cn> Date: Fri Mar 31 17:15:52 2023 +0800 module: Move is_arm_mapping_symbol() to module_symbol.h In order to avoid duplicated code, move is_arm_mapping_symbol() to include/linux/module_symbol.h, then remove is_arm_mapping_symbol() in the other places. Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> Signed-off-by: Donald Dutile <ddutile@redhat.com>	2024-06-17 14:17:25 -04:00
Donald Dutile	b8f8ff56f2	module: Sync code of is_arm_mapping_symbol() JIRA: https://issues.redhat.com/browse/RHEL-28063 commit 87e5b1e8f257023ac5c4d2b8f07716a7f3dcc8ea Author: Tiezhu Yang <yangtiezhu@loongson.cn> Date: Fri Mar 31 17:15:51 2023 +0800 module: Sync code of is_arm_mapping_symbol() After commit `2e3a10a155` ("ARM: avoid ARM binutils leaking ELF local symbols") and commit d6b732666a1b ("modpost: fix undefined behavior of is_arm_mapping_symbol()"), many differences of is_arm_mapping_symbol() exist in kernel/module/kallsyms.c and scripts/mod/modpost.c, just sync the code to keep consistent. Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> Signed-off-by: Donald Dutile <ddutile@redhat.com>	2024-06-17 14:17:25 -04:00
Donald Dutile	58c52f78c0	module: already_uses() - reduce pr_debug output volume JIRA: https://issues.redhat.com/browse/RHEL-28063 commit 33c951f62920d144ca89daa0560180a49afb6f1e Author: Jim Cromie <jim.cromie@gmail.com> Date: Tue Mar 21 19:36:23 2023 -0600 module: already_uses() - reduce pr_debug output volume already_uses() is unnecessarily chatty. `modprobe i915` yields 491 messages like: [ 64.108744] i915 uses drm! This is a normal situation, and isn't worth all the log entries. NOTE: I've preserved the "does not use %s" messages, which happen less often, but do happen. It's not clear to me what they tell a reader, or what info might improve the pr_debug's utility. [ 6847.584999] main:already_uses:569: amdgpu does not use ttm! [ 6847.585001] main:add_module_usage:584: Allocating new usage for amdgpu. [ 6847.585014] main:already_uses:569: amdgpu does not use drm! [ 6847.585016] main:add_module_usage:584: Allocating new usage for amdgpu. [ 6847.585024] main:already_uses:569: amdgpu does not use drm_display_helper! [ 6847.585025] main:add_module_usage:584: Allocating new usage for amdgpu. [ 6847.585084] main:already_uses:569: amdgpu does not use drm_kms_helper! [ 6847.585086] main:add_module_usage:584: Allocating new usage for amdgpu. [ 6847.585175] main:already_uses:569: amdgpu does not use drm_buddy! [ 6847.585176] main:add_module_usage:584: Allocating new usage for amdgpu. [ 6847.585202] main:already_uses:569: amdgpu does not use i2c_algo_bit! [ 6847.585204] main:add_module_usage:584: Allocating new usage for amdgpu. [ 6847.585249] main:already_uses:569: amdgpu does not use gpu_sched! [ 6847.585250] main:add_module_usage:584: Allocating new usage for amdgpu. [ 6847.585314] main:already_uses:569: amdgpu does not use video! [ 6847.585315] main:add_module_usage:584: Allocating new usage for amdgpu. [ 6847.585409] main:already_uses:569: amdgpu does not use iommu_v2! [ 6847.585410] main:add_module_usage:584: Allocating new usage for amdgpu. 
[ 6847.585816] main:already_uses:569: amdgpu does not use drm_ttm_helper! [ 6847.585818] main:add_module_usage:584: Allocating new usage for amdgpu. [ 6848.762268] dyndbg: add-module: amdgpu.2533 sites no functional changes. Signed-off-by: Jim Cromie <jim.cromie@gmail.com> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> Signed-off-by: Donald Dutile <ddutile@redhat.com>	2024-06-17 14:17:25 -04:00
Donald Dutile	5bcdd7cfaf	module: add section-size to move_module pr_debug JIRA: https://issues.redhat.com/browse/RHEL-28063 commit 66a2301edf313d630c2ece4f3721c5b3402653ee Author: Jim Cromie <jim.cromie@gmail.com> Date: Tue Mar 21 19:36:22 2023 -0600 module: add section-size to move_module pr_debug move_module() pr_debug's "Final section addresses for $modname". Add section addresses to the message, for anyone looking at these. no functional changes. Signed-off-by: Jim Cromie <jim.cromie@gmail.com> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> Signed-off-by: Donald Dutile <ddutile@redhat.com>	2024-06-17 14:17:25 -04:00
Donald Dutile	db74bac793	module: add symbol-name to pr_debug Absolute symbol JIRA: https://issues.redhat.com/browse/RHEL-28063 commit b10addf37bbcaee66672eb54c15532266c8daea6 Author: Jim Cromie <jim.cromie@gmail.com> Date: Tue Mar 21 19:36:21 2023 -0600 module: add symbol-name to pr_debug Absolute symbol The pr_debug("Absolute symbol" ..) reports value, (which is usually 0), but not the name, which is more informative. So add it. no functional changes Signed-off-by: Jim Cromie <jim.cromie@gmail.com> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> Signed-off-by: Donald Dutile <ddutile@redhat.com>	2024-06-17 14:17:25 -04:00
Donald Dutile	38e20bcc3c	module: in layout_sections, move_module: add the modname JIRA: https://issues.redhat.com/browse/RHEL-28063 commit 6ed81802d4d1b037ad2d1657511ff0c2e9aeda14 Author: Jim Cromie <jim.cromie@gmail.com> Date: Tue Mar 21 19:36:20 2023 -0600 module: in layout_sections, move_module: add the modname layout_sections() and move_module() each issue ~50 messages for each module loaded. Add mod-name into their 2 header lines, to help the reader find his module. no functional changes. Signed-off-by: Jim Cromie <jim.cromie@gmail.com> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> Signed-off-by: Donald Dutile <ddutile@redhat.com>	2024-06-17 14:17:24 -04:00
Donald Dutile	ccf1cf0a34	module: fold usermode helper kmod into modules directory JIRA: https://issues.redhat.com/browse/RHEL-28063 Conflict: Dropped MAINTAINERS hunk because it wouldn't apply cleanly and not needed. commit 25be451aa4c0e9a96c59a626ab0e93d5cb7f6f48 Author: Luis Chamberlain <mcgrof@kernel.org> Date: Sun Mar 19 14:35:42 2023 -0700 module: fold usermode helper kmod into modules directory The kernel/kmod.c is already only built if we enabled modules, so just stuff it under kernel/module/kmod.c and unify the MAINTAINERS file for it. Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> Signed-off-by: Donald Dutile <ddutile@redhat.com>	2024-06-17 14:17:24 -04:00
Donald Dutile	d97fc0bedf	module: merge remnants of setup_load_info() to elf validation JIRA: https://issues.redhat.com/browse/RHEL-28063 commit 3d40bb903ed1f654707d34bdd61ee2c332000e4b Author: Luis Chamberlain <mcgrof@kernel.org> Date: Sun Mar 19 14:35:41 2023 -0700 module: merge remnants of setup_load_info() to elf validation The setup_load_info() actually had ELF validation checks of its own. To later cache useful variables as a secondary step just means looping again over the ELF sections we just validated. We can simply keep tabs on the key sections of interest as we validate the module ELF section in one swoop, so do that and merge the two routines together. Expand a bit on the documentation / intent / goals. Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> Signed-off-by: Donald Dutile <ddutile@redhat.com>	2024-06-17 14:17:24 -04:00
Donald Dutile	724139c3b1	module: move more elf validity checks to elf_validity_check() JIRA: https://issues.redhat.com/browse/RHEL-28063 commit 1bb49db9919a4d4186cba288930e7026d8f7ec96 Author: Luis Chamberlain <mcgrof@kernel.org> Date: Sun Mar 19 14:35:40 2023 -0700 module: move more elf validity checks to elf_validity_check() The symbol and strings section validation currently happen in setup_load_info() but since they are also doing validity checks move this to elf_validity_check(). Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> Signed-off-by: Donald Dutile <ddutile@redhat.com>	2024-06-17 14:17:24 -04:00
Donald Dutile	33c789403d	module: add stop-grap sanity check on module memcpy() JIRA: https://issues.redhat.com/browse/RHEL-28063 commit c7ee8aebf6c0588c0aab76538aff395c3abf811c Author: Luis Chamberlain <mcgrof@kernel.org> Date: Sun Mar 19 14:35:39 2023 -0700 module: add stop-grap sanity check on module memcpy() The integrity of the struct module we load is important, and although our ELF validator already checks that the module section must match struct module, add a stop-gap check before we memcpy() the final minted module. This also makes the goal clear to those inspecting the code. While at it, clarify the goal behind updating the sh_addr address. The current comment is pretty misleading. Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> Signed-off-by: Donald Dutile <ddutile@redhat.com>	2024-06-17 14:17:24 -04:00
Donald Dutile	d6be2f8a79	module: add sanity check for ELF module section JIRA: https://issues.redhat.com/browse/RHEL-28063 commit 46752820f9abc013b6bd8172562b642376723313 Author: Luis Chamberlain <mcgrof@kernel.org> Date: Sun Mar 19 14:35:38 2023 -0700 module: add sanity check for ELF module section The ELF ".gnu.linkonce.this_module" section is special, it is what we use to construct the struct module __this_module, which THIS_MODULE points to. When userspace loads a module we always deal first with a copy of the userspace buffer, and twiddle with the userspace copy's version of the struct module. Eventually we allocate memory to do a memcpy() of that struct module, under the assumption that the module size is right. But we have no validity checks against the size or the requirements for the section. Add some validity checks for the special module section early and while at it, cache the module section index early, so we don't have to do that later. While at it, just move over the assignment of the info->mod to make the code clearer. The validity checker also adds an explicit size check to ensure the module section size matches the kernel's run time size for sizeof(struct module). This should prevent sloppy loads of modules which are built today without actually increasing the size of the struct module. A developer today can for example expand the size of struct module, rebuild a directory, 'make fs/xfs/' for example, and then try to insmod the driver there. That module would in effect have an incorrect size. This new size check would put a stop-gap against such mistakes. This also makes the entire goal of ".gnu.linkonce.this_module" pretty clear. Before this patch verification of the goal / intent required some Indiana Jones whips, torches and cleaning up big old spider webs. Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> Signed-off-by: Donald Dutile <ddutile@redhat.com>	2024-06-17 14:17:24 -04:00
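The size check described in this commit message can be illustrated with a small userspace sketch. Everything named `demo_*` here is hypothetical: `struct demo_module` stands in for the kernel's struct module, `struct demo_shdr` is a trimmed-down section header, and `DEMO_ENOEXEC` is a local error value. The real check lives in the kernel's ELF validation path and is not reproduced here.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_ENOEXEC 8 /* hypothetical local error code */

/* Hypothetical stand-in for the kernel's struct module. */
struct demo_module {
	char name[56];
	uint64_t state;
};

/* Hypothetical, trimmed-down section header: only the fields used here. */
struct demo_shdr {
	const char *name;
	uint64_t sh_size;
};

/*
 * Find the special module section, verify its size matches the size of
 * the struct the "kernel" was built with, and return its index so the
 * caller can cache it. Returns -DEMO_ENOEXEC if the section is missing
 * or its size is wrong.
 */
static int demo_find_this_module(const struct demo_shdr *shdrs, size_t n)
{
	/* Index 0 is the reserved null section in ELF, so start at 1. */
	for (size_t i = 1; i < n; i++) {
		if (strcmp(shdrs[i].name, ".gnu.linkonce.this_module") != 0)
			continue;
		if (shdrs[i].sh_size != sizeof(struct demo_module))
			return -DEMO_ENOEXEC;
		return (int)i;
	}
	return -DEMO_ENOEXEC;
}
```

A module built against a larger or smaller struct would be rejected here, before any memcpy() of the section into freshly allocated memory could take place.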
Donald Dutile	57edbf0df7	module: rename check_module_license_and_versions() to check_export_symbol_versions() JIRA: https://issues.redhat.com/browse/RHEL-28063 commit 419e1a20f7bdef5380fde5ed73f05c98c28a598b Author: Luis Chamberlain <mcgrof@kernel.org> Date: Sun Mar 19 14:27:46 2023 -0700 module: rename check_module_license_and_versions() to check_export_symbol_versions() This makes it easier to understand what the routine is checking for. Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> Signed-off-by: Donald Dutile <ddutile@redhat.com>	2024-06-17 14:17:24 -04:00
Donald Dutile	4f4151e7ba	module: converge taint work together JIRA: https://issues.redhat.com/browse/RHEL-28063 commit 72f08b3cc631f4ebcaa9f373d18fc0b877fb6458 Author: Luis Chamberlain <mcgrof@kernel.org> Date: Sun Mar 19 14:27:45 2023 -0700 module: converge taint work together Converge on a compromise: so long as we have a module hit our linked list of modules we taint. That is, the module was about to become live. Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> Signed-off-by: Donald Dutile <ddutile@redhat.com>	2024-06-17 14:17:24 -04:00
Donald Dutile	88c5188c62	module: move signature taint to module_augment_kernel_taints() JIRA: https://issues.redhat.com/browse/RHEL-28063 commit c3bbf62ebf8c9e87cea875cfa146f44f46af4145 Author: Luis Chamberlain <mcgrof@kernel.org> Date: Sun Mar 19 14:27:44 2023 -0700 module: move signature taint to module_augment_kernel_taints() Just move the signature taint into the helper: module_augment_kernel_taints() Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> Signed-off-by: Donald Dutile <ddutile@redhat.com>	2024-06-17 14:17:24 -04:00
Donald Dutile	d3fa8f2f9b	module: move tainting until after a module hits our linked list JIRA: https://issues.redhat.com/browse/RHEL-28063 commit a12b94511cf36855cd731c16005bd535e2007552 Author: Luis Chamberlain <mcgrof@kernel.org> Date: Sun Mar 19 14:27:43 2023 -0700 module: move tainting until after a module hits our linked list It is silly to have taints spread out all over, we can just compromise and add them if the module ever hit our linked list. Our sanity checkers should just prevent crappy drivers / bogus ELF modules / etc and kconfig options should be enough to let you not load things you don't want. Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> Signed-off-by: Donald Dutile <ddutile@redhat.com>	2024-06-17 14:17:24 -04:00
Donald Dutile	a24c049ebc	module: split taint adding with info checking JIRA: https://issues.redhat.com/browse/RHEL-28063 commit 437c1f9cc61fd37829eaf12d8ae2f7dcc5dddce0 Author: Luis Chamberlain <mcgrof@kernel.org> Date: Sun Mar 19 14:27:42 2023 -0700 module: split taint adding with info checking check_modinfo() actually does two things: a) sanity checks, some of which are fatal, and so we prevent the user from completing trying to load a module b) taints the kernel The taints are pretty heavy handed because we're tainting the kernel before we ever even get to load the module into the modules linked list. That is, it can fail for other reasons later as we review the module's structure. But this commit makes no functional changes, it just makes the intent clearer and splits the code up where needed to make that happen. Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> Signed-off-by: Donald Dutile <ddutile@redhat.com>	2024-06-17 14:17:24 -04:00
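The taint-related commits in this series separate "check the module info" from "taint the kernel", and defer tainting until the module has reached the module list. A minimal userspace sketch of that ordering follows. All `demo_*` names, the taint bits, and the error values are hypothetical illustrations, not the kernel's own helpers or flags.

```c
#include <stdbool.h>

#define DEMO_TAINT_OOT     (1u << 0) /* hypothetical taint bit */
#define DEMO_TAINT_STAGING (1u << 1) /* hypothetical taint bit */

/* Hypothetical summary of what a loader learned about a module. */
struct demo_info {
	bool vermagic_ok;
	bool out_of_tree;
	bool staging;
};

static unsigned int demo_kernel_taints;

/* Phase 1: pure sanity checks, no side effects; nonzero means fatal. */
static int demo_check_modinfo(const struct demo_info *info)
{
	return info->vermagic_ok ? 0 : -8;
}

/* Phase 2: apply taints; called only once the module is on the list. */
static void demo_augment_taints(const struct demo_info *info)
{
	if (info->out_of_tree)
		demo_kernel_taints |= DEMO_TAINT_OOT;
	if (info->staging)
		demo_kernel_taints |= DEMO_TAINT_STAGING;
}

/* later_step_fails simulates a failure before the module is listed. */
static int demo_load(const struct demo_info *info, bool later_step_fails)
{
	int err = demo_check_modinfo(info);

	if (err)
		return err;
	if (later_step_fails)
		return -12; /* failed before reaching the list: no taint */
	/* The module would be added to the module list at this point. */
	demo_augment_taints(info);
	return 0;
}
```

With this ordering, a load that fails during validation or a later setup step leaves the taint state untouched, which is the compromise the commit messages describe.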


122 Commits