122 Commits

Author SHA1 Message Date
Viktor Malik 565c35b3f1 module, bpf: Store BTF base pointer in struct module
JIRA: https://issues.redhat.com/browse/RHEL-30774

commit d4e48e3dd45017abdd69a19285d197de897ef44f
Author: Alan Maguire <alan.maguire@oracle.com>
Date:   Thu Jun 20 10:17:29 2024 +0100

    module, bpf: Store BTF base pointer in struct module
    
    ...as this will allow split BTF modules with a base BTF
    representation (rather than the full vmlinux BTF at time of
    BTF encoding) to resolve their references to kernel types in a
    way that is more resilient to small changes in kernel types.
    
    This will allow modules that are not built every time the kernel
    is to provide more resilient BTF, rather than have it invalidated
    every time BTF ids for core kernel types change.
    
    Fields are ordered to avoid holes in struct module.
    
    Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
    Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
    Reviewed-by: Luis Chamberlain <mcgrof@kernel.org>
    Acked-by: Andrii Nakryiko <andrii@kernel.org>
    Link: https://lore.kernel.org/bpf/20240620091733.1967885-3-alan.maguire@oracle.com

Signed-off-by: Viktor Malik <vmalik@redhat.com>
2024-11-26 15:55:10 +01:00
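The commit above notes that the new fields are ordered to avoid holes in struct module. Here is a minimal sketch of why member order matters. The structs are hypothetical and are not the real struct module layout. On common ABIs, each pointer must sit at a pointer-aligned offset. Interleaving small members with pointers therefore leaves padding, and grouping members by alignment removes it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical layout: small members interleaved with pointers. */
struct holey {
    char a;        /* 1 byte, then padding up to pointer alignment */
    void *p;
    char b;        /* 1 byte, then padding again */
    void *q;
};

/* Same members, largest alignment first: the two chars share one tail slot. */
struct packed_by_order {
    void *p;
    void *q;
    char a;
    char b;
};

void print_layout(void)
{
    printf("holey:     %zu bytes (p at offset %zu)\n",
           sizeof(struct holey), offsetof(struct holey, p));
    printf("reordered: %zu bytes (a at offset %zu)\n",
           sizeof(struct packed_by_order), offsetof(struct packed_by_order, a));
}
```

On a typical LP64 target, print_layout() reports 32 bytes versus 24 bytes. pahole reports the padding in the first layout as holes.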
Nico Pache d6b2c538d9 kunit: add KUNIT_INIT_TABLE to init linker section
commit d81f0d7b8b23ec79f80be602ed6129ded27862e8
Author: Rae Moar <rmoar@google.com>
Date:   Wed Dec 13 19:44:17 2023 +0000

    kunit: add KUNIT_INIT_TABLE to init linker section

    Add KUNIT_INIT_TABLE to the INIT_DATA linker section.

    Alter the KUnit macros to create init tests:
    kunit_test_init_section_suites

    Update lib/kunit/executor.c to run both the suites in KUNIT_TABLE and
    KUNIT_INIT_TABLE.

    Reviewed-by: David Gow <davidgow@google.com>
    Signed-off-by: Rae Moar <rmoar@google.com>
    Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>

JIRA: https://issues.redhat.com/browse/RHEL-39303
Signed-off-by: Nico Pache <npache@redhat.com>
2024-07-31 20:32:28 -06:00
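KUNIT_TABLE and KUNIT_INIT_TABLE are linker-section tables that an executor walks from start to end. Below is a userspace sketch of that pattern. It assumes GCC or Clang on an ELF target with GNU ld or lld, both of which define `__start_<sec>`/`__stop_<sec>` for sections whose names are valid C identifiers. Every name here is hypothetical, and KUnit's real macros differ.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for a KUnit suite descriptor. */
struct suite {
    const char *name;
    int (*run)(void);            /* returns the number of failures */
};

/* Define a suite and place a pointer to it in the named section. */
#define DEFINE_SUITE(sec, var, fn)                          \
    static const struct suite var = { #var, fn };           \
    static const struct suite *const var##_entry            \
        __attribute__((used, section(#sec))) = &var

static int run_core(void) { return 0; }
static int run_init(void) { return 0; }

DEFINE_SUITE(demo_table, core_suite, run_core);       /* models KUNIT_TABLE */
DEFINE_SUITE(demo_init_table, init_suite, run_init);  /* models KUNIT_INIT_TABLE */

/* Provided by the linker: bounds of each section, seen as pointer arrays. */
extern const struct suite *const __start_demo_table[];
extern const struct suite *const __stop_demo_table[];
extern const struct suite *const __start_demo_init_table[];
extern const struct suite *const __stop_demo_init_table[];

/* Run every suite in [start, stop) and return how many ran. */
static int run_table(const struct suite *const *start,
                     const struct suite *const *stop)
{
    int n = 0;
    for (const struct suite *const *p = start; p < stop; p++, n++)
        printf("%-12s %s\n", (*p)->name, (*p)->run() ? "FAIL" : "ok");
    return n;
}

/* Like the executor change: run both the regular and the init table. */
int run_all_suites(void)
{
    return run_table(__start_demo_table, __stop_demo_table) +
           run_table(__start_demo_init_table, __stop_demo_init_table);
}
```

Storing pointers rather than whole structs keeps every section entry pointer-sized. That avoids surprises from compiler-added alignment. In the kernel, the init table additionally lives in INIT_DATA, so it is discarded after boot.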
Donald Dutile aaaa438fc7 modules: wait do_free_init correctly
JIRA: https://issues.redhat.com/browse/RHEL-28063

commit 8f8cd6c0a43ed637e620bbe45a8d0e0c2f4d5130
Author: Changbin Du <changbin.du@huawei.com>
Date:   Tue Feb 27 10:35:46 2024 +0800

    modules: wait do_free_init correctly

    The synchronization here is to ensure the ordering of freeing of a module
    init so that it happens before W+X checking.  It is worth noting it is not
    that the freeing was not happening, it is just that our sanity checkers
    raced against the permission checkers which assume init memory is already
    gone.

    Commit 1a7b7d9220 ("modules: Use vmalloc special flag") moved calling
    do_free_init() into a global workqueue instead of relying on it being
    called through call_rcu(..., do_free_init), which used to allow us to call
    do_free_init() asynchronously after the end of a subsequent grace period.
    The move to a global workqueue broke the guarantees for code which needed
    to be sure the do_free_init() would complete with rcu_barrier().  To fix
    this, callers which used to rely on rcu_barrier() must now instead use
    flush_work(&init_free_wq).

    Without this fix, we could still encounter false-positive reports in W+X
    checking, since the rcu_barrier() here can no longer ensure the ordering.

    Even worse, the rcu_barrier() can introduce significant delay.  Eric
    Chanudet reported that the rcu_barrier introduces ~0.1s delay on a
    PREEMPT_RT kernel.

      [    0.291444] Freeing unused kernel memory: 5568K
      [    0.402442] Run /sbin/init as init process

    With this fix, the above delay can be eliminated.

    Link: https://lkml.kernel.org/r/20240227023546.2490667-1-changbin.du@huawei.com
    Fixes: 1a7b7d9220 ("modules: Use vmalloc special flag")
    Signed-off-by: Changbin Du <changbin.du@huawei.com>
    Tested-by: Eric Chanudet <echanude@redhat.com>
    Acked-by: Luis Chamberlain <mcgrof@kernel.org>
    Cc: Xiaoyi Su <suxiaoyi@huawei.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Donald Dutile <ddutile@redhat.com>
2024-06-17 14:17:30 -04:00
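The ordering problem can be shown with a deliberately simplified, single-threaded model. All names are hypothetical, and this is not kernel code. Two independent deferred-work queues stand in for RCU callbacks and the global workqueue. Waiting on the wrong queue leaves the init-freeing work unrun, which is what rcu_barrier() did after do_free_init() moved to a workqueue. The W+X check then still sees init memory.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: draining one queue never runs work queued on the other. */
typedef void (*work_fn)(void);

struct work_queue {
    work_fn pending[8];
    size_t n;
};

static struct work_queue rcu_callbacks;  /* models call_rcu() callbacks */
static struct work_queue global_wq;      /* models the global workqueue */

static bool queue_work_model(struct work_queue *q, work_fn fn)
{
    if (q->n == sizeof(q->pending) / sizeof(q->pending[0]))
        return false;                    /* model queue is full */
    q->pending[q->n++] = fn;
    return true;
}

/* Run and clear everything pending on one queue, and only that queue. */
static void drain(struct work_queue *q)
{
    for (size_t i = 0; i < q->n; i++)
        q->pending[i]();
    q->n = 0;
}

static void rcu_barrier_model(void) { drain(&rcu_callbacks); }
static void flush_work_model(void)  { drain(&global_wq); }

static bool init_freed;
static void do_free_init_model(void) { init_freed = true; }

/* The W+X check assumes init memory is already gone. */
static bool wx_check_passes(void) { return init_freed; }
```

After queueing do_free_init_model() on global_wq, rcu_barrier_model() leaves wx_check_passes() false. Only flush_work_model() makes it true, which mirrors the switch to flush_work(&init_free_wq).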
Donald Dutile 7aa2fa676f Subject: revert of revert KEYS: Make use of platform keyring for module signature verify
Put back the RHEL-only module-signing patch so that it remains distinguishable
in RHEL after the move of kernel/module-signing.c to kernel/module/signing.c.

JIRA: https://issues.redhat.com/browse/RHEL-28063
Upstream Status: RHEL-only

Signed-off-by: Donald Dutile <ddutile@redhat.com>
2024-06-17 14:17:30 -04:00
Donald Dutile e6f4187276 module: Remove redundant TASK_UNINTERRUPTIBLE
JIRA: https://issues.redhat.com/browse/RHEL-28063

commit f17f2c13d613cbeef529b03ca17ae2581b2e6cb8
Author: Kevin Hao <haokexin@gmail.com>
Date:   Fri Dec 8 16:29:34 2023 +0800

    module: Remove redundant TASK_UNINTERRUPTIBLE

    TASK_KILLABLE already includes TASK_UNINTERRUPTIBLE, so there is no
    need to add a separate TASK_UNINTERRUPTIBLE.

    Signed-off-by: Kevin Hao <haokexin@gmail.com>
    Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>

Signed-off-by: Donald Dutile <ddutile@redhat.com>
2024-06-17 14:17:29 -04:00
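The redundancy is a plain bitmask identity. The values below mirror include/linux/sched.h in recent kernels as far as we know. Treat them as illustrative, since the exact bits are kernel-internal and not ABI. The point is that OR-ing TASK_UNINTERRUPTIBLE into TASK_KILLABLE changes nothing.

```c
#include <assert.h>

/* Illustrative values; TASK_KILLABLE is defined in terms of the other two. */
#define TASK_UNINTERRUPTIBLE 0x00000002u
#define TASK_WAKEKILL        0x00000100u
#define TASK_KILLABLE        (TASK_WAKEKILL | TASK_UNINTERRUPTIBLE)

/* Returns nonzero if OR-ing `extra` into `state` leaves it unchanged. */
static int is_redundant(unsigned int state, unsigned int extra)
{
    return (state | extra) == state;
}
```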
Donald Dutile 573fa8ea71 module/decompress: use kvmalloc() consistently
JIRA: https://issues.redhat.com/browse/RHEL-28063

commit 17fc8084aa8f9d5235f252fc3978db657dd77e92
Author: Andrea Righi <andrea.righi@canonical.com>
Date:   Thu Nov 2 09:19:14 2023 +0100

    module/decompress: use kvmalloc() consistently

    We consistently switched from kmalloc() to vmalloc() in module
    decompression to prevent potential memory allocation failures with large
    modules, however vmalloc() is not as memory-efficient and fast as
    kmalloc().

    Since we don't know in general the size of the workspace required by the
    decompression algorithm, it is more reasonable to use kvmalloc()
    consistently, also considering that we don't have special memory
    requirements here.

    Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
    Tested-by: Andrea Righi <andrea.righi@canonical.com>
    Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Signed-off-by: Donald Dutile <ddutile@redhat.com>
2024-06-17 14:17:29 -04:00
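kvmalloc() roughly means: try kmalloc() first, and fall back to vmalloc() if a physically contiguous allocation is not available. Below is a userspace model of that decision. FAST_LIMIT is an arbitrary, hypothetical stand-in for "a contiguous allocation this large may fail". The kernel relies on real allocator feedback rather than a fixed threshold.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical threshold standing in for "contiguous allocation likely fails". */
#define FAST_LIMIT (32 * 1024)

enum alloc_path { PATH_FAST, PATH_FALLBACK, PATH_NONE };

static void *fast_alloc(size_t size)      /* models kmalloc() */
{
    return size <= FAST_LIMIT ? malloc(size) : NULL;
}

static void *fallback_alloc(size_t size)  /* models vmalloc() */
{
    return malloc(size);
}

/* Try the fast path first; record which path satisfied the request. */
static void *kvmalloc_model(size_t size, enum alloc_path *path)
{
    void *p = fast_alloc(size);
    if (p) {
        *path = PATH_FAST;
        return p;
    }
    p = fallback_alloc(size);
    *path = p ? PATH_FALLBACK : PATH_NONE;
    return p;
}
```

Both paths use malloc() here, so free() releases either one. The kernel's kvfree() instead inspects the address to choose between kfree() and vfree().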
Donald Dutile 5806df5e42 module: Annotate struct module_notes_attrs with __counted_by
JIRA: https://issues.redhat.com/browse/RHEL-28063

commit ea0b0bcef4917a2640ecc100c768b8e785784834
Author: Kees Cook <keescook@chromium.org>
Date:   Fri Sep 22 10:52:53 2023 -0700

    module: Annotate struct module_notes_attrs with __counted_by

    Prepare for the coming implementation by GCC and Clang of the __counted_by
    attribute. Flexible array members annotated with __counted_by can have
    their accesses bounds-checked at run-time via CONFIG_UBSAN_BOUNDS
    (for array indexing) and CONFIG_FORTIFY_SOURCE (for strcpy/memcpy-family
    functions).

    As found with Coccinelle[1], add __counted_by for struct module_notes_attrs.

    [1] https://github.com/kees/kernel-tools/blob/trunk/coccinelle/examples/counted_by.cocci

    Cc: Luis Chamberlain <mcgrof@kernel.org>
    Cc: linux-modules@vger.kernel.org
    Signed-off-by: Kees Cook <keescook@chromium.org>
    Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>

Signed-off-by: Donald Dutile <ddutile@redhat.com>
2024-06-17 14:17:29 -04:00
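The sketch below shows what the annotation looks like on a flexible array member. The struct is a hypothetical analogue, not the real struct module_notes_attrs. The macro uses the attribute when the compiler supports it (recent Clang and GCC) and otherwise expands to nothing, much like the kernel's own wrapper. With the attribute active, the counter must be set before the array is accessed.

```c
#include <assert.h>
#include <stdlib.h>

/* Use counted_by when available; otherwise the annotation is a no-op. */
#ifndef __counted_by
# if defined(__has_attribute)
#  if __has_attribute(__counted_by__)
#   define __counted_by(member) __attribute__((__counted_by__(member)))
#  endif
# endif
#endif
#ifndef __counted_by
# define __counted_by(member)
#endif

/* Hypothetical analogue of a struct with a counted flexible array. */
struct notes_model {
    unsigned int notes;                /* number of valid entries in attrs[] */
    int attrs[] __counted_by(notes);
};

/* Allocate and set the counter before touching attrs[]. */
static struct notes_model *notes_alloc(unsigned int n)
{
    struct notes_model *m = malloc(sizeof(*m) + (size_t)n * sizeof(m->attrs[0]));
    if (!m)
        return NULL;
    m->notes = n;
    for (unsigned int i = 0; i < n; i++)
        m->attrs[i] = (int)i;
    return m;
}

static long notes_sum(const struct notes_model *m)
{
    long sum = 0;
    for (unsigned int i = 0; i < m->notes; i++)
        sum += m->attrs[i];
    return sum;
}
```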
Donald Dutile 4f912b873e module: Fix comment typo
JIRA: https://issues.redhat.com/browse/RHEL-28063

commit fd06da776130ec2611c30272a0868f6a54cdf9d2
Author: Zhu Mao <zhumao001@208suo.com>
Date:   Wed Sep 20 17:13:09 2023 -0700

    module: Fix comment typo

    Delete duplicated word in comment.

    Signed-off-by: Zhu Mao <zhumao001@208suo.com>
    Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>

Signed-off-by: Donald Dutile <ddutile@redhat.com>
2024-06-17 14:17:29 -04:00
Donald Dutile 64d79afe53 module/decompress: use vmalloc() for gzip decompression workspace
JIRA: https://issues.redhat.com/browse/RHEL-28063

commit 3737df782c740b944912ed93420c57344b1cf864
Author: Andrea Righi <andrea.righi@canonical.com>
Date:   Wed Aug 30 17:58:20 2023 +0200

    module/decompress: use vmalloc() for gzip decompression workspace

    Use a similar approach as commit a419beac4a07 ("module/decompress: use
    vmalloc() for zstd decompression workspace") and replace kmalloc() with
    vmalloc() also for the gzip module decompression workspace.

    In this case the workspace is represented by struct inflate_workspace
    that can be fairly large for kmalloc() and it can potentially lead to
    allocation errors on certain systems:

    $ pahole inflate_workspace
    struct inflate_workspace {
            struct inflate_state       inflate_state;        /*     0  9544 */
            /* --- cacheline 149 boundary (9536 bytes) was 8 bytes ago --- */
            unsigned char              working_window[32768]; /*  9544 32768 */

            /* size: 42312, cachelines: 662, members: 2 */
            /* last cacheline: 8 bytes */
    };

    Considering that there is no need to use contiguous physical memory,
    simply switch to vmalloc() to provide more reliable in-kernel module
    decompression.

    Fixes: b1ae6dc41eaa ("module: add in-kernel support for decompressing")
    Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
    Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>

Signed-off-by: Donald Dutile <ddutile@redhat.com>
2024-06-17 14:17:29 -04:00
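The pahole figures can be checked with a little arithmetic. The workspace is 9544 + 32768 = 42312 bytes. At 64-byte cachelines (the line size the pahole output implies), that is 662 lines. Assuming 4 KiB pages, it spans 11 pages. Large kmalloc() requests go to the page allocator, as the __kmalloc_large_node frame in the zstd trace of the next entry suggests. The page allocator rounds the request up to a power-of-two block, so this one needs an order-4 block of 16 pages (64 KiB). The helper names are hypothetical. Under these assumptions, order_for() computes the same quantity as the kernel's get_order().

```c
#include <assert.h>
#include <stddef.h>

/* Smallest order with (page_size << order) >= size. Assumes size > 0. */
static unsigned int order_for(size_t size, size_t page_size)
{
    size_t pages = (size + page_size - 1) / page_size;
    unsigned int order = 0;
    while (((size_t)1 << order) < pages)
        order++;
    return order;
}

/* Cachelines covered by an object of `size` bytes starting at offset 0. */
static size_t cachelines(size_t size, size_t line)
{
    return (size + line - 1) / line;
}
```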
Donald Dutile ca8a2f786d module/decompress: use vmalloc() for zstd decompression workspace
JIRA: https://issues.redhat.com/browse/RHEL-28063

commit a419beac4a070aff63c520f36ebf7cb8a76a8ae5
Author: Andrea Righi <andrea.righi@canonical.com>
Date:   Tue Aug 29 14:05:08 2023 +0200

    module/decompress: use vmalloc() for zstd decompression workspace

    Using kmalloc() to allocate the decompression workspace for zstd may
    trigger the following warning when large modules are loaded (e.g., xfs):

    [    2.961884] WARNING: CPU: 1 PID: 254 at mm/page_alloc.c:4453 __alloc_pages+0x2c3/0x350
    ...
    [    2.989033] Call Trace:
    [    2.989841]  <TASK>
    [    2.990614]  ? show_regs+0x6d/0x80
    [    2.991573]  ? __warn+0x89/0x160
    [    2.992485]  ? __alloc_pages+0x2c3/0x350
    [    2.993520]  ? report_bug+0x17e/0x1b0
    [    2.994506]  ? handle_bug+0x51/0xa0
    [    2.995474]  ? exc_invalid_op+0x18/0x80
    [    2.996469]  ? asm_exc_invalid_op+0x1b/0x20
    [    2.997530]  ? module_zstd_decompress+0xdc/0x2a0
    [    2.998665]  ? __alloc_pages+0x2c3/0x350
    [    2.999695]  ? module_zstd_decompress+0xdc/0x2a0
    [    3.000821]  __kmalloc_large_node+0x7a/0x150
    [    3.001920]  __kmalloc+0xdb/0x170
    [    3.002824]  module_zstd_decompress+0xdc/0x2a0
    [    3.003857]  module_decompress+0x37/0xc0
    [    3.004688]  init_module_from_file+0xd0/0x100
    [    3.005668]  idempotent_init_module+0x11c/0x2b0
    [    3.006632]  __x64_sys_finit_module+0x64/0xd0
    [    3.007568]  do_syscall_64+0x59/0x90
    [    3.008373]  ? ksys_read+0x73/0x100
    [    3.009395]  ? exit_to_user_mode_prepare+0x30/0xb0
    [    3.010531]  ? syscall_exit_to_user_mode+0x37/0x60
    [    3.011662]  ? do_syscall_64+0x68/0x90
    [    3.012511]  ? do_syscall_64+0x68/0x90
    [    3.013364]  entry_SYSCALL_64_after_hwframe+0x6e/0xd8

    However, contiguous physical memory does not seem to be required in
    module_zstd_decompress(), so use vmalloc() instead, to prevent the
    warning and avoid potential failures when loading compressed modules.

    Fixes: 169a58ad824d ("module/decompress: Support zstd in-kernel decompression")
    Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
    Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>

Signed-off-by: Donald Dutile <ddutile@redhat.com>
2024-06-17 14:17:29 -04:00
Donald Dutile b17d15cce3 module: Expose module_init_layout_section()
JIRA: https://issues.redhat.com/browse/RHEL-28063

commit 2abcc4b5a64a65a2d2287ba0be5c2871c1552416
Author: James Morse <james.morse@arm.com>
Date:   Tue Aug 1 14:54:07 2023 +0000

    module: Expose module_init_layout_section()

    module_init_layout_section() chooses whether the core module loader
    considers a section as init or not. This affects the placement of the
    exit section when module unloading is disabled. Exit code will never run
    in that case, so it can be free()d once the module has been initialised.

    arm and arm64 need to count the number of PLTs they need before applying
    relocations based on the section name. The init PLTs are stored separately
    so they can be free()d. arm and arm64 both use within_module_init() to
    decide which list of PLTs to use when applying the relocation.

    Because within_module_init()'s behaviour changes when module unloading
    is disabled, both architectures would need to take this into account when
    counting the PLTs.

    Today neither architecture does this, meaning when module unloading is
    disabled there are insufficient PLTs in the init section to load some
    modules, resulting in warnings:
    | WARNING: CPU: 2 PID: 51 at arch/arm64/kernel/module-plts.c:99 module_emit_plt_entry+0x184/0x1cc
    | Modules linked in: crct10dif_common
    | CPU: 2 PID: 51 Comm: modprobe Not tainted 6.5.0-rc4-yocto-standard-dirty #15208
    | Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015
    | pstate: 20400005 (nzCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
    | pc : module_emit_plt_entry+0x184/0x1cc
    | lr : module_emit_plt_entry+0x94/0x1cc
    | sp : ffffffc0803bba60
    [...]
    | Call trace:
    |  module_emit_plt_entry+0x184/0x1cc
    |  apply_relocate_add+0x2bc/0x8e4
    |  load_module+0xe34/0x1bd4
    |  init_module_from_file+0x84/0xc0
    |  __arm64_sys_finit_module+0x1b8/0x27c
    |  invoke_syscall.constprop.0+0x5c/0x104
    |  do_el0_svc+0x58/0x160
    |  el0_svc+0x38/0x110
    |  el0t_64_sync_handler+0xc0/0xc4
    |  el0t_64_sync+0x190/0x194

    Instead of duplicating module_init_layout_section()'s logic, expose it.

    Reported-by: Adam Johnston <adam.johnston@arm.com>
    Fixes: 055f23b74b ("module: check for exit sections in layout_sections() instead of module_init_section()")
    Cc: stable@vger.kernel.org
    Signed-off-by: James Morse <james.morse@arm.com>
    Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>

Signed-off-by: Donald Dutile <ddutile@redhat.com>
2024-06-17 14:17:29 -04:00
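The decision that module_init_layout_section() encapsulates can be sketched as follows. This assumes the generic prefix rules (".init" and ".exit"); architectures may override the underlying helpers. The kernel keys the behaviour off CONFIG_MODULE_UNLOAD at build time, whereas here it is a parameter so that both cases can be exercised. Function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

static bool starts_with(const char *s, const char *prefix)
{
    return strncmp(s, prefix, strlen(prefix)) == 0;
}

/* Simplified generic rules: classification by section-name prefix. */
static bool init_section(const char *name) { return starts_with(name, ".init"); }
static bool exit_section(const char *name) { return starts_with(name, ".exit"); }

/* Without unloading, exit code can never run: lay it out (and free it) with init. */
static bool init_layout_section(const char *name, bool unload_enabled)
{
    if (!unload_enabled && exit_section(name))
        return true;
    return init_section(name);
}
```

Sharing one helper means the PLT-counting code and the layout code cannot disagree about where a section ends up. That disagreement is the bug the commit describes.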
Donald Dutile cf7679e379 modpost, kallsyms: Treat add '$'-prefixed symbols as mapping symbols
JIRA: https://issues.redhat.com/browse/RHEL-28063

commit ff09f6fd297293175eaa0ed492495e36b3eb1a8e
Author: Palmer Dabbelt <palmer@rivosinc.com>
Date:   Fri Jul 21 08:01:48 2023 -0700

    modpost, kallsyms: Treat add '$'-prefixed symbols as mapping symbols

    Trying to restrict the '$'-prefix change to RISC-V caused some fallout,
    so let's just treat all those symbols as special.

    Fixes: c05780ef3c190 ("module: Ignore RISC-V mapping symbols too")
    Link: https://lore.kernel.org/all/20230712015747.77263-1-wangkefeng.wang@huawei.com/
    Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
    Reviewed-by: Masahiro Yamada <masahiroy@kernel.org>
    Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>

Signed-off-by: Donald Dutile <ddutile@redhat.com>
2024-06-17 14:17:28 -04:00
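The rule this commit settles on is simple: any symbol whose name starts with '$' is treated as a mapping symbol and skipped. The sketch below shows only that rule, and the real kernel helper may handle other special names as well. The helper names and the example symbols are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* The '$'-prefix rule only; other special cases are not modelled. */
static bool is_dollar_mapping_symbol(const char *name)
{
    return name[0] == '$';
}

/* Count the symbols that survive the filter. */
static int count_real_symbols(const char *const *names, int n)
{
    int kept = 0;
    for (int i = 0; i < n; i++)
        if (!is_dollar_mapping_symbol(names[i]))
            kept++;
    return kept;
}
```

Matching the whole '$' prefix also covers ISA-annotated RISC-V forms without per-architecture lists. Restricting the change to RISC-V is what caused the fallout mentioned above.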
Donald Dutile 346c4d39ff module: Ignore RISC-V mapping symbols too
JIRA: https://issues.redhat.com/browse/RHEL-28063

commit c05780ef3c190c2dafbf0be8e65d4f01103ad577
Author: Palmer Dabbelt <palmer@rivosinc.com>
Date:   Fri Jul 7 09:00:51 2023 -0700

    module: Ignore RISC-V mapping symbols too

    RISC-V has an extended form of mapping symbols that we use to encode
    the ISA when it changes in the middle of an ELF.  This trips up modpost,
    causing a build failure.  I haven't verified it yet, but I believe the
    kallsyms difference should result in stacks looking sane again.

    Reported-by: Randy Dunlap <rdunlap@infradead.org>
    Closes: https://lore.kernel.org/all/9d9e2902-5489-4bf0-d9cb-556c8e5d71c2@infradead.org/
    Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
    Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
    Tested-by: Randy Dunlap <rdunlap@infradead.org> # build-tested
    Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>

Signed-off-by: Donald Dutile <ddutile@redhat.com>
2024-06-17 14:17:28 -04:00
Donald Dutile 70f40ed8e2 module: fix init_module_from_file() error handling
JIRA: https://issues.redhat.com/browse/RHEL-28063

commit f1962207150c8b602e980616f04b37ea4e64bb9f
Author: Linus Torvalds <torvalds@linux-foundation.org>
Date:   Tue Jul 4 06:37:32 2023 -0700

    module: fix init_module_from_file() error handling

    Vegard Nossum pointed out two different problems with the error handling
    in init_module_from_file():

     (a) the idempotent loading code didn't clean up properly in some error
         cases, leaving the on-stack 'struct idempotent' element still in
         the hash table

     (b) failure to read the module file would nonsensically update the
         'invalid_kread_bytes' stat counter with the error value

    The first error is quite nasty, in that it can then cause subsequent
    idempotent loads of that same file to access stale stack contents of the
    previous failure.  The case may not happen in any normal situation
    (explaining all the Tested-by's on the original change), and requires
    admin privileges, but syzkaller triggers random bad behavior as a
    result:

        BUG: soft lockup in sys_finit_module
        BUG: unable to handle kernel paging request in init_module_from_file
        general protection fault in init_module_from_file
        INFO: task hung in init_module_from_file
        KASAN: out-of-bounds Read in init_module_from_file
        KASAN: slab-out-of-bounds Read in init_module_from_file
        ...

    The second error is fairly benign and just leads to nonsensical stats
    (and has been around since the debug stats were added).

    Vegard also provided a patch for the idempotent loading issue, but I'd
    rather re-organize the code and make it more legible using another level
    of helper functions than add the usual "goto out" error handling.

    Link: https://lore.kernel.org/lkml/20230704100852.23452-1-vegard.nossum@oracle.com/
    Fixes: 9b9879fc0327 ("modules: catch concurrent module loads, treat them as idempotent")
    Reported-by: Vegard Nossum <vegard.nossum@oracle.com>
    Reported-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
    Reported-by: syzbot+9c2bdc9d24e4a7abe741@syzkaller.appspotmail.com
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Signed-off-by: Donald Dutile <ddutile@redhat.com>
2024-06-17 14:17:28 -04:00
Donald Dutile d8420b4b83 modules: catch concurrent module loads, treat them as idempotent
JIRA: https://issues.redhat.com/browse/RHEL-28063

commit 9b9879fc03275ffe0da328cf5b864d9e694167c8
Author: Linus Torvalds <torvalds@linux-foundation.org>
Date:   Mon May 29 21:39:51 2023 -0400

    modules: catch concurrent module loads, treat them as idempotent

    This is the new-and-improved attempt at avoiding huge memory load spikes
    when the user space boot sequence tries to load hundreds (or even
    thousands) of redundant duplicate modules in parallel.

    See commit 9828ed3f695a ("module: error out early on concurrent load of
    the same module file") for background and an earlier failed attempt that
    was reverted.

    That earlier attempt just said "concurrently loading the same module is
    silly, just open the module file exclusively and return -ETXTBSY if
    somebody else is already loading it".

    While it is true that concurrent module loads of the same module are
    silly, the reason that earlier attempt then failed was that the
    concurrently loaded module would often be a prerequisite for another
    module.

    Thus failing to load the prerequisite would then cause cascading
    failures of the other modules, rather than just short-circuiting that
    one unnecessary module load.

    At the same time, we still really don't want to load the contents of the
    same module file hundreds of times, only to then wait for an eventually
    successful load, and have everybody else return -EEXIST.

    As a result, this takes another approach, and treats concurrent module
    loads from the same file as "idempotent" in the inode.  So if one module
    load is ongoing, we don't start a new one, but instead just wait for the
    first one to complete and return the same return value as it did.

    So unlike the first attempt, this does not return early: the intent is
    not to speed up the boot, but to avoid a thundering herd problem in
    allocating memory (both physical and virtual) for a module more than
    once.

    Also note that this does change behavior: it used to be that when you
    had concurrent loads, you'd have one "winner" that would return success,
    and everybody else would return -EEXIST.

    In contrast, this idempotent logic goes all Oprah on the problem, and
    says "You are a winner! And you are a winner! We are ALL winners".  But
    since there's no possible actual real semantic difference between "you
    loaded the module" and "somebody else already loaded the module", this
    is more of a feel-good change than an actual honest-to-goodness semantic
    change.

    Of course, any true Johnny-come-latelies that don't get caught in the
    concurrency filter will still return -EEXIST.  It's no different from
    not even getting a seat at an Oprah taping.  That's life.

    See the long thread on the kernel mailing list about this all, which
    includes some numbers for memory use before and after the patch.

    Link: https://lore.kernel.org/lkml/20230524213620.3509138-1-mcgrof@kernel.org/
    Reviewed-by: Johan Hovold <johan@kernel.org>
    Tested-by: Johan Hovold <johan@kernel.org>
    Tested-by: Luis Chamberlain <mcgrof@kernel.org>
    Tested-by: Dan Williams <dan.j.williams@intel.com>
    Tested-by: Rudi Heitbaum <rudi@heitbaum.com>
    Tested-by: David Hildenbrand <david@redhat.com>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Signed-off-by: Donald Dutile <ddutile@redhat.com>
2024-06-17 14:17:28 -04:00
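Below is a userspace sketch of the idempotent-load idea, under stated assumptions. A pthread mutex and condition variable stand in for the kernel's locking and completion, a linked list replaces the hash table, and the cookie pointer stands in for the module file's inode. Every caller registers its own on-stack entry. The first caller for a cookie performs the load, then publishes its return value to all entries with that cookie and unlinks them. It does this on success and on failure alike, which is the property that the later init_module_from_file() error-handling fix in this log restores. All names are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* One entry per caller, living on that caller's stack. */
struct idem_entry {
    const void *cookie;        /* models the module file's inode */
    bool done;
    int ret;
    struct idem_entry *next;
};

static pthread_mutex_t idem_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t idem_cond = PTHREAD_COND_INITIALIZER;
static struct idem_entry *idem_list;

typedef int (*load_fn)(const void *cookie);

int idempotent_load(const void *cookie, load_fn load)
{
    struct idem_entry self = { .cookie = cookie };
    bool owner = true;

    pthread_mutex_lock(&idem_lock);
    for (struct idem_entry *e = idem_list; e; e = e->next)
        if (e->cookie == cookie) {
            owner = false;       /* a load of this file is already in flight */
            break;
        }
    self.next = idem_list;       /* every caller registers its own entry */
    idem_list = &self;

    if (!owner) {
        /* Wait until the owner publishes its result into our entry. */
        while (!self.done)
            pthread_cond_wait(&idem_cond, &idem_lock);
        pthread_mutex_unlock(&idem_lock);
        return self.ret;
    }
    pthread_mutex_unlock(&idem_lock);

    int ret = load(cookie);      /* the expensive part runs once per batch */

    /* Success or error: hand ret to every matching entry and unlink it. */
    pthread_mutex_lock(&idem_lock);
    for (struct idem_entry **pp = &idem_list; *pp;) {
        struct idem_entry *e = *pp;
        if (e->cookie == cookie) {
            e->ret = ret;
            e->done = true;
            *pp = e->next;
        } else {
            pp = &e->next;
        }
    }
    pthread_cond_broadcast(&idem_cond);
    pthread_mutex_unlock(&idem_lock);
    return ret;
}

/* Hypothetical loader for the demo: the first load succeeds, later ones see -EEXIST. */
static int load_calls;
static int demo_load(const void *cookie)
{
    (void)cookie;
    return load_calls++ == 0 ? 0 : -EEXIST;
}
```

A later call that does not overlap the first one finds no in-flight entry. It becomes a new owner and gets -EEXIST, like the "Johnny-come-latelies" described above.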
Donald Dutile 9e9e6cbdd2 module: split up 'finit_module()' into init_module_from_file() helper
JIRA: https://issues.redhat.com/browse/RHEL-28063

commit 054a73009c22a5fb8bbeee5394980809276bc9fe
Author: Linus Torvalds <torvalds@linux-foundation.org>
Date:   Mon May 29 20:55:13 2023 -0400

    module: split up 'finit_module()' into init_module_from_file() helper

    This will simplify the next step, where we can then key off the inode to
    do one idempotent module load.

    Let's do the obvious re-organization in one step, and then the new code
    in another.

    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Signed-off-by: Donald Dutile <ddutile@redhat.com>
2024-06-17 14:17:28 -04:00
Donald Dutile efc5790fc4 kbuild: generate KSYMTAB entries by modpost
JIRA: https://issues.redhat.com/browse/RHEL-28063

Conflicts:
 (1) Dropped patches for check-local-export; that script was temporarily
     replacing modpost but was abandoned, and modpost resumed with simpler
     additions and/or bug fixes, so skip it here.
 (2) Dropped ia64 patches since RHEL doesn't support ia64, and they didn't
     apply cleanly.
 (3) Made cmd_gensymversions genksyms exec same as cmd_gensymtypes;
     cmd_gensymversions appears to be a RHEL-ism, and it has no callers/users
     under the scripts hierarchy.

commit ddb5cdbafaaad6b99d7007ae1740403124502d03
Author: Masahiro Yamada <masahiroy@kernel.org>
Date:   Mon Jun 12 00:50:52 2023 +0900

    kbuild: generate KSYMTAB entries by modpost

    Commit 7b4537199a4a ("kbuild: link symbol CRCs at final link, removing
    CONFIG_MODULE_REL_CRCS") made modpost output CRCs in the same way
    whether the EXPORT_SYMBOL() is placed in *.c or *.S.

    For further cleanups, this commit applies a similar approach to the
    entire data structure of EXPORT_SYMBOL().

    The EXPORT_SYMBOL() compilation is split into two stages.

    When a source file is compiled, EXPORT_SYMBOL() will be converted into
    a dummy symbol in the .export_symbol section.

    For example,

        EXPORT_SYMBOL(foo);
        EXPORT_SYMBOL_NS_GPL(bar, BAR_NAMESPACE);

    will be encoded into the following assembly code:

        .section ".export_symbol","a"
        __export_symbol_foo:
                .asciz ""                      /* license */
                .asciz ""                      /* name space */
                .balign 8
                .quad foo                      /* symbol reference */
        .previous

        .section ".export_symbol","a"
        __export_symbol_bar:
                .asciz "GPL"                   /* license */
                .asciz "BAR_NAMESPACE"         /* name space */
                .balign 8
                .quad bar                      /* symbol reference */
        .previous

    They are mere markers to tell modpost the name, license, and namespace
    of the symbols. They will be dropped from the final vmlinux and modules
    because the *(.export_symbol) will go into /DISCARD/ in the linker script.

    Then, modpost extracts all the information about EXPORT_SYMBOL() from the
    .export_symbol section, and generates the final C code:

        KSYMTAB_FUNC(foo, "", "");
        KSYMTAB_FUNC(bar, "_gpl", "BAR_NAMESPACE");

    KSYMTAB_FUNC() (or KSYMTAB_DATA() if it is data) is expanded to struct
    kernel_symbol that will be linked to the vmlinux or a module.

    With this change, EXPORT_SYMBOL() works in the same way for *.c and *.S
    files, providing the following benefits.

    [1] Deprecate EXPORT_DATA_SYMBOL()

    In the old days, EXPORT_SYMBOL() was only available in C files. To export
    a symbol in *.S, EXPORT_SYMBOL() was placed in a separate *.c file.
    arch/arm/kernel/armksyms.c is one example written in the classic manner.

    Commit 22823ab419 ("EXPORT_SYMBOL() for asm") removed this limitation.
    Since then, EXPORT_SYMBOL() can be placed close to the symbol definition
    in *.S files. It was a nice improvement.

    However, as that commit mentioned, you need to use EXPORT_DATA_SYMBOL()
    for data objects on some architectures.

    In the new approach, modpost checks symbol's type (STT_FUNC or not),
    and outputs KSYMTAB_FUNC() or KSYMTAB_DATA() accordingly.

    There are only two users of EXPORT_DATA_SYMBOL:

      EXPORT_DATA_SYMBOL_GPL(empty_zero_page)    (arch/ia64/kernel/head.S)
      EXPORT_DATA_SYMBOL(ia64_ivt)               (arch/ia64/kernel/ivt.S)

    They are transformed as follows and output into .vmlinux.export.c

      KSYMTAB_DATA(empty_zero_page, "_gpl", "");
      KSYMTAB_DATA(ia64_ivt, "", "");

    The other EXPORT_SYMBOL users in ia64 assembly are output as
    KSYMTAB_FUNC().

    EXPORT_DATA_SYMBOL() is now deprecated.

    [2] merge <linux/export.h> and <asm-generic/export.h>

    There are two similar header implementations:

      include/linux/export.h        for .c files
      include/asm-generic/export.h  for .S files

    Ideally, the functionality should be consistent between them, but they
    tend to diverge.

    Commit 8651ec01da ("module: add support for symbol namespaces.") did
    not support the namespace for *.S files.

    This commit shifts the essential implementation part to C, which supports
    EXPORT_SYMBOL_NS() for *.S files.

    <asm/export.h> and <asm-generic/export.h> will remain as a wrapper of
    <linux/export.h> for a while.

    They will be removed after #include <asm/export.h> directives are all
    replaced with #include <linux/export.h>.

    [3] Implement CONFIG_TRIM_UNUSED_KSYMS in one-pass algorithm (by a later commit)

    When CONFIG_TRIM_UNUSED_KSYMS is enabled, Kbuild recursively traverses
    the directory tree to determine which EXPORT_SYMBOL to trim. If an
    EXPORT_SYMBOL turns out to be unused by anyone, Kbuild begins the
    second traverse, where some source files are recompiled with their
    EXPORT_SYMBOL() turned into a no-op.

    We can do this better now; modpost can selectively emit KSYMTAB entries
    that are really used by modules.

    Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
    Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>

Signed-off-by: Donald Dutile <ddutile@redhat.com>
2024-06-17 14:17:28 -04:00
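The modpost step can be illustrated with a small formatter. It is a hypothetical helper, not modpost's code. It takes the fields an .export_symbol marker carries (name, license, namespace) plus whether the symbol's type is STT_FUNC. It then emits the KSYMTAB_FUNC() or KSYMTAB_DATA() line in the form the commit shows.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Format one KSYMTAB_* line; a "GPL" license becomes the "_gpl" suffix. */
static int format_ksymtab(char *buf, size_t len, const char *name,
                          const char *license, const char *ns, bool is_func)
{
    const char *suffix = strcmp(license, "GPL") == 0 ? "_gpl" : "";
    return snprintf(buf, len, "KSYMTAB_%s(%s, \"%s\", \"%s\");",
                    is_func ? "FUNC" : "DATA", name, suffix, ns);
}
```

Because the FUNC/DATA choice comes from the ELF symbol type, the same marker format serves both *.c and *.S exports. That is why EXPORT_DATA_SYMBOL() becomes unnecessary.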
Donald Dutile a5ba435c43 module/decompress: Fix error checking on zstd decompression
JIRA: https://issues.redhat.com/browse/RHEL-28063

commit fadb74f9f2f609238070c7ca1b04933dc9400e4a
Author: Lucas De Marchi <lucas.demarchi@intel.com>
Date:   Thu Jun 1 14:23:31 2023 -0700

    module/decompress: Fix error checking on zstd decompression

    While implementing support for in-kernel decompression in kmod,
    finit_module() was returning a very suspicious value:

            finit_module(3, "", MODULE_INIT_COMPRESSED_FILE) = 18446744072717407296

    It turns out the check for module_get_next_page() failing is wrong,
    and hence the decompression was not really taking place. Invert
    the condition to fix it.

    Fixes: 169a58ad824d ("module/decompress: Support zstd in-kernel decompression")
    Cc: stable@kernel.org
    Cc: Luis Chamberlain <mcgrof@kernel.org>
    Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com>
    Cc: Stephen Boyd <swboyd@chromium.org>
    Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
    Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>

Signed-off-by: Donald Dutile <ddutile@redhat.com>
2024-06-17 14:17:27 -04:00
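Kernel functions that return pointers commonly report failure as ERR_PTR(-errno), and callers test the result with IS_ERR(). The sketch below is a userspace re-creation of that convention; the kernel's definitions live in include/linux/err.h. get_next_page_model() is hypothetical, and the commit text does not confirm that module_get_next_page() follows this convention. The sketch shows the correct direction of the check. It also shows why the value finit_module() returned above is suspicious: 18446744072717407296 is 0xFFFFFFFFC4DD1440, which lies far outside the top 4095 values that encode -errno.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Userspace re-creation of the ERR_PTR convention. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error) { return (void *)(intptr_t)error; }
static inline long PTR_ERR(const void *ptr) { return (long)(intptr_t)ptr; }
static inline bool IS_ERR(const void *ptr)
{
    return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical page getter: failure comes back as ERR_PTR(-ENOMEM). */
static char page_buf[64];
static void *get_next_page_model(bool fail)
{
    return fail ? ERR_PTR(-ENOMEM) : page_buf;
}

/* Correct direction: bail out only when the pointer IS an error. */
static long use_next_page(bool fail)
{
    void *page = get_next_page_model(fail);
    if (IS_ERR(page))
        return PTR_ERR(page);
    return 0;
}

/* Would a 64-bit return value decode as a negative errno? */
static bool is_errno_return(uint64_t v)
{
    return v >= (uint64_t)-MAX_ERRNO;
}
```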
Donald Dutile a2e3e61a15 module: fix module load for ia64
JIRA: https://issues.redhat.com/browse/RHEL-28063

commit db3e33dd8bd956f165436afdbdbf1c653fb3c8e6
Author: Song Liu <song@kernel.org>
Date:   Sun May 28 16:00:41 2023 -0700

    module: fix module load for ia64

    Frank reported a boot regression on ia64:

    ELILO v3.16 for EFI/IA-64
    ..
    Uncompressing Linux... done
    Loading file AC100221.initrd.img...done
    [    0.000000] Linux version 6.4.0-rc3 (root@x4270) (ia64-linux-gcc (GCC) 12.2.0, GNU ld (GNU Binutils) 2.39) #1 SMP Thu May 25 15:52:20 CEST 2023
    [    0.000000] efi: EFI v1.1 by HP
    [    0.000000] efi: SALsystab=0x3ee7a000 ACPI 2.0=0x3fe2a000 ESI=0x3ee7b000 SMBIOS=0x3ee7c000 HCDP=0x3fe28000
    [    0.000000] PCDP: v3 at 0x3fe28000
    [    0.000000] earlycon: uart8250 at MMIO 0x00000000f4050000 (options '9600n8')
    [    0.000000] printk: bootconsole [uart8250] enabled
    [    0.000000] ACPI: Early table checksum verification disabled
    [    0.000000] ACPI: RSDP 0x000000003FE2A000 000028 (v02 HP    )
    [    0.000000] ACPI: XSDT 0x000000003FE2A02C 0000CC (v01 HP     rx2620 00000000 HP   00000000)
    [...]
    [    3.793350] Run /init as init process
    Loading, please wait...
    Starting systemd-udevd version 252.6-1
    [    3.951100] ------------[ cut here ]------------
    [    3.951100] WARNING: CPU: 6 PID: 140 at kernel/module/main.c:1547 __layout_sections+0x370/0x3c0
    [    3.949512] Unable to handle kernel paging request at virtual address 1000000000000000
    [    3.951100] Modules linked in:
    [    3.951100] CPU: 6 PID: 140 Comm: (udev-worker) Not tainted 6.4.0-rc3 #1
    [    3.956161] (udev-worker)[142]: Oops 11003706212352 [1]
    [    3.951774] Hardware name: hp server rx2620                   , BIOS 04.29 11/30/2007
    [    3.951774]
    [    3.951774] Call Trace:
    [    3.958339] Unable to handle kernel paging request at virtual address 1000000000000000
    [    3.956161] Modules linked in:
    [    3.951774]  [<a0000001000156d0>] show_stack.part.0+0x30/0x60
    [    3.951774]                                 sp=e000000183a67b20 bsp=e000000183a61628
    [    3.956161]
    [    3.956161]

    which bisects to the module_memory change [1].

    Debug showed that ia64 uses some special sections:

    __layout_sections: section .got (sh_flags 10000002) matched to MOD_INVALID
    __layout_sections: section .sdata (sh_flags 10000003) matched to MOD_INVALID
    __layout_sections: section .sbss (sh_flags 10000003) matched to MOD_INVALID

    Before [1], all these sections were loaded into module core memory.

    Fix the ia64 boot by loading these sections into MOD_DATA (core rw data).
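
    As a rough illustration, the userspace C model below mimics the mask
    matching. It assumes that __layout_sections() places each section by the
    first (required flags, forbidden flags) row it matches, and that the ia64
    small-data flag (SHF_IA_64_SHORT, 0x10000000) only matches the final row.
    The enum names, tables and classify() helper are hypothetical, and
    modeling the fix as a change to the row-to-type mapping is an assumption,
    not a copy of the actual patch.

```c
#include <assert.h>
#include <stddef.h>

/* Standard ELF section flags. */
#define SHF_WRITE         0x1UL
#define SHF_ALLOC         0x2UL
#define SHF_EXECINSTR     0x4UL
/* ia64 ELF flag carried by .got/.sdata/.sbss in the debug output above. */
#define SHF_IA_64_SHORT   0x10000000UL
/* Kernel-defined "read-only after init" marker (value assumed). */
#define SHF_RO_AFTER_INIT 0x00200000UL
/* Assumption: on ia64 the arch "small data" flag is SHF_IA_64_SHORT. */
#define ARCH_SHF_SMALL    SHF_IA_64_SHORT

/* Hypothetical stand-ins for the core module memory types. */
enum mem_type { MEM_TEXT, MEM_RODATA, MEM_RO_AFTER_INIT, MEM_DATA, MEM_INVALID };

/* Each row: flags a section must have, flags it must not have. */
static const unsigned long masks[][2] = {
	{ SHF_EXECINSTR | SHF_ALLOC, ARCH_SHF_SMALL },
	{ SHF_ALLOC, SHF_WRITE | ARCH_SHF_SMALL },
	{ SHF_RO_AFTER_INIT | SHF_ALLOC, ARCH_SHF_SMALL },
	{ SHF_WRITE | SHF_ALLOC, ARCH_SHF_SMALL },
	{ ARCH_SHF_SMALL | SHF_ALLOC, 0 },
};
#define NR_MASKS (sizeof(masks) / sizeof(masks[0]))

/* Row-to-type mapping before the fix: the last row has no home. */
const enum mem_type before_fix[NR_MASKS] = {
	MEM_TEXT, MEM_RODATA, MEM_RO_AFTER_INIT, MEM_DATA, MEM_INVALID,
};
/* After the fix: small-data sections land in core rw data. */
const enum mem_type after_fix[NR_MASKS] = {
	MEM_TEXT, MEM_RODATA, MEM_RO_AFTER_INIT, MEM_DATA, MEM_DATA,
};

/* Return the type for the first mask row that the section matches. */
enum mem_type classify(unsigned long sh_flags, const enum mem_type *table)
{
	assert(table != NULL);
	for (size_t m = 0; m < NR_MASKS; m++) {
		if ((sh_flags & masks[m][0]) == masks[m][0] &&
		    !(sh_flags & masks[m][1]))
			return table[m];
	}
	return MEM_INVALID; /* e.g. non-SHF_ALLOC sections such as debug info */
}
```

    With these tables, .got (0x10000002) and .sdata/.sbss (0x10000003) skip
    the first four rows because of the small-data flag, so they classify as
    MEM_INVALID with before_fix and MEM_DATA with after_fix.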

    [1] commit ac3b43283923 ("module: replace module_layout with module_memory")

    Fixes: ac3b43283923 ("module: replace module_layout with module_memory")
    Reported-by: Frank Scheiner <frank.scheiner@web.de>
    Closes: https://lists.debian.org/debian-ia64/2023/05/msg00010.html
    Closes: https://marc.info/?l=linux-ia64&m=168509859125505
    Cc: Linus Torvalds <torvalds@linux-foundation.org>
    Signed-off-by: Song Liu <song@kernel.org>
    Tested-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
    Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>

Signed-off-by: Donald Dutile <ddutile@redhat.com>
2024-06-17 14:17:27 -04:00
Donald Dutile 31b1bf449a kallsyms: remove unused API lookup_symbol_attrs
JIRA: https://issues.redhat.com/browse/RHEL-28063

commit 4f521bab5bfc854ec0dab7ef560dfa75247e615d
Author: Maninder Singh <maninder1.s@samsung.com>
Date:   Fri May 26 12:51:23 2023 +0530

    kallsyms: remove unused API lookup_symbol_attrs

    With commit 7878c231dae0 ("slab: remove /proc/slab_allocators"), the
    last user of lookup_symbol_attrs was removed.

    Remove the now-redundant API.

    Signed-off-by: Maninder Singh <maninder1.s@samsung.com>
    Reviewed-by: Kees Cook <keescook@chromium.org>
    Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>

Signed-off-by: Donald Dutile <ddutile@redhat.com>
2024-06-17 14:17:26 -04:00
Donald Dutile b30cf49e7c module: Remove preempt_disable() from module reference counting.
JIRA: https://issues.redhat.com/browse/RHEL-28063

commit cb0b50b813f6198b7d44ae8e169803440333577a
Author: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Date:   Tue May 9 15:49:02 2023 +0200

    module: Remove preempt_disable() from module reference counting.

    The preempt_disable() section in module_put() was added in commit
       e1783a240f ("module: Use this_cpu_xx to dynamically allocate counters")

    when the per-CPU counters were switched to another API. That API required
    the CPU to remain the same during the read-modify-write operation.

    This counting API was later replaced with atomic_t in commit
       2f35c41f58 ("module: Replace module_ref with atomic_t refcnt")

    Since this atomic_t replacement there is no need to keep preemption
    disabled while the reference counter is modified.

    Remove preempt_disable() from module_put(), __module_get() and
    try_module_get().
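
    A userspace C11 model of the reasoning above (struct fake_module and the
    fake_* helpers are hypothetical, not the kernel API): with a single
    atomic counter, each get or put is one indivisible read-modify-write, so
    nothing relies on staying on the same CPU between reading and writing
    the counter. Modeling try_module_get() as increment-if-not-zero is an
    assumption about its semantics.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical module object; a count of 0 means "going away". */
struct fake_module {
	atomic_int refcnt;
};

/* Unconditional get: one atomic add, no preemption control needed. */
void fake_module_get(struct fake_module *mod)
{
	atomic_fetch_add(&mod->refcnt, 1);
}

/* Conditional get: increment only if the count is not already zero. */
bool fake_try_module_get(struct fake_module *mod)
{
	int old = atomic_load(&mod->refcnt);

	while (old != 0) {
		/* On failure, old is reloaded with the current value. */
		if (atomic_compare_exchange_weak(&mod->refcnt, &old, old + 1))
			return true;
	}
	return false;
}

/* Put: one atomic subtract; returns the new count. */
int fake_module_put(struct fake_module *mod)
{
	int old = atomic_fetch_sub(&mod->refcnt, 1);

	assert(old > 0); /* underflow would mean a get/put imbalance */
	return old - 1;
}
```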

    Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
    Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>

Signed-off-by: Donald Dutile <ddutile@redhat.com>
2024-06-17 14:17:26 -04:00
Donald Dutile c19ad53194 module: Fix use-after-free bug in read_file_mod_stats()
JIRA: https://issues.redhat.com/browse/RHEL-28063

commit d36f6efbe0cb422fe1e4475717d75f3737088832
Author: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
Date:   Thu Apr 27 22:59:33 2023 -0700

    module: Fix use-after-free bug in read_file_mod_stats()

    Smatch warns:
            kernel/module/stats.c:394 read_file_mod_stats()
            warn: passing freed memory 'buf'

    We are passing 'buf' to simple_read_from_buffer() after freeing it.

    Fix this by changing the order of 'simple_read_from_buffer' and 'kfree'.
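
    A userspace sketch of the corrected ordering. copy_from_buffer() is a
    hypothetical stand-in that mirrors the documented behavior of
    simple_read_from_buffer() (copy from *ppos, then advance *ppos), and
    read_stats() is a hypothetical stand-in for read_file_mod_stats(): the
    temporary buffer is freed only after its contents have been copied out.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Userspace stand-in for simple_read_from_buffer(): copy up to @count bytes
 * of @from (which holds @available bytes) starting at *@ppos, then advance
 * *@ppos. Returns the number of bytes copied (0 at end of data).
 */
size_t copy_from_buffer(char *to, size_t count, size_t *ppos,
			const char *from, size_t available)
{
	size_t pos = *ppos;
	size_t n;

	if (pos >= available)
		return 0;
	n = available - pos;
	if (n > count)
		n = count;
	memcpy(to, from + pos, n);
	*ppos = pos + n;
	return n;
}

/* Hypothetical stats reader: format into a temporary buffer, copy, free. */
size_t read_stats(char *out, size_t count, size_t *ppos)
{
	const size_t size = 64;
	char *buf = malloc(size);
	size_t len, ret;
	int n;

	if (!buf)
		return 0;
	n = snprintf(buf, size, "%s\t%lu\n", "Mods ever loaded", 67UL);
	assert(n >= 0 && (size_t)n < size); /* always fits in this sketch */
	len = (size_t)n;

	/* Fixed order: copy out of buf first... */
	ret = copy_from_buffer(out, count, ppos, buf, len);
	/* ...and only then free it. Freeing first was the use-after-free. */
	free(buf);
	return ret;
}
```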

    Fixes: df3e764d8e5c ("module: add debug stats to help identify memory pressure")
    Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
    Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>

Signed-off-by: Donald Dutile <ddutile@redhat.com>
2024-06-17 14:17:26 -04:00
Donald Dutile df6febf734 module: include internal.h in module/dups.c
JIRA: https://issues.redhat.com/browse/RHEL-28063

commit 0b891c83d8c54cb70e186456c2191adb5fd98c56
Author: Arnd Bergmann <arnd@arndb.de>
Date:   Sat Apr 29 22:36:04 2023 +0200

    module: include internal.h in module/dups.c

    Two newly introduced functions are declared in a header that is not
    included before the definition, causing a warning with sparse or
    'make W=1':

    kernel/module/dups.c:118:6: error: no previous prototype for 'kmod_dup_request_exists_wait' [-Werror=missing-prototypes]
      118 | bool kmod_dup_request_exists_wait(char *module_name, bool wait, int *dup_ret)
          |      ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
    kernel/module/dups.c:220:6: error: no previous prototype for 'kmod_dup_request_announce' [-Werror=missing-prototypes]
      220 | void kmod_dup_request_announce(char *module_name, int ret)
          |      ^~~~~~~~~~~~~~~~~~~~~~~~~

    Add an explicit include to ensure the prototypes match.

    Fixes: 8660484ed1cf ("module: add debugging auto-load duplicate module support")
    Reported-by: kernel test robot <lkp@intel.com>
    Link: https://lore.kernel.org/oe-kbuild-all/202304141440.DYO4NAzp-lkp@intel.com/
    Signed-off-by: Arnd Bergmann <arnd@arndb.de>
    Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>

Signed-off-by: Donald Dutile <ddutile@redhat.com>
2024-06-17 14:17:26 -04:00
Donald Dutile a27f75beb4 module: add debugging auto-load duplicate module support
JIRA: https://issues.redhat.com/browse/RHEL-28063

commit 8660484ed1cf3261e89e0bad94c6395597e87599
Author: Luis Chamberlain <mcgrof@kernel.org>
Date:   Thu Apr 13 22:28:39 2023 -0700

    module: add debugging auto-load duplicate module support

    The finit_module() system call can, in the worst case, use more than
    twice a module's size in virtual memory. Duplicate finit_module()
    system calls are non-fatal; however, they unnecessarily strain virtual
    memory during bootup and in the worst case can cause a system to fail
    to boot. This is currently only known to be an issue on systems with
    a larger number of CPUs.

    To help debug this situation we need to consider the different sources for
    finit_module(). Requests from the kernel that rely on module auto-loading,
    i.e., the kernel's *request_module() API, are one source of calls. Although
    modprobe checks whether a module is already loaded prior to calling
    finit_module(), a small race is possible that allows userspace to
    trigger multiple modprobe calls racing against each other, none of which
    yet sees the module as loaded.

    This adds debugging support to the kernel module auto-loader (*request_module()
    calls) to easily detect duplicate module requests. To help with possible bootup
    failures caused by this, it converges duplicate requests into a
    single request. This avoids the strain on virtual memory during bootup
    that duplicate module autoloading requests could otherwise incur.
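
    The convergence idea can be sketched as a table of in-flight requests
    keyed by module name: the first request for a name owns the load, and
    later requests for the same name are counted and reported as duplicates
    instead of triggering another load. This is a hypothetical,
    single-threaded userspace model (dup_request_begin(), dup_request_end()
    and the fixed-size table are illustrative names); the locking, and the
    waiting for the owner's result, that in-kernel code would need are
    omitted.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_INFLIGHT 16
#define NAME_LEN 64

/* Hypothetical table of module names with an auto-load in flight. */
static char inflight[MAX_INFLIGHT][NAME_LEN];
static int inflight_used[MAX_INFLIGHT];

/* Number of duplicate requests converged onto an existing one. */
unsigned long dup_count;

/*
 * Returns 1 if the caller owns the (single) load for @name,
 * 0 if an identical request is already in flight (a duplicate),
 * -1 if the table is full.
 */
int dup_request_begin(const char *name)
{
	int free_slot = -1;

	assert(name != NULL);
	for (int i = 0; i < MAX_INFLIGHT; i++) {
		if (!inflight_used[i]) {
			if (free_slot < 0)
				free_slot = i;
		} else if (strncmp(inflight[i], name, NAME_LEN) == 0) {
			dup_count++;
			fprintf(stderr, "duplicate request for %s\n", name);
			return 0;
		}
	}
	if (free_slot < 0)
		return -1;
	/* snprintf() truncates over-long names; acceptable for a sketch. */
	snprintf(inflight[free_slot], NAME_LEN, "%s", name);
	inflight_used[free_slot] = 1;
	return 1;
}

/* Called by the owner once its load has finished. */
void dup_request_end(const char *name)
{
	for (int i = 0; i < MAX_INFLIGHT; i++) {
		if (inflight_used[i] && strncmp(inflight[i], name, NAME_LEN) == 0) {
			inflight_used[i] = 0;
			return;
		}
	}
}
```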

    Folks debugging virtual memory abuse on bootup can and should enable
    this and watch which pr_warn()s show up, to see whether module auto-loading
    is to blame for their woes. If they see duplicates they can debug further
    by enabling the module.enable_dups_trace kernel parameter or by enabling
    CONFIG_MODULE_DEBUG_AUTOLOAD_DUPS_TRACE.

    Current evidence seems to point to only a few duplicates from module
    auto-loading. So the other duplicates that create heavy virtual memory
    pressure on systems with many CPUs are likely coming from another
    source (likely udev).

    Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>

Signed-off-by: Donald Dutile <ddutile@redhat.com>
2024-06-17 14:17:26 -04:00
Donald Dutile c35bdd4008 module: stats: fix invalid_mod_bytes typo
JIRA: https://issues.redhat.com/browse/RHEL-28063

commit a81b1fc8ea639e03326c1d0dcde041986bc11500
Author: Arnd Bergmann <arnd@arndb.de>
Date:   Tue Apr 18 09:17:51 2023 +0200

    module: stats: fix invalid_mod_bytes typo

    This was caught by randconfig builds but does not show up in
    build testing without CONFIG_MODULE_DECOMPRESS:

    kernel/module/stats.c: In function 'mod_stat_bump_invalid':
    kernel/module/stats.c:229:42: error: 'invalid_mod_byte' undeclared (first use in this function); did you mean 'invalid_mod_bytes'?
      229 |   atomic_long_add(info->compressed_len, &invalid_mod_byte);
          |                                          ^~~~~~~~~~~~~~~~
          |                                          invalid_mod_bytes

    Fixes: df3e764d8e5c ("module: add debug stats to help identify memory pressure")
    Signed-off-by: Arnd Bergmann <arnd@arndb.de>
    Acked-by: Randy Dunlap <rdunlap@infradead.org>
    Tested-by: Randy Dunlap <rdunlap@infradead.org>
    Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>

Signed-off-by: Donald Dutile <ddutile@redhat.com>
2024-06-17 14:17:25 -04:00
Donald Dutile 1a4227bc49 module: remove use of uninitialized variable len
JIRA: https://issues.redhat.com/browse/RHEL-28063

commit 9f5cab173e19201eebeaca853ff664a9a269fed0
Author: Tom Rix <trix@redhat.com>
Date:   Mon Apr 17 19:09:57 2023 -0400

    module: remove use of uninitialized variable len

    clang build reports
    kernel/module/stats.c:307:34: error: variable
      'len' is uninitialized when used here [-Werror,-Wuninitialized]
            len = scnprintf(buf + 0, size - len,
                                            ^~~
    At the start of this sequence, neither the '+ 0' nor the '- len' is needed.
    So remove them, which also fixes the use of the uninitialized 'len'.
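
    The corrected pattern, sketched in userspace C: the kernel's scnprintf()
    returns the number of characters actually written, so the first call
    writes at buf with the full size and only later calls append at
    buf + len with size - len. Since scnprintf() is kernel-only, the sketch
    uses a hypothetical my_scnprintf() built on vsnprintf(); format_stats()
    and its labels are likewise illustrative.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical userspace stand-in for the kernel's scnprintf(): returns the
 * number of characters actually written to @buf (excluding the NUL), which
 * is at most @size - 1, unlike snprintf()'s "would have written" count.
 */
size_t my_scnprintf(char *buf, size_t size, const char *fmt, ...)
{
	va_list ap;
	int n;

	if (size == 0)
		return 0;
	va_start(ap, fmt);
	n = vsnprintf(buf, size, fmt, ap);
	va_end(ap);
	if (n < 0)
		return 0;
	return (size_t)n < size ? (size_t)n : size - 1;
}

/* Format two stats lines into @buf; returns the total length written. */
size_t format_stats(char *buf, size_t size, unsigned long loaded,
		    unsigned long failed)
{
	size_t len;

	assert(size > 0);
	/* First write: no offset and the full size; len is only assigned. */
	len = my_scnprintf(buf, size, "%25s\t%lu\n", "Mods ever loaded", loaded);
	/* Later writes append at buf + len into the remaining space. */
	len += my_scnprintf(buf + len, size - len, "%25s\t%lu\n",
			    "Mods failed on load", failed);
	return len;
}
```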

    Fixes: df3e764d8e5c ("module: add debug stats to help identify memory pressure")
    Signed-off-by: Tom Rix <trix@redhat.com>
    Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>

Signed-off-by: Donald Dutile <ddutile@redhat.com>
2024-06-17 14:17:25 -04:00
Donald Dutile 9bfdfec27a module: fix building stats for 32-bit targets
JIRA: https://issues.redhat.com/browse/RHEL-28063

commit 719ccd803ed5bd1ad92b0b46fc095b8fe266827e
Author: Arnd Bergmann <arnd@arndb.de>
Date:   Tue Apr 18 00:48:04 2023 +0200

    module: fix building stats for 32-bit targets

    The new module statistics code mixes 64-bit types and word-sized 'long'
    variables, which leads to build failures on 32-bit architectures:

    kernel/module/stats.c: In function 'read_file_mod_stats':
    kernel/module/stats.c:291:29: error: passing argument 1 of 'atomic64_read' from incompatible pointer type [-Werror=incompatible-pointer-types]
      291 |  total_size = atomic64_read(&total_mod_size);
    x86_64-linux-ld: kernel/module/stats.o: in function `read_file_mod_stats':
    stats.c:(.text+0x2b2): undefined reference to `__udivdi3'

    To fix this, the code has to use one of the two types consistently.

    Change them all to word-size types here.
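
    A hedged userspace illustration of the approach: keeping every counter in
    the same word-sized type (here C11 atomic_ulong, standing in for the
    kernel's atomic_long_t) means totals and averages use native long
    arithmetic, so a 32-bit build never needs a 64-by-64-bit division helper
    such as __udivdi3. The counter and function names are hypothetical, and
    on 32-bit targets these counters wrap at 4 GiB, a trade-off this sketch
    does not address.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stats counters, all of the same word-sized type. */
static atomic_ulong total_mod_size;
static atomic_ulong modcount;

void stat_add_module(unsigned long size)
{
	assert(size > 0);
	atomic_fetch_add(&total_mod_size, size);
	atomic_fetch_add(&modcount, 1);
}

/* unsigned long / unsigned long: a native division on 32- and 64-bit. */
unsigned long stat_average_mod_size(void)
{
	unsigned long total = atomic_load(&total_mod_size);
	unsigned long count = atomic_load(&modcount);

	return count ? total / count : 0;
}
```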

    Fixes: df3e764d8e5c ("module: add debug stats to help identify memory pressure")
    Signed-off-by: Arnd Bergmann <arnd@arndb.de>
    Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>

Signed-off-by: Donald Dutile <ddutile@redhat.com>
2024-06-17 14:17:25 -04:00
Donald Dutile c9be6e3679 module: stats: include uapi/linux/module.h
JIRA: https://issues.redhat.com/browse/RHEL-28063

commit 635dc38314c75c5727711d896d4c71ec92f6f20b
Author: Arnd Bergmann <arnd@arndb.de>
Date:   Tue Apr 18 00:02:46 2023 +0200

    module: stats: include uapi/linux/module.h

    MODULE_INIT_COMPRESSED_FILE is defined in the uapi header, which
    is not included indirectly from the normal linux/module.h, but
    has to be pulled in explicitly:

    kernel/module/stats.c: In function 'mod_stat_bump_invalid':
    kernel/module/stats.c:227:14: error: 'MODULE_INIT_COMPRESSED_FILE' undeclared (first use in this function)
      227 |  if (flags & MODULE_INIT_COMPRESSED_FILE)
          |              ^~~~~~~~~~~~~~~~~~~~~~~~~~~

    Signed-off-by: Arnd Bergmann <arnd@arndb.de>
    Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>

Signed-off-by: Donald Dutile <ddutile@redhat.com>
2024-06-17 14:17:25 -04:00
Donald Dutile 2988d369f3 module: avoid allocation if module is already present and ready
JIRA: https://issues.redhat.com/browse/RHEL-28063

commit 064f4536d13939b6e8cdb71298ff5d657f4f8caa
Author: Luis Chamberlain <mcgrof@kernel.org>
Date:   Fri Mar 10 20:48:03 2023 -0800

    module: avoid allocation if module is already present and ready

    The finit_module() system call can create unnecessary virtual memory
    pressure for duplicate modules. This is because load_module() can, in
    the worst case, allocate more than twice the size of a module in virtual
    memory. This saves at least one full module size of wasted vmalloc
    space by trying to catch duplicates as soon as we can validate the
    module name in the read module structure.

    This can only be an issue if a system is getting hammered with userspace
    loading modules. There are typically two ways modules get loaded on a
    system: one is kernel module auto-loading (*request_module*() calls
    in-kernel) and the other is things like udev. Auto-loading is in-kernel,
    but it pings back to userspace to just call modprobe. We already have a
    way to restrict the number of concurrent kernel auto-loads in a given
    time; however, that still allows multiple requests for the same module
    to go through and forces two threads in userspace to race calling
    modprobe for the same exact module. Even though libkmod, which both
    modprobe and udev use, does check whether a module is already loaded
    before calling finit_module(), races are still possible, and this is
    clearly evident today when you have multiple CPUs.

    To avoid memory pressure in such silly cases, put a stopgap in place.
    The *earliest* we can detect duplicates from the module side of things
    is once we have blessed the module name, which is sadly after the first
    vmalloc allocation. We can, however, check for the module being present
    *before* a second vmalloc() allocation.
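
    The ordering can be sketched as: read the image (first allocation),
    validate the name, check whether a live module of that name already
    exists, and only then make the second allocation. Everything here is a
    hypothetical userspace model (the registry, load_module_sketch(), the
    -EEXIST/-ENOSPC error choices and the allocation counter are
    assumptions); it only illustrates where the check sits relative to the
    allocations.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

#define MAX_MODS 8

/* Hypothetical registry of modules that are already loaded and live. */
static const char *live_mods[MAX_MODS];
static int nr_live;

/* Counts fake vmalloc() calls so the effect of the early check is visible. */
int nr_allocs;

static void *fake_vmalloc(size_t size)
{
	nr_allocs++;
	return malloc(size);
}

static int module_is_live(const char *name)
{
	for (int i = 0; i < nr_live; i++)
		if (strcmp(live_mods[i], name) == 0)
			return 1;
	return 0;
}

/* Load @name from a @size-byte image; returns 0 or a negative errno. */
int load_module_sketch(const char *name, size_t size)
{
	void *image, *copy;

	assert(name != NULL && size > 0);
	/* 1st allocation: the module file is always read into memory. */
	image = fake_vmalloc(size);
	if (!image)
		return -ENOMEM;

	/* Name now validated: bail out before the 2nd allocation. */
	if (module_is_live(name)) {
		free(image);
		return -EEXIST;
	}
	if (nr_live == MAX_MODS) {
		free(image);
		return -ENOSPC;
	}

	/* 2nd allocation: the final in-memory copy of the module. */
	copy = fake_vmalloc(size);
	free(image);
	if (!copy)
		return -ENOMEM;
	live_mods[nr_live++] = name; /* @name must outlive the registry */
	free(copy); /* a real loader would keep this; the sketch does not */
	return 0;
}
```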

    There is a linear relationship between wasted virtual memory bytes and
    the number of CPUs. The reason is that udev ends up racing to load
    tons of the same modules for each of the CPUs.

    We can see the different linear relationships between wasted virtual
    memory and CPU count during boot in the following graph:

             +----------------------------------------------------------------------------+
        14GB |-+          +            +            +           +           *+          +-|
             |                                                          ****              |
             |                                                       ***                  |
             |                                                     **                     |
        12GB |-+                                                 **                     +-|
             |                                                 **                         |
             |                                               **                           |
             |                                             **                             |
             |                                           **                               |
        10GB |-+                                       **                               +-|
             |                                       **                                   |
             |                                     **                                     |
             |                                   **                                       |
         8GB |-+                               **                                       +-|
    waste    |                               **                             ###           |
             |                             **                           ####              |
             |                           **                      #######                  |
         6GB |-+                     ****                    ####                       +-|
             |                      *                    ####                             |
             |                     *                 ####                                 |
             |                *****              ####                                     |
         4GB |-+            **               ####                                       +-|
             |            **             ####                                             |
             |          **           ####                                                 |
             |        **         ####                                                     |
         2GB |-+    **      #####                                                       +-|
             |     *    ####                                                              |
             |    * ####                                                   Before ******* |
             |  **##      +            +            +           +           After ####### |
             +----------------------------------------------------------------------------+
             0            50          100          150         200          250          300
                                              CPUs count

    On the y-axis we can see gigabytes of wasted virtual memory during boot
    due to duplicate module requests which just end up failing. Inferring
    the slope, about ~463 MiB was lost per CPU prior to this patch. After
    this patch we only lose about ~230 MiB per CPU, for a total savings of
    about ~233 MiB per CPU. This is all *just on bootup*!

    On an 8-vCPU, 8 GiB RAM system using kdevops, I see savings of up to
    ~84 MiB at the *highest* point of memory consumption with the Linux
    kernel selftests kmod test 0008 (kmod.sh -t 0008). With the new
    stress-ng module test I see a 145 MiB difference in max memory
    consumption with 100 ops. The stress-ng module ops tests can be
    pretty pathological and are not realistic; however, they were used to
    finally reproduce issues which are only reported to happen on
    systems with over 400 CPUs [0], by just using 100 ops on an 8-vCPU,
    8 GiB RAM system. Running out of virtual memory space is no surprise
    given the above graph, since at least on x86_64 we're capped at
    128 MiB; eventually we'd hit a series of errors, and one can use the
    above graph to guesstimate when. This will of course vary depending on
    the features you have enabled. For instance, enabling KASAN seems to
    make this much worse.

    The results with kmod and stress-ng can be observed and visualized below.
    The time it takes to run the test is also not affected.

    The kmod tests 0008:

    The gnuplot y-range is set from 400000 KiB (390 MiB) to 580000 KiB
    (566 MiB), given that the tests peak around that range.

    cat kmod.plot
    set term dumb
    set output fileout
    set yrange [400000:580000]
    plot filein with linespoints title "Memory usage (KiB)"

    Before:
    root@kmod ~ # /data/linux-next/tools/testing/selftests/kmod/kmod.sh -t 0008
    root@kmod ~ # free -k -s 1 -c 40 | grep Mem | awk '{print $3}' > log-0008-before.txt ^C
    root@kmod ~ # sort -n -r log-0008-before.txt | head -1
    528732

    So ~516.33 MiB

    After:

    root@kmod ~ # /data/linux-next/tools/testing/selftests/kmod/kmod.sh -t 0008
    root@kmod ~ # free -k -s 1 -c 40 | grep Mem | awk '{print $3}' > log-0008-after.txt ^C

    root@kmod ~ # sort -n -r log-0008-after.txt | head -1
    442516

    So ~432.14 MiB

    That's about ~84 MiB in savings in the worst case. The graphs:

    root@kmod ~ # gnuplot -e "filein='log-0008-before.txt'; fileout='graph-0008-before.txt'" kmod.plot
    root@kmod ~ # gnuplot -e "filein='log-0008-after.txt';  fileout='graph-0008-after.txt'"  kmod.plot

    root@kmod ~ # cat graph-0008-before.txt

      580000 +-----------------------------------------------------------------+
             |       +        +       +       +       +        +       +       |
      560000 |-+                                    Memory usage (KiB) ***A***-|
             |                                                                 |
      540000 |-+                                                             +-|
             |                                                                 |
             |        *A     *AA*AA*A*AA          *A*AA    A*A*A *AA*A*AA*A  A |
      520000 |-+A*A*AA  *AA*A           *A*AA*A*AA     *A*A     A          *A+-|
             |*A                                                               |
      500000 |-+                                                             +-|
             |                                                                 |
      480000 |-+                                                             +-|
             |                                                                 |
      460000 |-+                                                             +-|
             |                                                                 |
             |                                                                 |
      440000 |-+                                                             +-|
             |                                                                 |
      420000 |-+                                                             +-|
             |       +        +       +       +       +        +       +       |
      400000 +-----------------------------------------------------------------+
             0       5        10      15      20      25       30      35      40

    root@kmod ~ # cat graph-0008-after.txt

      580000 +-----------------------------------------------------------------+
             |       +        +       +       +       +        +       +       |
      560000 |-+                                    Memory usage (KiB) ***A***-|
             |                                                                 |
      540000 |-+                                                             +-|
             |                                                                 |
             |                                                                 |
      520000 |-+                                                             +-|
             |                                                                 |
      500000 |-+                                                             +-|
             |                                                                 |
      480000 |-+                                                             +-|
             |                                                                 |
      460000 |-+                                                             +-|
             |                                                                 |
             |          *A              *A*A                                   |
      440000 |-+A*A*AA*A  A       A*A*AA    A*A*AA*A*AA*A*AA*A*AA*AA*A*AA*A*AA-|
             |*A           *A*AA*A                                             |
      420000 |-+                                                             +-|
             |       +        +       +       +       +        +       +       |
      400000 +-----------------------------------------------------------------+
             0       5        10      15      20      25       30      35      40

    The stress-ng module tests:

    This is how the test was run to try to reproduce the vmap issues
    reported by David:

      echo 0 > /proc/sys/vm/oom_dump_tasks
      ./stress-ng --module 100 --module-name xfs

    Prior to this commit:
    root@kmod ~ # free -k -s 1 -c 40 | grep Mem | awk '{print $3}' > baseline-stress-ng.txt
    root@kmod ~ # sort -n -r baseline-stress-ng.txt | head -1
    5046456

    After this commit:
    root@kmod ~ # free -k -s 1 -c 40 | grep Mem | awk '{print $3}' > after-stress-ng.txt
    root@kmod ~ # sort -n -r after-stress-ng.txt | head -1
    4896972

    5046456 - 4896972
    149484
    149484/1024
    145.98046875000000000000

    So, with stress-ng, this commit saves about 145 MiB of memory using
    100 ops, the same load which reproduced the reported vmap issue.

    cat kmod.plot
    set term dumb
    set output fileout
    set yrange [4700000:5070000]
    plot filein with linespoints title "Memory usage (KiB)"

    root@kmod ~ # gnuplot -e "filein='baseline-stress-ng.txt'; fileout='graph-stress-ng-before.txt'"  kmod-simple-stress-ng.plot
    root@kmod ~ # gnuplot -e "filein='after-stress-ng.txt'; fileout='graph-stress-ng-after.txt'"  kmod-simple-stress-ng.plot

    root@kmod ~ # cat graph-stress-ng-before.txt

               +---------------------------------------------------------------+
      5.05e+06 |-+     + A     +       +       +       +       +       +     +-|
               |         *                          Memory usage (KiB) ***A*** |
               |         *                             A                       |
         5e+06 |-+      **                            **                     +-|
               |        **                            * *    A                 |
      4.95e+06 |-+      * *                          A  *   A*               +-|
               |        * *      A       A           *  *  *  *             A  |
               |       *  *     * *     * *        *A   *  *  *      A      *  |
       4.9e+06 |-+     *  *     * A*A   * A*AA*A  A      *A    **A   **A*A  *+-|
               |       A  A*A  A    *  A       *  *      A     A *  A    * **  |
               |      *      **      **         * *              *  *    * * * |
      4.85e+06 |-+   A       A       A          **               *  *     ** *-|
               |     *                           *               * *      ** * |
               |     *                           A               * *      *  * |
       4.8e+06 |-+   *                                           * *      A  A-|
               |     *                                           * *           |
      4.75e+06 |-+  *                                            * *         +-|
               |    *                                            **            |
               |    *  +       +       +       +       +       + **    +       |
       4.7e+06 +---------------------------------------------------------------+
               0       5       10      15      20      25      30      35      40

    root@kmod ~ # cat graph-stress-ng-after.txt

               +---------------------------------------------------------------+
      5.05e+06 |-+     +       +       +       +       +       +       +     +-|
               |                                    Memory usage (KiB) ***A*** |
               |                                                               |
         5e+06 |-+                                                           +-|
               |                                                               |
      4.95e+06 |-+                                                           +-|
               |                                                               |
               |                                                               |
       4.9e+06 |-+                                      *AA                  +-|
               |  A*AA*A*A  A  A*AA*AA*A*AA*A  A  A  A*A   *AA*A*A  A  A*AA*AA |
               |  *      * **  *            *  *  ** *            ***  *       |
      4.85e+06 |-+*       ***  *            * * * ***             A *  *     +-|
               |  *       A *  *             ** * * A               *  *       |
               |  *         *  *             *  **                  *  *       |
       4.8e+06 |-+*         *  *             A   *                  *  *     +-|
               | *          * *                  A                  * *        |
      4.75e+06 |-*          * *                                     * *      +-|
               | *          * *                                     * *        |
               | *     +    * *+       +       +       +       +    * *+       |
       4.7e+06 +---------------------------------------------------------------+
               0       5       10      15      20      25      30      35      40

    [0] https://lkml.kernel.org/r/20221013180518.217405-1-david@redhat.com

    Reported-by: David Hildenbrand <david@redhat.com>
    Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>

Signed-off-by: Donald Dutile <ddutile@redhat.com>
2024-06-17 14:17:25 -04:00
Donald Dutile e41f7154cf module: add debug stats to help identify memory pressure
JIRA: https://issues.redhat.com/browse/RHEL-28063

Conflicts:
   Added RHEL-only MODULE_STATS, set to n for now; it may be enabled
   in future -debug kernels (to be determined).
   Added the rest of the commit(s) to enable clean backports of later commits.

commit df3e764d8e5cd416efee29e0de3c93917dff5d33
Author: Luis Chamberlain <mcgrof@kernel.org>
Date:   Tue Mar 28 20:03:19 2023 -0700

    module: add debug stats to help identify memory pressure

    Loading modules with finit_module() can end up using vmalloc(), vmap()
    and vmalloc() again, for a total of up to 3 separate allocations in the
    worst case for a single module. We always kernel_read*() the module,
    which is a vmalloc(). Then vmap() is used for module decompression, and
    if that happens, the read buffer is freed, as we use the now-decompressed
    module buffer to stuff data into our copy of the module. The last
    allocation is architecture-specific, but it is generally a series of
    vmalloc() calls, or a variation of vmalloc() that handles ELF sections
    with special permissions.

    Evaluation with the new stress-ng module support [1] with just 100 ops
    shows that you can easily end up using GiBs of data, even with all the
    care we take in the kernel and userspace today to avoid loading modules
    which are already loaded. 100 ops seems to resemble the sort of pressure a
    system with about 400 CPUs can create on module loading. Although
    duplicate module requests caused by each CPU incurring a new module
    request are silly, and some of these are being fixed, we currently lack
    proper tooling to easily diagnose what happened, when it happened,
    and who is likely to blame: userspace or kernel module autoloading.

    Provide an initial set of stats which use debugfs to let us easily scrape
    post-boot information about failed loads. This sort of information can
    be used on production workloads to try to optimize *avoiding* redundant
    memory pressure from finit_module().
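
    A minimal sketch of the bookkeeping such stats imply, assuming one
    failure counter and one byte counter per load stage, with "virtual
    memory wasted" taken as the sum of the per-stage failed bytes (this
    matches the sample output below, where it equals the sum of the four
    "Failed ... bytes" lines). The enum, struct and function names are
    hypothetical, not the debugfs implementation.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stages at which a module load can fail. */
enum fail_stage { FAIL_KREAD, FAIL_DECOMPRESS, FAIL_BECOMING, FAIL_LOAD,
		  NR_FAIL_STAGES };

struct load_stats {
	atomic_ulong failed[NR_FAIL_STAGES];       /* failures per stage */
	atomic_ulong failed_bytes[NR_FAIL_STAGES]; /* bytes allocated, then wasted */
};

void stats_bump_failure(struct load_stats *s, enum fail_stage stage,
			unsigned long bytes)
{
	assert(stage < NR_FAIL_STAGES);
	atomic_fetch_add(&s->failed[stage], 1);
	atomic_fetch_add(&s->failed_bytes[stage], bytes);
}

/* Total virtual memory wasted: the sum over all failure stages. */
unsigned long stats_wasted_bytes(struct load_stats *s)
{
	unsigned long sum = 0;

	for (int i = 0; i < NR_FAIL_STAGES; i++)
		sum += atomic_load(&s->failed_bytes[i]);
	return sum;
}
```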

    Here are a few examples:

    A 255 vCPU system without the next patch in this series applied:

    Startup finished in 19.143s (kernel) + 7.078s (userspace) = 26.221s
    graphical.target reached after 6.988s in userspace

    And 13.58 GiB of virtual memory space lost due to failed module loading:

    root@big ~ # cat /sys/kernel/debug/modules/stats
             Mods ever loaded       67
         Mods failed on kread       0
    Mods failed on decompress       0
      Mods failed on becoming       0
          Mods failed on load       1411
            Total module size       11464704
          Total mod text size       4194304
           Failed kread bytes       0
      Failed decompress bytes       0
        Failed becoming bytes       0
            Failed kmod bytes       14588526272
     Virtual mem wasted bytes       14588526272
             Average mod size       171115
        Average mod text size       62602
      Average fail load bytes       10339140
    Duplicate failed modules:
                  module-name        How-many-times                    Reason
                    kvm_intel                   249                      Load
                          kvm                   249                      Load
                    irqbypass                     8                      Load
             crct10dif_pclmul                   128                      Load
          ghash_clmulni_intel                    27                      Load
                 sha512_ssse3                    50                      Load
               sha512_generic                   200                      Load
                  aesni_intel                   249                      Load
                  crypto_simd                    41                      Load
                       cryptd                   131                      Load
                        evdev                     2                      Load
                    serio_raw                     1                      Load
                   virtio_pci                     3                      Load
                         nvme                     3                      Load
                    nvme_core                     3                      Load
        virtio_pci_legacy_dev                     3                      Load
        virtio_pci_modern_dev                     3                      Load
                       t10_pi                     3                      Load
                       virtio                     3                      Load
                 crc32_pclmul                     6                      Load
               crc64_rocksoft                     3                      Load
                 crc32c_intel                    40                      Load
                  virtio_ring                     3                      Load
                        crc64                     3                      Load

    The following screenshot, of a simple 8 vCPU 8 GiB KVM guest with the
    next patch in this series applied, shows 226.53 MiB wasted in virtual
    memory allocations due to duplicate module requests during boot.
    It also shows an average module memory size of 167.10 KiB and an
    average module .text + .init.text size of 61.13 KiB. The end shows all
    modules which were detected as duplicate requests and whether they
    failed early, after just the first kernel_read*() call, or late, after
    we had already allocated the private space for the module in
    layout_and_allocate(). A system with module decompression would reveal
    more wasted virtual memory space.

    We should now put effort into identifying the source of these duplicate
    module requests and trimming them down as much as possible. Larger
    systems will obviously show much more wasted virtual memory allocations.

    root@kmod ~ # cat /sys/kernel/debug/modules/stats
             Mods ever loaded       67
         Mods failed on kread       0
    Mods failed on decompress       0
      Mods failed on becoming       83
          Mods failed on load       16
            Total module size       11464704
          Total mod text size       4194304
           Failed kread bytes       0
      Failed decompress bytes       0
        Failed becoming bytes       228959096
            Failed kmod bytes       8578080
     Virtual mem wasted bytes       237537176
             Average mod size       171115
        Average mod text size       62602
      Avg fail becoming bytes       2758544
      Average fail load bytes       536130
    Duplicate failed modules:
                  module-name        How-many-times                    Reason
                    kvm_intel                     7                  Becoming
                          kvm                     7                  Becoming
                    irqbypass                     6           Becoming & Load
             crct10dif_pclmul                     7           Becoming & Load
          ghash_clmulni_intel                     7           Becoming & Load
                 sha512_ssse3                     6           Becoming & Load
               sha512_generic                     7           Becoming & Load
                  aesni_intel                     7                  Becoming
                  crypto_simd                     7           Becoming & Load
                       cryptd                     3           Becoming & Load
                        evdev                     1                  Becoming
                    serio_raw                     1                  Becoming
                         nvme                     3                  Becoming
                    nvme_core                     3                  Becoming
                       t10_pi                     3                  Becoming
                   virtio_pci                     3                  Becoming
                 crc32_pclmul                     6           Becoming & Load
               crc64_rocksoft                     3                  Becoming
                 crc32c_intel                     3                  Becoming
        virtio_pci_modern_dev                     2                  Becoming
        virtio_pci_legacy_dev                     1                  Becoming
                        crc64                     2                  Becoming
                       virtio                     2                  Becoming
                  virtio_ring                     2                  Becoming

    [0] https://github.com/ColinIanKing/stress-ng.git
    [1] echo 0 > /proc/sys/vm/oom_dump_tasks
        ./stress-ng --module 100 --module-name xfs

    Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>

Signed-off-by: Donald Dutile <ddutile@redhat.com>
2024-06-17 14:17:25 -04:00
Donald Dutile 5d28ce51db module: extract patient module check into helper
JIRA: https://issues.redhat.com/browse/RHEL-28063

commit f71afa6a420111da90657fe999a8e32c42d5c7d6
Author: Luis Chamberlain <mcgrof@kernel.org>
Date:   Fri Mar 10 20:05:52 2023 -0800

    module: extract patient module check into helper

    The patient module check inside add_unformed_module() is large
    enough to deserve its own routine. It is also a bit hard to read, so
    move it to a helper and do the inverse checks first to reduce the
    indentation and make it easier to read. The new helper is
    module_patient_check_exists().

    To make this work we need to move finished_loading() up; we do that
    without making any functional changes to that routine.

    Reviewed-by: David Hildenbrand <david@redhat.com>
    Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>

Signed-off-by: Donald Dutile <ddutile@redhat.com>
2024-06-17 14:17:25 -04:00
Donald Dutile a1da4498e3 modules/kmod: replace implementation with a semaphore
JIRA: https://issues.redhat.com/browse/RHEL-28063

Conflicts: RHEL9 does not have upstream 48380368dec148, which
           changed DEFINE_SEMAPHORE to take a number argument.
           Implement the RHEL9 single-arg version that open-codes
           the initialization.

commit 25a1b5b518f4336bff934ac8348da6c57158363a
Author: Luis Chamberlain <mcgrof@kernel.org>
Date:   Fri Mar 24 18:38:00 2023 -0700

    modules/kmod: replace implementation with a semaphore

    Simplify the concurrency delimiter we use for kmod with a semaphore.
    I had used the kmod strategy to try to implement a similar concurrency
    delimiter for the kernel_read*() calls from the finit_module() path
    so as to reduce vmalloc() memory pressure. That effort has not yet
    provided conclusive results, but one thing that became clear is that
    we can use the alternative solution with semaphores which Linus hinted
    at instead of the atomic / wait strategy.

    I've stress tested this with kmod test 0008:

    time /data/linux-next/tools/testing/selftests/kmod/kmod.sh -t 0008

    And I get only a *slight* delay: a few seconds added to a full test
    loop run of 150 iterations, which takes about 30-40 seconds. The small
    delay is worth the simplification IMHO.

    Reviewed-by: Davidlohr Bueso <dave@stgolabs.net>
    Reviewed-by: Miroslav Benes <mbenes@suse.cz>
    Reviewed-by: David Hildenbrand <david@redhat.com>
    Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>

Signed-off-by: Donald Dutile <ddutile@redhat.com>
2024-06-17 14:17:25 -04:00
Donald Dutile 0e519b399a module: fix kmemleak annotations for non init ELF sections
JIRA: https://issues.redhat.com/browse/RHEL-28063

commit 430bb0d1c3376c988982f14bcbe71f917c89e1ab
Author: Luis Chamberlain <mcgrof@kernel.org>
Date:   Tue Apr 4 18:52:47 2023 -0700

    module: fix kmemleak annotations for non init ELF sections

    Commit ac3b43283923 ("module: replace module_layout with module_memory")
    reworked the way memory allocations are handled to make it clearer. But
    in translation it lost how we handled kmemleak_ignore() or
    kmemleak_not_leak() for different ELF sections.

    Fix this and clarify the comments a bit more. Instead of the old approach
    of using kmemleak_ignore() for init.* ELF sections, we now stick only to
    kmemleak_not_leak(), as suggested by Catalin Marinas, so as to avoid
    any false positives and simplify the code.

    Fixes: ac3b43283923 ("module: replace module_layout with module_memory")
    Reported-by: Jim Cromie <jim.cromie@gmail.com>
    Acked-by: Song Liu <song@kernel.org>
    Suggested-by: Catalin Marinas <catalin.marinas@arm.com>
    Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
    Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>

Signed-off-by: Donald Dutile <ddutile@redhat.com>
2024-06-17 14:17:25 -04:00
Donald Dutile f776cbeb9c module: Ignore L0 and rename is_arm_mapping_symbol()
JIRA: https://issues.redhat.com/browse/RHEL-28063

commit 0a3bf86092c38f7b72c56c6901c78dd302411307
Author: Tiezhu Yang <yangtiezhu@loongson.cn>
Date:   Fri Mar 31 17:15:53 2023 +0800

    module: Ignore L0 and rename is_arm_mapping_symbol()

    The L0 symbol is generated when building modules on LoongArch. Ignore it
    in modpost and when looking at module symbols; otherwise we cannot see
    the expected call trace.

    Since is_arm_mapping_symbol() is no longer only for ARM, rename it to
    is_mapping_symbol() to reflect reality.

    This is related to commit c17a2538704f ("mksysmap: Fix the mismatch of
    'L0' symbols in System.map").

    (1) Simple test case

      [loongson@linux hello]$ cat hello.c
      #include <linux/init.h>
      #include <linux/module.h>
      #include <linux/printk.h>

      static void test_func(void)
      {
              pr_info("This is a test\n");
              dump_stack();
      }

      static int __init hello_init(void)
      {
              pr_warn("Hello, world\n");
              test_func();

              return 0;
      }

      static void __exit hello_exit(void)
      {
              pr_warn("Goodbye\n");
      }

      module_init(hello_init);
      module_exit(hello_exit);
      MODULE_LICENSE("GPL");
      [loongson@linux hello]$ cat Makefile
      obj-m:=hello.o

      ccflags-y += -g -Og

      all:
              make -C /lib/modules/$(shell uname -r)/build/ M=$(PWD) modules
      clean:
              make -C /lib/modules/$(shell uname -r)/build/ M=$(PWD) clean

    (2) Test environment

    system: LoongArch CLFS 5.5
    https://github.com/sunhaiyong1978/CLFS-for-LoongArch/releases/tag/5.0
    GRUB needs to be updated to avoid the boot error "invalid magic number".

    kernel: 6.3-rc1 with loongson3_defconfig + CONFIG_DYNAMIC_FTRACE=y

    (3) Test result

    Without this patch:

      [root@linux hello]# insmod hello.ko
      [root@linux hello]# dmesg
      ...
      Hello, world
      This is a test
      ...
      Call Trace:
      [<9000000000223728>] show_stack+0x68/0x18c
      [<90000000013374cc>] dump_stack_lvl+0x60/0x88
      [<ffff800002050028>] L0\x01+0x20/0x2c [hello]
      [<ffff800002058028>] L0\x01+0x20/0x30 [hello]
      [<900000000022097c>] do_one_initcall+0x88/0x288
      [<90000000002df890>] do_init_module+0x54/0x200
      [<90000000002e1e18>] __do_sys_finit_module+0xc4/0x114
      [<90000000013382e8>] do_syscall+0x7c/0x94
      [<9000000000221e3c>] handle_syscall+0xbc/0x158

    With this patch:

      [root@linux hello]# insmod hello.ko
      [root@linux hello]# dmesg
      ...
      Hello, world
      This is a test
      ...
      Call Trace:
      [<9000000000223728>] show_stack+0x68/0x18c
      [<90000000013374cc>] dump_stack_lvl+0x60/0x88
      [<ffff800002050028>] test_func+0x28/0x34 [hello]
      [<ffff800002058028>] hello_init+0x28/0x38 [hello]
      [<900000000022097c>] do_one_initcall+0x88/0x288
      [<90000000002df890>] do_init_module+0x54/0x200
      [<90000000002e1e18>] __do_sys_finit_module+0xc4/0x114
      [<90000000013382e8>] do_syscall+0x7c/0x94
      [<9000000000221e3c>] handle_syscall+0xbc/0x158

    Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
    Tested-by: Youling Tang <tangyouling@loongson.cn> # for LoongArch
    Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>

Signed-off-by: Donald Dutile <ddutile@redhat.com>
2024-06-17 14:17:25 -04:00
Donald Dutile 7dbb7e3cc8 module: Move is_arm_mapping_symbol() to module_symbol.h
JIRA: https://issues.redhat.com/browse/RHEL-28063

Conflict: Slight context diff but patch changes are the same.

commit 987d2e0aaa55de40938435be760aa96428470fd6
Author: Tiezhu Yang <yangtiezhu@loongson.cn>
Date:   Fri Mar 31 17:15:52 2023 +0800

    module: Move is_arm_mapping_symbol() to module_symbol.h

    In order to avoid duplicated code, move is_arm_mapping_symbol() to
    include/linux/module_symbol.h, then remove the copies of
    is_arm_mapping_symbol() elsewhere.

    Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
    Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>

Signed-off-by: Donald Dutile <ddutile@redhat.com>
2024-06-17 14:17:25 -04:00
Donald Dutile b8f8ff56f2 module: Sync code of is_arm_mapping_symbol()
JIRA: https://issues.redhat.com/browse/RHEL-28063

commit 87e5b1e8f257023ac5c4d2b8f07716a7f3dcc8ea
Author: Tiezhu Yang <yangtiezhu@loongson.cn>
Date:   Fri Mar 31 17:15:51 2023 +0800

    module: Sync code of is_arm_mapping_symbol()

    After commit 2e3a10a155 ("ARM: avoid ARM binutils leaking ELF local
    symbols") and commit d6b732666a1b ("modpost: fix undefined behavior of
    is_arm_mapping_symbol()"), the copies of is_arm_mapping_symbol() in
    kernel/module/kallsyms.c and scripts/mod/modpost.c differ in many ways;
    sync the code to keep them consistent.

    Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
    Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>

Signed-off-by: Donald Dutile <ddutile@redhat.com>
2024-06-17 14:17:25 -04:00
Donald Dutile 58c52f78c0 module: already_uses() - reduce pr_debug output volume
JIRA: https://issues.redhat.com/browse/RHEL-28063

commit 33c951f62920d144ca89daa0560180a49afb6f1e
Author: Jim Cromie <jim.cromie@gmail.com>
Date:   Tue Mar 21 19:36:23 2023 -0600

    module: already_uses() - reduce pr_debug output volume

    already_uses() is unnecessarily chatty.

    `modprobe i915` yields 491 messages like:

      [   64.108744] i915 uses drm!

    This is a normal situation, and isn't worth all the log entries.

    NOTE: I've preserved the "does not use %s" messages, which happen
    less often, but do happen.  It's not clear to me what they tell a
    reader, or what info might improve the pr_debug's utility.

    [ 6847.584999] main:already_uses:569: amdgpu does not use ttm!
    [ 6847.585001] main:add_module_usage:584: Allocating new usage for amdgpu.
    [ 6847.585014] main:already_uses:569: amdgpu does not use drm!
    [ 6847.585016] main:add_module_usage:584: Allocating new usage for amdgpu.
    [ 6847.585024] main:already_uses:569: amdgpu does not use drm_display_helper!
    [ 6847.585025] main:add_module_usage:584: Allocating new usage for amdgpu.
    [ 6847.585084] main:already_uses:569: amdgpu does not use drm_kms_helper!
    [ 6847.585086] main:add_module_usage:584: Allocating new usage for amdgpu.
    [ 6847.585175] main:already_uses:569: amdgpu does not use drm_buddy!
    [ 6847.585176] main:add_module_usage:584: Allocating new usage for amdgpu.
    [ 6847.585202] main:already_uses:569: amdgpu does not use i2c_algo_bit!
    [ 6847.585204] main:add_module_usage:584: Allocating new usage for amdgpu.
    [ 6847.585249] main:already_uses:569: amdgpu does not use gpu_sched!
    [ 6847.585250] main:add_module_usage:584: Allocating new usage for amdgpu.
    [ 6847.585314] main:already_uses:569: amdgpu does not use video!
    [ 6847.585315] main:add_module_usage:584: Allocating new usage for amdgpu.
    [ 6847.585409] main:already_uses:569: amdgpu does not use iommu_v2!
    [ 6847.585410] main:add_module_usage:584: Allocating new usage for amdgpu.
    [ 6847.585816] main:already_uses:569: amdgpu does not use drm_ttm_helper!
    [ 6847.585818] main:add_module_usage:584: Allocating new usage for amdgpu.
    [ 6848.762268] dyndbg: add-module: amdgpu.2533 sites

    no functional changes.

    Signed-off-by: Jim Cromie <jim.cromie@gmail.com>
    Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>

Signed-off-by: Donald Dutile <ddutile@redhat.com>
2024-06-17 14:17:25 -04:00
Donald Dutile 5bcdd7cfaf module: add section-size to move_module pr_debug
JIRA: https://issues.redhat.com/browse/RHEL-28063

commit 66a2301edf313d630c2ece4f3721c5b3402653ee
Author: Jim Cromie <jim.cromie@gmail.com>
Date:   Tue Mar 21 19:36:22 2023 -0600

    module: add section-size to move_module pr_debug

    move_module() pr_debug's "Final section addresses for $modname".
    Add the section size to each message, for anyone looking at these.

    no functional changes.

    Signed-off-by: Jim Cromie <jim.cromie@gmail.com>
    Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>

Signed-off-by: Donald Dutile <ddutile@redhat.com>
2024-06-17 14:17:25 -04:00
Donald Dutile db74bac793 module: add symbol-name to pr_debug Absolute symbol
JIRA: https://issues.redhat.com/browse/RHEL-28063

commit b10addf37bbcaee66672eb54c15532266c8daea6
Author: Jim Cromie <jim.cromie@gmail.com>
Date:   Tue Mar 21 19:36:21 2023 -0600

    module: add symbol-name to pr_debug Absolute symbol

    The pr_debug("Absolute symbol" ..) reports the value (which is usually
    0) but not the name, which is more informative.  So add it.

    no functional changes

    Signed-off-by: Jim Cromie <jim.cromie@gmail.com>
    Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>

Signed-off-by: Donald Dutile <ddutile@redhat.com>
2024-06-17 14:17:25 -04:00
Donald Dutile 38e20bcc3c module: in layout_sections, move_module: add the modname
JIRA: https://issues.redhat.com/browse/RHEL-28063

commit 6ed81802d4d1b037ad2d1657511ff0c2e9aeda14
Author: Jim Cromie <jim.cromie@gmail.com>
Date:   Tue Mar 21 19:36:20 2023 -0600

    module: in layout_sections, move_module: add the modname

    layout_sections() and move_module() each issue ~50 messages for each
    module loaded.  Add the module name to their 2 header lines, to help
    the reader find their module.

    no functional changes.

    Signed-off-by: Jim Cromie <jim.cromie@gmail.com>
    Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>

Signed-off-by: Donald Dutile <ddutile@redhat.com>
2024-06-17 14:17:24 -04:00
Donald Dutile ccf1cf0a34 module: fold usermode helper kmod into modules directory
JIRA: https://issues.redhat.com/browse/RHEL-28063

Conflict: Dropped MAINTAINERS hunk because it wouldn't apply cleanly
          and is not needed.

commit 25be451aa4c0e9a96c59a626ab0e93d5cb7f6f48
Author: Luis Chamberlain <mcgrof@kernel.org>
Date:   Sun Mar 19 14:35:42 2023 -0700

    module: fold usermode helper kmod into modules directory

    kernel/kmod.c is already only built if we enabled modules, so
    just stuff it under kernel/module/kmod.c and unify the MAINTAINERS
    file for it.

    Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>

Signed-off-by: Donald Dutile <ddutile@redhat.com>
2024-06-17 14:17:24 -04:00
Donald Dutile d97fc0bedf module: merge remnants of setup_load_info() to elf validation
JIRA: https://issues.redhat.com/browse/RHEL-28063

commit 3d40bb903ed1f654707d34bdd61ee2c332000e4b
Author: Luis Chamberlain <mcgrof@kernel.org>
Date:   Sun Mar 19 14:35:41 2023 -0700

    module: merge remnants of setup_load_info() to elf validation

    setup_load_info() actually had ELF validation checks of its own.
    Caching useful variables later as a secondary step just means looping
    again over the ELF sections we just validated. We can simply keep tabs
    on the key sections of interest as we validate the module ELF sections
    in one swoop, so do that and merge the two routines together.

    Expand a bit on the documentation / intent / goals.

    Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>

Signed-off-by: Donald Dutile <ddutile@redhat.com>
2024-06-17 14:17:24 -04:00
Donald Dutile 724139c3b1 module: move more elf validity checks to elf_validity_check()
JIRA: https://issues.redhat.com/browse/RHEL-28063

commit 1bb49db9919a4d4186cba288930e7026d8f7ec96
Author: Luis Chamberlain <mcgrof@kernel.org>
Date:   Sun Mar 19 14:35:40 2023 -0700

    module: move more elf validity checks to elf_validity_check()

    The symbol and strings section validation currently happens in
    setup_load_info(), but since these are also validity checks, move
    them to elf_validity_check().

    Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>

Signed-off-by: Donald Dutile <ddutile@redhat.com>
2024-06-17 14:17:24 -04:00
Donald Dutile 33c789403d module: add stop-grap sanity check on module memcpy()
JIRA: https://issues.redhat.com/browse/RHEL-28063

commit c7ee8aebf6c0588c0aab76538aff395c3abf811c
Author: Luis Chamberlain <mcgrof@kernel.org>
Date:   Sun Mar 19 14:35:39 2023 -0700

    module: add stop-grap sanity check on module memcpy()

    The integrity of the struct module we load is important, and although
    our ELF validator already checks that the module section must match
    struct module, add a stop-gap check before we memcpy() the final minted
    module. This also makes the goal clearer to those inspecting the code.

    While at it, clarify the goal behind updating the sh_addr address.
    The current comment is pretty misleading.

    Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>

Signed-off-by: Donald Dutile <ddutile@redhat.com>
2024-06-17 14:17:24 -04:00
Donald Dutile d6be2f8a79 module: add sanity check for ELF module section
JIRA: https://issues.redhat.com/browse/RHEL-28063

commit 46752820f9abc013b6bd8172562b642376723313
Author: Luis Chamberlain <mcgrof@kernel.org>
Date:   Sun Mar 19 14:35:38 2023 -0700

    module: add sanity check for ELF module section

    The ELF ".gnu.linkonce.this_module" section is special: it is what we
    use to construct the struct module __this_module, which THIS_MODULE
    points to. When userspace loads a module we always deal first with a
    copy of the userspace buffer, and twiddle with the userspace copy's
    version of the struct module. Eventually we allocate memory to do a
    memcpy() of that struct module, under the assumption that the module
    size is right. But we have no validity checks against the size or
    the requirements for the section.

    Add some validity checks for the special module section early and while
    at it, cache the module section index early, so we don't have to do that
    later.

    While at it, just move over the assignment of info->mod to make the
    code clearer. The validity checker also adds an explicit size check to
    ensure the module section size matches the kernel's run-time
    sizeof(struct module). This should prevent sloppy loads of modules
    built against a struct module whose size differs from the running
    kernel's. For example, a developer today can expand the size of
    struct module, rebuild a single directory with 'make fs/xfs/', and
    then try to insmod the driver there. That module would in effect have
    an incorrect size. This new size check puts a stop-gap against such
    mistakes.

    This also makes the entire goal of ".gnu.linkonce.this_module" pretty
    clear. Before this patch, verifying the goal / intent required some
    Indiana Jones whips, torches and cleaning up big old spider webs.

    Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>

Signed-off-by: Donald Dutile <ddutile@redhat.com>
2024-06-17 14:17:24 -04:00
Donald Dutile 57edbf0df7 module: rename check_module_license_and_versions() to check_export_symbol_versions()
JIRA: https://issues.redhat.com/browse/RHEL-28063

commit 419e1a20f7bdef5380fde5ed73f05c98c28a598b
Author: Luis Chamberlain <mcgrof@kernel.org>
Date:   Sun Mar 19 14:27:46 2023 -0700

    module: rename check_module_license_and_versions() to check_export_symbol_versions()

    This makes it easier to understand what the routine is checking for.

    Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>

Signed-off-by: Donald Dutile <ddutile@redhat.com>
2024-06-17 14:17:24 -04:00
Donald Dutile 4f4151e7ba module: converge taint work together
JIRA: https://issues.redhat.com/browse/RHEL-28063

commit 72f08b3cc631f4ebcaa9f373d18fc0b877fb6458
Author: Luis Chamberlain <mcgrof@kernel.org>
Date:   Sun Mar 19 14:27:45 2023 -0700

    module: converge taint work together

    Converge on a compromise: as long as a module hits our linked list of
    modules, we taint. That is, the module was about to become live.

    Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>

Signed-off-by: Donald Dutile <ddutile@redhat.com>
2024-06-17 14:17:24 -04:00
Donald Dutile 88c5188c62 module: move signature taint to module_augment_kernel_taints()
JIRA: https://issues.redhat.com/browse/RHEL-28063

commit c3bbf62ebf8c9e87cea875cfa146f44f46af4145
Author: Luis Chamberlain <mcgrof@kernel.org>
Date:   Sun Mar 19 14:27:44 2023 -0700

    module: move signature taint to module_augment_kernel_taints()

    Just move the signature taint into the helper:

      module_augment_kernel_taints()

    Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>

Signed-off-by: Donald Dutile <ddutile@redhat.com>
2024-06-17 14:17:24 -04:00
Donald Dutile d3fa8f2f9b module: move tainting until after a module hits our linked list
JIRA: https://issues.redhat.com/browse/RHEL-28063

commit a12b94511cf36855cd731c16005bd535e2007552
Author: Luis Chamberlain <mcgrof@kernel.org>
Date:   Sun Mar 19 14:27:43 2023 -0700

    module: move tainting until after a module hits our linked list

    It is silly to have taints spread out all over; we can just compromise
    and add them if the module ever hits our linked list. Our sanity checkers
    should just prevent crappy drivers / bogus ELF modules / etc and kconfig
    options should be enough to let you *not* load things you don't want.

    Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>

Signed-off-by: Donald Dutile <ddutile@redhat.com>
2024-06-17 14:17:24 -04:00
Donald Dutile a24c049ebc module: split taint adding with info checking
JIRA: https://issues.redhat.com/browse/RHEL-28063

commit 437c1f9cc61fd37829eaf12d8ae2f7dcc5dddce0
Author: Luis Chamberlain <mcgrof@kernel.org>
Date:   Sun Mar 19 14:27:42 2023 -0700

    module: split taint adding with info checking

    check_modinfo() actually does two things:

     a) sanity checks, some of which are fatal, and so we
        prevent the user from completing the module load
     b) taints the kernel

    The taints are pretty heavy-handed because we're tainting the kernel
    *before* the module ever even gets onto the modules linked list.
    That is, the load can still fail for other reasons later as we review
    the module's structure.

    But this commit makes no functional changes, it just makes the intent
    clearer and splits the code up where needed to make that happen.

    Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>

Signed-off-by: Donald Dutile <ddutile@redhat.com>
2024-06-17 14:17:24 -04:00