61 Commits

Author SHA1 Message Date
Jerome Marchand 167e724542 bpf: Check unsupported ops from the bpf_struct_ops's cfi_stubs
JIRA: https://issues.redhat.com/browse/RHEL-63880

commit e42ac14180554fa23a3312d4f921dc4ea7972fb7
Author: Martin KaFai Lau <martin.lau@kernel.org>
Date:   Mon Jul 22 11:30:45 2024 -0700

    bpf: Check unsupported ops from the bpf_struct_ops's cfi_stubs

    The bpf_tcp_ca struct_ops currently uses a "u32 unsupported_ops[]"
    array to track which ops are not supported.

    After cfi_stubs had been added, the function pointer in cfi_stubs is
    also NULL for the unsupported ops. Thus, the "u32 unsupported_ops[]"
    becomes redundant. This observation was originally brought up in the
    bpf/cfi discussion:
    https://lore.kernel.org/bpf/CAADnVQJoEkdjyCEJRPASjBw1QGsKYrF33QdMGc1RZa9b88bAEA@mail.gmail.com/

    The recent bpf qdisc patch (https://lore.kernel.org/bpf/20240714175130.4051012-6-amery.hung@bytedance.com/)
    also needs to specify quite many unsupported ops. It is a good time
    to clean it up.

    This patch removes the need of "u32 unsupported_ops[]" and tests for null-ness
    in the cfi_stubs instead.

    Testing the cfi_stubs is done in a new function bpf_struct_ops_supported().
    The verifier will call bpf_struct_ops_supported() when loading the
    struct_ops program. The ".check_member" is removed from the bpf_tcp_ca
    in this patch. ".check_member" could still be useful for other subsystems
    to enforce other restrictions (e.g. sched_ext checks for prog->sleepable).

    To keep the same error return, ENOTSUPP is used.

    Cc: Amery Hung <ameryhung@gmail.com>
    Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
    Link: https://lore.kernel.org/r/20240722183049.2254692-2-martin.lau@linux.dev
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    Signed-off-by: Andrii Nakryiko <andrii@kernel.org>

Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
2025-01-13 17:36:13 +01:00
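For the commit above (167e724542), the null-ness test on the stub table can be modeled in userspace. This is a minimal sketch, not the kernel code: struct demo_ops, demo_cfi_stubs, demo_struct_ops_supported() and DEMO_ENOTSUPP are hypothetical names introduced for illustration, and the member offset is assumed to come from offsetof().

```c
#include <stddef.h>
#include <string.h>

/* ENOTSUPP is kernel-internal (524) and absent from userspace <errno.h>. */
#define DEMO_ENOTSUPP 524

/* Hypothetical ops table standing in for a struct_ops type. */
struct demo_ops {
	int (*init)(void);
	void (*release)(void);
	int (*set_state)(int state);
};

static int demo_init_stub(void) { return 0; }
static int demo_set_state_stub(int state) { (void)state; return 0; }

/* Stub table: "release" is left NULL, which marks it as unsupported. */
const struct demo_ops demo_cfi_stubs = {
	.init = demo_init_stub,
	.set_state = demo_set_state_stub,
};

/*
 * Model of the null-ness test: read the function pointer stored at member
 * offset moff inside the stub table and report whether it is set.
 */
int demo_struct_ops_supported(const void *cfi_stubs, size_t moff)
{
	void (*fn)(void);

	memcpy(&fn, (const char *)cfi_stubs + moff, sizeof(fn));
	return fn ? 0 : -DEMO_ENOTSUPP;
}
```

Calling demo_struct_ops_supported(&demo_cfi_stubs, offsetof(struct demo_ops, release)) reports the op as unsupported without any separate list of unsupported offsets, which is the redundancy the commit message describes.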
Jerome Marchand cc024fa55c bpf: Replace 8 seq_puts() calls by seq_putc() calls
JIRA: https://issues.redhat.com/browse/RHEL-63880

commit df862de41fcde6a0a4906647b0cacec2a8db5cf3
Author: Markus Elfring <elfring@users.sourceforge.net>
Date:   Sun Jul 14 16:15:34 2024 +0200

    bpf: Replace 8 seq_puts() calls by seq_putc() calls

    In several places a single line break character is put into a sequence.
    Use the corresponding function “seq_putc” there.

    This transformation was made with the help of the Coccinelle software.

    Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
    Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
    Link: https://lore.kernel.org/bpf/e26b7df9-cd63-491f-85e8-8cabe60a85e5@web.de

Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
2025-01-13 17:36:12 +01:00
Viktor Malik 52582b165d bpf: Use precise image size for struct_ops trampoline
JIRA: https://issues.redhat.com/browse/RHEL-30774

commit d1a426171d76b2cdf3dea5d52f6266090e4aa254
Author: Pu Lehui <pulehui@huawei.com>
Date:   Sat Jun 22 03:04:35 2024 +0000

    bpf: Use precise image size for struct_ops trampoline
    
    For trampoline using bpf_prog_pack, we need to generate a rw_image
    buffer with size of (image_end - image). For regular trampoline, we use
    the precise image size generated by arch_bpf_trampoline_size to allocate
    rw_image. But for struct_ops trampoline, we allocate rw_image directly
    using close to PAGE_SIZE size. We do not need to allocate for that much,
    as the patch size is usually much smaller than PAGE_SIZE. Let's use
    precise image size for it too.
    
    Signed-off-by: Pu Lehui <pulehui@huawei.com>
    Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    Tested-by: Björn Töpel <bjorn@rivosinc.com> #riscv
    Acked-by: Song Liu <song@kernel.org>
    Link: https://lore.kernel.org/bpf/20240622030437.3973492-2-pulehui@huaweicloud.com

Signed-off-by: Viktor Malik <vmalik@redhat.com>
2024-11-26 15:55:13 +01:00
Viktor Malik bb34399157 bpf: support epoll from bpf struct_ops links.
JIRA: https://issues.redhat.com/browse/RHEL-30774

commit 1adddc97aa44c8783f9f0276ea70854d56f9f6df
Author: Kui-Feng Lee <thinker.li@gmail.com>
Date:   Wed May 29 23:59:41 2024 -0700

    bpf: support epoll from bpf struct_ops links.
    
    Add epoll support to bpf struct_ops links to trigger EPOLLHUP event upon
    detachment.
    
    This patch implements the "poll" of the "struct file_operations" for BPF
    links and introduces a new "poll" operator in the "struct bpf_link_ops". By
    implementing "poll" of "struct bpf_link_ops" for the links of struct_ops,
    the file descriptor of a struct_ops link can be added to an epoll file
    descriptor to receive EPOLLHUP events.
    
    Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com>
    Link: https://lore.kernel.org/r/20240530065946.979330-4-thinker.li@gmail.com
    Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>

Signed-off-by: Viktor Malik <vmalik@redhat.com>
2024-11-26 14:40:01 +01:00
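For the commit above (bb34399157), the userspace side of waiting for EPOLLHUP can be sketched with the standard epoll API. This is a sketch under assumptions: wait_for_hup() is a hypothetical helper, and obtaining a real struct_ops link fd is out of scope here, so any fd that can report a hang-up (for example the read end of a pipe) serves as a stand-in.

```c
#include <sys/epoll.h>
#include <unistd.h>

/*
 * Wait up to timeout_ms for a hang-up on fd.
 * Returns 1 if EPOLLHUP was reported, 0 on timeout or any other event,
 * and -1 on error.
 */
int wait_for_hup(int fd, int timeout_ms)
{
	/* EPOLLHUP is always reported; naming it here is only for clarity. */
	struct epoll_event ev = { .events = EPOLLHUP };
	struct epoll_event out;
	int epfd, n, ret = -1;

	epfd = epoll_create1(0);
	if (epfd < 0)
		return -1;

	ev.data.fd = fd;
	if (epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev) == 0) {
		n = epoll_wait(epfd, &out, 1, timeout_ms);
		if (n == 0)
			ret = 0;
		else if (n > 0)
			ret = (out.events & EPOLLHUP) ? 1 : 0;
	}
	close(epfd);
	return ret;
}
```

With a pipe as the stand-in, wait_for_hup() on the read end times out while the write end is open and returns 1 once the write end is closed. With a struct_ops link fd, the commit message says the hang-up is triggered by detachment.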
Viktor Malik 78e242a9a9 bpf: enable detaching links of struct_ops objects.
JIRA: https://issues.redhat.com/browse/RHEL-30774

commit 6fb2544ea1493f52e50b753604791c01bd2cf897
Author: Kui-Feng Lee <thinker.li@gmail.com>
Date:   Wed May 29 23:59:40 2024 -0700

    bpf: enable detaching links of struct_ops objects.
    
    Implement the detach callback in bpf_link_ops for struct_ops so that user
    programs can detach a struct_ops link. The subsystems that struct_ops
    objects are registered to can also use this callback to detach the links
    being passed to them.
    
    Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com>
    Link: https://lore.kernel.org/r/20240530065946.979330-3-thinker.li@gmail.com
    Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>

Signed-off-by: Viktor Malik <vmalik@redhat.com>
2024-11-26 14:40:01 +01:00
Viktor Malik c0905ae221 bpf: pass bpf_struct_ops_link to callbacks in bpf_struct_ops.
JIRA: https://issues.redhat.com/browse/RHEL-30774

commit 73287fe228721b05690e671adbcccc6cf5435be6
Author: Kui-Feng Lee <thinker.li@gmail.com>
Date:   Wed May 29 23:59:39 2024 -0700

    bpf: pass bpf_struct_ops_link to callbacks in bpf_struct_ops.
    
    Pass an additional pointer of bpf_struct_ops_link to callback function reg,
    unreg, and update provided by subsystems defined in bpf_struct_ops. A
    bpf_struct_ops_map can be registered for multiple links. Passing a pointer
    of bpf_struct_ops_link helps subsystems to distinguish them.
    
    This pointer will be used in the later patches to let the subsystem
    initiate a detachment on a link that was registered to it previously.
    
    Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com>
    Link: https://lore.kernel.org/r/20240530065946.979330-2-thinker.li@gmail.com
    Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>

Signed-off-by: Viktor Malik <vmalik@redhat.com>
2024-11-26 14:40:01 +01:00
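For the commit above (c0905ae221), the value of the extra link pointer is easiest to see in a small model. The sketch below is not the kernel interface: struct demo_link, struct demo_subsys and the demo_* callbacks are hypothetical, and the fixed-size table is an assumption made to keep the example short.

```c
#include <stddef.h>

#define DEMO_MAX_REGS 4

/* Hypothetical stand-in for struct bpf_struct_ops_link. */
struct demo_link {
	int id;
};

/* One registration: the ops data plus the link it came through. */
struct demo_reg {
	int used;
	const void *kdata;
	const struct demo_link *link;
};

struct demo_subsys {
	struct demo_reg regs[DEMO_MAX_REGS];
};

/* reg callback: remember which link registered kdata. */
int demo_reg(struct demo_subsys *s, const void *kdata,
	     const struct demo_link *link)
{
	for (size_t i = 0; i < DEMO_MAX_REGS; i++) {
		if (!s->regs[i].used) {
			s->regs[i] = (struct demo_reg){ 1, kdata, link };
			return 0;
		}
	}
	return -1; /* table full */
}

/* unreg callback: drop only the registration made through this link. */
int demo_unreg(struct demo_subsys *s, const void *kdata,
	       const struct demo_link *link)
{
	for (size_t i = 0; i < DEMO_MAX_REGS; i++) {
		struct demo_reg *r = &s->regs[i];

		if (r->used && r->kdata == kdata && r->link == link) {
			r->used = 0;
			return 0;
		}
	}
	return -1; /* no such registration */
}

/* Count live registrations of kdata across all links. */
size_t demo_count(const struct demo_subsys *s, const void *kdata)
{
	size_t n = 0;

	for (size_t i = 0; i < DEMO_MAX_REGS; i++)
		if (s->regs[i].used && s->regs[i].kdata == kdata)
			n++;
	return n;
}
```

When the same kdata is registered through two links, the kdata pointer alone cannot tell the registrations apart; the link pointer is what lets demo_unreg() remove exactly one of them.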
Viktor Malik 6985f7b84f bpf: Check return from set_memory_rox()
JIRA: https://issues.redhat.com/browse/RHEL-30773

commit c733239f8f530872a1f80d8c45dcafbaff368737
Author: Christophe Leroy <christophe.leroy@csgroup.eu>
Date:   Sat Mar 16 08:35:41 2024 +0100

    bpf: Check return from set_memory_rox()
    
    arch_protect_bpf_trampoline() and alloc_new_pack() call
    set_memory_rox() which can fail, leading to unprotected memory.
    
    Take into account return from set_memory_rox() function and add
    __must_check flag to arch_protect_bpf_trampoline().
    
    Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
    Reviewed-by: Kees Cook <keescook@chromium.org>
    Link: https://lore.kernel.org/r/fe1c163c83767fde5cab31d209a4a6be3ddb3a73.1710574353.git.christophe.leroy@csgroup.eu
    Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>

Signed-off-by: Viktor Malik <vmalik@redhat.com>
2024-11-07 13:58:29 +01:00
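For the commit above (6985f7b84f), the pattern of forcing callers to check a protection result can be sketched in userspace. Everything named demo_* is hypothetical; demo_set_memory_rox() is a stand-in that changes no page permissions, and the -ENOMEM failure value is an assumption chosen only for the example.

```c
#include <errno.h>
#include <stddef.h>

/* On GCC and Clang this has the same effect as the kernel's __must_check. */
#define demo_must_check __attribute__((warn_unused_result))

/* Test knob: when nonzero, the stand-in below reports failure. */
int demo_rox_should_fail;

/* Stand-in for set_memory_rox(); it changes no page permissions. */
static int demo_set_memory_rox(void *image, size_t size)
{
	(void)image;
	(void)size;
	return demo_rox_should_fail ? -ENOMEM : 0;
}

/* Protecting an image can fail, so callers are forced to look at the result. */
demo_must_check int demo_protect_trampoline(void *image, size_t size)
{
	return demo_set_memory_rox(image, size);
}

/* Caller: treat the image as usable only if protection succeeded. */
int demo_finalize_image(void *image, size_t size)
{
	int err = demo_protect_trampoline(image, size);

	if (err)
		return err; /* the caller is expected to free the image */
	return 0;
}
```

A call site that ignores the return value of demo_protect_trampoline() gets a compiler warning, which is the kind of safety net the __must_check annotation adds.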
Jerome Marchand 1d9503406b bpf: Remove unnecessary err < 0 check in bpf_struct_ops_map_update_elem
JIRA: https://issues.redhat.com/browse/RHEL-23649

commit 7f3edd0c72c3f7214f8f28495f2e6466348eb128
Author: Martin KaFai Lau <martin.lau@kernel.org>
Date:   Fri Mar 15 12:21:12 2024 -0700

    bpf: Remove unnecessary err < 0 check in bpf_struct_ops_map_update_elem

    There is an "if (err)" check earlier, so the "if (err < 0)"
    check that this patch removes is unnecessary. It was my oversight
    when making adjustments to the bpf_struct_ops_prepare_trampoline()
    such that the caller does not have to worry about the new page when
    the function returns error.

    Fixes: 187e2af05abe ("bpf: struct_ops supports more than one page for trampolines.")
    Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
    Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
    Acked-by: Stanislav Fomichev <sdf@google.com>
    Link: https://lore.kernel.org/bpf/20240315192112.2825039-1-martin.lau@linux.dev

Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
2024-10-15 10:49:16 +02:00
Jerome Marchand ef2edd603f bpf: struct_ops supports more than one page for trampolines.
JIRA: https://issues.redhat.com/browse/RHEL-23649

commit 187e2af05abe6bf80581490239c449456627d17a
Author: Kui-Feng Lee <thinker.li@gmail.com>
Date:   Sat Feb 24 14:34:17 2024 -0800

    bpf: struct_ops supports more than one page for trampolines.

    The BPF struct_ops previously only allowed one page of trampolines.
    Each function pointer of a struct_ops is implemented by a struct_ops
    bpf program. Each struct_ops bpf program requires a trampoline.
    The following selftest patch shows each page can hold a little more
    than 20 trampolines.

    While one page is more than enough for the tcp-cc usecase,
    the sched_ext use case shows that one page is not always enough and hits
    the one page limit. This patch overcomes the one page limit by allocating
    another page when needed and it is limited to a total of
    MAX_IMAGE_PAGES (8) pages which is more than enough for
    reasonable usages.

    The variable st_map->image has been changed to st_map->image_pages, and
    its type has been changed to an array of pointers to pages.

    Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com>
    Link: https://lore.kernel.org/r/20240224223418.526631-3-thinker.li@gmail.com
    Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>

Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
2024-10-15 10:49:12 +02:00
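For the commit above (ef2edd603f), the "allocate another page when needed, up to MAX_IMAGE_PAGES" policy can be modeled with simple bookkeeping. This is a sketch under assumptions: no memory is actually allocated, the demo_* names are hypothetical, and the 4096-byte page size is chosen for illustration.

```c
#include <stddef.h>

#define DEMO_PAGE_SIZE 4096u
#define DEMO_MAX_IMAGE_PAGES 8u /* mirrors MAX_IMAGE_PAGES (8) */

/* Bookkeeping only; a real implementation would hold page pointers. */
struct demo_image {
	unsigned int pages_cnt; /* pages in use so far */
	unsigned int used;      /* bytes used in the last page */
};

/*
 * Reserve size bytes for one trampoline. Returns the index of the page
 * that holds it, or -1 if the trampoline cannot fit in a page or the
 * page limit has been reached.
 */
int demo_reserve_trampoline(struct demo_image *img, unsigned int size)
{
	if (size == 0 || size > DEMO_PAGE_SIZE)
		return -1;

	/* Start a new page when there is none yet or the current one is full. */
	if (img->pages_cnt == 0 || img->used + size > DEMO_PAGE_SIZE) {
		if (img->pages_cnt == DEMO_MAX_IMAGE_PAGES)
			return -1;
		img->pages_cnt++;
		img->used = 0;
	}
	img->used += size;
	return (int)img->pages_cnt - 1;
}
```

With a hypothetical trampoline size of 180 bytes, 22 trampolines land in page 0 and the 23rd moves to page 1, in line with the "a little more than 20 trampolines" per page mentioned in the commit message.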
Jerome Marchand cc4a11f1de bpf, net: validate struct_ops when updating value.
JIRA: https://issues.redhat.com/browse/RHEL-23649

commit 73e4f9e615d7b99f39663d4722dc73e8fa5db5f9
Author: Kui-Feng Lee <thinker.li@gmail.com>
Date:   Sat Feb 24 14:34:16 2024 -0800

    bpf, net: validate struct_ops when updating value.

    Perform all validations when updating values of struct_ops maps. Doing
    validation in st_ops->reg() and st_ops->update() is not necessary anymore.
    However, tcp_register_congestion_control() has been called in various
    places. It still needs to do validations.

    Cc: netdev@vger.kernel.org
    Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com>
    Link: https://lore.kernel.org/r/20240224223418.526631-2-thinker.li@gmail.com
    Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>

Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
2024-10-15 10:49:12 +02:00
Jerome Marchand 44a12c946e bpf: Check cfi_stubs before registering a struct_ops type.
JIRA: https://issues.redhat.com/browse/RHEL-23649

commit 3e0008336ae3153fb89b1a15bb877ddd38680fe6
Author: Kui-Feng Lee <thinker.li@gmail.com>
Date:   Wed Feb 21 18:11:04 2024 -0800

    bpf: Check cfi_stubs before registering a struct_ops type.

    Recently, st_ops->cfi_stubs was introduced. However, the upcoming new
    struct_ops support (e.g. sched_ext) is not aware of this and does not
    provide its own cfi_stubs. The kernel ends up NULL dereferencing the
    st_ops->cfi_stubs.

    Considering struct_ops supports kernel modules now, this NULL check
    is necessary. This patch is to reject struct_ops registration
    that does not provide a cfi_stubs.

    Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com>
    Link: https://lore.kernel.org/r/20240222021105.1180475-2-thinker.li@gmail.com
    Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>

Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
2024-10-15 10:49:11 +02:00
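For the commit above (44a12c946e), the registration-time check amounts to refusing a type whose stub table is missing. The sketch below is a userspace model: struct demo_struct_ops and demo_register_struct_ops() are hypothetical, and the -EINVAL return value is an assumption, since the commit message does not state the error code.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical, reduced stand-in for struct bpf_struct_ops. */
struct demo_struct_ops {
	const char *name;
	const void *cfi_stubs;
};

/*
 * Reject a struct_ops type that provides no cfi_stubs, so that later code
 * never dereferences a NULL stub table.
 */
int demo_register_struct_ops(const struct demo_struct_ops *st_ops)
{
	if (!st_ops->cfi_stubs)
		return -EINVAL; /* assumed error code */
	return 0;
}
```

Checking once at registration keeps every later user of cfi_stubs free of NULL checks.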
Jerome Marchand e56e6923fd bpf: Create argument information for nullable arguments.
JIRA: https://issues.redhat.com/browse/RHEL-23649

commit 1611603537a4b88cec7993f32b70c03113801a46
Author: Kui-Feng Lee <thinker.li@gmail.com>
Date:   Thu Feb 8 18:37:49 2024 -0800

    bpf: Create argument information for nullable arguments.

    Collect argument information from the type information of stub functions to
    mark arguments of BPF struct_ops programs with PTR_MAYBE_NULL if they are
    nullable.  A nullable argument is annotated by suffixing "__nullable" at
    the argument name of stub function.

    For nullable arguments, this patch sets a struct bpf_ctx_arg_aux to label
    their reg_type with PTR_TO_BTF_ID | PTR_TRUSTED | PTR_MAYBE_NULL. This
    makes the verifier check programs and ensure that they properly check
    the pointer. The programs should check if the pointer is null before
    accessing the pointed memory.

    The implementer of a struct_ops type should annotate the arguments that can
    be null. The implementer should define a stub function (empty) as a
    placeholder for each defined operator. The name of a stub function should
    be in the pattern "<st_op_type>__<operator name>". For example, for
    test_maybe_null of struct bpf_testmod_ops, its stub function name should
    be "bpf_testmod_ops__test_maybe_null". You mark an argument nullable by
    suffixing the argument name with "__nullable" at the stub function.

    Since we already have stub functions for kCFI, we just reuse these stub
    functions with the naming convention mentioned earlier. Stub functions
    following the naming convention are only required if there are nullable
    arguments to annotate. For functions having no nullable arguments, stub
    functions are not necessary for the purpose of this patch.

    This patch will prepare a list of struct bpf_ctx_arg_aux, aka arg_info, for
    each member field of a struct_ops type.  "arg_info" will be assigned to
    "prog->aux->ctx_arg_info" of BPF struct_ops programs in
    check_struct_ops_btf_id() so that it can be used by btf_ctx_access() later
    to set reg_type properly for the verifier.

    Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com>
    Link: https://lore.kernel.org/r/20240209023750.1153905-4-thinker.li@gmail.com
    Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>

Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
2024-10-15 10:49:09 +02:00
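For the commit above (e56e6923fd), the two naming conventions can be illustrated with plain string handling. This is a sketch only: the kernel derives this information from BTF, and demo_stub_name() and demo_arg_is_nullable() are hypothetical helpers that merely show the patterns "<st_op_type>__<operator name>" and the "__nullable" suffix.

```c
#include <stdio.h>
#include <string.h>

#define DEMO_NULLABLE_SUFFIX "__nullable"

/*
 * Build the stub function name "<st_op_type>__<operator name>" into buf.
 * Returns 0 on success, -1 if buf is too small.
 */
int demo_stub_name(char *buf, size_t len, const char *st_op_type,
		   const char *op_name)
{
	int n = snprintf(buf, len, "%s__%s", st_op_type, op_name);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Return 1 if the argument name ends with "__nullable", else 0. */
int demo_arg_is_nullable(const char *arg_name)
{
	size_t len = strlen(arg_name);
	size_t slen = strlen(DEMO_NULLABLE_SUFFIX);

	if (len < slen)
		return 0;
	return strcmp(arg_name + len - slen, DEMO_NULLABLE_SUFFIX) == 0;
}
```

For the example in the commit message, demo_stub_name() with "bpf_testmod_ops" and "test_maybe_null" yields "bpf_testmod_ops__test_maybe_null", and an argument named task__nullable (a hypothetical name) is classified as nullable.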
Jerome Marchand 89bb25af1f bpf: Remove an unnecessary check.
JIRA: https://issues.redhat.com/browse/RHEL-23649

commit df9705eaa0bad034dad0f73386ff82f5c4dd7e24
Author: Kui-Feng Lee <thinker.li@gmail.com>
Date:   Fri Feb 2 21:51:19 2024 -0800

    bpf: Remove an unnecessary check.

    The "i" here is always equal to "btf_type_vlen(t)" since
    the "for_each_member()" loop never breaks.

    Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com>
    Acked-by: Yonghong Song <yonghong.song@linux.dev>
    Link: https://lore.kernel.org/r/20240203055119.2235598-1-thinker.li@gmail.com
    Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>

Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
2024-10-15 10:49:08 +02:00
Jerome Marchand a213f9e654 bpf: Fix error checks against bpf_get_btf_vmlinux().
JIRA: https://issues.redhat.com/browse/RHEL-23649

commit e6be8cd5d3cf54ccd0ae66027d6f4697b15f4c3e
Author: Kui-Feng Lee <thinker.li@gmail.com>
Date:   Thu Jan 25 18:31:13 2024 -0800

    bpf: Fix error checks against bpf_get_btf_vmlinux().

    In bpf_struct_ops_map_alloc, it needs to check for NULL in the returned
    pointer of bpf_get_btf_vmlinux() when CONFIG_DEBUG_INFO_BTF is not set.
    ENOTSUPP is used to preserve the same behavior before the
    struct_ops kmod support.

    In the function check_struct_ops_btf_id(), instead of redoing the
    bpf_get_btf_vmlinux() that has already been done in syscall.c, the fix
    here is to check for prog->aux->attach_btf_id.
    BPF_PROG_TYPE_STRUCT_OPS must require attach_btf_id and syscall.c
    guarantees a valid attach_btf as long as attach_btf_id is set.
    When attach_btf_id is not set, this patch returns -ENOTSUPP
    because it is what the selftest in test_libbpf_probe_prog_types()
    and libbpf_probes.c are expecting for feature probing purpose.

    Changes from v1:

     - Remove an unnecessary NULL check in check_struct_ops_btf_id()

    Reported-by: syzbot+88f0aafe5f950d7489d7@syzkaller.appspotmail.com
    Closes: https://lore.kernel.org/bpf/00000000000040d68a060fc8db8c@google.com/
    Reported-by: syzbot+1336f3d4b10bcda75b89@syzkaller.appspotmail.com
    Closes: https://lore.kernel.org/bpf/00000000000026353b060fc21c07@google.com/
    Fixes: fcc2c1fb0651 ("bpf: pass attached BTF to the bpf_struct_ops subsystem")
    Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com>
    Link: https://lore.kernel.org/r/20240126023113.1379504-1-thinker.li@gmail.com
    Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>

Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
2024-10-15 10:49:05 +02:00
Jerome Marchand be602d71dd bpf, net: switch to dynamic registration
JIRA: https://issues.redhat.com/browse/RHEL-23649

commit f6be98d19985411ca1f3d53413d94d5b7f41c200
Author: Kui-Feng Lee <thinker.li@gmail.com>
Date:   Fri Jan 19 14:50:02 2024 -0800

    bpf, net: switch to dynamic registration

    Replace the static list of struct_ops types with per-btf struct_ops_tab to
    enable dynamic registration.

    Both bpf_dummy_ops and bpf_tcp_ca now utilize the registration function
    instead of being listed in bpf_struct_ops_types.h.

    Cc: netdev@vger.kernel.org
    Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com>
    Link: https://lore.kernel.org/r/20240119225005.668602-12-thinker.li@gmail.com
    Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>

Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
2024-10-15 10:49:02 +02:00
Jerome Marchand b104df0abc bpf: validate value_type
JIRA: https://issues.redhat.com/browse/RHEL-23649

commit 612d087d4ba54cef47946e22e5dabad762dd7ed5
Author: Kui-Feng Lee <thinker.li@gmail.com>
Date:   Fri Jan 19 14:50:01 2024 -0800

    bpf: validate value_type

    A value_type should consist of three components: refcnt, state, and data.
    refcnt and state have been moved to struct bpf_struct_ops_common_value to
    make it easier to check the value type.

    Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com>
    Link: https://lore.kernel.org/r/20240119225005.668602-11-thinker.li@gmail.com
    Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>

Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
2024-10-15 10:49:02 +02:00
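For the commit above (b104df0abc), grouping refcnt and state into one common struct reduces the layout check to "common header first, ops data second". The sketch below models that check over a hand-written type description; struct demo_type, struct demo_member and demo_is_valid_value_type() are hypothetical, and the exact rule (two members, compared by type name) is an assumption, since the kernel works on BTF type ids.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, reduced description of a struct type and its members. */
struct demo_member {
	const char *type_name;
};

struct demo_type {
	const char *name;
	size_t nr_members;
	const struct demo_member *members;
};

/*
 * A value type is accepted when it has exactly two members: the common
 * header (refcnt and state) followed by the struct_ops data itself.
 */
int demo_is_valid_value_type(const struct demo_type *vt, const char *ops_type)
{
	if (vt->nr_members != 2)
		return 0;
	if (strcmp(vt->members[0].type_name, "bpf_struct_ops_common_value") != 0)
		return 0;
	return strcmp(vt->members[1].type_name, ops_type) == 0;
}
```

A value type that still lists refcnt and state as separate members fails the member count test in this model, which shows why moving them into one struct makes the check simpler.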
Jerome Marchand 5e10d4fff9 bpf: hold module refcnt in bpf_struct_ops map creation and prog verification.
JIRA: https://issues.redhat.com/browse/RHEL-23649

commit e3f87fdfed7b770dd7066b02262b12747881e76d
Author: Kui-Feng Lee <thinker.li@gmail.com>
Date:   Fri Jan 19 14:50:00 2024 -0800

    bpf: hold module refcnt in bpf_struct_ops map creation and prog verification.

    Hold a module refcnt to ensure that a module remains accessible whenever
    a struct_ops object of a struct_ops type provided by the module is still
    in use.

    struct bpf_struct_ops_map doesn't hold a refcnt to btf anymore since a
    module will hold a refcnt to its btf already. But, struct_ops programs are
    different. They hold their associated btf, not the module since they need
    only btf to assure their types (signatures).

    However, verifier holds the refcnt of the associated module of a struct_ops
    type temporarily when verifying a struct_ops prog. Verifier needs the help
    from the verifier operators (struct bpf_verifier_ops) provided by the owner
    module to verify data access of a prog, provide information, and generate
    code.

    This patch also adds a count of links (links_cnt) to bpf_struct_ops_map. It
    prevents bpf_struct_ops_map_put_progs() from accessing btf after calling
    module_put() in bpf_struct_ops_map_free().

    Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com>
    Link: https://lore.kernel.org/r/20240119225005.668602-10-thinker.li@gmail.com
    Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>

Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
2024-10-15 10:49:02 +02:00
Jerome Marchand c7472ae0b9 bpf: pass attached BTF to the bpf_struct_ops subsystem
JIRA: https://issues.redhat.com/browse/RHEL-23649

commit fcc2c1fb0651477c8ed78a3a293c175ccd70697a
Author: Kui-Feng Lee <thinker.li@gmail.com>
Date:   Fri Jan 19 14:49:59 2024 -0800

    bpf: pass attached BTF to the bpf_struct_ops subsystem

    Pass the fd of a btf from the userspace to the bpf() syscall, and then
    convert the fd into a btf. The btf is generated from the module that
    defines the target BPF struct_ops type.

    In order to inform the kernel about the module that defines the target
    struct_ops type, the userspace program needs to provide a btf fd for the
    respective module's btf. This btf contains essential information on the
    types defined within the module, including the target struct_ops type.

    A btf fd must be provided to the kernel for struct_ops maps and for the bpf
    programs attached to those maps.

    In the case of the bpf programs, the attach_btf_obj_fd parameter is passed
    as part of the bpf_attr and is converted into a btf. This btf is then
    stored in the prog->aux->attach_btf field. Here, it just lets the verifier
    access attach_btf directly.

    In the case of struct_ops maps, a btf fd is passed as value_type_btf_obj_fd
    of bpf_attr. The bpf_struct_ops_map_alloc() function converts the fd to a
    btf and stores it as st_map->btf. A flag BPF_F_VTYPE_BTF_OBJ_FD is added
    for map_flags to indicate that the value of value_type_btf_obj_fd is set.

    Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com>
    Link: https://lore.kernel.org/r/20240119225005.668602-9-thinker.li@gmail.com
    Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>

Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
2024-10-15 10:49:02 +02:00
Jerome Marchand 9817e1ed0c bpf: lookup struct_ops types from a given module BTF.
JIRA: https://issues.redhat.com/browse/RHEL-23649

commit 689423db3bda2244c24db8a64de4cdb37be1de41
Author: Kui-Feng Lee <thinker.li@gmail.com>
Date:   Fri Jan 19 14:49:58 2024 -0800

    bpf: lookup struct_ops types from a given module BTF.

    This is a preparation for searching for struct_ops types from a specified
    module. BTF is always btf_vmlinux now. This patch passes a pointer of BTF
    to bpf_struct_ops_find_value() and bpf_struct_ops_find(). Once the new
    registration API of struct_ops types is used, other BTFs besides
    btf_vmlinux can also be passed to them.

    Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com>
    Link: https://lore.kernel.org/r/20240119225005.668602-8-thinker.li@gmail.com
    Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>

Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
2024-10-15 10:49:01 +02:00
Jerome Marchand d160999ffe bpf: pass btf object id in bpf_map_info.
JIRA: https://issues.redhat.com/browse/RHEL-23649

commit 1338b93346587a2a6ac79bbcf55ef5b357745573
Author: Kui-Feng Lee <thinker.li@gmail.com>
Date:   Fri Jan 19 14:49:57 2024 -0800

    bpf: pass btf object id in bpf_map_info.

    Include btf object id (btf_obj_id) in bpf_map_info so that tools (e.g.
    bpftool struct_ops dump) know the correct btf from the kernel to look up
    type information of struct_ops types.

    Since struct_ops types can be defined and registered in a module, the
    type information of a struct_ops type is defined in the btf of the
    module defining it.  The userspace tools need to know which btf is for
    the module defining a struct_ops type.

    Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com>
    Link: https://lore.kernel.org/r/20240119225005.668602-7-thinker.li@gmail.com
    Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>

Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
2024-10-15 10:49:01 +02:00
Jerome Marchand fd202cd3e1 bpf: make struct_ops_map support btfs other than btf_vmlinux.
JIRA: https://issues.redhat.com/browse/RHEL-23649

commit 47f4f657acd5d04c78c5c5ac7022cba9ce3b4a7d
Author: Kui-Feng Lee <thinker.li@gmail.com>
Date:   Fri Jan 19 14:49:56 2024 -0800

    bpf: make struct_ops_map support btfs other than btf_vmlinux.

    Once new struct_ops can be registered from modules, btf_vmlinux is no
    longer the only btf that struct_ops_map would face.  st_map should remember
    what btf it should use to get type information.

    Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com>
    Link: https://lore.kernel.org/r/20240119225005.668602-6-thinker.li@gmail.com
    Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>

Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
2024-10-15 10:49:01 +02:00
Jerome Marchand 1df8e22024 bpf, net: introduce bpf_struct_ops_desc.
JIRA: https://issues.redhat.com/browse/RHEL-23649

commit 4c5763ed996a61b51d721d0968d0df957826ea49
Author: Kui-Feng Lee <thinker.li@gmail.com>
Date:   Fri Jan 19 14:49:54 2024 -0800

    bpf, net: introduce bpf_struct_ops_desc.

    Move some members of bpf_struct_ops to bpf_struct_ops_desc.  type_id is
    no longer available in bpf_struct_ops. Modules should get it from the btf
    received by kmod's init function.

    Cc: netdev@vger.kernel.org
    Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com>
    Link: https://lore.kernel.org/r/20240119225005.668602-4-thinker.li@gmail.com
    Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>

Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
2024-10-15 10:49:01 +02:00
Jerome Marchand 793ced50da bpf: get type information with BTF_ID_LIST
JIRA: https://issues.redhat.com/browse/RHEL-23649

commit 95678395386d45fa0a075d2e7a6866326a469d76
Author: Kui-Feng Lee <thinker.li@gmail.com>
Date:   Fri Jan 19 14:49:53 2024 -0800

    bpf: get type information with BTF_ID_LIST

    Get ready to remove bpf_struct_ops_init() in the future. By using
    BTF_ID_LIST, it is possible to gather type information at build time
    instead of at runtime.

    Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com>
    Link: https://lore.kernel.org/r/20240119225005.668602-3-thinker.li@gmail.com
    Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>

Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
2024-10-15 10:49:01 +02:00
Jerome Marchand 538837b742 bpf: refactory struct_ops type initialization to a function.
JIRA: https://issues.redhat.com/browse/RHEL-23649

commit 3b1f89e747cd4b24244f2798a35d28815b744303
Author: Kui-Feng Lee <thinker.li@gmail.com>
Date:   Fri Jan 19 14:49:52 2024 -0800

    bpf: refactory struct_ops type initialization to a function.

    Move the majority of the code to bpf_struct_ops_init_one(), which can then
    be utilized for the initialization of newly registered dynamically
    allocated struct_ops types in the following patches.

    Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com>
    Link: https://lore.kernel.org/r/20240119225005.668602-2-thinker.li@gmail.com
    Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>

Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
2024-10-15 10:49:01 +02:00
Viktor Malik dfc3259578 x86/cfi,bpf: Fix bpf_struct_ops CFI
JIRA: https://issues.redhat.com/browse/RHEL-23644

Omitted-fix: 1732ebc4a261 ("riscv, bpf: Fix unpredictable kernel crash about RV64 struct_ops")
             Unsupported arch.

commit 2cd3e3772e41377f32d6eea643e0590774e9187c
Author: Peter Zijlstra <peterz@infradead.org>
Date:   Fri Dec 15 10:12:20 2023 +0100

    x86/cfi,bpf: Fix bpf_struct_ops CFI

    BPF struct_ops uses __arch_prepare_bpf_trampoline() to write
    trampolines for indirect function calls. These trampolines must have
    matching CFI.

    In order to obtain the correct CFI hash for the various methods, add a
    matching structure that contains stub functions; the compiler will
    generate correct CFI which we can pilfer for the trampolines.

    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Link: https://lore.kernel.org/r/20231215092707.566977112@infradead.org
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Signed-off-by: Viktor Malik <vmalik@redhat.com>
2024-06-25 10:52:31 +02:00
Viktor Malik 38d0ff9a49 bpf: Use arch_bpf_trampoline_size
JIRA: https://issues.redhat.com/browse/RHEL-23644

Conflicts: omitting bits from arch/riscv/net/bpf_jit_comp64.c as RISC-V
           is unsupported

commit 26ef208c209a0e6eed8942a5d191b39dccfa6e38
Author: Song Liu <song@kernel.org>
Date:   Wed Dec 6 14:40:53 2023 -0800

    bpf: Use arch_bpf_trampoline_size

    Instead of blindly allocating PAGE_SIZE for each trampoline, check the size
    of the trampoline with arch_bpf_trampoline_size(). This size is saved in
    bpf_tramp_image->size, and used for modmem charge/uncharge. The fallback
    arch_alloc_bpf_trampoline() still allocates a whole page because we need to
    use set_memory_* to protect the memory.

    struct_ops trampoline still uses a whole page for multiple trampolines.

    With this size check at caller (regular trampoline and struct_ops
    trampoline), remove arch_bpf_trampoline_size() from
    arch_prepare_bpf_trampoline() in archs.

    Also, update bpf_image_ksym_add() to handle symbols of different sizes.

    Signed-off-by: Song Liu <song@kernel.org>
    Acked-by: Ilya Leoshkevich <iii@linux.ibm.com>
    Tested-by: Ilya Leoshkevich <iii@linux.ibm.com>  # on s390x
    Acked-by: Jiri Olsa <jolsa@kernel.org>
    Acked-by: Björn Töpel <bjorn@rivosinc.com>
    Tested-by: Björn Töpel <bjorn@rivosinc.com> # on riscv
    Link: https://lore.kernel.org/r/20231206224054.492250-7-song@kernel.org
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Signed-off-by: Viktor Malik <vmalik@redhat.com>
2024-06-25 10:52:14 +02:00
Viktor Malik dd807f7b47 bpf: Add helpers for trampoline image management
JIRA: https://issues.redhat.com/browse/RHEL-23644

Conflicts: replaces code containing a conflict introduced by
           1206412454 ("bpf: Create links for BPF struct_ops maps.")
           due to then-missing upstream commit d48567c9a0d1 ("mm:
           Introduce set_memory_rox()").

commit 82583daa2efc2e336962b231a46bad03a280b3e0
Author: Song Liu <song@kernel.org>
Date:   Wed Dec 6 14:40:50 2023 -0800

    bpf: Add helpers for trampoline image management

    As BPF trampoline of different archs moves from bpf_jit_[alloc|free]_exec()
    to bpf_prog_pack_[alloc|free](), we need to use different _alloc, _free for
    different archs during the transition. Add the following helpers for this
    transition:

    void *arch_alloc_bpf_trampoline(unsigned int size);
    void arch_free_bpf_trampoline(void *image, unsigned int size);
    void arch_protect_bpf_trampoline(void *image, unsigned int size);
    void arch_unprotect_bpf_trampoline(void *image, unsigned int size);

    The fallback versions of these helpers require size <= PAGE_SIZE, but they
    are only called with size == PAGE_SIZE. They will be called with size <
    PAGE_SIZE when arch_bpf_trampoline_size() helper is introduced later.

    Signed-off-by: Song Liu <song@kernel.org>
    Acked-by: Ilya Leoshkevich <iii@linux.ibm.com>
    Tested-by: Ilya Leoshkevich <iii@linux.ibm.com>  # on s390x
    Acked-by: Jiri Olsa <jolsa@kernel.org>
    Link: https://lore.kernel.org/r/20231206224054.492250-4-song@kernel.org
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Signed-off-by: Viktor Malik <vmalik@redhat.com>
2024-06-25 10:52:14 +02:00
Artem Savkov bee7cc96a3 bpf: Charge modmem for struct_ops trampoline
JIRA: https://issues.redhat.com/browse/RHEL-23643

commit 5c04433daf9ed8b28d4900112be1fd19e1786b25
Author: Song Liu <song@kernel.org>
Date:   Thu Sep 14 15:25:42 2023 -0700

    bpf: Charge modmem for struct_ops trampoline
    
    Current code charges modmem for regular trampoline, but not for struct_ops
    trampoline. Add bpf_jit_[charge|uncharge]_modmem() to struct_ops so the
    trampoline is charged in both cases.
    
    Signed-off-by: Song Liu <song@kernel.org>
    Link: https://lore.kernel.org/r/20230914222542.2986059-1-song@kernel.org
    Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>

Signed-off-by: Artem Savkov <asavkov@redhat.com>
2024-03-27 10:27:47 +01:00
Prarit Bhargava b79789281d mm: Introduce set_memory_rox()
JIRA: https://issues.redhat.com/browse/RHEL-25415

Conflicts: Minor drift issues, and not worried about unsupported arches.
Changes to arch/arm/mach-omap[12] are made in arch/arm/plat-omap which
is unified in RHEL9.

commit d48567c9a0d1e605639f8a8705a61bbb55fb4e84
Author: Peter Zijlstra <peterz@infradead.org>
Date:   Wed Oct 26 12:13:03 2022 +0200

    mm: Introduce set_memory_rox()

    Because endlessly repeating:

            set_memory_ro()
            set_memory_x()

    is getting tedious.
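A minimal userspace sketch of the idea, assuming the combined helper simply applies the read-only step and then the executable step, stopping at the first error. The `fake_` names and `struct fake_region` are hypothetical stand-ins so the example compiles outside the kernel; they are not the kernel's `set_memory_*()` signatures.

```c
#include <stdbool.h>

/* Hypothetical model of a memory region's protection bits. */
struct fake_region {
	bool writable;
	bool executable;
};

/* Stand-in for set_memory_ro(): drop write permission. */
static int fake_set_memory_ro(struct fake_region *r)
{
	r->writable = false;
	return 0;
}

/* Stand-in for set_memory_x(): grant execute permission. */
static int fake_set_memory_x(struct fake_region *r)
{
	r->executable = true;
	return 0;
}

/* Combined helper: read-only first, then executable, stop on error. */
static int fake_set_memory_rox(struct fake_region *r)
{
	int ret = fake_set_memory_ro(r);

	if (ret)
		return ret;
	return fake_set_memory_x(r);
}
```

A caller that previously issued the two calls back to back makes a single call instead and checks one return value.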

    Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Link: https://lkml.kernel.org/r/Y1jek64pXOsougmz@hirez.programming.kicks-ass.net

Signed-off-by: Prarit Bhargava <prarit@redhat.com>
2024-03-20 09:42:51 -04:00
Jerome Marchand 31d8021979 bpf: Support default .validate() and .update() behavior for struct_ops links
JIRA: https://issues.redhat.com/browse/RHEL-10691

Conflicts: Context change from missing commit d48567c9a0d1 ("mm:
Introduce set_memory_rox()")

commit 8ba651ed7fa1641f7c4941b79f2e3dd4ddb58aec
Author: David Vernet <void@manifault.com>
Date:   Mon Aug 14 13:59:07 2023 -0500

    bpf: Support default .validate() and .update() behavior for struct_ops links

    Currently, if a struct_ops map is loaded with BPF_F_LINK, it must also
    define the .validate() and .update() callbacks in its corresponding
    struct bpf_struct_ops in the kernel. Enabling struct_ops link is useful
    in its own right to ensure that the map is unloaded if an application
    crashes. For example, with sched_ext, we want to automatically unload
    the host-wide scheduler if the application crashes. We would likely
    never support updating elements of a sched_ext struct_ops map, so we'd
    have to implement these callbacks showing that they _can't_ support
    element updates just to benefit from the basic lifetime management of
    struct_ops links.

    Let's enable struct_ops maps to work with BPF_F_LINK even if they
    haven't defined these callbacks, by assuming that a struct_ops map
    element cannot be updated by default.

    Acked-by: Kui-Feng Lee <thinker.li@gmail.com>
    Signed-off-by: David Vernet <void@manifault.com>
    Link: https://lore.kernel.org/r/20230814185908.700553-2-void@manifault.com
    Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>

Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
2023-12-15 09:28:57 +01:00
Jerome Marchand 671f285c4f bpf: bpf_struct_ops: Remove unnecessary initial values of variables
JIRA: https://issues.redhat.com/browse/RHEL-10691

commit 5964d1e4594eb1dbfc1e2a34ec89eb48f6b03e75
Author: Li kunyu <kunyu@nfschina.com>
Date:   Sat Aug 5 01:59:29 2023 +0800

    bpf: bpf_struct_ops: Remove unnecessary initial values of variables

    err and tlinks are assigned before they are used, so they do not need
    initial values.

    Signed-off-by: Li kunyu <kunyu@nfschina.com>
    Reviewed-by: Simon Horman <horms@kernel.org>
    Link: https://lore.kernel.org/r/20230804175929.2867-1-kunyu@nfschina.com
    Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>

Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
2023-12-15 09:28:55 +01:00
Viktor Malik daad46d419 bpf: Centralize permissions checks for all BPF map types
JIRA: https://issues.redhat.com/browse/RHEL-9957

commit 6c3eba1c5e283fd2bb1c076dbfcb47f569c3bfde
Author: Andrii Nakryiko <andrii@kernel.org>
Date:   Tue Jun 13 15:35:32 2023 -0700

    bpf: Centralize permissions checks for all BPF map types
    
    This allows making more centralized decisions later on, and generally
    makes it very explicit which maps are privileged and which are not
    (e.g., LRU_HASH and LRU_PERCPU_HASH, which are privileged HASH variants,
    as opposed to unprivileged HASH and HASH_PERCPU; now this is explicit
    and easy to verify).
    
    Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
    Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    Acked-by: Stanislav Fomichev <sdf@google.com>
    Link: https://lore.kernel.org/bpf/20230613223533.3689589-4-andrii@kernel.org

Signed-off-by: Viktor Malik <vmalik@redhat.com>
2023-10-26 17:06:20 +02:00
Artem Savkov 54175d4877 bpf: Check IS_ERR for the bpf_map_get() return value
Bugzilla: https://bugzilla.redhat.com/2221599

commit 55fbae05476df65e5eee8be54f61d0257af0240b
Author: Martin KaFai Lau <martin.lau@kernel.org>
Date:   Fri Mar 24 11:42:41 2023 -0700

    bpf: Check IS_ERR for the bpf_map_get() return value
    
    This patch fixes a mistake in checking NULL instead of
    checking IS_ERR for the bpf_map_get() return value.
    
    It also fixes the return value in link_update_map() from -EINVAL
    to PTR_ERR(*_map).
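The difference between the two checks can be illustrated with a small userspace sketch. It assumes an error-pointer encoding like the kernel's `ERR_PTR()`/`IS_ERR()`/`PTR_ERR()`; the lowercase helpers, `lookup_map()`, and the choice of `-EBADF` are hypothetical and only model the pattern.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Userspace models of ERR_PTR(), PTR_ERR() and IS_ERR(). */
static inline void *err_ptr(long error)
{
	return (void *)(intptr_t)error;
}

static inline long ptr_err(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

static inline int is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical lookup: fails with an error pointer, never with NULL. */
static void *lookup_map(int fd)
{
	static int dummy_map;

	return fd == 3 ? (void *)&dummy_map : err_ptr(-EBADF);
}

/* Buggy pattern: the NULL test never fires, so the error goes unnoticed. */
static long use_map_buggy(int fd)
{
	void *map = lookup_map(fd);

	if (!map)
		return -EINVAL;
	return 0; /* real code would now dereference an error pointer */
}

/* Fixed pattern: test with is_err() and propagate the encoded errno. */
static long use_map_fixed(int fd)
{
	void *map = lookup_map(fd);

	if (is_err(map))
		return ptr_err(map);
	return 0;
}
```

With an invalid descriptor, the buggy variant reports success while the fixed variant returns the errno carried in the pointer.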
    
    Reported-by: syzbot+71ccc0fe37abb458406b@syzkaller.appspotmail.com
    Fixes: 68b04864ca42 ("bpf: Create links for BPF struct_ops maps.")
    Fixes: aef56f2e918b ("bpf: Update the struct_ops of a bpf_link.")
    Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
    Acked-by: Kui-Feng Lee <kuifeng@meta.com>
    Acked-by: Stanislav Fomichev <sdf@google.com>
    Link: https://lore.kernel.org/r/20230324184241.1387437-1-martin.lau@linux.dev
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Signed-off-by: Artem Savkov <asavkov@redhat.com>
2023-09-22 09:12:19 +02:00
Artem Savkov 84a3b67362 bpf: Update the struct_ops of a bpf_link.
Bugzilla: https://bugzilla.redhat.com/2221599

commit aef56f2e918bf8fc8de25f0b36e8c2aba44116ec
Author: Kui-Feng Lee <kuifeng@meta.com>
Date:   Wed Mar 22 20:24:02 2023 -0700

    bpf: Update the struct_ops of a bpf_link.
    
    By improving the BPF_LINK_UPDATE command of bpf(), it should allow you
    to conveniently switch between different struct_ops on a single
    bpf_link. This would enable smoother transitions from one struct_ops
    to another.
    
    The struct_ops maps passing along with BPF_LINK_UPDATE should have the
    BPF_F_LINK flag.
    
    Signed-off-by: Kui-Feng Lee <kuifeng@meta.com>
    Acked-by: Andrii Nakryiko <andrii@kernel.org>
    Link: https://lore.kernel.org/r/20230323032405.3735486-6-kuifeng@meta.com
    Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>

Signed-off-by: Artem Savkov <asavkov@redhat.com>
2023-09-22 09:12:19 +02:00
Artem Savkov 1206412454 bpf: Create links for BPF struct_ops maps.
Bugzilla: https://bugzilla.redhat.com/2221599

Conflicts: missing d48567c9a0d1 mm: Introduce set_memory_rox()

commit 68b04864ca425d1894c96b8141d4fba1181f11cb
Author: Kui-Feng Lee <kuifeng@meta.com>
Date:   Wed Mar 22 20:24:00 2023 -0700

    bpf: Create links for BPF struct_ops maps.

    Make bpf_link support struct_ops.  Previously, struct_ops were always
    used alone without any associated links. Upon updating its value, a
    struct_ops would be activated automatically. Yet other BPF program
    types required to make a bpf_link with their instances before they
    could become active. Now, however, you can create an inactive
    struct_ops, and create a link to activate it later.

    With bpf_links, struct_ops has a behavior similar to other BPF program
    types. You can pin/unpin them from their links and the struct_ops will
    be deactivated when its link is removed, whereas previously someone had
    to delete the value for it to be deactivated.

    bpf_links are responsible for registering their associated
    struct_ops. You can only use a struct_ops that has the BPF_F_LINK flag
    set to create a bpf_link, while a struct_ops without this flag behaves in
    the same manner as before and is registered upon updating its value.

    The BPF_LINK_TYPE_STRUCT_OPS serves a dual purpose. Not only is it
    used to craft the links for BPF struct_ops programs, but also to
    create links for BPF struct_ops themselves.  Since the links of BPF
    struct_ops programs are only used to create trampolines internally,
    they are never seen in other contexts. Thus, they can be reused for
    struct_ops themselves.

    To maintain a reference to the map supporting this link, we add
    bpf_struct_ops_link as an additional type. The pointer of the map is
    RCU and won't be necessary until later in the patchset.

    Signed-off-by: Kui-Feng Lee <kuifeng@meta.com>
    Link: https://lore.kernel.org/r/20230323032405.3735486-4-kuifeng@meta.com
    Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>

Signed-off-by: Artem Savkov <asavkov@redhat.com>
2023-09-22 09:12:19 +02:00
Artem Savkov 67465a1fdb bpf: Retire the struct_ops map kvalue->refcnt.
Bugzilla: https://bugzilla.redhat.com/2221599

Conflicts: missing d48567c9a0d1 mm: Introduce set_memory_rox()

commit b671c2067a04c0668df174ff5dfdb573d1f9b074
Author: Kui-Feng Lee <kuifeng@meta.com>
Date:   Wed Mar 22 20:23:58 2023 -0700

    bpf: Retire the struct_ops map kvalue->refcnt.

    We have replaced kvalue->refcnt with synchronize_rcu() to wait for an
    RCU grace period.

    Maintenance of kvalue->refcnt was a complicated task, as we had to
    simultaneously keep track of two reference counts: one for the
    reference count of bpf_map. When the kvalue->refcnt reaches zero, we
    also have to reduce the reference count on bpf_map - yet these steps
    are not performed in an atomic manner and require us to be vigilant
    when managing them. By eliminating kvalue->refcnt, we can make our
    maintenance more straightforward as the refcount of bpf_map is now
    solely managed!

    To prevent the trampoline image of a struct_ops from being released
    while it is still in use, we wait for an RCU grace period. The
    setsockopt(TCP_CONGESTION, "...") command allows you to change your
    socket's congestion control algorithm and can result in releasing the
    old struct_ops implementation. That is fine. However, because this
    function is exposed through bpf_setsockopt(), it may be invoked by BPF
    programs as well. To ensure that the trampoline image belonging to a
    struct_ops can be safely called while its method is in use, the
    trampoline safeguards the BPF program with rcu_read_lock(). Doing so prevents any
    destruction of the associated images before returning from a
    trampoline and requires us to wait for an RCU grace period.

    Signed-off-by: Kui-Feng Lee <kuifeng@meta.com>
    Link: https://lore.kernel.org/r/20230323032405.3735486-2-kuifeng@meta.com
    Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>

Signed-off-by: Artem Savkov <asavkov@redhat.com>
2023-09-22 09:12:19 +02:00
Artem Savkov 128dd7c7f8 bpf: return long from bpf_map_ops funcs
Bugzilla: https://bugzilla.redhat.com/2221599

commit d7ba4cc900bf1eea2d8c807c6b1fc6bd61f41237
Author: JP Kobryn <inwardvessel@gmail.com>
Date:   Wed Mar 22 12:47:54 2023 -0700

    bpf: return long from bpf_map_ops funcs
    
    This patch changes the return types of bpf_map_ops functions to long, where
    previously int was returned. Using long allows for bpf programs to maintain
    the sign bit in the absence of sign extension during situations where
    inlined bpf helper funcs make calls to the bpf_map_ops funcs and a negative
    error is returned.
    
    The definitions of the helper funcs are generated from comments in the bpf
    uapi header at `include/uapi/linux/bpf.h`. The return type of these
    helpers was previously changed from int to long in commit bdb7b79b4c. For
    any case where one of the map helpers call the bpf_map_ops funcs that are
    still returning 32-bit int, a compiler might not include sign extension
    instructions to properly convert the 32-bit negative value to a 64-bit
    negative value.
    
    For example:
    bpf assembly excerpt of an inlined helper calling a kernel function and
    checking for a specific error:
    
    ; err = bpf_map_update_elem(&mymap, &key, &val, BPF_NOEXIST);
      ...
      46:	call   0xffffffffe103291c	; htab_map_update_elem
    ; if (err && err != -EEXIST) {
      4b:	cmp    $0xffffffffffffffef,%rax ; cmp -EEXIST,%rax
    
    kernel function assembly excerpt of return value from
    `htab_map_update_elem` returning 32-bit int:
    
    movl $0xffffffef, %r9d
    ...
    movl %r9d, %eax
    
    ...results in the comparison:
    cmp $0xffffffffffffffef, $0x00000000ffffffef
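The failed comparison above can be reproduced in plain C as a rough sketch. It assumes a caller that reads the full 64-bit register after the callee wrote only the low 32 bits, with the upper half zeroed as `movl` does on x86-64. The function names are hypothetical stand-ins for a map op before and after the change.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical map op that still returns a 32-bit int. */
static int map_update_int(void)
{
	return -EEXIST;
}

/* The same op after the change: the error is returned as a long. */
static long map_update_long(void)
{
	return -EEXIST;
}

/*
 * Model of what an inlined caller sees when it reads the whole 64-bit
 * register but only the low 32 bits carry the result (zero-extended).
 */
static int64_t reg_after_int_return(void)
{
	return (int64_t)(uint64_t)(uint32_t)map_update_int();
}

/* With a long return type the value is sign-extended to 64 bits. */
static int64_t reg_after_long_return(void)
{
	return (int64_t)map_update_long();
}
```

In this model the first value is a large positive number, so a 64-bit compare against -EEXIST does not match, while the second compares equal.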
    
    Fixes: bdb7b79b4c ("bpf: Switch most helper return values from 32-bit int to 64-bit long")
    Tested-by: Eduard Zingerman <eddyz87@gmail.com>
    Signed-off-by: JP Kobryn <inwardvessel@gmail.com>
    Link: https://lore.kernel.org/r/20230322194754.185781-3-inwardvessel@gmail.com
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Signed-off-by: Artem Savkov <asavkov@redhat.com>
2023-09-22 09:12:19 +02:00
Artem Savkov ab26182cb8 bpf: bpf_struct_ops memory usage
Bugzilla: https://bugzilla.redhat.com/2221599

commit f062226d8d59b521ddc946ad791048188a16722a
Author: Yafang Shao <laoar.shao@gmail.com>
Date:   Sun Mar 5 12:46:09 2023 +0000

    bpf: bpf_struct_ops memory usage
    
    A new helper is introduced to calculate bpf_struct_ops memory usage.
    
    The result is as follows:
    
    - before
    1: struct_ops  name count_map  flags 0x0
            key 4B  value 256B  max_entries 1  memlock 4096B
            btf_id 73
    
    - after
    1: struct_ops  name count_map  flags 0x0
            key 4B  value 256B  max_entries 1  memlock 5016B
            btf_id 73
    
    Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
    Link: https://lore.kernel.org/r/20230305124615.12358-13-laoar.shao@gmail.com
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Signed-off-by: Artem Savkov <asavkov@redhat.com>
2023-09-22 09:12:13 +02:00
Felix Maurer 9910a031ea bpf: Require only one of cong_avoid() and cong_control() from a TCP CC
Bugzilla: https://bugzilla.redhat.com/2137876

commit 9f0265e921dee14096943ee11f793fa076aa7a72
Author: Jörn-Thorben Hinz <jthinz@mailbox.tu-berlin.de>
Date:   Wed Jun 22 21:12:24 2022 +0200

    bpf: Require only one of cong_avoid() and cong_control() from a TCP CC
    
    Remove the check for required and optional functions in a struct
    tcp_congestion_ops from bpf_tcp_ca.c. Rely on
    tcp_register_congestion_control() to reject a BPF CC that does not
    implement all required functions, as it will do for a non-BPF CC.
    
    When a CC implements tcp_congestion_ops.cong_control(), the alternate
    cong_avoid() is not in use in the TCP stack. Previously, a BPF CC was
    still forced to implement cong_avoid() as a no-op since it was
    non-optional in bpf_tcp_ca.c.
    
    Signed-off-by: Jörn-Thorben Hinz <jthinz@mailbox.tu-berlin.de>
    Reviewed-by: Martin KaFai Lau <kafai@fb.com>
    Link: https://lore.kernel.org/r/20220622191227.898118-3-jthinz@mailbox.tu-berlin.de
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Signed-off-by: Felix Maurer <fmaurer@redhat.com>
2023-01-05 15:46:51 +01:00
Artem Savkov 31bc89bc24 bpf: Remove is_valid_bpf_tramp_flags()
Bugzilla: https://bugzilla.redhat.com/2137876

commit 535a57a7ffc04932ad83c1a5649b09ba6c93ce83
Author: Xu Kuohai <xukuohai@huawei.com>
Date:   Mon Jul 11 11:08:20 2022 -0400

    bpf: Remove is_valid_bpf_tramp_flags()
    
    Before generating bpf trampoline, x86 calls is_valid_bpf_tramp_flags()
    to check the input flags. This check is architecture independent.
    So, to be consistent with x86, arm64 should also do this check
    before generating bpf trampoline.
    
    However, the BPF_TRAMP_F_XXX flags are not used by user code and the
    flags argument is almost constant at compile time, so this run time
    check is a bit redundant.
    
    Remove is_valid_bpf_tramp_flags() and add some comments to the usage of
    BPF_TRAMP_F_XXX flags, as suggested by Alexei.
    
    Signed-off-by: Xu Kuohai <xukuohai@huawei.com>
    Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    Reviewed-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
    Acked-by: Song Liu <songliubraving@fb.com>
    Link: https://lore.kernel.org/bpf/20220711150823.2128542-2-xukuohai@huawei.com

Signed-off-by: Artem Savkov <asavkov@redhat.com>
2023-01-05 15:46:37 +01:00
Yauheni Kaliuta 503bec2387 bpf, x86: Generate trampolines from bpf_tramp_links
Bugzilla: https://bugzilla.redhat.com/2120968
Conflicts: already applied
  1d5f82d9dd47 ("bpf, x86: fix freeing of not-finalized bpf_prog_pack")

commit f7e0beaf39d3868dc700d4954b26cf8443c5d423
Author: Kui-Feng Lee <kuifeng@fb.com>
Date:   Tue May 10 13:59:19 2022 -0700

    bpf, x86: Generate trampolines from bpf_tramp_links

    Replace struct bpf_tramp_progs with struct bpf_tramp_links to collect
    struct bpf_tramp_link(s) for a trampoline.  struct bpf_tramp_link
    extends bpf_link to act as a linked list node.

    arch_prepare_bpf_trampoline() accepts a struct bpf_tramp_links to
    collect all bpf_tramp_link(s) that a trampoline should call.

    Change BPF trampoline and bpf_struct_ops to pass bpf_tramp_links
    instead of bpf_tramp_progs.

    Signed-off-by: Kui-Feng Lee <kuifeng@fb.com>
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
    Acked-by: Andrii Nakryiko <andrii@kernel.org>
    Link: https://lore.kernel.org/bpf/20220510205923.3206889-2-kuifeng@fb.com

Signed-off-by: Yauheni Kaliuta <ykaliuta@redhat.com>
2022-11-30 12:47:03 +02:00
Yauheni Kaliuta 11fec2f10e bpf: Compute map_btf_id during build time
Bugzilla: https://bugzilla.redhat.com/2120968

commit c317ab71facc2cd0a94145973318a4c914e11acc
Author: Menglong Dong <imagedong@tencent.com>
Date:   Mon Apr 25 21:32:47 2022 +0800

    bpf: Compute map_btf_id during build time
    
    For now, the field 'map_btf_id' in 'struct bpf_map_ops' for all map
    types is computed during vmlinux-btf init:
    
      btf_parse_vmlinux() -> btf_vmlinux_map_ids_init()
    
    It will look up the btf_type according to the 'map_btf_name' field in
    'struct bpf_map_ops'. This process can be done during build time,
    thanks to Jiri's resolve_btfids.
    
    selftest of map_ptr has passed:
    
      $96 map_ptr:OK
      Summary: 1/0 PASSED, 0 SKIPPED, 0 FAILED
    
    Reported-by: kernel test robot <lkp@intel.com>
    Signed-off-by: Menglong Dong <imagedong@tencent.com>
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Signed-off-by: Yauheni Kaliuta <ykaliuta@redhat.com>
2022-11-30 12:47:00 +02:00
Yauheni Kaliuta 548a30dd10 bpf: Remove unnecessary type castings
Bugzilla: https://bugzilla.redhat.com/2120968

commit 241d50ec5d79b94694adf13853c1f55d0f0b85e6
Author: Yu Zhe <yuzhe@nfschina.com>
Date:   Tue Apr 12 18:50:48 2022 -0700

    bpf: Remove unnecessary type castings
    
    Remove/clean up unnecessary void * type castings.
    
    Signed-off-by: Yu Zhe <yuzhe@nfschina.com>
    Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    Link: https://lore.kernel.org/bpf/20220413015048.12319-1-yuzhe@nfschina.com

Signed-off-by: Yauheni Kaliuta <ykaliuta@redhat.com>
2022-11-28 16:52:09 +02:00
Artem Savkov 92b13fc051 bpf: Rename btf_member accessors.
Bugzilla: https://bugzilla.redhat.com/2069046

Upstream Status: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

commit 8293eb995f349aed28006792cad4cb48091919dd
Author: Alexei Starovoitov <ast@kernel.org>
Date:   Wed Dec 1 10:10:25 2021 -0800

    bpf: Rename btf_member accessors.

    Rename btf_member_bit_offset() and btf_member_bitfield_size() to
    avoid conflicts with similarly named helpers in libbpf's btf.h.
    Rename the kernel helpers, since libbpf helpers are part of uapi.

    Suggested-by: Andrii Nakryiko <andrii@kernel.org>
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
    Acked-by: Andrii Nakryiko <andrii@kernel.org>
    Link: https://lore.kernel.org/bpf/20211201181040.23337-3-alexei.starovoitov@gmail.com

Signed-off-by: Artem Savkov <asavkov@redhat.com>
2022-08-24 12:53:42 +02:00
Yauheni Kaliuta 84f284d078 bpf: Add dummy BPF STRUCT_OPS for test purpose
Bugzilla: http://bugzilla.redhat.com/2069045

commit c196906d50e360d82ed9aa5596a9d0ce89b7ab78
Author: Hou Tao <houtao1@huawei.com>
Date:   Mon Oct 25 14:40:24 2021 +0800

    bpf: Add dummy BPF STRUCT_OPS for test purpose
    
    Currently the test of BPF STRUCT_OPS depends on the specific bpf
    implementation of tcp_congestion_ops, but it can not cover all
    basic functionalities (e.g., return value handling), so introduce
    a dummy BPF STRUCT_OPS for test purpose.
    
    Loading a bpf_dummy_ops implementation from userspace is prohibited,
    and its only purpose is to run BPF_PROG_TYPE_STRUCT_OPS program
    through bpf(BPF_PROG_TEST_RUN). Now programs for test_1() & test_2()
    are supported. The following three cases are exercised in
    bpf_dummy_struct_ops_test_run():
    
    (1) test and check the value returned from state arg in test_1(state)
    The content of state is copied from userspace pointer and copied back
    after calling test_1(state). The user pointer is saved in a u64 array
    and the array address is passed through ctx_in.
    
    (2) test and check the return value of test_1(NULL)
    Just simulate the case in which an invalid input argument is passed in.
    
    (3) test multiple arguments passing in test_2(state, ...)
    5 arguments are passed through ctx_in in the form of a u64 array. The
    first element of the array is the userspace pointer of state and the
    other 4 arguments follow.
    
    Signed-off-by: Hou Tao <houtao1@huawei.com>
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    Acked-by: Martin KaFai Lau <kafai@fb.com>
    Link: https://lore.kernel.org/bpf/20211025064025.2567443-4-houtao1@huawei.com

Signed-off-by: Yauheni Kaliuta <ykaliuta@redhat.com>
2022-06-03 17:23:49 +03:00
Yauheni Kaliuta 7e33bc5a81 bpf: Factor out a helper to prepare trampoline for struct_ops prog
Bugzilla: http://bugzilla.redhat.com/2069045

commit 31a645aea4f8da5bb190ce322c6e5aacaef13855
Author: Hou Tao <houtao1@huawei.com>
Date:   Mon Oct 25 14:40:22 2021 +0800

    bpf: Factor out a helper to prepare trampoline for struct_ops prog
    
    Factor out a helper bpf_struct_ops_prepare_trampoline() to prepare
    trampoline for BPF_PROG_TYPE_STRUCT_OPS prog. It will be used by
    .test_run callback in following patch.
    
    Signed-off-by: Hou Tao <houtao1@huawei.com>
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    Acked-by: Martin KaFai Lau <kafai@fb.com>
    Link: https://lore.kernel.org/bpf/20211025064025.2567443-2-houtao1@huawei.com

Signed-off-by: Yauheni Kaliuta <ykaliuta@redhat.com>
2022-06-03 17:23:49 +03:00
Jiri Benc ffd8cd8977 bpf: tcp: Allow bpf-tcp-cc to call bpf_(get|set)sockopt
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2071618

Conflicts:
- [minor] context difference in bpf_tcp_ca_get_func_proto due to out of
  order backport of 5e0bc3082e2e "bpf: Forbid bpf_ktime_get_coarse_ns and
  bpf_timer_* in tracing progs"

commit eb18b49ea758ec052ac2a12c6bb204e1e877ec31
Author: Martin KaFai Lau <kafai@fb.com>
Date:   Tue Aug 24 10:30:07 2021 -0700

    bpf: tcp: Allow bpf-tcp-cc to call bpf_(get|set)sockopt

    This patch allows the bpf-tcp-cc to call bpf_setsockopt.  One use
    case is to allow a bpf-tcp-cc to switch to another cc during init().
    For example, when the tcp flow is not ecn ready, the bpf_dctcp
    can switch to another cc by calling setsockopt(TCP_CONGESTION).

    During setsockopt(TCP_CONGESTION), the new tcp-cc's init() will be
    called and this could cause a recursion but it is stopped by the
    current trampoline's logic (in the prog->active counter).

    While retiring a bpf-tcp-cc (e.g. in tcp_v[46]_destroy_sock()),
    the tcp stack calls bpf-tcp-cc's release().  To avoid the retiring
    bpf-tcp-cc making further changes to the sk, bpf_setsockopt is not
    available to the bpf-tcp-cc's release().  This will avoid release()
    making setsockopt() call that will potentially allocate new resources.

    Although the bpf-tcp-cc already has a more powerful way to read tcp_sock
    from the PTR_TO_BTF_ID, it is usually expected that bpf_getsockopt and
    bpf_setsockopt are available together.  Thus, bpf_getsockopt() is also
    added to all tcp_congestion_ops except release().

    When the old bpf-tcp-cc is calling setsockopt(TCP_CONGESTION)
    to switch to a new cc, the old bpf-tcp-cc will be released by
    bpf_struct_ops_put().  Thus, this patch also puts the bpf_struct_ops_map
    after a rcu grace period because the trampoline's image cannot be freed
    while the old bpf-tcp-cc is still running.

    bpf-tcp-cc can only access icsk_ca_priv as SCALAR.  All kernel's
    tcp-cc is also accessing the icsk_ca_priv as SCALAR.   The size
    of icsk_ca_priv has already been raised a few times to avoid
    extra kmalloc and memory referencing.  The only exception is the
    kernel's tcp_cdg.c that stores a kmalloc()-ed pointer in icsk_ca_priv.
    To avoid the old bpf-tcp-cc accidentally overriding this tcp_cdg's pointer
    value stored in icsk_ca_priv after switching and without over-complicating
    the bpf's verifier for this one exception in tcp_cdg, this patch does not
    allow switching to tcp_cdg.  If there is a need, bpf_tcp_cdg can be
    implemented and then use the bpf_sk_storage as the extended storage.

    bpf_sk_setsockopt proto has only been recently added and used
    in bpf-sockopt and bpf-iter-tcp, so impose the tcp_cdg limitation in the
    same proto instead of adding a new proto specifically for bpf-tcp-cc.

    Signed-off-by: Martin KaFai Lau <kafai@fb.com>
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    Link: https://lore.kernel.org/bpf/20210824173007.3976921-1-kafai@fb.com

Signed-off-by: Jiri Benc <jbenc@redhat.com>
2022-05-12 17:29:46 +02:00
Jerome Marchand 6bac83c4bc bpf: Handle return value of BPF_PROG_TYPE_STRUCT_OPS prog
Bugzilla: http://bugzilla.redhat.com/2041365

commit 356ed64991c6847a0c4f2e8fa3b1133f7a14f1fc
Author: Hou Tao <houtao1@huawei.com>
Date:   Tue Sep 14 10:33:51 2021 +0800

    bpf: Handle return value of BPF_PROG_TYPE_STRUCT_OPS prog

    Currently if a function ptr in struct_ops has a return value, its
    caller will get a random return value from it, because the return
    value of related BPF_PROG_TYPE_STRUCT_OPS prog is just dropped.

    So add a new flag BPF_TRAMP_F_RET_FENTRY_RET to tell bpf trampoline
    to save and return the return value of struct_ops prog if ret_size of
    the function ptr is greater than 0. Also restrict the flag to be
    used alone.

    Fixes: 85d33df357 ("bpf: Introduce BPF_MAP_TYPE_STRUCT_OPS")
    Signed-off-by: Hou Tao <houtao1@huawei.com>
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    Acked-by: Martin KaFai Lau <kafai@fb.com>
    Link: https://lore.kernel.org/bpf/20210914023351.3664499-1-houtao1@huawei.com

Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
2022-04-29 18:17:12 +02:00
Alexei Starovoitov e21aa34178 bpf: Fix fexit trampoline.
The fexit/fmod_ret programs can be attached to kernel functions that can sleep.
The synchronize_rcu_tasks() will not wait for such tasks to complete.
In such a case the trampoline image will be freed and when the task
wakes up the return IP will point to freed memory causing the crash.
Solve this by adding percpu_ref_get/put for the duration of trampoline
and separate trampoline vs its image life times.
The "half page" optimization has to be removed, since
first_half->second_half->first_half transition cannot be guaranteed to
complete in deterministic time. Every trampoline update becomes a new image.
The image with fmod_ret or fexit progs will be freed via percpu_ref_kill and
call_rcu_tasks. Together they will wait for the original function and
trampoline asm to complete. The trampoline is patched from nop to jmp to skip
fexit progs. They are freed independently from the trampoline. The image with
fentry progs only will be freed via call_rcu_tasks_trace+call_rcu_tasks which
will wait for both sleepable and non-sleepable progs to complete.

Fixes: fec56f5890 ("bpf: Introduce BPF trampoline")
Reported-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Paul E. McKenney <paulmck@kernel.org>  # for RCU
Link: https://lore.kernel.org/bpf/20210316210007.38949-1-alexei.starovoitov@gmail.com
2021-03-18 00:22:51 +01:00
Roman Gushchin f043733f31 bpf: Eliminate rlimit-based memory accounting for bpf_struct_ops maps
Do not use rlimit-based memory accounting for bpf_struct_ops maps.
It has been replaced with the memcg-based memory accounting.

Signed-off-by: Roman Gushchin <guro@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20201201215900.3569844-20-guro@fb.com
2020-12-02 18:32:46 -08:00