Commit Graph

10 Commits

Author SHA1 Message Date
Artem Savkov 1e9cbbe0f6 bpf: Add __bpf_kfunc_{start,end}_defs macros
JIRA: https://issues.redhat.com/browse/RHEL-23643

Upstream Status: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

Conflicts: missing xdp commits, missing vma_task iterator

commit 391145ba2accc48b596f3d438af1a6255b62a555
Author: Dave Marchevsky <davemarchevsky@fb.com>
Date:   Tue Oct 31 14:56:24 2023 -0700

    bpf: Add __bpf_kfunc_{start,end}_defs macros

    BPF kfuncs are meant to be called from BPF programs. Accordingly, most
    kfuncs are not called from anywhere in the kernel, which the
    -Wmissing-prototypes warning is unhappy about. We've peppered
    __diag_ignore_all("-Wmissing-prototypes", ... everywhere kfuncs are
    defined in the codebase to suppress this warning.

    This patch adds two macros meant to bound one or many kfunc definitions.
    All existing kfunc definitions which use these __diag calls to suppress
    -Wmissing-prototypes are migrated to use the newly-introduced macros.
    A new __diag_ignore_all - for "-Wmissing-declarations" - is added to the
    __bpf_kfunc_start_defs macro based on feedback from Andrii on an earlier
    version of this patch [0] and another recent mailing list thread [1].

    In the future we might need to ignore different warnings or do other
    kfunc-specific things. This change will make it easier to make such
    modifications for all kfunc defs.
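
A minimal userspace sketch of the bracketing idea, assuming GCC or Clang. The names `DEMO_KFUNC_START_DEFS`, `DEMO_KFUNC_END_DEFS`, and `demo_kfunc_add` are hypothetical; the kernel macros are built on its own `__diag_*` helpers rather than on raw `_Pragma` operators, so this only illustrates the push/ignore/pop pattern.

```c
/*
 * Hypothetical stand-ins for __bpf_kfunc_start_defs()/__bpf_kfunc_end_defs().
 * The start macro saves the diagnostic state and silences the two warnings
 * named in the commit message; the end macro restores the saved state.
 */
#define DEMO_KFUNC_START_DEFS \
	_Pragma("GCC diagnostic push") \
	_Pragma("GCC diagnostic ignored \"-Wmissing-prototypes\"") \
	_Pragma("GCC diagnostic ignored \"-Wmissing-declarations\"")

#define DEMO_KFUNC_END_DEFS _Pragma("GCC diagnostic pop")

DEMO_KFUNC_START_DEFS

/* Global function with no prior prototype; it would normally be flagged. */
int demo_kfunc_add(int a, int b)
{
	return a + b;
}

/* Further definitions could follow here under the same suppression. */

DEMO_KFUNC_END_DEFS
```

Because one pair of macros bounds any number of definitions, a future change to the suppressed warnings would only need to touch the macro bodies.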

      [0]: https://lore.kernel.org/bpf/CAEf4BzaE5dRWtK6RPLnjTW-MW9sx9K3Fn6uwqCTChK2Dcb1Xig@mail.gmail.com/
      [1]: https://lore.kernel.org/bpf/ZT+2qCc%2FaXep0%2FLf@krava/

    Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
    Suggested-by: Andrii Nakryiko <andrii@kernel.org>
    Acked-by: Andrii Nakryiko <andrii@kernel.org>
    Cc: Jiri Olsa <olsajiri@gmail.com>
    Acked-by: Jiri Olsa <jolsa@kernel.org>
    Acked-by: David Vernet <void@manifault.com>
    Acked-by: Yafang Shao <laoar.shao@gmail.com>
    Link: https://lore.kernel.org/r/20231031215625.2343848-1-davemarchevsky@fb.com
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Signed-off-by: Artem Savkov <asavkov@redhat.com>
2024-03-27 11:23:42 +01:00
Artem Savkov 534a34437e bpf: Let verifier consider {task,cgroup} is trusted in bpf_iter_reg
JIRA: https://issues.redhat.com/browse/RHEL-23643

commit 0de4f50de25af79c2a46db55d70cdbd8f985c6d1
Author: Chuyi Zhou <zhouchuyi@bytedance.com>
Date:   Tue Nov 7 21:22:03 2023 +0800

    bpf: Let verifier consider {task,cgroup} is trusted in bpf_iter_reg
    
    BTF_TYPE_SAFE_TRUSTED(struct bpf_iter__task) in verifier.c was meant to
    teach the BPF verifier that bpf_iter__task -> task is a trusted ptr, but
    it does not work as intended.
    
    The reason is that bpf_iter__task -> task goes through btf_ctx_access(),
    which forces the reg_type of 'task' to be ctx_arg_info->reg_type, and
    task_iter.c explicitly declares ctx_arg_info->reg_type as
    PTR_TO_BTF_ID_OR_NULL.
    
    There is a precedent for this [1], where PTR_TRUSTED was added to
    the arg flag for map_iter.
    
    This patch sets ctx_arg_info->reg_type to PTR_TO_BTF_ID_OR_NULL |
    PTR_TRUSTED in task_reg_info.
    
    Similarly, bpf_cgroup_reg_info -> cgroup is also PTR_TRUSTED since we are
    under the protection of cgroup_mutex and we would check cgroup_is_dead()
    in __cgroup_iter_seq_show().
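
The flag composition described above can be modeled in userspace. This is a sketch under stated assumptions: every `DEMO_*` constant and `demo_*` name is hypothetical, and the bit positions are illustrative only and do not match the kernel's real encoding.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative encoding: low byte is the base type, higher bits are flags. */
#define DEMO_BASE_TYPE_MASK        0xffu
#define DEMO_PTR_TO_BTF_ID         0x01u
#define DEMO_PTR_MAYBE_NULL        (1u << 8)
#define DEMO_PTR_TRUSTED           (1u << 9)
#define DEMO_PTR_TO_BTF_ID_OR_NULL (DEMO_PTR_TO_BTF_ID | DEMO_PTR_MAYBE_NULL)

struct demo_ctx_arg_info {
	uint32_t reg_type;
};

/* Before the patch: the context argument is nullable but not trusted. */
const struct demo_ctx_arg_info demo_task_info_before = {
	.reg_type = DEMO_PTR_TO_BTF_ID_OR_NULL,
};

/* After the patch: same base type, with the trusted flag OR-ed in. */
const struct demo_ctx_arg_info demo_task_info_after = {
	.reg_type = DEMO_PTR_TO_BTF_ID_OR_NULL | DEMO_PTR_TRUSTED,
};

uint32_t demo_base_type(uint32_t t)
{
	return t & DEMO_BASE_TYPE_MASK;
}

bool demo_is_trusted(uint32_t t)
{
	return (t & DEMO_PTR_TRUSTED) != 0;
}

bool demo_may_be_null(uint32_t t)
{
	return (t & DEMO_PTR_MAYBE_NULL) != 0;
}

/* In this model a NULL check clears only the nullable flag. */
uint32_t demo_after_null_check(uint32_t t)
{
	return t & ~DEMO_PTR_MAYBE_NULL;
}
```

In this model the trusted flag survives the NULL check, which is the property a program needs before passing the pointer to a helper that requires a trusted argument.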
    
    This patch improves the user experience of the newly introduced
    bpf_iter_css_task kfunc before it hits mainline. The Fixes tag
    points to the commit that introduced the bpf_iter_css_task kfunc.
    
    Link[1]:https://lore.kernel.org/all/20230706133932.45883-3-aspsk@isovalent.com/
    
    Fixes: 9c66dc94b62a ("bpf: Introduce css_task open-coded iterator kfuncs")
    Signed-off-by: Chuyi Zhou <zhouchuyi@bytedance.com>
    Acked-by: Yonghong Song <yonghong.song@linux.dev>
    Link: https://lore.kernel.org/r/20231107132204.912120-2-zhouchuyi@bytedance.com
    Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>

Signed-off-by: Artem Savkov <asavkov@redhat.com>
2024-03-27 10:27:58 +01:00
Artem Savkov 3fd10bb078 bpf: Introduce css open-coded iterator kfuncs
JIRA: https://issues.redhat.com/browse/RHEL-23643

commit 7251d0905e7518bcb990c8e9a3615b1bb23c78f2
Author: Chuyi Zhou <zhouchuyi@bytedance.com>
Date:   Wed Oct 18 14:17:42 2023 +0800

    bpf: Introduce css open-coded iterator kfuncs
    
    This patch adds kfuncs bpf_iter_css_{new,next,destroy} which allow
    creation and manipulation of struct bpf_iter_css in open-coded iterator
    style. These kfuncs wrap css_next_descendant_{pre, post}.
    css_iter can be used for:
    
    1) iterating over a specific cgroup tree in pre/post/up order
    
    2) iterating over a cgroup_subsystem in a BPF prog, like
    for_each_mem_cgroup_tree/cpuset_for_each_descendant_pre in the kernel.
    
    The API design is consistent with cgroup_iter. bpf_iter_css_new accepts
    parameters defining iteration order and starting css. Here we also reuse
    BPF_CGROUP_ITER_DESCENDANTS_PRE, BPF_CGROUP_ITER_DESCENDANTS_POST,
    BPF_CGROUP_ITER_ANCESTORS_UP enums.
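
The new/next/destroy pattern and the three orders can be sketched in plain userspace C. This is a model only, not the kernel implementation: `struct demo_node`, `struct demo_iter`, the `DEMO_*` order constants, and all `demo_*` functions are hypothetical, and a simple parent/child/sibling tree stands in for the css hierarchy.

```c
#include <stddef.h>

enum { DEMO_DESCENDANTS_PRE, DEMO_DESCENDANTS_POST, DEMO_ANCESTORS_UP };

struct demo_node {
	int id;
	struct demo_node *parent, *first_child, *next_sibling;
};

struct demo_iter {
	struct demo_node *start, *pos;
	int order;
	int done;
};

static struct demo_node *demo_leftmost(struct demo_node *n)
{
	while (n->first_child)
		n = n->first_child;
	return n;
}

/* Pre-order successor of pos within the subtree rooted at root. */
static struct demo_node *demo_next_pre(struct demo_node *pos, struct demo_node *root)
{
	if (!pos)
		return root;
	if (pos->first_child)
		return pos->first_child;
	for (; pos != root; pos = pos->parent)
		if (pos->next_sibling)
			return pos->next_sibling;
	return NULL;
}

/* Post-order successor of pos within the subtree rooted at root. */
static struct demo_node *demo_next_post(struct demo_node *pos, struct demo_node *root)
{
	if (!pos)
		return demo_leftmost(root);
	if (pos == root)
		return NULL;
	return pos->next_sibling ? demo_leftmost(pos->next_sibling) : pos->parent;
}

/* Returns 0 on success, -1 for a missing start node or unknown order. */
int demo_iter_new(struct demo_iter *it, struct demo_node *start, int order)
{
	if (!start || (order != DEMO_DESCENDANTS_PRE &&
		       order != DEMO_DESCENDANTS_POST &&
		       order != DEMO_ANCESTORS_UP))
		return -1;
	it->start = start;
	it->pos = NULL;
	it->order = order;
	it->done = 0;
	return 0;
}

/* Returns the next node in the chosen order, or NULL when the walk ends. */
struct demo_node *demo_iter_next(struct demo_iter *it)
{
	if (it->done)
		return NULL;
	if (it->order == DEMO_DESCENDANTS_PRE)
		it->pos = demo_next_pre(it->pos, it->start);
	else if (it->order == DEMO_DESCENDANTS_POST)
		it->pos = demo_next_post(it->pos, it->start);
	else /* DEMO_ANCESTORS_UP */
		it->pos = it->pos ? it->pos->parent : it->start;
	if (!it->pos)
		it->done = 1;
	return it->pos;
}

void demo_iter_destroy(struct demo_iter *it)
{
	it->start = it->pos = NULL;
	it->done = 1;
}

/* Runs one full walk and stores up to max visited ids in out. */
int demo_collect(struct demo_node *start, int order, int *out, int max)
{
	struct demo_iter it;
	struct demo_node *n;
	int count = 0;

	if (demo_iter_new(&it, start, order))
		return -1;
	while (count < max && (n = demo_iter_next(&it)))
		out[count++] = n->id;
	demo_iter_destroy(&it);
	return count;
}
```

For a tree where node 1 has children 2 and 3, and node 2 has child 4, `demo_collect` from node 1 visits 1, 2, 4, 3 in pre-order and 4, 2, 3, 1 in post-order; the upward walk from node 4 visits 4, 2, 1.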
    
    Signed-off-by: Chuyi Zhou <zhouchuyi@bytedance.com>
    Acked-by: Tejun Heo <tj@kernel.org>
    Link: https://lore.kernel.org/r/20231018061746.111364-5-zhouchuyi@bytedance.com
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Signed-off-by: Artem Savkov <asavkov@redhat.com>
2024-03-27 10:27:54 +01:00
Artem Savkov f6cd44c258 cgroup: bpf: use cgroup_lock()/cgroup_unlock() wrappers
Bugzilla: https://bugzilla.redhat.com/2221599

Conflicts: missing 0083d27b21dd2 "cgroup: Improve cftype
add/rm error handling"

commit 4cdb91b0dea7d7f59fa84a13c7753cd434fdedcf
Author: Kamalesh Babulal <kamalesh.babulal@oracle.com>
Date:   Fri Mar 3 15:23:10 2023 +0530

    cgroup: bpf: use cgroup_lock()/cgroup_unlock() wrappers

    Replace mutex_[un]lock() with cgroup_[un]lock() wrappers to stay
    consistent across cgroup core and other subsystem code, while
    operating on the cgroup_mutex.

    Signed-off-by: Kamalesh Babulal <kamalesh.babulal@oracle.com>
    Acked-by: Alexei Starovoitov <ast@kernel.org>
    Reviewed-by: Christian Brauner <brauner@kernel.org>
    Signed-off-by: Tejun Heo <tj@kernel.org>

Signed-off-by: Artem Savkov <asavkov@redhat.com>
2023-09-22 09:12:20 +02:00
Jerome Marchand a846961ea9 bpf: Make struct cgroup btf id global
Bugzilla: https://bugzilla.redhat.com/2177177

commit 5e67b8ef125bb6e83bf0f0442ad7ffc09e7956f9
Author: Yonghong Song <yhs@fb.com>
Date:   Tue Oct 25 21:28:40 2022 -0700

    bpf: Make struct cgroup btf id global

    Make struct cgroup btf id global so that a later patch can reuse
    the same btf id.

    Acked-by: David Vernet <void@manifault.com>
    Signed-off-by: Yonghong Song <yhs@fb.com>
    Link: https://lore.kernel.org/r/20221026042840.672602-1-yhs@fb.com
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
2023-04-28 11:42:58 +02:00
Artem Savkov a038314072 bpf: Remove useless else if
Bugzilla: https://bugzilla.redhat.com/2166911

commit ccf365eac0c7705591dee0158ae5c198d9e8f858
Author: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
Date:   Wed Aug 31 10:16:18 2022 +0800

    bpf: Remove useless else if
    
    The else and else if branches perform the same assignment, so the else
    if is redundant. Remove it and add a comment to keep the code
    readable.
    
    ./kernel/bpf/cgroup_iter.c:81:6-8: WARNING: possible condition with no effect (if == else).
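
The kind of simplification this warning points at can be sketched as follows. The code is a generic illustration, not the kernel source: `enum demo_order`, `demo_first_before`, and `demo_first_after` are hypothetical names, and integers stand in for the values the real branches produce.

```c
enum demo_order { DEMO_SELF_ONLY, DEMO_DESCENDANTS_PRE, DEMO_ANCESTORS_UP };

/* Before: the else if and else branches yield the same result. */
int demo_first_before(enum demo_order order, int start, int first_descendant)
{
	if (order == DEMO_DESCENDANTS_PRE)
		return first_descendant;
	else if (order == DEMO_ANCESTORS_UP)
		return start;
	else /* DEMO_SELF_ONLY */
		return start;
}

/* After: one fallthrough path, with a comment naming the cases it covers. */
int demo_first_after(enum demo_order order, int start, int first_descendant)
{
	if (order == DEMO_DESCENDANTS_PRE)
		return first_descendant;
	/* DEMO_ANCESTORS_UP and DEMO_SELF_ONLY both begin at the start node. */
	return start;
}
```

Both functions return the same value for every order, so the change is purely a readability cleanup.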
    
    Link: https://bugzilla.openanolis.cn/show_bug.cgi?id=2016
    Reported-by: Abaci Robot <abaci@linux.alibaba.com>
    Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
    Link: https://lore.kernel.org/r/20220831021618.86770-1-jiapeng.chong@linux.alibaba.com
    Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>

Signed-off-by: Artem Savkov <asavkov@redhat.com>
2023-03-06 14:54:26 +01:00
Artem Savkov 453fd2596d bpf: Add CGROUP prefix to cgroup_iter_order
Bugzilla: https://bugzilla.redhat.com/2166911

commit d4ffb6f39f1a1b260966b43a4ffdb64779c650dd
Author: Hao Luo <haoluo@google.com>
Date:   Thu Aug 25 15:39:36 2022 -0700

    bpf: Add CGROUP prefix to cgroup_iter_order
    
    bpf_cgroup_iter_order is globally visible but its entries do not have
    a CGROUP prefix. As requested by Andrii, put CGROUP in the names
    in bpf_cgroup_iter_order.
    
    This patch fixes two previous commits: one introduced the API and
    the other uses the API in bpf selftest (that is, the selftest
    cgroup_hierarchical_stats).
    
    I tested this patch via the following command:
    
      test_progs -t cgroup,iter,btf_dump
    
    Fixes: d4ccaf58a847 ("bpf: Introduce cgroup iter")
    Fixes: 88886309d2e8 ("selftests/bpf: add a selftest for cgroup hierarchical stats collection")
    Suggested-by: Andrii Nakryiko <andrii@kernel.org>
    Acked-by: Andrii Nakryiko <andrii@kernel.org>
    Signed-off-by: Hao Luo <haoluo@google.com>
    Link: https://lore.kernel.org/r/20220825223936.1865810-1-haoluo@google.com
    Signed-off-by: Martin KaFai Lau <kafai@fb.com>

Signed-off-by: Artem Savkov <asavkov@redhat.com>
2023-03-06 14:54:25 +01:00
Artem Savkov a4b272755d bpf: Pin the start cgroup in cgroup_iter_seq_init()
Bugzilla: https://bugzilla.redhat.com/2166911

commit 1a5160d4d8fe63ba4964cfff4a85831b6af75f2d
Author: Hou Tao <houtao1@huawei.com>
Date:   Mon Nov 21 15:34:38 2022 +0800

    bpf: Pin the start cgroup in cgroup_iter_seq_init()
    
    bpf_iter_attach_cgroup() has already acquired an extra reference for the
    start cgroup, but that reference may be released if the iterator link fd
    is closed after the iterator fd is created, which may lead to a
    use-after-free when reading the iterator fd.
    
    An alternative fix is to pin the iterator link when opening the iterator,
    but that would leave the iterator link visible after its fd is closed,
    which differs from the behavior of other link types. So fix it by
    acquiring another reference for the start cgroup instead.
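
The lifetime problem and the chosen fix can be modeled with a plain reference counter. This is a userspace sketch with hypothetical names (`struct demo_cgroup`, `demo_link_attach`, `demo_seq_init`, and so on); it only mirrors the idea that the iterator takes its own reference instead of relying on the one held by the link.

```c
#include <stddef.h>

struct demo_cgroup {
	int refcnt;
	int freed;	/* set once the last reference is dropped */
};

struct demo_link {
	struct demo_cgroup *start;
};

struct demo_iter_priv {
	struct demo_cgroup *start;
};

void demo_get(struct demo_cgroup *cg)
{
	cg->refcnt++;
}

void demo_put(struct demo_cgroup *cg)
{
	if (--cg->refcnt == 0)
		cg->freed = 1;
}

/* Attaching the link takes one reference on the start cgroup. */
void demo_link_attach(struct demo_link *link, struct demo_cgroup *cg)
{
	demo_get(cg);
	link->start = cg;
}

/* Closing the link fd drops the link's reference. */
void demo_link_release(struct demo_link *link)
{
	demo_put(link->start);
	link->start = NULL;
}

/* The fix: opening the iterator pins the start cgroup independently. */
void demo_seq_init(struct demo_iter_priv *priv, const struct demo_link *link)
{
	demo_get(link->start);
	priv->start = link->start;
}

/* Closing the iterator fd drops the iterator's own reference. */
void demo_seq_fini(struct demo_iter_priv *priv)
{
	demo_put(priv->start);
	priv->start = NULL;
}
```

With the extra reference taken in `demo_seq_init`, releasing the link before the iterator leaves the cgroup alive until `demo_seq_fini` runs.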
    
    Fixes: d4ccaf58a847 ("bpf: Introduce cgroup iter")
    Signed-off-by: Hou Tao <houtao1@huawei.com>
    Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    Acked-by: Yonghong Song <yhs@fb.com>
    Link: https://lore.kernel.org/bpf/20221121073440.1828292-2-houtao@huaweicloud.com

Signed-off-by: Artem Savkov <asavkov@redhat.com>
2023-03-06 14:54:24 +01:00
Artem Savkov 4f54b76fde bpf: cgroup_iter: support cgroup1 using cgroup fd
Bugzilla: https://bugzilla.redhat.com/2166911

commit 35256d673a9cf723d9e2edb5d51e1b1b6b197ba3
Author: Yosry Ahmed <yosryahmed@google.com>
Date:   Tue Oct 11 00:33:59 2022 +0000

    bpf: cgroup_iter: support cgroup1 using cgroup fd
    
    Use cgroup_v1v2_get_from_fd() in cgroup_iter to support attaching to
    both cgroup v1 and v2 using fds.
    
    Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
    Acked-by: Martin KaFai Lau <martin.lau@kernel.org>
    Signed-off-by: Tejun Heo <tj@kernel.org>

Signed-off-by: Artem Savkov <asavkov@redhat.com>
2023-03-06 14:54:21 +01:00
Artem Savkov 08b66ec3e9 bpf: Introduce cgroup iter
Bugzilla: https://bugzilla.redhat.com/2166911

commit d4ccaf58a8472123ac97e6db03932c375b5c45ba
Author: Hao Luo <haoluo@google.com>
Date:   Wed Aug 24 16:31:13 2022 -0700

    bpf: Introduce cgroup iter
    
    Cgroup_iter is a type of bpf_iter. It walks over cgroups in four modes:
    
     - walking a cgroup's descendants in pre-order.
     - walking a cgroup's descendants in post-order.
     - walking a cgroup's ancestors.
     - processing only the given cgroup.
    
    When attaching cgroup_iter, one can specify a cgroup for the iter_link
    created by the attachment. This cgroup is passed as a file descriptor
    or cgroup id and serves as the starting point of the walk. If no
    cgroup is specified, the starting point will be the cgroup v2 root.
    
    For walking descendants, one can specify the order: either pre-order or
    post-order. For walking ancestors, the walk starts at the specified
    cgroup and ends at the root.
    
    One can also terminate the walk early by returning 1 from the iter
    program.
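
Early termination can be sketched in userspace with a callback that plays the role of the iter program. All names here (`struct demo_cg`, `demo_walk_ancestors`, `demo_stop_at_id`) are hypothetical, and only the ancestor walk is modeled.

```c
#include <stddef.h>

struct demo_cg {
	int id;
	struct demo_cg *parent;
};

/* Stand-in for the iter program: return 1 to stop the walk, 0 to continue. */
typedef int (*demo_prog_fn)(const struct demo_cg *cg, void *ctx);

/* Walks from start up to the root and returns how many cgroups were visited. */
int demo_walk_ancestors(const struct demo_cg *start, demo_prog_fn prog, void *ctx)
{
	int visited = 0;

	for (const struct demo_cg *cg = start; cg; cg = cg->parent) {
		visited++;
		if (prog(cg, ctx) == 1)
			break;	/* the program asked to terminate early */
	}
	return visited;
}

/* Example program: stop once the cgroup with the wanted id is reached. */
int demo_stop_at_id(const struct demo_cg *cg, void *ctx)
{
	return cg->id == *(const int *)ctx;
}
```

Walking a leaf, middle, root chain with `demo_stop_at_id` set to the middle node's id visits two cgroups instead of three.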
    
    Note that because walking the cgroup hierarchy holds cgroup_mutex, the
    iter program is called with cgroup_mutex held.
    
    Currently only one session is supported, which means that the number of
    cgroups that can be walked is limited by the volume of data the bpf
    program sends to user space. For example, given that the current
    buffer size is 8 * PAGE_SIZE, if the program sends 64B of data for each
    cgroup and PAGE_SIZE is 4KB, the total number of cgroups that can
    be walked is 512. This is a limitation of cgroup_iter. If the output
    data is larger than the kernel buffer size, then after all data in the
    kernel buffer is consumed by user space, the subsequent read() syscall
    will signal EOPNOTSUPP. To work around this, the user may have to
    update their program to reduce the volume of data sent to output, for
    example by skipping some uninteresting cgroups. In the future, we may
    extend bpf_iter flags to allow customizing the buffer size.
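
The 512 figure follows from dividing the buffer size by the per-cgroup output size. A small sketch of that arithmetic, where `DEMO_BUF_PAGES` and `demo_max_cgroups` are hypothetical names and the buffer is assumed to be exactly 8 pages as the message states:

```c
#include <stddef.h>

/* Buffer size in pages, taken from the commit message. */
#define DEMO_BUF_PAGES 8

/*
 * Upper bound on cgroups walked in one session when each cgroup emits
 * bytes_per_cgroup bytes. Returns 0 when bytes_per_cgroup is 0, used here
 * as a sentinel meaning no bound was computed.
 */
size_t demo_max_cgroups(size_t page_size, size_t bytes_per_cgroup)
{
	if (bytes_per_cgroup == 0)
		return 0;
	return DEMO_BUF_PAGES * page_size / bytes_per_cgroup;
}
```

With a 4096-byte page and 64 bytes per cgroup this gives 8 * 4096 / 64 = 512; halving the output to 32 bytes per cgroup would double the bound to 1024.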
    
    Acked-by: Yonghong Song <yhs@fb.com>
    Acked-by: Tejun Heo <tj@kernel.org>
    Signed-off-by: Hao Luo <haoluo@google.com>
    Link: https://lore.kernel.org/r/20220824233117.1312810-2-haoluo@google.com
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Signed-off-by: Artem Savkov <asavkov@redhat.com>
2023-03-06 14:54:03 +01:00