Commit Graph

Viktor Malik 543e088c5e
bpf: Use bpf_mem_free_rcu when bpf_obj_dropping non-refcounted nodes
JIRA: https://issues.redhat.com/browse/RHEL-23644

commit 649924b76ab151a96bdd22a97a993fb0421f134c
Author: Dave Marchevsky <davemarchevsky@fb.com>
Date:   Tue Nov 7 00:56:36 2023 -0800

    bpf: Use bpf_mem_free_rcu when bpf_obj_dropping non-refcounted nodes
    
    The use of bpf_mem_free_rcu to free refcounted local kptrs was added
    in commit 7e26cd12ad1c ("bpf: Use bpf_mem_free_rcu when
    bpf_obj_dropping refcounted nodes"). In the cover letter for the
    series containing that patch [0] I commented:
    
        Perhaps it makes sense to move to mem_free_rcu for _all_
        non-owning refs in the future, not just refcounted. This might
        allow custom non-owning ref lifetime + invalidation logic to be
        entirely subsumed by MEM_RCU handling. IMO this needs a bit more
        thought and should be tackled outside of a fix series, so it's not
        attempted here.
    
    It's time to start moving in the "non-owning refs have MEM_RCU
    lifetime" direction. As mentioned in that comment, using
    bpf_mem_free_rcu for all local kptrs - not just refcounted - is
    necessarily the first step towards that goal. This patch does so.
    
    After this patch the memory pointed to by all local kptrs will not be
    reused until RCU grace period elapses. The verifier's understanding of
    non-owning ref validity and the clobbering logic it uses to enforce
    that understanding are not changed here, that'll happen gradually in
    future work, including further patches in the series.
    
      [0]: https://lore.kernel.org/all/20230821193311.3290257-1-davemarchevsky@fb.com/
    
    Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
    Link: https://lore.kernel.org/r/20231107085639.3016113-4-davemarchevsky@fb.com
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Signed-off-by: Viktor Malik <vmalik@redhat.com>
2024-06-25 10:51:42 +02:00
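
A minimal sketch of the freeing path after the change above, with simplified names and the refcount handling omitted; it only illustrates that every local kptr now goes through the RCU-deferred free:

    /* Sketch only: all local kptrs, refcounted or not, are freed with
     * bpf_mem_free_rcu() so the memory cannot be reused before an RCU
     * grace period elapses.
     */
    static void obj_drop_sketch(void *p, const struct btf_record *rec)
    {
            if (rec)
                    bpf_obj_free_fields(rec, p);
            /* previously non-refcounted nodes used bpf_mem_free() here */
            bpf_mem_free_rcu(&bpf_global_ma, p);
    }
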
Viktor Malik 36e1058bd5
bpf: Add KF_RCU flag to bpf_refcount_acquire_impl
JIRA: https://issues.redhat.com/browse/RHEL-23644

commit 1500a5d9f49cb66906d3ea1c9158df25cc41dd40
Author: Dave Marchevsky <davemarchevsky@fb.com>
Date:   Tue Nov 7 00:56:34 2023 -0800

    bpf: Add KF_RCU flag to bpf_refcount_acquire_impl
    
    Refcounted local kptrs are kptrs to user-defined types with a
    bpf_refcount field. Recent commits ([0], [1]) modified the lifetime of
    refcounted local kptrs such that the underlying memory is not reused
    until RCU grace period has elapsed.
    
    Separately, verification of bpf_refcount_acquire calls currently
    succeeds for MAYBE_NULL non-owning reference input, which is a problem
    as bpf_refcount_acquire_impl has no handling for this case.
    
    This patch takes advantage of aforementioned lifetime changes to tag
    bpf_refcount_acquire_impl kfunc KF_RCU, thereby preventing MAYBE_NULL
    input to the kfunc. The KF_RCU flag applies to all kfunc params; it's
    fine for it to apply to the void *meta__ign param as that's populated by
    the verifier and is tagged __ign regardless.
    
      [0]: commit 7e26cd12ad1c ("bpf: Use bpf_mem_free_rcu when
           bpf_obj_dropping refcounted nodes") is the actual change to
           allocation behavior
      [1]: commit 0816b8c6bf7f ("bpf: Consider non-owning refs to refcounted
           nodes RCU protected") modified verifier understanding of
           refcounted local kptrs to match [0]'s changes
    
    Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
    Fixes: 7c50b1cb76ac ("bpf: Add bpf_refcount_acquire kfunc")
    Link: https://lore.kernel.org/r/20231107085639.3016113-2-davemarchevsky@fb.com
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Signed-off-by: Viktor Malik <vmalik@redhat.com>
2024-06-25 10:51:41 +02:00
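
A one-line sketch of what the tagging looks like in the kfunc's BTF ID set registration in kernel/bpf/helpers.c; the pre-existing flags shown here are an assumption:

    /* Sketch: KF_RCU added so MAYBE_NULL non-owning refs are rejected by
     * the verifier before they ever reach the kfunc.
     */
    BTF_ID_FLAGS(func, bpf_refcount_acquire_impl, KF_ACQUIRE | KF_RET_NULL | KF_RCU)
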
Viktor Malik 467d71bf83
bpf: Add __bpf_dynptr_data* for in kernel use
JIRA: https://issues.redhat.com/browse/RHEL-23644

commit 74523c06ae20b83c5508a98af62393ac34913362
Author: Song Liu <song@kernel.org>
Date:   Mon Nov 6 20:57:23 2023 -0800

    bpf: Add __bpf_dynptr_data* for in kernel use
    
    Different types of bpf dynptr have different internal data storage.
    Specifically, SKB and XDP types of dynptr may have non-contiguous data.
    Therefore, it is not always safe to directly access dynptr->data.
    
    Add __bpf_dynptr_data and __bpf_dynptr_data_rw to replace direct access to
    dynptr->data.
    
    Update bpf_verify_pkcs7_signature to use __bpf_dynptr_data instead of
    dynptr->data.
    
    Signed-off-by: Song Liu <song@kernel.org>
    Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
    Acked-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
    Link: https://lore.kernel.org/bpf/20231107045725.2278852-2-song@kernel.org
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Signed-off-by: Viktor Malik <vmalik@redhat.com>
2024-06-25 10:51:40 +02:00
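
A hedged sketch of how an in-kernel caller such as bpf_verify_pkcs7_signature switches from dereferencing dynptr->data to the new accessor; the exact signature and variable names are assumptions:

    const void *data;

    /* Returns NULL when the skb/xdp backed dynptr is not contiguous,
     * instead of silently reading a possibly invalid ->data pointer.
     */
    data = __bpf_dynptr_data(data_ptr, __bpf_dynptr_size(data_ptr));
    if (!data)
            return -EINVAL;
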
Viktor Malik 7fafe65db6
bpf: Introduce task_vma open-coded iterator kfuncs
JIRA: https://issues.redhat.com/browse/RHEL-23644

Conflicts: Several commits were previously backported out of order:
           96a4110030fb ("bpf: Introduce css_task open-coded iterator kfuncs")
           e7c7c9dedb42 ("bpf: fix compilation error without CGROUPS")
           391145ba2acc ("bpf: Add __bpf_kfunc_{start,end}_defs macros").
           Updated the commit to match upstream code as much as possible.

commit 4ac4546821584736798aaa9e97da9f6eaf689ea3
Author: Dave Marchevsky <davemarchevsky@fb.com>
Date:   Fri Oct 13 13:44:24 2023 -0700

    bpf: Introduce task_vma open-coded iterator kfuncs

    This patch adds kfuncs bpf_iter_task_vma_{new,next,destroy} which allow
    creation and manipulation of struct bpf_iter_task_vma in open-coded
    iterator style. BPF programs can use these kfuncs directly or through
    bpf_for_each macro for natural-looking iteration of all task vmas.

    The implementation borrows heavily from bpf_find_vma helper's locking -
    differing only in that it holds the mmap_read lock for all iterations
    while the helper only executes its provided callback on a maximum of 1
    vma. Aside from locking, struct vma_iterator and vma_next do all the
    heavy lifting.

    A pointer to an inner data struct, struct bpf_iter_task_vma_data, is the
    only field in struct bpf_iter_task_vma. This is because the inner data
    struct contains a struct vma_iterator (not ptr), whose size is likely to
    change under us. If bpf_iter_task_vma_kern contained the vma_iterator
    directly, such a change would require a change in the opaque
    bpf_iter_task_vma struct's size. So it is better to allocate the
    vma_iterator using the BPF allocator, and since
    that alloc must already succeed, might as well allocate all iter fields,
    thereby freezing struct bpf_iter_task_vma size.

    Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
    Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
    Acked-by: Andrii Nakryiko <andrii@kernel.org>
    Link: https://lore.kernel.org/bpf/20231013204426.1074286-4-davemarchevsky@fb.com

Signed-off-by: Viktor Malik <vmalik@redhat.com>
2024-06-25 10:51:33 +02:00
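
A minimal usage sketch for the new iterator (not part of the patch), assuming the bpf_experimental.h header from the kernel selftests, which declares the kfuncs and the bpf_for_each() macro; the attach point is illustrative:

    #include "vmlinux.h"
    #include <bpf/bpf_helpers.h>
    #include "bpf_experimental.h"

    char _license[] SEC("license") = "GPL";

    SEC("raw_tp/sys_enter")
    int count_vmas(void *ctx)
    {
            struct task_struct *task = bpf_get_current_task_btf();
            struct vm_area_struct *vma;
            long vmas = 0;

            /* the mmap_read lock is held across the whole walk, see above */
            bpf_for_each(task_vma, vma, task, 0)
                    vmas++;

            bpf_printk("current task has %ld vmas", vmas);
            return 0;
    }
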
Artem Savkov 1e9cbbe0f6 bpf: Add __bpf_kfunc_{start,end}_defs macros
JIRA: https://issues.redhat.com/browse/RHEL-23643

Upstream Status: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

Conflicts: missing xdp commits, missing vma_task iterator

commit 391145ba2accc48b596f3d438af1a6255b62a555
Author: Dave Marchevsky <davemarchevsky@fb.com>
Date:   Tue Oct 31 14:56:24 2023 -0700

    bpf: Add __bpf_kfunc_{start,end}_defs macros

    BPF kfuncs are meant to be called from BPF programs. Accordingly, most
    kfuncs are not called from anywhere in the kernel, which the
    -Wmissing-prototypes warning is unhappy about. We've peppered
    __diag_ignore_all("-Wmissing-prototypes", ... everywhere kfuncs are
    defined in the codebase to suppress this warning.

    This patch adds two macros meant to bound one or many kfunc definitions.
    All existing kfunc definitions which use these __diag calls to suppress
    -Wmissing-prototypes are migrated to use the newly-introduced macros.
    A new __diag_ignore_all - for "-Wmissing-declarations" - is added to the
    __bpf_kfunc_start_defs macro based on feedback from Andrii on an earlier
    version of this patch [0] and another recent mailing list thread [1].

    In the future we might need to ignore different warnings or do other
    kfunc-specific things. This change will make it easier to make such
    modifications for all kfunc defs.

      [0]: https://lore.kernel.org/bpf/CAEf4BzaE5dRWtK6RPLnjTW-MW9sx9K3Fn6uwqCTChK2Dcb1Xig@mail.gmail.com/
      [1]: https://lore.kernel.org/bpf/ZT+2qCc%2FaXep0%2FLf@krava/

    Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
    Suggested-by: Andrii Nakryiko <andrii@kernel.org>
    Acked-by: Andrii Nakryiko <andrii@kernel.org>
    Cc: Jiri Olsa <olsajiri@gmail.com>
    Acked-by: Jiri Olsa <jolsa@kernel.org>
    Acked-by: David Vernet <void@manifault.com>
    Acked-by: Yafang Shao <laoar.shao@gmail.com>
    Link: https://lore.kernel.org/r/20231031215625.2343848-1-davemarchevsky@fb.com
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Signed-off-by: Artem Savkov <asavkov@redhat.com>
2024-03-27 11:23:42 +01:00
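
A short sketch of the new convention; the kfunc shown is illustrative only:

    __bpf_kfunc_start_defs();

    __bpf_kfunc void bpf_example_kfunc(void)
    {
            /* kfunc body; -Wmissing-prototypes and -Wmissing-declarations
             * are suppressed between the two markers
             */
    }

    __bpf_kfunc_end_defs();
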
Artem Savkov 2114d1ada8 bpf: Check map->usercnt after timer->timer is assigned
JIRA: https://issues.redhat.com/browse/RHEL-23643

commit fd381ce60a2d79cc967506208085336d3d268ae0
Author: Hou Tao <houtao1@huawei.com>
Date:   Mon Oct 30 14:36:16 2023 +0800

    bpf: Check map->usercnt after timer->timer is assigned
    
    When there are concurrent uref release and bpf timer init operations,
    the following sequence diagram is possible. It will break the guarantee
    provided by bpf_timer: bpf_timer will still be alive after userspace
    application releases or unpins the map. It will also lead to a kmemleak
    on old kernel versions which don't release the bpf_timer when the map is
    released.
    
    bpf program X:
    
    bpf_timer_init()
      lock timer->lock
        read timer->timer as NULL
        read map->usercnt != 0
    
                    process Y:
    
                    close(map_fd)
                      // put last uref
                      bpf_map_put_uref()
                        atomic_dec_and_test(map->usercnt)
                          array_map_free_timers()
                            bpf_timer_cancel_and_free()
                              // just return
                              read timer->timer is NULL
    
        t = bpf_map_kmalloc_node()
        timer->timer = t
      unlock timer->lock
    
    Fix the problem by checking map->usercnt after timer->timer is assigned,
    so when there are concurrent uref release and bpf timer init, either
    bpf_timer_cancel_and_free() from uref release reads a non-NULL timer
    or the newly-added atomic64_read() returns a zero usercnt.
    
    Because atomic_dec_and_test(map->usercnt) and READ_ONCE(timer->timer)
    in bpf_timer_cancel_and_free() are not protected by a lock, add
    a memory barrier to guarantee the order between map->usercnt and
    timer->timer. Also use WRITE_ONCE(timer->timer, x) to match the lockless
    read of timer->timer in bpf_timer_cancel_and_free().
    
    Reported-by: Hsin-Wei Hung <hsinweih@uci.edu>
    Closes: https://lore.kernel.org/bpf/CABcoxUaT2k9hWsS1tNgXyoU3E-=PuOgMn737qK984fbFmfYixQ@mail.gmail.com
    Fixes: b00628b1c7d5 ("bpf: Introduce bpf timers.")
    Signed-off-by: Hou Tao <houtao1@huawei.com>
    Link: https://lore.kernel.org/r/20231030063616.1653024-1-houtao@huaweicloud.com
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Signed-off-by: Artem Savkov <asavkov@redhat.com>
2024-03-27 10:27:58 +01:00
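
A hedged sketch of the fixed ordering inside bpf_timer_init(), with the surrounding locking and error handling simplified; names follow the commit message:

    /* Publish timer->timer first, then re-check the user refcount. Paired
     * with the uref release path, either it observes a non-NULL
     * timer->timer or we observe usercnt == 0 and undo the init.
     */
    WRITE_ONCE(timer->timer, t);
    smp_mb();
    if (!atomic64_read(&map->usercnt)) {
            WRITE_ONCE(timer->timer, NULL);
            kfree(t);
            ret = -EPERM;   /* userspace already released the map */
    }
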
Artem Savkov 114d586292 bpf: fix compilation error without CGROUPS
JIRA: https://issues.redhat.com/browse/RHEL-23643

Conflicts: missing vma_task iterator

commit 05670f81d1287c40ec861186e4c4e3401013e7fb
Author: Matthieu Baerts <matttbe@kernel.org>
Date:   Wed Nov 1 19:16:01 2023 +0100

    bpf: fix compilation error without CGROUPS

    Our MPTCP CI complained [1] -- and KBuild too -- that it was no longer
    possible to build the kernel without CONFIG_CGROUPS:

      kernel/bpf/task_iter.c: In function 'bpf_iter_css_task_new':
      kernel/bpf/task_iter.c:919:14: error: 'CSS_TASK_ITER_PROCS' undeclared (first use in this function)
        919 |         case CSS_TASK_ITER_PROCS | CSS_TASK_ITER_THREADED:
            |              ^~~~~~~~~~~~~~~~~~~
      kernel/bpf/task_iter.c:919:14: note: each undeclared identifier is reported only once for each function it appears in
      kernel/bpf/task_iter.c:919:36: error: 'CSS_TASK_ITER_THREADED' undeclared (first use in this function)
        919 |         case CSS_TASK_ITER_PROCS | CSS_TASK_ITER_THREADED:
            |                                    ^~~~~~~~~~~~~~~~~~~~~~
      kernel/bpf/task_iter.c:927:60: error: invalid application of 'sizeof' to incomplete type 'struct css_task_iter'
        927 |         kit->css_it = bpf_mem_alloc(&bpf_global_ma, sizeof(struct css_task_iter));
            |                                                            ^~~~~~
      kernel/bpf/task_iter.c:930:9: error: implicit declaration of function 'css_task_iter_start'; did you mean 'task_seq_start'? [-Werror=implicit-function-declaration]
        930 |         css_task_iter_start(css, flags, kit->css_it);
            |         ^~~~~~~~~~~~~~~~~~~
            |         task_seq_start
      kernel/bpf/task_iter.c: In function 'bpf_iter_css_task_next':
      kernel/bpf/task_iter.c:940:16: error: implicit declaration of function 'css_task_iter_next'; did you mean 'class_dev_iter_next'? [-Werror=implicit-function-declaration]
        940 |         return css_task_iter_next(kit->css_it);
            |                ^~~~~~~~~~~~~~~~~~
            |                class_dev_iter_next
      kernel/bpf/task_iter.c:940:16: error: returning 'int' from a function with return type 'struct task_struct *' makes pointer from integer without a cast [-Werror=int-conversion]
        940 |         return css_task_iter_next(kit->css_it);
            |                ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      kernel/bpf/task_iter.c: In function 'bpf_iter_css_task_destroy':
      kernel/bpf/task_iter.c:949:9: error: implicit declaration of function 'css_task_iter_end' [-Werror=implicit-function-declaration]
        949 |         css_task_iter_end(kit->css_it);
            |         ^~~~~~~~~~~~~~~~~

    This patch simply surrounds the new code requiring CGroups support with
    an #ifdef. It seems enough for the compiler and this is similar to
    bpf_iter_css_{new,next,destroy}() functions where no other #ifdef have
    been added in kernel/bpf/helpers.c and in the selftests.

    Fixes: 9c66dc94b62a ("bpf: Introduce css_task open-coded iterator kfuncs")
    Link: https://github.com/multipath-tcp/mptcp_net-next/actions/runs/6665206927
    Reported-by: kernel test robot <lkp@intel.com>
    Closes: https://lore.kernel.org/oe-kbuild-all/202310260528.aHWgVFqq-lkp@intel.com/
    Signed-off-by: Matthieu Baerts <matttbe@kernel.org>
    [ added missing ifdefs for BTF_ID cgroup definitions ]
    Signed-off-by: Jiri Olsa <jolsa@kernel.org>
    Link: https://lore.kernel.org/r/20231101181601.1493271-1-jolsa@kernel.org
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Signed-off-by: Artem Savkov <asavkov@redhat.com>
2024-03-27 10:27:57 +01:00
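
A sketch of the shape of the fix; the function bodies are elided:

    #ifdef CONFIG_CGROUPS
    __bpf_kfunc int bpf_iter_css_task_new(struct bpf_iter_css_task *it,
                                          struct cgroup_subsys_state *css,
                                          unsigned int flags)
    {
            /* ... unchanged implementation ... */
    }
    /* bpf_iter_css_task_next() and bpf_iter_css_task_destroy() likewise,
     * plus the matching #ifdef around the BTF_ID cgroup definitions.
     */
    #endif /* CONFIG_CGROUPS */
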
Artem Savkov b703028f69 bpf: Use bpf_global_percpu_ma for per-cpu kptr in __bpf_obj_drop_impl()
JIRA: https://issues.redhat.com/browse/RHEL-23643

commit e383a45902337356d9ccad797094a27c6b2150f9
Author: Hou Tao <houtao1@huawei.com>
Date:   Fri Oct 20 21:32:01 2023 +0800

    bpf: Use bpf_global_percpu_ma for per-cpu kptr in __bpf_obj_drop_impl()
    
    The following warning was reported when running "./test_progs -t
    test_bpf_ma/percpu_free_through_map_free":
    
      ------------[ cut here ]------------
      WARNING: CPU: 1 PID: 68 at kernel/bpf/memalloc.c:342
      CPU: 1 PID: 68 Comm: kworker/u16:2 Not tainted 6.6.0-rc2+ #222
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996)
      Workqueue: events_unbound bpf_map_free_deferred
      RIP: 0010:bpf_mem_refill+0x21c/0x2a0
      ......
      Call Trace:
       <IRQ>
       ? bpf_mem_refill+0x21c/0x2a0
       irq_work_single+0x27/0x70
       irq_work_run_list+0x2a/0x40
       irq_work_run+0x18/0x40
       __sysvec_irq_work+0x1c/0xc0
       sysvec_irq_work+0x73/0x90
       </IRQ>
       <TASK>
       asm_sysvec_irq_work+0x1b/0x20
      RIP: 0010:unit_free+0x50/0x80
       ......
       bpf_mem_free+0x46/0x60
       __bpf_obj_drop_impl+0x40/0x90
       bpf_obj_free_fields+0x17d/0x1a0
       array_map_free+0x6b/0x170
       bpf_map_free_deferred+0x54/0xa0
       process_scheduled_works+0xba/0x370
       worker_thread+0x16d/0x2e0
       kthread+0x105/0x140
       ret_from_fork+0x39/0x60
       ret_from_fork_asm+0x1b/0x30
       </TASK>
      ---[ end trace 0000000000000000 ]---
    
    The reason is simple: __bpf_obj_drop_impl() does not know the freeing
    field is a per-cpu pointer and it uses bpf_global_ma to free the
    pointer. Because bpf_global_ma is not a per-cpu allocator, ksize() is
    used to select the corresponding cache. The bpf_mem_cache with 16-bytes
    unit_size will always be selected to do the unmatched free and it will
    trigger the warning in free_bulk() eventually.
    
    Because per-cpu kptrs don't support lists or rb-trees yet, fix the
    problem by only checking whether or not the type of the kptr is per-cpu in
    bpf_obj_free_fields(), and using bpf_global_percpu_ma to free these kptrs.
    
    Signed-off-by: Hou Tao <houtao1@huawei.com>
    Link: https://lore.kernel.org/r/20231020133202.4043247-7-houtao@huaweicloud.com
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Signed-off-by: Artem Savkov <asavkov@redhat.com>
2024-03-27 10:27:55 +01:00
Artem Savkov 51fc10f0a1 bpf: Move the declaration of __bpf_obj_drop_impl() to bpf.h
JIRA: https://issues.redhat.com/browse/RHEL-23643

commit e581a3461de3f129cfe888a67d9f31086328271f
Author: Hou Tao <houtao1@huawei.com>
Date:   Fri Oct 20 21:32:00 2023 +0800

    bpf: Move the declaration of __bpf_obj_drop_impl() to bpf.h
    
    Both syscall.c and helpers.c have the declaration of
    __bpf_obj_drop_impl(), so just move it to a common header file.
    
    Signed-off-by: Hou Tao <houtao1@huawei.com>
    Link: https://lore.kernel.org/r/20231020133202.4043247-6-houtao@huaweicloud.com
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Signed-off-by: Artem Savkov <asavkov@redhat.com>
2024-03-27 10:27:55 +01:00
Artem Savkov b1b6578b4f bpf: teach the verifier to enforce css_iter and task_iter in RCU CS
JIRA: https://issues.redhat.com/browse/RHEL-23643

commit dfab99df147b0d364f0c199f832ff2aedfb2265a
Author: Chuyi Zhou <zhouchuyi@bytedance.com>
Date:   Wed Oct 18 14:17:43 2023 +0800

    bpf: teach the verifier to enforce css_iter and task_iter in RCU CS
    
    css_iter and task_iter should be used in an RCU section. Specifically, in
    sleepable progs an explicit bpf_rcu_read_lock() is needed before using these
    iters. In normal bpf progs that have implicit rcu_read_lock(), it's OK to
    use them directly.
    
    This patch adds a new a KF flag KF_RCU_PROTECTED for bpf_iter_task_new and
    bpf_iter_css_new. It means the kfunc should be used in RCU CS. We check
    whether we are in rcu cs before we want to invoke this kfunc. If the rcu
    protection is guaranteed, we would let st->type = PTR_TO_STACK | MEM_RCU.
    Once the user does rcu_unlock during the iteration, the MEM_RCU state of regs would
    be cleared. is_iter_reg_valid_init() will reject if reg->type is UNTRUSTED.
    
    It is worth noting that currently, bpf_rcu_read_unlock does not
    clear the state of the STACK_ITER reg, since bpf_for_each_spilled_reg
    only considers STACK_SPILL. This patch also lets bpf_for_each_spilled_reg
    search STACK_ITER.
    
    Signed-off-by: Chuyi Zhou <zhouchuyi@bytedance.com>
    Acked-by: Andrii Nakryiko <andrii@kernel.org>
    Link: https://lore.kernel.org/r/20231018061746.111364-6-zhouchuyi@bytedance.com
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Signed-off-by: Artem Savkov <asavkov@redhat.com>
2024-03-27 10:27:54 +01:00
Artem Savkov 3fd10bb078 bpf: Introduce css open-coded iterator kfuncs
JIRA: https://issues.redhat.com/browse/RHEL-23643

commit 7251d0905e7518bcb990c8e9a3615b1bb23c78f2
Author: Chuyi Zhou <zhouchuyi@bytedance.com>
Date:   Wed Oct 18 14:17:42 2023 +0800

    bpf: Introduce css open-coded iterator kfuncs
    
    This patch adds kfuncs bpf_iter_css_{new,next,destroy} which allow
    creation and manipulation of struct bpf_iter_css in open-coded iterator
    style. These kfuncs actually wrap css_next_descendant_{pre, post}.
    css_iter can be used to:
    
    1) iterating a specific cgroup tree in pre/post/up order
    
    2) iterating a cgroup_subsystem in a BPF prog, like
    for_each_mem_cgroup_tree/cpuset_for_each_descendant_pre in the kernel.
    
    The API design is consistent with cgroup_iter. bpf_iter_css_new accepts
    parameters defining iteration order and starting css. Here we also reuse
    BPF_CGROUP_ITER_DESCENDANTS_PRE, BPF_CGROUP_ITER_DESCENDANTS_POST,
    BPF_CGROUP_ITER_ANCESTORS_UP enums.
    
    Signed-off-by: Chuyi Zhou <zhouchuyi@bytedance.com>
    Acked-by: Tejun Heo <tj@kernel.org>
    Link: https://lore.kernel.org/r/20231018061746.111364-5-zhouchuyi@bytedance.com
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Signed-off-by: Artem Savkov <asavkov@redhat.com>
2024-03-27 10:27:54 +01:00
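
A minimal usage sketch (not part of the patch): walk a cgroup subtree in pre-order from a sleepable LSM hook, so per the RCU-enforcement commit listed above the loop sits inside an explicit RCU read-side section. It assumes the bpf_experimental.h header from the kernel selftests; the hook and cgroup id are illustrative.

    #include "vmlinux.h"
    #include <bpf/bpf_helpers.h>
    #include <bpf/bpf_tracing.h>
    #include "bpf_experimental.h"

    char _license[] SEC("license") = "GPL";

    struct cgroup *bpf_cgroup_from_id(u64 cgid) __ksym;
    void bpf_cgroup_release(struct cgroup *cgrp) __ksym;
    void bpf_rcu_read_lock(void) __ksym;
    void bpf_rcu_read_unlock(void) __ksym;

    SEC("lsm.s/bpf")
    int BPF_PROG(count_descendants, int cmd)
    {
            struct cgroup_subsys_state *pos;
            struct cgroup *cgrp;
            int n = 0;

            cgrp = bpf_cgroup_from_id(1);   /* cgroup2 root, if mounted */
            if (!cgrp)
                    return 0;

            bpf_rcu_read_lock();
            bpf_for_each(css, pos, &cgrp->self, BPF_CGROUP_ITER_DESCENDANTS_PRE)
                    n++;
            bpf_rcu_read_unlock();

            bpf_cgroup_release(cgrp);
            bpf_printk("%d descendant cgroups", n);
            return 0;
    }
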
Artem Savkov d68171a511 bpf: Introduce task open coded iterator kfuncs
JIRA: https://issues.redhat.com/browse/RHEL-23643

commit c68a78ffe2cb4207f64fd0f4262818c728c67be0
Author: Chuyi Zhou <zhouchuyi@bytedance.com>
Date:   Wed Oct 18 14:17:41 2023 +0800

    bpf: Introduce task open coded iterator kfuncs
    
    This patch adds kfuncs bpf_iter_task_{new,next,destroy} which allow
    creation and manipulation of struct bpf_iter_task in open-coded iterator
    style. BPF programs can use these kfuncs directly or through the
    bpf_for_each macro to iterate over all processes in the system.
    
    The API design is kept consistent with SEC("iter/task"). bpf_iter_task_new()
    accepts a specific task and an iteration type, which allows:
    
    1. iterating all processes in the system (BPF_TASK_ITER_ALL_PROCS)
    
    2. iterating all threads in the system (BPF_TASK_ITER_ALL_THREADS)
    
    3. iterating all threads of a specific task (BPF_TASK_ITER_PROC_THREADS)
    
    Signed-off-by: Chuyi Zhou <zhouchuyi@bytedance.com>
    Link: https://lore.kernel.org/r/20231018061746.111364-4-zhouchuyi@bytedance.com
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Signed-off-by: Artem Savkov <asavkov@redhat.com>
2024-03-27 10:27:54 +01:00
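
A minimal usage sketch (not part of the patch): count the threads of the current process with BPF_TASK_ITER_PROC_THREADS. It assumes bpf_experimental.h from the kernel selftests and that the iteration-type constants are visible via vmlinux.h; the attach point is illustrative. Since the program is non-sleepable it is already in an implicit RCU read-side section; a sleepable program would wrap the loop in bpf_rcu_read_lock()/unlock() as described in the enforcement commit above.

    #include "vmlinux.h"
    #include <bpf/bpf_helpers.h>
    #include "bpf_experimental.h"

    char _license[] SEC("license") = "GPL";

    SEC("raw_tp/sys_enter")
    int count_threads(void *ctx)
    {
            struct task_struct *cur = bpf_get_current_task_btf();
            struct task_struct *pos;
            int n = 0;

            /* implicit RCU protection in a non-sleepable program */
            bpf_for_each(task, pos, cur, BPF_TASK_ITER_PROC_THREADS)
                    n++;

            bpf_printk("current process has %d threads", n);
            return 0;
    }
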
Artem Savkov 5d0136ef81 bpf: Introduce css_task open-coded iterator kfuncs
JIRA: https://issues.redhat.com/browse/RHEL-23643

Conflicts: missing task_vma iterator

commit 9c66dc94b62aef23300f05f63404afb8990920b4
Author: Chuyi Zhou <zhouchuyi@bytedance.com>
Date:   Wed Oct 18 14:17:40 2023 +0800

    bpf: Introduce css_task open-coded iterator kfuncs

    This patch adds kfuncs bpf_iter_css_task_{new,next,destroy} which allow
    creation and manipulation of struct bpf_iter_css_task in open-coded
    iterator style. These kfuncs actually wrap css_task_iter_{start,next,
    end}. BPF programs can use these kfuncs through bpf_for_each macro for
    iteration of all tasks under a css.

    css_task_iter_*() would try to get the global spin-lock *css_set_lock*, so
    the bpf side has to be careful about where it allows this iter to be used.
    Currently we only allow it in bpf_lsm and bpf iter-s.

    Signed-off-by: Chuyi Zhou <zhouchuyi@bytedance.com>
    Acked-by: Tejun Heo <tj@kernel.org>
    Link: https://lore.kernel.org/r/20231018061746.111364-3-zhouchuyi@bytedance.com
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Signed-off-by: Artem Savkov <asavkov@redhat.com>
2024-03-27 10:27:54 +01:00
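
A minimal usage sketch (not part of the patch) from one of the currently allowed program types (LSM); assumes bpf_experimental.h from the kernel selftests, and the hook and cgroup id are illustrative.

    #include "vmlinux.h"
    #include <bpf/bpf_helpers.h>
    #include <bpf/bpf_tracing.h>
    #include "bpf_experimental.h"

    char _license[] SEC("license") = "GPL";

    struct cgroup *bpf_cgroup_from_id(u64 cgid) __ksym;
    void bpf_cgroup_release(struct cgroup *cgrp) __ksym;

    SEC("lsm/bpf")
    int BPF_PROG(count_css_tasks, int cmd)
    {
            struct task_struct *task;
            struct cgroup *cgrp;
            int n = 0;

            cgrp = bpf_cgroup_from_id(1);
            if (!cgrp)
                    return 0;

            /* walks every process attached to this css; takes css_set_lock
             * internally, hence the program-type restriction noted above
             */
            bpf_for_each(css_task, task, &cgrp->self, CSS_TASK_ITER_PROCS)
                    n++;

            bpf_cgroup_release(cgrp);
            return 0;
    }
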
Artem Savkov 256ea91747 bpf: Add ability to pin bpf timer to calling CPU
JIRA: https://issues.redhat.com/browse/RHEL-23643

commit d6247ecb6c1e17d7a33317090627f5bfe563cbb2
Author: David Vernet <void@manifault.com>
Date:   Wed Oct 4 11:23:38 2023 -0500

    bpf: Add ability to pin bpf timer to calling CPU
    
    BPF supports creating high resolution timers using bpf_timer_* helper
    functions. Currently, only the BPF_F_TIMER_ABS flag is supported, which
    specifies that the timeout should be interpreted as absolute time. It
    would also be useful to be able to pin that timer to a core. For
    example, if you wanted to make a subset of cores run without timer
    interrupts, and only have the timer be invoked on a single core.
    
    This patch adds support for this with a new BPF_F_TIMER_CPU_PIN flag.
    When specified, the HRTIMER_MODE_PINNED flag is passed to
    hrtimer_start(). A subsequent patch will update selftests to validate.
    
    Signed-off-by: David Vernet <void@manifault.com>
    Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    Acked-by: Song Liu <song@kernel.org>
    Acked-by: Hou Tao <houtao1@huawei.com>
    Link: https://lore.kernel.org/bpf/20231004162339.200702-2-void@manifault.com

Signed-off-by: Artem Savkov <asavkov@redhat.com>
2024-03-27 10:27:52 +01:00
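
A minimal usage sketch (not part of the patch): initialize a map-stored bpf_timer and start it pinned to the calling CPU; the map layout, attach point, and timeout are illustrative.

    #include "vmlinux.h"
    #include <bpf/bpf_helpers.h>

    char _license[] SEC("license") = "GPL";

    struct elem {
            struct bpf_timer t;
    };

    struct {
            __uint(type, BPF_MAP_TYPE_ARRAY);
            __uint(max_entries, 1);
            __type(key, int);
            __type(value, struct elem);
    } timer_map SEC(".maps");

    static int timer_cb(void *map, int *key, struct elem *val)
    {
            return 0;       /* one-shot: do not re-arm */
    }

    SEC("tracepoint/syscalls/sys_enter_getpid")
    int arm_pinned_timer(void *ctx)
    {
            int key = 0;
            struct elem *val = bpf_map_lookup_elem(&timer_map, &key);

            if (!val)
                    return 0;
            bpf_timer_init(&val->t, &timer_map, 1 /* CLOCK_MONOTONIC */);
            bpf_timer_set_callback(&val->t, timer_cb);
            /* BPF_F_TIMER_CPU_PIN maps to HRTIMER_MODE_PINNED: the callback
             * fires on the CPU that armed the timer
             */
            bpf_timer_start(&val->t, 1000000 /* 1 ms */, BPF_F_TIMER_CPU_PIN);
            return 0;
    }
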
Artem Savkov 2472b5c30c bpf: Fix bpf_throw warning on 32-bit arch
JIRA: https://issues.redhat.com/browse/RHEL-23643

commit 7d3460632da2c2ad5c5708db82a0b72e2b66396c
Author: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Date:   Mon Sep 18 17:52:32 2023 +0200

    bpf: Fix bpf_throw warning on 32-bit arch
    
    On 32-bit architectures the pointer width is 32 bits, so when we try to
    cast from a u64 down to it, the compiler complains about the mismatch in
    integer size. Fix this by first casting to long, which should match
    the pointer width on targets supported by Linux.
    
    Fixes: ec5290a178b7 ("bpf: Prevent KASAN false positive with bpf_throw")
    Reported-by: Matthieu Baerts <matthieu.baerts@tessares.net>
    Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
    Tested-by: Matthieu Baerts <matthieu.baerts@tessares.net>
    Link: https://lore.kernel.org/r/20230918155233.297024-3-memxor@gmail.com
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Signed-off-by: Artem Savkov <asavkov@redhat.com>
2024-03-27 10:27:48 +01:00
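
A tiny sketch of the pattern the fix applies, here to an arbitrary u64 value:

    static void *u64_to_ptr(u64 val)
    {
            /* (void *)val truncates and warns on 32-bit targets; going
             * through long matches the pointer width everywhere Linux runs
             */
            return (void *)(long)val;
    }
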
Artem Savkov 6fbd8ebd74 bpf: Disallow fentry/fexit/freplace for exception callbacks
JIRA: https://issues.redhat.com/browse/RHEL-23643

commit fd548e1a46185000191a89cae4be560e076ed6c7
Author: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Date:   Wed Sep 13 01:32:09 2023 +0200

    bpf: Disallow fentry/fexit/freplace for exception callbacks
    
    During testing, it was discovered that extensions to exception callbacks
    had no checks; upon running a testcase, the kernel ended up running off
    the end of a program whose final call was bpf_throw, and hitting int3
    instructions.
    
    The reason is that while the default exception callback would have reset
    the stack frame to return back to the main program's caller, the
    replacing extension program will simply return back to bpf_throw, which
    will instead return back to the program and the program will continue
    execution, now in an undefined state where anything could happen.
    
    The way to support extensions to an exception callback would be to mark
    the BPF_PROG_TYPE_EXT main subprog as an exception_cb, and prevent it
    from calling bpf_throw. This would make the JIT produce a prologue that
    restores saved registers and reset the stack frame. But let's not do
    that until there is a concrete use case for this, and simply disallow
    this for now.
    
    Similar issues will exist for fentry and fexit cases, where the trampoline
    saves data on the stack when invoking the exception callback, which however
    will then end up resetting the stack frame, and on return, the fexit
    program will never be invoked as the return address points to the main
    program's caller in the kernel. Instead of additional complexity and
    back and forth between the two stacks to enable such a use case, simply
    forbid it.
    
    One key point here to note is that currently X86_TAIL_CALL_OFFSET didn't
    require any modifications, even though we emit instructions before the
    corresponding endbr64 instruction. This is because we ensure that a main
    subprog never serves as an exception callback, and therefore the
    exception callback (which will be a global subprog) can never serve as
    the tail call target, eliminating any discrepancies. However, once we
    support a BPF_PROG_TYPE_EXT to also act as an exception callback, it
    will end up requiring change to the tail call offset to account for the
    extra instructions. For simplicity, tail calls could be disabled for
    such targets.
    
    Noting the above, it appears better to wait for a concrete use case
    before choosing to permit extension programs to replace exception
    callbacks.
    
    As a precaution, we disable fentry and fexit for exception callbacks as
    well.
    
    Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
    Link: https://lore.kernel.org/r/20230912233214.1518551-13-memxor@gmail.com
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Signed-off-by: Artem Savkov <asavkov@redhat.com>
2024-03-27 10:27:47 +01:00
Artem Savkov b814451f16 bpf: Prevent KASAN false positive with bpf_throw
JIRA: https://issues.redhat.com/browse/RHEL-23643

commit ec5290a178b787b2f8b21581fdadc919bd004e12
Author: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Date:   Wed Sep 13 01:32:07 2023 +0200

    bpf: Prevent KASAN false positive with bpf_throw
    
    The KASAN stack instrumentation when CONFIG_KASAN_STACK is true poisons
    the stack of a function when it is entered and unpoisons it when
    leaving. However, in the case of bpf_throw, we will never return as we
    switch our stack frame to the BPF exception callback. Later, this
    discrepancy will lead to confusing KASAN splats when kernel resumes
    execution on return from the BPF program.
    
    Fix this by unpoisoning everything below the stack pointer of the BPF
    program, which should cover the range that would not be unpoisoned. An
    example splat is below:
    
    BUG: KASAN: stack-out-of-bounds in stack_trace_consume_entry+0x14e/0x170
    Write of size 8 at addr ffffc900013af958 by task test_progs/227
    
    CPU: 0 PID: 227 Comm: test_progs Not tainted 6.5.0-rc2-g43f1c6c9052a-dirty #26
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-2.fc39 04/01/2014
    Call Trace:
     <TASK>
     dump_stack_lvl+0x4a/0x80
     print_report+0xcf/0x670
     ? arch_stack_walk+0x79/0x100
     kasan_report+0xda/0x110
     ? stack_trace_consume_entry+0x14e/0x170
     ? stack_trace_consume_entry+0x14e/0x170
     ? __pfx_stack_trace_consume_entry+0x10/0x10
     stack_trace_consume_entry+0x14e/0x170
     ? __sys_bpf+0xf2e/0x41b0
     arch_stack_walk+0x8b/0x100
     ? __sys_bpf+0xf2e/0x41b0
     ? bpf_prog_test_run_skb+0x341/0x1c70
     ? bpf_prog_test_run_skb+0x341/0x1c70
     stack_trace_save+0x9b/0xd0
     ? __pfx_stack_trace_save+0x10/0x10
     ? __kasan_slab_free+0x109/0x180
     ? bpf_prog_test_run_skb+0x341/0x1c70
     ? __sys_bpf+0xf2e/0x41b0
     ? __x64_sys_bpf+0x78/0xc0
     ? do_syscall_64+0x3c/0x90
     ? entry_SYSCALL_64_after_hwframe+0x6e/0xd8
     kasan_save_stack+0x33/0x60
     ? kasan_save_stack+0x33/0x60
     ? kasan_set_track+0x25/0x30
     ? kasan_save_free_info+0x2b/0x50
     ? __kasan_slab_free+0x109/0x180
     ? kmem_cache_free+0x191/0x460
     ? bpf_prog_test_run_skb+0x341/0x1c70
     kasan_set_track+0x25/0x30
     kasan_save_free_info+0x2b/0x50
     __kasan_slab_free+0x109/0x180
     kmem_cache_free+0x191/0x460
     bpf_prog_test_run_skb+0x341/0x1c70
     ? __pfx_bpf_prog_test_run_skb+0x10/0x10
     ? __fget_light+0x51/0x220
     __sys_bpf+0xf2e/0x41b0
     ? __might_fault+0xa2/0x170
     ? __pfx___sys_bpf+0x10/0x10
     ? lock_release+0x1de/0x620
     ? __might_fault+0xcd/0x170
     ? __pfx_lock_release+0x10/0x10
     ? __pfx_blkcg_maybe_throttle_current+0x10/0x10
     __x64_sys_bpf+0x78/0xc0
     ? syscall_enter_from_user_mode+0x20/0x50
     do_syscall_64+0x3c/0x90
     entry_SYSCALL_64_after_hwframe+0x6e/0xd8
    RIP: 0033:0x7f0fbb38880d
    Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d
    89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d f3 45 12 00 f7 d8 64
    89 01 48
    RSP: 002b:00007ffe13907de8 EFLAGS: 00000206 ORIG_RAX: 0000000000000141
    RAX: ffffffffffffffda RBX: 00007ffe13908708 RCX: 00007f0fbb38880d
    RDX: 0000000000000050 RSI: 00007ffe13907e20 RDI: 000000000000000a
    RBP: 00007ffe13907e00 R08: 0000000000000000 R09: 00007ffe13907e20
    R10: 0000000000000064 R11: 0000000000000206 R12: 0000000000000003
    R13: 0000000000000000 R14: 00007f0fbb532000 R15: 0000000000cfbd90
     </TASK>
    
    The buggy address belongs to stack of task test_progs/227
    KASAN internal error: frame info validation failed; invalid marker: 0
    
    The buggy address belongs to the virtual mapping at
     [ffffc900013a8000, ffffc900013b1000) created by:
     kernel_clone+0xcd/0x600
    
    The buggy address belongs to the physical page:
    page:00000000b70f4332 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x11418f
    flags: 0x2fffe0000000000(node=0|zone=2|lastcpupid=0x7fff)
    page_type: 0xffffffff()
    raw: 02fffe0000000000 0000000000000000 dead000000000122 0000000000000000
    raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
    page dumped because: kasan: bad access detected
    
    Memory state around the buggy address:
     ffffc900013af800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
     ffffc900013af880: 00 00 00 f1 f1 f1 f1 00 00 00 f3 f3 f3 f3 f3 00
    >ffffc900013af900: 00 00 00 00 00 00 00 00 00 00 00 f1 00 00 00 00
                                                        ^
     ffffc900013af980: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
     ffffc900013afa00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    ==================================================================
    Disabling lock debugging due to kernel taint
    
    Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
    Cc: Alexander Potapenko <glider@google.com>
    Cc: Andrey Konovalov <andreyknvl@gmail.com>
    Cc: Dmitry Vyukov <dvyukov@google.com>
    Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
    Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
    Acked-by: Andrey Konovalov <andreyknvl@gmail.com>
    Link: https://lore.kernel.org/r/20230912233214.1518551-11-memxor@gmail.com
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Signed-off-by: Artem Savkov <asavkov@redhat.com>
2024-03-27 10:27:47 +01:00
Artem Savkov bd01d7114b bpf: Implement BPF exceptions
JIRA: https://issues.redhat.com/browse/RHEL-23643

commit f18b03fabaa9b7c80e80b72a621f481f0d706ae0
Author: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Date:   Wed Sep 13 01:32:01 2023 +0200

    bpf: Implement BPF exceptions
    
    This patch implements BPF exceptions, and introduces a bpf_throw kfunc
    to allow programs to throw exceptions during their execution at runtime.
    A bpf_throw invocation is treated as an immediate termination of the
    program, returning back to its caller within the kernel, unwinding all
    stack frames.
    
    This allows the program to simplify its implementation, by testing for
    runtime conditions which the verifier has no visibility into, and assert
    that they are true. In case they are not, the program can simply throw
    an exception from the other branch.
    
    BPF exceptions are explicitly *NOT* an unlikely slowpath error handling
    primitive, and this objective has guided design choices of the
    implementation of them within the kernel (with the bulk of the cost
    for unwinding the stack offloaded to the bpf_throw kfunc).
    
    The implementation of this mechanism requires use of add_hidden_subprog
    mechanism introduced in the previous patch, which generates a couple of
    instructions to move R1 to R0 and exit. The JIT then rewrites the
    prologue of this subprog to take the stack pointer and frame pointer as
    inputs and reset the stack frame, popping all callee-saved registers
    saved by the main subprog. The bpf_throw function then walks the stack
    at runtime, and invokes this exception subprog with the stack and frame
    pointers as parameters.
    
    Reviewers must take note that currently the main program is made to save
    all callee-saved registers on x86_64 during entry into the program. This
    is because we must do an equivalent of a lightweight context switch when
    unwinding the stack, therefore we need the callee-saved registers of the
    caller of the BPF program to be able to return with a sane state.
    
    Note that we have to additionally handle r12, even though it is not used
    by the program, because when throwing the exception the program makes an
    entry into the kernel which could clobber r12 after saving it on the
    stack. To be able to preserve the value we received on program entry, we
    push r12 and restore it from the generated subprogram when unwinding the
    stack.
    
    For now, bpf_throw invocation fails when lingering resources or locks
    exist in that path of the program. In a future followup, bpf_throw will
    be extended to perform frame-by-frame unwinding to release lingering
    resources for each stack frame, removing this limitation.
    
    Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
    Link: https://lore.kernel.org/r/20230912233214.1518551-5-memxor@gmail.com
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Signed-off-by: Artem Savkov <asavkov@redhat.com>
2024-03-27 10:27:47 +01:00
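
A minimal usage sketch (not part of the patch), assuming bpf_experimental.h from the kernel selftests, which declares bpf_throw(); the condition is illustrative.

    #include "vmlinux.h"
    #include <bpf/bpf_helpers.h>
    #include "bpf_experimental.h"

    char _license[] SEC("license") = "GPL";

    SEC("tc")
    int drop_oversized(struct __sk_buff *skb)
    {
            /* A runtime condition the verifier cannot prove; throwing here
             * terminates the program and unwinds all frames. The default
             * exception callback simply returns the cookie passed to
             * bpf_throw (the R1 -> R0 move described above).
             */
            if (skb->len > 1500)
                    bpf_throw(0);
            return 1;
    }
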
Artem Savkov 2a8861d267 bpf: Add alloc/xchg/direct_access support for local percpu kptr
JIRA: https://issues.redhat.com/browse/RHEL-23643

commit 36d8bdf75a93190e5669b9d1d95994e13e15ba1d
Author: Yonghong Song <yonghong.song@linux.dev>
Date:   Sun Aug 27 08:27:44 2023 -0700

    bpf: Add alloc/xchg/direct_access support for local percpu kptr
    
    Add two new kfunc's, bpf_percpu_obj_new_impl() and
    bpf_percpu_obj_drop_impl(), to allocate a percpu obj.
    Two functions are very similar to bpf_obj_new_impl()
    and bpf_obj_drop_impl(). The major difference is related
    to percpu handling.
    
        bpf_rcu_read_lock()
        struct val_t __percpu_kptr *v = map_val->percpu_data;
        ...
        bpf_rcu_read_unlock()
    
    For a percpu data map_val like above 'v', the reg->type
    is set as
    	PTR_TO_BTF_ID | MEM_PERCPU | MEM_RCU
    if inside rcu critical section.
    
    MEM_RCU marking here is similar to NON_OWN_REF as 'v'
    is not an owning reference. But NON_OWN_REF is
    trusted and typically inside the spinlock while
    MEM_RCU is under rcu read lock. RCU is preferred here
    since percpu data structures mean potential concurrent
    access into its contents.
    
    Also, bpf_percpu_obj_new_impl() is restricted such that
    no pointers or special fields are allowed. Therefore,
    the bpf_list_head and bpf_rb_root will not be supported
    in this patch set to avoid potential memory leak issue
    due to racing between bpf_obj_free_fields() and another
    bpf_kptr_xchg() moving an allocated object to
    bpf_list_head and bpf_rb_root.
    
    Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
    Link: https://lore.kernel.org/r/20230827152744.1996739-1-yonghong.song@linux.dev
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Signed-off-by: Artem Savkov <asavkov@redhat.com>
2024-03-27 10:27:45 +01:00
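
A minimal allocation/stash sketch (not part of the patch), assuming a bpf_helpers.h recent enough to provide the __percpu_kptr tag and the bpf_experimental.h header from the kernel selftests; reading the per-cpu copies would additionally use bpf_rcu_read_lock() plus bpf_this_cpu_ptr()/bpf_per_cpu_ptr() as described above. The map layout and attach point are illustrative.

    #include "vmlinux.h"
    #include <bpf/bpf_helpers.h>
    #include "bpf_experimental.h"

    char _license[] SEC("license") = "GPL";

    struct val_t {
            long cnt;
    };

    struct elem {
            struct val_t __percpu_kptr *pc;
    };

    struct {
            __uint(type, BPF_MAP_TYPE_ARRAY);
            __uint(max_entries, 1);
            __type(key, int);
            __type(value, struct elem);
    } pcpu_map SEC(".maps");

    SEC("tracepoint/syscalls/sys_enter_getpid")
    int alloc_percpu_obj(void *ctx)
    {
            struct val_t __percpu_kptr *p;
            int key = 0;
            struct elem *e = bpf_map_lookup_elem(&pcpu_map, &key);

            if (!e)
                    return 0;

            p = bpf_percpu_obj_new(struct val_t);
            if (!p)
                    return 0;

            /* stash it in the map value; drop the old object if we raced */
            p = bpf_kptr_xchg(&e->pc, p);
            if (p)
                    bpf_percpu_obj_drop(p);
            return 0;
    }
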
Jerome Marchand bb58c3e368 bpf: Allow bpf_spin_{lock,unlock} in sleepable progs
JIRA: https://issues.redhat.com/browse/RHEL-10691

commit 5861d1e8dbc4e1a03ebffb96ac041026cdd34c07
Author: Dave Marchevsky <davemarchevsky@fb.com>
Date:   Mon Aug 21 12:33:10 2023 -0700

    bpf: Allow bpf_spin_{lock,unlock} in sleepable progs

    Commit 9e7a4d9831 ("bpf: Allow LSM programs to use bpf spin locks")
    disabled bpf_spin_lock usage in sleepable progs, stating:

     Sleepable LSM programs can be preempted which means that allowing spin
     locks will need more work (disabling preemption and the verifier
     ensuring that no sleepable helpers are called when a spin lock is
     held).

    This patch disables preemption before grabbing bpf_spin_lock. The second
    requirement above "no sleepable helpers are called when a spin lock is
    held" is implicitly enforced by current verifier logic due to helper
    calls in spin_lock CS being disabled except for a few exceptions, none
    of which sleep.

    Due to above preemption changes, bpf_spin_lock CS can also be considered
    a RCU CS, so verifier's in_rcu_cs check is modified to account for this.

    Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
    Link: https://lore.kernel.org/r/20230821193311.3290257-7-davemarchevsky@fb.com
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
2023-12-15 09:29:02 +01:00
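
A minimal sketch (not part of the patch) of what this change makes legal: taking a map-value bpf_spin_lock from a sleepable LSM program. The hook and map layout are illustrative.

    #include "vmlinux.h"
    #include <bpf/bpf_helpers.h>
    #include <bpf/bpf_tracing.h>

    char _license[] SEC("license") = "GPL";

    struct val {
            struct bpf_spin_lock lock;
            long cnt;
    };

    struct {
            __uint(type, BPF_MAP_TYPE_ARRAY);
            __uint(max_entries, 1);
            __type(key, int);
            __type(value, struct val);
    } counter SEC(".maps");

    SEC("lsm.s/bpf")
    int BPF_PROG(count_bpf_calls, int cmd)
    {
            int key = 0;
            struct val *v = bpf_map_lookup_elem(&counter, &key);

            if (!v)
                    return 0;
            /* preemption is disabled for the duration of the critical section */
            bpf_spin_lock(&v->lock);
            v->cnt++;
            bpf_spin_unlock(&v->lock);
            return 0;
    }
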
Jerome Marchand d5d05aa5fc bpf: Use bpf_mem_free_rcu when bpf_obj_dropping refcounted nodes
JIRA: https://issues.redhat.com/browse/RHEL-10691

commit 7e26cd12ad1c8f3e55d32542c7e4708a9e6a3c02
Author: Dave Marchevsky <davemarchevsky@fb.com>
Date:   Mon Aug 21 12:33:07 2023 -0700

    bpf: Use bpf_mem_free_rcu when bpf_obj_dropping refcounted nodes

    This is the final fix for the use-after-free scenario described in
    commit 7793fc3babe9 ("bpf: Make bpf_refcount_acquire fallible for
    non-owning refs"). That commit, by virtue of changing
    bpf_refcount_acquire's refcount_inc to a refcount_inc_not_zero, fixed
    the "refcount incr on 0" splat. The not_zero check in
    refcount_inc_not_zero, though, still occurs on memory that could have
    been free'd and reused, so the commit didn't properly fix the root
    cause.

    This patch actually fixes the issue by free'ing using the recently-added
    bpf_mem_free_rcu, which ensures that the memory is not reused until
    RCU grace period has elapsed. If that has happened then
    there are no non-owning references alive that point to the
    recently-free'd memory, so it can be safely reused.

    Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
    Acked-by: Yonghong Song <yonghong.song@linux.dev>
    Link: https://lore.kernel.org/r/20230821193311.3290257-4-davemarchevsky@fb.com
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
2023-12-15 09:29:01 +01:00
Jerome Marchand d2e76f4553 bpf: Add 'owner' field to bpf_{list,rb}_node
JIRA: https://issues.redhat.com/browse/RHEL-10691

commit c3c510ce431cd99fa10dcd50d995c8e89330ee5b
Author: Dave Marchevsky <davemarchevsky@fb.com>
Date:   Tue Jul 18 01:38:10 2023 -0700

    bpf: Add 'owner' field to bpf_{list,rb}_node

    As described by Kumar in [0], in shared ownership scenarios it is
    necessary to do runtime tracking of {rb,list} node ownership - and
    synchronize updates using this ownership information - in order to
    prevent races. This patch adds an 'owner' field to struct bpf_list_node
    and bpf_rb_node to implement such runtime tracking.

    The owner field is a void * that describes the ownership state of a
    node. It can have the following values:

      NULL           - the node is not owned by any data structure
      BPF_PTR_POISON - the node is in the process of being added to a data
                       structure
      ptr_to_root    - the pointee is a data structure 'root'
                       (bpf_rb_root / bpf_list_head) which owns this node

    The field is initially NULL (set by bpf_obj_init_field default behavior)
    and transitions states in the following sequence:

      Insertion: NULL -> BPF_PTR_POISON -> ptr_to_root
      Removal:   ptr_to_root -> NULL

    Before a node has been successfully inserted, it is not protected by any
    root's lock, and therefore two programs can attempt to add the same node
    to different roots simultaneously. For this reason the intermediate
    BPF_PTR_POISON state is necessary. For removal, the node is protected
    by some root's lock so this intermediate hop isn't necessary.

    Note that bpf_list_pop_{front,back} helpers don't need to check owner
    before removing as the node-to-be-removed is not passed in as input and
    is instead taken directly from the list. Do the check anyways and
    WARN_ON_ONCE in this unexpected scenario.

    Selftest changes in this patch are entirely mechanical: some BTF
    tests have hardcoded struct sizes for structs that contain
    bpf_{list,rb}_node fields, those were adjusted to account for the new
    sizes. Selftest additions to validate the owner field are added in a
    further patch in the series.

      [0]: https://lore.kernel.org/bpf/d7hyspcow5wtjcmw4fugdgyp3fwhljwuscp3xyut5qnwivyeru@ysdq543otzv2

    Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
    Suggested-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
    Link: https://lore.kernel.org/r/20230718083813.3416104-4-davemarchevsky@fb.com
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
2023-12-15 09:28:52 +01:00
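
A hedged sketch of the insertion-side handshake described above; the helper name, locking comments, and error handling are illustrative.

    static int claim_node(struct bpf_list_node_kern *node, void *root)
    {
            /* NULL -> BPF_PTR_POISON: fail if another program already owns
             * the node or is inserting it into a different root
             */
            if (cmpxchg(&node->owner, NULL, BPF_PTR_POISON))
                    return -EINVAL;

            /* ... link the node while holding root's bpf_spin_lock ... */

            /* BPF_PTR_POISON -> ptr_to_root: insertion is now complete */
            WRITE_ONCE(node->owner, root);
            return 0;
    }
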
Jerome Marchand 5c3878aeda bpf: Introduce internal definitions for UAPI-opaque bpf_{rb,list}_node
JIRA: https://issues.redhat.com/browse/RHEL-10691

commit 0a1f7bfe35a3e1302529fa900bf0574a5dfc8ea6
Author: Dave Marchevsky <davemarchevsky@fb.com>
Date:   Tue Jul 18 01:38:09 2023 -0700

    bpf: Introduce internal definitions for UAPI-opaque bpf_{rb,list}_node

    Structs bpf_rb_node and bpf_list_node are opaquely defined in
    uapi/linux/bpf.h, as BPF program writers are not expected to touch their
    fields - nor does the verifier allow them to do so.

    Currently these structs are simple wrappers around structs rb_node and
    list_head, and the linked_list / rbtree implementation just casts and passes
    them to library functions for those data structures. Later patches in this
    series, though, will add an "owner" field to bpf_{rb,list}_node, such
    that they're not just wrapping an underlying node type. Moreover, the
    bpf linked_list and rbtree implementations will deal with these owner
    pointers directly in a few different places.

    To avoid having to do

      void *owner = (void*)bpf_list_node + sizeof(struct list_head)

    with opaque UAPI node types, add bpf_{list,rb}_node_kern struct
    definitions to internal headers and modify linked_list and rbtree to use
    the internal types where appropriate.

    Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
    Link: https://lore.kernel.org/r/20230718083813.3416104-3-davemarchevsky@fb.com
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
2023-12-15 09:28:52 +01:00
Viktor Malik 97afa4db94
bpf: Fix missed rcu read lock in bpf_task_under_cgroup()
JIRA: https://issues.redhat.com/browse/RHEL-9957

commit 29a7e00ffadddd8d68eff311de1bf12ae10687bb
Author: Yafang Shao <laoar.shao@gmail.com>
Date:   Sat Oct 7 13:59:44 2023 +0000

    bpf: Fix missed rcu read lock in bpf_task_under_cgroup()

    When employed within a sleepable program not under RCU protection, the
    use of 'bpf_task_under_cgroup()' may trigger a warning in the kernel log,
    particularly when CONFIG_PROVE_RCU is enabled:

      [ 1259.662357] WARNING: suspicious RCU usage
      [ 1259.662358] 6.5.0+ #33 Not tainted
      [ 1259.662360] -----------------------------
      [ 1259.662361] include/linux/cgroup.h:423 suspicious rcu_dereference_check() usage!

    Other info that might help to debug this:

      [ 1259.662366] rcu_scheduler_active = 2, debug_locks = 1
      [ 1259.662368] 1 lock held by trace/72954:
      [ 1259.662369]  #0: ffffffffb5e3eda0 (rcu_read_lock_trace){....}-{0:0}, at: __bpf_prog_enter_sleepable+0x0/0xb0

    Stack backtrace:

      [ 1259.662385] CPU: 50 PID: 72954 Comm: trace Kdump: loaded Not tainted 6.5.0+ #33
      [ 1259.662391] Call Trace:
      [ 1259.662393]  <TASK>
      [ 1259.662395]  dump_stack_lvl+0x6e/0x90
      [ 1259.662401]  dump_stack+0x10/0x20
      [ 1259.662404]  lockdep_rcu_suspicious+0x163/0x1b0
      [ 1259.662412]  task_css_set.part.0+0x23/0x30
      [ 1259.662417]  bpf_task_under_cgroup+0xe7/0xf0
      [ 1259.662422]  bpf_prog_7fffba481a3bcf88_lsm_run+0x5c/0x93
      [ 1259.662431]  bpf_trampoline_6442505574+0x60/0x1000
      [ 1259.662439]  bpf_lsm_bpf+0x5/0x20
      [ 1259.662443]  ? security_bpf+0x32/0x50
      [ 1259.662452]  __sys_bpf+0xe6/0xdd0
      [ 1259.662463]  __x64_sys_bpf+0x1a/0x30
      [ 1259.662467]  do_syscall_64+0x38/0x90
      [ 1259.662472]  entry_SYSCALL_64_after_hwframe+0x6e/0xd8
      [ 1259.662479] RIP: 0033:0x7f487baf8e29
      [...]
      [ 1259.662504]  </TASK>

    This issue can be reproduced by executing a straightforward program, as
    demonstrated below:

    SEC("lsm.s/bpf")
    int BPF_PROG(lsm_run, int cmd, union bpf_attr *attr, unsigned int size)
    {
            struct cgroup *cgrp = NULL;
            struct task_struct *task;
            int ret = 0;

            if (cmd != BPF_LINK_CREATE)
                    return 0;

            // The cgroup2 should be mounted first
            cgrp = bpf_cgroup_from_id(1);
            if (!cgrp)
                    goto out;
            task = bpf_get_current_task_btf();
            if (bpf_task_under_cgroup(task, cgrp))
                    ret = -1;
            bpf_cgroup_release(cgrp);

    out:
            return ret;
    }

    After running the program, if you subsequently execute another BPF program,
    you will encounter the warning.

    It's worth noting that task_under_cgroup_hierarchy() is also utilized by
    bpf_current_task_under_cgroup(). However, bpf_current_task_under_cgroup()
    doesn't exhibit this issue because it cannot be used in sleepable BPF
    programs.

    Fixes: b5ad4cdc46c7 ("bpf: Add bpf_task_under_cgroup() kfunc")
    Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
    Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    Acked-by: Stanislav Fomichev <sdf@google.com>
    Cc: Feng Zhou <zhoufeng.zf@bytedance.com>
    Cc: KP Singh <kpsingh@kernel.org>
    Link: https://lore.kernel.org/bpf/20231007135945.4306-1-laoar.shao@gmail.com

Signed-off-by: Viktor Malik <vmalik@redhat.com>
2023-11-02 08:05:13 +01:00
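
A hedged sketch of the shape of the fix: take an explicit RCU read lock around the hierarchy walk so sleepable callers outside an RCU section are covered too.

    __bpf_kfunc long bpf_task_under_cgroup(struct task_struct *task,
                                           struct cgroup *ancestor)
    {
            long ret;

            rcu_read_lock();
            ret = task_under_cgroup_hierarchy(task, ancestor);
            rcu_read_unlock();
            return ret;
    }
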
Viktor Malik a80d6f78ca
bpf: Allow NULL buffers in bpf_dynptr_slice(_rw)
JIRA: https://issues.redhat.com/browse/RHEL-9957

Conflicts: changed context due to already backported upstream commit
           5426700e6841 ("bpf: fix bpf_dynptr_slice() to stop return an
           ERR_PTR.")

commit 3bda08b63670c39be390fcb00e7718775508e673
Author: Daniel Rosenberg <drosen@google.com>
Date:   Fri May 5 18:31:30 2023 -0700

    bpf: Allow NULL buffers in bpf_dynptr_slice(_rw)

    bpf_dynptr_slice(_rw) uses a user provided buffer if it can not provide
    a pointer to a block of contiguous memory. This buffer is unused in the
    case of local dynptrs, and may be unused in other cases as well. There
    is no need to require the buffer, as the kfunc can just return NULL if
    it was needed and not provided.

    This adds another kfunc annotation, __opt, which combines with __sz and
    __szk to allow the buffer associated with the size to be NULL. If the
    buffer is NULL, the verifier does not check that the buffer is of
    sufficient size.

    Signed-off-by: Daniel Rosenberg <drosen@google.com>
    Link: https://lore.kernel.org/r/20230506013134.2492210-2-drosen@google.com
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Signed-off-by: Viktor Malik <vmalik@redhat.com>
2023-10-12 11:40:58 +02:00
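
A minimal usage sketch (not part of the patch): pass a NULL buffer and handle the NULL return when the requested bytes are not contiguous. The kfunc declarations mirror the ones used by the kernel selftests (bpf_kfuncs.h); the attach point is illustrative.

    #include "vmlinux.h"
    #include <bpf/bpf_helpers.h>

    char _license[] SEC("license") = "GPL";

    int bpf_dynptr_from_skb(struct __sk_buff *skb, __u64 flags,
                            struct bpf_dynptr *ptr__uninit) __ksym;
    void *bpf_dynptr_slice(const struct bpf_dynptr *ptr, __u32 offset,
                           void *buffer__opt, __u32 buffer__szk) __ksym;

    SEC("tc")
    int peek_eth(struct __sk_buff *skb)
    {
            struct bpf_dynptr ptr;
            struct ethhdr *eth;

            if (bpf_dynptr_from_skb(skb, 0, &ptr))
                    return 0;

            /* NULL buffer: no fallback copy is made and the verifier skips
             * the buffer-size check; NULL is returned if the first bytes of
             * the skb are not contiguous
             */
            eth = bpf_dynptr_slice(&ptr, 0, NULL, sizeof(*eth));
            if (!eth)
                    return 0;

            bpf_printk("h_proto=0x%x", eth->h_proto);
            return 0;
    }
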
Viktor Malik 718d83faac
bpf: Add bpf_task_under_cgroup() kfunc
JIRA: https://issues.redhat.com/browse/RHEL-9957

commit b5ad4cdc46c7d6e7f8d2c9e24b6c9a1edec95154
Author: Feng Zhou <zhoufeng.zf@bytedance.com>
Date:   Sat May 6 11:15:44 2023 +0800

    bpf: Add bpf_task_under_cgroup() kfunc
    
    Add a kfunc that's similar to bpf_current_task_under_cgroup.
    The difference is that it takes a designated task.
    
    When hooking sched-related functions, it is sometimes necessary to
    specify a task instead of the current task.
    
    Signed-off-by: Feng Zhou <zhoufeng.zf@bytedance.com>
    Acked-by: Yonghong Song <yhs@fb.com>
    Link: https://lore.kernel.org/r/20230506031545.35991-2-zhoufeng.zf@bytedance.com
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Signed-off-by: Viktor Malik <vmalik@redhat.com>
2023-10-12 11:40:58 +02:00
Viktor Malik 03f1a4fd65
bpf: Add bpf_dynptr_clone
JIRA: https://issues.redhat.com/browse/RHEL-9957

Conflicts: changed context due to already backported
           7793fc3babe9 ("bpf: Make bpf_refcount_acquire fallible for
           non-owning refs")

commit 361f129f3cc185af6667aca0bec0be9a020a8abc
Author: Joanne Koong <joannelkoong@gmail.com>
Date:   Thu Apr 20 00:14:13 2023 -0700

    bpf: Add bpf_dynptr_clone

    The cloned dynptr will point to the same data as its parent dynptr,
    with the same type, offset, size and read-only properties.

    Any writes to a dynptr will be reflected across all instances
    (by 'instance', this means any dynptrs that point to the same
    underlying data).

    Please note that data slice and dynptr invalidations will affect all
    instances as well. For example, if bpf_dynptr_write() is called on an
    skb-type dynptr, all data slices of dynptr instances to that skb
    will be invalidated as well (eg data slices of any clones, parents,
    grandparents, ...). Another example is if a ringbuf dynptr is submitted,
    any instance of that dynptr will be invalidated.

    Changing the view of the dynptr (eg advancing the offset or
    trimming the size) will only affect that dynptr and not affect any
    other instances.

    One example use case where cloning may be helpful is for hashing or
    iterating through dynptr data. Cloning will allow the user to maintain
    the original view of the dynptr for future use, while also allowing
    views to smaller subsets of the data after the offset is advanced or the
    size is trimmed.

    Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
    Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
    Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    Link: https://lore.kernel.org/bpf/20230420071414.570108-5-joannelkoong@gmail.com

Signed-off-by: Viktor Malik <vmalik@redhat.com>
2023-10-12 11:40:50 +02:00
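
A minimal usage sketch (not part of the patch): clone a ringbuf dynptr and narrow only the clone's view; writes through the clone are visible through the parent, and submitting the parent invalidates the clone as well. The kfunc declarations mirror the selftests' bpf_kfuncs.h; the attach point is illustrative.

    #include "vmlinux.h"
    #include <bpf/bpf_helpers.h>

    char _license[] SEC("license") = "GPL";

    int bpf_dynptr_clone(const struct bpf_dynptr *ptr,
                         struct bpf_dynptr *clone__uninit) __ksym;
    int bpf_dynptr_adjust(const struct bpf_dynptr *ptr, __u32 start, __u32 end) __ksym;

    struct {
            __uint(type, BPF_MAP_TYPE_RINGBUF);
            __uint(max_entries, 4096);
    } rb SEC(".maps");

    SEC("tracepoint/syscalls/sys_enter_getpid")
    int clone_demo(void *ctx)
    {
            struct bpf_dynptr ptr, view;
            __u8 byte = 0xab;

            bpf_ringbuf_reserve_dynptr(&rb, 64, 0, &ptr);

            bpf_dynptr_clone(&ptr, &view);
            bpf_dynptr_adjust(&view, 8, 16);        /* clone now sees bytes [8, 16) */
            bpf_dynptr_write(&view, 0, &byte, sizeof(byte), 0);

            /* also invalidates 'view', as described above */
            bpf_ringbuf_submit_dynptr(&ptr, 0);
            return 0;
    }
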
Viktor Malik effda92542 bpf: Add bpf_dynptr_size
JIRA: https://issues.redhat.com/browse/RHEL-9957

commit 26662d7347a058ca497792c4b22ac91cc415cbf6
Author: Joanne Koong <joannelkoong@gmail.com>
Date:   Thu Apr 20 00:14:12 2023 -0700

    bpf: Add bpf_dynptr_size
    
    bpf_dynptr_size returns the number of usable bytes in a dynptr.
    
    Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
    Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
    Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    Acked-by: John Fastabend <john.fastabend@gmail.com>
    Link: https://lore.kernel.org/bpf/20230420071414.570108-4-joannelkoong@gmail.com

Signed-off-by: Viktor Malik <vmalik@redhat.com>
2023-10-11 12:51:42 +02:00
Viktor Malik 92d2d2a1ff bpf: Add bpf_dynptr_is_null and bpf_dynptr_is_rdonly
JIRA: https://issues.redhat.com/browse/RHEL-9957

commit 540ccf96ddbc173474c32e595787d5622253be3d
Author: Joanne Koong <joannelkoong@gmail.com>
Date:   Thu Apr 20 00:14:11 2023 -0700

    bpf: Add bpf_dynptr_is_null and bpf_dynptr_is_rdonly
    
    bpf_dynptr_is_null returns true if the dynptr is null / invalid
    (determined by whether ptr->data is NULL), else false if
    the dynptr is a valid dynptr.
    
    bpf_dynptr_is_rdonly returns true if the dynptr is read-only,
    else false if the dynptr is read-writable. If the dynptr is
    null / invalid, false is returned by default.
    
    Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
    Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
    Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    Acked-by: Andrii Nakryiko <andrii@kernel.org>
    Acked-by: John Fastabend <john.fastabend@gmail.com>
    Link: https://lore.kernel.org/bpf/20230420071414.570108-3-joannelkoong@gmail.com

Signed-off-by: Viktor Malik <vmalik@redhat.com>
2023-10-11 12:51:42 +02:00
Viktor Malik 4ba20bfa6c bpf: Add bpf_dynptr_adjust
JIRA: https://issues.redhat.com/browse/RHEL-9957

commit 987d0242d189661f78b77cc4d77f843b15600fed
Author: Joanne Koong <joannelkoong@gmail.com>
Date:   Thu Apr 20 00:14:10 2023 -0700

    bpf: Add bpf_dynptr_adjust
    
    Add a new kfunc
    
    int bpf_dynptr_adjust(struct bpf_dynptr_kern *ptr, u32 start, u32 end);
    
    which adjusts the dynptr to reflect the new [start, end) interval.
    In particular, it advances the offset of the dynptr by "start" bytes,
    and if end is less than the size of the dynptr, then this will trim the
    dynptr accordingly.
    
    Adjusting the dynptr interval may be useful in certain situations.
    For example, when hashing which takes in generic dynptrs, if the dynptr
    points to a struct but only a certain memory region inside the struct
    should be hashed, adjust can be used to narrow in on the
    specific region to hash.
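
    For example (illustrative; `ptr` is assumed to be an already-initialized
    dynptr over the struct):

      /* keep only bytes [8, 64) of the current view before hashing */
      if (bpf_dynptr_adjust(&ptr, 8, 64))
        return 0;                       /* requested range is out of bounds */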
    
    Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
    Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
    Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    Link: https://lore.kernel.org/bpf/20230420071414.570108-2-joannelkoong@gmail.com

Signed-off-by: Viktor Malik <vmalik@redhat.com>
2023-10-11 12:51:41 +02:00
Artem Savkov ba39234e6f bpf: fix bpf_dynptr_slice() to stop return an ERR_PTR.
Bugzilla: https://bugzilla.redhat.com/2221599

Upstream Status: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

commit 5426700e6841bf72e652e34b5cec68eadf442435
Author: Kui-Feng Lee <thinker.li@gmail.com>
Date:   Thu Aug 3 16:12:06 2023 -0700

    bpf: fix bpf_dynptr_slice() to stop return an ERR_PTR.

    Verify if the pointer obtained from bpf_xdp_pointer() is either an error or
    NULL before returning it.

    The function bpf_dynptr_slice() mistakenly returned an ERR_PTR. Instead of
    solely checking for NULL, it should also verify if the pointer returned by
    bpf_xdp_pointer() is an error or NULL.
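
    In other words, the xdp branch of bpf_dynptr_slice() now rejects both
    cases before handing the pointer back, roughly (paraphrased, not the
    exact diff):

      ptr = bpf_xdp_pointer(xdp, offset, len);
      if (!IS_ERR_OR_NULL(ptr))
        return ptr;                     /* direct pointer into the frame */
      /* otherwise fall back to the caller-supplied buffer, or NULL */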

    Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
    Closes: https://lore.kernel.org/bpf/d1360219-85c3-4a03-9449-253ea905f9d1@moroto.mountain/
    Fixes: 66e3a13e7c2c ("bpf: Add bpf_dynptr_slice and bpf_dynptr_slice_rdwr")
    Suggested-by: Alexei Starovoitov <alexei.starovoitov@gmail.com>
    Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com>
    Acked-by: Yonghong Song <yonghong.song@linux.dev>
    Link: https://lore.kernel.org/r/20230803231206.1060485-1-thinker.li@gmail.com
    Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>

Signed-off-by: Artem Savkov <asavkov@redhat.com>
2023-09-22 09:12:37 +02:00
Artem Savkov ea2e6dc001 bpf: Make bpf_refcount_acquire fallible for non-owning refs
Bugzilla: https://bugzilla.redhat.com/2221599

Conflicts: missing 361f129f3cc18 "bpf: Add bpf_dynptr_clone"

commit 7793fc3babe9fea908e57f7c187ea819f9fd7e95
Author: Dave Marchevsky <davemarchevsky@fb.com>
Date:   Thu Jun 1 19:26:42 2023 -0700

    bpf: Make bpf_refcount_acquire fallible for non-owning refs

    This patch fixes an incorrect assumption made in the original
    bpf_refcount series [0], specifically that the BPF program calling
    bpf_refcount_acquire on some node can always guarantee that the node is
    alive. In that series, the patch adding failure behavior to rbtree_add
    and list_push_{front, back} breaks this assumption for non-owning
    references.

    Consider the following program:

      n = bpf_kptr_xchg(&mapval, NULL);
      /* skip error checking */

      bpf_spin_lock(&l);
      if (bpf_rbtree_add(&t, &n->r, less)) {
        bpf_refcount_acquire(n);
        /* Failed to add, do something else with the node */
      }
      bpf_spin_unlock(&l);

    It's incorrect to assume that bpf_refcount_acquire will always succeed in this
    scenario. bpf_refcount_acquire is being called in a critical section
    here, but the lock being held is associated with rbtree t, which isn't
    necessarily the lock associated with the tree that the node is already
    in. So after bpf_rbtree_add fails to add the node and calls bpf_obj_drop
    in it, the program has no ownership of the node's lifetime. Therefore
    the node's refcount can be decr'd to 0 at any time after the failing
    rbtree_add. If this happens before the refcount_acquire above, the node
    might be free'd, and regardless refcount_acquire will be incrementing a
    0 refcount.

    Later patches in the series exercise this scenario, resulting in the
    expected complaint from the kernel (without this patch's changes):

      refcount_t: addition on 0; use-after-free.
      WARNING: CPU: 1 PID: 207 at lib/refcount.c:25 refcount_warn_saturate+0xbc/0x110
      Modules linked in: bpf_testmod(O)
      CPU: 1 PID: 207 Comm: test_progs Tainted: G           O       6.3.0-rc7-02231-g723de1a718a2-dirty #371
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.15.0-0-g2dd4b9b3f840-prebuilt.qemu.org 04/01/2014
      RIP: 0010:refcount_warn_saturate+0xbc/0x110
      Code: 6f 64 f6 02 01 e8 84 a3 5c ff 0f 0b eb 9d 80 3d 5e 64 f6 02 00 75 94 48 c7 c7 e0 13 d2 82 c6 05 4e 64 f6 02 01 e8 64 a3 5c ff <0f> 0b e9 7a ff ff ff 80 3d 38 64 f6 02 00 0f 85 6d ff ff ff 48 c7
      RSP: 0018:ffff88810b9179b0 EFLAGS: 00010082
      RAX: 0000000000000000 RBX: 0000000000000002 RCX: 0000000000000000
      RDX: 0000000000000202 RSI: 0000000000000008 RDI: ffffffff857c3680
      RBP: ffff88810027d3c0 R08: ffffffff8125f2a4 R09: ffff88810b9176e7
      R10: ffffed1021722edc R11: 746e756f63666572 R12: ffff88810027d388
      R13: ffff88810027d3c0 R14: ffffc900005fe030 R15: ffffc900005fe048
      FS:  00007fee0584a700(0000) GS:ffff88811b280000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00005634a96f6c58 CR3: 0000000108ce9002 CR4: 0000000000770ee0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      PKRU: 55555554
      Call Trace:
       <TASK>
       bpf_refcount_acquire_impl+0xb5/0xc0

      (rest of output snipped)

    The patch addresses this by changing bpf_refcount_acquire_impl to use
    refcount_inc_not_zero instead of refcount_inc and marking
    bpf_refcount_acquire KF_RET_NULL.

    For owning references, though, we know the above scenario is not possible
    and thus that bpf_refcount_acquire will always succeed. Some verifier
    bookkeeping is added to track "is input owning ref?" for bpf_refcount_acquire
    calls and return false from is_kfunc_ret_null for bpf_refcount_acquire on
    owning refs despite it being marked KF_RET_NULL.

    Existing selftests using bpf_refcount_acquire are modified where
    necessary to NULL-check its return value.

      [0]: https://lore.kernel.org/bpf/20230415201811.343116-1-davemarchevsky@fb.com/

    Fixes: d2dcc67df910 ("bpf: Migrate bpf_rbtree_add and bpf_list_push_{front,back} to possibly fail")
    Reported-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
    Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
    Link: https://lore.kernel.org/r/20230602022647.1571784-5-davemarchevsky@fb.com
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Signed-off-by: Artem Savkov <asavkov@redhat.com>
2023-09-22 09:12:36 +02:00
Artem Savkov 6decc52187 bpf: Fix __bpf_{list,rbtree}_add's beginning-of-node calculation
Bugzilla: https://bugzilla.redhat.com/2221599

commit cc0d76cafebbd3e1ffab9c4252d48ecc9e0737f6
Author: Dave Marchevsky <davemarchevsky@fb.com>
Date:   Thu Jun 1 19:26:41 2023 -0700

    bpf: Fix __bpf_{list,rbtree}_add's beginning-of-node calculation
    
    Given the pointer to struct bpf_{rb,list}_node within a local kptr and
    the byte offset of that field within the kptr struct, the calculation changed
    by this patch is meant to find the beginning of the kptr so that it can
    be passed to bpf_obj_drop.
    
    Unfortunately instead of doing
    
      ptr_to_kptr = ptr_to_node_field - offset_bytes
    
    the calculation is erroneously doing
    
      ptr_to_kptr = ptr_to_node_field - (offset_bytes * sizeof(struct bpf_rb_node))
    
    or the bpf_list_node equivalent.
    
    This patch fixes the calculation.
    
    Fixes: d2dcc67df910 ("bpf: Migrate bpf_rbtree_add and bpf_list_push_{front,back} to possibly fail")
    Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
    Link: https://lore.kernel.org/r/20230602022647.1571784-4-davemarchevsky@fb.com
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Signed-off-by: Artem Savkov <asavkov@redhat.com>
2023-09-22 09:12:36 +02:00
Artem Savkov bdd13b93b0 bpf: Centralize btf_field-specific initialization logic
Bugzilla: https://bugzilla.redhat.com/2221599

commit 3e81740a90626024a9d9c6f9bfa3d36204dafefb
Author: Dave Marchevsky <davemarchevsky@fb.com>
Date:   Sat Apr 15 13:18:10 2023 -0700

    bpf: Centralize btf_field-specific initialization logic
    
    All btf_fields in an object are 0-initialized by memset in
    bpf_obj_init. This might not be a valid initial state for some field
    types, in which case kfuncs that use the type will properly initialize
    their input if it's been 0-initialized. Some BPF graph collection types
    and kfuncs do this: bpf_list_{head,node} and bpf_rb_node.
    
    An earlier patch in this series added the bpf_refcount field, for which
    the 0 state indicates that the refcounted object should be free'd.
    bpf_obj_init treats this field specially, setting refcount to 1 instead
    of relying on scattered "refcount is 0? Must have just been initialized,
    let's set to 1" logic in kfuncs.
    
    This patch extends this treatment to list and rbtree field types,
    allowing most scattered initialization logic in kfuncs to be removed.
    
    Note that bpf_{list_head,rb_root} may be inside a BPF map, in which case
    they'll be 0-initialized without passing through the newly-added logic,
    so scattered initialization logic must remain for these collection root
    types.
    
    Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
    Link: https://lore.kernel.org/r/20230415201811.343116-9-davemarchevsky@fb.com
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Signed-off-by: Artem Savkov <asavkov@redhat.com>
2023-09-22 09:12:36 +02:00
Artem Savkov 4ea0201da9 bpf: Support refcounted local kptrs in existing semantics
Bugzilla: https://bugzilla.redhat.com/2221599

commit 1512217c47f0e8ea076dd0e67262e5a668a78f01
Author: Dave Marchevsky <davemarchevsky@fb.com>
Date:   Sat Apr 15 13:18:05 2023 -0700

    bpf: Support refcounted local kptrs in existing semantics
    
    A local kptr is considered 'refcounted' when it is of a type that has a
    bpf_refcount field. When such a kptr is created, its refcount should be
    initialized to 1; when destroyed, the object should be free'd only if a
    refcount decr results in 0 refcount.
    
    Existing logic always frees the underlying memory when destroying a
    local kptr, and 0-initializes all btf_record fields. This patch adds
    checks for "is local kptr refcounted?" and new logic for that case in
    the appropriate places.
    
    This patch focuses on changing existing semantics and thus conspicuously
    does _not_ provide a way for BPF programs to increment the refcount. That
    follows later in the series.
    
    __bpf_obj_drop_impl is modified to do the right thing when it sees a
    refcounted type. Container types for graph nodes (list, tree, stashed in
    map) are migrated to use __bpf_obj_drop_impl as a destructor for their
    nodes instead of each having custom destruction code in their _free
    paths. Now that "drop" isn't a synonym for "free" when the type is
    refcounted it makes sense to centralize this logic.
    
    Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
    Link: https://lore.kernel.org/r/20230415201811.343116-4-davemarchevsky@fb.com
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Signed-off-by: Artem Savkov <asavkov@redhat.com>
2023-09-22 09:12:36 +02:00
Artem Savkov a5041696eb bpf: Fix bpf_refcount_acquire's refcount_t address calculation
Bugzilla: https://bugzilla.redhat.com/2221599

commit 4ab07209d5cc8cb6d2a5324c07b3efc3b2fde494
Author: Dave Marchevsky <davemarchevsky@fb.com>
Date:   Fri Apr 21 00:44:31 2023 -0700

    bpf: Fix bpf_refcount_acquire's refcount_t address calculation
    
    When calculating the address of the refcount_t struct within a local
    kptr, bpf_refcount_acquire_impl should add refcount_off bytes to the
    address of the local kptr. Due to some missing parens, the function is
    incorrectly adding sizeof(refcount_t) * refcount_off bytes. This patch
    fixes the calculation.
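
    The pitfall is plain C operator precedence (illustrative, not the exact
    kernel code):

      /* cast binds tighter than '+': advances by off * sizeof(refcount_t) */
      ref = (refcount_t *)p + off;
      /* intended: advance by off bytes from the start of the object */
      ref = (refcount_t *)((void *)p + off);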
    
    Due to the incorrect calculation, bpf_refcount_acquire_impl was trying
    to refcount_inc some memory well past the end of local kptrs, resulting
    in kasan and refcount complaints, as reported in [0]. In that thread,
    Florian and Eduard discovered that bpf selftests written in the new
    style - with __success and an expected __retval, specifically - were
    not actually being run. As a result, selftests added in bpf_refcount
    series weren't really exercising this behavior, and thus didn't unearth
    the bug.
    
    With this fixed behavior it's safe to revert commit 7c4b96c00043
    ("selftests/bpf: disable program test run for progs/refcounted_kptr.c"),
    this patch does so.
    
      [0] https://lore.kernel.org/bpf/ZEEp+j22imoN6rn9@strlen.de/
    
    Fixes: 7c50b1cb76ac ("bpf: Add bpf_refcount_acquire kfunc")
    Reported-by: Florian Westphal <fw@strlen.de>
    Reported-by: Eduard Zingerman <eddyz87@gmail.com>
    Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
    Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    Tested-by: Eduard Zingerman <eddyz87@gmail.com>
    Link: https://lore.kernel.org/bpf/20230421074431.3548349-1-davemarchevsky@fb.com

Signed-off-by: Artem Savkov <asavkov@redhat.com>
2023-09-22 09:12:32 +02:00
Artem Savkov d14ef31f8e bpf: Migrate bpf_rbtree_remove to possibly fail
Bugzilla: https://bugzilla.redhat.com/2221599

commit 404ad75a36fb1a1008e9fe803aa7d0212df9e240
Author: Dave Marchevsky <davemarchevsky@fb.com>
Date:   Sat Apr 15 13:18:09 2023 -0700

    bpf: Migrate bpf_rbtree_remove to possibly fail
    
    This patch modifies bpf_rbtree_remove to account for possible failure
    due to the input rb_node already not being in any collection.
    The function can now return NULL, and does when the aforementioned
    scenario occurs. As before, on successful removal an owning reference to
    the removed node is returned.
    
    Adding KF_RET_NULL to bpf_rbtree_remove's kfunc flags - now KF_RET_NULL |
    KF_ACQUIRE - provides the desired verifier semantics:
    
      * retval must be checked for NULL before use
      * if NULL, retval's ref_obj_id is released
      * retval is a "maybe acquired" owning ref, not a non-owning ref,
        so it will live past end of critical section (bpf_spin_unlock), and
        thus can be checked for NULL after the end of the CS
    
    BPF programs must add checks
    ============================
    
    This does change bpf_rbtree_remove's verifier behavior. BPF program
    writers will need to add NULL checks to their programs, but the
    resulting UX looks natural:
    
      bpf_spin_lock(&glock);
    
      n = bpf_rbtree_first(&ghead);
      if (!n) { /* ... */}
      res = bpf_rbtree_remove(&ghead, &n->node);
    
      bpf_spin_unlock(&glock);
    
      if (!res)  /* Newly-added check after this patch */
        return 1;
    
      n = container_of(res, /* ... */);
      /* Do something else with n */
      bpf_obj_drop(n);
      return 0;
    
    The "if (!res)" check above is the only addition necessary for the above
    program to pass verification after this patch.
    
    bpf_rbtree_remove no longer clobbers non-owning refs
    ====================================================
    
    An issue arises when bpf_rbtree_remove fails, though. Consider this
    example:
    
      struct node_data {
        long key;
        struct bpf_list_node l;
        struct bpf_rb_node r;
        struct bpf_refcount ref;
      };
    
      long failed_sum;
    
      void bpf_prog()
      {
        struct node_data *n = bpf_obj_new(/* ... */);
        struct bpf_rb_node *res;
        n->key = 10;
    
        bpf_spin_lock(&glock);
    
        bpf_list_push_back(&some_list, &n->l); /* n is now a non-owning ref */
        res = bpf_rbtree_remove(&some_tree, &n->r, /* ... */);
        if (!res)
          failed_sum += n->key;  /* not possible */
    
        bpf_spin_unlock(&glock);
        /* if (res) { do something useful and drop } ... */
      }
    
    The bpf_rbtree_remove in this example will always fail. Similarly to
    bpf_spin_unlock, bpf_rbtree_remove is a non-owning reference
    invalidation point. The verifier clobbers all non-owning refs after a
    bpf_rbtree_remove call, so the "failed_sum += n->key" line will fail
    verification, and in fact there's no good way to get information about
    the node which failed to add after the invalidation. This patch removes
    non-owning reference invalidation from bpf_rbtree_remove to allow the
    above usecase to pass verification. The logic for why this is now
    possible is as follows:
    
    Before this series, bpf_rbtree_add couldn't fail and thus assumed that
    its input, a non-owning reference, was in the tree. But it's easy to
    construct an example where two non-owning references pointing to the same
    underlying memory are acquired and passed to rbtree_remove one after
    another (see rbtree_api_release_aliasing in
    selftests/bpf/progs/rbtree_fail.c).
    
    So it was necessary to clobber non-owning refs to prevent this
    case and, more generally, to enforce "non-owning ref is definitely
    in some collection" invariant. This series removes that invariant and
    the failure / runtime checking added in this patch provide a clean way
    to deal with the aliasing issue - just fail to remove.
    
    Because the aliasing issue prevented by clobbering non-owning refs is no
    longer an issue, this patch removes the invalidate_non_owning_refs
    call from verifier handling of bpf_rbtree_remove. Note that
    bpf_spin_unlock - the other caller of invalidate_non_owning_refs -
    clobbers non-owning refs for a different reason, so its clobbering
    behavior remains unchanged.
    
    No BPF program changes are necessary for programs to remain valid as a
    result of this clobbering change. A valid program before this patch
    passed verification with its non-owning refs having shorter (or equal)
    lifetimes due to more aggressive clobbering.
    
    Also, update existing tests to check bpf_rbtree_remove retval for NULL
    where necessary, and move rbtree_api_release_aliasing from
    progs/rbtree_fail.c to progs/rbtree.c since it's now expected to pass
    verification.
    
    Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
    Link: https://lore.kernel.org/r/20230415201811.343116-8-davemarchevsky@fb.com
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Signed-off-by: Artem Savkov <asavkov@redhat.com>
2023-09-22 09:12:30 +02:00
Artem Savkov 50d1f86f39 bpf: Migrate bpf_rbtree_add and bpf_list_push_{front,back} to possibly fail
Bugzilla: https://bugzilla.redhat.com/2221599

commit d2dcc67df910dd85253a701b6a5b747f955d28f5
Author: Dave Marchevsky <davemarchevsky@fb.com>
Date:   Sat Apr 15 13:18:07 2023 -0700

    bpf: Migrate bpf_rbtree_add and bpf_list_push_{front,back} to possibly fail
    
    Consider this code snippet:
    
      struct node {
        long key;
        bpf_list_node l;
        bpf_rb_node r;
        bpf_refcount ref;
      }
    
      int some_bpf_prog(void *ctx)
      {
        struct node *n = bpf_obj_new(/*...*/), *m;
    
        bpf_spin_lock(&glock);
    
        bpf_rbtree_add(&some_tree, &n->r, /* ... */);
        m = bpf_refcount_acquire(n);
        bpf_rbtree_add(&other_tree, &m->r, /* ... */);
    
        bpf_spin_unlock(&glock);
    
        /* ... */
      }
    
    After bpf_refcount_acquire, n and m point to the same underlying memory,
    and that node's bpf_rb_node field is being used by the some_tree insert,
    so overwriting it as a result of the second insert is an error. In order
    to properly support refcounted nodes, the rbtree and list insert
    functions must be allowed to fail. This patch adds such support.
    
    The kfuncs bpf_rbtree_add, bpf_list_push_{front,back} are modified to
    return an int indicating success/failure, with 0 -> success, nonzero ->
    failure.
    
    bpf_obj_drop on failure
    =======================
    
    Currently the only reason an insert can fail is the example above: the
    bpf_{list,rb}_node is already in use. When such a failure occurs, the
    insert kfuncs will bpf_obj_drop the input node. This allows the insert
    operations to logically fail without changing their verifier owning ref
    behavior, namely the unconditional release_reference of the input
    owning ref.
    
    With insert that always succeeds, ownership of the node is always passed
    to the collection, since the node always ends up in the collection.
    
    With a possibly-failed insert w/ bpf_obj_drop, ownership of the node
    is always passed either to the collection (success), or to bpf_obj_drop
    (failure). Regardless, it's correct to continue unconditionally
    releasing the input owning ref, as something is always taking ownership
    from the calling program on insert.
    
    Keeping owning ref behavior unchanged results in a nice default UX for
    insert functions that can fail. If the program's reaction to a failed
    insert is "fine, just get rid of this owning ref for me and let me go
    on with my business", then there's no reason to check for failure since
    that's default behavior. e.g.:
    
      long important_failures = 0;
    
      int some_bpf_prog(void *ctx)
      {
        struct node *n, *m, *o; /* all bpf_obj_new'd */
    
        bpf_spin_lock(&glock);
        bpf_rbtree_add(&some_tree, &n->node, /* ... */);
        bpf_rbtree_add(&some_tree, &m->node, /* ... */);
        if (bpf_rbtree_add(&some_tree, &o->node, /* ... */)) {
          important_failures++;
        }
        bpf_spin_unlock(&glock);
      }
    
    If we instead chose to pass ownership back to the program on failed
    insert - by returning NULL on success or an owning ref on failure -
    programs would always have to do something with the returned ref on
    failure. The most likely action is probably "I'll just get rid of this
    owning ref and go about my business", which ideally would look like:
    
      if (n = bpf_rbtree_add(&some_tree, &n->node, /* ... */))
        bpf_obj_drop(n);
    
    But bpf_obj_drop isn't allowed in a critical section and inserts must
    occur within one, so in reality error handling would become a
    hard-to-parse mess.
    
    For refcounted nodes, we can replicate the "pass ownership back to
    program on failure" logic with this patch's semantics, albeit in an ugly
    way:
    
      struct node *n = bpf_obj_new(/* ... */), *m;
    
      bpf_spin_lock(&glock);
    
      m = bpf_refcount_acquire(n);
      if (bpf_rbtree_add(&some_tree, &n->node, /* ... */)) {
        /* Do something with m */
      }
    
      bpf_spin_unlock(&glock);
      bpf_obj_drop(m);
    
    bpf_refcount_acquire is used to simulate "return owning ref on failure".
    This should be an uncommon occurrence, though.
    
    Addition of two verifier-fixup'd args to collection inserts
    ===========================================================
    
    The actual bpf_obj_drop kfunc is
    bpf_obj_drop_impl(void *, struct btf_struct_meta *), with bpf_obj_drop
    macro populating the second arg with 0 and the verifier later filling in
    the arg during insn fixup.
    
    Because bpf_rbtree_add and bpf_list_push_{front,back} now might do
    bpf_obj_drop, these kfuncs need a btf_struct_meta parameter that can be
    passed to bpf_obj_drop_impl.
    
    Similarly, because the 'node' param to those insert functions is the
    bpf_{list,rb}_node within the node type, and bpf_obj_drop expects a
    pointer to the beginning of the node, the insert functions need to be
    able to find the beginning of the node struct. A second
    verifier-populated param is necessary: the offset of {list,rb}_node within the
    node type.
    
    These two new params allow the insert kfuncs to correctly call
    __bpf_obj_drop_impl:
    
      beginning_of_node = bpf_rb_node_ptr - offset
      if (already_inserted)
        __bpf_obj_drop_impl(beginning_of_node, btf_struct_meta->record);
    
    Similarly to other kfuncs with "hidden" verifier-populated params, the
    insert functions are renamed with _impl prefix and a macro is provided
    for common usage. For example, bpf_rbtree_add kfunc is now
    bpf_rbtree_add_impl and bpf_rbtree_add is now a macro which sets
    "hidden" args to 0.
    
    Due to the two new args BPF progs will need to be recompiled to work
    with the new _impl kfuncs.
    
    This patch also rewrites the "hidden argument" explanation to more
    directly say why the BPF program writer doesn't need to populate the
    arguments with anything meaningful.
    
    How does this new logic affect non-owning references?
    =====================================================
    
    Currently, non-owning refs are valid until the end of the critical
    section in which they're created. We can make this guarantee because, if
    a non-owning ref exists, the referent was added to some collection. The
    collection will drop() its nodes when it goes away, but it can't go away
    while our program is accessing it, so that's not a problem. If the
    referent is removed from the collection in the same CS that it was added
    in, it can't be bpf_obj_drop'd until after CS end. Those are the only
    two ways to free the referent's memory and neither can happen until
    after the non-owning ref's lifetime ends.
    
    At first glance, having these collection insert functions potentially
    bpf_obj_drop their input seems like it breaks the "can't be
    bpf_obj_drop'd until after CS end" line of reasoning. But we care about
    the memory not being _freed_ until after CS end, and a previous patch
    in the series modified bpf_obj_drop such that it doesn't free refcounted
    nodes until refcount == 0. So the statement can be more accurately
    rewritten as "can't be free'd until after CS end".
    
    We can prove that this rewritten statement holds for any non-owning
    reference produced by collection insert functions:
    
    * If the input to the insert function is _not_ refcounted
      * We have an owning reference to the input, and can conclude it isn't
        in any collection
        * Inserting a node in a collection turns owning refs into
          non-owning, and since our input type isn't refcounted, there's no
          way to obtain additional owning refs to the same underlying
          memory
      * Because our node isn't in any collection, the insert operation
        cannot fail, so bpf_obj_drop will not execute
      * If bpf_obj_drop is guaranteed not to execute, there's no risk of
        memory being free'd
    
    * Otherwise, the input to the insert function is refcounted
      * If the insert operation fails due to the node's bpf_list_node or bpf_rb_node
        already being in some collection, there was some previous successful
        insert which passed refcount to the collection
      * We have an owning reference to the input, it must have been
        acquired via bpf_refcount_acquire, which bumped the refcount
      * refcount must be >= 2 since there's a valid owning reference and the
        node is already in a collection
      * Insert triggering bpf_obj_drop will decr refcount to >= 1, never
        resulting in a free
    
    So although we may do bpf_obj_drop during the critical section, this
    will never result in memory being free'd, and no changes to non-owning
    ref logic are needed in this patch.
    
    Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
    Link: https://lore.kernel.org/r/20230415201811.343116-6-davemarchevsky@fb.com
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Signed-off-by: Artem Savkov <asavkov@redhat.com>
2023-09-22 09:12:30 +02:00
Artem Savkov 63cb8b808f bpf: Add bpf_refcount_acquire kfunc
Bugzilla: https://bugzilla.redhat.com/2221599

commit 7c50b1cb76aca4540aa917db5f2a302acddcadff
Author: Dave Marchevsky <davemarchevsky@fb.com>
Date:   Sat Apr 15 13:18:06 2023 -0700

    bpf: Add bpf_refcount_acquire kfunc
    
    Currently, BPF programs can interact with the lifetime of refcounted
    local kptrs in the following ways:
    
      bpf_obj_new  - Initialize refcount to 1 as part of new object creation
      bpf_obj_drop - Decrement refcount and free object if it's 0
      collection add - Pass ownership to the collection. No change to
                       refcount but collection is responsible for
                       bpf_obj_dropping it
    
    In order to be able to add a refcounted local kptr to multiple
    collections we need to be able to increment the refcount and acquire a
    new owning reference. This patch adds a kfunc, bpf_refcount_acquire,
    implementing such an operation.
    
    bpf_refcount_acquire takes a refcounted local kptr and returns a new
    owning reference to the same underlying memory as the input. The input
    can be either owning or non-owning. To reinforce why this is safe,
    consider the following code snippets:
    
      struct node *n = bpf_obj_new(typeof(*n)); // A
      struct node *m = bpf_refcount_acquire(n); // B
    
    In the above snippet, n will be alive with refcount=1 after (A), and
    since nothing changes that state before (B), it's obviously safe. If
    n is instead added to some rbtree, we can still safely refcount_acquire
    it:
    
      struct node *n = bpf_obj_new(typeof(*n));
      struct node *m;
    
      bpf_spin_lock(&glock);
      bpf_rbtree_add(&groot, &n->node, less);   // A
      m = bpf_refcount_acquire(n);              // B
      bpf_spin_unlock(&glock);
    
    In the above snippet, after (A) n is a non-owning reference, and after
    (B) m is an owning reference pointing to the same memory as n. Although
    n has no ownership of that memory's lifetime, it's guaranteed to be
    alive until the end of the critical section, and n would be clobbered if
    we were past the end of the critical section, so it's safe to bump
    refcount.
    
    Implementation details:
    
    * From verifier's perspective, bpf_refcount_acquire handling is similar
      to bpf_obj_new and bpf_obj_drop. Like the former, it returns a new
      owning reference matching input type, although like the latter, type
      can be inferred from concrete kptr input. Verifier changes in
      {check,fixup}_kfunc_call and check_kfunc_args are largely copied from
      aforementioned functions' verifier changes.
    
    * An exception to the above is the new KF_ARG_PTR_TO_REFCOUNTED_KPTR
      arg, indicated by new "__refcounted_kptr" kfunc arg suffix. This is
      necessary in order to handle both owning and non-owning input without
      adding special-casing to "__alloc" arg handling. Also a convenient
      place to confirm that input type has bpf_refcount field.
    
    * The implemented kfunc is actually bpf_refcount_acquire_impl, with
      'hidden' second arg that the verifier sets to the type's struct_meta
      in fixup_kfunc_call.
    
    Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
    Link: https://lore.kernel.org/r/20230415201811.343116-5-davemarchevsky@fb.com
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Signed-off-by: Artem Savkov <asavkov@redhat.com>
2023-09-22 09:12:30 +02:00
Artem Savkov 201f7b639f bpf: Remove btf_field_offs, use btf_record's fields instead
Bugzilla: https://bugzilla.redhat.com/2221599

commit cd2a8079014aced27da9b2e669784f31680f1351
Author: Dave Marchevsky <davemarchevsky@fb.com>
Date:   Sat Apr 15 13:18:03 2023 -0700

    bpf: Remove btf_field_offs, use btf_record's fields instead
    
    The btf_field_offs struct contains (offset, size) for btf_record fields,
    sorted by offset. btf_field_offs is always used in conjunction with
    btf_record, which has btf_field 'fields' array with (offset, type), the
    latter of which btf_field_offs' size is derived from via
    btf_field_type_size.
    
    This patch adds a size field to struct btf_field and sorts btf_record's
    fields by offset, making it possible to get rid of btf_field_offs. Less
    data duplication and less code complexity results.
    
    Since btf_field_offs' lifetime closely followed the btf_record used to
    populate it, most complexity wins are from removal of initialization
    code like:
    
      if (btf_record_successfully_initialized) {
        foffs = btf_parse_field_offs(rec);
        if (IS_ERR_OR_NULL(foffs))
          // free the btf_record and return err
      }
    
    Other changes in this patch are pretty mechanical:
    
      * foffs->field_off[i] -> rec->fields[i].offset
      * foffs->field_sz[i] -> rec->fields[i].size
      * Sort rec->fields in btf_parse_fields before returning
        * It's possible that this is necessary independently of other
          changes in this patch. btf_record_find in syscall.c expects
          btf_record's fields to be sorted by offset, yet there's no
          explicit sorting of them before this patch, record's fields are
          populated in the order they're read from BTF struct definition.
          BTF docs don't say anything about the sortedness of struct fields.
      * All functions taking struct btf_field_offs * input now instead take
        struct btf_record *. All callsites of these functions already have
        access to the correct btf_record.
    
    Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
    Link: https://lore.kernel.org/r/20230415201811.343116-2-davemarchevsky@fb.com
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Signed-off-by: Artem Savkov <asavkov@redhat.com>
2023-09-22 09:12:30 +02:00
Artem Savkov 5775ae2320 bpf: Remove bpf_cgroup_kptr_get() kfunc
Bugzilla: https://bugzilla.redhat.com/2221599

commit 6499fe6edc4fd5b91aed4d5cd84bd113e1c58d5f
Author: David Vernet <void@manifault.com>
Date:   Mon Apr 10 23:16:32 2023 -0500

    bpf: Remove bpf_cgroup_kptr_get() kfunc
    
    Now that bpf_cgroup_acquire() is KF_RCU | KF_RET_NULL,
    bpf_cgroup_kptr_get() is redundant. Let's remove it, and update
    selftests to instead use bpf_cgroup_acquire() where appropriate. The
    next patch will update the BPF documentation to not mention
    bpf_cgroup_kptr_get().
    
    Signed-off-by: David Vernet <void@manifault.com>
    Link: https://lore.kernel.org/r/20230411041633.179404-2-void@manifault.com
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Signed-off-by: Artem Savkov <asavkov@redhat.com>
2023-09-22 09:12:29 +02:00
Artem Savkov 1adc3b46d3 bpf: Make bpf_cgroup_acquire() KF_RCU | KF_RET_NULL
Bugzilla: https://bugzilla.redhat.com/2221599

commit 1d71283987c729dceccce834a864c27301ba155e
Author: David Vernet <void@manifault.com>
Date:   Mon Apr 10 23:16:31 2023 -0500

    bpf: Make bpf_cgroup_acquire() KF_RCU | KF_RET_NULL
    
    struct cgroup is already an RCU-safe type in the verifier. We can
    therefore update bpf_cgroup_acquire() to be KF_RCU | KF_RET_NULL, and
    subsequently remove bpf_cgroup_kptr_get(). This patch does the first of
    these by updating bpf_cgroup_acquire() to be KF_RCU | KF_RET_NULL, and
    also updates selftests accordingly.
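
    A minimal sketch of the resulting usage (program and hook names are
    illustrative):

      SEC("tp_btf/cgroup_mkdir")
      int BPF_PROG(on_cgrp_mkdir, struct cgroup *cgrp, const char *path)
      {
        struct cgroup *acquired;

        acquired = bpf_cgroup_acquire(cgrp);
        if (!acquired)                  /* KF_RET_NULL: must be checked */
          return 0;
        /* ... use acquired ... */
        bpf_cgroup_release(acquired);
        return 0;
      }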
    
    Signed-off-by: David Vernet <void@manifault.com>
    Link: https://lore.kernel.org/r/20230411041633.179404-1-void@manifault.com
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Signed-off-by: Artem Savkov <asavkov@redhat.com>
2023-09-22 09:12:29 +02:00
Artem Savkov 395de7adfe bpf: ensure all memory is initialized in bpf_get_current_comm
Bugzilla: https://bugzilla.redhat.com/2221599

commit f3f21349779776135349a8e6f114a1485b2476b7
Author: Barret Rhoden <brho@google.com>
Date:   Thu Apr 6 20:18:08 2023 -0400

    bpf: ensure all memory is initialized in bpf_get_current_comm
    
    BPF helpers that take an ARG_PTR_TO_UNINIT_MEM must ensure that all of
    the memory is set, including beyond the end of the string.
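
    For example (illustrative):

      char comm[16];

      /* every byte of comm is initialized, even past the NUL terminator */
      if (bpf_get_current_comm(comm, sizeof(comm)))
        return 0;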
    
    Signed-off-by: Barret Rhoden <brho@google.com>
    Link: https://lore.kernel.org/r/20230407001808.1622968-1-brho@google.com
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Signed-off-by: Artem Savkov <asavkov@redhat.com>
2023-09-22 09:12:27 +02:00
Artem Savkov f3f0e203f7 bpf: Remove now-defunct task kfuncs
Bugzilla: https://bugzilla.redhat.com/2221599

commit f85671c6ef46d490a90dac719e0c0e0adbacfd9b
Author: David Vernet <void@manifault.com>
Date:   Fri Mar 31 14:57:32 2023 -0500

    bpf: Remove now-defunct task kfuncs
    
    In commit 22df776a9a86 ("tasks: Extract rcu_users out of union"), the
    'refcount_t rcu_users' field was extracted out of a union with the
    'struct rcu_head rcu' field. This allows us to safely perform a
    refcount_inc_not_zero() on task->rcu_users when acquiring a reference on
    a task struct. A prior patch leveraged this by making struct task_struct
    an RCU-protected object in the verifier, and by bpf_task_acquire() to
    use the task->rcu_users field for synchronization.
    
    Now that we can use RCU to protect tasks, we no longer need
    bpf_task_kptr_get(), or bpf_task_acquire_not_zero(). bpf_task_kptr_get()
    is truly completely unnecessary, as we can just use RCU to get the
    object. bpf_task_acquire_not_zero() is now equivalent to
    bpf_task_acquire().
    
    In addition to these changes, this patch also updates the associated
    selftests to no longer use these kfuncs.
    
    Signed-off-by: David Vernet <void@manifault.com>
    Link: https://lore.kernel.org/r/20230331195733.699708-3-void@manifault.com
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Signed-off-by: Artem Savkov <asavkov@redhat.com>
2023-09-22 09:12:25 +02:00
Artem Savkov fa443189f2 bpf: Make struct task_struct an RCU-safe type
Bugzilla: https://bugzilla.redhat.com/2221599

commit d02c48fa113953aba0b330ec6c35f50c7d1d7986
Author: David Vernet <void@manifault.com>
Date:   Fri Mar 31 14:57:31 2023 -0500

    bpf: Make struct task_struct an RCU-safe type
    
    struct task_struct objects are a bit interesting in terms of how their
    lifetime is protected by refcounts. task structs have two refcount
    fields:
    
    1. refcount_t usage: Protects the memory backing the task struct. When
       this refcount drops to 0, the task is immediately freed, without
       waiting for an RCU grace period to elapse. This is the field that
       most callers in the kernel currently use to ensure that a task
       remains valid while it's being referenced, and is what's currently
       tracked with bpf_task_acquire() and bpf_task_release().
    
    2. refcount_t rcu_users: A refcount field which, when it drops to 0,
       schedules an RCU callback that drops a reference held on the 'usage'
       field above (which is acquired when the task is first created). This
       field therefore provides a form of RCU protection on the task by
       ensuring that at least one 'usage' refcount will be held until an RCU
       grace period has elapsed. The qualifier "a form of" is important
       here, as a task can remain valid after task->rcu_users has dropped to
       0 and the subsequent RCU gp has elapsed.
    
    In terms of BPF, we want to use task->rcu_users to protect tasks that
    function as referenced kptrs, and to allow tasks stored as referenced
    kptrs in maps to be accessed with RCU protection.
    
    Let's first determine whether we can safely use task->rcu_users to
    protect tasks stored in maps. All of the bpf_task* kfuncs can only be
    called from tracepoint, struct_ops, or BPF_PROG_TYPE_SCHED_CLS program
    types. For tracepoint and struct_ops programs, the struct task_struct
    passed to a program handler will always be trusted, so it will always be
    safe to call bpf_task_acquire() with any task passed to a program.
    Note, however, that we must update bpf_task_acquire() to be KF_RET_NULL,
    as it is possible that the task has exited by the time the program is
    invoked, even if the pointer is still currently valid because the main
    kernel holds a task->usage refcount. For BPF_PROG_TYPE_SCHED_CLS, tasks
    should never be passed as an argument to any program handlers, so it
    should not be relevant.
    
    The second question is whether it's safe to use RCU to access a task
    that was acquired with bpf_task_acquire(), and stored in a map. Because
    bpf_task_acquire() now uses task->rcu_users, it follows that if the task
    is present in the map, that it must have had at least one
    task->rcu_users refcount by the time the current RCU cs was started.
    Therefore, it's safe to access that task until the end of the current
    RCU cs.
    
    With all that said, this patch makes struct task_struct an
    RCU-protected object. In doing so, we also change bpf_task_acquire() to
    be KF_ACQUIRE | KF_RCU | KF_RET_NULL, and adjust any selftests as
    necessary. A subsequent patch will remove bpf_task_kptr_get(), and
    bpf_task_acquire_not_zero() respectively.
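
    A minimal sketch of the resulting usage (program and hook names are
    illustrative):

      SEC("tp_btf/task_newtask")
      int BPF_PROG(on_task_newtask, struct task_struct *task, u64 clone_flags)
      {
        struct task_struct *acquired;

        acquired = bpf_task_acquire(task);
        if (!acquired)                  /* KF_RET_NULL: task may be exiting */
          return 0;
        bpf_task_release(acquired);
        return 0;
      }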
    
    Signed-off-by: David Vernet <void@manifault.com>
    Link: https://lore.kernel.org/r/20230331195733.699708-2-void@manifault.com
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Signed-off-by: Artem Savkov <asavkov@redhat.com>
2023-09-22 09:12:25 +02:00
Artem Savkov d5cf6bcc12 bpf: Remove now-unnecessary NULL checks for KF_RELEASE kfuncs
Bugzilla: https://bugzilla.redhat.com/2221599

Upstream Status: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

commit fb2211a57c110b4ced3cb7f8570bd7246acf2d04
Author: David Vernet <void@manifault.com>
Date:   Sat Mar 25 16:31:45 2023 -0500

    bpf: Remove now-unnecessary NULL checks for KF_RELEASE kfuncs

    Now that we're not invoking kfunc destructors when the kptr in a map was
    NULL, we no longer require NULL checks in many of our KF_RELEASE kfuncs.
    This patch removes those NULL checks.

    Signed-off-by: David Vernet <void@manifault.com>
    Link: https://lore.kernel.org/r/20230325213144.486885-3-void@manifault.com
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Signed-off-by: Artem Savkov <asavkov@redhat.com>
2023-09-22 09:12:25 +02:00
Artem Savkov df29d79c95 bpf: Fix bpf_strncmp proto.
Bugzilla: https://bugzilla.redhat.com/2221599

commit c9267aa8b794c2188d49c7d7bd2990e98b2d6b84
Author: Alexei Starovoitov <ast@kernel.org>
Date:   Mon Mar 13 16:58:43 2023 -0700

    bpf: Fix bpf_strncmp proto.
    
    bpf_strncmp() doesn't write into its first argument.
    Make sure that the verifier knows about it.
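
    For example (illustrative):

      char comm[16];

      bpf_get_current_comm(comm, sizeof(comm));
      /* comm is only read here; the fixed proto lets the verifier see that */
      if (bpf_strncmp(comm, sizeof(comm), "bash") == 0)
        return 0;                       /* current task is bash */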
    
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    Acked-by: David Vernet <void@manifault.com>
    Link: https://lore.kernel.org/r/20230313235845.61029-2-alexei.starovoitov@gmail.com
    Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>

Signed-off-by: Artem Savkov <asavkov@redhat.com>
2023-09-22 09:12:16 +02:00
Artem Savkov 400701606d bpf: Support __kptr to local kptrs
Bugzilla: https://bugzilla.redhat.com/2221599

commit c8e18754091479fac3f5b6c053c6bc4be0b7fb11
Author: Dave Marchevsky <davemarchevsky@fb.com>
Date:   Fri Mar 10 15:07:41 2023 -0800

    bpf: Support __kptr to local kptrs
    
    If a PTR_TO_BTF_ID type comes from program BTF - not vmlinux or module
    BTF - it must have been allocated by bpf_obj_new and therefore must be
    free'd with bpf_obj_drop. Such a PTR_TO_BTF_ID is considered a "local
    kptr" and is tagged with MEM_ALLOC type tag by bpf_obj_new.
    
    This patch adds support for treating __kptr-tagged pointers to "local
    kptrs" as having an implicit bpf_obj_drop destructor for referenced kptr
    acquire / release semantics. Consider the following example:
    
      struct node_data {
              long key;
              long data;
              struct bpf_rb_node node;
      };
    
      struct map_value {
              struct node_data __kptr *node;
      };
    
      struct {
              __uint(type, BPF_MAP_TYPE_ARRAY);
              __type(key, int);
              __type(value, struct map_value);
              __uint(max_entries, 1);
      } some_nodes SEC(".maps");
    
    If struct node_data had a matching definition in kernel BTF, the verifier would
    expect a destructor for the type to be registered. Since struct node_data does
    not match any type in kernel BTF, the verifier knows that there is no kfunc
    that provides a PTR_TO_BTF_ID to this type, and that such a PTR_TO_BTF_ID can
    only come from bpf_obj_new. So instead of searching for a registered dtor,
    a bpf_obj_drop dtor can be assumed.
    
    This allows the runtime to properly destruct such kptrs in
    bpf_obj_free_fields, which enables maps to clean up map_vals w/ such
    kptrs when going away.
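
    A sketch of stashing such a node in the map above (illustrative; helpers
    and macros as in the bpf_experimental.h selftest header):

      SEC("tc")
      int stash_node(void *ctx)
      {
        struct map_value *v;
        struct node_data *n, *old;
        int key = 0;

        n = bpf_obj_new(typeof(*n));
        if (!n)
          return 0;
        v = bpf_map_lookup_elem(&some_nodes, &key);
        if (!v) {
          bpf_obj_drop(n);
          return 0;
        }
        old = bpf_kptr_xchg(&v->node, n);
        if (old)
          bpf_obj_drop(old);            /* drop whatever was stashed before */
        return 0;
      }

    If the map element later goes away with a node still stashed,
    bpf_obj_free_fields() drops it through the implicit bpf_obj_drop
    destructor described above.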
    
    Implementation notes:
      * "kernel_btf" variable is renamed to "kptr_btf" in btf_parse_kptr.
        Before this patch, the variable would only ever point to vmlinux or
        module BTFs, but now it can point to some program BTF for local kptr
        type. It's later used to populate the (btf, btf_id) pair in kptr btf
        field.
      * It's necessary to btf_get the program BTF when populating btf_field
        for local kptr. btf_record_free later does a btf_put.
      * Behavior for non-local referenced kptrs is not modified, as
        bpf_find_btf_id helper only searches vmlinux and module BTFs for
        matching BTF type. If such a type is found, btf_field_kptr's btf will
        pass btf_is_kernel check, and the associated release function is
        some one-argument dtor. If btf_is_kernel check fails, associated
        release function is two-arg bpf_obj_drop_impl. Before this patch
        only btf_field_kptr's w/ kernel or module BTFs were created.
    
    Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
    Link: https://lore.kernel.org/r/20230310230743.2320707-2-davemarchevsky@fb.com
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Signed-off-by: Artem Savkov <asavkov@redhat.com>
2023-09-22 09:12:16 +02:00
Artem Savkov 741fa9739c bpf: implement numbers iterator
Bugzilla: https://bugzilla.redhat.com/2221599

commit 6018e1f407cccf39b804d1f75ad4de7be4e6cc45
Author: Andrii Nakryiko <andrii@kernel.org>
Date:   Wed Mar 8 10:41:17 2023 -0800

    bpf: implement numbers iterator
    
    Implement the first open-coded iterator type over a range of integers.
    
    Its public API consists of:
      - bpf_iter_num_new() constructor, which accepts [start, end) range
        (that is, start is inclusive, end is exclusive).
      - bpf_iter_num_next() which will keep returning a read-only pointer to int
        until the range is exhausted, at which point NULL will be returned.
        If bpf_iter_num_next() keeps being called after this, NULL will be
        persistently returned.
      - bpf_iter_num_destroy() destructor, which needs to be called at some
        point to clean up iterator state. BPF verifier enforces that iterator
        destructor is called at some point before BPF program exits.
    
    Note that `start = end = X` is a valid combination to set up an empty
    iterator. bpf_iter_num_new() will return 0 (success) for any such
    combination.
    
    If bpf_iter_num_new() detects an invalid combination of input arguments, it
    returns an error and resets the iterator state to, effectively, an empty
    iterator, so any subsequent call to bpf_iter_num_next() will keep returning
    NULL.
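
    Putting it together (illustrative):

      struct bpf_iter_num it;
      int *v, sum = 0;

      bpf_iter_num_new(&it, 0, 10);     /* iterates i = 0 .. 9          */
      while ((v = bpf_iter_num_next(&it)))
        sum += *v;
      bpf_iter_num_destroy(&it);        /* required before program exit */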
    
    BPF verifier has no knowledge that returned integers are in the
    [start, end) value range, as both `start` and `end` are not statically
    known and enforced: they are runtime values.
    
    While the implementation is pretty trivial, some care needs to be taken
    to avoid overflows and underflows. Subsequent selftests will validate
    correctness of [start, end) semantics, especially around extremes
    (INT_MIN and INT_MAX).
    
    Similarly to bpf_loop(), we enforce that no more than BPF_MAX_LOOPS can
    be specified.
    
    bpf_iter_num_{new,next,destroy}() is a logical evolution from bounded
    BPF loops and bpf_loop() helper and is the basis for implementing
    ergonomic BPF loops with no statically known or verified bounds.
    Subsequent patches implement bpf_for() macro, demonstrating how this can
    be wrapped into something that works and feels like a normal for() loop
    in C language.
    
    Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
    Link: https://lore.kernel.org/r/20230308184121.1165081-5-andrii@kernel.org
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Signed-off-by: Artem Savkov <asavkov@redhat.com>
2023-09-22 09:12:14 +02:00
Artem Savkov 86812a862c bpf: Introduce kptr_rcu.
Bugzilla: https://bugzilla.redhat.com/2221599

Conflicts: missing commit 7f203bc89eb6 ("cgroup: Replace
cgroup->ancestor_ids[] with ->ancestors[]")

commit 20c09d92faeefb8536f705d3a4629e0dc314c8a1
Author: Alexei Starovoitov <ast@kernel.org>
Date:   Thu Mar 2 20:14:43 2023 -0800

    bpf: Introduce kptr_rcu.

    The lifetime of certain kernel structures like 'struct cgroup' is protected by RCU.
    Hence it's safe to dereference them directly from __kptr tagged pointers in bpf maps.
    The resulting pointer is MEM_RCU and can be passed to kfuncs that expect KF_RCU.
    Dereference of other kptrs returns PTR_UNTRUSTED.

    For example:
    struct map_value {
       struct cgroup __kptr *cgrp;
    };

    SEC("tp_btf/cgroup_mkdir")
    int BPF_PROG(test_cgrp_get_ancestors, struct cgroup *cgrp_arg, const char *path)
    {
      struct cgroup *cg, *cg2;

      cg = bpf_cgroup_acquire(cgrp_arg); // cg is PTR_TRUSTED and ref_obj_id > 0
      bpf_kptr_xchg(&v->cgrp, cg);

      cg2 = v->cgrp; // This is new feature introduced by this patch.
      // cg2 is PTR_MAYBE_NULL | MEM_RCU.
      // When cg2 != NULL, it's a valid cgroup, but its percpu_ref could be zero

      if (cg2)
        bpf_cgroup_ancestor(cg2, level); // safe to do.
    }

    Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    Acked-by: Tejun Heo <tj@kernel.org>
    Acked-by: David Vernet <void@manifault.com>
    Link: https://lore.kernel.org/bpf/20230303041446.3630-4-alexei.starovoitov@gmail.com

Signed-off-by: Artem Savkov <asavkov@redhat.com>
2023-09-22 09:12:10 +02:00
Artem Savkov 219ca146ad bpf: Add support for absolute value BPF timers
Bugzilla: https://bugzilla.redhat.com/2221599

commit f71f8530494bb5ab43d3369ef0ce8373eb1ee077
Author: Tero Kristo <tero.kristo@linux.intel.com>
Date:   Thu Mar 2 13:46:13 2023 +0200

    bpf: Add support for absolute value BPF timers
    
    Add a new flag BPF_F_TIMER_ABS that can be passed to bpf_timer_start()
    to start an absolute value timer instead of the default relative value.
    This makes the timer expire at an exact point in time, instead of a time
    with latencies induced by both the BPF and timer subsystems.
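
    For example (illustrative; `val->t` is assumed to be a bpf_timer that was
    bpf_timer_init()'ed with CLOCK_MONOTONIC):

      __u64 deadline = bpf_ktime_get_ns() + 1000000000ULL;   /* now + 1s */

      bpf_timer_start(&val->t, deadline, BPF_F_TIMER_ABS);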
    
    Suggested-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
    Signed-off-by: Tero Kristo <tero.kristo@linux.intel.com>
    Link: https://lore.kernel.org/r/20230302114614.2985072-2-tero.kristo@linux.intel.com
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Signed-off-by: Artem Savkov <asavkov@redhat.com>
2023-09-22 09:12:10 +02:00
Artem Savkov 74f2bb6c6c bpf: Make bpf_get_current_[ancestor_]cgroup_id() available for all program types
Bugzilla: https://bugzilla.redhat.com/2221599

commit c501bf55c88b834adefda870c7c092ec9052a437
Author: Tejun Heo <tj@kernel.org>
Date:   Thu Mar 2 09:42:59 2023 -1000

    bpf: Make bpf_get_current_[ancestor_]cgroup_id() available for all program types
    
    These helpers are safe to call from any context and there's no reason to
    restrict access to them. Remove them from bpf_trace and filter lists and add
    to bpf_base_func_proto() under perfmon_capable().
    
    v2: After consulting with Andrii, relocated in bpf_base_func_proto() so that
        they require bpf_capable() but not perfmon_capable() as they don't read
        from or affect others on the system.
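
    For example (illustrative):

      /* now resolvable through bpf_base_func_proto(), given bpf_capable() */
      __u64 cgid      = bpf_get_current_cgroup_id();
      __u64 parent_id = bpf_get_current_ancestor_cgroup_id(1);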
    
    Signed-off-by: Tejun Heo <tj@kernel.org>
    Cc: Andrii Nakryiko <andrii@kernel.org>
    Link: https://lore.kernel.org/r/ZAD8QyoszMZiTzBY@slm.duckdns.org
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Signed-off-by: Artem Savkov <asavkov@redhat.com>
2023-09-22 09:12:10 +02:00
Artem Savkov db1ae29d75 bpf: Fix bpf_dynptr_slice{_rdwr} to return NULL instead of 0
Bugzilla: https://bugzilla.redhat.com/2221599

commit c45eac537bd8b4977d335c123212140bc5257670
Author: Joanne Koong <joannelkoong@gmail.com>
Date:   Wed Mar 1 21:30:14 2023 -0800

    bpf: Fix bpf_dynptr_slice{_rdwr} to return NULL instead of 0
    
    Change bpf_dynptr_slice and bpf_dynptr_slice_rdwr to return NULL instead
    of 0, in accordance with the codebase guidelines.
    
    Fixes: 66e3a13e7c2c ("bpf: Add bpf_dynptr_slice and bpf_dynptr_slice_rdwr")
    Reported-by: kernel test robot <lkp@intel.com>
    Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
    Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
    Link: https://lore.kernel.org/bpf/20230302053014.1726219-1-joannelkoong@gmail.com

Signed-off-by: Artem Savkov <asavkov@redhat.com>
2023-09-22 09:12:09 +02:00
Artem Savkov d3a9527441 bpf: Fix doxygen comments for dynptr slice kfuncs
Bugzilla: https://bugzilla.redhat.com/2221599

commit 7ce60b110eece1d7b3d5c322fd11f6d41a29d17b
Author: David Vernet <void@manifault.com>
Date:   Wed Mar 1 13:49:09 2023 -0600

    bpf: Fix doxygen comments for dynptr slice kfuncs
    
    In commit 66e3a13e7c2c ("bpf: Add bpf_dynptr_slice and
    bpf_dynptr_slice_rdwr"), the bpf_dynptr_slice() and
    bpf_dynptr_slice_rdwr() kfuncs were added to BPF. These kfuncs included
    doxygen headers, but unfortunately those headers are not properly
    formatted according to [0], and causes the following warnings during the
    docs build:
    
    ./kernel/bpf/helpers.c:2225: warning: \
        Excess function parameter 'returns' description in 'bpf_dynptr_slice'
    ./kernel/bpf/helpers.c:2303: warning: \
        Excess function parameter 'returns' description in 'bpf_dynptr_slice_rdwr'
    ...
    
    This patch fixes those doxygen comments.
    
    [0]: https://docs.kernel.org/doc-guide/kernel-doc.html#function-documentation
    
    Fixes: 66e3a13e7c2c ("bpf: Add bpf_dynptr_slice and bpf_dynptr_slice_rdwr")
    Signed-off-by: David Vernet <void@manifault.com>
    Link: https://lore.kernel.org/r/20230301194910.602738-1-void@manifault.com
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Signed-off-by: Artem Savkov <asavkov@redhat.com>
2023-09-22 09:12:09 +02:00
Artem Savkov fdc30fa851 bpf: Add bpf_dynptr_slice and bpf_dynptr_slice_rdwr
Bugzilla: https://bugzilla.redhat.com/2221599

commit 66e3a13e7c2c44d0c9dd6bb244680ca7529a8845
Author: Joanne Koong <joannelkoong@gmail.com>
Date:   Wed Mar 1 07:49:52 2023 -0800

    bpf: Add bpf_dynptr_slice and bpf_dynptr_slice_rdwr
    
    Two new kfuncs are added, bpf_dynptr_slice and bpf_dynptr_slice_rdwr.
    The user must pass in a buffer to store the contents of the data slice
    if a direct pointer to the data cannot be obtained.
    
    For skb and xdp type dynptrs, these two APIs are the only way to obtain
    a data slice. However, for other types of dynptrs, there is no
    difference between bpf_dynptr_slice(_rdwr) and bpf_dynptr_data.
    
    For skb type dynptrs, the data is copied into the user provided buffer
    if any of the data is not in the linear portion of the skb. For xdp type
    dynptrs, the data is copied into the user provided buffer if the data is
    between xdp frags.
    
    If the skb is cloned and a call to bpf_dynptr_slice_rdwr is made, then
    the skb will be uncloned (see bpf_unclone_prologue()).
    
    Please note that any bpf_dynptr_write() automatically invalidates any prior
    data slices of the skb dynptr. This is because the skb may be cloned or
    may need to pull its paged buffer into the head. As such, any
    bpf_dynptr_write() will automatically have its prior data slices
    invalidated, even if the write is to data in the skb head of an uncloned
    skb. Please note as well that any other helper calls that change the
    underlying packet buffer (eg bpf_skb_pull_data()) invalidates any data
    slices of the skb dynptr as well, for the same reasons.
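
    A minimal sketch of the buffer-backed slice (illustrative; `ptr` is assumed
    to be an skb or xdp dynptr):

      __u8 buffer[8];
      void *data;

      data = bpf_dynptr_slice(&ptr, 0, buffer, sizeof(buffer));
      if (!data)
        return 0;                       /* offset/len out of range */
      /* data points into the packet when possible, otherwise into buffer */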
    
    Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
    Link: https://lore.kernel.org/r/20230301154953.641654-10-joannelkoong@gmail.com
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Signed-off-by: Artem Savkov <asavkov@redhat.com>
2023-09-22 09:12:08 +02:00
Artem Savkov 1869c0c1bd bpf: Add xdp dynptrs
Bugzilla: https://bugzilla.redhat.com/2221599

commit 05421aecd4ed65da0dc17b0c3c13779ef334e9e5
Author: Joanne Koong <joannelkoong@gmail.com>
Date:   Wed Mar 1 07:49:51 2023 -0800

    bpf: Add xdp dynptrs
    
    Add xdp dynptrs, which are dynptrs whose underlying pointer points
    to a xdp_buff. The dynptr acts on xdp data. xdp dynptrs have two main
    benefits. One is that they allow operations on sizes that are not
    statically known at compile-time (eg variable-sized accesses).
    Another is that parsing the packet data through dynptrs (instead of
    through direct access of xdp->data and xdp->data_end) can be more
    ergonomic and less brittle (eg does not need manual if checking for
    being within bounds of data_end).
    
    For reads and writes on the dynptr, this includes reading/writing
    from/to and across fragments. Data slices through the bpf_dynptr_data
    API are not supported; instead bpf_dynptr_slice() and
    bpf_dynptr_slice_rdwr() should be used.
    
    For examples of how xdp dynptrs can be used, please see the attached
    selftests.
    
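    A minimal usage sketch (kfunc declaration and section name are
    illustrative; includes omitted):

    extern int bpf_dynptr_from_xdp(struct xdp_md *xdp, __u64 flags,
                                   struct bpf_dynptr *ptr) __ksym;

    SEC("xdp")
    int xdp_parse(struct xdp_md *ctx)
    {
            struct bpf_dynptr ptr;
            struct ethhdr eth;

            if (bpf_dynptr_from_xdp(ctx, 0, &ptr))
                    return XDP_PASS;

            /* works even when the requested bytes span xdp frags; no
             * manual data/data_end bounds checking is needed
             */
            if (bpf_dynptr_read(&eth, sizeof(eth), &ptr, 0, 0))
                    return XDP_PASS;

            /* ... inspect eth.h_proto ... */
            return XDP_PASS;
    }
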
    Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
    Link: https://lore.kernel.org/r/20230301154953.641654-9-joannelkoong@gmail.com
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Signed-off-by: Artem Savkov <asavkov@redhat.com>
2023-09-22 09:12:08 +02:00
Artem Savkov 6735ca36be bpf: Add skb dynptrs
Bugzilla: https://bugzilla.redhat.com/2221599

commit b5964b968ac64c2ec2debee7518499113b27c34e
Author: Joanne Koong <joannelkoong@gmail.com>
Date:   Wed Mar 1 07:49:50 2023 -0800

    bpf: Add skb dynptrs
    
    Add skb dynptrs, which are dynptrs whose underlying pointer points
    to a skb. The dynptr acts on skb data. skb dynptrs have two main
    benefits. One is that they allow operations on sizes that are not
    statically known at compile-time (eg variable-sized accesses).
    Another is that parsing the packet data through dynptrs (instead of
    through direct access of skb->data and skb->data_end) can be more
    ergonomic and less brittle (eg does not need manual if checking for
    being within bounds of data_end).
    
    For bpf prog types that don't support writes on skb data, the dynptr is
    read-only (bpf_dynptr_write() will return an error)
    
    For reads and writes through the bpf_dynptr_read() and bpf_dynptr_write()
    interfaces, reading and writing from/to data in the head as well as from/to
    non-linear paged buffers is supported. Data slices through the
    bpf_dynptr_data API are not supported; instead bpf_dynptr_slice() and
    bpf_dynptr_slice_rdwr() (added in subsequent commit) should be used.
    
    For examples of how skb dynptrs can be used, please see the attached
    selftests.
    
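    A minimal usage sketch (kfunc declaration and section name are
    illustrative; includes omitted):

    extern int bpf_dynptr_from_skb(struct __sk_buff *skb, __u64 flags,
                                   struct bpf_dynptr *ptr) __ksym;

    SEC("tc")
    int touch_dmac(struct __sk_buff *skb)
    {
            struct bpf_dynptr ptr;
            __u8 byte;

            if (bpf_dynptr_from_skb(skb, 0, &ptr))
                    return 0;

            /* reads and writes go through the dynptr, whether the byte
             * lives in the skb head or in a paged (non-linear) buffer
             */
            if (bpf_dynptr_read(&byte, sizeof(byte), &ptr, 0, 0))
                    return 0;

            byte |= 0x2;    /* set the locally-administered bit */
            bpf_dynptr_write(&ptr, 0, &byte, sizeof(byte), 0);
            return 0;
    }
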
    Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
    Link: https://lore.kernel.org/r/20230301154953.641654-8-joannelkoong@gmail.com
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Signed-off-by: Artem Savkov <asavkov@redhat.com>
2023-09-22 09:12:08 +02:00
Artem Savkov a1c7ecf835 bpf: Fix bpf_cgroup_from_id() doxygen header
Bugzilla: https://bugzilla.redhat.com/2221599

commit 30a2d8328d8ac1bb0a6bf73f4f4cf03f4f5977cc
Author: David Vernet <void@manifault.com>
Date:   Tue Feb 28 09:28:45 2023 -0600

    bpf: Fix bpf_cgroup_from_id() doxygen header
    
    In commit 332ea1f697be ("bpf: Add bpf_cgroup_from_id() kfunc"), a new
    bpf_cgroup_from_id() kfunc was added which allows a BPF program to
    lookup and acquire a reference to a cgroup from a cgroup id. The
    commit's doxygen comment seems to have copy-pasted fields, which causes
    BPF kfunc helper documentation to fail to render:
    
    <snip>/helpers.c:2114: warning: Excess function parameter 'cgrp'...
    <snip>/helpers.c:2114: warning: Excess function parameter 'level'...
    
    <snip>
    
    <snip>/helpers.c:2114: warning: Excess function parameter 'level'...
    
    This patch fixes the doxygen header.
    
    Fixes: 332ea1f697be ("bpf: Add bpf_cgroup_from_id() kfunc")
    Signed-off-by: David Vernet <void@manifault.com>
    Acked-by: Yonghong Song <yhs@fb.com>
    Link: https://lore.kernel.org/r/20230228152845.294695-1-void@manifault.com
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Signed-off-by: Artem Savkov <asavkov@redhat.com>
2023-09-22 09:12:08 +02:00
Artem Savkov 00bc00c8ec bpf: Add bpf_cgroup_from_id() kfunc
Bugzilla: https://bugzilla.redhat.com/2221599

commit 332ea1f697be148bd5e66475d82b5ecc5084da65
Author: Tejun Heo <tj@kernel.org>
Date:   Wed Feb 22 15:29:12 2023 -1000

    bpf: Add bpf_cgroup_from_id() kfunc
    
    cgroup ID is a userspace-visible 64bit value uniquely identifying a given
    cgroup. As the IDs are used widely, it's useful to be able to look up the
    matching cgroups. Add bpf_cgroup_from_id().
    
    v2: Separate out selftest into its own patch as suggested by Alexei.
    
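    A minimal usage sketch (declarations, section name and the cgroup id
    value are illustrative; includes omitted):

    extern struct cgroup *bpf_cgroup_from_id(u64 cgid) __ksym;
    extern void bpf_cgroup_release(struct cgroup *cgrp) __ksym;

    __u64 seen_level;

    SEC("fentry/do_nanosleep")
    int lookup_root_cgroup(void *ctx)
    {
            struct cgroup *cgrp;

            cgrp = bpf_cgroup_from_id(1);   /* 1 is commonly the root cgroup id */
            if (!cgrp)
                    return 0;

            seen_level = cgrp->level;
            bpf_cgroup_release(cgrp);       /* drop the acquired reference */
            return 0;
    }
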
    Signed-off-by: Tejun Heo <tj@kernel.org>
    Link: https://lore.kernel.org/r/Y/bBaG96t0/gQl9/@slm.duckdns.org
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Signed-off-by: Artem Savkov <asavkov@redhat.com>
2023-09-22 09:12:07 +02:00
Viktor Malik b40d8b1fc1 bpf: Add bpf_rbtree_{add,remove,first} kfuncs
Bugzilla: https://bugzilla.redhat.com/2178930

commit bd1279ae8a691d7ec75852c6d0a22139afb034a4
Author: Dave Marchevsky <davemarchevsky@fb.com>
Date:   Mon Feb 13 16:40:11 2023 -0800

    bpf: Add bpf_rbtree_{add,remove,first} kfuncs
    
    This patch adds implementations of bpf_rbtree_{add,remove,first}
    and teaches verifier about their BTF_IDs as well as those of
    bpf_rb_{root,node}.
    
    All three kfuncs have some nonstandard component to their verification
    that needs to be addressed in future patches before programs can
    properly use them:
    
      * bpf_rbtree_add:     Takes 'less' callback, need to verify it
    
      * bpf_rbtree_first:   Returns ptr_to_node_type(off=rb_node_off) instead
                            of ptr_to_rb_node(off=0). Return value ref is
    			non-owning.
    
      * bpf_rbtree_remove:  Returns ptr_to_node_type(off=rb_node_off) instead
                            of ptr_to_rb_node(off=0). 2nd arg (node) is a
    			non-owning reference.
    
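    A minimal usage sketch (type and function names are illustrative, and
    the bpf_obj_new()/__contains() conveniences from bpf_experimental.h are
    assumed):

    struct node_data {
            long key;
            struct bpf_rb_node node;
    };

    struct map_value {
            struct bpf_spin_lock lock;
            struct bpf_rb_root root __contains(node_data, node);
    };

    static bool node_less(struct bpf_rb_node *a, const struct bpf_rb_node *b)
    {
            struct node_data *na = container_of(a, struct node_data, node);
            struct node_data *nb = container_of(b, struct node_data, node);

            return na->key < nb->key;
    }

    /* 'v' is a struct map_value * obtained via bpf_map_lookup_elem() */
    void add_node(struct map_value *v)
    {
            struct node_data *n = bpf_obj_new(typeof(*n));

            if (!n)
                    return;
            n->key = 42;

            bpf_spin_lock(&v->lock);
            bpf_rbtree_add(&v->root, &n->node, node_less);  /* 'n' becomes non-owning */
            bpf_spin_unlock(&v->lock);
    }
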
    Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
    Link: https://lore.kernel.org/r/20230214004017.2534011-3-davemarchevsky@fb.com
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Signed-off-by: Viktor Malik <vmalik@redhat.com>
2023-06-13 22:45:29 +02:00
Viktor Malik 7e487d11fc bpf: Add basic bpf_rb_{root,node} support
Bugzilla: https://bugzilla.redhat.com/2178930

commit 9c395c1b99bd23f74bc628fa000480c49593d17f
Author: Dave Marchevsky <davemarchevsky@fb.com>
Date:   Mon Feb 13 16:40:10 2023 -0800

    bpf: Add basic bpf_rb_{root,node} support
    
    This patch adds special BPF_RB_{ROOT,NODE} btf_field_types similar to
    BPF_LIST_{HEAD,NODE}, adds the necessary plumbing to detect the new
    types, and adds bpf_rb_root_free function for freeing bpf_rb_root in
    map_values.
    
    structs bpf_rb_root and bpf_rb_node are opaque types meant to
    obscure structs rb_root_cached and rb_node, respectively.
    
    btf_struct_access will prevent BPF programs from touching these special
    fields automatically now that they're recognized.
    
    btf_check_and_fixup_fields now groups list_head and rb_root together as
    "graph root" fields and {list,rb}_node as "graph node", and does the same
    ownership cycle checking as before. Note that this function does _not_
    prevent ownership type mixups (e.g. rb_root owning list_node) - that's
    handled by btf_parse_graph_root.
    
    After this patch, a bpf program can have a struct bpf_rb_root in a
    map_value, but not add anything to nor do anything useful with it.
    
    Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
    Link: https://lore.kernel.org/r/20230214004017.2534011-2-davemarchevsky@fb.com
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Signed-off-by: Viktor Malik <vmalik@redhat.com>
2023-06-13 22:45:29 +02:00
Viktor Malik 23c9904275 bpf: Add __bpf_kfunc tag to all kfuncs
Bugzilla: https://bugzilla.redhat.com/2178930

commit 400031e05adfcef9e80eca80bdfc3f4b63658be4
Author: David Vernet <void@manifault.com>
Date:   Wed Feb 1 11:30:15 2023 -0600

    bpf: Add __bpf_kfunc tag to all kfuncs

    Now that we have the __bpf_kfunc tag, we should add it to all
    existing kfuncs to ensure that they'll never be elided in LTO builds.

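    In practice this is a one-line change per kfunc definition, e.g.
    (function body elided):

    Before:
            struct task_struct *bpf_task_acquire(struct task_struct *p) { ... }

    After:
            __bpf_kfunc struct task_struct *bpf_task_acquire(struct task_struct *p) { ... }

    The tag expands to attributes (such as __used and noinline) that keep
    the symbol from being optimized away even though nothing in the kernel
    calls it directly.
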
    Signed-off-by: David Vernet <void@manifault.com>
    Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    Acked-by: Stanislav Fomichev <sdf@google.com>
    Link: https://lore.kernel.org/bpf/20230201173016.342758-4-void@manifault.com

Signed-off-by: Viktor Malik <vmalik@redhat.com>
2023-06-13 22:45:20 +02:00
Viktor Malik 8a9358eb85 bpf: rename list_head -> graph_root in field info types
Bugzilla: https://bugzilla.redhat.com/2178930

commit 30465003ad776a922c32b2dac58db14f120f037e
Author: Dave Marchevsky <davemarchevsky@fb.com>
Date:   Sat Dec 17 00:24:57 2022 -0800

    bpf: rename list_head -> graph_root in field info types
    
    Many of the structs recently added to track field info for linked-list
    head are useful as-is for rbtree root. So let's do a mechanical renaming
    of list_head-related types and fields:
    
    include/linux/bpf.h:
      struct btf_field_list_head -> struct btf_field_graph_root
      list_head -> graph_root in struct btf_field union
    kernel/bpf/btf.c:
      list_head -> graph_root in struct btf_field_info
    
    This is a nonfunctional change, functionality to actually use these
    fields for rbtree will be added in further patches.
    
    Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
    Link: https://lore.kernel.org/r/20221217082506.1570898-5-davemarchevsky@fb.com
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Signed-off-by: Viktor Malik <vmalik@redhat.com>
2023-06-13 22:44:30 +02:00
Viktor Malik 1876dbfb9e bpf: Remove trace_printk_lock
Bugzilla: https://bugzilla.redhat.com/2178930

commit e2bb9e01d589f7fa82573aedd2765ff9b277816a
Author: Jiri Olsa <jolsa@kernel.org>
Date:   Thu Dec 15 22:44:30 2022 +0100

    bpf: Remove trace_printk_lock
    
    Both bpf_trace_printk and bpf_trace_vprintk helpers use static buffer guarded
    with trace_printk_lock spin lock.
    
    The spin lock contention causes issues with bpf programs attached to
    contention_begin tracepoint [1][2].
    
    Andrii suggested we could get rid of the contention by using trylock, but we
    could actually get rid of the spinlock completely by using percpu buffers the
    same way as for bin_args in bpf_bprintf_prepare function.
    
    Add a new return 'buf' argument to struct bpf_bprintf_data and make
    bpf_bprintf_prepare also return the buffer for the printk helpers.
    
      [1] https://lore.kernel.org/bpf/CACkBjsakT_yWxnSWr4r-0TpPvbKm9-OBmVUhJb7hV3hY8fdCkw@mail.gmail.com/
      [2] https://lore.kernel.org/bpf/CACkBjsaCsTovQHFfkqJKto6S4Z8d02ud1D7MPESrHa1cVNNTrw@mail.gmail.com/
    
    Reported-by: Hao Sun <sunhao.th@gmail.com>
    Suggested-by: Andrii Nakryiko <andrii@kernel.org>
    Signed-off-by: Jiri Olsa <jolsa@kernel.org>
    Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    Acked-by: Yonghong Song <yhs@fb.com>
    Link: https://lore.kernel.org/bpf/20221215214430.1336195-4-jolsa@kernel.org

Signed-off-by: Viktor Malik <vmalik@redhat.com>
2023-06-13 22:44:23 +02:00
Viktor Malik 5741f9f020 bpf: Do cleanup in bpf_bprintf_cleanup only when needed
Bugzilla: https://bugzilla.redhat.com/2178930

commit f19a4050455aad847fb93f18dc1fe502eb60f989
Author: Jiri Olsa <jolsa@kernel.org>
Date:   Thu Dec 15 22:44:29 2022 +0100

    bpf: Do cleanup in bpf_bprintf_cleanup only when needed
    
    Currently we always cleanup/decrement bpf_bprintf_nest_level variable
    in bpf_bprintf_cleanup if it's > 0.
    
    There's a possible scenario where this could cause a problem, when
    bpf_bprintf_prepare does not get bin_args buffer (because num_args is 0)
    and following bpf_bprintf_cleanup call decrements bpf_bprintf_nest_level
    variable, like:
    
      in task context:
        bpf_bprintf_prepare(num_args != 0) increments 'bpf_bprintf_nest_level = 1'
        -> first irq :
           bpf_bprintf_prepare(num_args == 0)
           bpf_bprintf_cleanup decrements 'bpf_bprintf_nest_level = 0'
        -> second irq:
           bpf_bprintf_prepare(num_args != 0) bpf_bprintf_nest_level = 1
           gets same buffer as task context above
    
    Add a check to bpf_bprintf_cleanup and do the real cleanup only if we
    got bin_args data in the first place.
    
    Signed-off-by: Jiri Olsa <jolsa@kernel.org>
    Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    Acked-by: Yonghong Song <yhs@fb.com>
    Link: https://lore.kernel.org/bpf/20221215214430.1336195-3-jolsa@kernel.org

Signed-off-by: Viktor Malik <vmalik@redhat.com>
2023-06-13 22:44:23 +02:00
Viktor Malik e1af8144ba bpf: Add struct for bin_args arg in bpf_bprintf_prepare
Bugzilla: https://bugzilla.redhat.com/2178930

commit 78aa1cc9404399a15d2a1205329c6a06236f5378
Author: Jiri Olsa <jolsa@kernel.org>
Date:   Thu Dec 15 22:44:28 2022 +0100

    bpf: Add struct for bin_args arg in bpf_bprintf_prepare
    
    Add struct bpf_bprintf_data to hold the bin_args argument for the
    bpf_bprintf_prepare function.
    
    We will add another return argument to bpf_bprintf_prepare and
    pass the struct to bpf_bprintf_cleanup for proper cleanup in
    following changes.
    
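    The shape of the change, in sketch form (the 'buf' member mentioned
    above is only added by a later patch in the series):

    struct bpf_bprintf_data {
            u32 *bin_args;
            bool get_bin_args;
    };

    /* caller side, e.g. a printf-like helper */
    struct bpf_bprintf_data data = { .get_bin_args = true };
    int err;

    err = bpf_bprintf_prepare(fmt, fmt_size, raw_args, num_args, &data);
    if (err < 0)
            return err;
    /* ... format using data.bin_args ... */
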
    Signed-off-by: Jiri Olsa <jolsa@kernel.org>
    Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    Acked-by: Yonghong Song <yhs@fb.com>
    Link: https://lore.kernel.org/bpf/20221215214430.1336195-2-jolsa@kernel.org

Signed-off-by: Viktor Malik <vmalik@redhat.com>
2023-06-13 22:44:22 +02:00
Jerome Marchand 0de8fcfc64 bpf: Use memmove for bpf_dynptr_{read,write}
Bugzilla: https://bugzilla.redhat.com/2177177

commit 76d16077bef0954528ec3896710f9eda8b2b4db1
Author: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Date:   Thu Dec 8 02:11:40 2022 +0530

    bpf: Use memmove for bpf_dynptr_{read,write}
    
    It may happen that destination buffer memory overlaps with memory dynptr
    points to. Hence, we must use memmove to correctly copy from dynptr to
    destination buffer, or source buffer to dynptr.
    
    This actually isn't a problem right now, as memcpy implementation falls
    back to memmove on detecting overlap and warns about it, but we
    shouldn't be relying on that.
    
    Acked-by: Joanne Koong <joannelkoong@gmail.com>
    Acked-by: David Vernet <void@manifault.com>
    Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
    Link: https://lore.kernel.org/r/20221207204141.308952-7-memxor@gmail.com
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
2023-04-28 11:43:17 +02:00
Jerome Marchand a59af7f5dd bpf: Rework process_dynptr_func
Bugzilla: https://bugzilla.redhat.com/2177177

commit 270605317366e4535d8d9fc3d9da1ad0fb3c9d45
Author: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Date:   Thu Dec 8 02:11:37 2022 +0530

    bpf: Rework process_dynptr_func
    
    Recently, user ringbuf support introduced a PTR_TO_DYNPTR register type
    for use in callback state, because in case of user ringbuf helpers,
    there is no dynptr on the stack that is passed into the callback. To
    reflect such a state, a special register type was created.
    
    However, some checks have been bypassed incorrectly during the addition
    of this feature. First, for arg_type with MEM_UNINIT flag which
    initialize a dynptr, they must be rejected for such register type.
    Secondly, in the future, there are plans to add dynptr helpers that
    operate on the dynptr itself and may change its offset and other
    properties.
    
    In all of these cases, PTR_TO_DYNPTR shouldn't be allowed to be passed
    to such helpers, however the current code simply returns 0.
    
    The rejection for helpers that release the dynptr is already handled.
    
    For fixing this, we take a step back and rework existing code in a way
    that will allow fitting in all classes of helpers and have a coherent
    model for dealing with the variety of use cases in which dynptr is used.
    
    First, for ARG_PTR_TO_DYNPTR, it can either be set alone or together
    with a DYNPTR_TYPE_* constant that denotes the only type it accepts.
    
    Next, helpers which initialize a dynptr use MEM_UNINIT to indicate this
    fact. To make the distinction clear, use MEM_RDONLY flag to indicate
    that the helper only operates on the memory pointed to by the dynptr,
    not the dynptr itself. In C parlance, it would be equivalent to taking
    the dynptr as a pointer-to-const argument.
    
    When either of these flags are not present, the helper is allowed to
    mutate both the dynptr itself and also the memory it points to.
    Currently, the read only status of the memory is not tracked in the
    dynptr, but it would be trivial to add this support inside dynptr state
    of the register.
    
    With these changes and renaming PTR_TO_DYNPTR to CONST_PTR_TO_DYNPTR to
    better reflect its usage, it can no longer be passed to helpers that
    initialize a dynptr, i.e. bpf_dynptr_from_mem, bpf_ringbuf_reserve_dynptr.
    
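    In terms of helper protos, the three classes of dynptr arguments look
    roughly like this (flag combinations shown for illustration):

    /* helper initializes the dynptr: the argument is uninitialized memory */
    .arg4_type = ARG_PTR_TO_DYNPTR | DYNPTR_TYPE_LOCAL | MEM_UNINIT,

    /* helper only reads through the dynptr: pointer-to-const semantics */
    .arg3_type = ARG_PTR_TO_DYNPTR | MEM_RDONLY,

    /* helper may mutate the dynptr itself: neither flag is set */
    .arg1_type = ARG_PTR_TO_DYNPTR,
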
    A note to reviewers is that in code that does mark_stack_slots_dynptr,
    and unmark_stack_slots_dynptr, we implicitly rely on the fact that
    PTR_TO_STACK reg is the only case that can reach that code path, as one
    cannot pass CONST_PTR_TO_DYNPTR to helpers that don't set MEM_RDONLY. In
    both cases such helpers won't be setting that flag.
    
    The next patch will add a couple of selftest cases to make sure this
    doesn't break.
    
    Fixes: 205715673844 ("bpf: Add bpf_user_ringbuf_drain() helper")
    Acked-by: Joanne Koong <joannelkoong@gmail.com>
    Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
    Link: https://lore.kernel.org/r/20221207204141.308952-4-memxor@gmail.com
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
2023-04-28 11:43:17 +02:00
Jerome Marchand f19e7a99bb bpf/docs: Document struct cgroup * kfuncs
Bugzilla: https://bugzilla.redhat.com/2177177

commit 36aa10ffd6480b93e32611411be4a8fc49804aba
Author: David Vernet <void@manifault.com>
Date:   Wed Dec 7 14:49:11 2022 -0600

    bpf/docs: Document struct cgroup * kfuncs
    
    bpf_cgroup_acquire(), bpf_cgroup_release(), bpf_cgroup_kptr_get(), and
    bpf_cgroup_ancestor(), are kfuncs that were recently added to
    kernel/bpf/helpers.c. These are "core" kfuncs in that they're available
    for use in any tracepoint or struct_ops BPF program. Though they have no
    ABI stability guarantees, we should still document them. This patch adds
    a struct cgroup * subsection to the Core kfuncs section which describes
    each of these kfuncs.
    
    Signed-off-by: David Vernet <void@manifault.com>
    Link: https://lore.kernel.org/r/20221207204911.873646-3-void@manifault.com
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
2023-04-28 11:43:16 +02:00
Jerome Marchand 27492209c8 bpf/docs: Document struct task_struct * kfuncs
Bugzilla: https://bugzilla.redhat.com/2177177

commit 25c5e92d197bd721e706444c5910fd386c330456
Author: David Vernet <void@manifault.com>
Date:   Wed Dec 7 14:49:10 2022 -0600

    bpf/docs: Document struct task_struct * kfuncs
    
    bpf_task_acquire(), bpf_task_release(), and bpf_task_from_pid() are
    kfuncs that were recently added to kernel/bpf/helpers.c. These are
    "core" kfuncs in that they're available for use for any tracepoint or
    struct_ops BPF program. Though they have no ABI stability guarantees, we
    should still document them. This patch adds a new Core kfuncs section to
    the BPF kfuncs doc, and adds entries for all of these task kfuncs.
    
    Note that bpf_task_kptr_get() is not documented, as it still returns
    NULL while we're working to resolve how it can use RCU to ensure struct
    task_struct * lifetime.
    
    Signed-off-by: David Vernet <void@manifault.com>
    Link: https://lore.kernel.org/r/20221207204911.873646-2-void@manifault.com
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
2023-04-28 11:43:16 +02:00
Jerome Marchand 091fd1e4dc bpf: Don't use rcu_users to refcount in task kfuncs
Bugzilla: https://bugzilla.redhat.com/2177177

commit 156ed20d22ee68d470232d26ae6df2cefacac4a0
Author: David Vernet <void@manifault.com>
Date:   Tue Dec 6 15:05:38 2022 -0600

    bpf: Don't use rcu_users to refcount in task kfuncs
    
    A series of prior patches added some kfuncs that allow struct
    task_struct * objects to be used as kptrs. These kfuncs leveraged the
    'refcount_t rcu_users' field of the task for performing refcounting.
    This field was used instead of 'refcount_t usage', as we wanted to
    leverage the safety provided by RCU for ensuring a task's lifetime.
    
    A struct task_struct is refcounted by two different refcount_t fields:
    
    1. p->usage:     The "true" refcount field which tracks task lifetime. The
    		 task is freed as soon as this refcount drops to 0.
    
    2. p->rcu_users: An "RCU users" refcount field which is statically
    		 initialized to 2, and is co-located in a union with
    		 a struct rcu_head field (p->rcu). p->rcu_users
    		 essentially encapsulates a single p->usage
    		 refcount, and when p->rcu_users goes to 0, an RCU
    		 callback is scheduled on the struct rcu_head which
    		 decrements the p->usage refcount.
    
    Our logic was that by using p->rcu_users, we would be able to use RCU to
    safely issue refcount_inc_not_zero() on a task's rcu_users field to
    determine if a task could still be acquired, or was exiting.
    Unfortunately, this does not work due to p->rcu_users and p->rcu sharing
    a union. When p->rcu_users goes to 0, an RCU callback is scheduled to
    drop a single p->usage refcount, and because the fields share a union,
    the refcount immediately becomes nonzero again after the callback is
    scheduled.
    
    If we were to split the fields out of the union, this wouldn't be a
    problem. Doing so should also be rather non-controversial, as there are
    a number of places in struct task_struct that have padding which we
    could use to avoid growing the structure by splitting up the fields.
    
    For now, so as to fix the kfuncs to be correct, this patch instead
    updates bpf_task_acquire() and bpf_task_release() to use the p->usage
    field for refcounting via the get_task_struct() and put_task_struct()
    functions. Because we can no longer rely on RCU, the change also guts
    the bpf_task_acquire_not_zero() and bpf_task_kptr_get() functions
    pending a resolution on the above problem.
    
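    The shape of the fix, simplified (not the verbatim kernel code):

    struct task_struct *bpf_task_acquire(struct task_struct *p)
    {
            get_task_struct(p);     /* bumps p->usage */
            return p;
    }

    void bpf_task_release(struct task_struct *p)
    {
            put_task_struct(p);     /* drops p->usage */
    }
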
    In addition, the patch fixes the kfunc and rcu_read_lock selftests to
    expect this new behavior.
    
    Fixes: 90660309b0c7 ("bpf: Add kfuncs for storing struct task_struct * as a kptr")
    Fixes: fca1aa75518c ("bpf: Handle MEM_RCU type properly")
    Reported-by: Matus Jokay <matus.jokay@stuba.sk>
    Signed-off-by: David Vernet <void@manifault.com>
    Link: https://lore.kernel.org/r/20221206210538.597606-1-void@manifault.com
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
2023-04-28 11:43:15 +02:00
Jerome Marchand 02626b368f bpf: Handle MEM_RCU type properly
Bugzilla: https://bugzilla.redhat.com/2177177

commit fca1aa75518c03b04c3c249e9a9134faf9ca18c5
Author: Yonghong Song <yhs@fb.com>
Date:   Sat Dec 3 10:46:02 2022 -0800

    bpf: Handle MEM_RCU type properly

    Commit 9bb00b2895cb ("bpf: Add kfunc bpf_rcu_read_lock/unlock()")
    introduced MEM_RCU and bpf_rcu_read_lock/unlock() support. In that
    commit, a rcu pointer is tagged with both MEM_RCU and PTR_TRUSTED
    so that it can be passed into kfuncs or helpers as an argument.

    Martin raised a good question in [1] such that the rcu pointer,
    although being able to access the object, might have a reference
    count of 0. This might cause a problem if the rcu pointer is passed
    to a kfunc which expects trusted arguments where ref count should
    be greater than 0.

    This patch makes the following changes related to MEM_RCU pointer:
      - MEM_RCU pointer might be NULL (PTR_MAYBE_NULL).
      - Introduce KF_RCU so MEM_RCU ptr can be acquired with
        a KF_RCU tagged kfunc which assumes ref count of rcu ptr
        could be zero.
      - For mem access 'b = ptr->a', say 'ptr' is a MEM_RCU ptr, and
        'a' is tagged with __rcu as well. Let us mark 'b' as
        MEM_RCU | PTR_MAYBE_NULL.

     [1] https://lore.kernel.org/bpf/ac70f574-4023-664e-b711-e0d3b18117fd@linux.dev/

    Fixes: 9bb00b2895cb ("bpf: Add kfunc bpf_rcu_read_lock/unlock()")
    Signed-off-by: Yonghong Song <yhs@fb.com>
    Link: https://lore.kernel.org/r/20221203184602.477272-1-yhs@fb.com
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
2023-04-28 11:43:14 +02:00
Jerome Marchand 0ec7796171 bpf: Add kfunc bpf_rcu_read_lock/unlock()
Bugzilla: https://bugzilla.redhat.com/2177177

commit 9bb00b2895cbfe0ad410457b605d0a72524168c1
Author: Yonghong Song <yhs@fb.com>
Date:   Wed Nov 23 21:32:17 2022 -0800

    bpf: Add kfunc bpf_rcu_read_lock/unlock()

    Add two kfunc's bpf_rcu_read_lock() and bpf_rcu_read_unlock(). These two kfunc's
    can be used for all program types. The following is an example of how
    rcu pointers are used w.r.t. bpf_rcu_read_lock()/bpf_rcu_read_unlock().

      struct task_struct {
        ...
        struct task_struct              *last_wakee;
        struct task_struct __rcu        *real_parent;
        ...
      };

    Let us say prog does 'task = bpf_get_current_task_btf()' to get a
    'task' pointer. The basic rules are:
      - 'real_parent = task->real_parent' should be inside bpf_rcu_read_lock
        region. This is to simulate rcu_dereference() operation. The
        'real_parent' is marked as MEM_RCU only if (1). task->real_parent is
        inside bpf_rcu_read_lock region, and (2). task is a trusted ptr. So
        MEM_RCU marked ptr can be 'trusted' inside the bpf_rcu_read_lock region.
      - 'last_wakee = real_parent->last_wakee' should be inside bpf_rcu_read_lock
        region since it tries to access rcu protected memory.
      - the ptr 'last_wakee' will be marked as PTR_UNTRUSTED since in general
        it is not clear whether the object pointed to by 'last_wakee' is valid or
        not even inside bpf_rcu_read_lock region.

    The verifier will reset all rcu pointer register states to untrusted
    at bpf_rcu_read_unlock() kfunc call site, so any such rcu pointer
    won't be trusted any more outside the bpf_rcu_read_lock() region.

    The current implementation does not support nested rcu read lock
    region in the prog.

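    A minimal usage sketch (kfunc declarations and attach point are
    illustrative; includes omitted):

    extern void bpf_rcu_read_lock(void) __ksym;
    extern void bpf_rcu_read_unlock(void) __ksym;

    __u32 parent_pid;

    SEC("tp_btf/sched_switch")
    int read_parent(void *ctx)
    {
            struct task_struct *task, *parent;

            task = bpf_get_current_task_btf();

            bpf_rcu_read_lock();
            parent = task->real_parent;     /* MEM_RCU inside the region */
            if (parent)
                    parent_pid = parent->pid;
            bpf_rcu_read_unlock();          /* 'parent' is untrusted after this */
            return 0;
    }
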
    Acked-by: Martin KaFai Lau <martin.lau@kernel.org>
    Signed-off-by: Yonghong Song <yhs@fb.com>
    Link: https://lore.kernel.org/r/20221124053217.2373910-1-yhs@fb.com
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
2023-04-28 11:43:12 +02:00
Jerome Marchand 868564cc57 bpf: Introduce might_sleep field in bpf_func_proto
Bugzilla: https://bugzilla.redhat.com/2177177

commit 01685c5bddaa6df3d662c8afed5e5289fcc68e5a
Author: Yonghong Song <yhs@fb.com>
Date:   Wed Nov 23 21:32:11 2022 -0800

    bpf: Introduce might_sleep field in bpf_func_proto

    Introduce bpf_func_proto->might_sleep to indicate a particular helper
    might sleep. This will make it easier to later check whether a helper
    might be sleepable or not.

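    A helper proto can then be declared like this (hypothetical helper name,
    shown only to illustrate the new field):

    const struct bpf_func_proto bpf_foo_sleepable_proto = {
            .func           = bpf_foo_sleepable,
            .gpl_only       = false,
            .might_sleep    = true,
            .ret_type       = RET_INTEGER,
            .arg1_type      = ARG_ANYTHING,
    };
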
    Acked-by: Martin KaFai Lau <martin.lau@kernel.org>
    Signed-off-by: Yonghong Song <yhs@fb.com>
    Link: https://lore.kernel.org/r/20221124053211.2373553-1-yhs@fb.com
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
2023-04-28 11:43:12 +02:00
Jerome Marchand 3cd5a9ecbe bpf: Add bpf_task_from_pid() kfunc
Bugzilla: https://bugzilla.redhat.com/2177177

commit 3f0e6f2b41d35d4446160c745e8f09037447dd8f
Author: David Vernet <void@manifault.com>
Date:   Tue Nov 22 08:52:59 2022 -0600

    bpf: Add bpf_task_from_pid() kfunc

    Callers can currently store tasks as kptrs using bpf_task_acquire(),
    bpf_task_kptr_get(), and bpf_task_release(). These are useful if a
    caller already has a struct task_struct *, but there may be some callers
    who only have a pid, and want to look up the associated struct
    task_struct * from that to e.g. find task->comm.

    This patch therefore adds a new bpf_task_from_pid() kfunc which allows
    BPF programs to get a struct task_struct * kptr from a pid.

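    A minimal usage sketch (declarations and attach point are illustrative;
    includes omitted):

    extern struct task_struct *bpf_task_from_pid(s32 pid) __ksym;
    extern void bpf_task_release(struct task_struct *p) __ksym;

    __u64 found_tgid;

    SEC("tp_btf/task_newtask")
    int BPF_PROG(on_newtask, struct task_struct *task, u64 clone_flags)
    {
            struct task_struct *lookup;

            lookup = bpf_task_from_pid(task->pid);
            if (!lookup)
                    return 0;

            found_tgid = lookup->tgid;
            bpf_task_release(lookup);       /* drop the acquired reference */
            return 0;
    }
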
    Signed-off-by: David Vernet <void@manifault.com>
    Link: https://lore.kernel.org/r/20221122145300.251210-2-void@manifault.com
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
2023-04-28 11:43:11 +02:00
Jerome Marchand 81daf48e7c bpf: Don't use idx variable when registering kfunc dtors
Bugzilla: https://bugzilla.redhat.com/2177177

commit 2fcc6081a7bf8f7f531cffdc58b630b822e700a1
Author: David Vernet <void@manifault.com>
Date:   Wed Nov 23 07:52:53 2022 -0600

    bpf: Don't use idx variable when registering kfunc dtors

    In commit fda01efc6160 ("bpf: Enable cgroups to be used as kptrs"), I
    added an 'int idx' variable to kfunc_init() which was meant to
    dynamically set the index of the btf id entries of the
    'generic_dtor_ids' array. This was done to make the code slightly less
    brittle as the struct cgroup * kptr kfuncs such as bpf_cgroup_aquire()
    are compiled out if CONFIG_CGROUPS is not defined. This, however, causes
    an lkp build warning:

    >> kernel/bpf/helpers.c:2005:40: warning: multiple unsequenced
       modifications to 'idx' [-Wunsequenced]
    	.btf_id       = generic_dtor_ids[idx++],

    Fix the warning by just hard-coding the indices.

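    In sketch form (field names per struct btf_id_dtor_kfunc; surrounding
    array elided), the problem and the fix look like:

    /* before: both initializers modify 'idx', which clang flags as
     * unsequenced modifications within one initializer list
     */
    {
            .btf_id       = generic_dtor_ids[idx++],
            .kfunc_btf_id = generic_dtor_ids[idx++],
    },

    /* after: the indices are simply hard-coded */
    {
            .btf_id       = generic_dtor_ids[0],
            .kfunc_btf_id = generic_dtor_ids[1],
    },
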
    Fixes: fda01efc6160 ("bpf: Enable cgroups to be used as kptrs")
    Reported-by: kernel test robot <lkp@intel.com>
    Signed-off-by: David Vernet <void@manifault.com>
    Acked-by: Yonghong Song <yhs@fb.com>
    Link: https://lore.kernel.org/r/20221123135253.637525-1-void@manifault.com
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
2023-04-28 11:43:11 +02:00
Jerome Marchand 0ee084404a bpf: Add bpf_cgroup_ancestor() kfunc
Bugzilla: https://bugzilla.redhat.com/2177177

Conflicts: Missing commit 7f203bc89eb6 ("cgroup: Replace
cgroup->ancestor_ids[] with ->ancestors[]")

commit 5ca7867078296cfa9c100f9a3b2d24be1e139825
Author: David Vernet <void@manifault.com>
Date:   Mon Nov 21 23:54:57 2022 -0600

    bpf: Add bpf_cgroup_ancestor() kfunc

    struct cgroup * objects have a variably sized struct cgroup *ancestors[]
    field which stores pointers to their ancestor cgroups. If using a cgroup
    as a kptr, it can be useful to access these ancestors, but doing so
    requires variable offset accesses for PTR_TO_BTF_ID, which is currently
    unsupported.

    This is a very useful field to access for cgroup kptrs, as programs may
    wish to walk their ancestor cgroups when determining e.g. their
    proportional cpu.weight. So as to enable this functionality with cgroup
    kptrs before var_off is supported for PTR_TO_BTF_ID, this patch adds a
    bpf_cgroup_ancestor() kfunc which accesses the cgroup node on behalf of
    the caller, and acquires a reference on it. Once var_off is supported
    for PTR_TO_BTF_ID, and fields inside a struct can be marked as trusted
    so they retain the PTR_TRUSTED modifier when walked, this can be
    removed.

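    A minimal usage fragment (declarations illustrative; 'cgrp' is assumed
    to be a trusted struct cgroup * already held by the program):

    extern struct cgroup *bpf_cgroup_ancestor(struct cgroup *cgrp, int level) __ksym;
    extern void bpf_cgroup_release(struct cgroup *cgrp) __ksym;

    struct cgroup *ancestor;

    ancestor = bpf_cgroup_ancestor(cgrp, 1);        /* level-1 ancestor */
    if (!ancestor)
            return 0;
    /* ... inspect the ancestor, e.g. ancestor->level ... */
    bpf_cgroup_release(ancestor);   /* reference was acquired on success */
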
    Signed-off-by: David Vernet <void@manifault.com>
    Link: https://lore.kernel.org/r/20221122055458.173143-4-void@manifault.com
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
2023-04-28 11:43:10 +02:00
Jerome Marchand cabc5abf20 bpf: Enable cgroups to be used as kptrs
Bugzilla: https://bugzilla.redhat.com/2177177

commit fda01efc61605af7c6fa03c4109f14d59c9228b7
Author: David Vernet <void@manifault.com>
Date:   Mon Nov 21 23:54:55 2022 -0600

    bpf: Enable cgroups to be used as kptrs

    Now that tasks can be used as kfuncs, and the PTR_TRUSTED flag is
    available for us to easily add basic acquire / get / release kfuncs, we
    can do the same for cgroups. This patch set adds the following kfuncs
    which enable using cgroups as kptrs:

    struct cgroup *bpf_cgroup_acquire(struct cgroup *cgrp);
    struct cgroup *bpf_cgroup_kptr_get(struct cgroup **cgrpp);
    void bpf_cgroup_release(struct cgroup *cgrp);

    A follow-on patch will add a selftest suite which validates these
    kfuncs.

    Signed-off-by: David Vernet <void@manifault.com>
    Link: https://lore.kernel.org/r/20221122055458.173143-2-void@manifault.com
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
2023-04-28 11:43:10 +02:00
Jerome Marchand 4b3197d3e3 bpf: Add a kfunc for generic type cast
Bugzilla: https://bugzilla.redhat.com/2177177

commit a35b9af4ec2c7f69286ef861fd2074a577e354cb
Author: Yonghong Song <yhs@fb.com>
Date:   Sun Nov 20 11:54:37 2022 -0800

    bpf: Add a kfunc for generic type cast

    Implement bpf_rdonly_cast() which tries to cast the object
    to a specified type. This tries to support use cases like the one below:
      #define skb_shinfo(SKB) ((struct skb_shared_info *)(skb_end_pointer(SKB)))
    where skb_end_pointer(SKB) is an 'unsigned char *' and needs to
    be cast to 'struct skb_shared_info *'.

    The signature of bpf_rdonly_cast() looks like
       void *bpf_rdonly_cast(void *obj, __u32 btf_id)
    The function returns the same 'obj' but with PTR_TO_BTF_ID with
    btf_id. The verifier will ensure btf_id being a struct type.

    Since the supported type cast may not reflect what the 'obj'
    represents, the returned btf_id is marked as PTR_UNTRUSTED, so
    the return value and subsequent pointer chasing cannot be
    used as helper/kfunc arguments.

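    A minimal usage fragment (declaration illustrative; 'p' is some pointer
    the program believes points at a struct skb_shared_info):

    extern void *bpf_rdonly_cast(void *obj, __u32 btf_id) __ksym;

    struct skb_shared_info *shinfo;

    shinfo = bpf_rdonly_cast(p, bpf_core_type_id_kernel(struct skb_shared_info));
    /* field reads such as shinfo->nr_frags are allowed, but the pointer is
     * PTR_UNTRUSTED and cannot be passed on to helpers or kfuncs
     */
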
    Signed-off-by: Yonghong Song <yhs@fb.com>
    Link: https://lore.kernel.org/r/20221120195437.3114585-1-yhs@fb.com
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
2023-04-28 11:43:09 +02:00
Jerome Marchand 750e4d2c71 bpf: Add a kfunc to type cast from bpf uapi ctx to kernel ctx
Bugzilla: https://bugzilla.redhat.com/2177177

commit fd264ca020948a743e4c36731dfdecc4a812153c
Author: Yonghong Song <yhs@fb.com>
Date:   Sun Nov 20 11:54:32 2022 -0800

    bpf: Add a kfunc to type cast from bpf uapi ctx to kernel ctx

    Implement bpf_cast_to_kern_ctx() kfunc which does a type cast
    of a uapi ctx object to the corresponding kernel ctx. Previously,
    if users wanted to access some data available in the kctx but not
    in the uapi ctx, the bpf_probe_read_kernel() helper was needed.
    The introduction of bpf_cast_to_kern_ctx() allows direct
    memory access which makes code simpler and easier to understand.

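    A minimal usage sketch (declaration and section name illustrative;
    includes omitted):

    extern void *bpf_cast_to_kern_ctx(void *obj) __ksym;

    __u32 last_mark;

    SEC("tc")
    int read_kctx(struct __sk_buff *ctx)
    {
            struct sk_buff *kskb;

            kskb = bpf_cast_to_kern_ctx(ctx);
            /* direct read of a kernel-side field, no bpf_probe_read_kernel() */
            last_mark = kskb->mark;
            return 0;
    }
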
    Signed-off-by: Yonghong Song <yhs@fb.com>
    Link: https://lore.kernel.org/r/20221120195432.3113982-1-yhs@fb.com
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
2023-04-28 11:43:09 +02:00
Jerome Marchand 96be11db9f bpf: Add support for kfunc set with common btf_ids
Bugzilla: https://bugzilla.redhat.com/2177177

commit cfe1456440c8feaf6558577a400745d774418379
Author: Yonghong Song <yhs@fb.com>
Date:   Sun Nov 20 11:54:26 2022 -0800

    bpf: Add support for kfunc set with common btf_ids

    Later on, we will introduce kfuncs bpf_cast_to_kern_ctx() and
    bpf_rdonly_cast() which apply to all program types. Currently, a kfunc set
    only supports individual prog types. This patch adds support for kfuncs
    applying to all program types.

    Signed-off-by: Yonghong Song <yhs@fb.com>
    Link: https://lore.kernel.org/r/20221120195426.3113828-1-yhs@fb.com
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
2023-04-28 11:43:09 +02:00
Jerome Marchand a89a10052c bpf: Disallow bpf_obj_new_impl call when bpf_mem_alloc_init fails
Bugzilla: https://bugzilla.redhat.com/2177177

commit e181d3f143f7957a73c8365829249d8084602606
Author: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Date:   Mon Nov 21 02:56:10 2022 +0530

    bpf: Disallow bpf_obj_new_impl call when bpf_mem_alloc_init fails

    In the unlikely event that bpf_global_ma is not correctly initialized,
    instead of checking the boolean every time bpf_obj_new_impl is called,
    simply check it while loading the program and return an error if
    bpf_global_ma_set is false.

    Suggested-by: Alexei Starovoitov <ast@kernel.org>
    Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
    Link: https://lore.kernel.org/r/20221120212610.2361700-1-memxor@gmail.com
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
2023-04-28 11:43:09 +02:00
Jerome Marchand 3ed0a6a4dd bpf: Add kfuncs for storing struct task_struct * as a kptr
Bugzilla: https://bugzilla.redhat.com/2177177

commit 90660309b0c76c564a31a21f3a81d6641a9acaa0
Author: David Vernet <void@manifault.com>
Date:   Sat Nov 19 23:10:03 2022 -0600

    bpf: Add kfuncs for storing struct task_struct * as a kptr

    Now that BPF supports adding new kernel functions with kfuncs, and
    storing kernel objects in maps with kptrs, we can add a set of kfuncs
    which allow struct task_struct objects to be stored in maps as
    referenced kptrs. The possible use cases for doing this are plentiful.
    During tracing, for example, it would be useful to be able to collect
    some tasks that performed a certain operation, and then periodically
    summarize who they are, which cgroup they're in, how much CPU time
    they've utilized, etc.

    In order to enable this, this patch adds three new kfuncs:

    struct task_struct *bpf_task_acquire(struct task_struct *p);
    struct task_struct *bpf_task_kptr_get(struct task_struct **pp);
    void bpf_task_release(struct task_struct *p);

    A follow-on patch will add selftests validating these kfuncs.

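    A minimal usage sketch (attach point illustrative; includes omitted):

    extern struct task_struct *bpf_task_acquire(struct task_struct *p) __ksym;
    extern void bpf_task_release(struct task_struct *p) __ksym;

    SEC("tp_btf/task_newtask")
    int BPF_PROG(on_newtask, struct task_struct *task, u64 clone_flags)
    {
            struct task_struct *acquired;

            acquired = bpf_task_acquire(task);
            if (!acquired)
                    return 0;

            /* 'acquired' holds a reference: it can be stored in a map as a
             * referenced kptr, or used here and then released
             */
            bpf_task_release(acquired);
            return 0;
    }
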
    Signed-off-by: David Vernet <void@manifault.com>
    Link: https://lore.kernel.org/r/20221120051004.3605026-4-void@manifault.com
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
2023-04-28 11:43:08 +02:00
Jerome Marchand aeccbbda92 bpf: Introduce single ownership BPF linked list API
Bugzilla: https://bugzilla.redhat.com/2177177

commit 8cab76ec634995e59a8b6346bf8b835ab7fad3a3
Author: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Date:   Fri Nov 18 07:26:06 2022 +0530

    bpf: Introduce single ownership BPF linked list API

    Add a linked list API for use in BPF programs, where it expects
    protection from the bpf_spin_lock in the same allocation as the
    bpf_list_head. For now, only one bpf_spin_lock can be present hence that
    is assumed to be the one protecting the bpf_list_head.

    The following functions are added to kick things off:

    // Add node to beginning of list
    void bpf_list_push_front(struct bpf_list_head *head, struct bpf_list_node *node);

    // Add node to end of list
    void bpf_list_push_back(struct bpf_list_head *head, struct bpf_list_node *node);

    // Remove node at beginning of list and return it
    struct bpf_list_node *bpf_list_pop_front(struct bpf_list_head *head);

    // Remove node at end of list and return it
    struct bpf_list_node *bpf_list_pop_back(struct bpf_list_head *head);

    The lock protecting the bpf_list_head needs to be taken for all
    operations. The verifier ensures that the lock that needs to be taken is
    always held, and only the correct lock is taken for these operations.
    These checks are made statically by relying on the reg->id preserved for
    registers pointing into regions having both bpf_spin_lock and the
    objects protected by it. The comment over check_reg_allocation_locked in
    this change describes the logic in detail.

    Note that bpf_list_push_front and bpf_list_push_back are meant to
    consume the object containing the node in the 1st argument, however that
    specific mechanism is intended to not release the ref_obj_id directly
    until the bpf_spin_unlock is called. In this commit, nothing is done,
    but the next commit will be introducing logic to handle this case, so it
    has been left as is for now.

    bpf_list_pop_front and bpf_list_pop_back delete the first or last item
    of the list respectively, and return pointer to the element at the
    list_node offset. The user can then use container_of style macro to get
    the actual entry type. The verifier however statically knows the actual
    type, so the safety properties are still preserved.

    With these additions, programs can now manage their own linked lists and
    store their objects in them.

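    A minimal usage sketch (type and function names are illustrative, and
    the bpf_obj_new()/bpf_obj_drop()/__contains() conveniences from
    bpf_experimental.h are assumed):

    struct elem {
            int data;
            struct bpf_list_node node;
    };

    struct map_value {
            struct bpf_spin_lock lock;
            struct bpf_list_head head __contains(elem, node);
    };

    /* 'v' is a struct map_value * obtained via bpf_map_lookup_elem() */
    void push_and_pop(struct map_value *v)
    {
            struct elem *e = bpf_obj_new(typeof(*e));
            struct bpf_list_node *n;

            if (!e)
                    return;
            e->data = 1;

            bpf_spin_lock(&v->lock);
            bpf_list_push_front(&v->head, &e->node);  /* ownership moves to the list */
            n = bpf_list_pop_front(&v->head);
            bpf_spin_unlock(&v->lock);

            if (!n)
                    return;
            e = container_of(n, struct elem, node);
            bpf_obj_drop(e);
    }
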
    Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
    Link: https://lore.kernel.org/r/20221118015614.2013203-17-memxor@gmail.com
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
2023-04-28 11:43:07 +02:00
Jerome Marchand 0c43899670 bpf: Introduce bpf_obj_drop
Bugzilla: https://bugzilla.redhat.com/2177177

commit ac9f06050a3580cf4076a57a470cd71f12a81171
Author: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Date:   Fri Nov 18 07:26:04 2022 +0530

    bpf: Introduce bpf_obj_drop

    Introduce bpf_obj_drop, which is the kfunc used to free allocated
    objects (allocated using bpf_obj_new). Pairing with bpf_obj_new, it
    implicitly destructs the fields part of object automatically without
    user intervention.

    Just like the previous patch, btf_struct_meta that is needed to free up
    the special fields is passed as a hidden argument to the kfunc.

    For the user, a convenience macro hides the kernel side kfunc, which
    is named bpf_obj_drop_impl.

    Continuing the previous example:

    void prog(void) {
    	struct foo *f;

    	f = bpf_obj_new(typeof(*f));
    	if (!f)
    		return;
    	bpf_obj_drop(f);
    }

    Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
    Link: https://lore.kernel.org/r/20221118015614.2013203-15-memxor@gmail.com
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
2023-04-28 11:43:07 +02:00
Jerome Marchand 27b1b8aed6 bpf: Introduce bpf_obj_new
Bugzilla: https://bugzilla.redhat.com/2177177

Conflicts: Context change from already backported commit 997849c4b969
("bpf: Zeroing allocated object from slab in bpf memory allocator"

commit 958cf2e273f0929c66169e0788031310e8118722
Author: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Date:   Fri Nov 18 07:26:03 2022 +0530

    bpf: Introduce bpf_obj_new

    Introduce type safe memory allocator bpf_obj_new for BPF programs. The
    kernel side kfunc is named bpf_obj_new_impl, as passing hidden arguments
    to kfuncs still requires having them in prototype, unlike BPF helpers
    which always take 5 arguments and have them checked using bpf_func_proto
    in verifier, ignoring unset argument types.

    Introduce __ign suffix to ignore a specific kfunc argument during type
    checks, then use this to introduce support for passing type metadata to
    the bpf_obj_new_impl kfunc.

    The user passes the BTF ID of the type it wants to allocate in program BTF,
    the verifier then rewrites the first argument as the size of this type,
    after performing some sanity checks (to ensure it exists and it is a
    struct type).

    The second argument is also fixed up and passed by the verifier. This is
    the btf_struct_meta for the type being allocated. It would be needed
    mostly for the offset array which is required for zero initializing
    special fields while leaving the rest of the storage in an uninitialized state.

    It would also be needed in the next patch to perform proper destruction
    of the object's special fields.

    Under the hood, bpf_obj_new will call bpf_mem_alloc and bpf_mem_free,
    using the any context BPF memory allocator introduced recently. To this
    end, a global instance of the BPF memory allocator is initialized on
    boot to be used for this purpose. This 'bpf_global_ma' serves all
    allocations for bpf_obj_new. In the future, bpf_obj_new variants will
    allow specifying a custom allocator.

    Note that now that bpf_obj_new can be used to allocate objects that can
    be linked to BPF linked list (when future linked list helpers are
    available), we need to also free the elements using bpf_mem_free.
    However, since the draining of elements is done outside the
    bpf_spin_lock, we need to do migrate_disable around the call since
    bpf_list_head_free can be called from map free path where migration is
    enabled. Otherwise, when called from BPF programs migration is already
    disabled.

    A convenience macro is included in the bpf_experimental.h header to hide
    the ugly details of the implementation, leading to user code
    looking similar to a language level extension which allocates and
    constructs fields of a user type.

    struct bar {
            struct bpf_list_node node;
    };

    struct foo {
            struct bpf_spin_lock lock;
            struct bpf_list_head head __contains(bar, node);
    };

    void prog(void) {
            struct foo *f;

            f = bpf_obj_new(typeof(*f));
            if (!f)
                    return;
            ...
    }

    A key piece of this story is still missing, i.e. the free function,
    which will come in the next patch.

    Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
    Link: https://lore.kernel.org/r/20221118015614.2013203-14-memxor@gmail.com
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
2023-04-28 11:43:07 +02:00
Jerome Marchand 9eb2139c9f bpf: Allow locking bpf_spin_lock in allocated objects
Bugzilla: https://bugzilla.redhat.com/2177177

commit 4e814da0d59917c6d758a80e63e79b5ee212cf11
Author: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Date:   Fri Nov 18 07:25:58 2022 +0530

    bpf: Allow locking bpf_spin_lock in allocated objects

    Allow locking a bpf_spin_lock in an allocated object, in addition to
    already supported map value pointers. The handling is similar to that of
    map values, by just preserving the reg->id of PTR_TO_BTF_ID | MEM_ALLOC
    as well, and adjusting process_spin_lock to work with them and remember
    the id in verifier state.

    Refactor the existing process_spin_lock to work with PTR_TO_BTF_ID |
    MEM_ALLOC in addition to PTR_TO_MAP_VALUE. We need to update the
    reg_may_point_to_spin_lock which is used in mark_ptr_or_null_reg to
    preserve reg->id, that will be used in env->cur_state->active_spin_lock
    to remember the currently held spin lock.

    Also update the comment describing bpf_spin_lock implementation details
    to also talk about PTR_TO_BTF_ID | MEM_ALLOC type.

    Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
    Link: https://lore.kernel.org/r/20221118015614.2013203-9-memxor@gmail.com
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
2023-04-28 11:43:06 +02:00
Jerome Marchand d03c51f6bc bpf: Support bpf_list_head in map values
Bugzilla: https://bugzilla.redhat.com/2177177

commit f0c5941ff5b255413d31425bb327c2aec3625673
Author: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Date:   Tue Nov 15 00:45:25 2022 +0530

    bpf: Support bpf_list_head in map values

    Add the support on the map side to parse, recognize, verify, and build
    metadata table for a new special field of the type struct bpf_list_head.
    To parameterize the bpf_list_head for a certain value type and the
    list_node member it will accept in that value type, we use BTF
    declaration tags.

    The definition of bpf_list_head in a map value will be done as follows:

    struct foo {
    	struct bpf_list_node node;
    	int data;
    };

    struct map_value {
    	struct bpf_list_head head __contains(foo, node);
    };

    Then, the bpf_list_head only allows adding to the list 'head' using the
    bpf_list_node 'node' for the type struct foo.

    The 'contains' annotation is a BTF declaration tag composed of four
    parts, "contains:name:node" where the name is then used to look up the
    type in the map BTF, with its kind hardcoded to BTF_KIND_STRUCT during
    the lookup. The node defines the name of the member in this type that has
    the type struct bpf_list_node, which is actually used for linking into
    the linked list. For now, 'kind' part is hardcoded as struct.

    This allows building intrusive linked lists in BPF, using container_of
    to obtain pointer to entry, while being completely type safe from the
    perspective of the verifier. The verifier knows exactly the type of the
    nodes, and knows that list helpers return that type at some fixed offset
    where the bpf_list_node member used for this list exists. The verifier
    also uses this information to disallow adding types that are not
    accepted by a certain list.

    For now, no elements can be added to such lists. Support for that is
    coming in future patches, hence draining and freeing items is done with
    a TODO that will be resolved in a future patch.

    Note that the bpf_list_head_free function moves the list out to a local
    variable under the lock and releases it, doing the actual draining of
    the list items outside the lock. While this helps with not holding the
    lock for too long pessimizing other concurrent list operations, it is
    also necessary for deadlock prevention: unless every function called in
    the critical section would be notrace, a fentry/fexit program could
    attach and call bpf_map_update_elem again on the map, leading to the
    same lock being acquired if the key matches and lead to a deadlock.
    While this requires some special effort on part of the BPF programmer to
    trigger and is highly unlikely to occur in practice, it is always better
    if we can avoid such a condition.

    While notrace would prevent this, doing the draining outside the lock
    has advantages of its own, hence it is used to also fix the deadlock
    related problem.

    Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
    Link: https://lore.kernel.org/r/20221114191547.1694267-5-memxor@gmail.com
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
2023-04-28 11:43:04 +02:00
Jerome Marchand 2b8a340165 bpf: Consolidate spin_lock, timer management into btf_record
Bugzilla: https://bugzilla.redhat.com/2177177

Conflicts: Context change from already backported commit 997849c4b969
("bpf: Zeroing allocated object from slab in bpf memory allocator")

commit db559117828d2448fe81ada051c60bcf39f822e9
Author: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Date:   Fri Nov 4 00:39:56 2022 +0530

    bpf: Consolidate spin_lock, timer management into btf_record

    Now that kptr_off_tab has been refactored into btf_record, and can hold
    more than one specific field type, accommodate bpf_spin_lock and
    bpf_timer as well.

    While they don't require any more metadata than offset, having all
    special fields in one place allows us to share the same code for
    allocated user defined types and handle both map values and these
    allocated objects in a similar fashion.

    As an optimization, we still keep spin_lock_off and timer_off offsets in
    the btf_record structure, just to avoid having to find the btf_field
    struct each time their offset is needed. This is mostly needed to
    manipulate such objects in a map value at runtime. It's ok to hardcode
    just one offset as more than one field is disallowed.

    Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
    Link: https://lore.kernel.org/r/20221103191013.1236066-8-memxor@gmail.com
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
2023-04-28 11:43:01 +02:00
Jerome Marchand dcf538d57d bpf: Implement cgroup storage available to non-cgroup-attached bpf progs
Bugzilla: https://bugzilla.redhat.com/2177177

Conflicts: Context change from missing commit 7f203bc89eb6 ("cgroup:
Replace cgroup->ancestor_ids[] with ->ancestors[]")

commit c4bcfb38a95edb1021a53f2d0356a78120ecfbe4
Author: Yonghong Song <yhs@fb.com>
Date:   Tue Oct 25 21:28:50 2022 -0700

    bpf: Implement cgroup storage available to non-cgroup-attached bpf progs

    Similar to sk/inode/task storage, implement similar cgroup local storage.

    There already exists a local storage implementation for cgroup-attached
    bpf programs.  See map type BPF_MAP_TYPE_CGROUP_STORAGE and helper
    bpf_get_local_storage(). But there are use cases such that non-cgroup
    attached bpf progs want to access cgroup local storage data. For example,
    tc egress prog has access to sk and cgroup. It is possible to use
    sk local storage to emulate cgroup local storage by storing data in socket.
    But this is a waste as there could be lots of sockets belonging to a particular
    cgroup. Alternatively, a separate map can be created with cgroup id as the key.
    But this will introduce additional overhead to manipulate the new map.
    A cgroup local storage, similar to existing sk/inode/task storage,
    should help for this use case.

    The life-cycle of storage is managed with the life-cycle of the
    cgroup struct.  i.e. the storage is destroyed along with the owning cgroup
    with a call to bpf_cgrp_storage_free() when cgroup itself
    is deleted.

    The userspace map operations can be done by using a cgroup fd as a key
    passed to the lookup, update and delete operations.

    Typically, the following code is used to get the current cgroup:
        struct task_struct *task = bpf_get_current_task_btf();
        ... task->cgroups->dfl_cgrp ...
    and in structure task_struct definition:
        struct task_struct {
            ....
            struct css_set __rcu            *cgroups;
            ....
        }
    With a sleepable program, accessing task->cgroups is not protected by rcu_read_lock.
    So the current implementation only supports non-sleepable programs, and supporting
    sleepable programs will be the next step together with adding rcu_read_lock
    protection for rcu tagged structures.

    Since map name BPF_MAP_TYPE_CGROUP_STORAGE has been used for old cgroup local
    storage support, the new map name BPF_MAP_TYPE_CGRP_STORAGE is used
    for cgroup storage available to non-cgroup-attached bpf programs. The old
    cgroup storage supports bpf_get_local_storage() helper to get the cgroup data.
    The new cgroup storage helper bpf_cgrp_storage_get() can provide similar
    functionality. While old cgroup storage pre-allocates storage memory, the new
    mechanism can also pre-allocate with a user space bpf_map_update_elem() call
    to avoid potential run-time memory allocation failure.
    Therefore, the new cgroup storage can provide all functionality w.r.t.
    the old one. So in uapi bpf.h, the old BPF_MAP_TYPE_CGROUP_STORAGE is aliased to
    BPF_MAP_TYPE_CGROUP_STORAGE_DEPRECATED to indicate the old cgroup storage can
    be deprecated since the new one can provide the same functionality.

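    A minimal usage sketch (map/section names illustrative; includes
    omitted):

    struct {
            __uint(type, BPF_MAP_TYPE_CGRP_STORAGE);
            __uint(map_flags, BPF_F_NO_PREALLOC);
            __type(key, int);
            __type(value, long);
    } cgrp_events SEC(".maps");

    SEC("tp_btf/sys_enter")
    int BPF_PROG(count_per_cgroup)
    {
            struct task_struct *task = bpf_get_current_task_btf();
            long *counter;

            counter = bpf_cgrp_storage_get(&cgrp_events, task->cgroups->dfl_cgrp,
                                           0, BPF_LOCAL_STORAGE_GET_F_CREATE);
            if (counter)
                    __sync_fetch_and_add(counter, 1);
            return 0;
    }
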
    Acked-by: David Vernet <void@manifault.com>
    Signed-off-by: Yonghong Song <yhs@fb.com>
    Link: https://lore.kernel.org/r/20221026042850.673791-1-yhs@fb.com
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
2023-04-28 11:42:58 +02:00
Artem Savkov d4045a2578 bpf: Add bpf_user_ringbuf_drain() helper
Bugzilla: https://bugzilla.redhat.com/2166911

Conflicts: fixing previously incorrect order of cases in verifier.c

commit 20571567384428dfc9fe5cf9f2e942e1df13c2dd
Author: David Vernet <void@manifault.com>
Date:   Mon Sep 19 19:00:58 2022 -0500

    bpf: Add bpf_user_ringbuf_drain() helper

    In a prior change, we added a new BPF_MAP_TYPE_USER_RINGBUF map type which
    will allow user-space applications to publish messages to a ring buffer
    that is consumed by a BPF program in kernel-space. In order for this
    map-type to be useful, it will require a BPF helper function that BPF
    programs can invoke to drain samples from the ring buffer, and invoke
    callbacks on those samples. This change adds that capability via a new BPF
    helper function:

    bpf_user_ringbuf_drain(struct bpf_map *map, void *callback_fn, void *ctx,
                           u64 flags)

    BPF programs may invoke this function to run callback_fn() on a series of
    samples in the ring buffer. callback_fn() has the following signature:

    long callback_fn(struct bpf_dynptr *dynptr, void *context);

    Samples are provided to the callback in the form of struct bpf_dynptr *'s,
    which the program can read using BPF helper functions for querying
    struct bpf_dynptr's.
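
    For illustration, a kernel-side sketch under these semantics (map size,
    section and names are arbitrary; bpf_dynptr_read() is assumed to be one of
    the dynptr-querying helpers usable on the provided dynptr):

        #include "vmlinux.h"
        #include <bpf/bpf_helpers.h>

        char LICENSE[] SEC("license") = "GPL";

        struct {
                __uint(type, BPF_MAP_TYPE_USER_RINGBUF);
                __uint(max_entries, 256 * 1024);
        } user_rb SEC(".maps");

        static long handle_sample(struct bpf_dynptr *dynptr, void *ctx)
        {
                __u64 *count = ctx;
                __u64 val;

                /* copy the first 8 bytes of the sample out of the dynptr */
                if (bpf_dynptr_read(&val, sizeof(val), dynptr, 0, 0))
                        return 1;       /* non-zero return stops draining */
                *count += 1;
                return 0;               /* 0 means keep draining */
        }

        SEC("tp/syscalls/sys_enter_getpgid")
        int drain_samples(void *ctx)
        {
                __u64 handled = 0;

                /* run handle_sample() on available samples; wakes an
                 * epoll-waiting producer unless BPF_RB_NO_WAKEUP is set */
                bpf_user_ringbuf_drain(&user_rb, handle_sample, &handled, 0);
                return 0;
        }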

    In order to support bpf_user_ringbuf_drain(), a new PTR_TO_DYNPTR register
    type is added to the verifier to reflect a dynptr that was allocated by
    a helper function and passed to a BPF program. Unlike PTR_TO_STACK
    dynptrs which are allocated on the stack by a BPF program, PTR_TO_DYNPTR
    dynptrs need not use reference tracking, as the BPF helper is trusted to
    properly free the dynptr before returning. The verifier currently only
    supports PTR_TO_DYNPTR registers that are also DYNPTR_TYPE_LOCAL.

    Note that while the corresponding user-space libbpf logic will be added
    in a subsequent patch, this patch does contain an implementation of the
    .map_poll() callback for BPF_MAP_TYPE_USER_RINGBUF maps. This
    .map_poll() callback guarantees that an epoll-waiting user-space
    producer will receive at least one event notification whenever at least
    one sample is drained in an invocation of bpf_user_ringbuf_drain(),
    provided that the function is not invoked with the BPF_RB_NO_WAKEUP
    flag. If the BPF_RB_FORCE_WAKEUP flag is provided, a wakeup
    notification is sent even if no sample was drained.

    Signed-off-by: David Vernet <void@manifault.com>
    Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
    Link: https://lore.kernel.org/bpf/20220920000100.477320-3-void@manifault.com

Signed-off-by: Artem Savkov <asavkov@redhat.com>
2023-03-06 14:54:27 +01:00
Artem Savkov 76989136e7 bpf: Move bpf_loop and bpf_for_each_map_elem under CAP_BPF
Bugzilla: https://bugzilla.redhat.com/2166911

Conflicts: already applied 7d21225e01 "bpf: Gate dynptr API behind
CAP_BPF"

commit 5679ff2f138f77b281c468959dc5022cc524d400
Author: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Date:   Tue Aug 23 03:31:17 2022 +0200

    bpf: Move bpf_loop and bpf_for_each_map_elem under CAP_BPF

    These helpers require func_info, which needs prog BTF anyway. Loading BTF
    and setting the prog btf_fd while loading the prog already indirectly
    requires CAP_BPF, so to reduce confusion, move both of these callback-taking
    helpers under bpf_capable() protection as well, since they cannot be used
    without CAP_BPF.

    Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
    Link: https://lore.kernel.org/r/20220823013117.24916-1-memxor@gmail.com
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Signed-off-by: Artem Savkov <asavkov@redhat.com>
2023-03-06 14:54:25 +01:00
Artem Savkov ca86169c5a bpf: expose bpf_strtol and bpf_strtoul to all program types
Bugzilla: https://bugzilla.redhat.com/2166911

Conflicts: already applied 7d21225e01 "bpf: Gate dynptr API behind CAP_BPF"

commit 8a67f2de9b1dc3cf8b75b4bf589efb1f08e3e9b8
Author: Stanislav Fomichev <sdf@google.com>
Date:   Tue Aug 23 15:25:53 2022 -0700

    bpf: expose bpf_strtol and bpf_strtoul to all program types

    bpf_strncmp is already exposed everywhere. The motivation is to keep
    these helpers in kernel/bpf/helpers.c; otherwise it is tempting to move
    them under kernel/bpf/cgroup.c because they are currently only used
    by sysctl prog types.
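
    For illustration, a small fragment of what any prog type can now do
    (buffer content is arbitrary; bpf_strtol() returns the number of
    characters consumed on success):

        char in[] = "4096";
        long val;

        /* flags 0 selects auto-detected base; parsed value lands in val */
        if (bpf_strtol(in, sizeof(in) - 1, 0, &val) > 0)
                bpf_printk("parsed %ld", val);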

    Suggested-by: Martin KaFai Lau <kafai@fb.com>
    Acked-by: Martin KaFai Lau <kafai@fb.com>
    Signed-off-by: Stanislav Fomichev <sdf@google.com>
    Link: https://lore.kernel.org/r/20220823222555.523590-4-sdf@google.com
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Signed-off-by: Artem Savkov <asavkov@redhat.com>
2023-03-06 14:54:25 +01:00
Artem Savkov 5d1eabf3dc bpf: Export bpf_dynptr_get_size()
Bugzilla: https://bugzilla.redhat.com/2166911

commit 51df4865718540f51bb5d3e552c50dc88e1333d6
Author: Roberto Sassu <roberto.sassu@huawei.com>
Date:   Tue Sep 20 09:59:43 2022 +0200

    bpf: Export bpf_dynptr_get_size()
    
    Export bpf_dynptr_get_size(), so that kernel code dealing with eBPF dynamic
    pointers can obtain the real size of data carried by this data structure.
    
    Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com>
    Reviewed-by: Joanne Koong <joannelkoong@gmail.com>
    Acked-by: KP Singh <kpsingh@kernel.org>
    Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
    Link: https://lore.kernel.org/r/20220920075951.929132-6-roberto.sassu@huaweicloud.com
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Signed-off-by: Artem Savkov <asavkov@redhat.com>
2023-03-06 14:54:15 +01:00
Artem Savkov 47f8ca3b6d bpf: Introduce cgroup_{common,current}_func_proto
Bugzilla: https://bugzilla.redhat.com/2166911

commit dea6a4e17013382b20717664ebf3d7cc405e0952
Author: Stanislav Fomichev <sdf@google.com>
Date:   Tue Aug 23 15:25:51 2022 -0700

    bpf: Introduce cgroup_{common,current}_func_proto
    
    Split cgroup_base_func_proto into the following:
    
    * cgroup_common_func_proto - common helpers for all cgroup hooks
    * cgroup_current_func_proto - common helpers for all cgroup hooks
      running in the process context (== have meaningful 'current').
    
    Move bpf_{g,s}et_retval and other cgroup-related helpers into
    kernel/bpf/cgroup.c so they are closer to where they are being used.
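
    A sketch of the resulting lookup order in a cgroup hook's func_proto
    callback (the hook name is made up; signatures follow the split described
    above):

        static const struct bpf_func_proto *
        cg_hook_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
        {
                const struct bpf_func_proto *func_proto;

                /* helpers shared by all cgroup hooks */
                func_proto = cgroup_common_func_proto(func_id, prog);
                if (func_proto)
                        return func_proto;

                /* helpers that need a meaningful 'current' */
                func_proto = cgroup_current_func_proto(func_id, prog);
                if (func_proto)
                        return func_proto;

                switch (func_id) {
                /* hook-specific helpers would be listed here */
                default:
                        return bpf_base_func_proto(func_id);
                }
        }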
    
    Signed-off-by: Stanislav Fomichev <sdf@google.com>
    Acked-by: Martin KaFai Lau <kafai@fb.com>
    Link: https://lore.kernel.org/r/20220823222555.523590-2-sdf@google.com
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Signed-off-by: Artem Savkov <asavkov@redhat.com>
2023-03-06 14:54:03 +01:00
Artem Savkov 0e33f79a84 bpf: export crash_kexec() as destructive kfunc
Bugzilla: https://bugzilla.redhat.com/2166911

commit 133790596406ce2658f0864eb7eac64987c2b12f
Author: Artem Savkov <asavkov@redhat.com>
Date:   Wed Aug 10 08:59:04 2022 +0200

    bpf: export crash_kexec() as destructive kfunc
    
    Allow properly marked bpf programs to call crash_kexec().
    
    Signed-off-by: Artem Savkov <asavkov@redhat.com>
    Link: https://lore.kernel.org/r/20220810065905.475418-3-asavkov@redhat.com
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Signed-off-by: Artem Savkov <asavkov@redhat.com>
2023-03-06 14:54:00 +01:00
Artem Savkov b341ead9ab bpf: Add BPF-helper for accessing CLOCK_TAI
Bugzilla: https://bugzilla.redhat.com/2166911

commit c8996c98f703b09afe77a1d247dae691c9849dc1
Author: Jesper Dangaard Brouer <brouer@redhat.com>
Date:   Tue Aug 9 08:08:02 2022 +0200

    bpf: Add BPF-helper for accessing CLOCK_TAI
    
    Commit 3dc6ffae2da2 ("timekeeping: Introduce fast accessor to clock tai")
    introduced a fast and NMI-safe accessor for CLOCK_TAI. Especially in time
    sensitive networks (TSN), where all nodes are synchronized by the Precision
    Time Protocol (PTP), it is helpful to be able to generate timestamps based
    on CLOCK_TAI instead of CLOCK_MONOTONIC. With a BPF helper for TAI in place,
    it becomes very convenient to correlate activity across different machines
    in the network.

    Use cases for such a BPF helper include Tx launch time (e.g. the ETF and
    TAPRIO qdiscs) and timestamping.
    
    Note: CLOCK_TAI is nothing new per se, only the NMI-safe variant of it is.
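
    Assuming the helper ends up exposed as bpf_ktime_get_tai_ns(), a minimal
    sketch of its use from a tc program (section and logging are illustrative):

        #include "vmlinux.h"
        #include <bpf/bpf_helpers.h>

        char LICENSE[] SEC("license") = "GPL";

        SEC("tc")
        int egress_tai_stamp(struct __sk_buff *skb)
        {
                /* NMI-safe CLOCK_TAI timestamp, comparable across
                 * PTP-synchronized machines */
                __u64 tai_ns = bpf_ktime_get_tai_ns();

                bpf_printk("tai=%llu len=%u", tai_ns, skb->len);
                return 0; /* TC_ACT_OK */
        }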
    
    Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
    [Kurt: Wrote changelog and renamed helper]
    Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de>
    Link: https://lore.kernel.org/r/20220809060803.5773-2-kurt@linutronix.de
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Signed-off-by: Artem Savkov <asavkov@redhat.com>
2023-03-06 14:53:59 +01:00
Yauheni Kaliuta 10c1a87294 bpf: Add verifier check for BPF_PTR_POISON retval and arg
Bugzilla: http://bugzilla.redhat.com/2120968

commit 47e34cb74d376ddfeaef94abb1d6dfb3c905ee51
Author: Dave Marchevsky <davemarchevsky@fb.com>
Date:   Mon Sep 12 08:45:44 2022 -0700

    bpf: Add verifier check for BPF_PTR_POISON retval and arg
    
    BPF_PTR_POISON was added in commit c0a5a21c25f37 ("bpf: Allow storing
    referenced kptr in map") to denote a bpf_func_proto btf_id which the
    verifier will replace with a dynamically-determined btf_id at verification
    time.
    
    This patch adds verifier 'poison' functionality to BPF_PTR_POISON in
    order to prepare for expanded use of the value to poison ret- and
    arg-btf_id in ongoing work, namely rbtree and linked list patchsets
    [0, 1]. Specifically, when the verifier checks helper calls, it assumes
    that BPF_PTR_POISON'ed ret type will be replaced with a valid type before
    - or in lieu of - the default ret_btf_id logic. Similarly for arg btf_id.
    
    If a poisoned btf_id reaches the default handling block for either,
    consider this a verifier internal error and fail verification. Otherwise a
    helper with a poisoned btf_id but no verifier logic replacing the type
    would cause a crash as the invalid pointer is dereferenced.

    Also move BPF_PTR_POISON to the existing include/linux/poison.h header and
    remove an unnecessary shift.
    
      [0]: lore.kernel.org/bpf/20220830172759.4069786-1-davemarchevsky@fb.com
      [1]: lore.kernel.org/bpf/20220904204145.3089-1-memxor@gmail.com
    
    Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
    Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
    Link: https://lore.kernel.org/r/20220912154544.1398199-1-davemarchevsky@fb.com
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Signed-off-by: Yauheni Kaliuta <ykaliuta@redhat.com>
2022-11-30 12:47:09 +02:00
Yauheni Kaliuta 20067c525d bpf: Fix non-static bpf_func_proto struct definitions
Bugzilla: http://bugzilla.redhat.com/2120968

commit dc368e1c658e4f478a45e8d1d5b0c8392ca87506
Author: Joanne Koong <joannelkoong@gmail.com>
Date:   Thu Jun 16 15:54:07 2022 -0700

    bpf: Fix non-static bpf_func_proto struct definitions

    This patch does two things:

    1) Marks the dynptr bpf_func_proto structs that were added in [1]
       as static, as pointed out by the kernel test robot in [2].

    2) There are some bpf_func_proto structs marked as extern which can
       instead be statically defined.

      [1] https://lore.kernel.org/bpf/20220523210712.3641569-1-joannelkoong@gmail.com/
      [2] https://lore.kernel.org/bpf/62ab89f2.Pko7sI08RAKdF8R6%25lkp@intel.com/

    Reported-by: kernel test robot <lkp@intel.com>
    Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
    Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    Link: https://lore.kernel.org/bpf/20220616225407.1878436-1-joannelkoong@gmail.com

Signed-off-by: Yauheni Kaliuta <ykaliuta@redhat.com>
2022-11-30 12:47:09 +02:00
Yauheni Kaliuta 9a5f220992 btf: Export bpf_dynptr definition
Bugzilla: https://bugzilla.redhat.com/2120968

commit 00f146413ccb6c84308e559281449755c83f54c5
Author: Roberto Sassu <roberto.sassu@huawei.com>
Date:   Tue Sep 20 09:59:40 2022 +0200

    btf: Export bpf_dynptr definition
    
    eBPF dynamic pointers are a new feature recently added upstream. A dynptr
    binds together a pointer to a memory area and its size. The internal kernel
    structure bpf_dynptr_kern is not accessible to eBPF programs built in user
    space; they instead see bpf_dynptr, which is translated to the internal
    kernel structure by the eBPF verifier.

    The problem is that it is not possible to include both the uapi header
    linux/bpf.h and the vmlinux BTF header vmlinux.h at the same time, as they
    both contain definitions of some structures/enums, and the compiler
    complains that they are redefined.

    As bpf_dynptr is defined in the uapi header linux/bpf.h, this makes it
    impossible to include vmlinux.h. However, in some cases, e.g. when using
    kfuncs, vmlinux.h has to be included. The only option until now was to
    include vmlinux.h and copy the definition of bpf_dynptr from linux/bpf.h
    directly into the eBPF program source code.
    
    Solve the problem by using the same approach as for bpf_timer (which also
    follows the same scheme with the _kern suffix for the internal kernel
    structure).
    
    Add the following line in one of the dynamic pointer helpers,
    bpf_dynptr_from_mem():
    
    BTF_TYPE_EMIT(struct bpf_dynptr);
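
    For illustration, a minimal sketch of the effect on the program side:
    struct bpf_dynptr now resolves from vmlinux.h alone, without copying the
    uapi definition (the wrapper function is made up):

        #include "vmlinux.h"
        #include <bpf/bpf_helpers.h>

        /* struct bpf_dynptr comes from vmlinux.h now that it is emitted
         * into vmlinux BTF; no need to also pull in the uapi linux/bpf.h */
        static long read_first_bytes(struct bpf_dynptr *ptr, void *dst, __u32 len)
        {
                return bpf_dynptr_read(dst, len, ptr, 0, 0);
        }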
    
    Cc: stable@vger.kernel.org
    Cc: Joanne Koong <joannelkoong@gmail.com>
    Fixes: 97e03f521050c ("bpf: Add verifier support for dynptrs")
    Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com>
    Acked-by: Yonghong Song <yhs@fb.com>
    Tested-by: KP Singh <kpsingh@kernel.org>
    Link: https://lore.kernel.org/r/20220920075951.929132-3-roberto.sassu@huaweicloud.com
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Signed-off-by: Yauheni Kaliuta <ykaliuta@redhat.com>
2022-11-30 12:47:08 +02:00