Commit Graph

23 Commits

Artem Savkov 27143d897f bpf: switch to fdget_raw()
Bugzilla: https://bugzilla.redhat.com/2221599

Conflicts: d7ba4cc900bf1 ("bpf: return long from
bpf_map_ops funcs") is already present in the tree.

commit 1d0027dc9a3c2be501de625feca2eaac7f67b509
Author: Al Viro <viro@zeniv.linux.org.uk>
Date:   Wed Aug 3 20:13:33 2022 -0400

    bpf: switch to fdget_raw()

    Reviewed-by: Christian Brauner <brauner@kernel.org>
    Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

Signed-off-by: Artem Savkov <asavkov@redhat.com>
2023-09-22 09:12:32 +02:00
Artem Savkov e7658395b4 bpf: Teach verifier that certain helpers accept NULL pointer.
Bugzilla: https://bugzilla.redhat.com/2221599

commit 91571a515d1bcdc280bb46423bb697ea7eb42ff3
Author: Alexei Starovoitov <ast@kernel.org>
Date:   Mon Apr 3 21:50:25 2023 -0700

    bpf: Teach verifier that certain helpers accept NULL pointer.
    
    The bpf_[sk|inode|task|cgrp]_storage_[get|delete]() and bpf_get_socket_cookie()
    helpers perform a run-time check that the sk|inode|task|cgrp pointer is not NULL.
    Teach the verifier about this fact and allow bpf programs to pass
    PTR_TO_BTF_ID | PTR_MAYBE_NULL into such helpers.
    It will be used in the subsequent patch that will do
    bpf_sk_storage_get(.., skb->sk, ...);
    Even when the 'skb' pointer is trusted, the 'sk' pointer may be NULL.
    
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
    Acked-by: David Vernet <void@manifault.com>
    Link: https://lore.kernel.org/bpf/20230404045029.82870-5-alexei.starovoitov@gmail.com

Signed-off-by: Artem Savkov <asavkov@redhat.com>
2023-09-22 09:12:26 +02:00
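
A minimal sketch of what this verifier change enables, in the spirit of the
bpf_sk_storage_get(.., skb->sk, ...) example above. The map, section and
program names are invented for illustration and are not part of the commit:

    #include "vmlinux.h"
    #include <bpf/bpf_helpers.h>
    #include <bpf/bpf_tracing.h>

    struct {
        __uint(type, BPF_MAP_TYPE_SK_STORAGE);
        __uint(map_flags, BPF_F_NO_PREALLOC);
        __type(key, int);
        __type(value, __u64);
    } sk_pkt_cnt SEC(".maps");

    SEC("tp_btf/kfree_skb")
    int BPF_PROG(count_drops, struct sk_buff *skb, void *location)
    {
        __u64 *cnt;

        /* skb is trusted but skb->sk may be NULL; with this change the
         * verifier accepts the call and the helper does the NULL check. */
        cnt = bpf_sk_storage_get(&sk_pkt_cnt, skb->sk, 0,
                                 BPF_SK_STORAGE_GET_F_CREATE);
        if (cnt)
            __sync_fetch_and_add(cnt, 1);
        return 0;
    }

    char LICENSE[] SEC("license") = "GPL";
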
Artem Savkov 7b478e7484 bpf: Use bpf_mem_cache_alloc/free in bpf_local_storage_elem
Bugzilla: https://bugzilla.redhat.com/2221599

commit 08a7ce384e33e53e0732c500a8af67a73f8fceca
Author: Martin KaFai Lau <martin.lau@kernel.org>
Date:   Wed Mar 22 14:52:43 2023 -0700

    bpf: Use bpf_mem_cache_alloc/free in bpf_local_storage_elem
    
    This patch uses bpf_mem_alloc for the task and cgroup local storage,
    where the bpf prog can easily get a hold of the storage owner's
    PTR_TO_BTF_ID. e.g. bpf_get_current_task_btf() can be used in some of
    the kmalloc code paths, which would cause deadlock/recursion.
    bpf_mem_cache_alloc is deadlock-free and will solve a legit use case
    in [1].
    
    For sk storage, its batch creation benchmark shows a few percent
    regression when the sk create/destroy batch size is larger than 32.
    The sk creation/destruction happens much more often and
    depends on external traffic. Considering it is hypothetical
    to be able to cause deadlock with sk storage, the switch to
    bpf_mem_alloc can wait until a legit (i.e. useful)
    use case comes up.
    
    For inode storage, bpf_local_storage_destroy() is called before
    waiting for an rcu gp, so its memory cannot be reused immediately.
    inode storage stays with kmalloc/kfree after the rcu [or tasks_trace] gp.
    
    A 'bool bpf_ma' argument is added to bpf_local_storage_map_alloc().
    Only task and cgroup storage have 'bpf_ma == true', which
    means they use bpf_mem_cache_alloc/free(). This patch only changes
    selem to use bpf_mem_alloc for task and cgroup. The next patch
    will change the local_storage to use bpf_mem_alloc also for
    task and cgroup.
    
    Here is some more details on the changes:
    
    * memory allocation:
    After bpf_mem_cache_alloc(), the SDATA(selem)->data is zeroed because
    bpf_mem_cache_alloc() could return a reused selem. This keeps
    the existing bpf_map_kzalloc() behavior. Only SDATA(selem)->data
    is zeroed. SDATA(selem)->data is the part visible to the bpf prog.
    There is no need to use zero_map_value() to do the zeroing because
    bpf_selem_free(..., reuse_now = true) ensures no bpf prog is using
    the selem before returning the selem through bpf_mem_cache_free().
    The internal fields of selem will be initialized when
    linking to the new smap and the new local_storage.
    
    When 'bpf_ma == false', nothing changes in this patch. It will
    stay with the bpf_map_kzalloc().
    
    * memory free:
    The bpf_selem_free() and bpf_selem_free_rcu() are modified to handle
    the bpf_ma == true case.
    
    For the common selem free path, where its owner is also being destroyed,
    the memory is freed in bpf_local_storage_destroy(); the owner (task
    or cgroup) has already gone through an rcu gp. The memory can be reused
    immediately, so bpf_local_storage_destroy() will call
    bpf_selem_free(..., reuse_now = true), which will do
    bpf_mem_cache_free() for immediate reuse consideration.
    
    An exception is the delete elem code path. The delete elem code path
    is called from the helper bpf_*_storage_delete() and the syscall
    bpf_map_delete_elem(). This path is an unusual case for local
    storage because the common use case is to have the local storage
    stay with its owner's lifetime so that the bpf prog and the user
    space do not have to monitor the owner's destruction. For the delete
    elem path, the selem cannot be reused immediately because there could
    be a bpf prog using it. It will call bpf_selem_free(..., reuse_now = false)
    and wait for an rcu tasks trace gp before freeing the elem. The
    rcu callback is changed to do bpf_mem_cache_raw_free() instead of kfree().
    
    When 'bpf_ma == false', it should be the same as before.
    __bpf_selem_free() is added to do the kfree_rcu and call_rcu_tasks_trace().
    A few words on 'reuse_now == true': it is still racing with
    bpf_local_storage_map_free, which is under rcu protection,
    so it still needs to wait for an rcu gp instead of kfree().
    Otherwise, the selem may be reused by slab for a totally different struct
    while bpf_local_storage_map_free() is still using it (as an
    rcu reader). For the inode case, there may be other rcu readers also.
    In short, when bpf_ma == false and reuse_now == true => vanilla rcu.
    
    [1]: https://lore.kernel.org/bpf/20221118190109.1512674-1-namhyung@kernel.org/
    
    Cc: Namhyung Kim <namhyung@kernel.org>
    Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
    Link: https://lore.kernel.org/r/20230322215246.1675516-3-martin.lau@linux.dev
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Signed-off-by: Artem Savkov <asavkov@redhat.com>
2023-09-22 09:12:23 +02:00
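
A hedged sketch of the allocation-side behavior described above; the shape
follows the commit text, but the function and the selem_ma field name are
assumptions, not verbatim kernel code:

    static struct bpf_local_storage_elem *
    selem_alloc_sketch(struct bpf_local_storage_map *smap, gfp_t gfp_flags)
    {
        struct bpf_local_storage_elem *selem;

        if (smap->bpf_ma) {             /* task and cgroup storage */
            migrate_disable();
            selem = bpf_mem_cache_alloc_flags(&smap->selem_ma, gfp_flags);
            migrate_enable();
            /* a reused selem may carry stale data: zero only the
             * bpf-prog-visible part, keeping bpf_map_kzalloc behavior */
            if (selem)
                memset(SDATA(selem)->data, 0, smap->map.value_size);
        } else {                        /* sk and inode storage: unchanged */
            selem = bpf_map_kzalloc(&smap->map, smap->elem_size, gfp_flags);
        }
        return selem;
    }
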
Artem Savkov 128dd7c7f8 bpf: return long from bpf_map_ops funcs
Bugzilla: https://bugzilla.redhat.com/2221599

commit d7ba4cc900bf1eea2d8c807c6b1fc6bd61f41237
Author: JP Kobryn <inwardvessel@gmail.com>
Date:   Wed Mar 22 12:47:54 2023 -0700

    bpf: return long from bpf_map_ops funcs
    
    This patch changes the return types of bpf_map_ops functions to long, where
    previously int was returned. Using long allows bpf programs to maintain
    the sign bit in the absence of sign extension in situations where
    inlined bpf helper funcs make calls to the bpf_map_ops funcs and a negative
    error is returned.
    
    The definitions of the helper funcs are generated from comments in the bpf
    uapi header at `include/uapi/linux/bpf.h`. The return type of these
    helpers was previously changed from int to long in commit bdb7b79b4c. For
    any case where one of the map helpers calls a bpf_map_ops func that is
    still returning 32-bit int, the compiler might not include the sign
    extension instructions needed to properly convert the 32-bit negative
    value to a 64-bit negative value.
    
    For example:
    bpf assembly excerpt of an inlined helper calling a kernel function and
    checking for a specific error:
    
    ; err = bpf_map_update_elem(&mymap, &key, &val, BPF_NOEXIST);
      ...
      46:	call   0xffffffffe103291c	; htab_map_update_elem
    ; if (err && err != -EEXIST) {
      4b:	cmp    $0xffffffffffffffef,%rax ; cmp -EEXIST,%rax
    
    kernel function assembly excerpt of return value from
    `htab_map_update_elem` returning 32-bit int:
    
    movl $0xffffffef, %r9d
    ...
    movl %r9d, %eax
    
    ...results in the comparison:
    cmp $0xffffffffffffffef, $0x00000000ffffffef
    
    Fixes: bdb7b79b4c ("bpf: Switch most helper return values from 32-bit int to 64-bit long")
    Tested-by: Eduard Zingerman <eddyz87@gmail.com>
    Signed-off-by: JP Kobryn <inwardvessel@gmail.com>
    Link: https://lore.kernel.org/r/20230322194754.185781-3-inwardvessel@gmail.com
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Signed-off-by: Artem Savkov <asavkov@redhat.com>
2023-09-22 09:12:19 +02:00
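
The hazard is easy to demonstrate outside the kernel. A standalone C
illustration (not from the commit) of a 32-bit negative return failing a
64-bit comparison:

    #include <stdio.h>

    #define EEXIST 17

    /* stands in for an ops function still returning 32-bit int */
    static int update_ret32(void) { return -EEXIST; }

    int main(void)
    {
        /* the inlined helper reads the full 64-bit register, but the
         * 32-bit callee never sign-extended the upper half */
        unsigned long long got  = (unsigned int)update_ret32();
        unsigned long long want = (unsigned long long)(long long)-EEXIST;

        printf("got   0x%016llx\n", got);       /* 0x00000000ffffffef */
        printf("want  0x%016llx\n", want);      /* 0xffffffffffffffef */
        printf("equal %d\n", got == want);      /* 0: the -EEXIST check misfires */
        return 0;
    }
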
Artem Savkov 6358cb43d4 bpf: Repurpose use_trace_rcu to reuse_now in bpf_local_storage
Bugzilla: https://bugzilla.redhat.com/2221599

commit a47eabf216f77cb6f22ceb38d46f1bb95968579c
Author: Martin KaFai Lau <martin.lau@kernel.org>
Date:   Tue Mar 7 22:59:25 2023 -0800

    bpf: Repurpose use_trace_rcu to reuse_now in bpf_local_storage
    
    This patch re-purposes use_trace_rcu to mean
    whether the freed memory can be reused immediately or not.
    use_trace_rcu is renamed to reuse_now. Other than
    the boolean test being reversed, it should be a no-op.
    
    The following explains the reason for the rename and how it will
    be used in a later patch.
    
    In a later patch, bpf_mem_cache_alloc/free will be used
    in the bpf_local_storage. The bpf mem allocator will reuse
    the freed memory immediately. Some of the free paths in
    bpf_local_storage do not support the memory being reused immediately.
    These paths are the "delete" elem cases from the bpf_*_storage_delete()
    helper and the map_delete_elem() syscall. Note that deleting an elem
    before the owner's (sk/task/cgrp/inode) lifetime has ended is not
    the common usage for the local storage.
    
    The common free path, bpf_local_storage_destroy(), can reuse the
    memory immediately. This common path means the storage stays with
    its owner until the owner is destroyed.
    
    The above mentioned "delete" elem paths that cannot
    reuse memory immediately always have 'use_trace_rcu == true'.
    The cases that are safe for immediate reuse always have
    'use_trace_rcu == false'. Instead of adding another arg
    in a later patch, this patch re-purposes this arg
    to reuse_now and reverses the test logic.
    
    In a later patch, 'reuse_now == true' will free through
    bpf_mem_cache_free(), where the memory can be reused
    immediately. 'reuse_now == false' will go through
    call_rcu_tasks_trace().
    
    Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
    Link: https://lore.kernel.org/r/20230308065936.1550103-7-martin.lau@linux.dev
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Signed-off-by: Artem Savkov <asavkov@redhat.com>
2023-09-22 09:12:15 +02:00
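
A sketch of the reversed test after the rename (illustrative only; the free
callbacks are simplified relative to the real code):

    static void bpf_selem_free_sketch(struct bpf_local_storage_elem *selem,
                                      bool reuse_now)
    {
        if (!reuse_now) {
            /* "delete" elem paths: a sleepable prog may still be
             * reading the selem under trace RCU */
            call_rcu_tasks_trace(&selem->rcu, bpf_selem_free_rcu);
        } else {
            /* owner-destroy path: only regular-RCU readers remain */
            kfree_rcu(selem, rcu);
        }
    }
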
Artem Savkov b0b486d336 bpf: Refactor codes into bpf_local_storage_destroy
Bugzilla: https://bugzilla.redhat.com/2221599

commit 2ffcb6fc50174d1efc8f98633eb2647d84483c68
Author: Martin KaFai Lau <martin.lau@kernel.org>
Date:   Tue Mar 7 22:59:21 2023 -0800

    bpf: Refactor codes into bpf_local_storage_destroy
    
    This patch first renames bpf_local_storage_unlink_nolock to
    bpf_local_storage_destroy(). It better reflects that it is only
    used when the storage's owner (sk/task/cgrp/inode) is being kfree()'d.
    
    All bpf_local_storage_destroy() callers take the spin lock and
    then free the storage. This patch also moves these two steps into
    bpf_local_storage_destroy().
    
    This is preparation work for a later patch that uses
    bpf_mem_cache_alloc/free in the bpf_local_storage.
    
    Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
    Link: https://lore.kernel.org/r/20230308065936.1550103-3-martin.lau@linux.dev
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Signed-off-by: Artem Savkov <asavkov@redhat.com>
2023-09-22 09:12:15 +02:00
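
A sketch of the consolidated destroy path under the assumption that the
unlink step keeps its nolock shape (upstream folds it in; the lock and free
previously lived in each caller):

    void bpf_local_storage_destroy(struct bpf_local_storage *local_storage)
    {
        unsigned long flags;
        bool free_storage;

        raw_spin_lock_irqsave(&local_storage->lock, flags);
        free_storage = bpf_local_storage_unlink_nolock(local_storage);
        raw_spin_unlock_irqrestore(&local_storage->lock, flags);

        if (free_storage)
            kfree_rcu(local_storage, rcu);
    }
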
Artem Savkov a62a4634aa bpf, net: bpf_local_storage memory usage
Bugzilla: https://bugzilla.redhat.com/2221599

commit 7490b7f1c02ef825ef98f7230662049d4a464a21
Author: Yafang Shao <laoar.shao@gmail.com>
Date:   Sun Mar 5 12:46:11 2023 +0000

    bpf, net: bpf_local_storage memory usage
    
    A new helper is introduced into the bpf_local_storage map to calculate its
    memory usage. This helper is also used by other maps such as
    bpf_cgrp_storage, bpf_inode_storage and bpf_task_storage.
    
    Note that currently the dynamically allocated storage elements are not
    counted in the usage, since that would add extra runtime overhead to the
    element update and delete paths. So let's put it aside for now, and
    implement it in the future when someone really needs it.
    
    Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
    Link: https://lore.kernel.org/r/20230305124615.12358-15-laoar.shao@gmail.com
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Signed-off-by: Artem Savkov <asavkov@redhat.com>
2023-09-22 09:12:13 +02:00
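
A hedged sketch of such a usage callback; the bucket arithmetic and field
names are assumptions based on the map layout, and selems are deliberately
left out, as the commit notes:

    static u64 storage_map_mem_usage_sketch(const struct bpf_map *map)
    {
        struct bpf_local_storage_map *smap =
            (struct bpf_local_storage_map *)map;
        u64 usage = sizeof(*smap);

        /* dynamically allocated storage elements are not counted */
        usage += sizeof(*smap->buckets) * (1ULL << smap->bucket_log);
        return usage;
    }

Each local-storage map type would then expose this through its bpf_map_ops
memory-usage callback.
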
Jerome Marchand b11530f3bc bpf: Fix a compilation failure with clang lto build
Bugzilla: https://bugzilla.redhat.com/2177177

commit 3144bfa5078e0df7507a4de72061501e6a0e56be
Author: Yonghong Song <yhs@fb.com>
Date:   Tue Nov 29 21:21:47 2022 -0800

    bpf: Fix a compilation failure with clang lto build

    When building the kernel with clang lto (CONFIG_LTO_CLANG_FULL=y), the
    following compilation error will appear:

      $ make LLVM=1 LLVM_IAS=1 -j
      ...
      ld.lld: error: ld-temp.o <inline asm>:26889:1: symbol 'cgroup_storage_map_btf_ids' is already defined
      cgroup_storage_map_btf_ids:;
      ^
      make[1]: *** [/.../bpf-next/scripts/Makefile.vmlinux_o:61: vmlinux.o] Error 1

    In local_storage.c, we have
      BTF_ID_LIST_SINGLE(cgroup_storage_map_btf_ids, struct, bpf_local_storage_map)
    Commit c4bcfb38a95e ("bpf: Implement cgroup storage available to
    non-cgroup-attached bpf progs") added the above identical BTF_ID_LIST_SINGLE
    definition in bpf_cgrp_storage.c. With duplicated definitions, the llvm
    linker complains in an lto build.

    Also, the btf_id extraction for 'struct bpf_local_storage_map' is defined
    four times, for the sk, inode, task and cgrp local storages. Let us define
    a single global one with a different name than cgroup_storage_map_btf_ids,
    which also fixes the lto compilation error.

    Fixes: c4bcfb38a95e ("bpf: Implement cgroup storage available to non-cgroup-attached bpf progs")
    Signed-off-by: Yonghong Song <yhs@fb.com>
    Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
    Link: https://lore.kernel.org/bpf/20221130052147.1591625-1-yhs@fb.com

Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
2023-04-28 11:43:14 +02:00
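
The shape of the fix, sketched with the btf_ids.h macros: one global
single-entry list defined once and referenced by every local-storage
map_ops. The exact identifier in the upstream patch may differ:

    /* defined once, e.g. in bpf_local_storage.c */
    BTF_ID_LIST_GLOBAL_SINGLE(bpf_local_storage_map_btf_id, struct,
                              bpf_local_storage_map)

    /* each of the sk/inode/task/cgrp storage map_ops then points at it */
    const struct bpf_map_ops cgrp_storage_map_ops = {
        /* other callbacks elided */
        .map_btf_id = &bpf_local_storage_map_btf_id[0],
    };
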
Jerome Marchand 344e0ea3a2 bpf: Refactor some inode/task/sk storage functions for reuse
Bugzilla: https://bugzilla.redhat.com/2177177

commit c83597fa5dc6b322e9bdf929e5f4136a3f4aa4db
Author: Yonghong Song <yhs@fb.com>
Date:   Tue Oct 25 21:28:45 2022 -0700

    bpf: Refactor some inode/task/sk storage functions for reuse

    Refactor the code so that the inode/task/sk storage implementations
    can maximally share the same code. I also added some comments
    in the new function bpf_local_storage_unlink_nolock() to make
    the code easy to understand. There is no functionality change.

    Acked-by: David Vernet <void@manifault.com>
    Signed-off-by: Yonghong Song <yhs@fb.com>
    Link: https://lore.kernel.org/r/20221026042845.672944-1-yhs@fb.com
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
2023-04-28 11:42:58 +02:00
Yauheni Kaliuta 11fec2f10e bpf: Compute map_btf_id during build time
Bugzilla: https://bugzilla.redhat.com/2120968

commit c317ab71facc2cd0a94145973318a4c914e11acc
Author: Menglong Dong <imagedong@tencent.com>
Date:   Mon Apr 25 21:32:47 2022 +0800

    bpf: Compute map_btf_id during build time
    
    For now, the field 'map_btf_id' in 'struct bpf_map_ops' for all map
    types is computed during vmlinux-btf init:
    
      btf_parse_vmlinux() -> btf_vmlinux_map_ids_init()
    
    It looks up the btf_type according to the 'map_btf_name' field in
    'struct bpf_map_ops'. This process can be done during build time,
    thanks to Jiri's resolve_btfids.
    
    selftest of map_ptr has passed:
    
      $96 map_ptr:OK
      Summary: 1/0 PASSED, 0 SKIPPED, 0 FAILED
    
    Reported-by: kernel test robot <lkp@intel.com>
    Signed-off-by: Menglong Dong <imagedong@tencent.com>
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Signed-off-by: Yauheni Kaliuta <ykaliuta@redhat.com>
2022-11-30 12:47:00 +02:00
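
The resulting pattern, roughly as it appears in the tree after this change:
a BTF_ID_LIST_SINGLE that resolve_btfids fills in at link time, referenced
directly from the ops instead of a map_btf_name string:

    #include <linux/btf_ids.h>

    BTF_ID_LIST_SINGLE(array_map_btf_ids, struct, bpf_array)

    const struct bpf_map_ops array_map_ops = {
        /* other callbacks elided */
        .map_btf_id = &array_map_btf_ids[0],
    };
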
Jerome Marchand 0e62381974 bpf: Enable non-atomic allocations in local storage
Bugzilla: https://bugzilla.redhat.com/2120966

commit b00fa38a9c1cba044a32a601b49a55a18ed719d1
Author: Joanne Koong <joannelkoong@gmail.com>
Date:   Thu Mar 17 21:55:52 2022 -0700

    bpf: Enable non-atomic allocations in local storage

    Currently, local storage memory can only be allocated atomically
    (GFP_ATOMIC). This restriction is too strict for sleepable bpf
    programs.

    In this patch, the verifier detects whether the program is sleepable,
    and passes the corresponding GFP_KERNEL or GFP_ATOMIC flag as a
    5th argument to bpf_task/sk/inode_storage_get. This flag will propagate
    down to the local storage functions that allocate memory.

    Please note that bpf_task/sk/inode_storage_update_elem functions are
    invoked by userspace applications through syscalls. Preemption is
    disabled before bpf_task/sk/inode_storage_update_elem is called, which
    means they will always have to allocate memory atomically.

    Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    Acked-by: KP Singh <kpsingh@kernel.org>
    Acked-by: Martin KaFai Lau <kafai@fb.com>
    Link: https://lore.kernel.org/bpf/20220318045553.3091807-2-joannekoong@fb.com

Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
2022-10-25 14:58:06 +02:00
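
A sketch of the decision the verifier bakes in. Upstream this is done by
patching a hidden 5th argument into the helper call, so the standalone
function below is illustrative only:

    static gfp_t storage_gfp_flags_sketch(const struct bpf_prog *prog)
    {
        /* sleepable progs may block, so GFP_KERNEL is safe; all other
         * program types must keep allocating atomically */
        return prog->aux->sleepable ? GFP_KERNEL : GFP_ATOMIC;
    }
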
Artem Savkov 9c42002344 bpf: Fix usage of trace RCU in local storage.
Bugzilla: https://bugzilla.redhat.com/2069046

Upstream Status: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

commit dcf456c9a095a6e71f53d6f6f004133ee851ee70
Author: KP Singh <kpsingh@kernel.org>
Date:   Mon Apr 18 15:51:58 2022 +0000

    bpf: Fix usage of trace RCU in local storage.

    bpf_{sk,task,inode}_storage_free() do not need to use
    call_rcu_tasks_trace as no BPF program should be accessing the owner
    as it's being destroyed. The only other reader at this point is
    bpf_local_storage_map_free() which uses normal RCU.

    The only paths that need trace RCU are:

    * bpf_local_storage_{delete,update} helpers
    * map_{delete,update}_elem() syscalls

    Fixes: 0fe4b381a59e ("bpf: Allow bpf_local_storage to be used by sleepable programs")
    Signed-off-by: KP Singh <kpsingh@kernel.org>
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    Acked-by: Martin KaFai Lau <kafai@fb.com>
    Link: https://lore.kernel.org/bpf/20220418155158.2865678-1-kpsingh@kernel.org

Signed-off-by: Artem Savkov <asavkov@redhat.com>
2022-08-24 12:53:56 +02:00
Artem Savkov a3732d50aa bpf: Allow bpf_local_storage to be used by sleepable programs
Bugzilla: https://bugzilla.redhat.com/2069046

Upstream Status: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

commit 0fe4b381a59ebc53522fce579b281a67a9e1bee6
Author: KP Singh <kpsingh@kernel.org>
Date:   Fri Dec 24 15:29:15 2021 +0000

    bpf: Allow bpf_local_storage to be used by sleepable programs

    Other maps like hashmaps are already available to sleepable programs.
    Sleepable BPF programs run under trace RCU. Allow task, sk and inode
    storage to be used from sleepable programs. This allows sleepable and
    non-sleepable programs to provide shareable annotations on kernel
    objects.

    Sleepable programs run in trace RCU whereas non-sleepable programs run
    in a normal RCU critical section, i.e. __bpf_prog_enter{_sleepable}
    and __bpf_prog_exit{_sleepable} (rcu_read_lock or rcu_read_lock_trace).

    In order to make the local storage maps accessible to both sleepable
    and non-sleepable programs, one needs to call both
    call_rcu_tasks_trace and call_rcu to wait for both trace and classical
    RCU grace periods to expire before freeing memory.

    Paul's work on call_rcu_tasks_trace allows us to have per CPU queueing
    for call_rcu_tasks_trace. This behaviour can be achieved by setting the
    rcupdate.rcu_task_enqueue_lim=<num_cpus> boot parameter.

    In light of these new performance changes and to keep the local storage
    code simple, avoid adding a new flag for sleepable maps / local storage
    to select the RCU synchronization (trace / classical).

    Also, update the dereferencing of the pointers to use
    rcu_dereference_check (with either the trace or normal RCU locks held)
    with a common bpf_rcu_lock_held helper method.

    Signed-off-by: KP Singh <kpsingh@kernel.org>
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    Acked-by: Martin KaFai Lau <kafai@fb.com>
    Link: https://lore.kernel.org/bpf/20211224152916.1550677-2-kpsingh@kernel.org

Signed-off-by: Artem Savkov <asavkov@redhat.com>
2022-08-24 12:53:51 +02:00
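
The common helper, roughly as described above (a sketch; see the upstream
patch for the exact form): the dereference is valid under either classical
or trace RCU.

    bool bpf_rcu_lock_held(void)
    {
        return rcu_read_lock_held() || rcu_read_lock_trace_held() ||
               rcu_read_lock_bh_held();
    }

    /* call-site pattern for the storage pointers (sketch) */
    #define storage_deref(ptr) \
        rcu_dereference_check(ptr, bpf_rcu_lock_held())
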
Zhen Lei 8fb33b6055 bpf: Fix spelling mistakes
Fix some spelling mistakes in comments:
aother ==> another
Netiher ==> Neither
desribe ==> describe
intializing ==> initializing
funciton ==> function
wont ==> won't and move the word 'the' at the end to the next line
accross ==> across
pathes ==> paths
triggerred ==> triggered
excute ==> execute
ether ==> either
conervative ==> conservative
convetion ==> convention
markes ==> marks
interpeter ==> interpreter

Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210525025659.8898-2-thunder.leizhen@huawei.com
2021-05-24 21:13:05 -07:00
David S. Miller efd13b71a3 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-25 15:31:22 -07:00
Tal Lossos 769c18b254 bpf: Change inode_storage's lookup_elem return value from NULL to -EBADF
bpf_fd_inode_storage_lookup_elem() returned NULL when getting a bad FD,
which caused -ENOENT in bpf_map_copy_value. An -EBADF error is better than
-ENOENT for a bad FD.

The patch was partially contributed by CyberArk Software, Inc.

Fixes: 8ea636848a ("bpf: Implement bpf_local_storage for inodes")
Signed-off-by: Tal Lossos <tallossos@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yonghong Song <yhs@fb.com>
Acked-by: KP Singh <kpsingh@kernel.org>
Link: https://lore.kernel.org/bpf/20210307120948.61414-1-tallossos@gmail.com
2021-03-08 16:08:06 +01:00
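
A sketch of the changed lookup path, using the names from
bpf_inode_storage.c (simplified; the helper signature is an assumption):

    static void *lookup_elem_sketch(struct bpf_map *map, int fd)
    {
        struct bpf_local_storage_data *sdata;
        struct file *f;

        f = fget_raw(fd);
        if (!f)
            return ERR_PTR(-EBADF);  /* was: return NULL => -ENOENT */

        sdata = inode_storage_lookup(file_inode(f), map, true);
        fput(f);
        return sdata ? sdata->data : NULL;
    }
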
Song Liu bc235cdb42 bpf: Prevent deadlock from recursive bpf_task_storage_[get|delete]
BPF helpers bpf_task_storage_[get|delete] could hold two locks:
bpf_local_storage_map_bucket->lock and bpf_local_storage->lock. Calling
these helpers from fentry/fexit programs on functions in bpf_*_storage.c
may cause deadlock on either lock.

Prevent such deadlock with a per cpu counter, bpf_task_storage_busy. We
need this counter to be global, because the two locks here belong to two
different objects: bpf_local_storage_map and bpf_local_storage. If we
pick one of them as the owner of the counter, it is still possible to
trigger deadlock on the other lock. For example, if bpf_local_storage_map
owns the counters, it cannot prevent deadlock on bpf_local_storage->lock
when two maps are used.

Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20210225234319.336131-3-songliubraving@fb.com
2021-02-26 11:51:48 -08:00
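
A sketch of the per-cpu guard described above, close to the upstream shape:

    static DEFINE_PER_CPU(int, bpf_task_storage_busy);

    static bool bpf_task_storage_trylock_sketch(void)
    {
        migrate_disable();
        if (unlikely(this_cpu_inc_return(bpf_task_storage_busy) != 1)) {
            /* re-entered from an fentry/fexit prog: back off
             * instead of deadlocking on either lock */
            this_cpu_dec(bpf_task_storage_busy);
            migrate_enable();
            return false;
        }
        return true;
    }
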
Pan Bian b9557caaf8 bpf, inode_storage: Put file handler if no storage was found
Put file f if inode_storage_ptr() returns NULL.

Fixes: 8ea636848a ("bpf: Implement bpf_local_storage for inodes")
Signed-off-by: Pan Bian <bianpan2016@163.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: KP Singh <kpsingh@kernel.org>
Link: https://lore.kernel.org/bpf/20210121020856.25507-1-bianpan2016@163.com
2021-01-22 23:19:24 +01:00
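
The fixed error path, sketched (the surrounding update-elem code is elided):

    f = fget_raw(fd);
    if (!f)
        return -EBADF;
    if (!inode_storage_ptr(file_inode(f))) {
        fput(f);        /* previously this file reference leaked */
        return -EBADF;
    }
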
KP Singh 84d571d46c bpf: Fix typo in bpf_inode_storage.c
Fix "gurranteed" -> "guaranteed" in bpf_inode_storage.c

Suggested-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: KP Singh <kpsingh@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20210112075525.256820-4-kpsingh@kernel.org
2021-01-12 16:07:57 +01:00
KP Singh 1a9c72ad4c bpf: Local storage helpers should check nullness of owner ptr passed
The verifier allows ARG_PTR_TO_BTF_ID helper arguments to be NULL, so
helper implementations need to check this before dereferencing them.
This was already fixed for the socket storage helpers but not for task
and inode.

The issue can be reproduced by attaching an LSM program to the
inode_rename hook (called when moving files) which tries to get the
inode of the new file without checking for its nullness, and then
moving an existing file to a new path:

  mv existing_file new_file_does_not_exist

The report including the sample program and the steps for reproducing
the bug:

  https://lore.kernel.org/bpf/CANaYP3HWkH91SN=wTNO9FL_2ztHfqcXKX38SSE-JJ2voh+vssw@mail.gmail.com

Fixes: 4cf1bc1f10 ("bpf: Implement task local storage")
Fixes: 8ea636848a ("bpf: Implement bpf_local_storage for inodes")
Reported-by: Gilad Reti <gilad.reti@gmail.com>
Signed-off-by: KP Singh <kpsingh@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20210112075525.256820-3-kpsingh@kernel.org
2021-01-12 16:07:56 +01:00
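
The guard this adds to the task and inode helpers, sketched for the inode
case:

    /* at the top of bpf_inode_storage_get / _delete (sketch) */
    if (!inode)
        /* ARG_PTR_TO_BTF_ID may be NULL, e.g. the new inode in
         * inode_rename when the target does not exist yet */
        return (unsigned long)NULL;
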
Lorenz Bauer 9436ef6e86 bpf: Allow specifying a BTF ID per argument in function protos
Function prototypes using ARG_PTR_TO_BTF_ID currently use two ways to signal
which BTF IDs are acceptable. First, bpf_func_proto.btf_id is an array of
IDs, one for each argument. This array is only accessed up to the highest-
numbered argument that uses ARG_PTR_TO_BTF_ID and may therefore be less than
five arguments long. It usually points at a BTF_ID_LIST. Second, check_btf_id
is a function pointer that is called by the verifier if present. It gets the
actual BTF ID of the register, and the argument number we're currently checking.
It turns out that the only user, check_arg_btf_id, ignores the argument, and is
simply used to check whether the BTF ID has a struct sock_common at its start.

Replace both of these mechanisms with an explicit BTF ID for each argument
in a function proto. Thanks to btf_struct_ids_match this is very flexible:
check_arg_btf_id can be replaced by requiring struct sock_common.

Signed-off-by: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20200921121227.255763-5-lmb@cloudflare.com
2020-09-21 15:00:40 -07:00
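
A sketch of the resulting proto shape: an explicit per-argument BTF ID
pointer replaces the shared array and the check_btf_id callback. The field
names follow the commit; the proto itself is invented:

    BTF_ID_LIST_SINGLE(btf_sock_common_ids, struct, sock_common)

    static const struct bpf_func_proto demo_proto = {
        .ret_type    = RET_INTEGER,
        .arg1_type   = ARG_PTR_TO_BTF_ID,
        .arg1_btf_id = &btf_sock_common_ids[0],  /* struct sock_common */
    };
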
Martin KaFai Lau f4d0525921 bpf: Add map_meta_equal map ops
Some properties of the inner map are used at verification time.
When an inner map is inserted into an outer map at runtime,
bpf_map_meta_equal() is currently used to ensure those properties
of the inserted inner map stay the same as at verification
time.

In particular, the current bpf_map_meta_equal() checks max_entries, which
turns out to be too restrictive for most of the maps which do not use
max_entries during verification time.  It limits the use case that
wants to replace a smaller inner map with a larger inner map.  There are
some maps that do use max_entries during verification though.  For example,
the map_gen_lookup in array_map_ops uses max_entries to generate
the inline lookup code.

To accommodate differences between maps, the map_meta_equal op is added
to bpf_map_ops.  Each map type can decide what to check when its
map is used as an inner map during runtime.

Also, some map types cannot be used as an inner map and they are
currently blacklisted in bpf_map_meta_alloc() in map_in_map.c.
It is not unusual that new map types may not be aware that such a
blacklist exists.  This patch enforces an explicit opt-in
and only allows a map to be used as an inner map if it has
implemented the map_meta_equal op.  It is based on the
discussion in [1].

All maps that support being used as an inner map have their map_meta_equal
pointing to bpf_map_meta_equal in this patch.  A later patch will
relax the max_entries check for most maps.  bpf_types.h
counts 28 map types.  This patch adds 23 ".map_meta_equal"
by using coccinelle.  The remaining 5 are:
	BPF_MAP_TYPE_PROG_ARRAY
	BPF_MAP_TYPE_(PERCPU)_CGROUP_STORAGE
	BPF_MAP_TYPE_STRUCT_OPS
	BPF_MAP_TYPE_ARRAY_OF_MAPS
	BPF_MAP_TYPE_HASH_OF_MAPS

The "if (inner_map->inner_map_meta)" check in bpf_map_meta_alloc()
is moved such that the same error is returned.

[1]: https://lore.kernel.org/bpf/20200522022342.899756-1-kafai@fb.com/

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20200828011806.1970400-1-kafai@fb.com
2020-08-28 15:41:30 +02:00
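
The opt-in, sketched: a map type advertises inner-map support by setting
map_meta_equal, most commonly to the generic check (the ops name here is
invented):

    const struct bpf_map_ops demo_map_ops = {
        /* explicit opt-in to being used as an inner map; without
         * map_meta_equal, bpf_map_meta_alloc() now rejects the type */
        .map_meta_equal = bpf_map_meta_equal,
        /* other callbacks elided */
    };
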
KP Singh 8ea636848a bpf: Implement bpf_local_storage for inodes
Similar to bpf_local_storage for sockets, add local storage for inodes.
The life-cycle of the storage is managed with the life-cycle of the inode,
i.e. the storage is destroyed along with the owning inode.

The BPF LSM allocates an __rcu pointer to the bpf_local_storage in the
security blob, which is now stackable and can co-exist with other LSMs.

Signed-off-by: KP Singh <kpsingh@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200825182919.1118197-6-kpsingh@chromium.org
2020-08-25 15:00:04 -07:00
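
A hedged usage sketch of the new map type from an LSM program; the hook
choice, map name and value layout are illustrative, not from the commit:

    #include "vmlinux.h"
    #include <bpf/bpf_helpers.h>
    #include <bpf/bpf_tracing.h>

    struct {
        __uint(type, BPF_MAP_TYPE_INODE_STORAGE);
        __uint(map_flags, BPF_F_NO_PREALLOC);
        __type(key, int);
        __type(value, __u64);
    } inode_stg SEC(".maps");

    SEC("lsm/inode_unlink")
    int BPF_PROG(tag_unlinked, struct inode *dir, struct dentry *victim)
    {
        __u64 *cnt;

        cnt = bpf_inode_storage_get(&inode_stg, victim->d_inode, 0,
                                    BPF_LOCAL_STORAGE_GET_F_CREATE);
        if (cnt)
            __sync_fetch_and_add(cnt, 1);  /* storage dies with the inode */
        return 0;
    }

    char LICENSE[] SEC("license") = "GPL";
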