Commit Graph

122 Commits

Jerome Marchand 4fc465ced2 bpf: Check percpu map value size first
JIRA: https://issues.redhat.com/browse/RHEL-63880

commit 1d244784be6b01162b732a5a7d637dfc024c3203
Author: Tao Chen <chen.dylane@gmail.com>
Date:   Tue Sep 10 22:41:10 2024 +0800

    bpf: Check percpu map value size first

    Percpu maps are widely used, but the limit on their value size is often
    overlooked, as in https://github.com/iovisor/bcc/issues/2519. The percpu
    map value size is bounded by PCPU_MIN_UNIT_SIZE, so check up front
    whether the value size exceeds PCPU_MIN_UNIT_SIZE, as the percpu
    local_storage map already does. The resulting error message is clearer
    than "cannot allocate memory".

    Signed-off-by: Jinke Han <jinkehan@didiglobal.com>
    Signed-off-by: Tao Chen <chen.dylane@gmail.com>
    Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
    Acked-by: Jiri Olsa <jolsa@kernel.org>
    Acked-by: Andrii Nakryiko <andrii@kernel.org>
    Link: https://lore.kernel.org/bpf/20240910144111.1464912-2-chen.dylane@gmail.com

Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
2025-01-21 11:27:05 +01:00
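For illustration, a minimal userspace sketch of the kind of up-front check described above; PCPU_MIN_UNIT_SIZE and the rounding here are stand-ins for the kernel definitions, not the actual arraymap code:

    #include <stdio.h>
    #include <stdint.h>
    #include <errno.h>

    /* Stand-in for the kernel's PCPU_MIN_UNIT_SIZE (roughly 32KB upstream). */
    #define PCPU_MIN_UNIT_SIZE (32 * 1024)

    /* Reject oversized per-CPU values before attempting any allocation. */
    static int check_percpu_value_size(uint32_t value_size)
    {
        uint32_t rounded = (value_size + 7u) & ~7u;  /* round_up(value_size, 8) */

        if (rounded > PCPU_MIN_UNIT_SIZE)
            return -E2BIG;  /* clearer than a later "cannot allocate memory" */
        return 0;
    }

    int main(void)
    {
        printf("4KB value:  %d\n", check_percpu_value_size(4096));
        printf("64KB value: %d\n", check_percpu_value_size(64 * 1024));
        return 0;
    }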
Jerome Marchand a11edfe6ee bpf: Fix percpu address space issues
JIRA: https://issues.redhat.com/browse/RHEL-63880

commit 6d641ca50d7ec7d5e4e889c3f8ea22afebc2a403
Author: Uros Bizjak <ubizjak@gmail.com>
Date:   Sun Aug 11 18:13:33 2024 +0200

    bpf: Fix percpu address space issues

    In arraymap.c:

    In bpf_array_map_seq_start() and bpf_array_map_seq_next()
    cast return values from the __percpu address space to
    the generic address space via uintptr_t [1].

    Correct the declaration of pptr pointer in __bpf_array_map_seq_show()
    to void __percpu * and cast the value from the generic address
    space to the __percpu address space via uintptr_t [1].

    In hashtab.c:

    Assign the return value from bpf_mem_cache_alloc() to void pointer
    and cast the value to void __percpu ** (void pointer to percpu void
    pointer) before dereferencing.

    In memalloc.c:

    Explicitly declare __percpu variables.

    Cast obj to void __percpu **.

    In helpers.c:

    Cast ptr in BPF_CALL_1 and BPF_CALL_2 from generic address space
    to __percpu address space via const uintptr_t [1].

    Found by GCC's named address space checks.

    There were no changes in the resulting object files.

    [1] https://sparse.docs.kernel.org/en/latest/annotations.html#address-space-name

    Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
    Cc: Alexei Starovoitov <ast@kernel.org>
    Cc: Daniel Borkmann <daniel@iogearbox.net>
    Cc: Andrii Nakryiko <andrii@kernel.org>
    Cc: Martin KaFai Lau <martin.lau@linux.dev>
    Cc: Eduard Zingerman <eddyz87@gmail.com>
    Cc: Song Liu <song@kernel.org>
    Cc: Yonghong Song <yonghong.song@linux.dev>
    Cc: John Fastabend <john.fastabend@gmail.com>
    Cc: KP Singh <kpsingh@kernel.org>
    Cc: Stanislav Fomichev <sdf@fomichev.me>
    Cc: Hao Luo <haoluo@google.com>
    Cc: Jiri Olsa <jolsa@kernel.org>
    Acked-by: Eduard Zingerman <eddyz87@gmail.com>
    Link: https://lore.kernel.org/r/20240811161414.56744-1-ubizjak@gmail.com
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
2025-01-21 11:24:25 +01:00
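A rough illustration of the casting pattern described above (not the kernel code itself): outside of sparse/GCC named-address-space checking, __percpu expands to nothing, and the round trip through uintptr_t is what keeps the address-space checks quiet:

    #include <stdint.h>
    #include <stdio.h>

    /* In kernel builds __percpu is an address-space annotation seen only by
     * sparse / GCC named-address-space checks; define it away here. */
    #define __percpu

    static int dummy_slot;                 /* pretend per-CPU storage */

    int main(void)
    {
        void __percpu *pptr = &dummy_slot; /* "per-CPU" pointer */

        /* Cast from the __percpu address space to the generic one ... */
        void *generic = (void *)(uintptr_t)pptr;

        /* ... and back again, mirroring the seq_file iterator changes above. */
        void __percpu *back = (void __percpu *)(uintptr_t)generic;

        printf("%p %p %p\n", pptr, generic, back);
        return 0;
    }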
Jerome Marchand cc024fa55c bpf: Replace 8 seq_puts() calls by seq_putc() calls
JIRA: https://issues.redhat.com/browse/RHEL-63880

commit df862de41fcde6a0a4906647b0cacec2a8db5cf3
Author: Markus Elfring <elfring@users.sourceforge.net>
Date:   Sun Jul 14 16:15:34 2024 +0200

    bpf: Replace 8 seq_puts() calls by seq_putc() calls

    When only a single line break is written to a sequence, a string call
    is unnecessary. Thus use the corresponding function “seq_putc”.

    This transformation was performed with the Coccinelle software.

    Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
    Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
    Link: https://lore.kernel.org/bpf/e26b7df9-cd63-491f-85e8-8cabe60a85e5@web.de

Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
2025-01-13 17:36:12 +01:00
Viktor Malik 78d4b4bd5b
bpf: Do not walk twice the map on free
JIRA: https://issues.redhat.com/browse/RHEL-30773

commit b98a5c68ccaa94e93b9e898091fe2cf21c1500e6
Author: Benjamin Tissoires <bentiss@kernel.org>
Date:   Tue Apr 30 12:43:24 2024 +0200

    bpf: Do not walk twice the map on free
    
    If someone stores both a timer and a workqueue in a map, we would walk
    the map twice on free.
    
    Add a check in array_map_free_timers_wq and free the timers and
    workqueues if they are present.
    
    Fixes: 246331e3f1ea ("bpf: allow struct bpf_wq to be embedded in arraymaps and hashmaps")
    Signed-off-by: Benjamin Tissoires <bentiss@kernel.org>
    Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
    Link: https://lore.kernel.org/bpf/20240430-bpf-next-v3-1-27afe7f3b17c@kernel.org

Signed-off-by: Viktor Malik <vmalik@redhat.com>
2024-11-11 07:44:55 +01:00
Viktor Malik 1969d68a5e
bpf: allow struct bpf_wq to be embedded in arraymaps and hashmaps
JIRA: https://issues.redhat.com/browse/RHEL-30773

commit 246331e3f1eac905170a923f0ec76725c2558232
Author: Benjamin Tissoires <bentiss@kernel.org>
Date:   Sat Apr 20 11:09:09 2024 +0200

    bpf: allow struct bpf_wq to be embedded in arraymaps and hashmaps
    
    Currently bpf_wq_cancel_and_free() is just a placeholder as there is
    no memory allocation for bpf_wq just yet.
    
    Again, this duplicates the bpf_timer approach.
    
    Signed-off-by: Benjamin Tissoires <bentiss@kernel.org>
    Link: https://lore.kernel.org/r/20240420-bpf_wq-v2-9-6c986a5a741f@kernel.org
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Signed-off-by: Viktor Malik <vmalik@redhat.com>
2024-11-11 07:44:50 +01:00
Viktor Malik c070d3fe4b
bpf: inline bpf_map_lookup_elem() for PERCPU_ARRAY maps
JIRA: https://issues.redhat.com/browse/RHEL-30773

commit db69718b8efac802c7cc20d5a6c7dfc913f99c43
Author: Andrii Nakryiko <andrii@kernel.org>
Date:   Mon Apr 1 19:13:04 2024 -0700

    bpf: inline bpf_map_lookup_elem() for PERCPU_ARRAY maps
    
    Using the new per-CPU BPF instruction, implement inlining of the per-CPU
    ARRAY map lookup helper when BPF JIT support is present.
    
    Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
    Acked-by: John Fastabend <john.fastabend@gmail.com>
    Link: https://lore.kernel.org/r/20240402021307.1012571-4-andrii@kernel.org
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Signed-off-by: Viktor Malik <vmalik@redhat.com>
2024-11-07 13:58:43 +01:00
Jerome Marchand 614f33dc0a bpf: Consistently use BPF token throughout BPF verifier logic
JIRA: https://issues.redhat.com/browse/RHEL-23649

commit d79a3549754725bb90e58104417449edddf3da3d
Author: Andrii Nakryiko <andrii@kernel.org>
Date:   Tue Jan 23 18:21:05 2024 -0800

    bpf: Consistently use BPF token throughout BPF verifier logic

    Remove remaining direct queries to perfmon_capable() and bpf_capable()
    in BPF verifier logic and instead use BPF token (if available) to make
    decisions about privileges.

    Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    Link: https://lore.kernel.org/bpf/20240124022127.2379740-9-andrii@kernel.org

Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
2024-10-15 10:49:03 +02:00
Viktor Malik 9680ef97a0
Revert BPF token-related functionality
JIRA: https://issues.redhat.com/browse/RHEL-23644

commit d17aff807f845cf93926c28705216639c7279110
Author: Andrii Nakryiko <andrii@kernel.org>
Date:   Tue Dec 19 07:37:35 2023 -0800

    Revert BPF token-related functionality

    This patch includes the following reverts (one conflicting BPF FS
    patch and three token patch sets, represented by merge commits):
      - revert 0f5d5454c723 "Merge branch 'bpf-fs-mount-options-parsing-follow-ups'";
      - revert 750e785796bb "bpf: Support uid and gid when mounting bpffs";
      - revert 733763285acf "Merge branch 'bpf-token-support-in-libbpf-s-bpf-object'";
      - revert c35919dcce28 "Merge branch 'bpf-token-and-bpf-fs-based-delegation'".

    Link: https://lore.kernel.org/bpf/CAHk-=wg7JuFYwGy=GOMbRCtOL+jwSQsdUaBsRWkDVYbxipbM5A@mail.gmail.com
    Signed-off-by: Andrii Nakryiko <andrii@kernel.org>

Signed-off-by: Viktor Malik <vmalik@redhat.com>
2024-06-25 11:07:29 +02:00
Viktor Malik f5b0387966
bpf: Use GFP_KERNEL in bpf_event_entry_gen()
JIRA: https://issues.redhat.com/browse/RHEL-23644

commit dc68540913ac523b46ebda3843cec179362c7a72
Author: Hou Tao <houtao1@huawei.com>
Date:   Thu Dec 14 12:30:10 2023 +0800

    bpf: Use GFP_KERNEL in bpf_event_entry_gen()
    
    rcu_read_lock() is no longer held when invoking bpf_event_entry_gen(),
    which is called by perf_event_fd_array_get_ptr(), so use GFP_KERNEL
    instead of GFP_ATOMIC to reduce the possibility of out-of-memory
    failures.
    
    Acked-by: Yonghong Song <yonghong.song@linux.dev>
    Signed-off-by: Hou Tao <houtao1@huawei.com>
    Link: https://lore.kernel.org/r/20231214043010.3458072-3-houtao@huaweicloud.com
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Signed-off-by: Viktor Malik <vmalik@redhat.com>
2024-06-25 10:52:28 +02:00
Viktor Malik 5d4d69907a
bpf: consistently use BPF token throughout BPF verifier logic
JIRA: https://issues.redhat.com/browse/RHEL-23644

commit 8062fb12de99b2da33754c6a3be1bfc30d9a35f4
Author: Andrii Nakryiko <andrii@kernel.org>
Date:   Thu Nov 30 10:52:20 2023 -0800

    bpf: consistently use BPF token throughout BPF verifier logic
    
    Remove remaining direct queries to perfmon_capable() and bpf_capable()
    in BPF verifier logic and instead use BPF token (if available) to make
    decisions about privileges.
    
    Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
    Link: https://lore.kernel.org/r/20231130185229.2688956-9-andrii@kernel.org
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Signed-off-by: Viktor Malik <vmalik@redhat.com>
2024-06-25 10:52:10 +02:00
Viktor Malik da83f1285e
bpf: Set need_defer as false when clearing fd array during map free
JIRA: https://issues.redhat.com/browse/RHEL-23644

commit 79d93b3c6ffd79abcd8e43345980aa1e904879c4
Author: Hou Tao <houtao1@huawei.com>
Date:   Mon Dec 4 22:04:21 2023 +0800

    bpf: Set need_defer as false when clearing fd array during map free
    
    The map deletion, map release and map free operations all use
    fd_array_map_delete_elem() to remove an element from the fd array, and
    need_defer is always true in fd_array_map_delete_elem(). For map
    deletion and map release, need_defer=true is necessary because a bpf
    program which accesses the element in the fd array may still be alive.
    However, for map free it is certain that the bpf program which owned
    the fd array has already exited, so setting need_defer to false is
    appropriate there.
    
    So fix it by adding a need_defer parameter to bpf_fd_array_map_clear()
    and adding a new helper __fd_array_map_delete_elem() to handle the map
    deletion, map release and map free operations accordingly.
    
    Signed-off-by: Hou Tao <houtao1@huawei.com>
    Link: https://lore.kernel.org/r/20231204140425.1480317-4-houtao@huaweicloud.com
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Signed-off-by: Viktor Malik <vmalik@redhat.com>
2024-06-25 10:52:04 +02:00
Viktor Malik 338f980539
bpf: Add map and need_defer parameters to .map_fd_put_ptr()
JIRA: https://issues.redhat.com/browse/RHEL-23644

commit 20c20bd11a0702ce4dc9300c3da58acf551d9725
Author: Hou Tao <houtao1@huawei.com>
Date:   Mon Dec 4 22:04:20 2023 +0800

    bpf: Add map and need_defer parameters to .map_fd_put_ptr()
    
    map is the pointer to the outer map, and need_defer needs some explanation.
    need_defer tells the implementation to defer the reference release of
    the passed element and ensure that the element is still alive before
    the bpf program, which may manipulate it, exits.
    
    The following three cases will invoke map_fd_put_ptr() and different
    need_defer values will be passed to these callers:
    
    1) release the reference of the old element in the map during map update
       or map deletion. The release must be deferred, otherwise the bpf
       program may hit a use-after-free problem, so need_defer needs to be
       true.
    2) release the reference of the to-be-added element in the error path of
       map update. The to-be-added element is not visible to any bpf
       program, so it is OK to pass false for the need_defer parameter.
    3) release the references of all elements in the map during map release.
       Any bpf program which has access to the map must have exited and been
       released, so need_defer=false will be OK.
    
    These two parameters will be used by the following patches to fix the
    potential use-after-free problem for map-in-map.
    
    Signed-off-by: Hou Tao <houtao1@huawei.com>
    Link: https://lore.kernel.org/r/20231204140425.1480317-3-houtao@huaweicloud.com
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Signed-off-by: Viktor Malik <vmalik@redhat.com>
2024-06-25 10:52:04 +02:00
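A hedged sketch of the callback shape these two commits describe; the struct name is hypothetical and only the parameters named in the message are shown:

    #include <stdbool.h>

    struct bpf_map;   /* opaque here; the real definition lives in the kernel */

    /* Before: the callback received only the element pointer.
     *   void (*map_fd_put_ptr)(void *ptr);
     * After: it also receives the outer map and a need_defer flag telling it
     * whether the reference release must be deferred until in-flight bpf
     * programs are done with the element. */
    struct hypothetical_map_ops {
        void (*map_fd_put_ptr)(struct bpf_map *map, void *ptr, bool need_defer);
    };

    /* Callers per the commit message:
     *   1) replacing/deleting a live element   -> need_defer = true
     *   2) error path of a map update          -> need_defer = false
     *   3) releasing all elements on map free  -> need_defer = false
     */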
Artem Savkov fb0a7b0e48 bpf: Fix prog_array_map_poke_run map poke update
JIRA: https://issues.redhat.com/browse/RHEL-23643

Conflicts: already backported fd5d27b701883 ("arch/x86: Implement
           arch_bpf_stack_walk")

commit 4b7de801606e504e69689df71475d27e35336fb3
Author: Jiri Olsa <jolsa@kernel.org>
Date:   Wed Dec 6 09:30:40 2023 +0100

    bpf: Fix prog_array_map_poke_run map poke update

    Lee pointed out an issue found by syzkaller [0]: a BUG is hit in the prog
    array map poke update in the prog_array_map_poke_run function due to an
    error value returned from the bpf_arch_text_poke function.

    There's a race window where bpf_arch_text_poke can fail due to missing
    bpf program kallsym symbols, which is accounted for by the check for
    -EINVAL in that BUG_ON call.

    The problem is that in such a case we won't update the tail call jump,
    causing an imbalance for the next tail call update check, which will
    fail with -EBUSY in bpf_arch_text_poke.

    I'm hitting following race during the program load:

      CPU 0                             CPU 1

      bpf_prog_load
        bpf_check
          do_misc_fixups
            prog_array_map_poke_track

                                        map_update_elem
                                          bpf_fd_array_map_update_elem
                                            prog_array_map_poke_run

                                              bpf_arch_text_poke returns -EINVAL

        bpf_prog_kallsyms_add

    After bpf_arch_text_poke (CPU 1) fails to update the tail call jump, the next
    poke update fails on expected jump instruction check in bpf_arch_text_poke
    with -EBUSY and triggers the BUG_ON in prog_array_map_poke_run.

    Similar race exists on the program unload.

    Fix this by moving the update to the bpf_arch_poke_desc_update function,
    which makes sure we call __bpf_arch_text_poke, which skips the bpf
    address check.

    Each architecture has a slightly different approach wrt looking up the bpf
    address in bpf_arch_text_poke, so instead of splitting the function or
    adding a new 'checkip' argument as in the previous version, it seems best
    to move the whole map_poke_run update into arch-specific code.

      [0] https://syzkaller.appspot.com/bug?extid=97a4fe20470e9bc30810

    Fixes: ebf7d1f508 ("bpf, x64: rework pro/epilogue and tailcall handling in JIT")
    Reported-by: syzbot+97a4fe20470e9bc30810@syzkaller.appspotmail.com
    Signed-off-by: Jiri Olsa <jolsa@kernel.org>
    Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    Acked-by: Yonghong Song <yonghong.song@linux.dev>
    Cc: Lee Jones <lee@kernel.org>
    Cc: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
    Link: https://lore.kernel.org/bpf/20231206083041.1306660-2-jolsa@kernel.org

Signed-off-by: Artem Savkov <asavkov@redhat.com>
2024-03-27 10:33:51 +01:00
Artem Savkov 128dd7c7f8 bpf: return long from bpf_map_ops funcs
Bugzilla: https://bugzilla.redhat.com/2221599

commit d7ba4cc900bf1eea2d8c807c6b1fc6bd61f41237
Author: JP Kobryn <inwardvessel@gmail.com>
Date:   Wed Mar 22 12:47:54 2023 -0700

    bpf: return long from bpf_map_ops funcs
    
    This patch changes the return types of bpf_map_ops functions to long, where
    previously int was returned. Using long allows bpf programs to maintain
    the sign bit in the absence of sign extension in situations where inlined
    bpf helper funcs call the bpf_map_ops funcs and a negative error is
    returned.
    
    The definitions of the helper funcs are generated from comments in the bpf
    uapi header at `include/uapi/linux/bpf.h`. The return type of these
    helpers was previously changed from int to long in commit bdb7b79b4c. For
    any case where one of the map helpers calls a bpf_map_ops func that still
    returns a 32-bit int, the compiler might not include sign extension
    instructions to properly convert the 32-bit negative value to a 64-bit
    negative value.
    
    For example:
    bpf assembly excerpt of an inlined helper calling a kernel function and
    checking for a specific error:
    
    ; err = bpf_map_update_elem(&mymap, &key, &val, BPF_NOEXIST);
      ...
      46:	call   0xffffffffe103291c	; htab_map_update_elem
    ; if (err && err != -EEXIST) {
      4b:	cmp    $0xffffffffffffffef,%rax ; cmp -EEXIST,%rax
    
    kernel function assembly excerpt of return value from
    `htab_map_update_elem` returning 32-bit int:
    
    movl $0xffffffef, %r9d
    ...
    movl %r9d, %eax
    
    ...results in the comparison:
    cmp $0xffffffffffffffef, $0x00000000ffffffef
    
    Fixes: bdb7b79b4c ("bpf: Switch most helper return values from 32-bit int to 64-bit long")
    Tested-by: Eduard Zingerman <eddyz87@gmail.com>
    Signed-off-by: JP Kobryn <inwardvessel@gmail.com>
    Link: https://lore.kernel.org/r/20230322194754.185781-3-inwardvessel@gmail.com
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Signed-off-by: Artem Savkov <asavkov@redhat.com>
2023-09-22 09:12:19 +02:00
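A small userspace demonstration of the comparison mismatch described above (constants chosen to mirror the -EEXIST example; not BPF code):

    #include <stdio.h>
    #include <stdint.h>

    #define EEXIST 17

    int main(void)
    {
        int32_t err32 = -EEXIST;              /* 0xffffffef in 32 bits */

        /* If only the low 32 bits are defined and no sign extension is
         * emitted, a 64-bit consumer effectively sees a zero-extended value. */
        uint64_t zero_ext = (uint32_t)err32;  /* 0x00000000ffffffef */
        int64_t  sign_ext = (int64_t)err32;   /* 0xffffffffffffffef */

        printf("zero-extended == -EEXIST? %s\n",
               (int64_t)zero_ext == (int64_t)-EEXIST ? "yes" : "no");  /* no  */
        printf("sign-extended == -EEXIST? %s\n",
               sign_ext == (int64_t)-EEXIST ? "yes" : "no");           /* yes */
        return 0;
    }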
Artem Savkov 6f006c3a4f bpf: arraymap memory usage
Bugzilla: https://bugzilla.redhat.com/2221599

commit 1746d0555a8795c7ecfeab6a2c3e3c824cf57535
Author: Yafang Shao <laoar.shao@gmail.com>
Date:   Sun Mar 5 12:46:01 2023 +0000

    bpf: arraymap memory usage
    
    Introduce array_map_mem_usage() to calculate arraymap memory usage. In
    this helper, some small memory allocations are ignored, like the
    allocation of struct bpf_array_aux in prog_array. The inner_map_meta in
    array_of_map is also ignored.
    
    The results are as follows:
    
    - before
    11: array  name count_map  flags 0x0
            key 4B  value 4B  max_entries 65536  memlock 524288B
    12: percpu_array  name count_map  flags 0x0
            key 4B  value 4B  max_entries 65536  memlock 8912896B
    13: perf_event_array  name count_map  flags 0x0
            key 4B  value 4B  max_entries 65536  memlock 524288B
    14: prog_array  name count_map  flags 0x0
            key 4B  value 4B  max_entries 65536  memlock 524288B
    15: cgroup_array  name count_map  flags 0x0
            key 4B  value 4B  max_entries 65536  memlock 524288B
    
    - after
    11: array  name count_map  flags 0x0
            key 4B  value 4B  max_entries 65536  memlock 524608B
    12: percpu_array  name count_map  flags 0x0
            key 4B  value 4B  max_entries 65536  memlock 17301824B
    13: perf_event_array  name count_map  flags 0x0
            key 4B  value 4B  max_entries 65536  memlock 524608B
    14: prog_array  name count_map  flags 0x0
            key 4B  value 4B  max_entries 65536  memlock 524608B
    15: cgroup_array  name count_map  flags 0x0
            key 4B  value 4B  max_entries 65536  memlock 524608B
    
    Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
    Link: https://lore.kernel.org/r/20230305124615.12358-5-laoar.shao@gmail.com
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Signed-off-by: Artem Savkov <asavkov@redhat.com>
2023-09-22 09:12:12 +02:00
Jerome Marchand 1a6464875f bpf: Do btf_record_free outside map_free callback
Bugzilla: https://bugzilla.redhat.com/2177177

commit d7f5ef653c3dd0c0d649cae6ef2708053bb1fb2b
Author: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Date:   Fri Nov 18 07:25:52 2022 +0530

    bpf: Do btf_record_free outside map_free callback

    Since the commit being fixed, we now miss freeing btf_record for local
    storage maps which will have a btf_record populated in case they have
    bpf_spin_lock element.

    This was missed because I made the choice of offloading the job to free
    kptr_off_tab (now btf_record) to the map_free callback when adding
    support for kptrs.

    Revisiting the reason for this decision, there is the possibility that
    the btf_record gets used inside the map_free callback (e.g. in case of
    maps embedding kptrs) to iterate over the kptrs and free them, hence
    doing it before the map_free callback would leak special field memory
    and perform invalid memory accesses. The btf_record keeps module
    references, which is critical to ensure the dtor call made for a
    referenced kptr is safe to do.

    If doing it after map_free callback, the map area is already freed, so
    we cannot access bpf_map structure anymore.

    To fix this and prevent such lapses in future, move bpf_map_free_record
    out of the map_free callback, and do it after map_free by remembering
    the btf_record pointer. There is no need to access bpf_map structure in
    that case, and we can avoid missing this case when support for new map
    types is added for other special fields.

    Since a btf_record and its btf_field_offs are used together, for
    consistency delay freeing of field_offs as well. While not a problem
    right now, a lot of code assumes that either both record and field_offs
    are set or none at once.

    Note that in case of map of maps (outer maps), inner_map_meta->record is
    only used during verification, not to free fields in map value, hence we
    simply keep the bpf_map_free_record call as is in bpf_map_meta_free and
    never touch map->inner_map_meta in bpf_map_free_deferred.

    Add a comment making note of these details.

    Fixes: db559117828d ("bpf: Consolidate spin_lock, timer management into btf_record")
    Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
    Link: https://lore.kernel.org/r/20221118015614.2013203-3-memxor@gmail.com
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
2023-04-28 11:43:06 +02:00
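A minimal sketch of the ordering the commit describes, using stub types rather than the kernel structures: remember the record pointer, run the map-specific free callback (which may still walk the record), then free the record itself:

    #include <stdio.h>
    #include <stdlib.h>

    struct btf_record { int nfields; };

    struct toy_map {
        struct btf_record *record;
        void (*map_free)(struct toy_map *map);   /* may walk map->record */
    };

    static void example_map_free(struct toy_map *m)
    {
        /* e.g. iterate m->record to free kptrs held in map values */
        printf("map_free: record has %d field(s)\n", m->record->nfields);
        free(m);
    }

    static void map_free_deferred(struct toy_map *m)
    {
        struct btf_record *rec = m->record;   /* remember it first          */

        m->map_free(m);                       /* map memory is gone now     */
        free(rec);                            /* record is freed afterwards */
    }

    int main(void)
    {
        struct toy_map *m = calloc(1, sizeof(*m));
        m->record = calloc(1, sizeof(*m->record));
        m->record->nfields = 1;
        m->map_free = example_map_free;
        map_free_deferred(m);
        return 0;
    }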
Jerome Marchand 2b8a340165 bpf: Consolidate spin_lock, timer management into btf_record
Bugzilla: https://bugzilla.redhat.com/2177177

Conflicts: Context change from already backported commit 997849c4b969
("bpf: Zeroing allocated object from slab in bpf memory allocator")

commit db559117828d2448fe81ada051c60bcf39f822e9
Author: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Date:   Fri Nov 4 00:39:56 2022 +0530

    bpf: Consolidate spin_lock, timer management into btf_record

    Now that kptr_off_tab has been refactored into btf_record, and can hold
    more than one specific field type, accommodate bpf_spin_lock and
    bpf_timer as well.

    While they don't require any more metadata than offset, having all
    special fields in one place allows us to share the same code for
    allocated user defined types and handle both map values and these
    allocated objects in a similar fashion.

    As an optimization, we still keep spin_lock_off and timer_off offsets in
    the btf_record structure, just to avoid having to find the btf_field
    struct each time their offset is needed. This is mostly needed to
    manipulate such objects in a map value at runtime. It's ok to hardcode
    just one offset as more than one field is disallowed.

    Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
    Link: https://lore.kernel.org/r/20221103191013.1236066-8-memxor@gmail.com
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
2023-04-28 11:43:01 +02:00
Jerome Marchand 40100e4a5a bpf: Refactor kptr_off_tab into btf_record
Bugzilla: https://bugzilla.redhat.com/2177177

Conflicts:
 - Context change from already backported commit 997849c4b969 ("bpf:
Zeroing allocated object from slab in bpf memory allocator")
 - Minor changes from already backported commit 1f6e04a1c7b8 ("bpf:
Fix offset calculation error in __copy_map_value and zero_map_value")

commit aa3496accc412b3d975e4ee5d06076d73394d8b5
Author: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Date:   Fri Nov 4 00:39:55 2022 +0530

    bpf: Refactor kptr_off_tab into btf_record

    To prepare the BPF verifier to handle special fields in both map values
    and program allocated types coming from program BTF, we need to refactor
    the kptr_off_tab handling code into something more generic and reusable
    across both cases to avoid code duplication.

    Later patches also require passing this data to helpers at runtime, so
    that they can work on user defined types, initialize them, destruct
    them, etc.

    The main observation is that both map values and such allocated types
    point to a type in program BTF, hence they can be handled similarly. We
    can prepare a field metadata table for both cases and store them in
    struct bpf_map or struct btf depending on the use case.

    Hence, refactor the code into generic btf_record and btf_field member
    structs. The btf_record represents the fields of a specific btf_type in
    user BTF. The cnt indicates the number of special fields we successfully
    recognized, and field_mask is a bitmask of fields that were found, to
    enable quick determination of availability of a certain field.

    Subsequently, refactor the rest of the code to work with these generic
    types, remove assumptions about kptr and kptr_off_tab, rename variables
    to more meaningful names, etc.

    Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
    Link: https://lore.kernel.org/r/20221103191013.1236066-7-memxor@gmail.com
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
2023-04-28 11:43:01 +02:00
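A simplified sketch of the generic layout the commit describes; the member set is reduced to what the message mentions and the names are illustrative, not the exact kernel identifiers:

    #include <stdint.h>

    enum special_field_type {       /* kinds of special fields recognized */
        FIELD_SPIN_LOCK = 1 << 0,
        FIELD_TIMER     = 1 << 1,
        FIELD_KPTR      = 1 << 2,
    };

    struct btf_field {
        uint32_t offset;            /* offset of the field in the map value */
        enum special_field_type type;
    };

    struct btf_record {
        uint32_t cnt;               /* number of special fields recognized    */
        uint32_t field_mask;        /* bitmask of field types that were found */
        struct btf_field fields[];  /* one entry per recognized field         */
    };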
Artem Savkov 4b73ddbd13 bpf: Support kptrs in percpu arraymap
Bugzilla: https://bugzilla.redhat.com/2166911

commit 6df4ea1ff0ff70798ff1e7eed79f98ccb7b5b0a2
Author: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Date:   Sun Sep 4 22:41:15 2022 +0200

    bpf: Support kptrs in percpu arraymap
    
    Enable support for kptrs in percpu BPF arraymap by wiring up the freeing
    of these kptrs from percpu map elements.
    
    Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
    Link: https://lore.kernel.org/r/20220904204145.3089-3-memxor@gmail.com
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Signed-off-by: Artem Savkov <asavkov@redhat.com>
2023-03-06 14:54:10 +01:00
Artem Savkov 90c770184a bpf: Acquire map uref in .init_seq_private for array map iterator
Bugzilla: https://bugzilla.redhat.com/2137876

commit f76fa6b338055054f80c72b29c97fb95c1becadc
Author: Hou Tao <houtao1@huawei.com>
Date:   Wed Aug 10 16:05:30 2022 +0800

    bpf: Acquire map uref in .init_seq_private for array map iterator
    
    bpf_iter_attach_map() acquires a map uref, and the uref may be released
    before or in the middle of iterating map elements. For example, the uref
    could be released in bpf_iter_detach_map() as part of
    bpf_link_release(), or could be released in bpf_map_put_with_uref() as
    part of bpf_map_release().
    
    An alternative fix is acquiring an extra bpf_link reference just like
    a pinned map iterator does, but it introduces an unnecessary dependency
    on bpf_link instead of bpf_map.
    
    So choose another fix: acquiring an extra map uref in .init_seq_private
    for array map iterator.
    
    Fixes: d3cc2ab546 ("bpf: Implement bpf iterator for array maps")
    Signed-off-by: Hou Tao <houtao1@huawei.com>
    Acked-by: Yonghong Song <yhs@fb.com>
    Link: https://lore.kernel.org/r/20220810080538.1845898-2-houtao@huaweicloud.com
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Signed-off-by: Artem Savkov <asavkov@redhat.com>
2023-01-05 15:46:47 +01:00
Artem Savkov 29c6780e09 bpf: remove obsolete KMALLOC_MAX_SIZE restriction on array map value size
Bugzilla: https://bugzilla.redhat.com/2137876

commit 63b8ce77b15ebf69c4b0ef4b87451e2626aa3c43
Author: Andrii Nakryiko <andrii@kernel.org>
Date:   Thu Jul 14 22:31:45 2022 -0700

    bpf: remove obsolete KMALLOC_MAX_SIZE restriction on array map value size
    
    Syscall-side map_lookup_elem() and map_update_elem() used to use
    kmalloc() to allocate temporary buffers of value_size, so
    KMALLOC_MAX_SIZE limit on value_size made sense to prevent creation of
    array map that won't be accessible through syscall interface.
    
    But this limitation has since been lifted by relying on kvmalloc() in
    the syscall handling code. So remove KMALLOC_MAX_SIZE, which among other
    things means that it's possible to have BPF global variable sections
    (.bss, .data, .rodata) bigger than 8MB now. Keep the sanity check to
    prevent trivial overflows like round_up(map->value_size, 8) and restrict
    value size to <= INT_MAX (2GB).
    
    Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
    Link: https://lore.kernel.org/r/20220715053146.1291891-4-andrii@kernel.org
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Signed-off-by: Artem Savkov <asavkov@redhat.com>
2023-01-05 15:46:40 +01:00
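A quick userspace illustration of the "trivial overflow" the remaining sanity check guards against: rounding a value_size close to U32_MAX up to a multiple of 8 wraps around to a tiny number:

    #include <stdio.h>
    #include <stdint.h>

    /* Same power-of-two rounding the kernel's round_up() performs. */
    static uint32_t round_up_u32(uint32_t x, uint32_t a)
    {
        return (x + a - 1) & ~(a - 1);
    }

    int main(void)
    {
        uint32_t huge = UINT32_MAX - 2;   /* value_size just below U32_MAX */

        /* wraps around to 0 instead of growing */
        printf("round_up(%u, 8) = %u\n", huge, round_up_u32(huge, 8));
        printf("capping value_size at INT_MAX (%d) avoids this\n", INT32_MAX);
        return 0;
    }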
Artem Savkov 9c61d13ab3 bpf: make uniform use of array->elem_size everywhere in arraymap.c
Bugzilla: https://bugzilla.redhat.com/2137876

commit d937bc3449fa868cbeaf5c87576f9929b765c1e0
Author: Andrii Nakryiko <andrii@kernel.org>
Date:   Thu Jul 14 22:31:44 2022 -0700

    bpf: make uniform use of array->elem_size everywhere in arraymap.c
    
    BPF_MAP_TYPE_ARRAY is rounding value_size to closest multiple of 8 and
    stores that as array->elem_size for various memory allocations and
    accesses.
    
    But the code tends to re-calculate round_up(map->value_size, 8) in
    multiple places instead of using array->elem_size. Clean this up and
    make sure we always use array->elem_size, to avoid duplicating this
    (admittedly simple) logic and for consistency.
    
    Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
    Link: https://lore.kernel.org/r/20220715053146.1291891-3-andrii@kernel.org
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Signed-off-by: Artem Savkov <asavkov@redhat.com>
2023-01-05 15:46:40 +01:00
Artem Savkov a081befdd7 bpf: fix potential 32-bit overflow when accessing ARRAY map element
Bugzilla: https://bugzilla.redhat.com/2137876

commit 87ac0d600943994444e24382a87aa19acc4cd3d4
Author: Andrii Nakryiko <andrii@kernel.org>
Date:   Thu Jul 14 22:31:43 2022 -0700

    bpf: fix potential 32-bit overflow when accessing ARRAY map element
    
    If BPF array map is bigger than 4GB, element pointer calculation can
    overflow because both index and elem_size are u32. Fix this everywhere
    by forcing 64-bit multiplication. Extract this formula into a separate
    small helper and use it consistently in various places.
    
    Speculative-preventing formula utilizing index_mask trick is left as is,
    but explicit u64 casts are added in both places.
    
    Fixes: c85d69135a ("bpf: move memory size checks to bpf_map_charge_init()")
    Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
    Link: https://lore.kernel.org/r/20220715053146.1291891-2-andrii@kernel.org
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Signed-off-by: Artem Savkov <asavkov@redhat.com>
2023-01-05 15:46:40 +01:00
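A small demonstration of the overflow fixed above, with made-up numbers standing in for a map larger than 4GB:

    #include <stdio.h>
    #include <stdint.h>

    /* Mirrors the fix: force the index * elem_size product into 64 bits. */
    static uint64_t elem_offset(uint32_t index, uint32_t elem_size)
    {
        return (uint64_t)index * elem_size;
    }

    int main(void)
    {
        uint32_t index = 700000000;   /* hypothetical index into a huge array */
        uint32_t elem_size = 8;       /* value_size rounded up to 8 bytes     */

        uint32_t wrapped = index * elem_size;   /* 32-bit product wraps       */
        uint64_t correct = elem_offset(index, elem_size);

        printf("u32 product: %u\n", wrapped);
        printf("u64 product: %llu\n", (unsigned long long)correct);
        return 0;
    }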
Yauheni Kaliuta c0c280946f bpf: add bpf_map_lookup_percpu_elem for percpu map
Bugzilla: https://bugzilla.redhat.com/2120968

commit 07343110b293456d30393e89b86c4dee1ac051c8
Author: Feng Zhou <zhoufeng.zf@bytedance.com>
Date:   Wed May 11 17:38:53 2022 +0800

    bpf: add bpf_map_lookup_percpu_elem for percpu map
    
    Add new ebpf helpers bpf_map_lookup_percpu_elem.
    
    The implementation is relatively simple: it follows the map_lookup_elem
    implementation of the percpu map, adds a cpu parameter, and looks up the
    value on the specified cpu.
    
    Signed-off-by: Feng Zhou <zhoufeng.zf@bytedance.com>
    Link: https://lore.kernel.org/r/20220511093854.411-2-zhoufeng.zf@bytedance.com
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Signed-off-by: Yauheni Kaliuta <ykaliuta@redhat.com>
2022-11-30 12:47:04 +02:00
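A hedged BPF-side sketch of how the new helper can be used; the section name and map shape are illustrative and libbpf's bpf_helpers.h is assumed:

    #include <linux/bpf.h>
    #include <bpf/bpf_helpers.h>

    struct {
        __uint(type, BPF_MAP_TYPE_PERCPU_ARRAY);
        __uint(max_entries, 1);
        __type(key, __u32);
        __type(value, __u64);
    } counters SEC(".maps");

    SEC("tracepoint/syscalls/sys_enter_getpid")
    int read_cpu0_counter(void *ctx)
    {
        __u32 key = 0;
        /* Unlike bpf_map_lookup_elem(), which returns the current CPU's slot,
         * this reads the slot owned by CPU 0 explicitly. */
        __u64 *val = bpf_map_lookup_percpu_elem(&counters, &key, 0);

        if (val)
            bpf_printk("cpu0 counter: %llu", *val);
        return 0;
    }

    char LICENSE[] SEC("license") = "GPL";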
Yauheni Kaliuta e52bf4ca67 bpf: Extend batch operations for map-in-map bpf-maps
Bugzilla: https://bugzilla.redhat.com/2120968

commit 9263dddc7b6f816fdd327eee435cc54ba51dd095
Author: Takshak Chahande <ctakshak@fb.com>
Date:   Tue May 10 01:22:20 2022 -0700

    bpf: Extend batch operations for map-in-map bpf-maps
    
    This patch extends batch operations support for map-in-map map-types:
    BPF_MAP_TYPE_HASH_OF_MAPS and BPF_MAP_TYPE_ARRAY_OF_MAPS
    
    A use case where an outer HASH map holds hundreds of VIP entries, with
    the reuse-ports associated with each VIP stored in a REUSEPORT_SOCKARRAY
    inner map, needs batch operations for performance.
    
    This patch leverages the existing generic functions for most of the batch
    operations. As a map-in-map's value contains the actual reference to the
    inner map, the BPF_MAP_TYPE_HASH_OF_MAPS type needs an extra step to
    fetch the map_id from the reference value.
    
    selftests are added in next patch 2/2.
    
    Signed-off-by: Takshak Chahande <ctakshak@fb.com>
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    Acked-by: Yonghong Song <yhs@fb.com>
    Link: https://lore.kernel.org/bpf/20220510082221.2390540-1-ctakshak@fb.com

Signed-off-by: Yauheni Kaliuta <ykaliuta@redhat.com>
2022-11-30 12:47:02 +02:00
Yauheni Kaliuta 11fec2f10e bpf: Compute map_btf_id during build time
Bugzilla: https://bugzilla.redhat.com/2120968

commit c317ab71facc2cd0a94145973318a4c914e11acc
Author: Menglong Dong <imagedong@tencent.com>
Date:   Mon Apr 25 21:32:47 2022 +0800

    bpf: Compute map_btf_id during build time
    
    For now, the field 'map_btf_id' in 'struct bpf_map_ops' for all map
    types is computed during vmlinux-btf init:
    
      btf_parse_vmlinux() -> btf_vmlinux_map_ids_init()
    
    It looks up the btf_type according to the 'map_btf_name' field in
    'struct bpf_map_ops'. This process can be done at build time,
    thanks to Jiri's resolve_btfids.
    
    selftest of map_ptr has passed:
    
      $96 map_ptr:OK
      Summary: 1/0 PASSED, 0 SKIPPED, 0 FAILED
    
    Reported-by: kernel test robot <lkp@intel.com>
    Signed-off-by: Menglong Dong <imagedong@tencent.com>
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Signed-off-by: Yauheni Kaliuta <ykaliuta@redhat.com>
2022-11-30 12:47:00 +02:00
Yauheni Kaliuta 12c4199b33 bpf: Wire up freeing of referenced kptr
Bugzilla: https://bugzilla.redhat.com/2120968

commit 14a324f6a67ef6a53e04362a70160a47eb8afffa
Author: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Date:   Mon Apr 25 03:18:55 2022 +0530

    bpf: Wire up freeing of referenced kptr
    
    A destructor kfunc can be defined as void func(type *), where type may
    be void or any other pointer type as per convenience.
    
    In this patch, we ensure that the type is sane and capture the function
    pointer into off_desc of ptr_off_tab for the specific pointer offset,
    with the invariant that the dtor pointer is always set when 'kptr_ref'
    tag is applied to the pointer's pointee type, which is indicated by the
    flag BPF_MAP_VALUE_OFF_F_REF.
    
    Note that only BTF IDs whose destructor kfunc is registered become the
    allowed BTF IDs for embedding as a referenced kptr. Hence the
    registration serves the purpose of finding the dtor kfunc BTF ID, as
    well as acting as a check against the whitelist of allowed BTF IDs for
    this purpose.
    
    Finally, wire up the actual freeing of the referenced pointer, if any, at
    all available offsets, so that no references are leaked after the BPF
    map goes away when the BPF program previously moved ownership of a
    referenced pointer into it.
    
    The behavior is similar to BPF timers, where bpf_map_{update,delete}_elem
    will free any existing referenced kptr. The same case is with LRU map's
    bpf_lru_push_free/htab_lru_push_free functions, which are extended to
    reset unreferenced and free referenced kptr.
    
    Note that unlike BPF timers, kptr is not reset or freed when map uref
    drops to zero.
    
    Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    Link: https://lore.kernel.org/bpf/20220424214901.2743946-8-memxor@gmail.com

Signed-off-by: Yauheni Kaliuta <ykaliuta@redhat.com>
2022-11-30 12:46:59 +02:00
Jiri Benc d1647a95d0 bpf: generalise tail call map compatibility check
Bugzilla: https://bugzilla.redhat.com/2120966

commit f45d5b6ce2e835834c94b8b700787984f02cd662
Author: Toke Hoiland-Jorgensen <toke@redhat.com>
Date:   Fri Jan 21 11:10:02 2022 +0100

    bpf: generalise tail call map compatibility check

    The check for tail call map compatibility ensures that tail calls only
    happen between maps of the same type. To ensure backwards compatibility for
    XDP frags we need a similar type of check for cpumap and devmap
    programs, so move the state from bpf_array_aux into bpf_map, add
    xdp_has_frags to the check, and apply the same check to cpumap and devmap.

    Acked-by: John Fastabend <john.fastabend@gmail.com>
    Co-developed-by: Lorenzo Bianconi <lorenzo@kernel.org>
    Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
    Signed-off-by: Toke Hoiland-Jorgensen <toke@redhat.com>
    Link: https://lore.kernel.org/r/f19fd97c0328a39927f3ad03e1ca6b43fd53cdfd.1642758637.git.lorenzo@kernel.org
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Signed-off-by: Jiri Benc <jbenc@redhat.com>
2022-10-25 14:57:42 +02:00
Yauheni Kaliuta c164813e34 bpf: Replace callers of BPF_CAST_CALL with proper function typedef
Bugzilla: http://bugzilla.redhat.com/2069045

commit 102acbacfd9a96d101abd96d1a7a5bf92b7c3e8e
Author: Kees Cook <keescook@chromium.org>
Date:   Tue Sep 28 16:09:46 2021 -0700

    bpf: Replace callers of BPF_CAST_CALL with proper function typedef
    
    In order to keep ahead of cases in the kernel where Control Flow
    Integrity (CFI) may trip over function call casts, enabling
    -Wcast-function-type is helpful. To that end, BPF_CAST_CALL causes
    various warnings and is one of the last places in the kernel
    triggering this warning.
    
    For actual function calls, replace BPF_CAST_CALL() with a typedef, which
    captures the same details about the given function pointers.
    
    This change results in no object code difference.
    
    Signed-off-by: Kees Cook <keescook@chromium.org>
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    Acked-by: Andrii Nakryiko <andrii@kernel.org>
    Acked-by: Gustavo A. R. Silva <gustavoars@kernel.org>
    Link: https://github.com/KSPP/linux/issues/20
    Link: https://lore.kernel.org/lkml/CAEf4Bzb46=-J5Fxc3mMZ8JQPtK1uoE0q6+g6WPz53Cvx=CBEhw@mail.gmail.com
    Link: https://lore.kernel.org/bpf/20210928230946.4062144-3-keescook@chromium.org

Signed-off-by: Yauheni Kaliuta <ykaliuta@redhat.com>
2022-06-03 17:23:36 +03:00
Jerome Marchand 483ae4a299 bpf: Fix potential race in tail call compatibility check
Bugzilla: http://bugzilla.redhat.com/2041365

commit 54713c85f536048e685258f880bf298a74c3620d
Author: Toke Høiland-Jørgensen <toke@redhat.com>
Date:   Tue Oct 26 13:00:19 2021 +0200

    bpf: Fix potential race in tail call compatibility check

    Lorenzo noticed that the code testing for program type compatibility of
    tail call maps is potentially racy in that two threads could encounter a
    map with an unset type simultaneously and both return true even though they
    are inserting incompatible programs.

    The race window is quite small, but artificially enlarging it by adding a
    usleep_range() inside the check in bpf_prog_array_compatible() makes it
    trivial to trigger from userspace with a program that does, essentially:

            map_fd = bpf_create_map(BPF_MAP_TYPE_PROG_ARRAY, 4, 4, 2, 0);
            pid = fork();
            if (pid) {
                    key = 0;
                    value = xdp_fd;
            } else {
                    key = 1;
                    value = tc_fd;
            }
            err = bpf_map_update_elem(map_fd, &key, &value, 0);

    While the race window is small, it has potentially serious ramifications in
    that triggering it would allow a BPF program to tail call to a program of a
    different type. So let's get rid of it by protecting the update with a
    spinlock. The commit in the Fixes tag is the last commit that touches the
    code in question.

    v2:
    - Use a spinlock instead of an atomic variable and cmpxchg() (Alexei)
    v3:
    - Put lock and the members it protects into an embedded 'owner' struct (Daniel)

    Fixes: 3324b584b6 ("ebpf: misc core cleanup")
    Reported-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>
    Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    Link: https://lore.kernel.org/bpf/20211026110019.363464-1-toke@redhat.com

Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
2022-04-29 18:17:15 +02:00
Jerome Marchand 103c5a16ea bpf: Add map side support for bpf timers.
Bugzilla: http://bugzilla.redhat.com/2041365

commit 68134668c17f31f51930478f75495b552a411550
Author: Alexei Starovoitov <ast@kernel.org>
Date:   Wed Jul 14 17:54:10 2021 -0700

    bpf: Add map side support for bpf timers.

    Restrict bpf timers to array, hash (both preallocated and kmalloced), and
    lru map types. The per-cpu maps with timers don't make sense, since 'struct
    bpf_timer' is a part of map value. bpf timers in per-cpu maps would mean that
    the number of timers depends on number of possible cpus and timers would not be
    accessible from all cpus. lpm map support can be added in the future.
    The timers in inner maps are supported.

    The bpf_map_update/delete_elem() helpers and sys_bpf commands cancel and free
    bpf_timer in a given map element.

    Similar to 'struct bpf_spin_lock' BTF is required and it is used to validate
    that map element indeed contains 'struct bpf_timer'.

    Make check_and_init_map_value() init both bpf_spin_lock and bpf_timer when
    map element data is reused in preallocated htab and lru maps.

    Teach copy_map_value() to support both bpf_spin_lock and bpf_timer in a single
    map element. There could be one of each, but not more than one. Due to 'one
    bpf_timer in one element' restriction do not support timers in global data,
    since global data is a map of single element, but from bpf program side it's
    seen as many global variables and restriction of single global timer would be
    odd. The sys_bpf map_freeze and sys_mmap syscalls are not allowed on maps with
    timers, since user space could have corrupted mmap element and crashed the
    kernel. The maps with timers cannot be readonly. Due to these restrictions
    search for bpf_timer in datasec BTF in case it was placed in the global data to
    report clear error.

    The previous patch allowed 'struct bpf_timer' as a first field in a map
    element only. Relax this restriction.

    Refactor lru map to s/bpf_lru_push_free/htab_lru_push_free/ to cancel and free
    the timer when lru map deletes an element as a part of it eviction algorithm.

    Make sure that bpf program cannot access 'struct bpf_timer' via direct load/store.
    The timer operation are done through helpers only.
    This is similar to 'struct bpf_spin_lock'.

    Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    Acked-by: Yonghong Song <yhs@fb.com>
    Acked-by: Martin KaFai Lau <kafai@fb.com>
    Acked-by: Andrii Nakryiko <andrii@kernel.org>
    Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
    Link: https://lore.kernel.org/bpf/20210715005417.78572-5-alexei.starovoitov@gmail.com

Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
2022-04-29 18:14:31 +02:00
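A hedged BPF program sketch of the map-side usage enabled above, modeled on the selftest layout; the attach point, clockid and timeout are assumptions:

    #include <linux/bpf.h>
    #include <bpf/bpf_helpers.h>

    struct elem {
        struct bpf_timer t;           /* timer embedded in the map value */
    };

    struct {
        __uint(type, BPF_MAP_TYPE_ARRAY);
        __uint(max_entries, 1);
        __type(key, int);
        __type(value, struct elem);
    } timer_map SEC(".maps");

    static int timer_cb(void *map, int *key, struct elem *val)
    {
        bpf_printk("timer fired for key %d", *key);
        return 0;
    }

    SEC("fentry/bpf_fentry_test1")
    int arm_timer(void *ctx)
    {
        int key = 0;
        struct elem *val = bpf_map_lookup_elem(&timer_map, &key);

        if (!val)
            return 0;
        bpf_timer_init(&val->t, &timer_map, 1 /* CLOCK_MONOTONIC */);
        bpf_timer_set_callback(&val->t, timer_cb);
        bpf_timer_start(&val->t, 1000000000 /* 1s in ns */, 0);
        return 0;
    }

    char LICENSE[] SEC("license") = "GPL";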
Pedro Tammela f008d732ab bpf: Add batched ops support for percpu array
Uses the already in-place infrastructure provided by the
'generic_map_*_batch' functions.

No tweak was needed as it transparently handles the percpu variant.

As arrays don't have delete operations, let it return an error to
user space (default behaviour).

Suggested-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Pedro Tammela <pctammela@mojatatu.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20210424214510.806627-2-pctammela@mojatatu.com
2021-04-28 01:17:45 +02:00
Yonghong Song 06dcdcd4b9 bpf: Add arraymap support for bpf_for_each_map_elem() helper
This patch added support for arraymap and percpu arraymap.

Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210226204928.3885192-1-yhs@fb.com
2021-02-26 13:23:52 -08:00
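A hedged sketch of iterating an array map with the helper, modeled on the selftest pattern; the attach point and map shape are illustrative:

    #include <linux/bpf.h>
    #include <bpf/bpf_helpers.h>

    struct bpf_map;                   /* opaque, only passed through */

    struct {
        __uint(type, BPF_MAP_TYPE_ARRAY);
        __uint(max_entries, 4);
        __type(key, __u32);
        __type(value, __u64);
    } arraymap SEC(".maps");

    struct callback_ctx {
        __u64 sum;
    };

    /* Invoked once per element with the map, key, value and caller context. */
    static __u64 sum_elem(struct bpf_map *map, __u32 *key, __u64 *val,
                          struct callback_ctx *data)
    {
        data->sum += *val;
        return 0;                     /* 0 = continue, 1 = stop iterating */
    }

    SEC("tracepoint/syscalls/sys_enter_getpid")
    int iterate(void *ctx)
    {
        struct callback_ctx data = { .sum = 0 };

        bpf_for_each_map_elem(&arraymap, sum_elem, &data, 0);
        bpf_printk("sum of elements: %llu", data.sum);
        return 0;
    }

    char LICENSE[] SEC("license") = "GPL";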
Roman Gushchin 1bc5975613 bpf: Eliminate rlimit-based memory accounting for arraymap maps
Do not use rlimit-based memory accounting for arraymap maps.
It has been replaced with the memcg-based memory accounting.

Signed-off-by: Roman Gushchin <guro@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20201201215900.3569844-19-guro@fb.com
2020-12-02 18:32:46 -08:00
Roman Gushchin 6d192c7938 bpf: Refine memcg-based memory accounting for arraymap maps
Include percpu arrays and auxiliary data into the memcg-based memory
accounting.

Signed-off-by: Roman Gushchin <guro@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20201201215900.3569844-9-guro@fb.com
2020-12-02 18:32:45 -08:00
Daniel Borkmann 4a8f87e60f bpf: Allow for map-in-map with dynamic inner array map entries
Recent work in f4d0525921 ("bpf: Add map_meta_equal map ops") and 134fede4ee
("bpf: Relax max_entries check for most of the inner map types") added support
for dynamic inner max elements for most map-in-map types. Exceptions were maps
like array or prog array where the map_gen_lookup() callback uses the maps'
max_entries field as a constant when emitting instructions.

We recently implemented Maglev consistent hashing into Cilium's load balancer
which uses map-in-map with an outer map being hash and inner being array holding
the Maglev backend table for each service. It has been designed this way in
order to reduce overall memory consumption, given that the outer hash map
makes it possible to avoid preallocating a large, flat memory area for all
services. Also, the number of service mappings is not always known a priori.

The use case for dynamic inner array map entries is to further reduce memory
overhead, for example, some services might just have a small number of back
ends while others could have a large number. Right now the Maglev backend table
for small and large numbers of backends would need to have the same inner array
map entries, which adds a lot of unneeded overhead.

Dynamic inner array map entries can be realized by avoiding the inlined code
generation for their lookup. The lookup will still be efficient since it will
be calling into array_map_lookup_elem() directly and thus avoiding retpoline.
The patch adds a BPF_F_INNER_MAP flag to map creation which therefore skips
inline code generation and relaxes array_map_meta_equal() check to ignore both
maps' max_entries. This still allows faster lookups for map-in-map when
BPF_F_INNER_MAP is not specified and dynamic max_entries is hence not needed.

Example code generation where inner map is dynamic sized array:

  # bpftool p d x i 125
  int handle__sys_enter(void * ctx):
  ; int handle__sys_enter(void *ctx)
     0: (b4) w1 = 0
  ; int key = 0;
     1: (63) *(u32 *)(r10 -4) = r1
     2: (bf) r2 = r10
  ;
     3: (07) r2 += -4
  ; inner_map = bpf_map_lookup_elem(&outer_arr_dyn, &key);
     4: (18) r1 = map[id:468]
     6: (07) r1 += 272
     7: (61) r0 = *(u32 *)(r2 +0)
     8: (35) if r0 >= 0x3 goto pc+5
     9: (67) r0 <<= 3
    10: (0f) r0 += r1
    11: (79) r0 = *(u64 *)(r0 +0)
    12: (15) if r0 == 0x0 goto pc+1
    13: (05) goto pc+1
    14: (b7) r0 = 0
    15: (b4) w6 = -1
  ; if (!inner_map)
    16: (15) if r0 == 0x0 goto pc+6
    17: (bf) r2 = r10
  ;
    18: (07) r2 += -4
  ; val = bpf_map_lookup_elem(inner_map, &key);
    19: (bf) r1 = r0                               | No inlining but instead
    20: (85) call array_map_lookup_elem#149280     | call to array_map_lookup_elem()
  ; return val ? *val : -1;                        | for inner array lookup.
    21: (15) if r0 == 0x0 goto pc+1
  ; return val ? *val : -1;
    22: (61) r6 = *(u32 *)(r0 +0)
  ; }
    23: (bc) w0 = w6
    24: (95) exit

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20201010234006.7075-4-daniel@iogearbox.net
2020-10-11 10:21:04 -07:00
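A hedged libbpf-style declaration sketch matching the flag described above; sizes and names are arbitrary:

    #include <linux/bpf.h>
    #include <bpf/bpf_helpers.h>

    /* Inner array created with BPF_F_INNER_MAP: its lookup is not inlined,
     * so inner maps inserted at runtime may use a different max_entries. */
    struct inner_arr {
        __uint(type, BPF_MAP_TYPE_ARRAY);
        __uint(map_flags, BPF_F_INNER_MAP);
        __uint(max_entries, 4);
        __type(key, __u32);
        __type(value, __u64);
    };

    struct {
        __uint(type, BPF_MAP_TYPE_HASH_OF_MAPS);
        __uint(max_entries, 128);
        __uint(key_size, sizeof(__u32));
        __array(values, struct inner_arr);
    } outer SEC(".maps");

    char LICENSE[] SEC("license") = "GPL";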
Song Liu 792caccc45 bpf: Introduce BPF_F_PRESERVE_ELEMS for perf event array
Currently, a perf event in a perf event array is removed from the array when
the map fd used to add the event is closed. This behavior makes it difficult
to share perf events with a perf event array.

Introduce perf event map that keeps the perf event open with a new flag
BPF_F_PRESERVE_ELEMS. With this flag set, perf events in the array are not
removed when the original map fd is closed. Instead, the perf event will
stay in the map until 1) it is explicitly removed from the array; or 2)
the array is freed.

Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200930224927.1936644-2-songliubraving@fb.com
2020-09-30 23:18:12 -07:00
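A short declaration sketch for the new flag (illustrative only):

    #include <linux/bpf.h>
    #include <bpf/bpf_helpers.h>

    /* With BPF_F_PRESERVE_ELEMS, perf events added to this array stay in
     * place after the map fd used to add them is closed. */
    struct {
        __uint(type, BPF_MAP_TYPE_PERF_EVENT_ARRAY);
        __uint(map_flags, BPF_F_PRESERVE_ELEMS);
        __uint(max_entries, 64);
        __uint(key_size, sizeof(int));
        __uint(value_size, sizeof(int));
    } events SEC(".maps");

    char LICENSE[] SEC("license") = "GPL";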
Maciej Fijalkowski ebf7d1f508 bpf, x64: rework pro/epilogue and tailcall handling in JIT
This commit serves two things:
1) it optimizes BPF prologue/epilogue generation
2) it makes it possible to have tailcalls within a BPF subprogram

Both points are related to each other since without 1), 2) could not be
achieved.

In [1], Alexei says:
"The prologue will look like:
nop5
xor eax,eax  // two new bytes if bpf_tail_call() is used in this
             // function
push rbp
mov rbp, rsp
sub rsp, rounded_stack_depth
push rax // zero init tail_call counter
variable number of push rbx,r13,r14,r15

Then bpf_tail_call will pop variable number rbx,..
and final 'pop rax'
Then 'add rsp, size_of_current_stack_frame'
jmp to next function and skip over 'nop5; xor eax,eax; push rpb; mov
rbp, rsp'

This way new function will set its own stack size and will init tail
call
counter with whatever value the parent had.

If next function doesn't use bpf_tail_call it won't have 'xor eax,eax'.
Instead it would need to have 'nop2' in there."

Implement that suggestion.

Since the stack layout is changed, tail call counter handling can no
longer rely on popping it to rbx, as was done for the constant prologue
case with a later overwrite of rbx with the actual value of rbx pushed to
the stack. Therefore, let's use one of the registers (%rcx) that is
considered volatile/caller-saved and pop the value of the tail call
counter into it in the epilogue.

Drop the BUILD_BUG_ON in emit_prologue and in
emit_bpf_tail_call_indirect where instruction layout is not constant
anymore.

Introduce new poke target, 'tailcall_bypass' to poke descriptor that is
dedicated for skipping the register pops and stack unwind that are
generated right before the actual jump to target program.
For case when the target program is not present, BPF program will skip
the pop instructions and nop5 dedicated for jmpq $target. An example of
such state when only R6 of callee saved registers is used by program:

ffffffffc0513aa1:       e9 0e 00 00 00          jmpq   0xffffffffc0513ab4
ffffffffc0513aa6:       5b                      pop    %rbx
ffffffffc0513aa7:       58                      pop    %rax
ffffffffc0513aa8:       48 81 c4 00 00 00 00    add    $0x0,%rsp
ffffffffc0513aaf:       0f 1f 44 00 00          nopl   0x0(%rax,%rax,1)
ffffffffc0513ab4:       48 89 df                mov    %rbx,%rdi

When target program is inserted, the jump that was there to skip
pops/nop5 will become the nop5, so CPU will go over pops and do the
actual tailcall.

One might ask why there simply can not be pushes after the nop5?
In the following example snippet:

ffffffffc037030c:       48 89 fb                mov    %rdi,%rbx
(...)
ffffffffc0370332:       5b                      pop    %rbx
ffffffffc0370333:       58                      pop    %rax
ffffffffc0370334:       48 81 c4 00 00 00 00    add    $0x0,%rsp
ffffffffc037033b:       0f 1f 44 00 00          nopl   0x0(%rax,%rax,1)
ffffffffc0370340:       48 81 ec 00 00 00 00    sub    $0x0,%rsp
ffffffffc0370347:       50                      push   %rax
ffffffffc0370348:       53                      push   %rbx
ffffffffc0370349:       48 89 df                mov    %rbx,%rdi
ffffffffc037034c:       e8 f7 21 00 00          callq  0xffffffffc0372548

There is the bpf2bpf call (at ffffffffc037034c) right after the tailcall
and jump target is not present. ctx is in %rbx register and BPF
subprogram that we will call into on ffffffffc037034c is relying on it,
e.g. it will pick ctx from there. Such code layout is therefore broken
as we would overwrite the content of %rbx with the value that was pushed
on the prologue. That is the reason for the 'bypass' approach.

Special care needs to be taken during the install/update/remove of
tailcall target. In case when target program is not present, the CPU
must not execute the pop instructions that precede the tailcall.

To address that, the following states can be defined:
A nop, unwind, nop
B nop, unwind, tail
C skip, unwind, nop
D skip, unwind, tail

A is forbidden (lead to incorrectness). The state transitions between
tailcall install/update/remove will work as follows:

First install tail call f: C->D->B(f)
 * poke the tailcall, after that get rid of the skip
Update tail call f to f': B(f)->B(f')
 * poke the tailcall (poke->tailcall_target) and do NOT touch the
   poke->tailcall_bypass
Remove tail call: B(f')->C(f')
 * poke->tailcall_bypass is poked back to jump, then we wait the RCU
   grace period so that other programs will finish its execution and
   after that we are safe to remove the poke->tailcall_target
Install new tail call (f''): C(f')->D(f'')->B(f'').
 * same as first step

This way CPU can never be exposed to "unwind, tail" state.

Last but not least, when tailcalls get mixed with bpf2bpf calls, it
would be possible to encounter an endless loop due to clearing the
tailcall counter if, for example, we used a tailcall3-like program from
the BPF selftests that was subprogram-based, meaning the tailcall
would be present within a BPF subprogram.

This test, broken down to particular steps, would do:
entry -> set tailcall counter to 0, bump it by 1, tailcall to func0
func0 -> call subprog_tail
(we are NOT skipping the first 11 bytes of prologue and this subprogram
has a tailcall, therefore we clear the counter...)
subprog -> do the same thing as entry

and then loop forever.

To address this, the idea is to go through the call chain of bpf2bpf progs
and look for a tailcall presence throughout whole chain. If we saw a single
tail call then each node in this call chain needs to be marked as a subprog
that can reach the tailcall. We would later feed the JIT with this info
and:
- set eax to 0 only when tailcall is reachable and this is the entry prog
- if tailcall is reachable but there's no tailcall in insns of currently
  JITed prog then push rax anyway, so that it will be possible to
  propagate further down the call chain
- finally if tailcall is reachable, then we need to precede the 'call'
  insn with mov rax, [rbp - (stack_depth + 8)]

Tail call related cases from test_verifier kselftest are also working
fine. Sample BPF programs that utilize tail calls (sockex3, tracex5)
work properly as well.

[1]: https://lore.kernel.org/bpf/20200517043227.2gpq22ifoq37ogst@ast-mbp.dhcp.thefacebook.com/

Suggested-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2020-09-17 19:55:30 -07:00
Maciej Fijalkowski cf71b174d3 bpf: rename poke descriptor's 'ip' member to 'tailcall_target'
Reflect the actual purpose of poke->ip and rename it to
poke->tailcall_target so that it will not be confused with another
poke target that will be introduced in the next commit.

While at it, do the same thing with poke->ip_stable - rename it to
poke->tailcall_target_stable.

Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2020-09-17 12:59:31 -07:00
Alexei Starovoitov 1e6c62a882 bpf: Introduce sleepable BPF programs
Introduce sleepable BPF programs that can request this property for
themselves via the BPF_F_SLEEPABLE flag at program load time. In that case
they will be able to use helpers like bpf_copy_from_user() that might
sleep. At present only fentry/fexit/fmod_ret and lsm programs can request
to be sleepable and only when they are attached to kernel functions that
are known to allow sleeping.
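
As an illustration, a sleepable fentry program might look like the sketch
below; the attach point and buffer size are arbitrary, and libbpf's
"fentry.s/" section prefix is what requests BPF_F_SLEEPABLE at load time:

	#include "vmlinux.h"
	#include <bpf/bpf_helpers.h>
	#include <bpf/bpf_tracing.h>

	char LICENSE[] SEC("license") = "GPL";

	/* Sleepable fentry program: the ".s" suffix allows helpers that may
	 * fault and sleep, such as bpf_copy_from_user().
	 */
	SEC("fentry.s/do_sys_open")
	int BPF_PROG(trace_open, int dfd, const char *filename)
	{
		char name[64] = {};

		/* May sleep while faulting in the user page. */
		bpf_copy_from_user(name, sizeof(name), filename);
		bpf_printk("open: %s", name);
		return 0;
	}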

The non-sleepable programs rely on implicit rcu_read_lock() and
migrate_disable() to protect the lifetime of programs, the maps they use
and the per-cpu kernel structures used to pass info between bpf programs
and the kernel. The sleepable programs cannot be enclosed in
rcu_read_lock(). migrate_disable() maps to preempt_disable() in non-RT
kernels, so the progs should not be enclosed in migrate_disable() either.
Therefore rcu_read_lock_trace is used to protect the lifetime of
sleepable progs.

There are many networking and tracing program types. In many cases the
'struct bpf_prog *' pointer itself is rcu protected within some other kernel
data structure and the kernel code is using rcu_dereference() to load that
program pointer and call BPF_PROG_RUN() on it. None of these cases is
touched. Instead, sleepable bpf programs are allowed with the bpf
trampoline only. The
program pointers are hard-coded into generated assembly of bpf trampoline and
synchronize_rcu_tasks_trace() is used to protect the life time of the program.
The same trampoline can hold both sleepable and non-sleepable progs.

When rcu_read_lock_trace is held it means that some sleepable bpf program
is running from a bpf trampoline. Those programs can use bpf arrays and
preallocated hash/lru maps. These map types wait for programs to complete
via synchronize_rcu_tasks_trace().

Updates to the trampoline now have to do synchronize_rcu_tasks_trace() and
synchronize_rcu_tasks() to wait for sleepable progs to finish and for the
trampoline assembly to finish.

This is the first step of introducing sleepable progs. Eventually dynamically
allocated hash maps can be allowed and networking program types can become
sleepable too.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: KP Singh <kpsingh@google.com>
Link: https://lore.kernel.org/bpf/20200827220114.69225-3-alexei.starovoitov@gmail.com
2020-08-28 21:20:33 +02:00
Martin KaFai Lau 134fede4ee bpf: Relax max_entries check for most of the inner map types
Most of the maps do not use max_entries during verification time.
Thus, their map_meta_equal() does not need to enforce max_entries
when the map is inserted as an inner map at runtime.  The max_entries
check is removed from the default implementation bpf_map_meta_equal().

The prog_array_map and xsk_map are exceptions.  Their map_gen_lookup
uses max_entries to generate inline lookup code.  Thus, they will
implement their own map_meta_equal() to enforce max_entries.
Since there are only two such cases now, the max_entries check
is not refactored and stays in each map's own .c file.
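
The resulting default check then looks roughly like the sketch below (the
exceptions additionally compare max_entries in their own map_meta_equal()):

	/* Default inner map compatibility check: max_entries is intentionally
	 * not compared, so a larger inner map can replace a smaller one.
	 */
	bool bpf_map_meta_equal(const struct bpf_map *meta0,
				const struct bpf_map *meta1)
	{
		return meta0->map_type == meta1->map_type &&
		       meta0->key_size == meta1->key_size &&
		       meta0->value_size == meta1->value_size &&
		       meta0->map_flags == meta1->map_flags;
	}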

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20200828011813.1970516-1-kafai@fb.com
2020-08-28 15:41:30 +02:00
Martin KaFai Lau f4d0525921 bpf: Add map_meta_equal map ops
Some properties of the inner map are used at verification time.
When an inner map is inserted into an outer map at runtime,
bpf_map_meta_equal() is currently used to ensure that those properties
of the inserted inner map stay the same as at verification
time.

In particular, the current bpf_map_meta_equal() checks max_entries, which
turns out to be too restrictive for most of the maps which do not use
max_entries during verification time.  It limits the use case that
wants to replace a smaller inner map with a larger inner map.  There are
some maps that do use max_entries during verification though.  For example,
the map_gen_lookup in array_map_ops uses the max_entries to generate
the inline lookup code.

To accommodate differences between maps, the map_meta_equal is added
to bpf_map_ops.  Each map-type can decide what to check when its
map is used as an inner map during runtime.

Also, some map types cannot be used as an inner map and they are
currently blacklisted in bpf_map_meta_alloc() in map_in_map.c.
It is not unusual that new map types may not be aware that such a
blacklist exists.  This patch enforces an explicit opt-in
and only allows a map to be used as an inner map if it has
implemented the map_meta_equal ops.  It is based on the
discussion in [1].

All maps that support being used as an inner map have their
map_meta_equal pointing to bpf_map_meta_equal in this patch.  A later
patch will relax the max_entries check for most maps.  bpf_types.h
counts 28 map types.  This patch adds 23 ".map_meta_equal"
by using coccinelle.  The remaining 5 are:
	BPF_MAP_TYPE_PROG_ARRAY
	BPF_MAP_TYPE_(PERCPU)_CGROUP_STORAGE
	BPF_MAP_TYPE_STRUCT_OPS
	BPF_MAP_TYPE_ARRAY_OF_MAPS
	BPF_MAP_TYPE_HASH_OF_MAPS

The "if (inner_map->inner_map_meta)" check in bpf_map_meta_alloc()
is moved such that the same error is returned.
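
The per-map-type change is mechanical; for a map that supports being an
inner map it amounts to roughly the following (array map shown as an
example, remaining callbacks elided):

	const struct bpf_map_ops array_map_ops = {
		/* Explicit opt-in: without .map_meta_equal this map type
		 * can no longer be used as an inner map.
		 */
		.map_meta_equal = bpf_map_meta_equal,
		.map_alloc_check = array_map_alloc_check,
		.map_alloc = array_map_alloc,
		.map_free = array_map_free,
		/* ... */
	};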

[1]: https://lore.kernel.org/bpf/20200522022342.899756-1-kafai@fb.com/

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20200828011806.1970400-1-kafai@fb.com
2020-08-28 15:41:30 +02:00
Yonghong Song d3cc2ab546 bpf: Implement bpf iterator for array maps
The bpf iterators for array and percpu array
maps are implemented. Similar to hash maps, for a percpu
array map the bpf program will receive values
from all cpus.
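
A user-visible iterator program over such a map might look roughly like
this sketch (illustrative key/value types; BPF_SEQ_PRINTF is libbpf's
seq-printing convenience macro):

	#include "vmlinux.h"
	#include <bpf/bpf_helpers.h>
	#include <bpf/bpf_tracing.h>

	char LICENSE[] SEC("license") = "GPL";

	/* Visits every element of the array map the iterator is attached to;
	 * for a percpu array the program is fed values from all cpus.
	 */
	SEC("iter/bpf_map_elem")
	int dump_array(struct bpf_iter__bpf_map_elem *ctx)
	{
		__u32 *key = ctx->key;
		__u64 *val = ctx->value;

		if (!key || !val)
			return 0;

		BPF_SEQ_PRINTF(ctx->meta->seq, "%u: %llu\n", *key, *val);
		return 0;
	}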

Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200723184115.590532-1-yhs@fb.com
2020-07-25 20:16:33 -07:00
Alexei Starovoitov bba1dc0b55 bpf: Remove redundant synchronize_rcu.
bpf_free_used_maps() or close(map_fd) will trigger map_free callback.
bpf_free_used_maps() is called after bpf prog is no longer executing:
bpf_prog_put->call_rcu->bpf_prog_free->bpf_free_used_maps.
Hence there is no need to call synchronize_rcu() to protect map elements.

Note that hash_of_maps and array_of_maps update/delete inner maps via
sys_bpf() that calls maybe_wait_bpf_programs() and synchronize_rcu().

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Paul E. McKenney <paulmck@kernel.org>
Link: https://lore.kernel.org/bpf/20200630043343.53195-2-alexei.starovoitov@gmail.com
2020-07-01 08:07:13 -07:00
Andrey Ignatov 2872e9ac33 bpf: Set map_btf_{name, id} for all map types
Set map_btf_name and map_btf_id for all map types so that map fields can
be accessed by bpf programs.

Signed-off-by: Andrey Ignatov <rdna@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/a825f808f22af52b018dbe82f1c7d29dab5fc978.1592600985.git.rdna@fb.com
2020-06-22 22:22:58 +02:00
Andrey Ignatov 41c48f3a98 bpf: Support access to bpf map fields
There are multiple use-cases when it's convenient to have access to bpf
map fields, both `struct bpf_map` and map type specific structs such as
`struct bpf_array`, `struct bpf_htab`, etc.

For example while working with sock arrays it can be necessary to
calculate the key based on map->max_entries (some_hash % max_entries).
Currently this is solved by communicating max_entries via an "out-of-band"
channel, e.g. via an additional map with a known key to get info about the
target map. That works, but it is not very convenient and is error-prone
when working with many maps.

In other cases the necessary data is dynamic (i.e. unknown at load time)
and it's impossible to get it this way at all. For example while working with a
hash table it can be convenient to know how much capacity is already
used (bpf_htab.count.counter for BPF_F_NO_PREALLOC case).

At the same time the kernel knows this info and can provide it to the bpf
program.

Fill this gap by adding support to access bpf map fields from bpf
program for both `struct bpf_map` and map type specific fields.

Support is implemented via btf_struct_access() so that a user can define
their own `struct bpf_map` or map type specific struct in their program
with only necessary fields and preserve_access_index attribute, cast a
map to this struct and use a field.

For example:

	struct bpf_map {
		__u32 max_entries;
	} __attribute__((preserve_access_index));

	struct bpf_array {
		struct bpf_map map;
		__u32 elem_size;
	} __attribute__((preserve_access_index));

	struct {
		__uint(type, BPF_MAP_TYPE_ARRAY);
		__uint(max_entries, 4);
		__type(key, __u32);
		__type(value, __u32);
	} m_array SEC(".maps");

	SEC("cgroup_skb/egress")
	int cg_skb(void *ctx)
	{
		struct bpf_array *array = (struct bpf_array *)&m_array;
		struct bpf_map *map = (struct bpf_map *)&m_array;

		/* .. use map->max_entries or array->map.max_entries .. */
	}

Similarly to other btf_struct_access() use-cases (e.g. struct tcp_sock
in net/ipv4/bpf_tcp_ca.c) the patch allows access to any fields of
corresponding struct. Only reading from map fields is supported.

For btf_struct_access() to work there should be a way to know the btf id
of a struct that corresponds to a map type. To get the btf id there should
be a way to get a stringified name of the map-specific struct, such as
"bpf_array", "bpf_htab", etc. for a map type. Two new fields are added to
`struct bpf_map_ops` to handle it:
* .map_btf_name keeps a btf name of a struct returned by map_alloc();
* .map_btf_id is used to cache btf id of that struct.

To make btf id calculation cheaper they're calculated once while
preparing btf_vmlinux and cached the same way as it's done for the btf_id
field of `struct bpf_func_proto`.
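
For the array map the two new fields end up being set roughly like the
sketch below (the id variable is resolved while btf_vmlinux is prepared):

	static int array_map_btf_id;
	const struct bpf_map_ops array_map_ops = {
		/* ... existing callbacks ... */
		.map_btf_name = "bpf_array",      /* struct returned by map_alloc() */
		.map_btf_id = &array_map_btf_id,  /* cached btf id of that struct */
	};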

While calculating btf ids, struct names are NOT checked for collisions.
Collisions will be checked as part of the work to prepare btf ids used
in the verifier at compile time, which should land soon. The only known
collision for `struct bpf_htab` (kernel/bpf/hashtab.c vs
net/core/sock_map.c) was fixed earlier.

Both new fields .map_btf_name and .map_btf_id must be set for a map type
for the feature to work. If neither is set for a map type, the verifier
will return ENOTSUPP on an attempt to access map_ptr of the corresponding
type. If just one of them is set, it's a verifier misconfiguration.

Only `struct bpf_array` for BPF_MAP_TYPE_ARRAY and `struct bpf_htab` for
BPF_MAP_TYPE_HASH are supported by this patch. Other map types will be
supported separately.

The feature is available only for CONFIG_DEBUG_INFO_BTF=y and gated by
perfmon_capable() so that unpriv programs won't have access to bpf map
fields.

Signed-off-by: Andrey Ignatov <rdna@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/6479686a0cd1e9067993df57b4c3eef0e276fec9.1592600985.git.rdna@fb.com
2020-06-22 22:22:58 +02:00
David S. Miller da07f52d3c Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Move the bpf verifier trace check into the new switch statement in
HEAD.

Resolve the overlapping changes in hinic, where bug fixes overlap
the addition of VF support.

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-15 13:48:59 -07:00
Alexei Starovoitov 2c78ee898d bpf: Implement CAP_BPF
Implement permissions as stated in uapi/linux/capability.h.
In order to do that the verifier allow_ptr_leaks flag is split
into four flags and they are set as:
  env->allow_ptr_leaks = bpf_allow_ptr_leaks();
  env->bypass_spec_v1 = bpf_bypass_spec_v1();
  env->bypass_spec_v4 = bpf_bypass_spec_v4();
  env->bpf_capable = bpf_capable();

The first three are currently equivalent to perfmon_capable(), since leaking
kernel pointers and reading kernel memory via side channel attacks is roughly
equivalent to reading kernel memory with cap_perfmon.
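
Sketched out, the helpers behind those flags look roughly like this
(CAP_SYS_ADMIN keeps implying everything for backward compatibility):

	static inline bool bpf_capable(void)
	{
		return capable(CAP_BPF) || capable(CAP_SYS_ADMIN);
	}

	/* The three leak/bypass toggles are currently all tied to perfmon. */
	static inline bool bpf_allow_ptr_leaks(void)
	{
		return perfmon_capable();
	}

	static inline bool bpf_bypass_spec_v1(void)
	{
		return perfmon_capable();
	}

	static inline bool bpf_bypass_spec_v4(void)
	{
		return perfmon_capable();
	}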

'bpf_capable' enables bounded loops, precision tracking, bpf to bpf calls and
other verifier features. 'allow_ptr_leaks' enables ptr leaks, ptr conversions
and subtraction of pointers. 'bypass_spec_v1' disables speculative analysis in
the verifier and run-time mitigations in bpf arrays, and enables indirect
variable access in bpf programs. 'bypass_spec_v4' disables emission of
sanitization code by the verifier.

That means that a networking BPF program loaded with CAP_BPF + CAP_NET_ADMIN
will have speculative checks done by the verifier and other spectre
mitigations applied. Such a networking BPF program will not be able to leak
kernel pointers and will not be able to access arbitrary kernel memory.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20200513230355.7858-3-alexei.starovoitov@gmail.com
2020-05-15 17:29:41 +02:00
Andrii Nakryiko 333291ce50 bpf: Fix bug in mmap() implementation for BPF array map
The mmap() subsystem allows a user-space application to memory-map a region
with an initial page offset. This wasn't taken into account in the initial
implementation of BPF array memory-mapping, which resulted in the wrong
pages, not accounting for the requested page offset, being memory-mapped
into user-space. This patch fixes this gap and adds a test for such a
scenario.
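
From user-space the problematic case is simply an mmap() of a
BPF_F_MMAPABLE array with a non-zero offset, e.g. (illustrative helper):

	#include <sys/mman.h>
	#include <unistd.h>

	/* Map one page of a BPF_F_MMAPABLE array map starting at its second
	 * page; before the fix the requested page offset was ignored and the
	 * first page was mapped instead.
	 */
	static void *map_second_page(int map_fd)
	{
		long page = sysconf(_SC_PAGE_SIZE);

		return mmap(NULL, page, PROT_READ | PROT_WRITE, MAP_SHARED,
			    map_fd, page /* byte offset == one page */);
	}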

Fixes: fc9702273e ("bpf: Add mmap() support for BPF_MAP_TYPE_ARRAY")
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20200512235925.3817805-1-andriin@fb.com
2020-05-14 12:40:04 -07:00
Brian Vazquez c60f2d2861 bpf: Add lookup and update batch ops to arraymap
This adds the generic batch ops functionality to the bpf arraymap; note
that since deletion is not a valid operation for an arraymap, only batch
lookup and update are added.
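
From user-space, batch lookup can be driven roughly like this with libbpf
(illustrative key/value types; error-handling details vary across libbpf
versions):

	#include <errno.h>
	#include <stdio.h>
	#include <bpf/bpf.h>

	/* Dump a BPF_MAP_TYPE_ARRAY with __u32 keys and __u64 values in
	 * chunks of 64 elements via the batch interface.
	 */
	static void dump_array_batched(int map_fd)
	{
		__u32 keys[64];
		__u64 vals[64];
		__u32 in_val = 0, out_val = 0, count;
		void *in_batch = NULL;
		int err;

		do {
			count = 64;
			err = bpf_map_lookup_batch(map_fd, in_batch, &out_val,
						   keys, vals, &count, NULL);
			if (err && errno != ENOENT)
				break;			/* real error */
			for (__u32 i = 0; i < count; i++)
				printf("%u: %llu\n", keys[i],
				       (unsigned long long)vals[i]);
			in_val = out_val;		/* resume where we stopped */
			in_batch = &in_val;
		} while (!err);				/* ENOENT means done */
	}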

Signed-off-by: Brian Vazquez <brianvv@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20200115184308.162644-5-brianvv@google.com
2020-01-15 14:00:35 -08:00