Commit Graph

329 Commits

Author SHA1 Message Date
Jerome Marchand 9d8cb76404 riscv, bpf: inline bpf_get_smp_processor_id()
JIRA: https://issues.redhat.com/browse/RHEL-63880

Upstream Status: RHEL-Only

This is a very partial backport of 2ddec2c80b44 ("riscv, bpf: inline
bpf_get_smp_processor_id()"). It doesn't backport any of the riscv
part, only the bpf_jit_inlines_helper_call() hook that is needed for
the next patch. I marked it RHEL-Only to prevent kerneloscope from
believing that 2ddec2c80b44 has been backported.

Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
2025-01-13 17:36:14 +01:00
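
The only piece carried here is the arch-neutral hook that lets a JIT tell the verifier it already inlines a given helper. A minimal sketch of that hook, plus the kind of riscv override that is deliberately not backported (shapes inferred from the upstream commit, not from this RHEL patch):

    /* kernel/bpf/core.c: weak default -- no helper calls are inlined */
    bool __weak bpf_jit_inlines_helper_call(s32 imm)
    {
            return false;
    }

    /* arch/riscv/net/bpf_jit_comp64.c (not part of this backport) would
     * override it, e.g. to claim bpf_get_smp_processor_id():
     */
    bool bpf_jit_inlines_helper_call(s32 imm)
    {
            switch (imm) {
            case BPF_FUNC_get_smp_processor_id:
                    return true;
            default:
                    return false;
            }
    }
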
Jerome Marchand c02008ee1c bpf: Prevent tail call between progs attached to different hooks
JIRA: https://issues.redhat.com/browse/RHEL-63880

commit 28ead3eaabc16ecc907cfb71876da028080f6356
Author: Xu Kuohai <xukuohai@huawei.com>
Date:   Fri Jul 19 19:00:53 2024 +0800

    bpf: Prevent tail call between progs attached to different hooks

    bpf progs can be attached to kernel functions, and the attached functions
    can take different parameters or return different return values. If
    prog attached to one kernel function tail calls prog attached to another
    kernel function, the ctx access or return value verification could be
    bypassed.

    For example, if prog1 is attached to func1 which takes only 1 parameter
    and prog2 is attached to func2 which takes two parameters. Since verifier
    assumes the bpf ctx passed to prog2 is constructed based on func2's
    prototype, verifier allows prog2 to access the second parameter from
    the bpf ctx passed to it. The problem is that verifier does not prevent
    prog1 from passing its bpf ctx to prog2 via tail call. In this case,
    the bpf ctx passed to prog2 is constructed from func1 instead of func2,
    that is, the assumption for ctx access verification is bypassed.

    Another example, if BPF LSM prog1 is attached to hook file_alloc_security,
    and BPF LSM prog2 is attached to hook bpf_lsm_audit_rule_known. Verifier
    knows the return value rules for these two hooks, e.g. it is legal for
    bpf_lsm_audit_rule_known to return positive number 1, and it is illegal
    for file_alloc_security to return positive number. So verifier allows
    prog2 to return positive number 1, but does not allow prog1 to return
    positive number. The problem is that verifier does not prevent prog1
    from calling prog2 via tail call. In this case, prog2's return value 1
    will be used as the return value for prog1's hook file_alloc_security.
    That is, the return value rule is bypassed.

    This patch adds restriction for tail call to prevent such bypasses.

    Signed-off-by: Xu Kuohai <xukuohai@huawei.com>
    Link: https://lore.kernel.org/r/20240719110059.797546-4-xukuohai@huaweicloud.com
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    Signed-off-by: Andrii Nakryiko <andrii@kernel.org>

Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
2025-01-13 17:36:13 +01:00
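
A rough sketch of the restriction added in bpf_prog_map_compatible(): the prog_array remembers the attach function prototype of the first program inserted and rejects later programs with a different prototype for the attach-based program types (simplified from the commit description; exact field names are assumed):

    /* kernel/bpf/core.c, inside bpf_prog_map_compatible(), simplified */
    if (!map->owner.type) {
            /* first program inserted becomes the owner */
            map->owner.type  = prog_type;
            map->owner.jited = fp->jited;
            map->owner.attach_func_proto = fp->aux->attach_func_proto;
            ret = true;
    } else {
            ret = map->owner.type == prog_type && map->owner.jited == fp->jited;
            if (ret && map->owner.attach_func_proto != fp->aux->attach_func_proto) {
                    switch (prog_type) {
                    case BPF_PROG_TYPE_TRACING:
                    case BPF_PROG_TYPE_LSM:
                    case BPF_PROG_TYPE_EXT:
                    case BPF_PROG_TYPE_STRUCT_OPS:
                            /* ctx and return-value rules differ per hook */
                            ret = false;
                            break;
                    default:
                            break;
                    }
            }
    }
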
Viktor Malik 6cb59aa49a
bpf: remove unused parameter in __bpf_free_used_btfs
JIRA: https://issues.redhat.com/browse/RHEL-30774

commit ab224b9ef7c4eaa752752455ea79bd7022209d5d
Author: Rafael Passos <rafael@rcpassos.me>
Date:   Fri Jun 14 23:24:09 2024 -0300

    bpf: remove unused parameter in __bpf_free_used_btfs
    
    Fixes a compiler warning. The __bpf_free_used_btfs function
    was taking an extra unused struct bpf_prog_aux *aux param.
    
    Signed-off-by: Rafael Passos <rafael@rcpassos.me>
    Link: https://lore.kernel.org/r/20240615022641.210320-3-rafael@rcpassos.me
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Signed-off-by: Viktor Malik <vmalik@redhat.com>
2024-11-26 14:40:10 +01:00
Viktor Malik 677861436c
bpf: remove unused parameter in bpf_jit_binary_pack_finalize
JIRA: https://issues.redhat.com/browse/RHEL-30774

Conflicts: omitting bits for unsupported arch (RISC-V)

commit 9919c5c98cb25dbf7e76aadb9beab55a2a25f830
Author: Rafael Passos <rafael@rcpassos.me>
Date:   Fri Jun 14 23:24:08 2024 -0300

    bpf: remove unused parameter in bpf_jit_binary_pack_finalize

    Fixes a compiler warning. The bpf_jit_binary_pack_finalize function
    was taking an extra bpf_prog parameter that went unused.
    This removes it and updates the callers accordingly.

    Signed-off-by: Rafael Passos <rafael@rcpassos.me>
    Link: https://lore.kernel.org/r/20240615022641.210320-2-rafael@rcpassos.me
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Signed-off-by: Viktor Malik <vmalik@redhat.com>
2024-11-26 14:40:10 +01:00
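
The change itself is just dropping the unused argument; roughly:

    /* before */
    int bpf_jit_binary_pack_finalize(struct bpf_prog *prog,
                                     struct bpf_binary_header *ro_header,
                                     struct bpf_binary_header *rw_header);

    /* after -- callers such as the x86 JIT no longer pass the prog */
    int bpf_jit_binary_pack_finalize(struct bpf_binary_header *ro_header,
                                     struct bpf_binary_header *rw_header);
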
Viktor Malik 9ead0c6298
bpf: Switch to krealloc_array()
JIRA: https://issues.redhat.com/browse/RHEL-30773

commit a3034872cd90a6881ad4e10ca6d30e1215a99ada
Author: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Date:   Mon Apr 29 15:00:05 2024 +0300

    bpf: Switch to krealloc_array()
    
    Let the krealloc_array() copy the original data and
    check for a multiplication overflow.
    
    Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
    Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
    Acked-by: Yonghong Song <yonghong.song@linux.dev>
    Link: https://lore.kernel.org/bpf/20240429120005.3539116-1-andriy.shevchenko@linux.intel.com

Signed-off-by: Viktor Malik <vmalik@redhat.com>
2024-11-11 07:44:54 +01:00
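
krealloc_array() (from <linux/slab.h>) copies the old data just like krealloc(), but also fails cleanly if the element-count multiplication would overflow. The replaced pattern looks roughly like this (illustrative variable names):

    #include <linux/slab.h>

    /* before: open-coded multiplication, no overflow check */
    arr = krealloc(arr, new_n * size, GFP_KERNEL);

    /* after: overflow-checked reallocation */
    arr = krealloc_array(arr, new_n, size, GFP_KERNEL);
    if (!arr)
            return NULL;
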
Viktor Malik 9943a6f361
bpf: Use struct_size()
JIRA: https://issues.redhat.com/browse/RHEL-30773

commit cb01621b6d91567ac74c8b95e4db731febdbdec3
Author: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Date:   Mon Apr 29 15:13:22 2024 +0300

    bpf: Use struct_size()
    
    Use struct_size() instead of hand writing it.
    This is less verbose and more robust.
    
    Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
    Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
    Acked-by: Yonghong Song <yonghong.song@linux.dev>
    Link: https://lore.kernel.org/bpf/20240429121323.3818497-1-andriy.shevchenko@linux.intel.com

Signed-off-by: Viktor Malik <vmalik@redhat.com>
2024-11-11 07:44:54 +01:00
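
struct_size() (from <linux/overflow.h>) computes the size of a structure with a trailing flexible array member and saturates on overflow. An illustrative, not verbatim, before/after:

    #include <linux/overflow.h>

    struct bpf_example {                       /* illustrative type */
            u32 cnt;
            struct bpf_insn insns[];           /* flexible array member */
    };

    struct bpf_example *ex;
    size_t size;

    /* before: hand-written and overflow-prone */
    size = sizeof(struct bpf_example) + cnt * sizeof(struct bpf_insn);

    /* after: self-documenting and bounds the multiplication */
    size = struct_size(ex, insns, cnt);
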
Viktor Malik 013281ada1
bpf: Fix typos in comments
JIRA: https://issues.redhat.com/browse/RHEL-30773

commit a7de265cb2d849f8986a197499ad58dca0a4f209
Author: Rafael Passos <rafael@rcpassos.me>
Date:   Wed Apr 17 15:49:14 2024 -0300

    bpf: Fix typos in comments
    
    Found the following typos in comments, and fixed them:
    
    s/unpriviledged/unprivileged/
    s/reponsible/responsible/
    s/possiblities/possibilities/
    s/Divison/Division/
    s/precsion/precision/
    s/havea/have a/
    s/reponsible/responsible/
    s/responsibile/responsible/
    s/tigher/tighter/
    s/respecitve/respective/
    
    Signed-off-by: Rafael Passos <rafael@rcpassos.me>
    Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    Link: https://lore.kernel.org/bpf/6af7deb4-bb24-49e8-b3f1-8dd410597337@smtp-relay.sendinblue.com

Signed-off-by: Viktor Malik <vmalik@redhat.com>
2024-11-11 07:44:49 +01:00
Viktor Malik 26bc028714
bpf: Add support for certain atomics in bpf_arena to x86 JIT
JIRA: https://issues.redhat.com/browse/RHEL-30773

Conflicts: There's a conflict with already applied upstream commit
           66e13b615a0ce ("bpf: verifier: prevent userspace memory access").
           It was probably resolved in a merge commit so adjust now to
           align with upstream.

commit d503a04f8bc0c75dc9db9452d8cc79d748afb752
Author: Alexei Starovoitov <ast@kernel.org>
Date:   Fri Apr 5 16:11:33 2024 -0700

    bpf: Add support for certain atomics in bpf_arena to x86 JIT

    Support atomics in bpf_arena that can be JITed as a single x86 instruction.
    Instructions that are JITed as loops are not supported at the moment,
    since they require more complex extable and loop logic.

    JITs can choose to do smarter things with bpf_jit_supports_insn().
    Like arm64 may decide to support all bpf atomics instructions
    when emit_lse_atomic is available and none in ll_sc mode.

    bpf_jit_supports_percpu_insn(), bpf_jit_supports_ptr_xchg() and
    other such callbacks can be replaced with bpf_jit_supports_insn()
    in the future.

    Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    Acked-by: Eduard Zingerman <eddyz87@gmail.com>
    Link: https://lore.kernel.org/r/20240405231134.17274-1-alexei.starovoitov@gmail.com
    Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>

Signed-off-by: Viktor Malik <vmalik@redhat.com>
2024-11-07 14:36:48 +01:00
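
The new bpf_jit_supports_insn() callback lets the verifier ask whether a given instruction may be used inside an arena. A sketch of how x86 could answer it, following the commit text (single-instruction atomics allowed; fetch variants of and/or/xor, which would JIT as loops, rejected):

    /* arch/x86/net/bpf_jit_comp.c, simplified sketch */
    bool bpf_jit_supports_insn(struct bpf_insn *insn, bool in_arena)
    {
            if (!in_arena)
                    return true;

            switch (insn->code) {
            case BPF_STX | BPF_ATOMIC | BPF_W:
            case BPF_STX | BPF_ATOMIC | BPF_DW:
                    /* these would need extable plus loop handling */
                    if (insn->imm == (BPF_AND | BPF_FETCH) ||
                        insn->imm == (BPF_OR | BPF_FETCH) ||
                        insn->imm == (BPF_XOR | BPF_FETCH))
                            return false;
            }
            return true;
    }
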
Viktor Malik 5cbc7fc84e
bpf: add special internal-only MOV instruction to resolve per-CPU addrs
JIRA: https://issues.redhat.com/browse/RHEL-30773

Conflicts: Context changes in arch/x86/net/bpf_jit_comp.c as commits
           1d5f82d9dd47 ("bpf, x86: fix freeing of not-finalized bpf_prog_pack") and
           95acd8817e66 ("bpf, x64: Add predicate for bpf2bpf with tailcalls support in JIT")
           were taken out-of-order.
           Move bpf_jit_supports_subprog_tailcalls() to its correct
           place to align with upstream and prevent future conflicts.

commit 7bdbf7446305cb65c510c16d57cde82bc76b234a
Author: Andrii Nakryiko <andrii@kernel.org>
Date:   Mon Apr 1 19:13:02 2024 -0700

    bpf: add special internal-only MOV instruction to resolve per-CPU addrs

    Add a new BPF instruction for resolving absolute addresses of per-CPU
    data from their per-CPU offsets. This instruction is internal-only and
    users are not allowed to use them directly. They will only be used for
    internal inlining optimizations for now between BPF verifier and BPF JITs.

    We use a special BPF_MOV | BPF_ALU64 | BPF_X form with insn->off field
    set to BPF_ADDR_PERCPU = -1. I used negative offset value to distinguish
    them from positive ones used by user-exposed instructions.

    Such instruction performs a resolution of a per-CPU offset stored in
    a register to a valid kernel address which can be dereferenced. It is
    useful in any use case where absolute address of a per-CPU data has to
    be resolved (e.g., in inlining bpf_map_lookup_elem()).

    BPF disassembler is also taught to recognize them to support dumping
    final BPF assembly code (non-JIT'ed version).

    Add arch-specific way for BPF JITs to mark support for these instructions.

    This patch also adds support for these instructions in x86-64 BPF JIT.

    Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
    Acked-by: John Fastabend <john.fastabend@gmail.com>
    Link: https://lore.kernel.org/r/20240402021307.1012571-2-andrii@kernel.org
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Signed-off-by: Viktor Malik <vmalik@redhat.com>
2024-11-07 13:58:42 +01:00
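
In instruction-building terms the special form described above looks roughly like this (macro name assumed to follow the usual BPF_MOV64_* pattern in include/linux/bpf.h):

    #define BPF_ADDR_PERCPU (-1)

    /* rDST = absolute kernel address of the per-CPU slot whose
     * per-CPU offset is currently held in rSRC */
    #define BPF_MOV64_PERCPU_REG(DST, SRC)                          \
            ((struct bpf_insn) {                                    \
                    .code  = BPF_ALU64 | BPF_MOV | BPF_X,           \
                    .dst_reg = DST,                                 \
                    .src_reg = SRC,                                 \
                    .off   = BPF_ADDR_PERCPU,                       \
                    .imm   = 0 })
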
Viktor Malik 90de664489
bpf: Replace deprecated strncpy with strscpy
JIRA: https://issues.redhat.com/browse/RHEL-30773

commit 2e114248e086fb376405ed3f89b220f8586a2541
Author: Justin Stitt <justinstitt@google.com>
Date:   Tue Apr 2 23:52:50 2024 +0000

    bpf: Replace deprecated strncpy with strscpy
    
    strncpy() is deprecated for use on NUL-terminated destination strings
    [1] and as such we should prefer more robust and less ambiguous string
    interfaces.
    
    bpf sym names get looked up and compared/cleaned with various string
    apis. This suggests they need to be NUL-terminated (strncpy() suggests
    this but does not guarantee it).
    
    |	static int compare_symbol_name(const char *name, char *namebuf)
    |	{
    |		cleanup_symbol_name(namebuf);
    |		return strcmp(name, namebuf);
    |	}
    
    |	static void cleanup_symbol_name(char *s)
    |	{
    |		...
    |		res = strstr(s, ".llvm.");
    |		...
    |	}
    
    Use strscpy() as this method guarantees NUL-termination on the
    destination buffer.
    
    This patch also replaces two uses of strncpy() used in log.c. These are
    simple replacements as postfix has been zero-initialized on the stack
    and has source arguments with a size less than the destination's size.
    
    Note that this patch uses the new 2-argument version of strscpy
    introduced in commit e6584c3964f2f ("string: Allow 2-argument strscpy()").
    
    Signed-off-by: Justin Stitt <justinstitt@google.com>
    Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    Acked-by: Daniel Borkmann <daniel@iogearbox.net>
    Link: https://www.kernel.org/doc/html/latest/process/deprecated.html#strncpy-on-nul-terminated-strings [1]
    Link: https://manpages.debian.org/testing/linux-manual-4.8/strscpy.9.en.html [2]
    Link: https://github.com/KSPP/linux/issues/90
    Link: https://lore.kernel.org/bpf/20240402-strncpy-kernel-bpf-core-c-v1-1-7cb07a426e78@google.com

Signed-off-by: Viktor Malik <vmalik@redhat.com>
2024-11-07 13:58:42 +01:00
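
The replacement pattern, using the 2-argument strscpy() that sizes the destination automatically and always NUL-terminates (illustrative buffer name):

    #include <linux/string.h>

    char sym[KSYM_NAME_LEN];

    /* before: may leave sym without a terminating NUL */
    strncpy(sym, prog->aux->ksym.name, KSYM_NAME_LEN);

    /* after: bounded by sizeof(sym), guaranteed NUL-terminated */
    strscpy(sym, prog->aux->ksym.name);
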
Viktor Malik e720ccbc11
bpf: Mark bpf prog stack with kmsan_unposion_memory in interpreter mode
JIRA: https://issues.redhat.com/browse/RHEL-30773

commit e8742081db7d01f980c6161ae1e8a1dbc1e30979
Author: Martin KaFai Lau <martin.lau@kernel.org>
Date:   Thu Mar 28 11:58:01 2024 -0700

    bpf: Mark bpf prog stack with kmsan_unposion_memory in interpreter mode
    
    syzbot reported uninit memory usages during map_{lookup,delete}_elem.
    
    ==========
    BUG: KMSAN: uninit-value in __dev_map_lookup_elem kernel/bpf/devmap.c:441 [inline]
    BUG: KMSAN: uninit-value in dev_map_lookup_elem+0xf3/0x170 kernel/bpf/devmap.c:796
    __dev_map_lookup_elem kernel/bpf/devmap.c:441 [inline]
    dev_map_lookup_elem+0xf3/0x170 kernel/bpf/devmap.c:796
    ____bpf_map_lookup_elem kernel/bpf/helpers.c:42 [inline]
    bpf_map_lookup_elem+0x5c/0x80 kernel/bpf/helpers.c:38
    ___bpf_prog_run+0x13fe/0xe0f0 kernel/bpf/core.c:1997
    __bpf_prog_run256+0xb5/0xe0 kernel/bpf/core.c:2237
    ==========
    
    The reproducer should be in the interpreter mode.
    
    The C reproducer is trying to run the following bpf prog:
    
        0: (18) r0 = 0x0
        2: (18) r1 = map[id:49]
        4: (b7) r8 = 16777216
        5: (7b) *(u64 *)(r10 -8) = r8
        6: (bf) r2 = r10
        7: (07) r2 += -229
                ^^^^^^^^^^
    
        8: (b7) r3 = 8
        9: (b7) r4 = 0
       10: (85) call dev_map_lookup_elem#1543472
       11: (95) exit
    
    It is due to the "void *key" (r2) passed to the helper. bpf allows uninit
    stack memory access for bpf prog with the right privileges. This patch
    uses kmsan_unpoison_memory() to mark the stack as initialized.
    
    This should address different syzbot reports on the uninit "void *key"
    argument during map_{lookup,delete}_elem.
    
    Reported-by: syzbot+603bcd9b0bf1d94dbb9b@syzkaller.appspotmail.com
    Closes: https://lore.kernel.org/bpf/000000000000f9ce6d061494e694@google.com/
    Reported-by: syzbot+eb02dc7f03dce0ef39f3@syzkaller.appspotmail.com
    Closes: https://lore.kernel.org/bpf/000000000000a5c69c06147c2238@google.com/
    Reported-by: syzbot+b4e65ca24fd4d0c734c3@syzkaller.appspotmail.com
    Closes: https://lore.kernel.org/bpf/000000000000ac56fb06143b6cfa@google.com/
    Reported-by: syzbot+d2b113dc9fea5e1d2848@syzkaller.appspotmail.com
    Closes: https://lore.kernel.org/bpf/0000000000000d69b206142d1ff7@google.com/
    Reported-by: syzbot+1a3cf6f08d68868f9db3@syzkaller.appspotmail.com
    Closes: https://lore.kernel.org/bpf/0000000000006f876b061478e878@google.com/
    Tested-by: syzbot+1a3cf6f08d68868f9db3@syzkaller.appspotmail.com
    Suggested-by: Yonghong Song <yonghong.song@linux.dev>
    Suggested-by: Alexei Starovoitov <ast@kernel.org>
    Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
    Link: https://lore.kernel.org/r/20240328185801.1843078-1-martin.lau@linux.dev
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Signed-off-by: Viktor Malik <vmalik@redhat.com>
2024-11-07 13:58:38 +01:00
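
The fix itself is one call at the top of the generated interpreter functions, marking the program stack as initialized for KMSAN before any bpf instruction can pass a pointer into it. A sketch of the affected lines in the DEFINE_BPF_PROG_RUN() body in kernel/bpf/core.c:

    /* inside the DEFINE_BPF_PROG_RUN(stack_size) function body */
    u64 stack[stack_size / sizeof(u64)];
    u64 regs[MAX_BPF_EXT_REG] = {};

    /* the whole stack may be handed to helpers as "void *key" before
     * ever being written, so tell KMSAN it is fair game */
    kmsan_unpoison_memory(stack, sizeof(stack));
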
Viktor Malik 6985f7b84f
bpf: Check return from set_memory_rox()
JIRA: https://issues.redhat.com/browse/RHEL-30773

commit c733239f8f530872a1f80d8c45dcafbaff368737
Author: Christophe Leroy <christophe.leroy@csgroup.eu>
Date:   Sat Mar 16 08:35:41 2024 +0100

    bpf: Check return from set_memory_rox()
    
    arch_protect_bpf_trampoline() and alloc_new_pack() call
    set_memory_rox() which can fail, leading to unprotected memory.
    
    Take into account return from set_memory_rox() function and add
    __must_check flag to arch_protect_bpf_trampoline().
    
    Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
    Reviewed-by: Kees Cook <keescook@chromium.org>
    Link: https://lore.kernel.org/r/fe1c163c83767fde5cab31d209a4a6be3ddb3a73.1710574353.git.christophe.leroy@csgroup.eu
    Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>

Signed-off-by: Viktor Malik <vmalik@redhat.com>
2024-11-07 13:58:29 +01:00
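
After the change, the weak trampoline-protection helper propagates the set_memory_rox() result instead of ignoring it; a sketch consistent with the commit text:

    /* include/linux/bpf.h */
    int __must_check arch_protect_bpf_trampoline(void *image, unsigned int size);

    /* kernel/bpf/trampoline.c: the weak default now returns the result */
    int __weak arch_protect_bpf_trampoline(void *image, unsigned int size)
    {
            WARN_ON_ONCE(size > PAGE_SIZE);
            return set_memory_rox((long)image, 1);
    }
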
Viktor Malik 0ee6c99ce8
bpf: Take return from set_memory_ro() into account with bpf_prog_lock_ro()
JIRA: https://issues.redhat.com/browse/RHEL-30773

commit 7d2cc63eca0c993c99d18893214abf8f85d566d8
Author: Christophe Leroy <christophe.leroy@csgroup.eu>
Date:   Fri Mar 8 06:38:07 2024 +0100

    bpf: Take return from set_memory_ro() into account with bpf_prog_lock_ro()
    
    set_memory_ro() can fail, leaving memory unprotected.
    
    Check its return and take it into account as an error.
    
    Link: https://github.com/KSPP/linux/issues/7
    Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
    Cc: linux-hardening@vger.kernel.org <linux-hardening@vger.kernel.org>
    Reviewed-by: Kees Cook <keescook@chromium.org>
    Message-ID: <286def78955e04382b227cb3e4b6ba272a7442e3.1709850515.git.christophe.leroy@csgroup.eu>
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Signed-off-by: Viktor Malik <vmalik@redhat.com>
2024-11-07 13:58:29 +01:00
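
Same idea for the interpreter image: bpf_prog_lock_ro() now returns whatever set_memory_ro() returns, so callers can fail the load instead of running with unprotected memory. A sketch (config guards omitted):

    /* include/linux/filter.h, sketch */
    static inline int bpf_prog_lock_ro(struct bpf_prog *fp)
    {
            if (!fp->jited) {
                    set_vm_flush_reset_perms(fp);
                    return set_memory_ro((unsigned long)fp, fp->pages);
            }
            return 0;
    }
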
Jerome Marchand 87ce3fdb74 bpf: verifier: prevent userspace memory access
JIRA: https://issues.redhat.com/browse/RHEL-23649

commit 66e13b615a0ce76b785d780ecc9776ba71983629
Author: Puranjay Mohan <puranjay12@gmail.com>
Date:   Wed Apr 24 10:02:08 2024 +0000

    bpf: verifier: prevent userspace memory access

    With BPF_PROBE_MEM, BPF allows de-referencing an untrusted pointer. To
    thwart invalid memory accesses, the JITs add an exception table entry
    for all such accesses. But in case the src_reg + offset is a userspace
    address, the BPF program might read that memory if the user has
    mapped it.

    Make the verifier add guard instructions around such memory accesses and
    skip the load if the address falls into the userspace region.

    The JITs need to implement bpf_arch_uaddress_limit() to define where
    the userspace addresses end for that architecture or TASK_SIZE is taken
    as default.

    The implementation is as follows:

    REG_AX =  SRC_REG
    if(offset)
    	REG_AX += offset;
    REG_AX >>= 32;
    if (REG_AX <= (uaddress_limit >> 32))
    	DST_REG = 0;
    else
    	DST_REG = *(size *)(SRC_REG + offset);

    Comparing just the upper 32 bits of the load address with the upper
    32 bits of uaddress_limit implies that the values are being aligned down
    to a 4GB boundary before comparison.

    The above means that all loads with address <= uaddress_limit + 4GB are
    skipped. This is acceptable because there is a large hole (much larger
    than 4GB) between userspace and kernel space memory, therefore a
    correctly functioning BPF program should not access this 4GB memory
    above the userspace.

    Let's analyze what this patch does to the following fentry program
    dereferencing an untrusted pointer:

      SEC("fentry/tcp_v4_connect")
      int BPF_PROG(fentry_tcp_v4_connect, struct sock *sk)
      {
                    *(volatile long *)sk;
                    return 0;
      }

        BPF Program before              |           BPF Program after
        ------------------              |           -----------------

      0: (79) r1 = *(u64 *)(r1 +0)          0: (79) r1 = *(u64 *)(r1 +0)
      -----------------------------------------------------------------------
      1: (79) r1 = *(u64 *)(r1 +0) --\      1: (bf) r11 = r1
      ----------------------------\   \     2: (77) r11 >>= 32
      2: (b7) r0 = 0               \   \    3: (b5) if r11 <= 0x8000 goto pc+2
      3: (95) exit                  \   \-> 4: (79) r1 = *(u64 *)(r1 +0)
                                     \      5: (05) goto pc+1
                                      \     6: (b7) r1 = 0
                                       \--------------------------------------
                                            7: (b7) r0 = 0
                                            8: (95) exit

    As you can see from above, in the best case (off=0), 5 extra instructions
    are emitted.

    Now, we analyze the same program after it has gone through the JITs of
    ARM64 and RISC-V architectures. We follow the single load instruction
    that has the untrusted pointer and see what instrumentation has been
    added around it.

                                    x86-64 JIT
                                    ==========
         JIT's Instrumentation
              (upstream)
         ---------------------

       0:   nopl   0x0(%rax,%rax,1)
       5:   xchg   %ax,%ax
       7:   push   %rbp
       8:   mov    %rsp,%rbp
       b:   mov    0x0(%rdi),%rdi
      ---------------------------------
       f:   movabs $0x800000000000,%r11
      19:   cmp    %r11,%rdi
      1c:   jb     0x000000000000002a
      1e:   mov    %rdi,%r11
      21:   add    $0x0,%r11
      28:   jae    0x000000000000002e
      2a:   xor    %edi,%edi
      2c:   jmp    0x0000000000000032
      2e:   mov    0x0(%rdi),%rdi
      ---------------------------------
      32:   xor    %eax,%eax
      34:   leave
      35:   ret

    The x86-64 JIT already emits some instructions to protect against user
    memory access. This patch doesn't make any changes for the x86-64 JIT.

                                      ARM64 JIT
                                      =========

        No Instrumentation                      Verifier's Instrumentation
               (upstream)                                  (This patch)
            -----------------                       --------------------------

       0:   add     x9, x30, #0x0                0:   add     x9, x30, #0x0
       4:   nop                                  4:   nop
       8:   paciasp                              8:   paciasp
       c:   stp     x29, x30, [sp, #-16]!        c:   stp     x29, x30, [sp, #-16]!
      10:   mov     x29, sp                     10:   mov     x29, sp
      14:   stp     x19, x20, [sp, #-16]!       14:   stp     x19, x20, [sp, #-16]!
      18:   stp     x21, x22, [sp, #-16]!       18:   stp     x21, x22, [sp, #-16]!
      1c:   stp     x25, x26, [sp, #-16]!       1c:   stp     x25, x26, [sp, #-16]!
      20:   stp     x27, x28, [sp, #-16]!       20:   stp     x27, x28, [sp, #-16]!
      24:   mov     x25, sp                     24:   mov     x25, sp
      28:   mov     x26, #0x0                   28:   mov     x26, #0x0
      2c:   sub     x27, x25, #0x0              2c:   sub     x27, x25, #0x0
      30:   sub     sp, sp, #0x0                30:   sub     sp, sp, #0x0
      34:   ldr     x0, [x0]                    34:   ldr     x0, [x0]
    --------------------------------------------------------------------------------
      38:   ldr     x0, [x0] ----------\        38:   add     x9, x0, #0x0
    -----------------------------------\\       3c:   lsr     x9, x9, #32
      3c:   mov     x7, #0x0            \\      40:   cmp     x9, #0x10, lsl #12
      40:   mov     sp, sp               \\     44:   b.ls    0x0000000000000050
      44:   ldp     x27, x28, [sp], #16   \\--> 48:   ldr     x0, [x0]
      48:   ldp     x25, x26, [sp], #16    \    4c:   b       0x0000000000000054
      4c:   ldp     x21, x22, [sp], #16     \   50:   mov     x0, #0x0
      50:   ldp     x19, x20, [sp], #16      \---------------------------------------
      54:   ldp     x29, x30, [sp], #16         54:   mov     x7, #0x0
      58:   add     x0, x7, #0x0                58:   mov     sp, sp
      5c:   autiasp                             5c:   ldp     x27, x28, [sp], #16
      60:   ret                                 60:   ldp     x25, x26, [sp], #16
      64:   nop                                 64:   ldp     x21, x22, [sp], #16
      68:   ldr     x10, 0x0000000000000070     68:   ldp     x19, x20, [sp], #16
      6c:   br      x10                         6c:   ldp     x29, x30, [sp], #16
                                                70:   add     x0, x7, #0x0
                                                74:   autiasp
                                                78:   ret
                                                7c:   nop
                                                80:   ldr     x10, 0x0000000000000088
                                                84:   br      x10

    There are 6 extra instructions added in ARM64 in the best case. This will
    become 7 in the worst case (off != 0).

                               RISC-V JIT (RISCV_ISA_C Disabled)
                               ==========

        No Instrumentation          Verifier's Instrumentation
               (upstream)                      (This patch)
            -----------------           --------------------------

       0:   nop                            0:   nop
       4:   nop                            4:   nop
       8:   li      a6, 33                 8:   li      a6, 33
       c:   addi    sp, sp, -16            c:   addi    sp, sp, -16
      10:   sd      s0, 8(sp)             10:   sd      s0, 8(sp)
      14:   addi    s0, sp, 16            14:   addi    s0, sp, 16
      18:   ld      a0, 0(a0)             18:   ld      a0, 0(a0)
    ---------------------------------------------------------------
      1c:   ld      a0, 0(a0) --\         1c:   mv      t0, a0
    --------------------------\  \        20:   srli    t0, t0, 32
      20:   li      a5, 0      \  \       24:   lui     t1, 4096
      24:   ld      s0, 8(sp)   \  \      28:   sext.w  t1, t1
      28:   addi    sp, sp, 16   \  \     2c:   bgeu    t1, t0, 12
      2c:   sext.w  a0, a5        \  \--> 30:   ld      a0, 0(a0)
      30:   ret                    \      34:   j       8
                                    \     38:   li      a0, 0
                                     \------------------------------
                                          3c:   li      a5, 0
                                          40:   ld      s0, 8(sp)
                                          44:   addi    sp, sp, 16
                                          48:   sext.w  a0, a5
                                          4c:   ret

    There are 7 extra instructions added in RISC-V.

    Fixes: 8008342853 ("bpf, arm64: Add BPF exception tables")
    Reported-by: Breno Leitao <leitao@debian.org>
    Suggested-by: Alexei Starovoitov <ast@kernel.org>
    Acked-by: Ilya Leoshkevich <iii@linux.ibm.com>
    Signed-off-by: Puranjay Mohan <puranjay12@gmail.com>
    Link: https://lore.kernel.org/r/20240424100210.11982-2-puranjay@kernel.org
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
2024-10-15 10:49:18 +02:00
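
Expressed with the kernel's insn-building macros, the guard the verifier wraps around each PROBE_MEM load looks roughly like the following (placement in do_misc_fixups() and the exact emit conditions are assumed; the boundary comes from the new weak bpf_arch_uaddress_limit(), which defaults to TASK_SIZE):

    /* kernel/bpf/verifier.c, sketch: insn points at the guarded LDX */
    u64 limit = bpf_arch_uaddress_limit();
    struct bpf_insn chk[] = {
            BPF_MOV64_REG(BPF_REG_AX, insn->src_reg),
            BPF_ALU64_IMM(BPF_ADD, BPF_REG_AX, insn->off), /* only if off != 0 */
            BPF_ALU64_IMM(BPF_RSH, BPF_REG_AX, 32),
            /* address in the userspace range: skip the load, zero the dst */
            BPF_JMP_IMM(BPF_JLE, BPF_REG_AX, limit >> 32, 2),
            *insn,                                /* the original load */
            BPF_JMP_IMM(BPF_JA, 0, 0, 1),
            BPF_MOV64_IMM(insn->dst_reg, 0),
    };
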
Jerome Marchand 0803f1f621 bpf: move sleepable flag from bpf_prog_aux to bpf_prog
JIRA: https://issues.redhat.com/browse/RHEL-23649

commit 66c8473135c62f478301a0e5b3012f203562dfa6
Author: Andrii Nakryiko <andrii@kernel.org>
Date:   Fri Mar 8 16:47:39 2024 -0800

    bpf: move sleepable flag from bpf_prog_aux to bpf_prog

    prog->aux->sleepable is checked very frequently as part of (some) BPF
    program run hot paths. So this extra aux indirection seems wasteful and
    on busy systems might cause unnecessary memory cache misses.

    Let's move sleepable flag into prog itself to eliminate unnecessary
    pointer dereference.

    Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
    Acked-by: Jiri Olsa <jolsa@kernel.org>
    Message-ID: <20240309004739.2961431-1-andrii@kernel.org>
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
2024-10-15 10:49:16 +02:00
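
The change is mechanical: the flag now lives one dereference closer, so every hot-path test changes from

    if (prog->aux->sleepable) { ... }

to

    if (prog->sleepable) { ... }

saving a pointer chase per program run.
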
Jerome Marchand 4006704c21 bpf: hardcode BPF_PROG_PACK_SIZE to 2MB * num_possible_nodes()
JIRA: https://issues.redhat.com/browse/RHEL-23649

commit d6170e4aaf86424c24ce06e355b4573daa891b17
Author: Puranjay Mohan <puranjay12@gmail.com>
Date:   Mon Mar 11 12:27:22 2024 +0000

    bpf: hardcode BPF_PROG_PACK_SIZE to 2MB * num_possible_nodes()

    On some architectures like ARM64, PMD_SIZE can be really large in some
    configurations. Like with CONFIG_ARM64_64K_PAGES=y the PMD_SIZE is
    512MB.

    Use 2MB * num_possible_nodes() as the size for allocations done through
    the prog pack allocator. On most architectures, PMD_SIZE will be equal
    to 2MB in case of 4KB pages and will be greater than 2MB for bigger page
    sizes.

    Fixes: ea2babac63d4 ("bpf: Simplify bpf_prog_pack_[size|mask]")
    Reported-by: "kernelci.org bot" <bot@kernelci.org>
    Closes: https://lore.kernel.org/all/7e216c88-77ee-47b8-becc-a0f780868d3c@sirena.org.uk/
    Reported-by: kernel test robot <lkp@intel.com>
    Closes: https://lore.kernel.org/oe-kbuild-all/202403092219.dhgcuz2G-lkp@intel.com/
    Suggested-by: Song Liu <song@kernel.org>
    Signed-off-by: Puranjay Mohan <puranjay12@gmail.com>
    Message-ID: <20240311122722.86232-1-puranjay12@gmail.com>
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
2024-10-15 10:49:16 +02:00
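
The resulting definition is a one-liner, so pack allocations no longer scale with PMD_SIZE (sketch):

    /* kernel/bpf/core.c */
    #define BPF_PROG_PACK_SIZE      (SZ_2M * num_possible_nodes())
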
Jerome Marchand 7336c8c74d bpf: Add x86-64 JIT support for bpf_addr_space_cast instruction.
JIRA: https://issues.redhat.com/browse/RHEL-23649

commit 142fd4d2dcf58b1720a6af644f31de1a5551f219
Author: Alexei Starovoitov <ast@kernel.org>
Date:   Thu Mar 7 17:08:02 2024 -0800

    bpf: Add x86-64 JIT support for bpf_addr_space_cast instruction.

    LLVM generates bpf_addr_space_cast instruction while translating
    pointers between native (zero) address space and
    __attribute__((address_space(N))).
    The addr_space=1 is reserved as bpf_arena address space.

    rY = addr_space_cast(rX, 0, 1) is processed by the verifier and
    converted to normal 32-bit move: wX = wY

    rY = addr_space_cast(rX, 1, 0) has to be converted by JIT:

    aux_reg = upper_32_bits of arena->user_vm_start
    aux_reg <<= 32
    wX = wY // clear upper 32 bits of dst register
    if (wX) // if not zero add upper bits of user_vm_start
      wX |= aux_reg

    JIT can do it more efficiently:

    mov dst_reg32, src_reg32  // 32-bit move
    shl dst_reg, 32
    or dst_reg, user_vm_start
    rol dst_reg, 32
    xor r11, r11
    test dst_reg32, dst_reg32 // check if lower 32-bit are zero
    cmove r11, dst_reg	  // if so, set dst_reg to zero
    			  // Intel swapped src/dst register encoding in CMOVcc

    Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
    Acked-by: Eduard Zingerman <eddyz87@gmail.com>
    Link: https://lore.kernel.org/bpf/20240308010812.89848-5-alexei.starovoitov@gmail.com

Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
2024-10-15 10:49:15 +02:00
Jerome Marchand a245935eb8 bpf: Introduce bpf_arena.
JIRA: https://issues.redhat.com/browse/RHEL-23649

commit 317460317a02a1af512697e6e964298dedd8a163
Author: Alexei Starovoitov <ast@kernel.org>
Date:   Thu Mar 7 17:07:59 2024 -0800

    bpf: Introduce bpf_arena.

    Introduce bpf_arena, which is a sparse shared memory region between the bpf
    program and user space.

    Use cases:
    1. User space mmap-s bpf_arena and uses it as a traditional mmap-ed
       anonymous region, like memcached or any key/value storage. The bpf
       program implements an in-kernel accelerator. XDP prog can search for
       a key in bpf_arena and return a value without going to user space.
    2. The bpf program builds arbitrary data structures in bpf_arena (hash
       tables, rb-trees, sparse arrays), while user space consumes it.
    3. bpf_arena is a "heap" of memory from the bpf program's point of view.
       The user space may mmap it, but bpf program will not convert pointers
       to user base at run-time to improve bpf program speed.

    Initially, the kernel vm_area and user vma are not populated. User space
    can fault in pages within the range. While servicing a page fault,
    bpf_arena logic will insert a new page into the kernel and user vmas. The
    bpf program can allocate pages from that region via
    bpf_arena_alloc_pages(). This kernel function will insert pages into the
    kernel vm_area. The subsequent fault-in from user space will populate that
    page into the user vma. The BPF_F_SEGV_ON_FAULT flag at arena creation time
    can be used to prevent fault-in from user space. In such a case, if a page
    is not allocated by the bpf program and not present in the kernel vm_area,
    the user process will segfault. This is useful for use cases 2 and 3 above.

    bpf_arena_alloc_pages() is similar to user space mmap(). It allocates pages
    either at a specific address within the arena or allocates a range with the
    maple tree. bpf_arena_free_pages() is analogous to munmap(), which frees
    pages and removes the range from the kernel vm_area and from user process
    vmas.

    bpf_arena can be used as a bpf program "heap" of up to 4GB. The speed of
    bpf program is more important than ease of sharing with user space. This is
    use case 3. In such a case, the BPF_F_NO_USER_CONV flag is recommended.
    It will tell the verifier to treat the rX = bpf_arena_cast_user(rY)
    instruction as a 32-bit move wX = wY, which will improve bpf prog
    performance. Otherwise, bpf_arena_cast_user is translated by JIT to
    conditionally add the upper 32 bits of user vm_start (if the pointer is not
    NULL) to arena pointers before they are stored into memory. This way, user
    space sees them as valid 64-bit pointers.

    Diff https://github.com/llvm/llvm-project/pull/84410 enables the LLVM BPF
    backend to generate the bpf_addr_space_cast() instruction to cast pointers
    between address_space(1) which is reserved for bpf_arena pointers and
    default address space zero. All arena pointers in a bpf program written in
    C language are tagged as __attribute__((address_space(1))). Hence, clang
    provides helpful diagnostics when pointers cross address space. Libbpf and
    the kernel support only address_space == 1. All other address space
    identifiers are reserved.

    rX = bpf_addr_space_cast(rY, /* dst_as */ 1, /* src_as */ 0) tells the
    verifier that rX->type = PTR_TO_ARENA. Any further operations on
    PTR_TO_ARENA register have to be in the 32-bit domain. The verifier will
    mark load/store through PTR_TO_ARENA with PROBE_MEM32. JIT will generate
    them as kern_vm_start + 32bit_addr memory accesses. The behavior is similar
    to copy_from_kernel_nofault() except that no address checks are necessary.
    The address is guaranteed to be in the 4GB range. If the page is not
    present, the destination register is zeroed on read, and the operation is
    ignored on write.

    rX = bpf_addr_space_cast(rY, 0, 1) tells the verifier that rX->type =
    unknown scalar. If arena->map_flags has BPF_F_NO_USER_CONV set, then the
    verifier converts such cast instructions to mov32. Otherwise, JIT will emit
    native code equivalent to:
    rX = (u32)rY;
    if (rY)
      rX |= clear_lo32_bits(arena->user_vm_start); /* replace hi32 bits in rX */

    After such conversion, the pointer becomes a valid user pointer within
    bpf_arena range. The user process can access data structures created in
    bpf_arena without any additional computations. For example, a linked list
    built by a bpf program can be walked natively by user space.

    Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
    Reviewed-by: Barret Rhoden <brho@google.com>
    Link: https://lore.kernel.org/bpf/20240308010812.89848-2-alexei.starovoitov@gmail.com

Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
2024-10-15 10:49:15 +02:00
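
A rough user-facing sketch in the style of the upstream selftests (the kfunc declaration, the __arena address-space macro, and the program section are taken from the selftest headers, so treat the names here as assumptions):

    /* BPF side, built with #include "vmlinux.h" and <bpf/bpf_helpers.h> */
    struct {
            __uint(type, BPF_MAP_TYPE_ARENA);
            __uint(map_flags, BPF_F_MMAPABLE);
            __uint(max_entries, 16);            /* arena size in pages */
    } arena SEC(".maps");

    #define __arena __attribute__((address_space(1)))

    void __arena *bpf_arena_alloc_pages(void *map, void __arena *addr,
                                        __u32 page_cnt, int node_id,
                                        __u64 flags) __ksym;

    int __arena *counter;

    SEC("syscall")
    int alloc_and_bump(void *ctx)
    {
            if (!counter)
                    counter = bpf_arena_alloc_pages(&arena, NULL, 1, -1, 0);
            if (counter)
                    (*counter)++;       /* user space sees this via mmap() */
            return 0;
    }
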
Jerome Marchand e55a05bf38 bpf: Tell bpf programs kernel's PAGE_SIZE
JIRA: https://issues.redhat.com/browse/RHEL-23649

commit fe5064158c561b807af5708c868f6c7cb5144e01
Author: Alexei Starovoitov <ast@kernel.org>
Date:   Wed Mar 6 19:12:28 2024 -0800

    bpf: Tell bpf programs kernel's PAGE_SIZE

    vmlinux BTF includes all kernel enums.
    Add __PAGE_SIZE = PAGE_SIZE enum, so that bpf programs
    that include vmlinux.h can easily access it.

    Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    Link: https://lore.kernel.org/r/20240307031228.42896-7-alexei.starovoitov@gmail.com
    Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>

Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
2024-10-15 10:49:14 +02:00
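
The kernel side is just an enum so the value lands in BTF; a bpf program that includes vmlinux.h can then use it as a compile-time constant (enum name assumed):

    /* kernel/bpf/core.c */
    enum page_size_enum {
            __PAGE_SIZE = PAGE_SIZE
    };

    /* bpf program side, after #include "vmlinux.h" */
    char page_buf[__PAGE_SIZE];
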
Jerome Marchand 120f898f72 bpf: Introduce may_goto instruction
JIRA: https://issues.redhat.com/browse/RHEL-23649

commit 011832b97b311bb9e3c27945bc0d1089a14209c9
Author: Alexei Starovoitov <ast@kernel.org>
Date:   Tue Mar 5 19:19:26 2024 -0800

    bpf: Introduce may_goto instruction

    Introduce may_goto instruction that from the verifier pov is similar to
    open coded iterators bpf_for()/bpf_repeat() and bpf_loop() helper, but it
    doesn't iterate any objects.
    In assembly 'may_goto' is a nop most of the time until bpf runtime has to
    terminate the program for whatever reason. In the current implementation
    may_goto has a hidden counter, but other mechanisms can be used.
    For programs written in C the later patch introduces 'cond_break' macro
    that combines 'may_goto' with 'break' statement and has similar semantics:
    cond_break is a nop until bpf runtime has to break out of this loop.
    It can be used in any normal "for" or "while" loop, like

      for (i = zero; i < cnt; cond_break, i++) {

    The verifier recognizes that may_goto is used in the program, reserves
    additional 8 bytes of stack, initializes them in subprog prologue, and
    replaces may_goto instruction with:
    aux_reg = *(u64 *)(fp - 40)
    if aux_reg == 0 goto pc+off
    aux_reg -= 1
    *(u64 *)(fp - 40) = aux_reg

    may_goto instruction can be used by LLVM to implement __builtin_memcpy,
    __builtin_strcmp.

    may_goto is not a full substitute for bpf_for() macro.
    bpf_for() doesn't have an induction variable that the verifier sees,
    so 'i' in bpf_for(i, 0, 100) is seen as imprecise and bounded.

    But when the code is written as:
    for (i = 0; i < 100; cond_break, i++)
    the verifier sees 'i' as precise constant zero,
    hence cond_break (aka may_goto) doesn't help to converge the loop.
    A static or global variable can be used as a workaround:
    static int zero = 0;
    for (i = zero; i < 100; cond_break, i++) // works!

    may_goto works well with arena pointers that don't need to be bounds
    checked on access. Load/store from arena returns imprecise unbounded
    scalar and loops with may_goto pass the verifier.

    Reserve new opcode BPF_JMP | BPF_JCOND for may_goto insn.
    JCOND stands for conditional pseudo jump.
    Since goto_or_nop insn was proposed, it may use the same opcode.
    may_goto vs goto_or_nop can be distinguished by src_reg:
    code = BPF_JMP | BPF_JCOND
    src_reg = 0 - may_goto
    src_reg = 1 - goto_or_nop

    Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
    Acked-by: Andrii Nakryiko <andrii@kernel.org>
    Acked-by: Eduard Zingerman <eddyz87@gmail.com>
    Acked-by: John Fastabend <john.fastabend@gmail.com>
    Tested-by: John Fastabend <john.fastabend@gmail.com>
    Link: https://lore.kernel.org/bpf/20240306031929.42666-2-alexei.starovoitov@gmail.com

Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
2024-10-15 10:49:12 +02:00
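
In practice the follow-up cond_break macro makes the pattern read like an ordinary loop; a usage sketch, assuming cond_break from the selftests' bpf_experimental.h and the static-zero workaround described above:

    static int zero = 0;        /* keeps 'i' imprecise for the verifier */

    static int copy_words(u64 *dst, u64 *src, int cnt)
    {
            int i;

            /* may_goto is a nop until the runtime decides the program
             * has run long enough, then the loop is exited early */
            for (i = zero; i < cnt; cond_break, i++)
                    dst[i] = src[i];
            return 0;
    }
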
Jerome Marchand 614f33dc0a bpf: Consistently use BPF token throughout BPF verifier logic
JIRA: https://issues.redhat.com/browse/RHEL-23649

commit d79a3549754725bb90e58104417449edddf3da3d
Author: Andrii Nakryiko <andrii@kernel.org>
Date:   Tue Jan 23 18:21:05 2024 -0800

    bpf: Consistently use BPF token throughout BPF verifier logic

    Remove remaining direct queries to perfmon_capable() and bpf_capable()
    in BPF verifier logic and instead use BPF token (if available) to make
    decisions about privileges.

    Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    Link: https://lore.kernel.org/bpf/20240124022127.2379740-9-andrii@kernel.org

Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
2024-10-15 10:49:03 +02:00
Jerome Marchand 13b9927298 bpf: Add BPF token support to BPF_PROG_LOAD command
JIRA: https://issues.redhat.com/browse/RHEL-23649

commit caf8f28e036c4ba1e823355da6c0c01c39e70ab9
Author: Andrii Nakryiko <andrii@kernel.org>
Date:   Tue Jan 23 18:21:03 2024 -0800

    bpf: Add BPF token support to BPF_PROG_LOAD command

    Add basic support of BPF token to BPF_PROG_LOAD. BPF_F_TOKEN_FD flag
    should be set in prog_flags field when providing prog_token_fd.

    Wire through a set of allowed BPF program types and attach types,
    derived from BPF FS at BPF token creation time. Then make sure we
    perform bpf_token_capable() checks everywhere where it's relevant.

    Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    Link: https://lore.kernel.org/bpf/20240124022127.2379740-7-andrii@kernel.org

Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
2024-10-15 10:49:03 +02:00
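
From user space, opting in is a matter of passing the token fd plus the new flag in the load attributes; a minimal raw-syscall sketch (the ptr_to_u64() helper, insns buffer, and token_fd are illustrative; the fd comes from BPF_TOKEN_CREATE on a bpffs mount):

    #include <linux/bpf.h>
    #include <sys/syscall.h>
    #include <unistd.h>

    union bpf_attr attr = {};

    attr.prog_type     = BPF_PROG_TYPE_XDP;
    attr.insns         = ptr_to_u64(insns);
    attr.insn_cnt      = insn_cnt;
    attr.license       = ptr_to_u64("GPL");
    attr.prog_flags    = BPF_F_TOKEN_FD;
    attr.prog_token_fd = token_fd;

    prog_fd = syscall(__NR_bpf, BPF_PROG_LOAD, &attr, sizeof(attr));
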
Jerome Marchand f0f8ba4f5d bpf: Support inlining bpf_kptr_xchg() helper
JIRA: https://issues.redhat.com/browse/RHEL-23649

Conflicts:
Context change. The bpf_arch_poke_desc_update() function has been
placed before bpf_jit_supports_exceptions() by a previous backport
(commit fb0a7b0e48 "bpf: Fix prog_array_map_poke_run map poke update").
Reorder the function to match upstream and help with future backports.

commit 7c05e7f3e74e7e550534d524e04d7e6f78d6fa24
Author: Hou Tao <houtao1@huawei.com>
Date:   Fri Jan 5 18:48:17 2024 +0800

    bpf: Support inlining bpf_kptr_xchg() helper

    The motivation of inlining bpf_kptr_xchg() comes from the performance
    profiling of bpf memory allocator benchmark. The benchmark uses
    bpf_kptr_xchg() to stash the allocated objects and to pop the stashed
    objects for free. After inling bpf_kptr_xchg(), the performance for
    object free on 8-CPUs VM increases about 2%~10%. The inline also has
    downside: both the kasan and kcsan checks on the pointer will be
    unavailable.

    bpf_kptr_xchg() can be inlined by converting the calling of
    bpf_kptr_xchg() into an atomic_xchg() instruction. But the conversion
    depends on two conditions:
    1) JIT backend supports atomic_xchg() on pointer-sized word
    2) For the specific arch, the implementation of xchg is the same as
       atomic_xchg() on pointer-sized words.

    It seems most 64-bit JIT backends satisfies these two conditions. But
    as a precaution, defining a weak function bpf_jit_supports_ptr_xchg()
    to state whether such conversion is safe and only supporting inline for
    64-bit host.

    For x86-64, it supports BPF_XCHG atomic operation and both xchg() and
    atomic_xchg() use arch_xchg() to implement the exchange, so enabling the
    inline of bpf_kptr_xchg() on x86-64 first.

    Reviewed-by: Eduard Zingerman <eddyz87@gmail.com>
    Signed-off-by: Hou Tao <houtao1@huawei.com>
    Link: https://lore.kernel.org/r/20240105104819.3916743-2-houtao@huaweicloud.com
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
2024-10-15 10:48:58 +02:00
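
The inlining itself is a two-instruction rewrite in the verifier's fixup pass; a sketch matching the description above (patch-buffer handling elided):

    /* kernel/bpf/verifier.c, do_misc_fixups(), sketch */
    if (prog->jit_requested && BITS_PER_LONG == 64 &&
        insn->imm == BPF_FUNC_kptr_xchg && bpf_jit_supports_ptr_xchg()) {
            /* R1 = address of the kptr field, R2 = new pointer,
             * R0 = old pointer (the helper's return value) */
            insn_buf[0] = BPF_MOV64_REG(BPF_REG_0, BPF_REG_2);
            insn_buf[1] = BPF_ATOMIC_OP(BPF_DW, BPF_XCHG, BPF_REG_1, BPF_REG_0, 0);
            cnt = 2;
            /* ...patch insn_buf over the original helper call... */
    }
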
Viktor Malik 9680ef97a0
Revert BPF token-related functionality
JIRA: https://issues.redhat.com/browse/RHEL-23644

commit d17aff807f845cf93926c28705216639c7279110
Author: Andrii Nakryiko <andrii@kernel.org>
Date:   Tue Dec 19 07:37:35 2023 -0800

    Revert BPF token-related functionality

    This patch includes the following reverts (one conflicting BPF FS
    patch and three token patch sets, represented by merge commits):
      - revert 0f5d5454c723 "Merge branch 'bpf-fs-mount-options-parsing-follow-ups'";
      - revert 750e785796bb "bpf: Support uid and gid when mounting bpffs";
      - revert 733763285acf "Merge branch 'bpf-token-support-in-libbpf-s-bpf-object'";
      - revert c35919dcce28 "Merge branch 'bpf-token-and-bpf-fs-based-delegation'".

    Link: https://lore.kernel.org/bpf/CAHk-=wg7JuFYwGy=GOMbRCtOL+jwSQsdUaBsRWkDVYbxipbM5A@mail.gmail.com
    Signed-off-by: Andrii Nakryiko <andrii@kernel.org>

Signed-off-by: Viktor Malik <vmalik@redhat.com>
2024-06-25 11:07:29 +02:00
Viktor Malik 9bfdd0271a
x86/cfi,bpf: Fix BPF JIT call
JIRA: https://issues.redhat.com/browse/RHEL-23644

Conflicts: changed context due to missing upstream commit
           89245600941e4 ("cfi: Switch to -fsanitize=kcfi")

commit 4f9087f16651aca4a5f32da840a53f6660f0579a
Author: Peter Zijlstra <peterz@infradead.org>
Date:   Fri Dec 15 10:12:18 2023 +0100

    x86/cfi,bpf: Fix BPF JIT call

    The current BPF call convention is __nocfi, except when it calls !JIT things,
    then it calls regular C functions.

    It so happens that with FineIBT the __nocfi and C calling conventions are
    incompatible. Specifically __nocfi will call at func+0, while FineIBT will have
    endbr-poison there, which is not a valid indirect target. Causing #CP.

    Notably this only triggers on IBT enabled hardware, which is probably why this
    hasn't been reported (also, most people will have JIT on anyway).

    Implement proper CFI prologues for the BPF JIT codegen and drop __nocfi for
    x86.

    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Link: https://lore.kernel.org/r/20231215092707.345270396@infradead.org
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Signed-off-by: Viktor Malik <vmalik@redhat.com>
2024-06-25 10:52:30 +02:00
Viktor Malik ef19b269c7
bpf: Let bpf_prog_pack_free handle any pointer
JIRA: https://issues.redhat.com/browse/RHEL-23644

commit f08a1c658257c73697a819c4ded3a84b6f0ead74
Author: Song Liu <song@kernel.org>
Date:   Wed Dec 6 14:40:48 2023 -0800

    bpf: Let bpf_prog_pack_free handle any pointer
    
    Currently, bpf_prog_pack_free can only free a pointer to struct
    bpf_binary_header, which is not flexible. Add a size argument to
    bpf_prog_pack_free so that it can handle any pointer.
    
    Signed-off-by: Song Liu <song@kernel.org>
    Acked-by: Ilya Leoshkevich <iii@linux.ibm.com>
    Tested-by: Ilya Leoshkevich <iii@linux.ibm.com>  # on s390x
    Reviewed-by: Björn Töpel <bjorn@rivosinc.com>
    Acked-by: Jiri Olsa <jolsa@kernel.org>
    Link: https://lore.kernel.org/r/20231206224054.492250-2-song@kernel.org
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Signed-off-by: Viktor Malik <vmalik@redhat.com>
2024-06-25 10:52:13 +02:00
Viktor Malik 5d4d69907a
bpf: consistently use BPF token throughout BPF verifier logic
JIRA: https://issues.redhat.com/browse/RHEL-23644

commit 8062fb12de99b2da33754c6a3be1bfc30d9a35f4
Author: Andrii Nakryiko <andrii@kernel.org>
Date:   Thu Nov 30 10:52:20 2023 -0800

    bpf: consistently use BPF token throughout BPF verifier logic
    
    Remove remaining direct queries to perfmon_capable() and bpf_capable()
    in BPF verifier logic and instead use BPF token (if available) to make
    decisions about privileges.
    
    Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
    Link: https://lore.kernel.org/r/20231130185229.2688956-9-andrii@kernel.org
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Signed-off-by: Viktor Malik <vmalik@redhat.com>
2024-06-25 10:52:10 +02:00
Viktor Malik 1b699c9ae7
bpf: add BPF token support to BPF_PROG_LOAD command
JIRA: https://issues.redhat.com/browse/RHEL-23644

commit e1cef620f598853a90f17701fcb1057a6768f7b8
Author: Andrii Nakryiko <andrii@kernel.org>
Date:   Thu Nov 30 10:52:18 2023 -0800

    bpf: add BPF token support to BPF_PROG_LOAD command
    
    Add basic support of BPF token to BPF_PROG_LOAD. Wire through a set of
    allowed BPF program types and attach types, derived from BPF FS at BPF
    token creation time. Then make sure we perform bpf_token_capable()
    checks everywhere where it's relevant.
    
    Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
    Link: https://lore.kernel.org/r/20231130185229.2688956-7-andrii@kernel.org
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Signed-off-by: Viktor Malik <vmalik@redhat.com>
2024-06-25 10:52:09 +02:00
Viktor Malik a7312d7fb3
bpf: Optimize the free of inner map
JIRA: https://issues.redhat.com/browse/RHEL-23644

commit af66bfd3c8538ed21cf72af18426fc4a408665cf
Author: Hou Tao <houtao1@huawei.com>
Date:   Mon Dec 4 22:04:23 2023 +0800

    bpf: Optimize the free of inner map
    
    When removing the inner map from the outer map, the inner map will be
    freed after one RCU grace period and one RCU tasks trace grace
    period, so it is certain that the bpf program, which may access the
    inner map, has exited before the inner map is freed.
    
    However there is no need to wait for one RCU tasks trace grace period if
    the outer map is only accessed by non-sleepable program. So adding
    sleepable_refcnt in bpf_map and increasing sleepable_refcnt when adding
    the outer map into env->used_maps for sleepable program. Although the
    max number of bpf program is INT_MAX - 1, the number of bpf programs
    which are being loaded may be greater than INT_MAX, so using atomic64_t
    instead of atomic_t for sleepable_refcnt. When removing the inner map
    from the outer map, using sleepable_refcnt to decide whether or not a
    RCU tasks trace grace period is needed before freeing the inner map.
    
    Signed-off-by: Hou Tao <houtao1@huawei.com>
    Link: https://lore.kernel.org/r/20231204140425.1480317-6-houtao@huaweicloud.com
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Signed-off-by: Viktor Malik <vmalik@redhat.com>
2024-06-25 10:52:04 +02:00
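
The deferral decision, sketched from the commit text: when the inner map is removed, the extra RCU tasks trace grace period is only requested if some sleepable program still references the outer map (field and flag names are simplified assumptions):

    /* map-in-map release path, simplified sketch */
    if (atomic64_read(&outer_map->sleepable_refcnt))
            /* sleepable progs may still walk the inner map under
             * rcu_read_lock_trace(): wait for both grace periods */
            WRITE_ONCE(inner_map->free_after_mult_rcu_gp, true);
    else
            WRITE_ONCE(inner_map->free_after_rcu_gp, true);

    bpf_map_put(inner_map);
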
Artem Savkov 0caff7b9fe bpf: Do not allocate percpu memory at init stage
JIRA: https://issues.redhat.com/browse/RHEL-23643

Conflicts: added required include that was added upstream by missing
           commit 680ee0456a571 ("net: invert the netdevice.h vs xdp.h
           dependency")

commit 1fda5bb66ad8fb24ecb3858e61a13a6548428898
Author: Yonghong Song <yonghong.song@linux.dev>
Date:   Fri Nov 10 17:39:28 2023 -0800

    bpf: Do not allocate percpu memory at init stage

    Kirill Shutemov reported significant percpu memory consumption increase after
    booting in 288-cpu VM ([1]) due to commit 41a5db8d8161 ("bpf: Add support for
    non-fix-size percpu mem allocation"). The percpu memory consumption is
    increased from 111MB to 969MB. The number is from /proc/meminfo.

    I tried to reproduce the issue with my local VM which at most supports up to
    255 cpus. With 252 cpus, without the above commit, the percpu memory
    consumption immediately after boot is 57MB while with the above commit the
    percpu memory consumption is 231MB.

    This is not good since so far percpu memory from bpf memory allocator is not
    widely used yet. Let us change pre-allocation in init stage to on-demand
    allocation when verifier detects there is a need of percpu memory for bpf
    program. With this change, percpu memory consumption after boot can be reduced
    significantly.

      [1] https://lore.kernel.org/lkml/20231109154934.4saimljtqx625l3v@box.shutemov.name/

    Fixes: 41a5db8d8161 ("bpf: Add support for non-fix-size percpu mem allocation")
    Reported-and-tested-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
    Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
    Acked-by: Hou Tao <houtao1@huawei.com>
    Link: https://lore.kernel.org/r/20231111013928.948838-1-yonghong.song@linux.dev
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Signed-off-by: Artem Savkov <asavkov@redhat.com>
2024-03-27 11:23:42 +01:00
Artem Savkov d2aa9cb614 bpf: Detect IP == ksym.end as part of BPF program
JIRA: https://issues.redhat.com/browse/RHEL-23643

commit 66d9111f3517f85ef2af0337ece02683ce0faf21
Author: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Date:   Wed Sep 13 01:32:08 2023 +0200

    bpf: Detect IP == ksym.end as part of BPF program
    
    Now that bpf_throw kfunc is the first such call instruction that has
    noreturn semantics within the verifier, this also kicks in dead code
    elimination in unprecedented ways. For one, any instruction following
    a bpf_throw call will never be marked as seen. Moreover, if a callchain
    ends up throwing, any instructions after the call instruction to the
    eventually throwing subprog in callers will also never be marked as
    seen.
    
    The tempting way to fix this would be to emit extra 'int3' instructions
    which bump the jited_len of a program, and ensure that during runtime
    when a program throws, we can discover its boundaries even if the call
    instruction to bpf_throw (or to subprogs that always throw) is emitted
    as the final instruction in the program.
    
    An example of such a program would be this:
    
    do_something():
    	...
    	r0 = 0
    	exit
    
    foo():
    	r1 = 0
    	call bpf_throw
    	r0 = 0
    	exit
    
    bar(cond):
    	if r1 != 0 goto pc+2
    	call do_something
    	exit
    	call foo
    	r0 = 0  // Never seen by verifier
    	exit	//
    
    main(ctx):
    	r1 = ...
    	call bar
    	r0 = 0
    	exit
    
    Here, if we do end up throwing, the stacktrace would be the following:
    
    bpf_throw
    foo
    bar
    main
    
    In bar, the final instruction emitted will be the call to foo, as such,
    the return address will be the subsequent instruction (which the JIT
    emits as int3 on x86). This will end up lying outside the jited_len of
    the program, thus, when unwinding, we will fail to discover the return
    address as belonging to any program and end up in a panic due to the
    unreliable stack unwinding of BPF programs that we never expect.
    
    To remedy this case, make bpf_prog_ksym_find treat IP == ksym.end as
    part of the BPF program, so that is_bpf_text_address returns true when
    such a case occurs, and we are able to unwind reliably when the final
    instruction ends up being a call instruction.
    
    Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
    Link: https://lore.kernel.org/r/20230912233214.1518551-12-memxor@gmail.com
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Signed-off-by: Artem Savkov <asavkov@redhat.com>
2024-03-27 10:27:47 +01:00
Artem Savkov bd01d7114b bpf: Implement BPF exceptions
JIRA: https://issues.redhat.com/browse/RHEL-23643

commit f18b03fabaa9b7c80e80b72a621f481f0d706ae0
Author: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Date:   Wed Sep 13 01:32:01 2023 +0200

    bpf: Implement BPF exceptions
    
    This patch implements BPF exceptions, and introduces a bpf_throw kfunc
    to allow programs to throw exceptions during their execution at runtime.
    A bpf_throw invocation is treated as an immediate termination of the
    program, returning back to its caller within the kernel, unwinding all
    stack frames.
    
    This allows the program to simplify its implementation, by testing for
    runtime conditions which the verifier has no visibility into, and assert
    that they are true. In case they are not, the program can simply throw
    an exception from the other branch.
    
    BPF exceptions are explicitly *NOT* an unlikely slowpath error handling
    primitive, and this objective has guided design choices of the
    implementation of the them within the kernel (with the bulk of the cost
    for unwinding the stack offloaded to the bpf_throw kfunc).
    
    The implementation of this mechanism requires use of add_hidden_subprog
    mechanism introduced in the previous patch, which generates a couple of
    instructions to move R1 to R0 and exit. The JIT then rewrites the
    prologue of this subprog to take the stack pointer and frame pointer as
    inputs and reset the stack frame, popping all callee-saved registers
    saved by the main subprog. The bpf_throw function then walks the stack
    at runtime, and invokes this exception subprog with the stack and frame
    pointers as parameters.
    
    Reviewers must take note that currently the main program is made to save
    all callee-saved registers on x86_64 during entry into the program. This
    is because we must do an equivalent of a lightweight context switch when
    unwinding the stack, therefore we need the callee-saved registers of the
    caller of the BPF program to be able to return with a sane state.
    
    Note that we have to additionally handle r12, even though it is not used
    by the program, because when throwing the exception the program makes an
    entry into the kernel which could clobber r12 after saving it on the
    stack. To be able to preserve the value we received on program entry, we
    push r12 and restore it from the generated subprogram when unwinding the
    stack.
    
    For now, bpf_throw invocation fails when lingering resources or locks
    exist in that path of the program. In a future followup, bpf_throw will
    be extended to perform frame-by-frame unwinding to release lingering
    resources for each stack frame, removing this limitation.
    
    Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
    Link: https://lore.kernel.org/r/20230912233214.1518551-5-memxor@gmail.com
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Signed-off-by: Artem Savkov <asavkov@redhat.com>
2024-03-27 10:27:47 +01:00
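A rough sketch of how a program might use the bpf_throw kfunc described above, assuming the selftest-style extern ksym declaration with a u64 cookie argument; the section name, the length check, and the program name are made up for illustration:

	#include <linux/bpf.h>
	#include <bpf/bpf_helpers.h>

	/* bpf_throw() is a kfunc; programs declare it as an extern ksym. */
	extern void bpf_throw(__u64 cookie) __ksym;

	SEC("tc")
	int assert_pkt_len(struct __sk_buff *ctx)
	{
		/* A runtime condition the verifier cannot prove statically:
		 * instead of propagating an error, terminate immediately,
		 * unwinding all stack frames back to the kernel caller.
		 */
		if (ctx->len > 0xffff)
			bpf_throw(0);
		return 0;
	}

	char LICENSE[] SEC("license") = "GPL";
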
Artem Savkov f2a52b618d bpf: Implement support for adding hidden subprogs
JIRA: https://issues.redhat.com/browse/RHEL-23643

commit 335d1c5b545284d75ef96ee42e461eacefe865bb
Author: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Date:   Wed Sep 13 01:32:00 2023 +0200

    bpf: Implement support for adding hidden subprogs
    
    Introduce support in the verifier for generating a subprogram and
    include it as part of a BPF program dynamically after the do_check phase
    is complete. The first user will be the next patch which generates
    default exception callbacks if none are set for the program. The phase
    of invocation will be do_misc_fixups. Note that this is an internal
    verifier function, and should be used with instruction blocks which
    uphold the invariants stated in check_subprogs.
    
    Since these subprogs are always appended to the end of the instruction
    sequence of the program, it becomes relatively inexpensive to do the
    related adjustments to the subprog_info of the program. Only the fake
    exit subprogram is shifted forward, making room for our new subprog.
    
    This is useful to insert a new subprogram, get it JITed, and obtain its
    function pointer. The next patch will use this functionality to insert a
    default exception callback which will be invoked after unwinding the
    stack.
    
    Note that these added subprograms are invisible to userspace, and never
    reported in BPF_OBJ_GET_INFO_BY_ID etc. For now, only a single
    subprogram is supported, but more can be easily supported in the future.
    
    To this end, two function counts are introduced now, the existing
    func_cnt, and real_func_cnt, the latter including hidden programs. This
    allows us to convert the JIT code to use real_func_cnt for management
    of resources, while the syscall path continues working with the existing
    func_cnt.
    
    Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
    Link: https://lore.kernel.org/r/20230912233214.1518551-4-memxor@gmail.com
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Signed-off-by: Artem Savkov <asavkov@redhat.com>
2024-03-27 10:27:47 +01:00
Artem Savkov 640f482f09 bpf: Add support for non-fix-size percpu mem allocation
JIRA: https://issues.redhat.com/browse/RHEL-23643

commit 41a5db8d8161457b121a03fde999ff6e00090ee2
Author: Yonghong Song <yonghong.song@linux.dev>
Date:   Sun Aug 27 08:27:34 2023 -0700

    bpf: Add support for non-fix-size percpu mem allocation
    
    This is needed for later percpu mem allocation when the
    allocation is done by bpf program. For such cases, a global
    bpf_global_percpu_ma is added where a flexible allocation
    size is needed.
    
    Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
    Link: https://lore.kernel.org/r/20230827152734.1995725-1-yonghong.song@linux.dev
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Signed-off-by: Artem Savkov <asavkov@redhat.com>
2024-03-27 10:27:45 +01:00
Prarit Bhargava d4c98c8e5c arch/x86: Implement arch_bpf_stack_walk
JIRA: https://issues.redhat.com/browse/RHEL-25415

Conflicts: Minor drift issues.

commit fd5d27b70188379bb441d404c29a0afb111e1753
Author: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Date:   Wed Sep 13 01:31:59 2023 +0200

    arch/x86: Implement arch_bpf_stack_walk

    The plumbing for offline unwinding when we throw an exception in
    programs would require walking the stack, hence introduce a new
    arch_bpf_stack_walk function. This is provided when the JIT supports
    exceptions, i.e. bpf_jit_supports_exceptions is true. The arch-specific
    code is really minimal, hence it should be straightforward to extend
    this support to other architectures as well, as it reuses the logic of
    arch_stack_walk while allowing access to unwind_state data.

    Once the stack pointer and frame pointer are known for the main subprog
    during the unwinding, we know the stack layout and location of any
    callee-saved registers which must be restored before we return back to
    the kernel. This handling will be added in the subsequent patches.

    Note that while we primarily unwind through BPF frames, which are
    effectively CONFIG_UNWINDER_FRAME_POINTER, we still need either this or
    CONFIG_UNWINDER_ORC to be able to unwind through the bpf_throw frame
    from which we begin walking the stack. We also require both sp and bp
    (stack and frame pointers) from the unwind_state structure, which are
    only available when one of these two options is enabled.

    Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
    Link: https://lore.kernel.org/r/20230912233214.1518551-3-memxor@gmail.com
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Signed-off-by: Prarit Bhargava <prarit@redhat.com>
2024-03-20 09:43:24 -04:00
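A sketch of how a caller (such as the exception-throwing path) might consume the new hook, assuming arch_bpf_stack_walk() takes a callback of the form bool (*)(void *cookie, u64 ip, u64 sp, u64 bp) plus a cookie, and that returning false stops the walk; the context struct and stop condition here are illustrative:

	struct bpf_unwind_ctx {
		u64 sp;
		u64 bp;
		bool found;
	};

	/* Return true to keep walking, false once a BPF frame is found. */
	static bool consume_frame(void *cookie, u64 ip, u64 sp, u64 bp)
	{
		struct bpf_unwind_ctx *ctx = cookie;

		if (!is_bpf_text_address(ip))
			return true;

		ctx->sp = sp;
		ctx->bp = bp;
		ctx->found = true;
		return false;
	}

	/* ... invoked only when the JIT advertises exception support:
	 *
	 *	struct bpf_unwind_ctx ctx = {};
	 *
	 *	if (bpf_jit_supports_exceptions())
	 *		arch_bpf_stack_walk(consume_frame, &ctx);
	 */
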
Prarit Bhargava b79789281d mm: Introduce set_memory_rox()
JIRA: https://issues.redhat.com/browse/RHEL-25415

Conflicts: Minor drift issues, and not worried about unsupported arches.
Changes to arch/arm/mach-omap[12] are made in arch/arm/plat-omap which
is unified in RHEL9.

commit d48567c9a0d1e605639f8a8705a61bbb55fb4e84
Author: Peter Zijlstra <peterz@infradead.org>
Date:   Wed Oct 26 12:13:03 2022 +0200

    mm: Introduce set_memory_rox()

    Because endlessly repeating:

            set_memory_ro()
            set_memory_x()

    is getting tedious.

    Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Link: https://lkml.kernel.org/r/Y1jek64pXOsougmz@hirez.programming.kicks-ass.net

Signed-off-by: Prarit Bhargava <prarit@redhat.com>
2024-03-20 09:42:51 -04:00
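The before/after pattern the commit describes, with addr and nr_pages standing in for whatever a given call site (e.g. sealing a JITed image) actually passes:

	/* before */
	set_memory_ro((unsigned long)addr, nr_pages);
	set_memory_x((unsigned long)addr, nr_pages);

	/* after */
	set_memory_rox((unsigned long)addr, nr_pages);
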
Jerome Marchand 53072d3679 bpf: Fix a verifier bug due to incorrect branch offset comparison with cpu=v4
JIRA: https://issues.redhat.com/browse/RHEL-10691

commit dfce9cb3140592b886838e06f3e0c25fea2a9cae
Author: Yonghong Song <yonghong.song@linux.dev>
Date:   Thu Nov 30 18:46:40 2023 -0800

    bpf: Fix a verifier bug due to incorrect branch offset comparison with cpu=v4

    Bpf cpu=v4 support is introduced in [1] and Commit 4cd58e9af8b9
    ("bpf: Support new 32bit offset jmp instruction") added support for new
    32bit offset jmp instruction. Unfortunately, in function
    bpf_adj_delta_to_off(), for the new branch insn with a 32bit offset, the
    offset (plus/minus a small delta) is compared against the 16-bit offset
    bound [S16_MIN, S16_MAX], which caused the following verification failure:
      $ ./test_progs-cpuv4 -t verif_scale_pyperf180
      ...
      insn 10 cannot be patched due to 16-bit range
      ...
      libbpf: failed to load object 'pyperf180.bpf.o'
      scale_test:FAIL:expect_success unexpected error: -12 (errno 12)
      #405     verif_scale_pyperf180:FAIL

    Note that due to recent llvm18 development, the patch [2] (already applied
    in bpf-next) needs to be applied to bpf tree for testing purpose.

    The fix is rather simple. For the 32bit offset branch insn, the adjusted
    offset is compared against [S32_MIN, S32_MAX], and verification then succeeds.

      [1] https://lore.kernel.org/all/20230728011143.3710005-1-yonghong.song@linux.dev
      [2] https://lore.kernel.org/bpf/20231110193644.3130906-1-yonghong.song@linux.dev

    Fixes: 4cd58e9af8b9 ("bpf: Support new 32bit offset jmp instruction")
    Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
    Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
    Link: https://lore.kernel.org/bpf/20231201024640.3417057-1-yonghong.song@linux.dev

Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
2023-12-15 09:29:05 +01:00
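A sketch of the bound selection the fix boils down to, assuming the upstream convention that the new 32-bit-offset jump is encoded as BPF_JMP32 | BPF_JA with its offset carried in insn->imm:

	s64 off, off_min, off_max;

	if (insn->code == (BPF_JMP32 | BPF_JA)) {
		/* new 32bit offset jmp: offset lives in imm */
		off = insn->imm;
		off_min = S32_MIN;
		off_max = S32_MAX;
	} else {
		/* classic jumps: 16-bit off field */
		off = insn->off;
		off_min = S16_MIN;
		off_max = S16_MAX;
	}
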
Jerome Marchand 500cd58618 bpf: make bpf_prog_pack allocator portable
JIRA: https://issues.redhat.com/browse/RHEL-10691

commit 20e490adea279d49d57b800475938f5b67926d98
Author: Puranjay Mohan <puranjay12@gmail.com>
Date:   Thu Aug 31 13:12:26 2023 +0000

    bpf: make bpf_prog_pack allocator portable

    The bpf_prog_pack allocator currently uses module_alloc() and
    module_memfree() to allocate and free memory. This is not portable
    because different architectures use different methods for allocating
    memory for BPF programs. For example, ARM64 and riscv use vmalloc()/vfree().

    Use bpf_jit_alloc_exec() and bpf_jit_free_exec() for memory management
    in bpf_prog_pack allocator. Other architectures can override these with
    their implementation and will be able to use bpf_prog_pack directly.

    On architectures that don't override bpf_jit_alloc/free_exec() this is
    basically a NOP.

    Signed-off-by: Puranjay Mohan <puranjay12@gmail.com>
    Acked-by: Song Liu <song@kernel.org>
    Acked-by: Björn Töpel <bjorn@kernel.org>
    Tested-by: Björn Töpel <bjorn@rivosinc.com>
    Acked-by: Daniel Borkmann <daniel@iogearbox.net>
    Link: https://lore.kernel.org/r/20230831131229.497941-2-puranjay12@gmail.com
    Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>

Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
2023-12-15 09:29:03 +01:00
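For reference, the weak defaults that bpf_prog_pack now routes through look roughly like this; architectures that cannot use module memory override them with their own allocator:

	void *__weak bpf_jit_alloc_exec(unsigned long size)
	{
		return module_alloc(size);
	}

	void __weak bpf_jit_free_exec(void *addr)
	{
		module_memfree(addr);
	}
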
Jerome Marchand 99a93b82b6 bpf: fix bpf_probe_read_kernel prototype mismatch
JIRA: https://issues.redhat.com/browse/RHEL-10691

commit 6a5a148aaf14747570cc634f9cdfcb0393f5617f
Author: Arnd Bergmann <arnd@arndb.de>
Date:   Tue Aug 1 13:13:58 2023 +0200

    bpf: fix bpf_probe_read_kernel prototype mismatch

    bpf_probe_read_kernel() has a __weak definition in core.c and another
    definition with an incompatible prototype in kernel/trace/bpf_trace.c,
    when CONFIG_BPF_EVENTS is enabled.

    Since the two are incompatible, there cannot be a shared declaration in
    a header file, but the lack of a prototype causes a W=1 warning:

    kernel/bpf/core.c:1638:12: error: no previous prototype for 'bpf_probe_read_kernel' [-Werror=missing-prototypes]

    On 32-bit architectures, the local prototype

    u64 __weak bpf_probe_read_kernel(void *dst, u32 size, const void *unsafe_ptr)

    passes arguments in different registers than the one in bpf_trace.c

    BPF_CALL_3(bpf_probe_read_kernel, void *, dst, u32, size,
                const void *, unsafe_ptr)

    which uses 64-bit arguments in pairs of registers.

    As both versions of the function are fairly simple and only really
    differ in one line, just move them into a header file as an inline
    function that does not add any overhead for the bpf_trace.c callers
    and actually avoids a function call for the other one.

    Cc: stable@vger.kernel.org
    Link: https://lore.kernel.org/all/ac25cb0f-b804-1649-3afb-1dc6138c2716@iogearbox.net/
    Signed-off-by: Arnd Bergmann <arnd@arndb.de>
    Acked-by: Yonghong Song <yonghong.song@linux.dev>
    Link: https://lore.kernel.org/r/20230801111449.185301-1-arnd@kernel.org
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
2023-12-15 09:28:55 +01:00
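A sketch of the shared inline the commit describes, folding the one memset-on-failure difference into a common body; the helper name follows the commit, but treat the snippet as illustrative rather than the exact header contents:

	static inline int bpf_probe_read_kernel_common(void *dst, u32 size,
						       const void *unsafe_ptr)
	{
		int ret = copy_from_kernel_nofault(dst, unsafe_ptr, size);

		if (unlikely(ret < 0))
			memset(dst, 0, size);
		return ret;
	}
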
Jerome Marchand da0a7c5bc8 bpf: Fix compilation warning with -Wparentheses
JIRA: https://issues.redhat.com/browse/RHEL-10691

commit 09fedc731874123e0f6e5e5e3572db0c60378c2a
Author: Yonghong Song <yonghong.song@linux.dev>
Date:   Thu Jul 27 22:57:40 2023 -0700

    bpf: Fix compilation warning with -Wparentheses

    The kernel test robot reported compilation warnings when -Wparentheses is
    added to KBUILD_CFLAGS with gcc compiler. The following is the error message:

      .../bpf-next/kernel/bpf/verifier.c: In function ‘coerce_reg_to_size_sx’:
      .../bpf-next/kernel/bpf/verifier.c:5901:14:
        error: suggest parentheses around comparison in operand of ‘==’ [-Werror=parentheses]
        if (s64_max >= 0 == s64_min >= 0) {
            ~~~~~~~~^~~~
      .../bpf-next/kernel/bpf/verifier.c: In function ‘coerce_subreg_to_size_sx’:
      .../bpf-next/kernel/bpf/verifier.c:5965:14:
        error: suggest parentheses around comparison in operand of ‘==’ [-Werror=parentheses]
        if (s32_min >= 0 == s32_max >= 0) {
            ~~~~~~~~^~~~

    To fix the issue, add proper parentheses for the above '>=' condition
    to silence the warning/error.

    I tried a few clang compilers like clang16 and clang18 and they do not emit
    such warnings with -Wparentheses.

    Reported-by: kernel test robot <lkp@intel.com>
    Closes: https://lore.kernel.org/oe-kbuild-all/202307281133.wi0c4SqG-lkp@intel.com/
    Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
    Acked-by: Jiri Olsa <jolsa@kernel.org>
    Link: https://lore.kernel.org/r/20230728055740.2284534-1-yonghong.song@linux.dev
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
2023-12-15 09:28:55 +01:00
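The silenced form, as the message describes, simply parenthesizes both comparisons so the intent is explicit:

	/* before: if (s64_max >= 0 == s64_min >= 0) { */
	if ((s64_max >= 0) == (s64_min >= 0)) {
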
Jerome Marchand d977d5ac53 bpf: Support new 32bit offset jmp instruction
JIRA: https://issues.redhat.com/browse/RHEL-10691

commit 4cd58e9af8b9d9fff6b7145e742abbfcda0af4af
Author: Yonghong Song <yonghong.song@linux.dev>
Date:   Thu Jul 27 18:12:31 2023 -0700

    bpf: Support new 32bit offset jmp instruction

    Add interpreter/jit/verifier support for 32bit offset jmp instruction.
    If a conditional jmp instruction needs more than 16bit offset,
    it can be simulated with a conditional jmp + a 32bit jmp insn.

    Acked-by: Eduard Zingerman <eddyz87@gmail.com>
    Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
    Link: https://lore.kernel.org/r/20230728011231.3716103-1-yonghong.song@linux.dev
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
2023-12-15 09:28:54 +01:00
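A pseudo-assembly sketch of the simulation mentioned above (the label and condition are made up): a conditional branch whose target does not fit in the 16-bit off field is inverted and paired with the new 32-bit-offset jump:

	// wanted, but the target is beyond the 16-bit offset range:
	//     if r1 > r2 goto far_label
	// emitted instead:
	if r1 <= r2 goto +1      // inverted condition skips the long jump
	gotol far_label          // new insn: 32-bit offset carried in imm
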
Jerome Marchand 95188e6318 bpf: Fix jit blinding with new sdiv/smov insns
JIRA: https://issues.redhat.com/browse/RHEL-10691

commit 7058e3a31ee4b9240cccab5bc13c1afbfa3d16a0
Author: Yonghong Song <yonghong.song@linux.dev>
Date:   Thu Jul 27 18:12:25 2023 -0700

    bpf: Fix jit blinding with new sdiv/smov insns

    Handle new insns properly in bpf_jit_blind_insn() function.

    Acked-by: Eduard Zingerman <eddyz87@gmail.com>
    Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
    Link: https://lore.kernel.org/r/20230728011225.3715812-1-yonghong.song@linux.dev
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
2023-12-15 09:28:54 +01:00
Jerome Marchand d2c62fc1f7 bpf: Support new signed div/mod instructions.
JIRA: https://issues.redhat.com/browse/RHEL-10691

commit ec0e2da95f72d4a46050a4d994e4fe471474fd80
Author: Yonghong Song <yonghong.song@linux.dev>
Date:   Thu Jul 27 18:12:19 2023 -0700

    bpf: Support new signed div/mod instructions.

    Add interpreter/jit support for new signed div/mod insns.
    The new signed div/mod instructions are encoded with
    unsigned div/mod instructions plus insn->off == 1.
    Also add basic verifier support to ensure new insns get
    accepted.

    Acked-by: Eduard Zingerman <eddyz87@gmail.com>
    Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
    Link: https://lore.kernel.org/r/20230728011219.3714605-1-yonghong.song@linux.dev
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
2023-12-15 09:28:54 +01:00
Jerome Marchand a15b86d2ea bpf: Support new unconditional bswap instruction
JIRA: https://issues.redhat.com/browse/RHEL-10691

commit 0845c3db7bf5c4ceb7100bcd8fd594d9ccf3c29a
Author: Yonghong Song <yonghong.song@linux.dev>
Date:   Thu Jul 27 18:12:13 2023 -0700

    bpf: Support new unconditional bswap instruction

    The existing 'be' and 'le' insns will do a conditional bswap
    depending on host endianness. This patch implements
    unconditional bswap insns.

    Acked-by: Eduard Zingerman <eddyz87@gmail.com>
    Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
    Link: https://lore.kernel.org/r/20230728011213.3712808-1-yonghong.song@linux.dev
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
2023-12-15 09:28:54 +01:00
Jerome Marchand 6882a99d68 bpf: Support new sign-extension mov insns
JIRA: https://issues.redhat.com/browse/RHEL-10691

Conflicts: Context change from missing commit 291d044fd51f ("bpf: Fix
precision tracking for BPF_ALU | BPF_TO_BE | BPF_END")

commit 8100928c881482a73ed8bd499d602bab0fe55608
Author: Yonghong Song <yonghong.song@linux.dev>
Date:   Thu Jul 27 18:12:02 2023 -0700

    bpf: Support new sign-extension mov insns

    Add interpreter/jit support for new sign-extension mov insns.
    The original 'MOV' insn is extended to support reg-to-reg
    signed version for both ALU and ALU64 operations. For ALU mode,
    the insn->off value of 8 or 16 indicates sign-extension
    from 8- or 16-bit value to 32-bit value. For ALU64 mode,
    the insn->off value of 8/16/32 indicates sign-extension
    from 8-, 16- or 32-bit value to 64-bit value.

    Acked-by: Eduard Zingerman <eddyz87@gmail.com>
    Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
    Link: https://lore.kernel.org/r/20230728011202.3712300-1-yonghong.song@linux.dev
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
2023-12-15 09:28:53 +01:00
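In the verifier's assembly notation the new sign-extension moves look roughly like the following (register numbers are arbitrary):

	w1 = (s8)w2      // ALU,   insn->off == 8:  sign-extend  8 -> 32 bits
	w1 = (s16)w2     // ALU,   insn->off == 16: sign-extend 16 -> 32 bits
	r1 = (s8)r2      // ALU64, insn->off == 8:  sign-extend  8 -> 64 bits
	r1 = (s16)r2     // ALU64, insn->off == 16: sign-extend 16 -> 64 bits
	r1 = (s32)r2     // ALU64, insn->off == 32: sign-extend 32 -> 64 bits
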
Jerome Marchand 1b94a94226 bpf: Support new sign-extension load insns
JIRA: https://issues.redhat.com/browse/RHEL-10691

commit 1f9a1ea821ff25353a0e80d971e7958cd55b47a3
Author: Yonghong Song <yonghong.song@linux.dev>
Date:   Thu Jul 27 18:11:56 2023 -0700

    bpf: Support new sign-extension load insns

    Add interpreter/jit support for new sign-extension load insns
    which adds a new mode (BPF_MEMSX).
    Also add verifier support to recognize these insns and to
    do proper verification with the new insns. In the verifier, besides
    deducing proper bounds for the dst_reg, probed memory access
    is also properly handled.

    Acked-by: Eduard Zingerman <eddyz87@gmail.com>
    Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
    Link: https://lore.kernel.org/r/20230728011156.3711870-1-yonghong.song@linux.dev
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
2023-12-15 09:28:53 +01:00
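Again in assembly notation, the sign-extending loads of the new BPF_MEMSX mode look roughly like this (registers and offsets arbitrary):

	r1 = *(s8 *)(r2 + 0)     // BPF_LDX | BPF_MEMSX | BPF_B
	r1 = *(s16 *)(r2 + 0)    // BPF_LDX | BPF_MEMSX | BPF_H
	r1 = *(s32 *)(r2 + 0)    // BPF_LDX | BPF_MEMSX | BPF_W
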
Viktor Malik 67d9643d84
bpf: Hide unused bpf_patch_call_args
JIRA: https://issues.redhat.com/browse/RHEL-9957

commit ba49f976885869835a1783863376221dc24f1817
Author: Arnd Bergmann <arnd@arndb.de>
Date:   Fri Jun 2 15:50:18 2023 +0200

    bpf: Hide unused bpf_patch_call_args
    
    This function is only used when CONFIG_BPF_JIT_ALWAYS_ON is disabled, but
    CONFIG_BPF_SYSCALL is enabled. When both are turned off, the prototype is
    missing but the unused function is still compiled, as seen from this W=1
    warning:
    
      [...]
      kernel/bpf/core.c:2075:6: error: no previous prototype for 'bpf_patch_call_args' [-Werror=missing-prototypes]
      [...]
    
    Add a matching #ifdef for the definition to leave it out.
    
    Signed-off-by: Arnd Bergmann <arnd@arndb.de>
    Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    Acked-by: Yonghong Song <yhs@fb.com>
    Link: https://lore.kernel.org/bpf/20230602135128.1498362-1-arnd@kernel.org

Signed-off-by: Viktor Malik <vmalik@redhat.com>
2023-10-26 17:06:18 +02:00
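The shape of the guard the commit adds, per its description; both config symbols are named in the message, and the function body is elided here:

	#if defined(CONFIG_BPF_SYSCALL) && !defined(CONFIG_BPF_JIT_ALWAYS_ON)
	void bpf_patch_call_args(struct bpf_insn *insn, u32 stack_depth)
	{
		/* ... unchanged body ... */
	}
	#endif
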
Artem Savkov 54e4f459d5 bpf: Support 64-bit pointers to kfuncs
Bugzilla: https://bugzilla.redhat.com/2221599

Conflicts: already present dd16dc6e35 "x86/speculation: Include
unprivileged eBPF status in Spectre v2 mitigation reporting"

commit 1cf3bfc60f9836f44da951f58b6ae24680484b35
Author: Ilya Leoshkevich <iii@linux.ibm.com>
Date:   Thu Apr 13 01:06:32 2023 +0200

    bpf: Support 64-bit pointers to kfuncs

    test_ksyms_module fails to emit a kfunc call targeting a module on
    s390x, because the verifier stores the difference between kfunc
    address and __bpf_call_base in bpf_insn.imm, which is s32, and modules
    are roughly (1 << 42) bytes away from the kernel on s390x.

    Fix by keeping BTF id in bpf_insn.imm for BPF_PSEUDO_KFUNC_CALLs,
    and storing the absolute address in bpf_kfunc_desc.

    Introduce bpf_jit_supports_far_kfunc_call() in order to limit this new
    behavior to the s390x JIT. Otherwise other JITs need to be modified,
    which is not desired.

    Introduce bpf_get_kfunc_addr() instead of exposing both
    find_kfunc_desc() and struct bpf_kfunc_desc.

    In addition to sorting kfuncs by imm, also sort them by offset, in
    order to handle conflicting imms from different modules. Do this on
    all architectures in order to simplify code.

    Factor out resolving specialized kfuncs (XPD and dynptr) from
    fixup_kfunc_call(). This was required in the first place, because
    fixup_kfunc_call() uses find_kfunc_desc(), which returns a const
    pointer, so it's not possible to modify kfunc addr without stripping
    const, which is not nice. It also removes repetition of code like:

    	if (bpf_jit_supports_far_kfunc_call())
    		desc->addr = func;
    	else
    		insn->imm = BPF_CALL_IMM(func);

    and separates kfunc_desc_tab fixups from kfunc_call fixups.

    Suggested-by: Jiri Olsa <olsajiri@gmail.com>
    Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
    Acked-by: Jiri Olsa <jolsa@kernel.org>
    Link: https://lore.kernel.org/r/20230412230632.885985-1-iii@linux.ibm.com
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Signed-off-by: Artem Savkov <asavkov@redhat.com>
2023-09-22 09:12:30 +02:00
Viktor Malik 71c39bec1d bpf: Adjust insufficient default bpf_jit_limit
Bugzilla: https://bugzilla.redhat.com/2178930

commit 10ec8ca8ec1a2f04c4ed90897225231c58c124a7
Author: Daniel Borkmann <daniel@iogearbox.net>
Date:   Mon Mar 20 15:37:25 2023 +0100

    bpf: Adjust insufficient default bpf_jit_limit
    
    We've seen recent AWS EKS (Kubernetes) user reports like the following:
    
      After upgrading EKS nodes from v20230203 to v20230217 on our 1.24 EKS
      clusters after a few days a number of the nodes have containers stuck
      in ContainerCreating state or liveness/readiness probes reporting the
      following error:
    
        Readiness probe errored: rpc error: code = Unknown desc = failed to
        exec in container: failed to start exec "4a11039f730203ffc003b7[...]":
        OCI runtime exec failed: exec failed: unable to start container process:
        unable to init seccomp: error loading seccomp filter into kernel:
        error loading seccomp filter: errno 524: unknown
    
      However, we had not been seeing this issue on previous AMIs and it only
      started to occur on v20230217 (following the upgrade from kernel 5.4 to
      5.10) with no other changes to the underlying cluster or workloads.
    
      We tried the suggestions from that issue (sysctl net.core.bpf_jit_limit=452534528)
      which helped to immediately allow containers to be created and probes to
      execute but after approximately a day the issue returned and the value
      returned by cat /proc/vmallocinfo | grep bpf_jit | awk '{s+=$2} END {print s}'
      was steadily increasing.
    
    I tested bpf tree to observe bpf_jit_charge_modmem, bpf_jit_uncharge_modmem
    their sizes passed in as well as bpf_jit_current under tcpdump BPF filter,
    seccomp BPF and native (e)BPF programs, and the behavior all looks sane
    and expected, that is nothing "leaking" from an upstream perspective.
    
    The bpf_jit_limit knob was originally added in order to avoid a situation
    where unprivileged applications loading BPF programs (e.g. seccomp BPF
    policies) consume all the module memory space via the BPF JIT, such that
    loading of kernel modules would be prevented. The default limit was defined back in
    2018 and while good enough back then, we are generally seeing far more BPF
    consumers today.
    
    Adjust the limit for the BPF JIT pool from originally 1/4 to now 1/2 of the
    module memory space to better reflect today's needs and avoid more users
    running into potentially hard to debug issues.
    
    Fixes: fdadd04931 ("bpf: fix bpf_jit_limit knob for PAGE_SIZE >= 64K")
    Reported-by: Stephen Haynes <sh@synk.net>
    Reported-by: Lefteris Alexakis <lefteris.alexakis@kpn.com>
    Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    Link: https://github.com/awslabs/amazon-eks-ami/issues/1179
    Link: https://github.com/awslabs/amazon-eks-ami/issues/1219
    Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
    Link: https://lore.kernel.org/r/20230320143725.8394-1-daniel@iogearbox.net
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Signed-off-by: Viktor Malik <vmalik@redhat.com>
2023-06-13 22:45:46 +02:00
Viktor Malik d2394048ad bpf: add missing header file include
Bugzilla: https://bugzilla.redhat.com/2178930

commit f3dd0c53370e70c0f9b7e931bbec12916f3bb8cc
Author: Linus Torvalds <torvalds@linux-foundation.org>
Date:   Wed Feb 22 09:52:32 2023 -0800

    bpf: add missing header file include
    
    Commit 74e19ef0ff80 ("uaccess: Add speculation barrier to
    copy_from_user()") built fine on x86-64 and arm64, and that's the extent
    of my local build testing.
    
    It turns out those got the <linux/nospec.h> include incidentally through
    other header files (<linux/kvm_host.h> in particular), but that was not
    true of other architectures, resulting in build errors
    
      kernel/bpf/core.c: In function ‘___bpf_prog_run’:
      kernel/bpf/core.c:1913:3: error: implicit declaration of function ‘barrier_nospec’
    
    so just make sure to explicitly include the proper <linux/nospec.h>
    header file to make everybody see it.
    
    Fixes: 74e19ef0ff80 ("uaccess: Add speculation barrier to copy_from_user()")
    Reported-by: kernel test robot <lkp@intel.com>
    Reported-by: Viresh Kumar <viresh.kumar@linaro.org>
    Reported-by: Huacai Chen <chenhuacai@loongson.cn>
    Tested-by: Geert Uytterhoeven <geert@linux-m68k.org>
    Tested-by: Dave Hansen <dave.hansen@linux.intel.com>
    Acked-by: Alexei Starovoitov <alexei.starovoitov@gmail.com>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Signed-off-by: Viktor Malik <vmalik@redhat.com>
2023-06-13 22:45:41 +02:00