Commit Graph

355 Commits

Author SHA1 Message Date
Jerome Marchand 409d026142 bpf: Add bpf_get_func_ip helper support for uprobe link
JIRA: https://issues.redhat.com/browse/RHEL-10691

commit 686328d80c4346329d37a838021fa6b7d5ca64ec
Author: Jiri Olsa <jolsa@kernel.org>
Date:   Wed Aug 9 10:34:18 2023 +0200

    bpf: Add bpf_get_func_ip helper support for uprobe link

    Adding support for the bpf_get_func_ip helper to be called from an
    ebpf program attached by a uprobe_multi link.

    It returns the ip of the uprobe.

    Acked-by: Andrii Nakryiko <andrii@kernel.org>
    Signed-off-by: Jiri Olsa <jolsa@kernel.org>
    Acked-by: Yonghong Song <yonghong.song@linux.dev>
    Link: https://lore.kernel.org/r/20230809083440.3209381-7-jolsa@kernel.org
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
2023-12-15 09:28:59 +01:00
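
For illustration, a minimal BPF-side sketch of the helper usage described above; the section name, target binary and function are made up, and a libbpf with uprobe_multi support is assumed:

    /* uprobe_multi_ip.bpf.c -- illustrative only */
    #include "vmlinux.h"
    #include <bpf/bpf_helpers.h>
    #include <bpf/bpf_tracing.h>

    char LICENSE[] SEC("license") = "GPL";

    SEC("uprobe.multi//usr/bin/bash:readline")
    int BPF_KPROBE(on_readline)
    {
        /* with the change above, this returns the address of the
         * uprobe that fired (for both entry and return probes) */
        __u64 ip = bpf_get_func_ip(ctx);

        bpf_printk("uprobe hit at 0x%llx", ip);
        return 0;
    }
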
Jerome Marchand b5f84f6a73 bpf: Add pid filter support for uprobe_multi link
JIRA: https://issues.redhat.com/browse/RHEL-10691

commit b733eeade4204423711793595c3c8d78a2fa8b2e
Author: Jiri Olsa <jolsa@kernel.org>
Date:   Wed Aug 9 10:34:17 2023 +0200

    bpf: Add pid filter support for uprobe_multi link

    Adding support to specify a pid for the uprobe_multi link, so that the
    uprobes are created only for the task with the given pid value.

    Using the consumer.filter callback for that, so the task gets
    filtered during the uprobe installation.

    We still need to check the task during runtime in the uprobe handler,
    because the handler could get executed if there's another system
    wide consumer on the same uprobe (thanks Oleg for the insight).

    Cc: Oleg Nesterov <oleg@redhat.com>
    Reviewed-by: Oleg Nesterov <oleg@redhat.com>
    Signed-off-by: Jiri Olsa <jolsa@kernel.org>
    Acked-by: Yonghong Song <yonghong.song@linux.dev>
    Link: https://lore.kernel.org/r/20230809083440.3209381-6-jolsa@kernel.org
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
2023-12-15 09:28:59 +01:00
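
A user-space sketch of requesting the pid filter, assuming libbpf's bpf_program__attach_uprobe_multi(); the binary path and function name are made up:

    #include <sys/types.h>
    #include <bpf/libbpf.h>

    static struct bpf_link *attach_for_pid(struct bpf_program *prog, pid_t pid)
    {
        LIBBPF_OPTS(bpf_uprobe_multi_opts, opts);

        /* uprobes are installed (and the handler run) only for 'pid';
         * pass -1 to trace all tasks */
        return bpf_program__attach_uprobe_multi(prog, pid, "/usr/bin/bash",
                                                "readline", &opts);
    }
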
Jerome Marchand 5e4ac3ed41 bpf: Add cookies support for uprobe_multi link
JIRA: https://issues.redhat.com/browse/RHEL-10691

commit 0b779b61f651851df5c5c42938a6c441eb1b5100
Author: Jiri Olsa <jolsa@kernel.org>
Date:   Wed Aug 9 10:34:16 2023 +0200

    bpf: Add cookies support for uprobe_multi link

    Adding support to specify cookies array for uprobe_multi link.

    The cookies array shares indexes and length with the other uprobe_multi
    arrays (offsets/ref_ctr_offsets).

    The cookies[i] value defines the cookie for the i-th uprobe and will be
    returned by the bpf_get_attach_cookie helper when called from an ebpf
    program hooked to that specific uprobe.

    Acked-by: Andrii Nakryiko <andrii@kernel.org>
    Acked-by: Yafang Shao <laoar.shao@gmail.com>
    Signed-off-by: Jiri Olsa <jolsa@kernel.org>
    Acked-by: Yonghong Song <yonghong.song@linux.dev>
    Link: https://lore.kernel.org/r/20230809083440.3209381-5-jolsa@kernel.org
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
2023-12-15 09:28:59 +01:00
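
A user-space sketch of passing per-uprobe cookies, assuming libbpf's bpf_uprobe_multi_opts; symbol names and the binary path are made up:

    #include <bpf/libbpf.h>

    static struct bpf_link *attach_with_cookies(struct bpf_program *prog)
    {
        const char *syms[] = { "func_a", "func_b" };
        __u64 cookies[]    = { 1001, 1002 };
        LIBBPF_OPTS(bpf_uprobe_multi_opts, opts,
            .syms    = syms,
            .cookies = cookies,
            .cnt     = 2,
        );

        /* in the BPF program, bpf_get_attach_cookie(ctx) then returns
         * 1001 or 1002 depending on which uprobe fired */
        return bpf_program__attach_uprobe_multi(prog, -1, "/usr/bin/app",
                                                NULL, &opts);
    }
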
Jerome Marchand bfe606d61a bpf: Add multi uprobe link
JIRA: https://issues.redhat.com/browse/RHEL-10691

Conflicts: Context change from missing commit 91721c2d02d3
("netfilter: bpf: Support BPF_F_NETFILTER_IP_DEFRAG in netfilter
link")

commit 89ae89f53d201143560f1e9ed4bfa62eee34f88e
Author: Jiri Olsa <jolsa@kernel.org>
Date:   Wed Aug 9 10:34:15 2023 +0200

    bpf: Add multi uprobe link

    Adding a new multi uprobe link that allows attaching a bpf program
    to multiple uprobes.

    Uprobes to attach are specified via new link_create uprobe_multi
    union:

      struct {
        __aligned_u64   path;
        __aligned_u64   offsets;
        __aligned_u64   ref_ctr_offsets;
        __u32           cnt;
        __u32           flags;
      } uprobe_multi;

    Uprobes are defined for a single binary specified in path and multiple
    calling sites specified in the offsets array, with optional reference
    counters specified in the ref_ctr_offsets array. All specified arrays
    have a length of 'cnt'.

    The 'flags' field supports a single bit for now, which marks the uprobe
    as a return probe.

    Acked-by: Andrii Nakryiko <andrii@kernel.org>
    Acked-by: Yafang Shao <laoar.shao@gmail.com>
    Signed-off-by: Jiri Olsa <jolsa@kernel.org>
    Acked-by: Yonghong Song <yonghong.song@linux.dev>
    Link: https://lore.kernel.org/r/20230809083440.3209381-4-jolsa@kernel.org
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
2023-12-15 09:28:59 +01:00
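
A minimal raw-syscall sketch of creating such a link (libbpf normally wraps this); it assumes uapi headers that already carry the uprobe_multi fields and the BPF_TRACE_UPROBE_MULTI attach type:

    #include <string.h>
    #include <unistd.h>
    #include <sys/syscall.h>
    #include <linux/bpf.h>

    static int uprobe_multi_link_create(int prog_fd, const char *path,
                                        const unsigned long *offsets, __u32 cnt)
    {
        union bpf_attr attr;

        memset(&attr, 0, sizeof(attr));
        attr.link_create.prog_fd = prog_fd;
        attr.link_create.attach_type = BPF_TRACE_UPROBE_MULTI;
        attr.link_create.uprobe_multi.path = (__u64)(unsigned long)path;
        attr.link_create.uprobe_multi.offsets = (__u64)(unsigned long)offsets;
        attr.link_create.uprobe_multi.cnt = cnt;
        attr.link_create.uprobe_multi.flags = 0;   /* or the return-probe bit */

        return syscall(__NR_bpf, BPF_LINK_CREATE, &attr, sizeof(attr));
    }
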
Jerome Marchand 9b6b5ffa34 bpf: Add support for bpf_get_func_ip helper for uprobe program
JIRA: https://issues.redhat.com/browse/RHEL-10691

commit a3c485a5d8d47af5d2d1a0e5c3b7a1ed223669f9
Author: Jiri Olsa <jolsa@kernel.org>
Date:   Mon Aug 7 10:59:54 2023 +0200

    bpf: Add support for bpf_get_func_ip helper for uprobe program

    Adding support for bpf_get_func_ip helper for uprobe program to return
    probed address for both uprobe and return uprobe.

    We discussed this in [1] and agreed that uprobe can have special use
    of bpf_get_func_ip helper that differs from kprobe.

    The kprobe bpf_get_func_ip returns:
      - address of the function if the probe is attached on the function entry,
        for both kprobe and return kprobe
      - 0 if the probe is not attached on the function entry

    The uprobe bpf_get_func_ip returns:
      - address of the probe for both uprobe and return uprobe

    The reason for this semantic change is that the kernel can't really tell
    whether the probed user space address is a function entry.

    The uprobe program is actually a kprobe-type program attached as a uprobe.
    One of the consequences of this design is that uprobes do not have their
    own set of helpers, but share them with kprobes.

    As we need different functionality for bpf_get_func_ip helper for uprobe,
    I'm adding the bool value to the bpf_trace_run_ctx, so the helper can
    detect that it's executed in uprobe context and call specific code.

    The is_uprobe bool is set as true in bpf_prog_run_array_sleepable, which
    is currently used only for executing bpf programs in uprobe.

    Renaming bpf_prog_run_array_sleepable to bpf_prog_run_array_uprobe
    to address that it's only used for uprobes and that it sets the
    run_ctx.is_uprobe as suggested by Yafang Shao.

    Suggested-by: Andrii Nakryiko <andrii@kernel.org>
    Tested-by: Alan Maguire <alan.maguire@oracle.com>
    [1] https://lore.kernel.org/bpf/CAEf4BzZ=xLVkG5eurEuvLU79wAMtwho7ReR+XJAgwhFF4M-7Cg@mail.gmail.com/
    Signed-off-by: Jiri Olsa <jolsa@kernel.org>
    Tested-by: Viktor Malik <vmalik@redhat.com>
    Acked-by: Yonghong Song <yonghong.song@linux.dev>
    Link: https://lore.kernel.org/r/20230807085956.2344866-2-jolsa@kernel.org
    Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>

Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
2023-12-15 09:28:56 +01:00
Jerome Marchand 99a93b82b6 bpf: fix bpf_probe_read_kernel prototype mismatch
JIRA: https://issues.redhat.com/browse/RHEL-10691

commit 6a5a148aaf14747570cc634f9cdfcb0393f5617f
Author: Arnd Bergmann <arnd@arndb.de>
Date:   Tue Aug 1 13:13:58 2023 +0200

    bpf: fix bpf_probe_read_kernel prototype mismatch

    bpf_probe_read_kernel() has a __weak definition in core.c and another
    definition with an incompatible prototype in kernel/trace/bpf_trace.c,
    when CONFIG_BPF_EVENTS is enabled.

    Since the two are incompatible, there cannot be a shared declaration in
    a header file, but the lack of a prototype causes a W=1 warning:

    kernel/bpf/core.c:1638:12: error: no previous prototype for 'bpf_probe_read_kernel' [-Werror=missing-prototypes]

    On 32-bit architectures, the local prototype

    u64 __weak bpf_probe_read_kernel(void *dst, u32 size, const void *unsafe_ptr)

    passes its arguments in different registers than the one in bpf_trace.c

    BPF_CALL_3(bpf_probe_read_kernel, void *, dst, u32, size,
                const void *, unsafe_ptr)

    which uses 64-bit arguments in pairs of registers.

    As both versions of the function are fairly simple and only really
    differ in one line, just move them into a header file as an inline
    function that does not add any overhead for the bpf_trace.c callers
    and actually avoids a function call for the other one.

    Cc: stable@vger.kernel.org
    Link: https://lore.kernel.org/all/ac25cb0f-b804-1649-3afb-1dc6138c2716@iogearbox.net/
    Signed-off-by: Arnd Bergmann <arnd@arndb.de>
    Acked-by: Yonghong Song <yonghong.song@linux.dev>
    Link: https://lore.kernel.org/r/20230801111449.185301-1-arnd@kernel.org
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
2023-12-15 09:28:55 +01:00
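
A sketch of the shared inline definition the patch describes (close to, but not necessarily identical with, the upstream header change):

    /* one definition usable by both kernel/bpf/core.c and bpf_trace.c */
    static inline int bpf_probe_read_kernel_common(void *dst, u32 size,
                                                   const void *unsafe_ptr)
    {
        int ret = copy_from_kernel_nofault(dst, unsafe_ptr, size);

        if (unlikely(ret < 0))
            memset(dst, 0, size);
        return ret;
    }
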
Jerome Marchand 2e128f3948 bpf: Support ->fill_link_info for perf_event
JIRA: https://issues.redhat.com/browse/RHEL-10691

commit 1b715e1b0ec531fae72cd6698fe1c98affa436f8
Author: Yafang Shao <laoar.shao@gmail.com>
Date:   Sun Jul 9 02:56:28 2023 +0000

    bpf: Support ->fill_link_info for perf_event

    By introducing support for ->fill_link_info to the perf_event link, users
    gain the ability to inspect it using `bpftool link show`. While the current
    approach involves accessing this information via `bpftool perf show`,
    consolidating link information for all link types in one place offers
    greater convenience. Additionally, this patch extends support to the
    generic perf event, which is not currently accommodated by
    `bpftool perf show`. Only the perf type and config are exposed to
    userspace; other attributes such as sample_period and sample_freq are
    ignored. It's important to note that if kptr_restrict does not permit it,
    the probed address will not be exposed, maintaining security measures.

    A new enum bpf_perf_event_type is introduced to help the user understand
    which struct is relevant.

    Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
    Acked-by: Jiri Olsa <jolsa@kernel.org>
    Link: https://lore.kernel.org/r/20230709025630.3735-9-laoar.shao@gmail.com
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
2023-12-14 15:22:25 +01:00
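
A user-space sketch of consuming the new information via BPF_OBJ_GET_INFO_BY_FD; field names follow the bpf_link_info perf_event layout, and error handling is minimal:

    #include <stdio.h>
    #include <linux/bpf.h>
    #include <bpf/bpf.h>

    static void show_perf_event_link(int link_fd)
    {
        struct bpf_link_info info = {};
        __u32 len = sizeof(info);

        if (bpf_obj_get_info_by_fd(link_fd, &info, &len))
            return;

        if (info.type == BPF_LINK_TYPE_PERF_EVENT &&
            info.perf_event.type == BPF_PERF_EVENT_EVENT)
            printf("perf event: type %u config %llu\n",
                   info.perf_event.event.type,
                   (unsigned long long)info.perf_event.event.config);
    }
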
Jerome Marchand d586f70039 bpf: Clear the probe_addr for uprobe
JIRA: https://issues.redhat.com/browse/RHEL-10691

commit 5125e757e62f6c1d5478db4c2b61a744060ddf3f
Author: Yafang Shao <laoar.shao@gmail.com>
Date:   Sun Jul 9 02:56:25 2023 +0000

    bpf: Clear the probe_addr for uprobe

    To avoid returning uninitialized or random values when querying the file
    descriptor (fd) and accessing probe_addr, it is necessary to clear the
    variable prior to its use.

    Fixes: 41bdc4b40e ("bpf: introduce bpf subcommand BPF_TASK_FD_QUERY")
    Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
    Acked-by: Yonghong Song <yhs@fb.com>
    Acked-by: Jiri Olsa <jolsa@kernel.org>
    Link: https://lore.kernel.org/r/20230709025630.3735-6-laoar.shao@gmail.com
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
2023-12-14 15:22:25 +01:00
Jerome Marchand f802faa9cd bpf: Support ->fill_link_info for kprobe_multi
JIRA: https://issues.redhat.com/browse/RHEL-10691

commit 7ac8d0d2619256cc13eaf4a889b3177a1607b02d
Author: Yafang Shao <laoar.shao@gmail.com>
Date:   Sun Jul 9 02:56:21 2023 +0000

    bpf: Support ->fill_link_info for kprobe_multi

    With the addition of support for fill_link_info to the kprobe_multi link,
    users will gain the ability to inspect it conveniently using the
    `bpftool link show`. This enhancement provides valuable information to the
    user, including the count of probed functions and their respective
    addresses. It's important to note that if the kptr_restrict setting does
    not permit it, the probed address will not be exposed, ensuring security.

    Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
    Acked-by: Jiri Olsa <jolsa@kernel.org>
    Acked-by: Andrii Nakryiko <andrii@kernel.org>
    Link: https://lore.kernel.org/r/20230709025630.3735-2-laoar.shao@gmail.com
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
2023-12-14 15:22:25 +01:00
Viktor Malik e2a5a2ab00 bpf: Disable preemption in bpf_event_output
JIRA: https://issues.redhat.com/browse/RHEL-9957

commit d62cc390c2e99ae267ffe4b8d7e2e08b6c758c32
Author: Jiri Olsa <jolsa@kernel.org>
Date:   Tue Jul 25 10:42:06 2023 +0200

    bpf: Disable preemption in bpf_event_output
    
    We received a report [1] of a kernel crash, which is caused by
    using nesting protection without disabled preemption.

    The bpf_event_output can be called by programs executed by the
    bpf_prog_run_array_cg function, which disables migration but
    keeps preemption enabled.

    This can cause a task to be preempted by another one inside the
    nesting protection and eventually lead to two tasks using the same
    perf_sample_data buffer and cause crashes like:
    
      BUG: kernel NULL pointer dereference, address: 0000000000000001
      #PF: supervisor instruction fetch in kernel mode
      #PF: error_code(0x0010) - not-present page
      ...
      ? perf_output_sample+0x12a/0x9a0
      ? finish_task_switch.isra.0+0x81/0x280
      ? perf_event_output+0x66/0xa0
      ? bpf_event_output+0x13a/0x190
      ? bpf_event_output_data+0x22/0x40
      ? bpf_prog_dfc84bbde731b257_cil_sock4_connect+0x40a/0xacb
      ? xa_load+0x87/0xe0
      ? __cgroup_bpf_run_filter_sock_addr+0xc1/0x1a0
      ? release_sock+0x3e/0x90
      ? sk_setsockopt+0x1a1/0x12f0
      ? udp_pre_connect+0x36/0x50
      ? inet_dgram_connect+0x93/0xa0
      ? __sys_connect+0xb4/0xe0
      ? udp_setsockopt+0x27/0x40
      ? __pfx_udp_push_pending_frames+0x10/0x10
      ? __sys_setsockopt+0xdf/0x1a0
      ? __x64_sys_connect+0xf/0x20
      ? do_syscall_64+0x3a/0x90
      ? entry_SYSCALL_64_after_hwframe+0x72/0xdc
    
    Fixing this by disabling preemption in bpf_event_output.
    
    [1] https://github.com/cilium/cilium/issues/26756
    Cc: stable@vger.kernel.org
    Reported-by: Oleg "livelace" Popov <o.popov@livelace.ru>
    Closes: https://github.com/cilium/cilium/issues/26756
    Fixes: 2a916f2f54 ("bpf: Use migrate_disable/enable in array macros and cgroup/lirc code.")
    Acked-by: Hou Tao <houtao1@huawei.com>
    Signed-off-by: Jiri Olsa <jolsa@kernel.org>
    Link: https://lore.kernel.org/r/20230725084206.580930-3-jolsa@kernel.org
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Signed-off-by: Viktor Malik <vmalik@redhat.com>
2023-10-26 17:06:22 +02:00
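
A kernel-side sketch of the pattern being fixed, with illustrative names rather than the exact upstream diff: the per-CPU nesting counter and sample buffers are only safe if the task cannot be preempted between entering and leaving the nesting level.

    static DEFINE_PER_CPU(int, sketch_output_nest_level);

    static u64 sketch_event_output(void)
    {
        int nest;

        preempt_disable();              /* the fix: no preemption in here */
        nest = this_cpu_inc_return(sketch_output_nest_level);
        /* ... pick the per-CPU perf_sample_data slot for 'nest' and call
         * perf_event_output(); with preemption disabled no other task can
         * race for the same slot ... */
        this_cpu_dec(sketch_output_nest_level);
        preempt_enable();
        return 0;
    }
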
Viktor Malik 7eec207e6d bpf: Disable preemption in bpf_perf_event_output
JIRA: https://issues.redhat.com/browse/RHEL-9957

commit f2c67a3e60d1071b65848efaa8c3b66c363dd025
Author: Jiri Olsa <jolsa@kernel.org>
Date:   Tue Jul 25 10:42:05 2023 +0200

    bpf: Disable preemption in bpf_perf_event_output
    
    The nesting protection in bpf_perf_event_output relies on disabled
    preemption, which is guaranteed for kprobes and tracepoints.
    
    However bpf_perf_event_output can also be called from uprobe context
    through the bpf_prog_run_array_sleepable function, which disables
    migration but keeps preemption enabled.

    This can cause a task to be preempted by another one inside the nesting
    protection and eventually lead to two tasks using the same
    perf_sample_data buffer and cause crashes like:
    
      kernel tried to execute NX-protected page - exploit attempt? (uid: 0)
      BUG: unable to handle page fault for address: ffffffff82be3eea
      ...
      Call Trace:
       ? __die+0x1f/0x70
       ? page_fault_oops+0x176/0x4d0
       ? exc_page_fault+0x132/0x230
       ? asm_exc_page_fault+0x22/0x30
       ? perf_output_sample+0x12b/0x910
       ? perf_event_output+0xd0/0x1d0
       ? bpf_perf_event_output+0x162/0x1d0
       ? bpf_prog_c6271286d9a4c938_krava1+0x76/0x87
       ? __uprobe_perf_func+0x12b/0x540
       ? uprobe_dispatcher+0x2c4/0x430
       ? uprobe_notify_resume+0x2da/0xce0
       ? atomic_notifier_call_chain+0x7b/0x110
       ? exit_to_user_mode_prepare+0x13e/0x290
       ? irqentry_exit_to_user_mode+0x5/0x30
       ? asm_exc_int3+0x35/0x40
    
    Fixing this by disabling preemption in bpf_perf_event_output.
    
    Cc: stable@vger.kernel.org
    Fixes: 8c7dcb84e3b7 ("bpf: implement sleepable uprobes by chaining gps")
    Acked-by: Hou Tao <houtao1@huawei.com>
    Signed-off-by: Jiri Olsa <jolsa@kernel.org>
    Link: https://lore.kernel.org/r/20230725084206.580930-2-jolsa@kernel.org
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Signed-off-by: Viktor Malik <vmalik@redhat.com>
2023-10-26 17:06:22 +02:00
Viktor Malik effda92542 bpf: Add bpf_dynptr_size
JIRA: https://issues.redhat.com/browse/RHEL-9957

commit 26662d7347a058ca497792c4b22ac91cc415cbf6
Author: Joanne Koong <joannelkoong@gmail.com>
Date:   Thu Apr 20 00:14:12 2023 -0700

    bpf: Add bpf_dynptr_size
    
    bpf_dynptr_size returns the number of usable bytes in a dynptr.
    
    Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
    Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
    Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    Acked-by: John Fastabend <john.fastabend@gmail.com>
    Link: https://lore.kernel.org/bpf/20230420071414.570108-4-joannelkoong@gmail.com

Signed-off-by: Viktor Malik <vmalik@redhat.com>
2023-10-11 12:51:42 +02:00
Artem Savkov a24c4e711b bpf: Add extra path pointer check to d_path helper
Bugzilla: https://bugzilla.redhat.com/2221599

commit f46fab0e36e611a2389d3843f34658c849b6bd60
Author: Jiri Olsa <jolsa@kernel.org>
Date:   Tue Jun 6 11:17:14 2023 -0700

    bpf: Add extra path pointer check to d_path helper
    
    Anastasios reported a crash on the stable 5.15 kernel with the following
    BPF program attached to an lsm hook:
    
      SEC("lsm.s/bprm_creds_for_exec")
      int BPF_PROG(bprm_creds_for_exec, struct linux_binprm *bprm)
      {
              struct path *path = &bprm->executable->f_path;
              char p[128] = { 0 };
    
              bpf_d_path(path, p, 128);
              return 0;
      }
    
    But bprm->executable can be NULL, so bpf_d_path call will crash:
    
      BUG: kernel NULL pointer dereference, address: 0000000000000018
      #PF: supervisor read access in kernel mode
      #PF: error_code(0x0000) - not-present page
      PGD 0 P4D 0
      Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC NOPTI
      ...
      RIP: 0010:d_path+0x22/0x280
      ...
      Call Trace:
       <TASK>
       bpf_d_path+0x21/0x60
       bpf_prog_db9cf176e84498d9_bprm_creds_for_exec+0x94/0x99
       bpf_trampoline_6442506293_0+0x55/0x1000
       bpf_lsm_bprm_creds_for_exec+0x5/0x10
       security_bprm_creds_for_exec+0x29/0x40
       bprm_execve+0x1c1/0x900
       do_execveat_common.isra.0+0x1af/0x260
       __x64_sys_execve+0x32/0x40
    
    It's a problem for all stable trees with the bpf_d_path helper, which was
    added in 5.9.
    
    This issue is fixed in current bpf code, where we identify and mark
    trusted pointers, so the above code would fail even to load.
    
    For the sake of the stable trees and to work around a potentially broken
    verifier in the future, add code that reads the path object from
    the passed pointer and verifies that it's valid in kernel space.
    
    Fixes: 6e22ab9da7 ("bpf: Add d_path helper")
    Reported-by: Anastasios Papagiannis <tasos.papagiannnis@gmail.com>
    Suggested-by: Alexei Starovoitov <ast@kernel.org>
    Signed-off-by: Jiri Olsa <jolsa@kernel.org>
    Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    Acked-by: Stanislav Fomichev <sdf@google.com>
    Acked-by: Yonghong Song <yhs@fb.com>
    Link: https://lore.kernel.org/bpf/20230606181714.532998-1-jolsa@kernel.org

Signed-off-by: Artem Savkov <asavkov@redhat.com>
2023-09-22 09:12:36 +02:00
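
A kernel-side sketch of the added check, close to the upstream fix: copy the struct path with a non-faulting read before use, so a NULL or bogus pointer returns an error instead of crashing in d_path():

    struct path copy;
    long len;

    len = copy_from_kernel_nofault(&copy, path, sizeof(*path));
    if (unlikely(len < 0))
        return len;
    /* ... then call d_path(&copy, buf, sz) as before ... */
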
Artem Savkov 74f2bb6c6c bpf: Make bpf_get_current_[ancestor_]cgroup_id() available for all program types
Bugzilla: https://bugzilla.redhat.com/2221599

commit c501bf55c88b834adefda870c7c092ec9052a437
Author: Tejun Heo <tj@kernel.org>
Date:   Thu Mar 2 09:42:59 2023 -1000

    bpf: Make bpf_get_current_[ancestor_]cgroup_id() available for all program types
    
    These helpers are safe to call from any context and there's no reason to
    restrict access to them. Remove them from bpf_trace and filter lists and add
    to bpf_base_func_proto() under perfmon_capable().
    
    v2: After consulting with Andrii, relocated in bpf_base_func_proto() so that
        they require bpf_capable() but not perfmon_capable() as it doesn't read
        from or affect others on the system.
    
    Signed-off-by: Tejun Heo <tj@kernel.org>
    Cc: Andrii Nakryiko <andrii@kernel.org>
    Link: https://lore.kernel.org/r/ZAD8QyoszMZiTzBY@slm.duckdns.org
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Signed-off-by: Artem Savkov <asavkov@redhat.com>
2023-09-22 09:12:10 +02:00
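
A BPF-side sketch of what the relaxation enables, e.g. calling the helper from a cgroup skb program; the hook and program are illustrative and the loader is assumed to have the required capability:

    #include "vmlinux.h"
    #include <bpf/bpf_helpers.h>

    char LICENSE[] SEC("license") = "GPL";

    SEC("cgroup_skb/egress")
    int count_by_cgroup(struct __sk_buff *skb)
    {
        __u64 cgid = bpf_get_current_cgroup_id();

        bpf_printk("egress from cgroup %llu", cgid);
        return 1;    /* allow the packet */
    }
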
Jan Stancek e341c7e709 Merge: bpf, xdp: update to 6.3
MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/2583

Rebase bpf and xdp to 6.3.

Bugzilla: https://bugzilla.redhat.com/2178930

Signed-off-by: Viktor Malik <vmalik@redhat.com>

Approved-by: Rafael Aquini <aquini@redhat.com>
Approved-by: Artem Savkov <asavkov@redhat.com>
Approved-by: Jason Wang <jasowang@redhat.com>
Approved-by: Jiri Benc <jbenc@redhat.com>
Approved-by: Jan Stancek <jstancek@redhat.com>
Approved-by: Baoquan He <5820488-baoquan_he@users.noreply.gitlab.com>

Signed-off-by: Jan Stancek <jstancek@redhat.com>
2023-06-28 07:52:45 +02:00
Michael Petlan a94aae2cd6 perf/core: Add perf_sample_save_raw_data() helper
Bugzilla: https://bugzilla.redhat.com/2177183

upstream
========
commit 0a9081cf0a11770f6b0affd377db8caa3ec4c793
Author: Namhyung Kim <namhyung@kernel.org>
Date: Tue Jan 17 22:05:54 2023 -0800

description
===========
When we save the raw_data to the perf sample data, we need to update
the sample flags and the dynamic size.  To make sure this is done
consistently, add the perf_sample_save_raw_data() helper and convert
all call sites.

    Suggested-by: Peter Zijlstra <peterz@infradead.org>
    Signed-off-by: Namhyung Kim <namhyung@kernel.org>
    Signed-off-by: Ingo Molnar <mingo@kernel.org>
    Tested-by: Jiri Olsa <jolsa@kernel.org>
    Acked-by: Jiri Olsa <jolsa@kernel.org>
    Acked-by: Peter Zijlstra <peterz@infradead.org>
    Link: https://lore.kernel.org/r/20230118060559.615653-4-namhyung@kernel.org

Signed-off-by: Michael Petlan <mpetlan@redhat.com>
2023-06-14 12:23:17 +02:00
Viktor Malik 23c9904275 bpf: Add __bpf_kfunc tag to all kfuncs
Bugzilla: https://bugzilla.redhat.com/2178930

commit 400031e05adfcef9e80eca80bdfc3f4b63658be4
Author: David Vernet <void@manifault.com>
Date:   Wed Feb 1 11:30:15 2023 -0600

    bpf: Add __bpf_kfunc tag to all kfuncs

    Now that we have the __bpf_kfunc tag, we should add it to all
    existing kfuncs to ensure that they'll never be elided in LTO builds.

    Signed-off-by: David Vernet <void@manifault.com>
    Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    Acked-by: Stanislav Fomichev <sdf@google.com>
    Link: https://lore.kernel.org/bpf/20230201173016.342758-4-void@manifault.com

Signed-off-by: Viktor Malik <vmalik@redhat.com>
2023-06-13 22:45:20 +02:00
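
A kernel-side sketch of the tag in use; the function itself is made up, only the __bpf_kfunc placement matters:

    __bpf_kfunc void bpf_example_obj_put(struct example_obj *obj)
    {
        /* ... release the object; the tag keeps the symbol from being
         * elided or inlined away in LTO builds ... */
    }
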
Viktor Malik d91051c0e7 bpf: Change modules resolving for kprobe multi link
Bugzilla: https://bugzilla.redhat.com/2178930

Conflicts: conflict due to not backported 07cc2c931e8e
           ("livepatch: Improve the search performance of
           module_kallsyms_on_each_symbol()").

commit 6a5f2d6ee8d515d5912e33d63a7386d03854a655
Author: Jiri Olsa <jolsa@kernel.org>
Date:   Mon Jan 16 11:10:09 2023 +0100

    bpf: Change modules resolving for kprobe multi link

    We currently use module_kallsyms_on_each_symbol, which iterates all
    modules/symbols, and we try to look up each such address in the user
    provided symbols/addresses to get the list of used modules.

    This fix instead only iterates the provided kprobe addresses and calls
    __module_address on each to get the list of used modules. This turned
    out to be simpler and also a bit faster.

    On my setup with workload (executed 10 times):

       # test_progs -t kprobe_multi_bench_attach/modules

    Current code:

     Performance counter stats for './test.sh' (5 runs):

        76,081,161,596      cycles:k                   ( +-  0.47% )

               18.3867 +- 0.0992 seconds time elapsed  ( +-  0.54% )

    With the fix:

     Performance counter stats for './test.sh' (5 runs):

        74,079,889,063      cycles:k                   ( +-  0.04% )

               17.8514 +- 0.0218 seconds time elapsed  ( +-  0.12% )

    Signed-off-by: Jiri Olsa <jolsa@kernel.org>
    Reviewed-by: Zhen Lei <thunder.leizhen@huawei.com>
    Reviewed-by: Petr Mladek <pmladek@suse.com>
    Link: https://lore.kernel.org/r/20230116101009.23694-4-jolsa@kernel.org
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Signed-off-by: Viktor Malik <vmalik@redhat.com>
2023-06-13 22:44:40 +02:00
Viktor Malik 1876dbfb9e bpf: Remove trace_printk_lock
Bugzilla: https://bugzilla.redhat.com/2178930

commit e2bb9e01d589f7fa82573aedd2765ff9b277816a
Author: Jiri Olsa <jolsa@kernel.org>
Date:   Thu Dec 15 22:44:30 2022 +0100

    bpf: Remove trace_printk_lock
    
    Both bpf_trace_printk and bpf_trace_vprintk helpers use static buffer guarded
    with trace_printk_lock spin lock.
    
    The spin lock contention causes issues with bpf programs attached to
    contention_begin tracepoint [1][2].
    
    Andrii suggested we could get rid of the contention by using trylock, but we
    could actually get rid of the spinlock completely by using percpu buffers the
    same way as for bin_args in bpf_bprintf_prepare function.
    
    Adding new return 'buf' argument to struct bpf_bprintf_data and making
    bpf_bprintf_prepare to return also the buffer for printk helpers.
    
      [1] https://lore.kernel.org/bpf/CACkBjsakT_yWxnSWr4r-0TpPvbKm9-OBmVUhJb7hV3hY8fdCkw@mail.gmail.com/
      [2] https://lore.kernel.org/bpf/CACkBjsaCsTovQHFfkqJKto6S4Z8d02ud1D7MPESrHa1cVNNTrw@mail.gmail.com/
    
    Reported-by: Hao Sun <sunhao.th@gmail.com>
    Suggested-by: Andrii Nakryiko <andrii@kernel.org>
    Signed-off-by: Jiri Olsa <jolsa@kernel.org>
    Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    Acked-by: Yonghong Song <yhs@fb.com>
    Link: https://lore.kernel.org/bpf/20221215214430.1336195-4-jolsa@kernel.org

Signed-off-by: Viktor Malik <vmalik@redhat.com>
2023-06-13 22:44:23 +02:00
Viktor Malik 5741f9f020 bpf: Do cleanup in bpf_bprintf_cleanup only when needed
Bugzilla: https://bugzilla.redhat.com/2178930

commit f19a4050455aad847fb93f18dc1fe502eb60f989
Author: Jiri Olsa <jolsa@kernel.org>
Date:   Thu Dec 15 22:44:29 2022 +0100

    bpf: Do cleanup in bpf_bprintf_cleanup only when needed
    
    Currently we always cleanup/decrement bpf_bprintf_nest_level variable
    in bpf_bprintf_cleanup if it's > 0.
    
    There's a possible scenario where this could cause a problem, when
    bpf_bprintf_prepare does not get a bin_args buffer (because num_args is 0)
    and the following bpf_bprintf_cleanup call decrements the
    bpf_bprintf_nest_level variable, like:
    
      in task context:
        bpf_bprintf_prepare(num_args != 0) increments 'bpf_bprintf_nest_level = 1'
        -> first irq :
           bpf_bprintf_prepare(num_args == 0)
           bpf_bprintf_cleanup decrements 'bpf_bprintf_nest_level = 0'
        -> second irq:
           bpf_bprintf_prepare(num_args != 0) bpf_bprintf_nest_level = 1
           gets same buffer as task context above
    
    Adding check to bpf_bprintf_cleanup and doing the real cleanup only if we
    got bin_args data in the first place.
    
    Signed-off-by: Jiri Olsa <jolsa@kernel.org>
    Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    Acked-by: Yonghong Song <yhs@fb.com>
    Link: https://lore.kernel.org/bpf/20221215214430.1336195-3-jolsa@kernel.org

Signed-off-by: Viktor Malik <vmalik@redhat.com>
2023-06-13 22:44:23 +02:00
Viktor Malik e1af8144ba bpf: Add struct for bin_args arg in bpf_bprintf_prepare
Bugzilla: https://bugzilla.redhat.com/2178930

commit 78aa1cc9404399a15d2a1205329c6a06236f5378
Author: Jiri Olsa <jolsa@kernel.org>
Date:   Thu Dec 15 22:44:28 2022 +0100

    bpf: Add struct for bin_args arg in bpf_bprintf_prepare
    
    Adding struct bpf_bprintf_data to hold bin_args argument for
    bpf_bprintf_prepare function.
    
    We will add another return argument to bpf_bprintf_prepare and
    pass the struct to bpf_bprintf_cleanup for proper cleanup in
    following changes.
    
    Signed-off-by: Jiri Olsa <jolsa@kernel.org>
    Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    Acked-by: Yonghong Song <yhs@fb.com>
    Link: https://lore.kernel.org/bpf/20221215214430.1336195-2-jolsa@kernel.org

Signed-off-by: Viktor Malik <vmalik@redhat.com>
2023-06-13 22:44:22 +02:00
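
A sketch of the structure this patch introduces; the 'buf' member mentioned in the trace_printk_lock removal entry above is not yet part of it at this point:

    struct bpf_bprintf_data {
        u32  *bin_args;
        bool  get_bin_args;
    };

    /* callers then pass &data to bpf_bprintf_prepare() instead of a bare
     * bin_args pointer, and the follow-up changes hand the same struct to
     * bpf_bprintf_cleanup() */
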
Michael Petlan 353e95d57c bpf: Fix sample_flags for bpf_perf_event_output
Bugzilla: https://bugzilla.redhat.com/2177180

upstream
========
commit 21da7472a040420f2dc624ffec70291a72c5d6a6
Author: Sumanth Korikkar <sumanthk@linux.ibm.com>
Date: Fri Oct 7 10:13:27 2022 +0200

description
===========
* Raw data is also filled by bpf_perf_event_output.
* Add sample_flags to indicate raw data.
* This eliminates the segfaults as shown below:
  Run ./samples/bpf/trace_output
  BUG pid 9 cookie 1001000000004 sized 4
  BUG pid 9 cookie 1001000000004 sized 4
  BUG pid 9 cookie 1001000000004 sized 4
  Segmentation fault (core dumped)

Fixes: 838d9bb62d13 ("perf: Use sample_flags for raw_data")
    Signed-off-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Acked-by: Namhyung Kim <namhyung@kernel.org>
    Link: https://lkml.kernel.org/r/20221007081327.1047552-1-sumanthk@linux.ibm.com

Signed-off-by: Michael Petlan <mpetlan@redhat.com>
2023-06-05 10:02:57 +02:00
Michael Petlan f61617722f bpf: Check flags for branch stack in bpf_read_branch_records helper
Bugzilla: https://bugzilla.redhat.com/2177180

upstream
========
commit cce6a2d7e0e494c453ad73e1e78bd50684f20cca
Author: Jiri Olsa <jolsa@kernel.org>
Date: Tue Sep 27 22:32:59 2022 +0200

description
===========
Recent commit [1] changed branch stack data indication from
br_stack pointer to sample_flags in perf_sample_data struct.

We need to check sample_flags for PERF_SAMPLE_BRANCH_STACK
bit for valid branch stack data.

[1] a9a931e26668 ("perf: Use sample_flags for branch stack")

Fixes: a9a931e26668 ("perf: Use sample_flags for branch stack")
    Signed-off-by: Jiri Olsa <jolsa@kernel.org>
    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
    Link: https://lore.kernel.org/r/20220927203259.590950-1-jolsa@kernel.org

Signed-off-by: Michael Petlan <mpetlan@redhat.com>
2023-06-05 10:02:55 +02:00
Jerome Marchand 3dcfdcacd5 bpf: Fix a possible task gone issue with bpf_send_signal[_thread]() helpers
Bugzilla: https://bugzilla.redhat.com/2177177

commit bdb7fdb0aca8b96cef9995d3a57e251c2289322f
Author: Yonghong Song <yhs@fb.com>
Date:   Wed Jan 18 12:48:15 2023 -0800

    bpf: Fix a possible task gone issue with bpf_send_signal[_thread]() helpers

    In current bpf_send_signal() and bpf_send_signal_thread() helper
    implementation, irq_work is used to handle nmi context. Hao Sun
    reported in [1] that the current task at the entry of the helper
    might be gone during irq_work callback processing. To fix the issue,
    a reference is acquired for the current task before enqueuing into
    the irq_work so that the queued task is still available during
    irq_work callback processing.

      [1] https://lore.kernel.org/bpf/20230109074425.12556-1-sunhao.th@gmail.com/

    Fixes: 8b401f9ed2 ("bpf: implement bpf_send_signal() helper")
    Tested-by: Hao Sun <sunhao.th@gmail.com>
    Reported-by: Hao Sun <sunhao.th@gmail.com>
    Signed-off-by: Yonghong Song <yhs@fb.com>
    Link: https://lore.kernel.org/r/20230118204815.3331855-1-yhs@fb.com
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
2023-04-28 11:43:20 +02:00
Jerome Marchand fb3144136a bpf: Skip task with pid=1 in send_signal_common()
Bugzilla: https://bugzilla.redhat.com/2177177

commit a3d81bc1eaef48e34dd0b9b48eefed9e02a06451
Author: Hao Sun <sunhao.th@gmail.com>
Date:   Fri Jan 6 16:48:38 2023 +0800

    bpf: Skip task with pid=1 in send_signal_common()

    The following kernel panic can be triggered when a task with pid=1 attaches
    a prog that attempts to send a killing signal to itself; also see [1] for
    more details:

      Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
      CPU: 3 PID: 1 Comm: systemd Not tainted 6.1.0-09652-g59fe41b5255f #148
      Call Trace:
      <TASK>
      __dump_stack lib/dump_stack.c:88 [inline]
      dump_stack_lvl+0x100/0x178 lib/dump_stack.c:106
      panic+0x2c4/0x60f kernel/panic.c:275
      do_exit.cold+0x63/0xe4 kernel/exit.c:789
      do_group_exit+0xd4/0x2a0 kernel/exit.c:950
      get_signal+0x2460/0x2600 kernel/signal.c:2858
      arch_do_signal_or_restart+0x78/0x5d0 arch/x86/kernel/signal.c:306
      exit_to_user_mode_loop kernel/entry/common.c:168 [inline]
      exit_to_user_mode_prepare+0x15f/0x250 kernel/entry/common.c:203
      __syscall_exit_to_user_mode_work kernel/entry/common.c:285 [inline]
      syscall_exit_to_user_mode+0x1d/0x50 kernel/entry/common.c:296
      do_syscall_64+0x44/0xb0 arch/x86/entry/common.c:86
      entry_SYSCALL_64_after_hwframe+0x63/0xcd

    So skip task with pid=1 in bpf_send_signal_common() to avoid the panic.

      [1] https://lore.kernel.org/bpf/20221222043507.33037-1-sunhao.th@gmail.com

    Signed-off-by: Hao Sun <sunhao.th@gmail.com>
    Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    Acked-by: Stanislav Fomichev <sdf@google.com>
    Link: https://lore.kernel.org/bpf/20230106084838.12690-1-sunhao.th@gmail.com

Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
2023-04-28 11:43:20 +02:00
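
A kernel-side sketch of the guard in bpf_send_signal_common(); upstream uses the init-task check rather than comparing a literal pid:

    if (unlikely(is_global_init(current)))
        return -EPERM;
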
Jerome Marchand 868564cc57 bpf: Introduce might_sleep field in bpf_func_proto
Bugzilla: https://bugzilla.redhat.com/2177177

commit 01685c5bddaa6df3d662c8afed5e5289fcc68e5a
Author: Yonghong Song <yhs@fb.com>
Date:   Wed Nov 23 21:32:11 2022 -0800

    bpf: Introduce might_sleep field in bpf_func_proto

    Introduce bpf_func_proto->might_sleep to indicate that a particular helper
    might sleep. This will make it easier to later check whether a helper
    might be sleepable or not.

    Acked-by: Martin KaFai Lau <martin.lau@kernel.org>
    Signed-off-by: Yonghong Song <yhs@fb.com>
    Link: https://lore.kernel.org/r/20221124053211.2373553-1-yhs@fb.com
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
2023-04-28 11:43:12 +02:00
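
A sketch of a helper proto opting in; the exact field placement inside struct bpf_func_proto is elided, and the helper shown is just an example of one that may sleep:

    const struct bpf_func_proto bpf_copy_from_user_proto = {
        .func         = bpf_copy_from_user,
        .gpl_only     = false,
        .might_sleep  = true,          /* the new field */
        .ret_type     = RET_INTEGER,
        .arg1_type    = ARG_PTR_TO_UNINIT_MEM,
        .arg2_type    = ARG_CONST_SIZE_OR_ZERO,
        .arg3_type    = ARG_ANYTHING,
    };
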
Jerome Marchand a52cc75452 bpf: Allow trusted pointers to be passed to KF_TRUSTED_ARGS kfuncs
Bugzilla: https://bugzilla.redhat.com/2177177

commit 3f00c52393445ed49aadc1a567aa502c6333b1a1
Author: David Vernet <void@manifault.com>
Date:   Sat Nov 19 23:10:02 2022 -0600

    bpf: Allow trusted pointers to be passed to KF_TRUSTED_ARGS kfuncs

    Kfuncs currently support specifying the KF_TRUSTED_ARGS flag to signal
    to the verifier that it should enforce that a BPF program passes it a
    "safe", trusted pointer. Currently, "safe" means that the pointer is
    either PTR_TO_CTX, or is refcounted. There may be cases, however, where
    the kernel passes a BPF program a safe / trusted pointer to an object
    that the BPF program wishes to use as a kptr, but because the object
    does not yet have a ref_obj_id from the perspective of the verifier, the
    program would be unable to pass it to a KF_ACQUIRE | KF_TRUSTED_ARGS
    kfunc.

    The solution is to expand the set of pointers that are considered
    trusted according to KF_TRUSTED_ARGS, so that programs can invoke kfuncs
    with these pointers without getting rejected by the verifier.

    There is already a PTR_UNTRUSTED flag that is set in some scenarios,
    such as when a BPF program reads a kptr directly from a map
    without performing a bpf_kptr_xchg() call. These pointers of course can
    and should be rejected by the verifier. Unfortunately, however,
    PTR_UNTRUSTED does not cover all the cases for safety that need to
    be addressed to adequately protect kfuncs. Specifically, pointers
    obtained by a BPF program "walking" a struct are _not_ considered
    PTR_UNTRUSTED according to BPF. For example, say that we were to add a
    kfunc called bpf_task_acquire(), with KF_ACQUIRE | KF_TRUSTED_ARGS, to
    acquire a struct task_struct *. If we only used PTR_UNTRUSTED to signal
    that a task was unsafe to pass to a kfunc, the verifier would mistakenly
    allow the following unsafe BPF program to be loaded:

    SEC("tp_btf/task_newtask")
    int BPF_PROG(unsafe_acquire_task,
                 struct task_struct *task,
                 u64 clone_flags)
    {
            struct task_struct *acquired, *nested;

            nested = task->last_wakee;

            /* Would not be rejected by the verifier. */
            acquired = bpf_task_acquire(nested);
            if (!acquired)
                    return 0;

            bpf_task_release(acquired);
            return 0;
    }

    To address this, this patch defines a new type flag called PTR_TRUSTED
    which tracks whether a PTR_TO_BTF_ID pointer is safe to pass to a
    KF_TRUSTED_ARGS kfunc or a BPF helper function. PTR_TRUSTED pointers are
    passed directly from the kernel as a tracepoint or struct_ops callback
    argument. Any nested pointer that is obtained from walking a PTR_TRUSTED
    pointer is no longer PTR_TRUSTED. From the example above, the struct
    task_struct *task argument is PTR_TRUSTED, but the 'nested' pointer
    obtained from 'task->last_wakee' is not PTR_TRUSTED.

    A subsequent patch will add kfuncs for storing a task kfunc as a kptr,
    and then another patch will add selftests to validate.

    Signed-off-by: David Vernet <void@manifault.com>
    Link: https://lore.kernel.org/r/20221120051004.3605026-3-void@manifault.com
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
2023-04-28 11:43:08 +02:00
Jerome Marchand dcf538d57d bpf: Implement cgroup storage available to non-cgroup-attached bpf progs
Bugzilla: https://bugzilla.redhat.com/2177177

Conflicts: Context change from missing commit 7f203bc89eb6 ("cgroup:
Replace cgroup->ancestor_ids[] with ->ancestors[]")

commit c4bcfb38a95edb1021a53f2d0356a78120ecfbe4
Author: Yonghong Song <yhs@fb.com>
Date:   Tue Oct 25 21:28:50 2022 -0700

    bpf: Implement cgroup storage available to non-cgroup-attached bpf progs

    Similar to sk/inode/task storage, implement similar cgroup local storage.

    There already exists a local storage implementation for cgroup-attached
    bpf programs.  See map type BPF_MAP_TYPE_CGROUP_STORAGE and helper
    bpf_get_local_storage(). But there are use cases where non-cgroup-attached
    bpf progs want to access cgroup local storage data. For example,
    a tc egress prog has access to sk and cgroup. It is possible to use
    sk local storage to emulate cgroup local storage by storing data in the
    socket. But this is wasteful, as there could be lots of sockets belonging
    to a particular cgroup. Alternatively, a separate map can be created with
    cgroup id as the key.
    But this will introduce additional overhead to manipulate the new map.
    A cgroup local storage, similar to existing sk/inode/task storage,
    should help for this use case.

    The life-cycle of storage is managed with the life-cycle of the
    cgroup struct.  i.e. the storage is destroyed along with the owning cgroup
    with a call to bpf_cgrp_storage_free() when cgroup itself
    is deleted.

    The userspace map operations can be done by using a cgroup fd as a key
    passed to the lookup, update and delete operations.

    Typically, the following code is used to get the current cgroup:
        struct task_struct *task = bpf_get_current_task_btf();
        ... task->cgroups->dfl_cgrp ...
    and in structure task_struct definition:
        struct task_struct {
            ....
            struct css_set __rcu            *cgroups;
            ....
        }
    With sleepable program, accessing task->cgroups is not protected by rcu_read_lock.
    So the current implementation only supports non-sleepable program and supporting
    sleepable program will be the next step together with adding rcu_read_lock
    protection for rcu tagged structures.

    Since map name BPF_MAP_TYPE_CGROUP_STORAGE has been used for old cgroup local
    storage support, the new map name BPF_MAP_TYPE_CGRP_STORAGE is used
    for cgroup storage available to non-cgroup-attached bpf programs. The old
    cgroup storage supports bpf_get_local_storage() helper to get the cgroup data.
    The new cgroup storage helper bpf_cgrp_storage_get() can provide similar
    functionality. While old cgroup storage pre-allocates storage memory, the new
    mechanism can also pre-allocate with a user space bpf_map_update_elem() call
    to avoid potential run-time memory allocation failure.
    Therefore, the new cgroup storage can provide all functionality w.r.t.
    the old one. So in uapi bpf.h, the old BPF_MAP_TYPE_CGROUP_STORAGE is aliased to
    BPF_MAP_TYPE_CGROUP_STORAGE_DEPRECATED to indicate the old cgroup storage can
    be deprecated since the new one can provide the same functionality.

    Acked-by: David Vernet <void@manifault.com>
    Signed-off-by: Yonghong Song <yhs@fb.com>
    Link: https://lore.kernel.org/r/20221026042850.673791-1-yhs@fb.com
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
2023-04-28 11:42:58 +02:00
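
A BPF-side sketch of the new map type and helper, counting events per cgroup from a non-sleepable tracing program; map and program names are made up:

    #include "vmlinux.h"
    #include <bpf/bpf_helpers.h>
    #include <bpf/bpf_tracing.h>

    char LICENSE[] SEC("license") = "GPL";

    struct {
        __uint(type, BPF_MAP_TYPE_CGRP_STORAGE);
        __uint(map_flags, BPF_F_NO_PREALLOC);
        __type(key, int);
        __type(value, long);
    } cgrp_counts SEC(".maps");

    SEC("tp_btf/sys_enter")
    int BPF_PROG(count_syscalls, struct pt_regs *regs, long id)
    {
        struct task_struct *task = bpf_get_current_task_btf();
        long *cnt;

        cnt = bpf_cgrp_storage_get(&cgrp_counts, task->cgroups->dfl_cgrp, 0,
                                   BPF_LOCAL_STORAGE_GET_F_CREATE);
        if (cnt)
            __sync_fetch_and_add(cnt, 1);
        return 0;
    }
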
Jerome Marchand 7474c8a3a8 bpf: Add new bpf_task_storage_delete proto with no deadlock detection
Bugzilla: https://bugzilla.redhat.com/2177177

commit 8a7dac37f27a3dfbd814bf29a73d6417db2c81d9
Author: Martin KaFai Lau <martin.lau@kernel.org>
Date:   Tue Oct 25 11:45:22 2022 -0700

    bpf: Add new bpf_task_storage_delete proto with no deadlock detection

    The bpf_lsm and bpf_iter programs do not recur in a way that would cause a deadlock.
    The situation is similar to the bpf_pid_task_storage_delete_elem()
    which is called from the syscall map_delete_elem.  It does not need
    deadlock detection.  Otherwise, it will cause unnecessary failure
    when calling the bpf_task_storage_delete() helper.

    This patch adds bpf_task_storage_delete proto that does not do deadlock
    detection.  It will be used by bpf_lsm and bpf_iter program.

    Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
    Link: https://lore.kernel.org/r/20221025184524.3526117-8-martin.lau@linux.dev
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
2023-04-28 11:42:57 +02:00
Jerome Marchand b6c28ff998 bpf: Add new bpf_task_storage_get proto with no deadlock detection
Bugzilla: https://bugzilla.redhat.com/2177177

commit 4279adb094a17132423f1271c3d11b593fc2327e
Author: Martin KaFai Lau <martin.lau@kernel.org>
Date:   Tue Oct 25 11:45:20 2022 -0700

    bpf: Add new bpf_task_storage_get proto with no deadlock detection

    The bpf_lsm and bpf_iter programs do not recur in a way that would cause a deadlock.
    The situation is similar to the bpf_pid_task_storage_lookup_elem()
    which is called from the syscall map_lookup_elem.  It does not need
    deadlock detection.  Otherwise, it will cause unnecessary failure
    when calling the bpf_task_storage_get() helper.

    This patch adds bpf_task_storage_get proto that does not do deadlock
    detection.  It will be used by bpf_lsm and bpf_iter programs.

    Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
    Link: https://lore.kernel.org/r/20221025184524.3526117-6-martin.lau@linux.dev
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
2023-04-28 11:42:57 +02:00
Jerome Marchand 281950baf6 bpf: Append _recur naming to the bpf_task_storage helper proto
Bugzilla: https://bugzilla.redhat.com/2177177

commit 0593dd34e53489557569d5e6d27371b49aa9b41f
Author: Martin KaFai Lau <martin.lau@kernel.org>
Date:   Tue Oct 25 11:45:17 2022 -0700

    bpf: Append _recur naming to the bpf_task_storage helper proto

    This patch adds the "_recur" naming to the bpf_task_storage_{get,delete}
    proto.  In a later patch, they will only be used by the tracing
    programs that require deadlock detection because a tracing
    prog may use bpf_task_storage_{get,delete} recursively and cause a
    deadlock.

    Another following patch will add a different helper proto for the non
    tracing programs because they do not need the deadlock prevention.
    This patch does the rename to prepare for these future proto
    additions.

    Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
    Link: https://lore.kernel.org/r/20221025184524.3526117-3-martin.lau@linux.dev
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
2023-04-28 11:42:57 +02:00
Jerome Marchand 1af3666226 bpf: Take module reference on kprobe_multi link
Bugzilla: https://bugzilla.redhat.com/2177177

commit e22061b2d3095c12f90336479f24bf5eeb70e1bd
Author: Jiri Olsa <jolsa@kernel.org>
Date:   Tue Oct 25 15:41:44 2022 +0200

    bpf: Take module reference on kprobe_multi link

    Currently we allow creating a kprobe multi link on a function from a kernel
    module, but we don't take a module reference to ensure it's not
    unloaded while we are tracing it.

    The multi kprobe link is based on the fprobe/ftrace layer, which takes a
    different approach and releases ftrace hooks when a module is unloaded,
    even if there's a tracer registered on top of it.

    Adding code that gathers all the related modules for the link and takes
    their references before it's attached. All kernel module references are
    released after the link is unregistered.

    Note that we do it the same way already for trampoline probes
    (but for single address).

    Acked-by: Andrii Nakryiko <andrii@kernel.org>
    Signed-off-by: Jiri Olsa <jolsa@kernel.org>
    Link: https://lore.kernel.org/r/20221025134148.3300700-5-jolsa@kernel.org
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
2023-04-28 11:42:56 +02:00
Jerome Marchand 68727b1864 bpf: Rename __bpf_kprobe_multi_cookie_cmp to bpf_kprobe_multi_addrs_cmp
Bugzilla: https://bugzilla.redhat.com/2177177

commit 1a1b0716d36d21f8448bd7d3f1c0ade7230bb294
Author: Jiri Olsa <jolsa@kernel.org>
Date:   Tue Oct 25 15:41:43 2022 +0200

    bpf: Rename __bpf_kprobe_multi_cookie_cmp to bpf_kprobe_multi_addrs_cmp

    Renaming __bpf_kprobe_multi_cookie_cmp to bpf_kprobe_multi_addrs_cmp,
    because it's more suitable to current and upcoming code.

    Acked-by: Song Liu <song@kernel.org>
    Signed-off-by: Jiri Olsa <jolsa@kernel.org>
    Link: https://lore.kernel.org/r/20221025134148.3300700-4-jolsa@kernel.org
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
2023-04-28 11:42:56 +02:00
Artem Savkov fee78f87aa bpf: Prevent bpf program recursion for raw tracepoint probes
Bugzilla: https://bugzilla.redhat.com/2166911

commit 05b24ff9b2cfabfcfd951daaa915a036ab53c9e1
Author: Jiri Olsa <jolsa@kernel.org>
Date:   Fri Sep 16 09:19:14 2022 +0200

    bpf: Prevent bpf program recursion for raw tracepoint probes
    
    We got a report from syzbot [1] about warnings that were caused by a
    bpf program attached to the contention_begin raw tracepoint triggering
    the same tracepoint by using the bpf_trace_printk helper, which takes
    the trace_printk_lock lock.
    
     Call Trace:
      <TASK>
      ? trace_event_raw_event_bpf_trace_printk+0x5f/0x90
      bpf_trace_printk+0x2b/0xe0
      bpf_prog_a9aec6167c091eef_prog+0x1f/0x24
      bpf_trace_run2+0x26/0x90
      native_queued_spin_lock_slowpath+0x1c6/0x2b0
      _raw_spin_lock_irqsave+0x44/0x50
      bpf_trace_printk+0x3f/0xe0
      bpf_prog_a9aec6167c091eef_prog+0x1f/0x24
      bpf_trace_run2+0x26/0x90
      native_queued_spin_lock_slowpath+0x1c6/0x2b0
      _raw_spin_lock_irqsave+0x44/0x50
      bpf_trace_printk+0x3f/0xe0
      bpf_prog_a9aec6167c091eef_prog+0x1f/0x24
      bpf_trace_run2+0x26/0x90
      native_queued_spin_lock_slowpath+0x1c6/0x2b0
      _raw_spin_lock_irqsave+0x44/0x50
      bpf_trace_printk+0x3f/0xe0
      bpf_prog_a9aec6167c091eef_prog+0x1f/0x24
      bpf_trace_run2+0x26/0x90
      native_queued_spin_lock_slowpath+0x1c6/0x2b0
      _raw_spin_lock_irqsave+0x44/0x50
      __unfreeze_partials+0x5b/0x160
      ...
    
    This can be reproduced by attaching a bpf program as a raw tracepoint on
    the contention_begin tracepoint. The bpf prog calls the bpf_trace_printk
    helper. Then, by running perf bench, the spin lock code is forced to
    take the slow path and call the contention_begin tracepoint.

    Fixing this by skipping execution of the bpf program if it's
    already running, using the bpf prog 'active' field, which is
    currently used by trampoline programs for the same reason.
    
    Moving bpf_prog_inc_misses_counter to syscall.c because
    trampoline.c is compiled in just for CONFIG_BPF_JIT option.
    
    Reviewed-by: Stanislav Fomichev <sdf@google.com>
    Reported-by: syzbot+2251879aa068ad9c960d@syzkaller.appspotmail.com
    [1] https://lore.kernel.org/bpf/YxhFe3EwqchC%2FfYf@krava/T/#t
    Signed-off-by: Jiri Olsa <jolsa@kernel.org>
    Link: https://lore.kernel.org/r/20220916071914.7156-1-jolsa@kernel.org
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Signed-off-by: Artem Savkov <asavkov@redhat.com>
2023-03-06 14:54:39 +01:00
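
A kernel-side sketch of the guard, close to the change described above: the raw tracepoint runner skips the program if it is already active on this CPU and counts the miss.

    if (unlikely(this_cpu_inc_return(*(prog->active)) != 1)) {
        bpf_prog_inc_misses_counter(prog);
        goto out;
    }
    rcu_read_lock();
    (void) bpf_prog_run(prog, args);
    rcu_read_unlock();
    out:
    this_cpu_dec(*(prog->active));
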
Artem Savkov 9da89b0f62 bpf: Return value in kprobe get_func_ip only for entry address
Bugzilla: https://bugzilla.redhat.com/2166911

commit 0e253f7e558a3e250902ba2034091e0185448836
Author: Jiri Olsa <jolsa@kernel.org>
Date:   Mon Sep 26 17:33:39 2022 +0200

    bpf: Return value in kprobe get_func_ip only for entry address
    
    Changing return value of kprobe's version of bpf_get_func_ip
    to return zero if the attach address is not on the function's
    entry point.
    
    For kprobes attached in the middle of the function we can't easily
    get to the function address, especially now with CONFIG_X86_KERNEL_IBT
    support.

    If the user cares about the current IP for kprobes attached within the
    function body, they can get it with PT_REGS_IP(ctx).
    
    Suggested-by: Andrii Nakryiko <andrii@kernel.org>
    Acked-by: Andrii Nakryiko <andrii@kernel.org>
    Acked-by: Martynas Pumputis <m@lambda.lt>
    Signed-off-by: Jiri Olsa <jolsa@kernel.org>
    Link: https://lore.kernel.org/r/20220926153340.1621984-6-jolsa@kernel.org
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Signed-off-by: Artem Savkov <asavkov@redhat.com>
2023-03-06 14:54:18 +01:00
Artem Savkov f66e5962b4 bpf: Adjust kprobe_multi entry_ip for CONFIG_X86_KERNEL_IBT
Bugzilla: https://bugzilla.redhat.com/2166911

commit c09eb2e578eb1668bbc84dc07e8d8bd6f04b9a02
Author: Jiri Olsa <jolsa@kernel.org>
Date:   Mon Sep 26 17:33:38 2022 +0200

    bpf: Adjust kprobe_multi entry_ip for CONFIG_X86_KERNEL_IBT
    
    Martynas reported bpf_get_func_ip returning a +4 address when the
    CONFIG_X86_KERNEL_IBT option is enabled.

    When CONFIG_X86_KERNEL_IBT is enabled we'll have an endbr instruction
    at the function entry, which breaks the return value of the
    bpf_get_func_ip() helper, which should return the function address.

    There's a short-term workaround for the kprobe_multi bpf program made by
    Alexei [1], but we need this fixup also for bpf_get_attach_cookie,
    which returns a cookie based on the entry_ip value.
    
    Moving the fixup in the fprobe handler, so both bpf_get_func_ip
    and bpf_get_attach_cookie get expected function address when
    CONFIG_X86_KERNEL_IBT option is enabled.
    
    Also renaming kprobe_multi_link_handler entry_ip argument to fentry_ip
    so it's clearer this is an ftrace __fentry__ ip.
    
    [1] commit 7f0059b58f02 ("selftests/bpf: Fix kprobe_multi test.")
    
    Cc: Peter Zijlstra <peterz@infradead.org>
    Reported-by: Martynas Pumputis <m@lambda.lt>
    Acked-by: Andrii Nakryiko <andrii@kernel.org>
    Signed-off-by: Jiri Olsa <jolsa@kernel.org>
    Link: https://lore.kernel.org/r/20220926153340.1621984-5-jolsa@kernel.org
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Signed-off-by: Artem Savkov <asavkov@redhat.com>
2023-03-06 14:54:18 +01:00
Artem Savkov d9fce0cf92 bpf: Add bpf_verify_pkcs7_signature() kfunc
Bugzilla: https://bugzilla.redhat.com/2166911

commit 865b0566d8f1a0c3937e5eb4bd6ba4ef03e7e98c
Author: Roberto Sassu <roberto.sassu@huawei.com>
Date:   Tue Sep 20 09:59:46 2022 +0200

    bpf: Add bpf_verify_pkcs7_signature() kfunc
    
    Add the bpf_verify_pkcs7_signature() kfunc, to give eBPF security modules
    the ability to check the validity of a signature against supplied data, by
    using user-provided or system-provided keys as trust anchor.
    
    The new kfunc makes it possible to enforce mandatory policies, as eBPF
    programs might be allowed to make security decisions only based on data
    sources the system administrator approves.
    
    The caller should provide the data to be verified and the signature as eBPF
    dynamic pointers (to minimize the number of parameters) and a bpf_key
    structure containing a reference to the keyring with keys trusted for
    signature verification, obtained from bpf_lookup_user_key() or
    bpf_lookup_system_key().
    
    For bpf_key structures obtained from the former lookup function,
    bpf_verify_pkcs7_signature() completes the permission check deferred by
    that function by calling key_validate(). key_task_permission() is already
    called by the PKCS#7 code.
    
    Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com>
    Acked-by: KP Singh <kpsingh@kernel.org>
    Acked-by: Song Liu <song@kernel.org>
    Link: https://lore.kernel.org/r/20220920075951.929132-9-roberto.sassu@huaweicloud.com
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Signed-off-by: Artem Savkov <asavkov@redhat.com>
2023-03-06 14:54:16 +01:00
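
A BPF-side sketch of the kfuncs in use from a sleepable LSM program; the hook, map layout and fixed buffer sizes are illustrative, and the extern declarations mirror how selftests typically declare kfuncs:

    #include "vmlinux.h"
    #include <errno.h>
    #include <bpf/bpf_helpers.h>
    #include <bpf/bpf_tracing.h>

    char LICENSE[] SEC("license") = "GPL";

    extern struct bpf_key *bpf_lookup_user_key(__u32 serial, __u64 flags) __ksym;
    extern void bpf_key_put(struct bpf_key *key) __ksym;
    extern int bpf_verify_pkcs7_signature(struct bpf_dynptr *data_ptr,
                                          struct bpf_dynptr *sig_ptr,
                                          struct bpf_key *trusted_keyring) __ksym;

    struct blob {
        __u8 data[4096];
        __u8 sig[1024];
    };

    struct {
        __uint(type, BPF_MAP_TYPE_ARRAY);
        __uint(max_entries, 1);
        __type(key, __u32);
        __type(value, struct blob);
    } blob_map SEC(".maps");

    const volatile __u32 user_keyring_serial;    /* set from user space */

    SEC("lsm.s/bpf")
    int BPF_PROG(check_signature, int cmd, union bpf_attr *attr, unsigned int size)
    {
        struct bpf_dynptr data_ptr, sig_ptr;
        struct bpf_key *kr;
        struct blob *b;
        __u32 zero = 0;
        int ret;

        b = bpf_map_lookup_elem(&blob_map, &zero);
        if (!b)
            return 0;

        /* error handling of dynptr creation skipped for brevity */
        bpf_dynptr_from_mem(b->data, sizeof(b->data), 0, &data_ptr);
        bpf_dynptr_from_mem(b->sig, sizeof(b->sig), 0, &sig_ptr);

        kr = bpf_lookup_user_key(user_keyring_serial, 0);
        if (!kr)
            return -ENOENT;

        ret = bpf_verify_pkcs7_signature(&data_ptr, &sig_ptr, kr);
        bpf_key_put(kr);
        return ret;
    }
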
Artem Savkov 5426219557 bpf: Add bpf_lookup_*_key() and bpf_key_put() kfuncs
Bugzilla: https://bugzilla.redhat.com/2166911

commit f3cf4134c5c6c47b9b5c7aa3cb2d67e107887a7b
Author: Roberto Sassu <roberto.sassu@huawei.com>
Date:   Tue Sep 20 09:59:45 2022 +0200

    bpf: Add bpf_lookup_*_key() and bpf_key_put() kfuncs
    
    Add the bpf_lookup_user_key(), bpf_lookup_system_key() and bpf_key_put()
    kfuncs, to respectively search a key with a given key handle serial number
    and flags, obtain a key from a pre-determined ID defined in
    include/linux/verification.h, and cleanup.
    
    Introduce system_keyring_id_check() to validate the keyring ID parameter of
    bpf_lookup_system_key().
    
    Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com>
    Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
    Acked-by: Song Liu <song@kernel.org>
    Link: https://lore.kernel.org/r/20220920075951.929132-8-roberto.sassu@huaweicloud.com
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Signed-off-by: Artem Savkov <asavkov@redhat.com>
2023-03-06 14:54:15 +01:00
Artem Savkov bee9a85dbb bpf: implement sleepable uprobes by chaining gps
Bugzilla: https://bugzilla.redhat.com/2137876

commit 8c7dcb84e3b744b2b70baa7a44a9b1881c33a9c9
Author: Delyan Kratunov <delyank@fb.com>
Date:   Tue Jun 14 23:10:46 2022 +0000

    bpf: implement sleepable uprobes by chaining gps
    
    uprobes work by raising a trap, setting a task flag from within the
    interrupt handler, and processing the actual work for the uprobe on the
    way back to userspace. As a result, uprobe handlers already execute in a
    might_fault/_sleep context. The primary obstacle to sleepable bpf uprobe
    programs is therefore on the bpf side.
    
    Namely, the bpf_prog_array attached to the uprobe is protected by normal
    rcu. In order for uprobe bpf programs to become sleepable, it has to be
    protected by the tasks_trace rcu flavor instead (and kfree() called after
    a corresponding grace period).
    
    Therefore, the free path for bpf_prog_array now chains a tasks_trace and
    normal grace periods one after the other.
    
    Users who iterate under tasks_trace read section would
    be safe, as would users who iterate under normal read sections (from
    non-sleepable locations).
    
    The downside is that the tasks_trace latency affects all perf_event-attached
    bpf programs (and not just uprobe ones). This is deemed safe given the
    possible attach rates for kprobe/uprobe/tp programs.
    
    Separately, non-sleepable programs need access to dynamically sized
    rcu-protected maps, so bpf_run_prog_array_sleepables now conditionally takes
    an rcu read section, in addition to the overarching tasks_trace section.
    
    Signed-off-by: Delyan Kratunov <delyank@fb.com>
    Link: https://lore.kernel.org/r/ce844d62a2fd0443b08c5ab02e95bc7149f9aeb1.1655248076.git.delyank@fb.com
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Signed-off-by: Artem Savkov <asavkov@redhat.com>
2023-01-05 15:46:30 +01:00
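
A sketch of the grace-period chaining described above (not the literal upstream code): the prog array is freed only after a tasks_trace grace period followed by a regular RCU grace period, so both sleepable and non-sleepable readers stay safe. Function names here are invented for the example.

    #include <linux/bpf.h>
    #include <linux/rcupdate.h>
    #include <linux/slab.h>

    /* Runs after the tasks_trace grace period; re-queue the same rcu_head so
     * the final kfree() happens only after a normal RCU grace period too. */
    static void free_after_normal_gp(struct rcu_head *rcu)
    {
            struct bpf_prog_array *progs =
                    container_of(rcu, struct bpf_prog_array, rcu);

            kfree(progs);
    }

    static void free_after_tasks_trace_gp(struct rcu_head *rcu)
    {
            call_rcu(rcu, free_after_normal_gp);
    }

    static void prog_array_free_sleepable(struct bpf_prog_array *progs)
    {
            if (progs)
                    call_rcu_tasks_trace(&progs->rcu, free_after_tasks_trace_gp);
    }
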
Yauheni Kaliuta 7dcd07e6dc bpf: Add bpf_skc_to_mptcp_sock_proto
Bugzilla: https://bugzilla.redhat.com/2120968

commit 3bc253c2e652cf5f12cd8c00d80d8ec55d67d1a7
Author: Geliang Tang <geliang.tang@suse.com>
Date:   Thu May 19 16:30:10 2022 -0700

    bpf: Add bpf_skc_to_mptcp_sock_proto
    
    This patch implements a new struct bpf_func_proto, named
    bpf_skc_to_mptcp_sock_proto. Define a new bpf_id BTF_SOCK_TYPE_MPTCP,
    and a new helper bpf_skc_to_mptcp_sock(), which invokes another new
    helper bpf_mptcp_sock_from_subflow() in net/mptcp/bpf.c to get struct
    mptcp_sock from a given subflow socket.
    
    v2: Emit BTF type, add func_id checks in verifier.c and bpf_trace.c,
    remove build check for CONFIG_BPF_JIT
    v5: Drop EXPORT_SYMBOL (Martin)
    
    Co-developed-by: Nicolas Rybowski <nicolas.rybowski@tessares.net>
    Co-developed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
    Signed-off-by: Nicolas Rybowski <nicolas.rybowski@tessares.net>
    Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
    Signed-off-by: Geliang Tang <geliang.tang@suse.com>
    Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
    Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
    Link: https://lore.kernel.org/bpf/20220519233016.105670-2-mathew.j.martineau@linux.intel.com

Signed-off-by: Yauheni Kaliuta <ykaliuta@redhat.com>
2022-11-30 12:47:05 +02:00
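
A short sketch of the helper from the entry above, modelled on the MPTCP selftests: cast the socket from a sockops context and read a struct mptcp_sock field. The program name and the printed field are example choices.

    #include "vmlinux.h"
    #include <bpf/bpf_helpers.h>

    char _license[] SEC("license") = "GPL";

    SEC("sockops")
    int mptcp_token(struct bpf_sock_ops *skops)
    {
            struct bpf_sock *sk = skops->sk;
            struct mptcp_sock *msk;

            if (!sk)
                    return 1;

            msk = bpf_skc_to_mptcp_sock(sk);
            if (msk)
                    bpf_printk("mptcp token %u", msk->token);
            return 1;
    }
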
Yauheni Kaliuta c0c280946f bpf: add bpf_map_lookup_percpu_elem for percpu map
Bugzilla: https://bugzilla.redhat.com/2120968

commit 07343110b293456d30393e89b86c4dee1ac051c8
Author: Feng Zhou <zhoufeng.zf@bytedance.com>
Date:   Wed May 11 17:38:53 2022 +0800

    bpf: add bpf_map_lookup_percpu_elem for percpu map
    
    Add a new ebpf helper, bpf_map_lookup_percpu_elem.

    The implementation is straightforward: it follows the map_lookup_elem
    implementation for percpu maps, adds a cpu parameter, and returns the
    value stored for the specified cpu.
    
    Signed-off-by: Feng Zhou <zhoufeng.zf@bytedance.com>
    Link: https://lore.kernel.org/r/20220511093854.411-2-zhoufeng.zf@bytedance.com
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Signed-off-by: Yauheni Kaliuta <ykaliuta@redhat.com>
2022-11-30 12:47:04 +02:00
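
A sketch of the new helper from the entry above: sum a per-cpu counter across CPUs from inside a tracing program. The attach point and the fixed CPU bound are arbitrary choices for the example.

    #include "vmlinux.h"
    #include <bpf/bpf_helpers.h>
    #include <bpf/bpf_tracing.h>

    char _license[] SEC("license") = "GPL";

    struct {
            __uint(type, BPF_MAP_TYPE_PERCPU_ARRAY);
            __uint(max_entries, 1);
            __type(key, __u32);
            __type(value, __u64);
    } counters SEC(".maps");

    SEC("fentry/do_nanosleep")              /* example attach point */
    int BPF_PROG(sum_counters)
    {
            __u32 key = 0;
            __u64 sum = 0;
            int cpu;

            for (cpu = 0; cpu < 8; cpu++) { /* assumes <= 8 CPUs, sketch only */
                    __u64 *val = bpf_map_lookup_percpu_elem(&counters, &key, cpu);

                    if (val)
                            sum += *val;
            }
            bpf_printk("total across cpus: %llu", sum);
            return 0;
    }
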
Yauheni Kaliuta 1c3a7dd065 bpf, x86: Attach a cookie to fentry/fexit/fmod_ret/lsm.
Bugzilla: https://bugzilla.redhat.com/2120968

commit 2fcc82411e74e5e6aba336561cf56fb899bfae4e
Author: Kui-Feng Lee <kuifeng@fb.com>
Date:   Tue May 10 13:59:21 2022 -0700

    bpf, x86: Attach a cookie to fentry/fexit/fmod_ret/lsm.
    
    Pass a cookie along with BPF_LINK_CREATE requests.
    
    Add a bpf_cookie field to struct bpf_tracing_link to attach a cookie.
    The cookie of a bpf_tracing_link is available by calling
    bpf_get_attach_cookie when running the BPF program of the attached
    link.
    
    The value of a cookie will be set at bpf_tramp_run_ctx by the
    trampoline of the link.
    
    Signed-off-by: Kui-Feng Lee <kuifeng@fb.com>
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
    Acked-by: Andrii Nakryiko <andrii@kernel.org>
    Link: https://lore.kernel.org/bpf/20220510205923.3206889-4-kuifeng@fb.com

Signed-off-by: Yauheni Kaliuta <ykaliuta@redhat.com>
2022-11-30 12:47:03 +02:00
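
On the BPF side, the cookie described above is read with bpf_get_attach_cookie(); a minimal fentry sketch follows. The attach point is arbitrary, and the cookie value itself is supplied by user space at link-creation time.

    #include "vmlinux.h"
    #include <bpf/bpf_helpers.h>
    #include <bpf/bpf_tracing.h>

    char _license[] SEC("license") = "GPL";

    SEC("fentry/vfs_read")                  /* example attach point */
    int BPF_PROG(on_vfs_read)
    {
            __u64 cookie = bpf_get_attach_cookie(ctx);

            bpf_printk("tracing link cookie: %llu", cookie);
            return 0;
    }
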
Yauheni Kaliuta 69950554d0 bpf: Move rcu lock management out of BPF_PROG_RUN routines
Bugzilla: https://bugzilla.redhat.com/2120968

commit 055eb95533273bc334794dbc598400d10800528f
Author: Stanislav Fomichev <sdf@google.com>
Date:   Thu Apr 14 09:12:33 2022 -0700

    bpf: Move rcu lock management out of BPF_PROG_RUN routines
    
    Commit 7d08c2c91171 ("bpf: Refactor BPF_PROG_RUN_ARRAY family of macros
    into functions") switched a bunch of BPF_PROG_RUN macros to inline
    routines. This changed the semantic a bit. Due to arguments expansion
    of macros, it used to be:
    
    	rcu_read_lock();
    	array = rcu_dereference(cgrp->bpf.effective[atype]);
    	...
    
    Now, with inline routines, we have:
    	array_rcu = rcu_dereference(cgrp->bpf.effective[atype]);
    	/* array_rcu can be kfree'd here */
    	rcu_read_lock();
    	array = rcu_dereference(array_rcu);
    
    I'm assuming in practice rcu subsystem isn't fast enough to trigger
    this but let's use rcu API properly.
    
    Also, rename to lower caps to not confuse with macros. Additionally,
    drop and expand BPF_PROG_CGROUP_INET_EGRESS_RUN_ARRAY.
    
    See [1] for more context.
    
      [1] https://lore.kernel.org/bpf/CAKH8qBs60fOinFdxiiQikK_q0EcVxGvNTQoWvHLEUGbgcj1UYg@mail.gmail.com/T/#u
    
    v2
    - keep rcu locks inside by passing cgroup_bpf
    
    Fixes: 7d08c2c91171 ("bpf: Refactor BPF_PROG_RUN_ARRAY family of macros into functions")
    Signed-off-by: Stanislav Fomichev <sdf@google.com>
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    Acked-by: Martin KaFai Lau <kafai@fb.com>
    Link: https://lore.kernel.org/bpf/20220414161233.170780-1-sdf@google.com

Signed-off-by: Yauheni Kaliuta <ykaliuta@redhat.com>
2022-11-28 16:52:09 +02:00
Yauheni Kaliuta 791fb1dd0d bpf: Use swap() instead of open coding it
Bugzilla: https://bugzilla.redhat.com/2120968

commit 11e17ae423778f48c84da6a2e215f140610e1973
Author: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
Date:   Tue Mar 22 14:21:49 2022 +0800

    bpf: Use swap() instead of open coding it
    
    Clean the following coccicheck warning:
    
    ./kernel/trace/bpf_trace.c:2263:34-35: WARNING opportunity for swap().
    ./kernel/trace/bpf_trace.c:2264:40-41: WARNING opportunity for swap().
    
    Reported-by: Abaci Robot <abaci@linux.alibaba.com>
    Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
    Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    Link: https://lore.kernel.org/bpf/20220322062149.109180-1-jiapeng.chong@linux.alibaba.com

Signed-off-by: Yauheni Kaliuta <ykaliuta@redhat.com>
2022-11-28 16:48:55 +02:00
Jerome Marchand 9fd9812f42 bpf: Force cookies array to follow symbols sorting
Bugzilla: https://bugzilla.redhat.com/2120966

commit eb5fb0325698d05f0bf78d322de82c451a3685a2
Author: Jiri Olsa <jolsa@kernel.org>
Date:   Wed Jun 15 13:21:17 2022 +0200

    bpf: Force cookies array to follow symbols sorting

    When user specifies symbols and cookies for kprobe_multi link
    interface it's very likely the cookies will be misplaced and
    returned to wrong functions (via get_attach_cookie helper).

    The reason is that to resolve the provided functions we sort
    them before passing them to ftrace_lookup_symbols, but we do
    not do the same sort on the cookie values.

    Fixing this by using sort_r function with custom swap callback
    that swaps cookie values as well.

    Fixes: 0236fec57a15 ("bpf: Resolve symbols with ftrace_lookup_symbols for kprobe multi link")
    Signed-off-by: Jiri Olsa <jolsa@kernel.org>
    Link: https://lore.kernel.org/r/20220615112118.497303-4-jolsa@kernel.org
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
2022-10-25 14:58:11 +02:00
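
The fix above boils down to sorting two parallel arrays with sort_r() and a swap callback that moves both elements together; a stand-alone kernel-style sketch of that pattern, with invented struct and function names:

    #include <linux/kernel.h>
    #include <linux/sort.h>
    #include <linux/types.h>

    struct multi_arrays {
            unsigned long *addrs;
            u64 *cookies;
    };

    static int addrs_cmp(const void *a, const void *b, const void *priv)
    {
            const unsigned long *x = a, *y = b;

            return *x < *y ? -1 : *x > *y;
    }

    static void addrs_swap(void *a, void *b, int size, const void *priv)
    {
            const struct multi_arrays *arr = priv;
            size_t i = (unsigned long *)a - arr->addrs;
            size_t j = (unsigned long *)b - arr->addrs;

            swap(arr->addrs[i], arr->addrs[j]);
            swap(arr->cookies[i], arr->cookies[j]);  /* keep cookies paired */
    }

    static void sort_addrs_with_cookies(struct multi_arrays *arr, size_t cnt)
    {
            sort_r(arr->addrs, cnt, sizeof(*arr->addrs), addrs_cmp, addrs_swap, arr);
    }
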
Jerome Marchand 63314fb347 bpf: Use safer kvmalloc_array() where possible
Bugzilla: https://bugzilla.redhat.com/2120966

commit fd58f7df2415ef747782e01f94880fefad1247cf
Author: Dan Carpenter <dan.carpenter@oracle.com>
Date:   Thu May 26 13:24:05 2022 +0300

    bpf: Use safer kvmalloc_array() where possible

    The kvmalloc_array() function is safer because it has a check for
    integer overflows.  These sizes come from the user and I was not
    able to see any bounds checking so an integer overflow seems like a
    realistic concern.

    Fixes: 0dcac2725406 ("bpf: Add multi kprobe link")
    Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
    Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
    Link: https://lore.kernel.org/bpf/Yo9VRVMeHbALyjUH@kili
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
2022-10-25 14:58:10 +02:00
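
The pattern introduced by the fix above, as a tiny sketch: a user-controlled count is multiplied inside kvmalloc_array(), which returns NULL on overflow instead of silently allocating a short buffer.

    #include <linux/mm.h>
    #include <linux/slab.h>

    static u64 *alloc_cookie_array(u32 cnt)
    {
            /* cnt comes from user space; kvmalloc_array() rejects an
             * overflowing cnt * sizeof(u64), an open-coded multiply would not. */
            return kvmalloc_array(cnt, sizeof(u64), GFP_KERNEL);
    }
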
Jerome Marchand 29f22da84b bpf: Resolve symbols with ftrace_lookup_symbols for kprobe multi link
Bugzilla: https://bugzilla.redhat.com/2120966

commit 0236fec57a15dc2a068dfe4488e0c2ab4559b1ec
Author: Jiri Olsa <jolsa@kernel.org>
Date:   Tue May 10 14:26:15 2022 +0200

    bpf: Resolve symbols with ftrace_lookup_symbols for kprobe multi link

    Using the kallsyms_lookup_names function to speed up symbol lookup in
    kprobe multi link attachment, replacing the current
    kprobe_multi_resolve_syms function.

    This speeds up bpftrace kprobe attachment:

      # perf stat -r 5 -e cycles ./src/bpftrace -e 'kprobe:x* {  } i:ms:1 { exit(); }'
      ...
      6.5681 +- 0.0225 seconds time elapsed  ( +-  0.34% )

    After:

      # perf stat -r 5 -e cycles ./src/bpftrace -e 'kprobe:x* {  } i:ms:1 { exit(); }'
      ...
      0.5661 +- 0.0275 seconds time elapsed  ( +-  4.85% )

    Acked-by: Andrii Nakryiko <andrii@kernel.org>
    Signed-off-by: Jiri Olsa <jolsa@kernel.org>
    Link: https://lore.kernel.org/r/20220510122616.2652285-5-jolsa@kernel.org
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
2022-10-25 14:58:10 +02:00
Jerome Marchand 5158c401e5 bpf: Fix sparse warnings in kprobe_multi_resolve_syms
Bugzilla: https://bugzilla.redhat.com/2120966

commit d31e0386a2f122b40b605eb0120a2fbcfca77868
Author: Jiri Olsa <jolsa@kernel.org>
Date:   Wed Mar 30 13:05:10 2022 +0200

    bpf: Fix sparse warnings in kprobe_multi_resolve_syms

    Adding missing __user tags to fix sparse warnings:

    kernel/trace/bpf_trace.c:2370:34: warning: incorrect type in argument 2 (different address spaces)
    kernel/trace/bpf_trace.c:2370:34:    expected void const [noderef] __user *from
    kernel/trace/bpf_trace.c:2370:34:    got void const *usyms
    kernel/trace/bpf_trace.c:2376:51: warning: incorrect type in argument 2 (different address spaces)
    kernel/trace/bpf_trace.c:2376:51:    expected char const [noderef] __user *src
    kernel/trace/bpf_trace.c:2376:51:    got char const *
    kernel/trace/bpf_trace.c:2443:49: warning: incorrect type in argument 1 (different address spaces)
    kernel/trace/bpf_trace.c:2443:49:    expected void const *usyms
    kernel/trace/bpf_trace.c:2443:49:    got void [noderef] __user *[assigned] usyms

    Fixes: 0dcac2725406 ("bpf: Add multi kprobe link")
    Reported-by: Alexei Starovoitov <ast@kernel.org>
    Reported-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Jiri Olsa <jolsa@kernel.org>
    Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    Link: https://lore.kernel.org/bpf/20220330110510.398558-1-jolsa@kernel.org

Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
2022-10-25 14:58:08 +02:00
Jerome Marchand 04025120d1 bpf: Fix kprobe_multi return probe backtrace
Bugzilla: https://bugzilla.redhat.com/2120966

commit f70986902c86f88612ed45a96aa7cf4caa65f7c1
Author: Jiri Olsa <jolsa@kernel.org>
Date:   Mon Mar 21 08:01:13 2022 +0100

    bpf: Fix kprobe_multi return probe backtrace

    Andrii reported that backtraces from kprobe_multi program attached
    as return probes are not complete and showing just initial entry [1].

    It's caused by changing registers to have original function ip address
    as instruction pointer even for return probe, which will screw backtrace
    from return probe.

    This change keeps registers intact and store original entry ip and
    link address on the stack in bpf_kprobe_multi_run_ctx struct, where
    bpf_get_func_ip and bpf_get_attach_cookie helpers for kprobe_multi
    programs can find it.

    [1] https://lore.kernel.org/bpf/CAEf4BzZDDqK24rSKwXNp7XL3ErGD4bZa1M6c_c4EvDSt3jrZcg@mail.gmail.com/T/#m8d1301c0ea0892ddf9dc6fba57a57b8cf11b8c51

    Fixes: ca74823c6e16 ("bpf: Add cookie support to programs attached with kprobe multi link")
    Reported-by: Andrii Nakryiko <andrii@kernel.org>
    Signed-off-by: Jiri Olsa <jolsa@kernel.org>
    Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    Acked-by: Andrii Nakryiko <andrii@kernel.org>
    Link: https://lore.kernel.org/bpf/20220321070113.1449167-3-jolsa@kernel.org

Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
2022-10-25 14:58:07 +02:00
Jerome Marchand 51ba8f10f6 Revert "bpf: Add support to inline bpf_get_func_ip helper on x86"
Bugzilla: https://bugzilla.redhat.com/2120966

commit f705ec764b34323412f14b9bd95412e9bcb8770b
Author: Jiri Olsa <jolsa@kernel.org>
Date:   Mon Mar 21 08:01:12 2022 +0100

    Revert "bpf: Add support to inline bpf_get_func_ip helper on x86"

    This reverts commit 97ee4d20ee67eb462581a7af01442de6586e390b.

    The following change adds more complexity to the bpf_get_func_ip
    helper for kprobe_multi programs, which can't be inlined easily.

    Signed-off-by: Jiri Olsa <jolsa@kernel.org>
    Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    Link: https://lore.kernel.org/bpf/20220321070113.1449167-2-jolsa@kernel.org

Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
2022-10-25 14:58:06 +02:00
Jerome Marchand 683541ee89 bpf: Add cookie support to programs attached with kprobe multi link
Bugzilla: https://bugzilla.redhat.com/2120966

commit ca74823c6e16dd42b7cf60d9fdde80e2a81a67bb
Author: Jiri Olsa <jolsa@kernel.org>
Date:   Wed Mar 16 13:24:12 2022 +0100

    bpf: Add cookie support to programs attached with kprobe multi link

    Adding support to call bpf_get_attach_cookie helper from
    kprobe programs attached with kprobe multi link.

    The cookie is provided by array of u64 values, where each
    value is paired with provided function address or symbol
    with the same array index.

    When cookie array is provided it's sorted together with
    addresses (check bpf_kprobe_multi_cookie_swap). This way
    we can find cookie based on the address in
    bpf_get_attach_cookie helper.

    Suggested-by: Andrii Nakryiko <andrii@kernel.org>
    Signed-off-by: Jiri Olsa <jolsa@kernel.org>
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    Link: https://lore.kernel.org/bpf/20220316122419.933957-7-jolsa@kernel.org

Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
2022-10-25 14:58:04 +02:00
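
From user space the cookies are passed alongside the symbols at attach time; a hedged libbpf sketch (symbol names and cookie values are arbitrary, and prog is assumed to be a loaded kprobe.multi program):

    #include <errno.h>
    #include <bpf/libbpf.h>

    static int attach_with_cookies(struct bpf_program *prog)
    {
            const char *syms[] = { "vfs_read", "vfs_write" };
            __u64 cookies[]    = { 1, 2 };   /* paired by index with syms[] */
            LIBBPF_OPTS(bpf_kprobe_multi_opts, opts,
                    .syms    = syms,
                    .cookies = cookies,
                    .cnt     = 2,
            );
            struct bpf_link *link;

            link = bpf_program__attach_kprobe_multi_opts(prog, NULL, &opts);
            return link ? 0 : -errno;
    }
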
Jerome Marchand 795fbd7911 bpf: Add support to inline bpf_get_func_ip helper on x86
Bugzilla: https://bugzilla.redhat.com/2120966

commit 97ee4d20ee67eb462581a7af01442de6586e390b
Author: Jiri Olsa <jolsa@kernel.org>
Date:   Wed Mar 16 13:24:11 2022 +0100

    bpf: Add support to inline bpf_get_func_ip helper on x86

    Adding support to inline it on x86, because it's a single
    load instruction.

    Signed-off-by: Jiri Olsa <jolsa@kernel.org>
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    Link: https://lore.kernel.org/bpf/20220316122419.933957-6-jolsa@kernel.org

Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
2022-10-25 14:58:04 +02:00
Jerome Marchand 2a8db175cd bpf: Add bpf_get_func_ip kprobe helper for multi kprobe link
Bugzilla: https://bugzilla.redhat.com/2120966

commit 42a5712094e89ef0a125ac0f9d0873f9233368b1
Author: Jiri Olsa <jolsa@kernel.org>
Date:   Wed Mar 16 13:24:10 2022 +0100

    bpf: Add bpf_get_func_ip kprobe helper for multi kprobe link

    Adding support to call bpf_get_func_ip helper from kprobe
    programs attached by multi kprobe link.

    Signed-off-by: Jiri Olsa <jolsa@kernel.org>
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    Acked-by: Andrii Nakryiko <andrii@kernel.org>
    Link: https://lore.kernel.org/bpf/20220316122419.933957-5-jolsa@kernel.org

Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
2022-10-25 14:58:04 +02:00
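
On the BPF side, a kprobe.multi program can combine bpf_get_func_ip() with bpf_get_attach_cookie() from the earlier entry; a minimal sketch, where the glob pattern is just an example:

    #include "vmlinux.h"
    #include <bpf/bpf_helpers.h>
    #include <bpf/bpf_tracing.h>

    char _license[] SEC("license") = "GPL";

    SEC("kprobe.multi/vfs_*")               /* example glob pattern */
    int BPF_KPROBE(on_entry)
    {
            __u64 ip = bpf_get_func_ip(ctx);
            __u64 cookie = bpf_get_attach_cookie(ctx);

            bpf_printk("entry ip 0x%llx cookie %llu", ip, cookie);
            return 0;
    }
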
Jerome Marchand 0c66788f0e bpf: Add multi kprobe link
Bugzilla: https://bugzilla.redhat.com/2120966

commit 0dcac272540613d41c05e89679e4ddb978b612f1
Author: Jiri Olsa <jolsa@kernel.org>
Date:   Wed Mar 16 13:24:09 2022 +0100

    bpf: Add multi kprobe link

    Adding new link type BPF_LINK_TYPE_KPROBE_MULTI that attaches kprobe
    program through fprobe API.

    The fprobe API allows to attach probe on multiple functions at once
    very fast, because it works on top of ftrace. On the other hand this
    limits the probe point to the function entry or return.

    The kprobe program gets the same pt_regs input ctx as when it's attached
    through the perf API.

    Adding new attach type BPF_TRACE_KPROBE_MULTI that allows attaching
    the kprobe program to multiple functions with the new link.

    User provides array of addresses or symbols with count to attach the
    kprobe program to. The new link_create uapi interface looks like:

      struct {
              __u32           flags;
              __u32           cnt;
              __aligned_u64   syms;
              __aligned_u64   addrs;
      } kprobe_multi;

    The flags field allows single BPF_TRACE_KPROBE_MULTI bit to create
    return multi kprobe.

    Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
    Signed-off-by: Jiri Olsa <jolsa@kernel.org>
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    Acked-by: Andrii Nakryiko <andrii@kernel.org>
    Link: https://lore.kernel.org/bpf/20220316122419.933957-4-jolsa@kernel.org

Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
2022-10-25 14:58:04 +02:00
Jerome Marchand e13edf1f6a bpf: Add bpf_copy_from_user_task() helper
Bugzilla: https://bugzilla.redhat.com/2120966

commit 376040e47334c6dc6a939a32197acceb00fe4acf
Author: Kenny Yu <kennyyu@fb.com>
Date:   Mon Jan 24 10:54:01 2022 -0800

    bpf: Add bpf_copy_from_user_task() helper

    This adds a helper for bpf programs to read the memory of other
    tasks.

    As an example use case at Meta, we are using a bpf task iterator program
    and this new helper to print C++ async stack traces for all threads of
    a given process.

    Signed-off-by: Kenny Yu <kennyyu@fb.com>
    Acked-by: Andrii Nakryiko <andrii@kernel.org>
    Link: https://lore.kernel.org/r/20220124185403.468466-3-kennyyu@fb.com
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
2022-10-25 14:57:43 +02:00
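
A sketch of the helper above from a sleepable task iterator: read one word of another task's memory. The fixed user address is purely illustrative; a real profiler would take it from a map or from the task's saved stack pointer.

    #include "vmlinux.h"
    #include <bpf/bpf_helpers.h>

    char _license[] SEC("license") = "GPL";

    SEC("iter.s/task")                      /* sleepable, required for this helper */
    int dump_task_word(struct bpf_iter__task *ctx)
    {
            struct task_struct *task = ctx->task;
            unsigned long word = 0;

            if (!task)
                    return 0;

            if (bpf_copy_from_user_task(&word, sizeof(word),
                                        (const void *)0x7f0000000000UL, task, 0))
                    return 0;

            bpf_printk("pid %d word 0x%lx", task->pid, word);
            return 0;
    }
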
Jiri Benc 6ba74a863c bpf: add frags support to xdp copy helpers
Bugzilla: https://bugzilla.redhat.com/2120966

commit d99173027d6803430fd60e61aab3006644e18628
Author: Eelco Chaudron <echaudro@redhat.com>
Date:   Fri Jan 21 11:09:56 2022 +0100

    bpf: add frags support to xdp copy helpers

    This patch adds support for frags for the following helpers:
      - bpf_xdp_output()
      - bpf_perf_event_output()

    Acked-by: Toke Hoiland-Jorgensen <toke@redhat.com>
    Acked-by: John Fastabend <john.fastabend@gmail.com>
    Acked-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
    Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
    Link: https://lore.kernel.org/r/340b4a99cdc24337b40eaf8bb597f9f9e7b0373e.1642758637.git.lorenzo@kernel.org
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Signed-off-by: Jiri Benc <jbenc@redhat.com>
2022-10-25 14:57:42 +02:00
Chris von Recklinghausen ea2fa2fb80 uaccess: remove CONFIG_SET_FS
Conflicts: in arch/, only keep changes to arch/Kconfig and
	arch/arm64/kernel/traps.c. All other arch files in the upstream version
	of this patch are dropped.

Bugzilla: https://bugzilla.redhat.com/2120352

commit 967747bbc084b93b54e66f9047d342232314cd25
Author: Arnd Bergmann <arnd@arndb.de>
Date:   Fri Feb 11 21:42:45 2022 +0100

    uaccess: remove CONFIG_SET_FS

    There are no remaining callers of set_fs(), so CONFIG_SET_FS
    can be removed globally, along with the thread_info field and
    any references to it.

    This turns access_ok() into a cheaper check against TASK_SIZE_MAX.

    As CONFIG_SET_FS is now gone, drop all remaining references to
    set_fs()/get_fs(), mm_segment_t, user_addr_max() and uaccess_kernel().

    Acked-by: Sam Ravnborg <sam@ravnborg.org> # for sparc32 changes
    Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
    Tested-by: Sergey Matyukevich <sergey.matyukevich@synopsys.com> # for arc changes
    Acked-by: Stafford Horne <shorne@gmail.com> # [openrisc, asm-generic]
    Acked-by: Dinh Nguyen <dinguyen@kernel.org>
    Signed-off-by: Arnd Bergmann <arnd@arndb.de>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2022-10-12 07:27:45 -04:00
Artem Savkov 7f76bfc54f bpf: Add MEM_RDONLY for helper args that are pointers to rdonly mem.
Bugzilla: https://bugzilla.redhat.com/2069046

Upstream Status: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

commit 216e3cd2f28dbbf1fe86848e0e29e6693b9f0a20
Author: Hao Luo <haoluo@google.com>
Date:   Thu Dec 16 16:31:51 2021 -0800

    bpf: Add MEM_RDONLY for helper args that are pointers to rdonly mem.

    Some helper functions may modify their arguments, for example,
    bpf_d_path, bpf_get_stack etc. Previously, their argument types
    were marked as ARG_PTR_TO_MEM, which is compatible with read-only
    mem types, such as PTR_TO_RDONLY_BUF. Therefore it's legitimate,
    but technically incorrect, to modify a read-only memory by passing
    it into one of such helper functions.

    This patch tags the bpf_args compatible with immutable memory with
    MEM_RDONLY flag. The arguments that don't have this flag will be
    only compatible with mutable memory types, preventing the helper
    from modifying a read-only memory. The bpf_args that have
    MEM_RDONLY are compatible with both mutable memory and immutable
    memory.

    Signed-off-by: Hao Luo <haoluo@google.com>
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    Link: https://lore.kernel.org/bpf/20211217003152.48334-9-haoluo@google.com

Signed-off-by: Artem Savkov <asavkov@redhat.com>
2022-08-24 12:53:50 +02:00
Artem Savkov 21c52694d3 bpf: Add get_func_[arg|ret|arg_cnt] helpers
Bugzilla: https://bugzilla.redhat.com/2069046

Upstream Status: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

commit f92c1e183604c20ce00eb889315fdaa8f2d9e509
Author: Jiri Olsa <jolsa@redhat.com>
Date:   Wed Dec 8 20:32:44 2021 +0100

    bpf: Add get_func_[arg|ret|arg_cnt] helpers

    Adding following helpers for tracing programs:

    Get n-th argument of the traced function:
      long bpf_get_func_arg(void *ctx, u32 n, u64 *value)

    Get return value of the traced function:
      long bpf_get_func_ret(void *ctx, u64 *value)

    Get arguments count of the traced function:
      long bpf_get_func_arg_cnt(void *ctx)

    The trampoline now stores number of arguments on ctx-8
    address, so it's easy to verify argument index and find
    return value argument's position.

    Moving function ip address on the trampoline stack behind
    the number of functions arguments, so it's now stored on
    ctx-16 address if it's needed.

    All helpers above are inlined by verifier.

    Also a slightly unrelated small change: use the newly added function
    bpf_prog_has_trampoline in check_get_func_ip.

    Signed-off-by: Jiri Olsa <jolsa@kernel.org>
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    Link: https://lore.kernel.org/bpf/20211208193245.172141-5-jolsa@kernel.org

Signed-off-by: Artem Savkov <asavkov@redhat.com>
2022-08-24 12:53:47 +02:00
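
A small fexit sketch using the three helpers from the entry above (attach point chosen arbitrarily; argument 0 is reported as a raw u64):

    #include "vmlinux.h"
    #include <bpf/bpf_helpers.h>
    #include <bpf/bpf_tracing.h>

    char _license[] SEC("license") = "GPL";

    SEC("fexit/vfs_read")                   /* example attach point */
    int BPF_PROG(after_vfs_read)
    {
            __u64 nr_args = bpf_get_func_arg_cnt(ctx);
            __u64 arg0 = 0, ret = 0;

            bpf_get_func_arg(ctx, 0, &arg0); /* struct file * of vfs_read, as u64 */
            bpf_get_func_ret(ctx, &ret);

            bpf_printk("argcnt=%llu arg0=0x%llx ret=%lld", nr_args, arg0, ret);
            return 0;
    }
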
Artem Savkov 5cebd099b9 bpf: Introduce btf_tracing_ids
Bugzilla: https://bugzilla.redhat.com/2069046

Upstream Status: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

commit d19ddb476a539fd78ad1028ae13bb38506286931
Author: Song Liu <songliubraving@fb.com>
Date:   Fri Nov 12 07:02:43 2021 -0800

    bpf: Introduce btf_tracing_ids

    Similar to btf_sock_ids, btf_tracing_ids provides btf ID for task_struct,
    file, and vm_area_struct via easy to understand format like
    btf_tracing_ids[BTF_TRACING_TYPE_[TASK|file|VMA]].

    Suggested-by: Alexei Starovoitov <ast@kernel.org>
    Signed-off-by: Song Liu <songliubraving@fb.com>
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    Acked-by: Yonghong Song <yhs@fb.com>
    Link: https://lore.kernel.org/bpf/20211112150243.1270987-3-songliubraving@fb.com

Signed-off-by: Artem Savkov <asavkov@redhat.com>
2022-08-24 12:53:37 +02:00
Artem Savkov c083c778ce bpf: Introduce helper bpf_find_vma
Bugzilla: https://bugzilla.redhat.com/2069046

Upstream Status: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

commit 7c7e3d31e7856a8260a254f8c71db416f7f9f5a1
Author: Song Liu <songliubraving@fb.com>
Date:   Fri Nov 5 16:23:29 2021 -0700

    bpf: Introduce helper bpf_find_vma

    In some profiler use cases, it is necessary to map an address to the
    backing file, e.g., a shared library. bpf_find_vma helper provides a
    flexible way to achieve this. bpf_find_vma maps an address of a task to
    the vma (vm_area_struct) for this address, and feeds the vma to a callback
    BPF function. The callback function is necessary here, as we need to
    ensure mmap_sem is unlocked.

    It is necessary to lock mmap_sem for find_vma. To lock and unlock mmap_sem
    safely when irqs are disable, we use the same mechanism as stackmap with
    build_id. Specifically, when irqs are disabled, the unlock is postponed
    in an irq_work. Refactor stackmap.c so that the irq_work is shared among
    bpf_find_vma and stackmap helpers.

    Signed-off-by: Song Liu <songliubraving@fb.com>
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    Tested-by: Hengqi Chen <hengqi.chen@gmail.com>
    Acked-by: Yonghong Song <yhs@fb.com>
    Link: https://lore.kernel.org/bpf/20211105232330.1936330-2-songliubraving@fb.com

Signed-off-by: Artem Savkov <asavkov@redhat.com>
2022-08-24 12:53:34 +02:00
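
A sketch of bpf_find_vma() from the entry above: map an address in the current task to its VMA via a callback. The perf_event attach point and the user-supplied target_addr global are example scaffolding.

    #include "vmlinux.h"
    #include <bpf/bpf_helpers.h>

    char _license[] SEC("license") = "GPL";

    const volatile __u64 target_addr;       /* set by user space before load */

    struct vma_info {
            __u64 vm_start;
            __u64 vm_end;
    };

    static long vma_cb(struct task_struct *task, struct vm_area_struct *vma,
                       void *data)
    {
            struct vma_info *info = data;

            info->vm_start = vma->vm_start;
            info->vm_end = vma->vm_end;
            return 0;
    }

    SEC("perf_event")
    int on_sample(struct bpf_perf_event_data *pe_ctx)
    {
            struct task_struct *task = bpf_get_current_task_btf();
            struct vma_info info = {};

            bpf_find_vma(task, target_addr, vma_cb, &info, 0);
            bpf_printk("addr 0x%llx in vma [0x%llx, 0x%llx)",
                       target_addr, info.vm_start, info.vm_end);
            return 0;
    }
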
Yauheni Kaliuta 95d43af94e bpf: Factor out helpers for ctx access checking
Bugzilla: http://bugzilla.redhat.com/2069045

commit 35346ab64132d0f5919b06932d708c0d10360553
Author: Hou Tao <houtao1@huawei.com>
Date:   Mon Oct 25 14:40:23 2021 +0800

    bpf: Factor out helpers for ctx access checking
    
    Factor out two helpers to check read access to ctx for raw tracepoint
    and BTF functions. bpf_tracing_ctx_access() checks that the read access
    to an argument is valid, and bpf_tracing_btf_ctx_access() additionally
    checks that the BTF type of the argument is valid.
    bpf_tracing_btf_ctx_access() will be used by the following patch.
    
    Signed-off-by: Hou Tao <houtao1@huawei.com>
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    Acked-by: Martin KaFai Lau <kafai@fb.com>
    Link: https://lore.kernel.org/bpf/20211025064025.2567443-3-houtao1@huawei.com

Signed-off-by: Yauheni Kaliuta <ykaliuta@redhat.com>
2022-06-03 17:23:49 +03:00
Yauheni Kaliuta b7a8adb0f8 bpf: Add bpf_skc_to_unix_sock() helper
Bugzilla: http://bugzilla.redhat.com/2069045

Conflicts: context difference due to already applied
  5e0bc3082e2e ("bpf: Forbid bpf_ktime_get_coarse_ns and bpf_timer_* in tracing progs")

commit 9eeb3aa33ae005526f672b394c1791578463513f
Author: Hengqi Chen <hengqi.chen@gmail.com>
Date:   Thu Oct 21 21:47:51 2021 +0800

    bpf: Add bpf_skc_to_unix_sock() helper

    The helper is used in tracing programs to cast a socket
    pointer to a unix_sock pointer.
    The return value could be NULL if the casting is illegal.

    Suggested-by: Yonghong Song <yhs@fb.com>
    Signed-off-by: Hengqi Chen <hengqi.chen@gmail.com>
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    Acked-by: Song Liu <songliubraving@fb.com>
    Link: https://lore.kernel.org/bpf/20211021134752.1223426-2-hengqi.chen@gmail.com

Signed-off-by: Yauheni Kaliuta <ykaliuta@redhat.com>
2022-06-03 17:23:43 +03:00
Yauheni Kaliuta 3c3c123ddb bpf: Add bpf_trace_vprintk helper
Bugzilla: http://bugzilla.redhat.com/2069045

commit 10aceb629e198429c849d5e995c3bb1ba7a9aaa3
Author: Dave Marchevsky <davemarchevsky@fb.com>
Date:   Fri Sep 17 11:29:05 2021 -0700

    bpf: Add bpf_trace_vprintk helper
    
    This helper is meant to be "bpf_trace_printk, but with proper vararg
    support". Follow bpf_snprintf's example and take a u64 pseudo-vararg
    array. Write to /sys/kernel/debug/tracing/trace_pipe using the same
    mechanism as bpf_trace_printk. The functionality of this helper was
    requested in the libbpf issue tracker [0].
    
    [0] Closes: https://github.com/libbpf/libbpf/issues/315
    
    Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    Acked-by: Andrii Nakryiko <andrii@kernel.org>
    Link: https://lore.kernel.org/bpf/20210917182911.2426606-4-davemarchevsky@fb.com

Signed-off-by: Yauheni Kaliuta <ykaliuta@redhat.com>
2022-06-03 17:16:14 +03:00
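
A direct-use sketch of the helper above: pack more than three values into a u64 array and print them in one call. The tracepoint and the values printed are arbitrary; newer libbpf convenience macros wrap this packing for you.

    #include "vmlinux.h"
    #include <bpf/bpf_helpers.h>

    char _license[] SEC("license") = "GPL";

    SEC("tracepoint/syscalls/sys_enter_execve")   /* example attach point */
    int trace_exec(void *ctx)
    {
            static const char fmt[] = "pid=%d tgid=%d uid=%d gid=%d";
            __u64 pid_tgid = bpf_get_current_pid_tgid();
            __u64 uid_gid = bpf_get_current_uid_gid();
            __u64 args[4] = {
                    (__u32)pid_tgid,        /* pid */
                    pid_tgid >> 32,         /* tgid */
                    (__u32)uid_gid,         /* uid */
                    uid_gid >> 32,          /* gid */
            };

            bpf_trace_vprintk(fmt, sizeof(fmt), args, sizeof(args));
            return 0;
    }
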
Yauheni Kaliuta b2847aa466 bpf: Merge printk and seq_printf VARARG max macros
Bugzilla: http://bugzilla.redhat.com/2069045

commit 335ff4990cf3bfa42d8846f9b3d8c09456f51801
Author: Dave Marchevsky <davemarchevsky@fb.com>
Date:   Fri Sep 17 11:29:03 2021 -0700

    bpf: Merge printk and seq_printf VARARG max macros
    
    MAX_SNPRINTF_VARARGS and MAX_SEQ_PRINTF_VARARGS are used by bpf helpers
    bpf_snprintf and bpf_seq_printf to limit their varargs. Both call into
    bpf_bprintf_prepare for print formatting logic and have convenience
    macros in libbpf (BPF_SNPRINTF, BPF_SEQ_PRINTF) which use the same
    helper macros to convert varargs to a byte array.
    
    Changing shared functionality to support more varargs for either bpf
    helper would affect the other as well, so let's combine the _VARARGS
    macros to make this more obvious.
    
    Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    Acked-by: Andrii Nakryiko <andrii@kernel.org>
    Link: https://lore.kernel.org/bpf/20210917182911.2426606-2-davemarchevsky@fb.com

Signed-off-by: Yauheni Kaliuta <ykaliuta@redhat.com>
2022-06-03 17:16:14 +03:00
Yauheni Kaliuta 85422524be bpf: Introduce helper bpf_get_branch_snapshot
Bugzilla: http://bugzilla.redhat.com/2069045

commit 856c02dbce4f8d6a5644083db22c11750aa11481
Author: Song Liu <songliubraving@fb.com>
Date:   Fri Sep 10 11:33:51 2021 -0700

    bpf: Introduce helper bpf_get_branch_snapshot
    
    Introduce bpf_get_branch_snapshot(), which allows tracing programs to get
    branch traces from hardware (e.g. Intel LBR). To use the feature, the
    user needs to create a perf_event with proper branch_record filtering
    on each cpu, and then call bpf_get_branch_snapshot in the bpf function.
    On Intel CPUs, the VLBR event (raw event 0x1b00) can be used for this.
    
    Signed-off-by: Song Liu <songliubraving@fb.com>
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    Acked-by: John Fastabend <john.fastabend@gmail.com>
    Acked-by: Andrii Nakryiko <andrii@kernel.org>
    Link: https://lore.kernel.org/bpf/20210910183352.3151445-3-songliubraving@fb.com

Signed-off-by: Yauheni Kaliuta <ykaliuta@redhat.com>
2022-06-03 17:16:11 +03:00
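
A sketch of capturing branch records with the helper above from an fentry program; it assumes user space has already opened a suitable perf_event with branch sampling on each CPU, as the commit message notes, and the attach point is arbitrary.

    #include "vmlinux.h"
    #include <bpf/bpf_helpers.h>
    #include <bpf/bpf_tracing.h>

    char _license[] SEC("license") = "GPL";

    #define MAX_LBR 32

    struct perf_branch_entry entries[MAX_LBR] = {};

    SEC("fentry/ksys_read")                 /* example attach point */
    int BPF_PROG(snapshot_branches)
    {
            long written = bpf_get_branch_snapshot(entries, sizeof(entries), 0);

            if (written > 0)
                    bpf_printk("captured %ld bytes of branch records", written);
            return 0;
    }
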
Jiri Benc c876934bc6 bpf: tcp: Support bpf_(get|set)sockopt in bpf tcp iter
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2071618

Conflicts:
- [minor] context difference in include/linux/bpf.h due to out of order
  backport of 7adfc6c9b315 "bpf: Add bpf_get_attach_cookie() BPF helper to
  access bpf_cookie value"

commit 3cee6fb8e69ecd79be891c89a94974c48a25a437
Author: Martin KaFai Lau <kafai@fb.com>
Date:   Thu Jul 1 13:06:19 2021 -0700

    bpf: tcp: Support bpf_(get|set)sockopt in bpf tcp iter

    This patch allows bpf tcp iter to call bpf_(get|set)sockopt.
    To allow a specific bpf iter (tcp here) to call a set of helpers,
    get_func_proto function pointer is added to bpf_iter_reg.
    The bpf iter is a tracing prog which currently requires
    CAP_PERFMON or CAP_SYS_ADMIN, so this patch does not
    impose other capability checks for bpf_(get|set)sockopt.

    Signed-off-by: Martin KaFai Lau <kafai@fb.com>
    Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
    Reviewed-by: Eric Dumazet <edumazet@google.com>
    Acked-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp>
    Acked-by: Yonghong Song <yhs@fb.com>
    Link: https://lore.kernel.org/bpf/20210701200619.1036715-1-kafai@fb.com

Signed-off-by: Jiri Benc <jbenc@redhat.com>
2022-05-12 17:29:46 +02:00
Jerome Marchand c6feb8361c bpf: Forbid bpf_ktime_get_coarse_ns and bpf_timer_* in tracing progs
Bugzilla: https://bugzilla.redhat.com/2041365

Conflicts: Minor context change from missing commit eb18b49ea758 ("bpf: tcp: Allow bpf-tcp-cc to call bpf_(get|set)sockopt")

commit 5e0bc3082e2e403ac0753e099c2b01446bb35578
Author: Dmitrii Banshchikov <me@ubique.spb.ru>
Date:   Sat Nov 13 18:22:26 2021 +0400

    bpf: Forbid bpf_ktime_get_coarse_ns and bpf_timer_* in tracing progs

    Use of bpf_ktime_get_coarse_ns() and bpf_timer_* helpers in tracing
    progs may result in locking issues.

    bpf_ktime_get_coarse_ns() uses ktime_get_coarse_ns() time accessor that
    isn't safe for any context:
    ======================================================
    WARNING: possible circular locking dependency detected
    5.15.0-syzkaller #0 Not tainted
    ------------------------------------------------------
    syz-executor.4/14877 is trying to acquire lock:
    ffffffff8cb30008 (tk_core.seq.seqcount){----}-{0:0}, at: ktime_get_coarse_ts64+0x25/0x110 kernel/time/timekeeping.c:2255

    but task is already holding lock:
    ffffffff90dbf200 (&obj_hash[i].lock){-.-.}-{2:2}, at: debug_object_deactivate+0x61/0x400 lib/debugobjects.c:735

    which lock already depends on the new lock.

    the existing dependency chain (in reverse order) is:

    -> #1 (&obj_hash[i].lock){-.-.}-{2:2}:
           lock_acquire+0x19f/0x4d0 kernel/locking/lockdep.c:5625
           __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline]
           _raw_spin_lock_irqsave+0xd1/0x120 kernel/locking/spinlock.c:162
           __debug_object_init+0xd9/0x1860 lib/debugobjects.c:569
           debug_hrtimer_init kernel/time/hrtimer.c:414 [inline]
           debug_init kernel/time/hrtimer.c:468 [inline]
           hrtimer_init+0x20/0x40 kernel/time/hrtimer.c:1592
           ntp_init_cmos_sync kernel/time/ntp.c:676 [inline]
           ntp_init+0xa1/0xad kernel/time/ntp.c:1095
           timekeeping_init+0x512/0x6bf kernel/time/timekeeping.c:1639
           start_kernel+0x267/0x56e init/main.c:1030
           secondary_startup_64_no_verify+0xb1/0xbb

    -> #0 (tk_core.seq.seqcount){----}-{0:0}:
           check_prev_add kernel/locking/lockdep.c:3051 [inline]
           check_prevs_add kernel/locking/lockdep.c:3174 [inline]
           validate_chain+0x1dfb/0x8240 kernel/locking/lockdep.c:3789
           __lock_acquire+0x1382/0x2b00 kernel/locking/lockdep.c:5015
           lock_acquire+0x19f/0x4d0 kernel/locking/lockdep.c:5625
           seqcount_lockdep_reader_access+0xfe/0x230 include/linux/seqlock.h:103
           ktime_get_coarse_ts64+0x25/0x110 kernel/time/timekeeping.c:2255
           ktime_get_coarse include/linux/timekeeping.h:120 [inline]
           ktime_get_coarse_ns include/linux/timekeeping.h:126 [inline]
           ____bpf_ktime_get_coarse_ns kernel/bpf/helpers.c:173 [inline]
           bpf_ktime_get_coarse_ns+0x7e/0x130 kernel/bpf/helpers.c:171
           bpf_prog_a99735ebafdda2f1+0x10/0xb50
           bpf_dispatcher_nop_func include/linux/bpf.h:721 [inline]
           __bpf_prog_run include/linux/filter.h:626 [inline]
           bpf_prog_run include/linux/filter.h:633 [inline]
           BPF_PROG_RUN_ARRAY include/linux/bpf.h:1294 [inline]
           trace_call_bpf+0x2cf/0x5d0 kernel/trace/bpf_trace.c:127
           perf_trace_run_bpf_submit+0x7b/0x1d0 kernel/events/core.c:9708
           perf_trace_lock+0x37c/0x440 include/trace/events/lock.h:39
           trace_lock_release+0x128/0x150 include/trace/events/lock.h:58
           lock_release+0x82/0x810 kernel/locking/lockdep.c:5636
           __raw_spin_unlock_irqrestore include/linux/spinlock_api_smp.h:149 [inline]
           _raw_spin_unlock_irqrestore+0x75/0x130 kernel/locking/spinlock.c:194
           debug_hrtimer_deactivate kernel/time/hrtimer.c:425 [inline]
           debug_deactivate kernel/time/hrtimer.c:481 [inline]
           __run_hrtimer kernel/time/hrtimer.c:1653 [inline]
           __hrtimer_run_queues+0x2f9/0xa60 kernel/time/hrtimer.c:1749
           hrtimer_interrupt+0x3b3/0x1040 kernel/time/hrtimer.c:1811
           local_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1086 [inline]
           __sysvec_apic_timer_interrupt+0xf9/0x270 arch/x86/kernel/apic/apic.c:1103
           sysvec_apic_timer_interrupt+0x8c/0xb0 arch/x86/kernel/apic/apic.c:1097
           asm_sysvec_apic_timer_interrupt+0x12/0x20
           __raw_spin_unlock_irqrestore include/linux/spinlock_api_smp.h:152 [inline]
           _raw_spin_unlock_irqrestore+0xd4/0x130 kernel/locking/spinlock.c:194
           try_to_wake_up+0x702/0xd20 kernel/sched/core.c:4118
           wake_up_process kernel/sched/core.c:4200 [inline]
           wake_up_q+0x9a/0xf0 kernel/sched/core.c:953
           futex_wake+0x50f/0x5b0 kernel/futex/waitwake.c:184
           do_futex+0x367/0x560 kernel/futex/syscalls.c:127
           __do_sys_futex kernel/futex/syscalls.c:199 [inline]
           __se_sys_futex+0x401/0x4b0 kernel/futex/syscalls.c:180
           do_syscall_x64 arch/x86/entry/common.c:50 [inline]
           do_syscall_64+0x44/0xd0 arch/x86/entry/common.c:80
           entry_SYSCALL_64_after_hwframe+0x44/0xae

    There is a possible deadlock with bpf_timer_* set of helpers:
    hrtimer_start()
      lock_base();
      trace_hrtimer...()
        perf_event()
          bpf_run()
            bpf_timer_start()
              hrtimer_start()
                lock_base()         <- DEADLOCK

    Forbid use of bpf_ktime_get_coarse_ns() and bpf_timer_* helpers in
    BPF_PROG_TYPE_KPROBE, BPF_PROG_TYPE_TRACEPOINT, BPF_PROG_TYPE_PERF_EVENT
    and BPF_PROG_TYPE_RAW_TRACEPOINT prog types.

    Fixes: d055126180 ("bpf: Add bpf_ktime_get_coarse_ns helper")
    Fixes: b00628b1c7d5 ("bpf: Introduce bpf timers.")
    Reported-by: syzbot+43fd005b5a1b4d10781e@syzkaller.appspotmail.com
    Signed-off-by: Dmitrii Banshchikov <me@ubique.spb.ru>
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    Link: https://lore.kernel.org/bpf/20211113142227.566439-2-me@ubique.spb.ru

Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
2022-04-29 18:17:16 +02:00
Jerome Marchand 70ca1bb90e bpf: Fix bpf-next builds without CONFIG_BPF_EVENTS
Bugzilla: http://bugzilla.redhat.com/2041365

commit eb529c5b10b9401a0f2d1f469e82c6a0ba98082c
Author: Daniel Xu <dxu@dxuuu.xyz>
Date:   Wed Aug 25 18:48:31 2021 -0700

    bpf: Fix bpf-next builds without CONFIG_BPF_EVENTS

    This commit fixes linker errors along the lines of:

        s390-linux-ld: task_iter.c:(.init.text+0xa4): undefined reference to `btf_task_struct_ids'`

    Fix by defining btf_task_struct_ids unconditionally in kernel/bpf/btf.c
    since there exists code that unconditionally uses btf_task_struct_ids.

    Reported-by: kernel test robot <lkp@intel.com>
    Signed-off-by: Daniel Xu <dxu@dxuuu.xyz>
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    Link: https://lore.kernel.org/bpf/05d94748d9f4b3eecedc4fddd6875418a396e23c.1629942444.git.dxu@dxuuu.xyz

Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
2022-04-29 18:17:12 +02:00
Jerome Marchand 2682e1df7c bpf: Add bpf_task_pt_regs() helper
Bugzilla: http://bugzilla.redhat.com/2041365

commit dd6e10fbd9fb86a571d925602c8a24bb4d09a2a7
Author: Daniel Xu <dxu@dxuuu.xyz>
Date:   Mon Aug 23 19:43:49 2021 -0700

    bpf: Add bpf_task_pt_regs() helper

    The motivation behind this helper is to access userspace pt_regs in a
    kprobe handler.

    uprobe's ctx is the userspace pt_regs. kprobe's ctx is the kernelspace
    pt_regs. bpf_task_pt_regs() allows accessing userspace pt_regs in a
    kprobe handler. The final case (kernelspace pt_regs in uprobe) is
    pretty rare (usermode helper) so I think that can be solved later if
    necessary.

    More concretely, this helper is useful in doing BPF-based DWARF stack
    unwinding. Currently the kernel can only do framepointer based stack
    unwinds for userspace code. This is because the DWARF state machines are
    too fragile to be computed in kernelspace [0]. The idea behind
    DWARF-based stack unwinds w/ BPF is to copy a chunk of the userspace
    stack (while in prog context) and send it up to userspace for unwinding
    (probably with libunwind) [1]. This would effectively enable profiling
    applications with -fomit-frame-pointer using kprobes and uprobes.

    [0]: https://lkml.org/lkml/2012/2/10/356
    [1]: https://github.com/danobi/bpf-dwarf-walk

    Signed-off-by: Daniel Xu <dxu@dxuuu.xyz>
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    Link: https://lore.kernel.org/bpf/e2718ced2d51ef4268590ab8562962438ab82815.1629772842.git.dxu@dxuuu.xyz

Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
2022-04-29 18:14:47 +02:00
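
A sketch of the helper above from a kprobe: fetch the current task's user-space registers and read the saved instruction pointer. The CO-RE field access and the x86 register name are assumptions of the example.

    #include "vmlinux.h"
    #include <bpf/bpf_helpers.h>
    #include <bpf/bpf_tracing.h>
    #include <bpf/bpf_core_read.h>

    char _license[] SEC("license") = "GPL";

    SEC("kprobe/do_sys_openat2")            /* example attach point */
    int BPF_KPROBE(show_user_ip)
    {
            struct task_struct *task = bpf_get_current_task_btf();
            struct pt_regs *uregs = bpf_task_pt_regs(task);
            __u64 user_ip = 0;

            if (uregs)
                    user_ip = BPF_CORE_READ(uregs, ip);   /* x86 field name */

            bpf_printk("user-space ip: 0x%llx", user_ip);
            return 0;
    }
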
Jerome Marchand 020f743d7a bpf: Extend bpf_base_func_proto helpers with bpf_get_current_task_btf()
Bugzilla: http://bugzilla.redhat.com/2041365

commit a396eda5517ac958fb4eb7358f4708eb829058c4
Author: Daniel Xu <dxu@dxuuu.xyz>
Date:   Mon Aug 23 19:43:48 2021 -0700

    bpf: Extend bpf_base_func_proto helpers with bpf_get_current_task_btf()

    bpf_get_current_task() is already supported so it's natural to also
    include the _btf() variant for btf-powered helpers.

    This is required for non-tracing progs to use bpf_task_pt_regs() in the
    next commit.

    Signed-off-by: Daniel Xu <dxu@dxuuu.xyz>
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    Link: https://lore.kernel.org/bpf/f99870ed5f834c9803d73b3476f8272b1bb987c0.1629772842.git.dxu@dxuuu.xyz

Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
2022-04-29 18:14:47 +02:00
Jerome Marchand 5ba1f36d0a bpf: Consolidate task_struct BTF_ID declarations
Bugzilla: http://bugzilla.redhat.com/2041365

commit 33c5cb36015ac1034b50b823fae367e908d05147
Author: Daniel Xu <dxu@dxuuu.xyz>
Date:   Mon Aug 23 19:43:47 2021 -0700

    bpf: Consolidate task_struct BTF_ID declarations

    No need to have it defined 5 times. Once is enough.

    Signed-off-by: Daniel Xu <dxu@dxuuu.xyz>
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    Link: https://lore.kernel.org/bpf/6dcefa5bed26fe1226f26683f36819bb53ec19a2.1629772842.git.dxu@dxuuu.xyz

Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
2022-04-29 18:14:47 +02:00
Jerome Marchand 1fb366ceb5 bpf: Add bpf_get_attach_cookie() BPF helper to access bpf_cookie value
Bugzilla: http://bugzilla.redhat.com/2041365

commit 7adfc6c9b315e174cf8743b21b7b691c8766791b
Author: Andrii Nakryiko <andrii@kernel.org>
Date:   Sun Aug 15 00:05:59 2021 -0700

    bpf: Add bpf_get_attach_cookie() BPF helper to access bpf_cookie value

    Add new BPF helper, bpf_get_attach_cookie(), which can be used by BPF programs
    to get access to a user-provided bpf_cookie value, specified during BPF
    program attachment (BPF link creation) time.

    Naming is hard, though. With the concept being named "BPF cookie", I've
    considered calling the helper:
      - bpf_get_cookie() -- seems too unspecific and easily mistaken with socket
        cookie;
      - bpf_get_bpf_cookie() -- too much tautology;
      - bpf_get_link_cookie() -- would be ok, but while we create a BPF link to
        attach BPF program to BPF hook, it's still an "attachment" and the
        bpf_cookie is associated with BPF program attachment to a hook, not a BPF
        link itself. Technically, we could support bpf_cookie with old-style
        cgroup programs.So I ultimately rejected it in favor of
        bpf_get_attach_cookie().

    Currently all perf_event-backed BPF program types support
    bpf_get_attach_cookie() helper. Follow-up patches will add support for
    fentry/fexit programs as well.

    While at it, mark bpf_tracing_func_proto() as static to make it obvious that
    it's only used from within the kernel/trace/bpf_trace.c.

    Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
    Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    Acked-by: Yonghong Song <yhs@fb.com>
    Link: https://lore.kernel.org/bpf/20210815070609.987780-7-andrii@kernel.org

Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
2022-04-29 18:14:41 +02:00
Jerome Marchand b0371ec3e5 bpf: Allow to specify user-provided bpf_cookie for BPF perf links
Bugzilla: http://bugzilla.redhat.com/2041365

commit 82e6b1eee6a8875ef4eacfd60711cce6965c6b04
Author: Andrii Nakryiko <andrii@kernel.org>
Date:   Sun Aug 15 00:05:58 2021 -0700

    bpf: Allow to specify user-provided bpf_cookie for BPF perf links

    Add ability for users to specify custom u64 value (bpf_cookie) when creating
    BPF link for perf_event-backed BPF programs (kprobe/uprobe, perf_event,
    tracepoints).

    This is useful for cases when the same BPF program is used for attaching and
    processing invocation of different tracepoints/kprobes/uprobes in a generic
    fashion, but such that each invocation is distinguished from each other (e.g.,
    BPF program can look up additional information associated with a specific
    kernel function without having to rely on function IP lookups). This enables
    new use cases to be implemented simply and efficiently that previously were
    possible only through code generation (and thus multiple instances of almost
    identical BPF program) or compilation at runtime (BCC-style) on target hosts
    (even more expensive resource-wise). For uprobes it is not even possible in
    some cases to know the function IP beforehand (e.g., when attaching to a shared
    library without PID filtering, in which case the base load address is not known
    for a library).

    This is done by storing u64 bpf_cookie in struct bpf_prog_array_item,
    corresponding to each attached and run BPF program. Given cgroup BPF programs
    already use two 8-byte pointers for their needs and cgroup BPF programs don't
    have (yet?) support for bpf_cookie, reuse that space through union of
    cgroup_storage and new bpf_cookie field.

    Make it available to kprobe/tracepoint BPF programs through bpf_trace_run_ctx.
    This is set by BPF_PROG_RUN_ARRAY, used by kprobe/uprobe/tracepoint BPF
    program execution code, which luckily is now also split from
    BPF_PROG_RUN_ARRAY_CG. This run context will be utilized by a new BPF helper
    giving access to this user-provided cookie value from inside a BPF program.
    Generic perf_event BPF programs will access this value from perf_event itself
    through passed in BPF program context.

    Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
    Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    Acked-by: Yonghong Song <yhs@fb.com>
    Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Link: https://lore.kernel.org/bpf/20210815070609.987780-6-andrii@kernel.org

Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
2022-04-29 18:14:41 +02:00
Jerome Marchand 88da8360b2 bpf: Refactor BPF_PROG_RUN_ARRAY family of macros into functions
Bugzilla: http://bugzilla.redhat.com/2041365

commit 7d08c2c9117113fee118487425ed55efa50cbfa9
Author: Andrii Nakryiko <andrii@kernel.org>
Date:   Sun Aug 15 00:05:55 2021 -0700

    bpf: Refactor BPF_PROG_RUN_ARRAY family of macros into functions

    Similar to BPF_PROG_RUN, turn BPF_PROG_RUN_ARRAY macros into proper functions
    with all the same readability and maintainability benefits. Making them into
    functions required shuffling around bpf_set_run_ctx/bpf_reset_run_ctx
    functions. Also, explicitly specifying the type of the BPF prog run callback
    required adjusting __bpf_prog_run_save_cb() to accept const void *, casted
    internally to const struct sk_buff.

    Further, split out a cgroup-specific BPF_PROG_RUN_ARRAY_CG and
    BPF_PROG_RUN_ARRAY_CG_FLAGS from the more generic BPF_PROG_RUN_ARRAY due to
    the differences in bpf_run_ctx used for those two different use cases.

    I think BPF_PROG_RUN_ARRAY_CG would benefit from further refactoring to accept
    struct cgroup and enum bpf_attach_type instead of bpf_prog_array, fetching
    cgrp->bpf.effective[type] and RCU-dereferencing it internally. But that
    required including include/linux/cgroup-defs.h, which I wasn't sure is ok with
    everyone.

    The remaining generic BPF_PROG_RUN_ARRAY function will be extended to
    pass-through user-provided context value in the next patch.

    Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
    Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    Acked-by: Yonghong Song <yhs@fb.com>
    Link: https://lore.kernel.org/bpf/20210815070609.987780-3-andrii@kernel.org

Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
2022-04-29 18:14:40 +02:00
Jerome Marchand 5b5312cf91 bpf: Refactor BPF_PROG_RUN into a function
Bugzilla: http://bugzilla.redhat.com/2041365

Conflicts: Missing commit 879af96ffd72 ("net, core: Add support for XDP redirection to slave device")

commit fb7dd8bca0139fd73d3f4a6cd257b11731317ded
Author: Andrii Nakryiko <andrii@kernel.org>
Date:   Sun Aug 15 00:05:54 2021 -0700

    bpf: Refactor BPF_PROG_RUN into a function

    Turn BPF_PROG_RUN into a proper always inlined function. No functional and
    performance changes are intended, but it makes it much easier to understand
    what's going on with how BPF programs are actually get executed. It's more
    obvious what types and callbacks are expected. Also extra () around input
    parameters can be dropped, as well as `__` variable prefixes intended to avoid
    naming collisions, which makes the code simpler to read and write.

    This refactoring also highlighted one extra issue. BPF_PROG_RUN is both
    a macro and an enum value (BPF_PROG_RUN == BPF_PROG_TEST_RUN). Turning
    BPF_PROG_RUN into a function causes naming conflict compilation error. So
    rename BPF_PROG_RUN into lower-case bpf_prog_run(), similar to
    bpf_prog_run_xdp(), bpf_prog_run_pin_on_cpu(), etc. All existing callers of
    BPF_PROG_RUN, the macro, are switched to bpf_prog_run() explicitly.

    Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
    Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    Acked-by: Yonghong Song <yhs@fb.com>
    Link: https://lore.kernel.org/bpf/20210815070609.987780-2-andrii@kernel.org

Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
2022-04-29 18:14:40 +02:00
Jerome Marchand 2863c38e3d bpf: Fix pointer cast warning
Bugzilla: http://bugzilla.redhat.com/2041365

commit 16c5900ba776c5acd6568abd60c40f948a96e496
Author: Arnd Bergmann <arnd@arndb.de>
Date:   Wed Jul 21 23:19:45 2021 +0200

    bpf: Fix pointer cast warning

    kp->addr is a pointer, so it cannot be cast directly to a 'u64'
    when it gets interpreted as an integer value:

    kernel/trace/bpf_trace.c: In function '____bpf_get_func_ip_kprobe':
    kernel/trace/bpf_trace.c:968:21: error: cast from pointer to integer of different size [-Werror=pointer-to-int-cast]
      968 |         return kp ? (u64) kp->addr : 0;

    Use the uintptr_t type instead.

    Fixes: 9ffd9f3ff719 ("bpf: Add bpf_get_func_ip helper for kprobe programs")
    Signed-off-by: Arnd Bergmann <arnd@arndb.de>
    Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
    Link: https://lore.kernel.org/bpf/20210721212007.3876595-1-arnd@kernel.org

Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
2022-04-29 18:14:34 +02:00
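
The fix amounts to a two-step cast; a tiny stand-alone sketch of the portable pattern:

    #include <stdint.h>

    /* Going through uintptr_t first keeps the conversion well-defined on
     * both 32-bit and 64-bit builds and avoids -Wpointer-to-int-cast. */
    static unsigned long long ptr_to_u64(const void *p)
    {
            return (unsigned long long)(uintptr_t)p;
    }
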
Jerome Marchand bb8301551a bpf: Add bpf_get_func_ip helper for kprobe programs
Bugzilla: http://bugzilla.redhat.com/2041365

commit 9ffd9f3ff7193933dae171740ab70a103d460065
Author: Jiri Olsa <jolsa@redhat.com>
Date:   Wed Jul 14 11:43:56 2021 +0200

    bpf: Add bpf_get_func_ip helper for kprobe programs

    Adding bpf_get_func_ip helper for BPF_PROG_TYPE_KPROBE programs,
    so it's now possible to call bpf_get_func_ip from both kprobe and
    kretprobe programs.

    Taking the caller's address from 'struct kprobe::addr', which is
    defined for both kprobe and kretprobe.

    Signed-off-by: Jiri Olsa <jolsa@kernel.org>
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
    Link: https://lore.kernel.org/bpf/20210714094400.396467-5-jolsa@kernel.org

Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
2022-04-29 18:14:32 +02:00
Jerome Marchand 83be39299a bpf: Add bpf_get_func_ip helper for tracing programs
Bugzilla: http://bugzilla.redhat.com/2041365

commit 9b99edcae5c80c8fb9f8e7149bae528c9e610a72
Author: Jiri Olsa <jolsa@redhat.com>
Date:   Wed Jul 14 11:43:55 2021 +0200

    bpf: Add bpf_get_func_ip helper for tracing programs

    Adding bpf_get_func_ip helper for BPF_PROG_TYPE_TRACING programs,
    specifically for all trampoline attach types.

    The trampoline's caller IP address is stored in (ctx - 8) address.
    so there's no reason to actually call the helper, but rather fixup
    the call instruction and return [ctx - 8] value directly.

    Signed-off-by: Jiri Olsa <jolsa@kernel.org>
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    Link: https://lore.kernel.org/bpf/20210714094400.396467-4-jolsa@kernel.org

Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
2022-04-29 18:14:32 +02:00
Jerome Marchand 00ace47307 bpf: Introduce bpf timers.
Bugzilla: http://bugzilla.redhat.com/2041365

commit b00628b1c7d595ae5b544e059c27b1f5828314b4
Author: Alexei Starovoitov <ast@kernel.org>
Date:   Wed Jul 14 17:54:09 2021 -0700

    bpf: Introduce bpf timers.

    Introduce 'struct bpf_timer { __u64 :64; __u64 :64; };' that can be embedded
    in hash/array/lru maps as a regular field and helpers to operate on it:

    // Initialize the timer.
    // First 4 bits of 'flags' specify clockid.
    // Only CLOCK_MONOTONIC, CLOCK_REALTIME, CLOCK_BOOTTIME are allowed.
    long bpf_timer_init(struct bpf_timer *timer, struct bpf_map *map, int flags);

    // Configure the timer to call 'callback_fn' static function.
    long bpf_timer_set_callback(struct bpf_timer *timer, void *callback_fn);

    // Arm the timer to expire 'nsec' nanoseconds from the current time.
    long bpf_timer_start(struct bpf_timer *timer, u64 nsec, u64 flags);

    // Cancel the timer and wait for callback_fn to finish if it was running.
    long bpf_timer_cancel(struct bpf_timer *timer);

    Here is how BPF program might look like:
    struct map_elem {
        int counter;
        struct bpf_timer timer;
    };

    struct {
        __uint(type, BPF_MAP_TYPE_HASH);
        __uint(max_entries, 1000);
        __type(key, int);
        __type(value, struct map_elem);
    } hmap SEC(".maps");

    static int timer_cb(void *map, int *key, struct map_elem *val);
    /* val points to particular map element that contains bpf_timer. */

    SEC("fentry/bpf_fentry_test1")
    int BPF_PROG(test1, int a)
    {
        struct map_elem *val;
        int key = 0;

        val = bpf_map_lookup_elem(&hmap, &key);
        if (val) {
            bpf_timer_init(&val->timer, &hmap, CLOCK_REALTIME);
            bpf_timer_set_callback(&val->timer, timer_cb);
            bpf_timer_start(&val->timer, 1000 /* call timer_cb2 in 1 usec */, 0);
        }
    }

    This patch adds helper implementations that rely on hrtimers
    to call bpf functions as timers expire.
    The following patches add necessary safety checks.

    Only programs with CAP_BPF are allowed to use bpf_timer.

    The amount of timers used by the program is constrained by
    the memcg recorded at map creation time.

    The bpf_timer_init() helper needs explicit 'map' argument because inner maps
    are dynamic and not known at load time. While the bpf_timer_set_callback() is
    receiving hidden 'aux->prog' argument supplied by the verifier.

    The prog pointer is needed to refcount the bpf program and make sure that the
    program doesn't get freed while the timer is armed. This approach relies on the
    "user refcnt" scheme used in prog_array that stores bpf programs for
    bpf_tail_call. bpf_timer_set_callback() will increment the prog refcnt, which is
    paired with bpf_timer_cancel() dropping the prog refcnt. The
    ops->map_release_uref is responsible for cancelling the timers and dropping the
    prog refcnt when the user space reference to a map reaches zero.
    This uref approach makes sure that Ctrl-C of the user space process will
    not leave timers running forever unless user space explicitly pinned a map
    that contained timers in bpffs.

    bpf_timer_init() and bpf_timer_set_callback() will return -EPERM if the map
    doesn't have user references (i.e. it is not held by an open file descriptor
    from user space and not pinned in bpffs).

    The bpf_map_delete_elem() and bpf_map_update_elem() operations cancel
    and free the timer if the given map element had it allocated.
    The "bpftool map update" command can be used to cancel timers.

    The 'struct bpf_timer' is explicitly __attribute__((aligned(8))) because an
    unnamed '__u64 :64' bitfield has 1 byte alignment despite occupying 8 bytes
    of padding.

    Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    Acked-by: Martin KaFai Lau <kafai@fb.com>
    Acked-by: Andrii Nakryiko <andrii@kernel.org>
    Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
    Link: https://lore.kernel.org/bpf/20210715005417.78572-4-alexei.starovoitov@gmail.com

Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
2022-04-29 18:14:31 +02:00
Desnes A. Nunes do Rosario cc4187869d bpf: Remove config check to enable bpf support for branch records
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2048779
Upstream Status: https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git/commit/?id=db52f57211b4e45f0ebb274e2c877b211dc18591

commit db52f57211b4e45f0ebb274e2c877b211dc18591
Author: Kajol Jain <kjain@linux.ibm.com>
Date: Mon, 6 Dec 2021 13:03:15 +0530

  Branch data available to BPF programs can be very useful for getting stack traces
  out of userspace applications.

  Commit fff7b64355 ("bpf: Add bpf_read_branch_records() helper") added BPF
  support to capture branch records in x86. Enable this feature also for other
  architectures as well by removing checks specific to x86.

  If an architecture doesn't support branch records, bpf_read_branch_records()
  still has appropriate checks and it will return -EINVAL in that scenario.
  Based on the UAPI helper doc in include/uapi/linux/bpf.h, unsupported architectures
  should return -ENOENT in such a case. Hence, update the appropriate check to
  return -ENOENT instead.
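
  As a rough usage sketch (hypothetical program and buffer names, not taken
  from this patch), a perf_event program would read the records like this:

  /* Hypothetical sketch: read branch records from a perf_event program. */
  struct perf_branch_entry entries[16];

  SEC("perf_event")
  int on_sample(struct bpf_perf_event_data *ctx)
  {
      int sz = bpf_read_branch_records(ctx, entries, sizeof(entries), 0);

      if (sz < 0)      /* an unsupported arch now reports -ENOENT */
          return 0;
      /* sz / sizeof(struct perf_branch_entry) records are valid */
      return 0;
  }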

  Selftest 'perf_branches' result on power9 machine which has the branch stacks
  support:

   - Before this patch:

    [command]# ./test_progs -t perf_branches
     #88/1 perf_branches/perf_branches_hw:FAIL
     #88/2 perf_branches/perf_branches_no_hw:OK
     #88 perf_branches:FAIL
    Summary: 0/1 PASSED, 0 SKIPPED, 1 FAILED

   - After this patch:

    [command]# ./test_progs -t perf_branches
     #88/1 perf_branches/perf_branches_hw:OK
     #88/2 perf_branches/perf_branches_no_hw:OK
     #88 perf_branches:OK
    Summary: 1/2 PASSED, 0 SKIPPED, 0 FAILED

  Selftest 'perf_branches' result on power9 machine which doesn't have the
  branch stacks support:

   - After this patch:

    [command]# ./test_progs -t perf_branches
     #88/1 perf_branches/perf_branches_hw:SKIP
     #88/2 perf_branches/perf_branches_no_hw:OK
     #88 perf_branches:OK
    Summary: 1/1 PASSED, 1 SKIPPED, 0 FAILED

  Fixes: fff7b64355 ("bpf: Add bpf_read_branch_records() helper")
  Suggested-by: Peter Zijlstra <peterz@infradead.org>
  Signed-off-by: Kajol Jain <kjain@linux.ibm.com>
  Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
  Link: https://lore.kernel.org/bpf/20211206073315.77432-1-kjain@linux.ibm.com

Signed-off-by: Desnes A. Nunes do Rosario <drosario@redhat.com>
2022-04-06 15:28:50 -04:00
Daniel Borkmann 51e1bb9eea bpf: Add lockdown check for probe_write_user helper
Back then, commit 96ae522795 ("bpf: Add bpf_probe_write_user BPF helper
to be called in tracers") added the bpf_probe_write_user() helper in order
to allow overriding user space memory. Its original goal was to have a
facility to "debug, divert, and manipulate execution of semi-cooperative
processes" under CAP_SYS_ADMIN. Writing to kernel memory was explicitly
disallowed since it would otherwise tamper with its integrity.

One use case was shown in cf9b1199de ("samples/bpf: Add test/example of
using bpf_probe_write_user bpf helper") where the program DNATs traffic
at the time of connect(2) syscall, meaning, it rewrites the arguments to
a syscall while they're still in userspace, and before the syscall has a
chance to copy the argument into kernel space. These days we have better
mechanisms in BPF for achieving the same (e.g. for load-balancers), but
without having to write to userspace memory.

Of course the bpf_probe_write_user() helper can also be used to abuse
many other things, for good or bad purposes. Outside of BPF, there is
a similar mechanism for ptrace(2) such as PTRACE_PEEK{TEXT,DATA} and
PTRACE_POKE{TEXT,DATA}, but it would likely require more effort.
Commit 96ae522795 explicitly dedicated the helper for experimentation
purpose only. Thus, move the helper's availability behind a newly added
LOCKDOWN_BPF_WRITE_USER lockdown knob so that the helper is disabled under
the "integrity" mode. More fine-grained control can be implemented also
from LSM side with this change.
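
In sketch form (hedged; close to, but not necessarily, the literal diff), the
gating amounts to refusing the helper's func proto at program load time inside
bpf_tracing_func_proto():

    /* Sketch: the helper becomes unavailable once the new lockdown
     * reason is active. */
    case BPF_FUNC_probe_write_user:
        return security_locked_down(LOCKDOWN_BPF_WRITE_USER) < 0 ?
               NULL : bpf_get_probe_write_proto();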

Fixes: 96ae522795 ("bpf: Add bpf_probe_write_user BPF helper to be called in tracers")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
2021-08-10 10:10:10 +02:00
Daniel Borkmann 71330842ff bpf: Add _kernel suffix to internal lockdown_bpf_read
Rename LOCKDOWN_BPF_READ into LOCKDOWN_BPF_READ_KERNEL so we have naming
more consistent with a LOCKDOWN_BPF_WRITE_USER option that we are adding.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
2021-08-09 21:50:41 +02:00
Linus Torvalds 757fa80f4e Tracing updates for 5.14:
- Added option for per CPU threads to the hwlat tracer
 
  - Have hwlat tracer handle hotplug CPUs
 
  - New tracer: osnoise, that detects latency caused by interrupts, softirqs
    and scheduling of other tasks.
 
  - Added timerlat tracer that creates a thread and measures in detail what
    sources of latency it has for wake ups.
 
  - Removed the "success" field of the sched_wakeup trace event.
    This has been hardcoded as "1" since 2015, no tooling should be looking
    at it now. If one exists, we can revert this commit, fix that tool and
    try to remove it again in the future.
 
  - tgid mapping fixed to handle more than PID_MAX_DEFAULT pids/tgids.
 
  - New boot command line option "tp_printk_stop", as tp_printk causes trace
    events to write to console. When user space starts, this can easily live
    lock the system. Having a boot option to stop just after boot up is
    useful to prevent that from happening.
 
  - Have ftrace_dump_on_oops boot command line option take numbers that match
    the numbers shown in /proc/sys/kernel/ftrace_dump_on_oops.
 
  - Bootconfig clean ups, fixes and enhancements.
 
  - New ktest script that tests bootconfig options.
 
  - Add tracepoint_probe_register_may_exist() to register a tracepoint
    without triggering a WARN*() if it already exists. BPF has a path from
    user space that can do this. All other paths are considered a bug.
 
  - Small clean ups and fixes
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCYN8YPhQccm9zdGVkdEBn
 b29kbWlzLm9yZwAKCRAp5XQQmuv6qhxLAP9Mo5hHv7Hg6W7Ddv77rThm+qclsMR/
 yW0P+eJpMm4+xAD8Cq03oE1DimPK+9WZBKU5rSqAkqG6CjgDRw6NlIszzQQ=
 =WEPR
 -----END PGP SIGNATURE-----

Merge tag 'trace-v5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull tracing updates from Steven Rostedt:

 - Added option for per CPU threads to the hwlat tracer

 - Have hwlat tracer handle hotplug CPUs

 - New tracer: osnoise, that detects latency caused by interrupts,
   softirqs and scheduling of other tasks.

 - Added timerlat tracer that creates a thread and measures in detail
   what sources of latency it has for wake ups.

 - Removed the "success" field of the sched_wakeup trace event. This has
   been hardcoded as "1" since 2015, no tooling should be looking at it
   now. If one exists, we can revert this commit, fix that tool and try
   to remove it again in the future.

 - tgid mapping fixed to handle more than PID_MAX_DEFAULT pids/tgids.

 - New boot command line option "tp_printk_stop", as tp_printk causes
   trace events to write to console. When user space starts, this can
   easily live lock the system. Having a boot option to stop just after
   boot up is useful to prevent that from happening.

 - Have ftrace_dump_on_oops boot command line option take numbers that
   match the numbers shown in /proc/sys/kernel/ftrace_dump_on_oops.

 - Bootconfig clean ups, fixes and enhancements.

 - New ktest script that tests bootconfig options.

 - Add tracepoint_probe_register_may_exist() to register a tracepoint
   without triggering a WARN*() if it already exists. BPF has a path
   from user space that can do this. All other paths are considered a
   bug.

 - Small clean ups and fixes

* tag 'trace-v5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (49 commits)
  tracing: Resize tgid_map to pid_max, not PID_MAX_DEFAULT
  tracing: Simplify & fix saved_tgids logic
  treewide: Add missing semicolons to __assign_str uses
  tracing: Change variable type as bool for clean-up
  trace/timerlat: Fix indentation on timerlat_main()
  trace/osnoise: Make 'noise' variable s64 in run_osnoise()
  tracepoint: Add tracepoint_probe_register_may_exist() for BPF tracing
  tracing: Fix spelling in osnoise tracer "interferences" -> "interference"
  Documentation: Fix a typo on trace/osnoise-tracer
  trace/osnoise: Fix return value on osnoise_init_hotplug_support
  trace/osnoise: Make interval u64 on osnoise_main
  trace/osnoise: Fix 'no previous prototype' warnings
  tracing: Have osnoise_main() add a quiescent state for task rcu
  seq_buf: Make trace_seq_putmem_hex() support data longer than 8
  seq_buf: Fix overflow in seq_buf_putmem_hex()
  trace/osnoise: Support hotplug operations
  trace/hwlat: Support hotplug operations
  trace/hwlat: Protect kdata->kthread with get/put_online_cpus
  trace: Add timerlat tracer
  trace: Add osnoise tracer
  ...
2021-07-03 11:13:22 -07:00
Steven Rostedt (VMware) 9913d5745b tracepoint: Add tracepoint_probe_register_may_exist() for BPF tracing
All internal use cases for tracepoint_probe_register() are set to never
be called with the same function and data. If it is, it is considered a
bug, as that means the accounting of handling tracepoints is corrupted.
If the function and data for a tracepoint are already registered when
tracepoint_probe_register() is called, it will call WARN_ON_ONCE() and
return with -EEXIST.

The BPF system call can end up calling tracepoint_probe_register() with
the same data, which now means that this can trigger the warning because
of a user space process. As WARN_ON_ONCE() should not be called because
user space called a system call with bad data, there needs to be a way to
register a tracepoint without triggering a warning.

Enter tracepoint_probe_register_may_exist(), which can be called, but will
not cause a WARN_ON() if the probe already exists. It will still error out
with EEXIST, which will then be sent to the user space that performed the
BPF system call.

This keeps the previous testing for issues with other users of the
tracepoint code, while letting BPF call it with duplicated data and not
warn about it.

Link: https://lore.kernel.org/lkml/20210626135845.4080-1-penguin-kernel@I-love.SAKURA.ne.jp/
Link: https://syzkaller.appspot.com/bug?id=41f4318cf01762389f4d1c1c459da4f542fe5153

Cc: stable@vger.kernel.org
Fixes: c4f6699dfc ("bpf: introduce BPF_RAW_TRACEPOINT")
Reported-by: syzbot <syzbot+721aa903751db87aa244@syzkaller.appspotmail.com>
Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Tested-by: syzbot+721aa903751db87aa244@syzkaller.appspotmail.com
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-06-29 11:51:25 -04:00
Namhyung Kim 95b861a793 bpf: Allow bpf_get_current_ancestor_cgroup_id for tracing
Allow the helper to be called from tracing programs. This is needed to
handle cgroup hierarchies in the program.
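
As a rough usage sketch (hypothetical attach point, not from this patch):

    /* Hypothetical sketch: record the cgroup id two ancestor levels up
     * from the current task inside a tracing program. */
    SEC("tp_btf/sched_switch")
    int BPF_PROG(on_switch)
    {
        u64 cg = bpf_get_current_ancestor_cgroup_id(2);

        bpf_printk("ancestor cgroup id: %llu", cg);
        return 0;
    }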

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20210627153627.824198-1-namhyung@kernel.org
2021-06-28 15:43:02 +02:00
Daniel Borkmann ff40e51043 bpf, lockdown, audit: Fix buggy SELinux lockdown permission checks
Commit 59438b4647 ("security,lockdown,selinux: implement SELinux lockdown")
added an implementation of the locked_down LSM hook to SELinux, with the aim
to restrict which domains are allowed to perform operations that would breach
lockdown. This is indirectly also getting audit subsystem involved to report
events. The latter is problematic, as reported by Ondrej and Serhei, since it
can bring down the whole system via audit:

  1) The audit events that are triggered due to calls to security_locked_down()
     can OOM kill a machine, see below details [0].

  2) It also seems to be causing a deadlock via avc_has_perm()/slow_avc_audit()
     when trying to wake up kauditd, for example, when using trace_sched_switch()
     tracepoint, see details in [1]. Triggering this was not via some hypothetical
     corner case, but with existing tools like runqlat & runqslower from bcc, for
     example, which make use of this tracepoint. Rough call sequence goes like:

     rq_lock(rq) -> -------------------------+
       trace_sched_switch() ->               |
         bpf_prog_xyz() ->                   +-> deadlock
           selinux_lockdown() ->             |
             audit_log_end() ->              |
               wake_up_interruptible() ->    |
                 try_to_wake_up() ->         |
                   rq_lock(rq) --------------+

What's worse is that the intention of 59438b4647 to further restrict lockdown
settings for specific applications with respect to the global lockdown policy is
completely broken for BPF. The SELinux policy rule for the current lockdown check
looks something like this:

  allow <who> <who> : lockdown { <reason> };

However, this doesn't match with the 'current' task where the security_locked_down()
is executed, example: httpd does a syscall. There is a tracing program attached
to the syscall which triggers a BPF program to run, which ends up doing a
bpf_probe_read_kernel{,_str}() helper call. The selinux_lockdown() hook does
the permission check against 'current', that is, httpd in this example. httpd
has literally zero relation to this tracing program, and it would be nonsensical
having to write an SELinux policy rule against httpd to let the tracing helper
pass. The policy in this case needs to be against the entity that is installing
the BPF program. For example, if bpftrace would generate a histogram of syscall
counts by user space application:

  bpftrace -e 'tracepoint:raw_syscalls:sys_enter { @[comm] = count(); }'

bpftrace would then go and generate a BPF program from this internally. One way
of doing it [for the sake of the example] could be to call bpf_get_current_task()
helper and then access current->comm via one of bpf_probe_read_kernel{,_str}()
helpers. So the program itself has nothing to do with httpd or any other random
app doing a syscall here. The BPF program _explicitly initiated_ the lockdown
check. The allow/deny policy belongs in the context of bpftrace: meaning, you
want to grant bpftrace access to use these helpers, but other tracers on the
system like my_random_tracer _not_.

Therefore fix all three issues at the same time by taking a completely different
approach for the security_locked_down() hook, that is, move the check into the
program verification phase where we actually retrieve the BPF func proto. This
also reliably gets the task (current) that is trying to install the BPF tracing
program, e.g. bpftrace/bcc/perf/systemtap/etc, and it also fixes the OOM since
we're moving this out of the BPF helper's fast-path which can be called several
millions of times per second.

The check is then also in line with other security_locked_down() hooks in the
system where the enforcement is performed at open/load time, for example,
open_kcore() for /proc/kcore access or module_sig_check() for module signatures
just to pick few random ones. What's out of scope in the fix as well as in
other security_locked_down() hook locations /outside/ of BPF subsystem is that
if the lockdown policy changes on the fly there is no retrospective action.
This requires a different discussion, potentially complex infrastructure, and
it's also not clear whether this can be solved generically. Either way, it is
out of scope for a suitable stable fix which this one is targeting. Note that
the breakage is specifically on 59438b4647 where it started to rely on 'current'
as UAPI behavior, and _not_ earlier infrastructure such as 9d1f8be5cf ("bpf:
Restrict bpf when kernel lockdown is in confidentiality mode").
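
Conceptually (a hedged sketch, not the literal diff), the check moves out of the
helper body and into the func proto lookup performed at load time, e.g. in
bpf_tracing_func_proto():

    /* Sketch: refuse the proto under lockdown instead of checking on
     * every helper invocation. */
    case BPF_FUNC_probe_read_kernel:
        return security_locked_down(LOCKDOWN_BPF_READ) < 0 ?
               NULL : &bpf_probe_read_kernel_proto;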

[0] https://bugzilla.redhat.com/show_bug.cgi?id=1955585, Jakub Hrozek says:

  I started seeing this with F-34. When I run a container that is traced with
  BPF to record the syscalls it is doing, auditd is flooded with messages like:

  type=AVC msg=audit(1619784520.593:282387): avc:  denied  { confidentiality }
    for pid=476 comm="auditd" lockdown_reason="use of bpf to read kernel RAM"
      scontext=system_u:system_r:auditd_t:s0 tcontext=system_u:system_r:auditd_t:s0
        tclass=lockdown permissive=0

  This seems to be leading to auditd running out of space in the backlog buffer
  and eventually OOMs the machine.

  [...]
  auditd running at 99% CPU presumably processing all the messages, eventually I get:
  Apr 30 12:20:42 fedora kernel: audit: backlog limit exceeded
  Apr 30 12:20:42 fedora kernel: audit: backlog limit exceeded
  Apr 30 12:20:42 fedora kernel: audit: audit_backlog=2152579 > audit_backlog_limit=64
  Apr 30 12:20:42 fedora kernel: audit: audit_backlog=2152626 > audit_backlog_limit=64
  Apr 30 12:20:42 fedora kernel: audit: audit_backlog=2152694 > audit_backlog_limit=64
  Apr 30 12:20:42 fedora kernel: audit: audit_lost=6878426 audit_rate_limit=0 audit_backlog_limit=64
  Apr 30 12:20:45 fedora kernel: oci-seccomp-bpf invoked oom-killer: gfp_mask=0x100cca(GFP_HIGHUSER_MOVABLE), order=0, oom_score_adj=-1000
  Apr 30 12:20:45 fedora kernel: CPU: 0 PID: 13284 Comm: oci-seccomp-bpf Not tainted 5.11.12-300.fc34.x86_64 #1
  Apr 30 12:20:45 fedora kernel: Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-2.fc32 04/01/2014
  [...]

[1] https://lore.kernel.org/linux-audit/CANYvDQN7H5tVp47fbYcRasv4XF07eUbsDwT_eDCHXJUj43J7jQ@mail.gmail.com/,
    Serhei Makarov says:

  Upstream kernel 5.11.0-rc7 and later was found to deadlock during a
  bpf_probe_read_compat() call within a sched_switch tracepoint. The problem
  is reproducible with the reg_alloc3 testcase from SystemTap's BPF backend
  testsuite on x86_64 as well as the runqlat, runqslower tools from bcc on
  ppc64le. Example stack trace:

  [...]
  [  730.868702] stack backtrace:
  [  730.869590] CPU: 1 PID: 701 Comm: in:imjournal Not tainted, 5.12.0-0.rc2.20210309git144c79ef3353.166.fc35.x86_64 #1
  [  730.871605] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-2.fc32 04/01/2014
  [  730.873278] Call Trace:
  [  730.873770]  dump_stack+0x7f/0xa1
  [  730.874433]  check_noncircular+0xdf/0x100
  [  730.875232]  __lock_acquire+0x1202/0x1e10
  [  730.876031]  ? __lock_acquire+0xfc0/0x1e10
  [  730.876844]  lock_acquire+0xc2/0x3a0
  [  730.877551]  ? __wake_up_common_lock+0x52/0x90
  [  730.878434]  ? lock_acquire+0xc2/0x3a0
  [  730.879186]  ? lock_is_held_type+0xa7/0x120
  [  730.880044]  ? skb_queue_tail+0x1b/0x50
  [  730.880800]  _raw_spin_lock_irqsave+0x4d/0x90
  [  730.881656]  ? __wake_up_common_lock+0x52/0x90
  [  730.882532]  __wake_up_common_lock+0x52/0x90
  [  730.883375]  audit_log_end+0x5b/0x100
  [  730.884104]  slow_avc_audit+0x69/0x90
  [  730.884836]  avc_has_perm+0x8b/0xb0
  [  730.885532]  selinux_lockdown+0xa5/0xd0
  [  730.886297]  security_locked_down+0x20/0x40
  [  730.887133]  bpf_probe_read_compat+0x66/0xd0
  [  730.887983]  bpf_prog_250599c5469ac7b5+0x10f/0x820
  [  730.888917]  trace_call_bpf+0xe9/0x240
  [  730.889672]  perf_trace_run_bpf_submit+0x4d/0xc0
  [  730.890579]  perf_trace_sched_switch+0x142/0x180
  [  730.891485]  ? __schedule+0x6d8/0xb20
  [  730.892209]  __schedule+0x6d8/0xb20
  [  730.892899]  schedule+0x5b/0xc0
  [  730.893522]  exit_to_user_mode_prepare+0x11d/0x240
  [  730.894457]  syscall_exit_to_user_mode+0x27/0x70
  [  730.895361]  entry_SYSCALL_64_after_hwframe+0x44/0xae
  [...]

Fixes: 59438b4647 ("security,lockdown,selinux: implement SELinux lockdown")
Reported-by: Ondrej Mosnacek <omosnace@redhat.com>
Reported-by: Jakub Hrozek <jhrozek@redhat.com>
Reported-by: Serhei Makarov <smakarov@redhat.com>
Reported-by: Jiri Olsa <jolsa@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Tested-by: Jiri Olsa <jolsa@redhat.com>
Cc: Paul Moore <paul@paul-moore.com>
Cc: James Morris <jamorris@linux.microsoft.com>
Cc: Jerome Marchand <jmarchan@redhat.com>
Cc: Frank Eigler <fche@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/bpf/01135120-8bf7-df2e-cff0-1d73f1f841c3@iogearbox.net
2021-06-02 21:59:22 +02:00
Florent Revest 48cac3f4a9 bpf: Implement formatted output helpers with bstr_printf
BPF has three formatted output helpers: bpf_trace_printk, bpf_seq_printf
and bpf_snprintf. Their signatures specify that all arguments are
provided from the BPF world as u64s (in an array or as registers). All
of these helpers are currently implemented by calling functions such as
snprintf() whose signatures take a variable number of arguments, then
placed in a va_list by the compiler to call vsnprintf().

"d9c9e4db bpf: Factorize bpf_trace_printk and bpf_seq_printf" introduced
a bpf_printf_prepare function that fills an array of u64 sanitized
arguments with an array of "modifiers" which indicate what the "real"
size of each argument should be (given by the format specifier). The
BPF_CAST_FMT_ARG macro consumes these arrays and casts each argument to
its real size. However, the C promotion rules implicitly cast them all
back to u64s. Therefore, the arguments given to snprintf are u64s and
the va_list constructed by the compiler will use 64 bits for each
argument. On 64 bit machines, this happens to work well because 32 bit
arguments in va_lists need to occupy 64 bits anyway, but on 32 bit
architectures this breaks the layout of the va_list expected by the
called function and mangles values.

In "88a5c690b6 bpf: fix bpf_trace_printk on 32 bit archs", this problem
had been solved for bpf_trace_printk only with a "horrid workaround"
that emitted multiple calls to trace_printk where each call had
different argument types and generated different va_list layouts. One of
the calls would be dynamically chosen at runtime. This was ok with the 3
arguments that bpf_trace_printk takes but bpf_seq_printf and
bpf_snprintf accept up to 12 arguments. Because this approach scales
code exponentially, it is not a viable option anymore.

Because the promotion rules are part of the language and because the
construction of a va_list is an arch-specific ABI, it's best to just
avoid variadic arguments and va_lists altogether. Thankfully the
kernel's snprintf() has an alternative in the form of bstr_printf() that
accepts arguments in a "binary buffer representation". These binary
buffers are currently created by vbin_printf and used in the tracing
subsystem to split the cost of printing into two parts: a fast one that
only dereferences and remembers values, and a slower one, called later,
that does the pretty-printing.
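
As a rough sketch of that two-phase flow (a hypothetical kernel-side wrapper,
not part of this patch; vbin_printf/bstr_printf are available under
CONFIG_BINARY_PRINTF):

    /* Hypothetical sketch: vbin_printf() captures raw argument values in a
     * fast pass, bstr_printf() does the pretty-printing later from the
     * binary buffer. */
    static u32 bin_buf[64];
    static char out[128];

    static int demo_printf(const char *fmt, ...)
    {
        va_list args;

        va_start(args, fmt);
        vbin_printf(bin_buf, ARRAY_SIZE(bin_buf), fmt, args); /* size in 32-bit words */
        va_end(args);

        return bstr_printf(out, sizeof(out), fmt, bin_buf);
    }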

This patch refactors bpf_printf_prepare to construct binary buffers of
arguments consumable by bstr_printf() instead of arrays of arguments and
modifiers. This gets rid of BPF_CAST_FMT_ARG and greatly simplifies the
bpf_printf_prepare usage but there are a few gotchas that change how
bpf_printf_prepare needs to do things.

Currently, bpf_printf_prepare uses a per cpu temporary buffer as a
generic storage for strings and IP addresses. With this refactoring, the
temporary buffer now holds all the arguments in a structured binary
format.

To comply with the format expected by bstr_printf, certain format
specifiers also need to be pre-formatted: %pB and %pi6/%pi4/%pI4/%pI6.
Because vsnprintf subroutines for these specifiers are hard to expose,
we pre-format these arguments with calls to snprintf().

Reported-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Florent Revest <revest@chromium.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210427174313.860948-3-revest@chromium.org
2021-04-27 15:56:31 -07:00
Florent Revest 38d26d89b3 bpf: Lock bpf_trace_printk's tmp buf before it is written to
bpf_trace_printk uses a shared static buffer to hold strings before they
are printed. A recent refactoring mistakenly moved the locking of that buffer
to after it gets filled.

Fixes: d9c9e4db18 ("bpf: Factorize bpf_trace_printk and bpf_seq_printf")
Reported-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Florent Revest <revest@chromium.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210427112958.773132-1-revest@chromium.org
2021-04-27 08:04:34 -07:00
Florent Revest 7b15523a98 bpf: Add a bpf_snprintf helper
The implementation takes inspiration from the existing bpf_trace_printk
helper but there are a few differences:

To allow for a large number of format-specifiers, parameters are
provided in an array, like in bpf_seq_printf.

Because the output string takes two arguments and the array of
parameters also takes two arguments, the format string needs to fit in
one argument. Thankfully, ARG_PTR_TO_CONST_STR is guaranteed to point to
a zero-terminated read-only map so we don't need a format string length
arg.

Because the format-string is known at verification time, we also do
a first pass of format string validation in the verifier logic. This
makes debugging easier.
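
As a rough usage sketch (hypothetical program; the format string must live in
read-only map memory, which a static const string placed in .rodata provides):

    /* Hypothetical sketch: format a message with the new helper. */
    SEC("fentry/bpf_fentry_test1")
    int BPF_PROG(fmt_demo, int a)
    {
        static const char fmt[] = "arg was %d, doubled %d";
        u64 args[] = { a, a * 2 };
        char out[64];

        /* data_len is the size of the args array in bytes */
        bpf_snprintf(out, sizeof(out), fmt, args, sizeof(args));
        return 0;
    }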

Signed-off-by: Florent Revest <revest@chromium.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210419155243.1632274-4-revest@chromium.org
2021-04-19 15:27:36 -07:00
Florent Revest d9c9e4db18 bpf: Factorize bpf_trace_printk and bpf_seq_printf
Two helpers (trace_printk and seq_printf) have very similar
implementations of format string parsing and a third one is coming
(snprintf). To avoid code duplication and make the code easier to
maintain, this moves the operations associated with format string
parsing (validation and argument sanitization) into one generic
function.

The implementation of the two existing helpers already drifted quite a
bit so unifying them entailed a lot of changes:

- bpf_trace_printk always expected fmt[fmt_size] to be the terminating
  NULL character; this is no longer true, the first 0 is terminating.
- bpf_trace_printk now supports %% (which produces the percentage char).
- bpf_trace_printk now skips width formatting fields.
- bpf_trace_printk now supports the X modifier (capital hexadecimal).
- bpf_trace_printk now supports %pK, %px, %pB, %pi4, %pI4, %pi6 and %pI6
- argument casting on 32 bit has been simplified into one macro, using
  an enum instead of obscure int increments.

- bpf_seq_printf now uses bpf_trace_copy_string instead of
  strncpy_from_kernel_nofault and handles the %pks %pus specifiers.
- bpf_seq_printf now prints longs correctly on 32 bit architectures.

- both were changed to use a global per-cpu tmp buffer instead of one
  stack buffer for trace_printk and 6 small buffers for seq_printf.
- to avoid per-cpu buffer usage conflict, these helpers disable
  preemption while the per-cpu buffer is in use.
- both helpers now support the %ps and %pS specifiers to print symbols.
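
As a hypothetical example (names made up) of the specifiers that become usable
with this change:

    /* Hypothetical sketch: symbol and capital-hex printing from a kprobe. */
    SEC("kprobe/do_sys_open")
    int BPF_KPROBE(printk_demo)
    {
        u64 ip = PT_REGS_IP(ctx);

        bpf_printk("called from %pS (raw: %X)", ip, (u32)ip);
        return 0;
    }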

The implementation is also moved from bpf_trace.c to helpers.c because
the upcoming bpf_snprintf helper will be made available to all BPF
programs and will need it.

Signed-off-by: Florent Revest <revest@chromium.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210419155243.1632274-2-revest@chromium.org
2021-04-19 15:27:36 -07:00
Yonghong Song 69c087ba62 bpf: Add bpf_for_each_map_elem() helper
The bpf_for_each_map_elem() helper is introduced which
iterates all map elements with a callback function. The
helper signature looks like
  long bpf_for_each_map_elem(map, callback_fn, callback_ctx, flags)
and for each map element, the callback_fn will be called. For example,
for a hashmap, the callback signature may look like
  long callback_fn(map, key, val, callback_ctx)

There are two known use cases for this. One is from upstream ([1]) where
a for_each_map_elem helper may help implement a timeout mechanism
in a more generic way. Another is from our internal discussion
for a firewall use case where a map contains all the rules. The packet
data can be compared against all these rules to decide whether to allow
or deny the packet.

For array maps, users can already use a bounded loop to traverse
elements. Using this helper avoids the need for a bounded loop. For other
types of maps (e.g., hash maps) where a bounded loop is hard or
impossible to use, this helper provides a convenient way to
operate on all elements.

For callback_fn, besides the map and map element, a callback_ctx,
allocated on the caller's stack, is also passed to the callback
function. This callback_ctx argument can provide additional
input and allows writing to the caller's stack for output.

If the callback_fn returns 0, the helper will iterate to the next
element if available. If the callback_fn returns 1, the helper
will stop iterating and return to the bpf program. Other return
values are not used for now.
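
A rough usage sketch (hypothetical names; assumes a hash map 'hmap' with int
keys and int values declared in the program) could look like:

    /* Hypothetical sketch: count elements whose value exceeds a threshold. */
    struct cb_ctx {
        int threshold;
        int matches;
    };

    static long check_elem(struct bpf_map *map, int *key, int *val,
                           struct cb_ctx *ctx)
    {
        if (*val > ctx->threshold)
            ctx->matches++;
        return 0;    /* 0 = continue, 1 = stop iterating */
    }

    SEC("fentry/bpf_fentry_test1")
    int BPF_PROG(count_matches)
    {
        struct cb_ctx ctx = { .threshold = 10, .matches = 0 };

        bpf_for_each_map_elem(&hmap, check_elem, &ctx, 0);
        return 0;
    }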

Currently, this helper is only available with jit. It is possible
to make it work with the interpreter with some effort but I leave it
as future work.

[1]: https://lore.kernel.org/bpf/20210122205415.113822-1-xiyou.wangcong@gmail.com/

Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210226204925.3884923-1-yhs@fb.com
2021-02-26 13:23:52 -08:00
Song Liu a10787e6d5 bpf: Enable task local storage for tracing programs
To access per-task data, BPF programs usually create a hash table with
pid as the key. This is not ideal because:
 1. The user needs to estimate the proper size of the hash table, which may
    be inaccurate;
 2. Big hash tables are slow;
 3. To clean up the data properly during task terminations, the user needs
    to write extra logic.

Task local storage overcomes these issues and offers a better option for
these per-task data. Task local storage is only available to BPF_LSM. Now
enable it for tracing programs.

Unlike LSM programs, tracing programs can be called in IRQ contexts.
Helpers that access task local storage are updated to use
raw_spin_lock_irqsave() instead of raw_spin_lock_bh().

Tracing programs can attach to functions on the task free path, e.g.
exit_creds(). To avoid allocating task local storage after
bpf_task_storage_free(), bpf_task_storage_get() is updated to not allocate
new storage when the task is not refcounted (task->usage == 0).
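
A rough usage sketch (hypothetical map and attach point, not from this patch):

    /* Hypothetical sketch: keep a per-task counter in task local storage
     * from a tracing program. */
    struct {
        __uint(type, BPF_MAP_TYPE_TASK_STORAGE);
        __uint(map_flags, BPF_F_NO_PREALLOC);
        __type(key, int);
        __type(value, u64);
    } task_counters SEC(".maps");

    SEC("fentry/bpf_fentry_test1")
    int BPF_PROG(count_calls)
    {
        u64 *cnt;

        cnt = bpf_task_storage_get(&task_counters, bpf_get_current_task_btf(),
                                   0, BPF_LOCAL_STORAGE_GET_F_CREATE);
        if (cnt)
            (*cnt)++;
        return 0;
    }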

Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: KP Singh <kpsingh@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20210225234319.336131-2-songliubraving@fb.com
2021-02-26 11:51:47 -08:00
David S. Miller b8af417e4d Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Daniel Borkmann says:

====================
pull-request: bpf-next 2021-02-16

The following pull-request contains BPF updates for your *net-next* tree.

There's a small merge conflict between 7eeba1706e ("tcp: Add receive timestamp
support for receive zerocopy.") from net-next tree and 9cacf81f81 ("bpf: Remove
extra lock_sock for TCP_ZEROCOPY_RECEIVE") from bpf-next tree. Resolve as follows:

  [...]
                lock_sock(sk);
                err = tcp_zerocopy_receive(sk, &zc, &tss);
                err = BPF_CGROUP_RUN_PROG_GETSOCKOPT_KERN(sk, level, optname,
                                                          &zc, &len, err);
                release_sock(sk);
  [...]

We've added 116 non-merge commits during the last 27 day(s) which contain
a total of 156 files changed, 5662 insertions(+), 1489 deletions(-).

The main changes are:

1) Adds support of pointers to types with known size among global function
   args to overcome the limit on max # of allowed args, from Dmitrii Banshchikov.

2) Add bpf_iter for task_vma which can be used to generate information similar
   to /proc/pid/maps, from Song Liu.

3) Enable bpf_{g,s}etsockopt() from all sock_addr related program hooks. Allow
   rewriting bind user ports from BPF side below the ip_unprivileged_port_start
   range, both from Stanislav Fomichev.

4) Prevent recursion on fentry/fexit & sleepable programs and allow map-in-map
   as well as per-cpu maps for the latter, from Alexei Starovoitov.

5) Add selftest script to run BPF CI locally. Also enable BPF ringbuffer
   for sleepable programs, both from KP Singh.

6) Extend verifier to enable variable offset read/write access to the BPF
   program stack, from Andrei Matei.

7) Improve tc & XDP MTU handling and add a new bpf_check_mtu() helper to
   query device MTU from programs, from Jesper Dangaard Brouer.

8) Allow bpf_get_socket_cookie() helper also be called from [sleepable] BPF
   tracing programs, from Florent Revest.

9) Extend x86 JIT to pad JMPs with NOPs for helping image to converge when
   otherwise too many passes are required, from Gary Lin.

10) Verifier fixes on atomics with BPF_FETCH as well as function-by-function
    verification both related to zero-extension handling, from Ilya Leoshkevich.

11) Better kernel build integration of resolve_btfids tool, from Jiri Olsa.

12) Batch of AF_XDP selftest cleanups and small performance improvement
    for libbpf's xsk map redirect for newer kernels, from Björn Töpel.

13) Follow-up BPF doc and verifier improvements around atomics with
    BPF_FETCH, from Brendan Jackman.

14) Permit zero-sized data sections e.g. if ELF .rodata section contains
    read-only data from local variables, from Yonghong Song.

15) veth driver skb bulk-allocation for ndo_xdp_xmit, from Lorenzo Bianconi.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-16 13:14:06 -08:00
Song Liu 3d06f34aa8 bpf: Allow bpf_d_path in bpf_iter program
task_file and task_vma iter programs have access to file->f_path. Enable
bpf_d_path to print the paths of these files.
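
A rough usage sketch (hypothetical iterator program, not from this patch):

    /* Hypothetical sketch: print the path of every file in a task_file
     * iterator. */
    SEC("iter/task_file")
    int dump_task_file(struct bpf_iter__task_file *ctx)
    {
        struct seq_file *seq = ctx->meta->seq;
        struct file *file = ctx->file;
        char path[256];

        if (!file)
            return 0;
        if (bpf_d_path(&file->f_path, path, sizeof(path)) > 0)
            BPF_SEQ_PRINTF(seq, "%s\n", path);
        return 0;
    }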

Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20210212183107.50963-3-songliubraving@fb.com
2021-02-12 12:56:53 -08:00
Florent Revest c5dbb89fc2 bpf: Expose bpf_get_socket_cookie to tracing programs
This needs a new helper that:
- can work in a sleepable context (using sock_gen_cookie)
- takes a struct sock pointer and checks that it's not NULL
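
A rough usage sketch (hypothetical attach point, not from this patch):

    /* Hypothetical sketch: read the socket cookie from a tracing program. */
    SEC("fentry/tcp_connect")
    int BPF_PROG(trace_connect, struct sock *sk)
    {
        u64 cookie = bpf_get_socket_cookie(sk);

        bpf_printk("socket cookie: %llu", cookie);
        return 0;
    }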

Signed-off-by: Florent Revest <revest@chromium.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: KP Singh <kpsingh@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210210111406.785541-2-revest@chromium.org
2021-02-11 17:44:41 -08:00
Alexei Starovoitov 548f1191d8 bpf: Unbreak BPF_PROG_TYPE_KPROBE when kprobe is called via do_int3
The commit 0d00449c7a ("x86: Replace ist_enter() with nmi_enter()")
converted the do_int3 handler to be "NMI-like".
That made the old if (in_nmi()) check abort execution of bpf programs
attached to a kprobe when the kprobe fires via int3
(for example, when the kprobe is placed in the middle of a function).
Remove the check to restore user visible behavior.

Fixes: 0d00449c7a ("x86: Replace ist_enter() with nmi_enter()")
Reported-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Link: https://lore.kernel.org/bpf/20210203070636.70926-1-alexei.starovoitov@gmail.com
2021-02-03 15:54:22 +01:00
Linus Torvalds 09c0796adf Tracing updates for 5.11
The major update to this release is that there's a new arch config option called:
 CONFIG_HAVE_DYNAMIC_FTRACE_WITH_ARGS. Currently, only x86_64 enables it.
 All the ftrace callbacks now take a struct ftrace_regs instead of a struct
 pt_regs. If the architecture has HAVE_DYNAMIC_FTRACE_WITH_ARGS enabled, then
 the ftrace_regs will have enough information to read the arguments of the
 function being traced, as well as access to the stack pointer. This way, if
 a user (like live kernel patching) only cares about the arguments, then it
 can avoid using the heavier weight "regs" callback, that puts in enough
 information in the struct ftrace_regs to simulate a breakpoint exception
 (needed for kprobes).
 
 New config option that audits the timestamps of the ftrace ring buffer at
 most every event recorded.  The "check_buffer()" calls will conflict with
 mainline, because I purposely added the check without including the fix that
 it caught, which is in mainline. Running a kernel built from the commit of
 the added check will trigger it.
 
 Ftrace recursion protection has been cleaned up to move the protection to
 the callback itself (this saves on an extra function call for those
 callbacks).
 
 Perf now handles its own RCU protection and does not depend on ftrace to do
 it for it (saving on that extra function call).
 
 New debug option to add "recursed_functions" file to tracefs that lists all
 the places that triggered the recursion protection of the function tracer.
 This will show where things need to be fixed as recursion slows down the
 function tracer.
 
 The eval enum mapping updates done at boot up are now offloaded to a work
 queue, as it caused a noticeable pause on slow embedded boards.
 
 Various clean ups and last minute fixes.
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCX9uq8xQccm9zdGVkdEBn
 b29kbWlzLm9yZwAKCRAp5XQQmuv6qtrwAQCHevqWMjKc1Q76bnCgwB0AbFKB6vqy
 5b6g/co5+ihv8wD/eJPWlZMAt97zTVW7bdp5qj/GTiCDbAsODMZ597LsxA0=
 =rZEz
 -----END PGP SIGNATURE-----

Merge tag 'trace-v5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull tracing updates from Steven Rostedt:
 "The major update to this release is that there's a new arch config
  option called CONFIG_HAVE_DYNAMIC_FTRACE_WITH_ARGS.

  Currently, only x86_64 enables it. All the ftrace callbacks now take a
  struct ftrace_regs instead of a struct pt_regs. If the architecture
  has HAVE_DYNAMIC_FTRACE_WITH_ARGS enabled, then the ftrace_regs will
  have enough information to read the arguments of the function being
  traced, as well as access to the stack pointer.

  This way, if a user (like live kernel patching) only cares about the
  arguments, then it can avoid using the heavier weight "regs" callback,
  that puts in enough information in the struct ftrace_regs to simulate
  a breakpoint exception (needed for kprobes).

  A new config option that audits the timestamps of the ftrace ring
  buffer at most every event recorded.

  Ftrace recursion protection has been cleaned up to move the protection
  to the callback itself (this saves on an extra function call for those
  callbacks).

  Perf now handles its own RCU protection and does not depend on ftrace
  to do it for it (saving on that extra function call).

  New debug option to add "recursed_functions" file to tracefs that
  lists all the places that triggered the recursion protection of the
  function tracer. This will show where things need to be fixed as
  recursion slows down the function tracer.

  The eval enum mapping updates done at boot up are now offloaded to a
  work queue, as it caused a noticeable pause on slow embedded boards.

  Various clean ups and last minute fixes"

* tag 'trace-v5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (33 commits)
  tracing: Offload eval map updates to a work queue
  Revert: "ring-buffer: Remove HAVE_64BIT_ALIGNED_ACCESS"
  ring-buffer: Add rb_check_bpage in __rb_allocate_pages
  ring-buffer: Fix two typos in comments
  tracing: Drop unneeded assignment in ring_buffer_resize()
  tracing: Disable ftrace selftests when any tracer is running
  seq_buf: Avoid type mismatch for seq_buf_init
  ring-buffer: Fix a typo in function description
  ring-buffer: Remove obsolete rb_event_is_commit()
  ring-buffer: Add test to validate the time stamp deltas
  ftrace/documentation: Fix RST C code blocks
  tracing: Clean up after filter logic rewriting
  tracing: Remove the useless value assignment in test_create_synth_event()
  livepatch: Use the default ftrace_ops instead of REGS when ARGS is available
  ftrace/x86: Allow for arguments to be passed in to ftrace_regs by default
  ftrace: Have the callbacks receive a struct ftrace_regs instead of pt_regs
  MAINTAINERS: assign ./fs/tracefs to TRACING
  tracing: Fix some typos in comments
  ftrace: Remove unused varible 'ret'
  ring-buffer: Add recording of ring buffer recursion into recursed_functions
  ...
2020-12-17 13:22:17 -08:00
Linus Torvalds d635a69dd4 Networking updates for 5.11
Core:
 
  - support "prefer busy polling" NAPI operation mode, where we defer softirq
    for some time expecting applications to periodically busy poll
 
  - AF_XDP: improve efficiency by more batching and hindering
            the adjacency cache prefetcher
 
  - af_packet: make packet_fanout.arr size configurable up to 64K
 
  - tcp: optimize TCP zero copy receive in presence of partial or unaligned
         reads making zero copy a performance win for much smaller messages
 
  - XDP: add bulk APIs for returning / freeing frames
 
  - sched: support fragmenting IP packets as they come out of conntrack
 
  - net: allow virtual netdevs to forward UDP L4 and fraglist GSO skbs
 
 BPF:
 
  - BPF switch from crude rlimit-based to memcg-based memory accounting
 
  - BPF type format information for kernel modules and related tracing
    enhancements
 
  - BPF implement task local storage for BPF LSM
 
  - allow the FENTRY/FEXIT/RAW_TP tracing programs to use bpf_sk_storage
 
 Protocols:
 
  - mptcp: improve multiple xmit streams support, memory accounting and
           many smaller improvements
 
  - TLS: support CHACHA20-POLY1305 cipher
 
  - seg6: add support for SRv6 End.DT4/DT6 behavior
 
  - sctp: Implement RFC 6951: UDP Encapsulation of SCTP
 
  - ppp_generic: add ability to bridge channels directly
 
  - bridge: Connectivity Fault Management (CFM) support as is defined in
            IEEE 802.1Q section 12.14.
 
 Drivers:
 
  - mlx5: make use of the new auxiliary bus to organize the driver internals
 
  - mlx5: more accurate port TX timestamping support
 
  - mlxsw:
    - improve the efficiency of offloaded next hop updates by using
      the new nexthop object API
    - support blackhole nexthops
    - support IEEE 802.1ad (Q-in-Q) bridging
 
  - rtw88: major bluetooth co-existence improvements
 
  - iwlwifi: support new 6 GHz frequency band
 
  - ath11k: Fast Initial Link Setup (FILS)
 
  - mt7915: dual band concurrent (DBDC) support
 
  - net: ipa: add basic support for IPA v4.5
 
 Refactor:
 
  - a few pieces of in_interrupt() cleanup work from Sebastian Andrzej Siewior
 
  - phy: add support for shared interrupts; get rid of multiple driver
         APIs and have the drivers write a full IRQ handler, slight growth
 	of driver code should be compensated by the simpler API which
 	also allows shared IRQs
 
  - add common code for handling netdev per-cpu counters
 
  - move TX packet re-allocation from Ethernet switch tag drivers to
    a central place
 
  - improve efficiency and rename nla_strlcpy
 
  - number of W=1 warning cleanups as we now catch those in a patchwork
    build bot
 
 Old code removal:
 
  - wan: delete the DLCI / SDLA drivers
 
  - wimax: move to staging
 
  - wifi: remove old WDS wifi bridging support
 
 Signed-off-by: Jakub Kicinski <kuba@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAl/YXmUACgkQMUZtbf5S
 IrvSQBAAgOrt4EFopEvVqlTHZbqI45IEqgtXS+YWmlgnjZCgshyMj8q1yK1zzane
 qYxr/NNJ9kV3FdtaynmmHPgEEEfR5kJ/D3B2BsxYDkaDDrD0vbNsBGw+L+/Gbhxl
 N/5l/9FjLyLY1D+EErknuwR5XGuQ6BSDVaKQMhYOiK2hgdnAAI4hszo8Chf6wdD0
 XDBslQ7vpD/05r+eMj0IkS5dSAoGOIFXUxhJ5dqrDbRHiKsIyWqA3PLbYemfAhxI
 s2XckjfmSgGE3FKL8PSFu+EcfHbJQQjLcULJUnqgVcdwEEtRuE9ggEi52nZRXMWM
 4e8sQJAR9Fx7pZy0G1xfS149j6iPU5LjRlU9TNSpVABz14Vvvo3gEL6gyIdsz+xh
 hMN7UBdp0FEaP028CXoIYpaBesvQqj0BSndmee8qsYAtN6j+QKcM2AOSr7JN1uMH
 C/86EDoGAATiEQIVWJvnX5MPmlAoblyLA+RuVhmxkIBx2InGXkFmWqRkXT5l4jtk
 LVl8/TArR4alSQqLXictXCjYlCm9j5N4zFFtEVasSYi7/ZoPfgRNWT+lJ2R8Y+Zv
 +htzGaFuyj6RJTVeFQMrkl3whAtBamo2a0kwg45NnxmmXcspN6kJX1WOIy82+MhD
 Yht7uplSs7MGKA78q/CDU0XBeGjpABUvmplUQBIfrR/jKLW2730=
 =GXs1
 -----END PGP SIGNATURE-----

Merge tag 'net-next-5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next

Pull networking updates from Jakub Kicinski:
 "Core:

   - support "prefer busy polling" NAPI operation mode, where we defer
     softirq for some time expecting applications to periodically busy
     poll

   - AF_XDP: improve efficiency by more batching and hindering the
     adjacency cache prefetcher

   - af_packet: make packet_fanout.arr size configurable up to 64K

   - tcp: optimize TCP zero copy receive in presence of partial or
     unaligned reads making zero copy a performance win for much smaller
     messages

   - XDP: add bulk APIs for returning / freeing frames

   - sched: support fragmenting IP packets as they come out of conntrack

   - net: allow virtual netdevs to forward UDP L4 and fraglist GSO skbs

  BPF:

   - BPF switch from crude rlimit-based to memcg-based memory accounting

   - BPF type format information for kernel modules and related tracing
     enhancements

   - BPF implement task local storage for BPF LSM

   - allow the FENTRY/FEXIT/RAW_TP tracing programs to use
     bpf_sk_storage

  Protocols:

   - mptcp: improve multiple xmit streams support, memory accounting and
     many smaller improvements

   - TLS: support CHACHA20-POLY1305 cipher

   - seg6: add support for SRv6 End.DT4/DT6 behavior

   - sctp: Implement RFC 6951: UDP Encapsulation of SCTP

   - ppp_generic: add ability to bridge channels directly

   - bridge: Connectivity Fault Management (CFM) support as is defined
     in IEEE 802.1Q section 12.14.

  Drivers:

   - mlx5: make use of the new auxiliary bus to organize the driver
     internals

   - mlx5: more accurate port TX timestamping support

   - mlxsw:
      - improve the efficiency of offloaded next hop updates by using
        the new nexthop object API
      - support blackhole nexthops
      - support IEEE 802.1ad (Q-in-Q) bridging

   - rtw88: major bluetooth co-existence improvements

   - iwlwifi: support new 6 GHz frequency band

   - ath11k: Fast Initial Link Setup (FILS)

   - mt7915: dual band concurrent (DBDC) support

   - net: ipa: add basic support for IPA v4.5

  Refactor:

   - a few pieces of in_interrupt() cleanup work from Sebastian Andrzej
     Siewior

   - phy: add support for shared interrupts; get rid of multiple driver
     APIs and have the drivers write a full IRQ handler, slight growth
     of driver code should be compensated by the simpler API which also
     allows shared IRQs

   - add common code for handling netdev per-cpu counters

   - move TX packet re-allocation from Ethernet switch tag drivers to a
     central place

   - improve efficiency and rename nla_strlcpy

   - number of W=1 warning cleanups as we now catch those in a patchwork
     build bot

  Old code removal:

   - wan: delete the DLCI / SDLA drivers

   - wimax: move to staging

   - wifi: remove old WDS wifi bridging support"

* tag 'net-next-5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1922 commits)
  net: hns3: fix expression that is currently always true
  net: fix proc_fs init handling in af_packet and tls
  nfc: pn533: convert comma to semicolon
  af_vsock: Assign the vsock transport considering the vsock address flags
  af_vsock: Set VMADDR_FLAG_TO_HOST flag on the receive path
  vsock_addr: Check for supported flag values
  vm_sockets: Add VMADDR_FLAG_TO_HOST vsock flag
  vm_sockets: Add flags field in the vsock address data structure
  net: Disable NETIF_F_HW_TLS_TX when HW_CSUM is disabled
  tcp: Add logic to check for SYN w/ data in tcp_simple_retransmit
  net: mscc: ocelot: install MAC addresses in .ndo_set_rx_mode from process context
  nfc: s3fwrn5: Release the nfc firmware
  net: vxget: clean up sparse warnings
  mlxsw: spectrum_router: Use eXtended mezzanine to offload IPv4 router
  mlxsw: spectrum: Set KVH XLT cache mode for Spectrum2/3
  mlxsw: spectrum_router_xm: Introduce basic XM cache flushing
  mlxsw: reg: Add Router LPM Cache Enable Register
  mlxsw: reg: Add Router LPM Cache ML Delete Register
  mlxsw: spectrum_router_xm: Implement L-value tracking for M-index
  mlxsw: reg: Add XM Router M Table Register
  ...
2020-12-15 13:22:29 -08:00
Linus Torvalds adb35e8dc9 Scheduler updates:
- migrate_disable/enable() support which originates from the RT tree and
    is now a prerequisite for the new preemptible kmap_local() API which aims
    to replace kmap_atomic().
 
  - A fair amount of topology and NUMA related improvements
 
  - Improvements for the frequency invariant calculations
 
  - Enhanced robustness for the global CPU priority tracking and decision
    making
 
  - The usual small fixes and enhancements all over the place
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAl/XwK4THHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoX28D/9cVrvziSQGfBfuQWnUiw8iOIq1QBa2
 Me+Tvenhfrlt7xU6rbP9ciFu7eTN+fS06m5uQPGI+t22WuJmHzbmw1bJVXfkvYfI
 /QoU+Hg7DkDAn1p7ZKXh0dRkV0nI9ixxSHl0E+Zf1ATBxCUMV2SO85flg6z/4qJq
 3VWUye0dmR7/bhtkIjv5rwce9v2JB2g1AbgYXYTW9lHVoUdGoMSdiZAF4tGyHLnx
 sJ6DMqQ+k+dmPyYO0z5MTzjW/fXit4n9w2e3z9TvRH/uBu58WSW1RBmQYX6aHBAg
 dhT9F4lvTs6lJY23x5RSFWDOv6xAvKF5a0xfb8UZcyH5EoLYrPRvm42a0BbjdeRa
 u0z7LbwIlKA+RFdZzFZWz8UvvO0ljyMjmiuqZnZ5dY9Cd80LSBuxrWeQYG0qg6lR
 Y2povhhCepEG+q8AXIe2YjHKWKKC1s/l/VY3CNnCzcd21JPQjQ4Z5eWGmHif5IED
 CntaeFFhZadR3w02tkX35zFmY3w4soKKrbI4EKWrQwd+cIEQlOSY7dEPI/b5BbYj
 MWAb3P4EG9N77AWTNmbhK4nN0brEYb+rBbCA+5dtNBVhHTxAC7OTWElJOC2O66FI
 e06dREjvwYtOkRUkUguWwErbIai2gJ2MH0VILV3hHoh64oRk7jjM8PZYnjQkdptQ
 Gsq0rJW5iiu/OQ==
 =Oz1V
 -----END PGP SIGNATURE-----

Merge tag 'sched-core-2020-12-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull scheduler updates from Thomas Gleixner:

 - migrate_disable/enable() support which originates from the RT tree
   and is now a prerequisite for the new preemptible kmap_local() API
   which aims to replace kmap_atomic().

 - A fair amount of topology and NUMA related improvements

 - Improvements for the frequency invariant calculations

 - Enhanced robustness for the global CPU priority tracking and decision
   making

 - The usual small fixes and enhancements all over the place

* tag 'sched-core-2020-12-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (61 commits)
  sched/fair: Trivial correction of the newidle_balance() comment
  sched/fair: Clear SMT siblings after determining the core is not idle
  sched: Fix kernel-doc markup
  x86: Print ratio freq_max/freq_base used in frequency invariance calculations
  x86, sched: Use midpoint of max_boost and max_P for frequency invariance on AMD EPYC
  x86, sched: Calculate frequency invariance for AMD systems
  irq_work: Optimize irq_work_single()
  smp: Cleanup smp_call_function*()
  irq_work: Cleanup
  sched: Limit the amount of NUMA imbalance that can exist at fork time
  sched/numa: Allow a floating imbalance between NUMA nodes
  sched: Avoid unnecessary calculation of load imbalance at clone time
  sched/numa: Rename nr_running and break out the magic number
  sched: Make migrate_disable/enable() independent of RT
  sched/topology: Condition EAS enablement on FIE support
  arm64: Rebuild sched domains on invariance status changes
  sched/topology,schedutil: Wrap sched domains rebuild
  sched/uclamp: Allow to reset a task uclamp constraint value
  sched/core: Fix typos in comments
  Documentation: scheduler: fix information on arch SD flags, sched_domain and sched_debug
  ...
2020-12-14 18:29:11 -08:00