Commit Graph

596 Commits

Jerome Marchand e7d2b19956 bpf: Check validity of link->type in bpf_link_show_fdinfo()
JIRA: https://issues.redhat.com/browse/RHEL-63880

commit 8421d4c8762bd022cb491f2f0f7019ef51b4f0a7
Author: Hou Tao <houtao1@huawei.com>
Date:   Thu Oct 24 09:35:58 2024 +0800

    bpf: Check validity of link->type in bpf_link_show_fdinfo()

    If a newly-added link type doesn't invoke BPF_LINK_TYPE(), accessing
    bpf_link_type_strs[link->type] may result in an out-of-bounds access.

    To spot such missed invocations early in the future, check the
    validity of link->type in bpf_link_show_fdinfo() and emit a warning
    when such an invocation has been missed.

    Signed-off-by: Hou Tao <houtao1@huawei.com>
    Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
    Link: https://lore.kernel.org/bpf/20241024013558.1135167-3-houtao@huaweicloud.com

Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
2025-01-21 11:27:09 +01:00
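
A minimal sketch of the guard described above (the array layout and warning text follow the commit description; exact details are assumptions):

    /* one string per BPF_LINK_TYPE() invocation in bpf_types.h */
    static const char *bpf_link_type_strs[] = { /* ... */ };

    void bpf_link_show_fdinfo(struct seq_file *m, struct file *filp)
    {
        const struct bpf_link *link = filp->private_data;
        enum bpf_link_type type = link->type;

        if (type < ARRAY_SIZE(bpf_link_type_strs) && bpf_link_type_strs[type]) {
            seq_printf(m, "link_type:\t%s\n", bpf_link_type_strs[type]);
        } else {
            /* a new link type missed its BPF_LINK_TYPE() invocation */
            WARN_ONCE(1, "missing BPF_LINK_TYPE(type=%d)\n", type);
            seq_printf(m, "link_type:\t<%u>\n", type);
        }
        /* remaining fdinfo fields elided */
    }
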
Jerome Marchand 4275e2f620 bpf: Add MEM_WRITE attribute
JIRA: https://issues.redhat.com/browse/RHEL-63880

commit 6fad274f06f038c29660aa53fbad14241c9fd976
Author: Daniel Borkmann <daniel@iogearbox.net>
Date:   Mon Oct 21 17:28:05 2024 +0200

    bpf: Add MEM_WRITE attribute

    Add a MEM_WRITE attribute for BPF helper functions which can be used in
    bpf_func_proto to annotate an argument type in order to let the verifier
    know that the helper writes into the memory passed as an argument. In
    the past MEM_UNINIT has been (ab)used for this purpose, but the latter
    merely tells the verifier that the passed memory can be uninitialized.

    There have been bugs with overloading the latter but aside from that
    there are also cases where the passed memory is read + written which
    currently cannot be expressed, see also 4b3786a6c539 ("bpf: Zero former
    ARG_PTR_TO_{LONG,INT} args in case of error").

    Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
    Link: https://lore.kernel.org/r/20241021152809.33343-1-daniel@iogearbox.net
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
2025-01-21 11:27:08 +01:00
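
As a hedged illustration of the new annotation (the helper and argument layout here are hypothetical; only the flag combination follows the description above):

    const struct bpf_func_proto bpf_example_fill_proto = {
        .func      = bpf_example_fill,
        .ret_type  = RET_INTEGER,
        /* verifier: helper writes into arg1; contents may start uninitialized */
        .arg1_type = ARG_PTR_TO_MEM | MEM_UNINIT | MEM_WRITE,
        .arg2_type = ARG_CONST_SIZE,
    };
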
Jerome Marchand 8659d43485 bpf: fix unpopulated name_len field in perf_event link info
JIRA: https://issues.redhat.com/browse/RHEL-63880

commit 4deecdd29cf29844c7bd164d72dc38d2e672f64e
Author: Tyrone Wu <wudevelops@gmail.com>
Date:   Tue Oct 8 16:43:11 2024 +0000

    bpf: fix unpopulated name_len field in perf_event link info

    Previously when retrieving `bpf_link_info.perf_event` for
    kprobe/uprobe/tracepoint, the `name_len` field was not populated by the
    kernel, leaving it to reflect the value initially set by the user. This
    behavior was inconsistent with how other input/output string buffer
    fields function (e.g. `raw_tracepoint.tp_name_len`).

    This patch fills `name_len` with the actual size of the string name.

    Fixes: 1b715e1b0ec5 ("bpf: Support ->fill_link_info for perf_event")
    Signed-off-by: Tyrone Wu <wudevelops@gmail.com>
    Acked-by: Jiri Olsa <jolsa@kernel.org>
    Acked-by: Yafang Shao <laoar.shao@gmail.com>
    Link: https://lore.kernel.org/r/20241008164312.46269-1-wudevelops@gmail.com
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
2025-01-21 11:27:07 +01:00
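
A hedged user-space sketch of the observable effect (bpf_obj_get_info_by_fd() is the standard libbpf accessor; headers and error handling elided):

    struct bpf_link_info info = {};
    __u32 info_len = sizeof(info);

    /* with this fix, the kernel fills name_len with the actual name size
     * instead of echoing back whatever length the user initially set */
    if (!bpf_obj_get_info_by_fd(link_fd, &info, &info_len))
        printf("kprobe name_len=%u\n", info.perf_event.kprobe.name_len);
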
Jerome Marchand 78fd7525b7 bpf: Call the missed btf_record_free() when map creation fails
JIRA: https://issues.redhat.com/browse/RHEL-63880

commit 87e9675a0dfd0bf4a36550e4a0e673038ec67aee
Author: Hou Tao <houtao1@huawei.com>
Date:   Thu Sep 12 09:28:44 2024 +0800

    bpf: Call the missed btf_record_free() when map creation fails

    When security_bpf_map_create() in map_create() fails, map_create() will
    call btf_put() and ->map_free() callback to free the map. It doesn't
    free the btf_record of map value, so add the missed btf_record_free()
    when map creation fails.

    However btf_record_free() needs to be called after ->map_free() just
    like bpf_map_free_deferred() did, because ->map_free() may use the
    btf_record to free the special fields in preallocated map value. So
    factor out bpf_map_free() helper to free the map, btf_record, and btf
    orderly and use the helper in both map_create() and
    bpf_map_free_deferred().

    Signed-off-by: Hou Tao <houtao1@huawei.com>
    Acked-by: Jiri Olsa <jolsa@kernel.org>
    Link: https://lore.kernel.org/r/20240912012845.3458483-2-houtao@huaweicloud.com
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
2025-01-21 11:27:07 +01:00
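
A minimal sketch of the ordering the patch establishes (mirrors the commit text; other details elided):

    static void bpf_map_free(struct bpf_map *map)
    {
        struct btf_record *rec = map->record;
        struct btf *btf = map->btf;

        /* ->map_free() may still consult the btf_record to release special
         * fields in preallocated map values, so free the record afterwards */
        map->ops->map_free(map);
        btf_record_free(rec);
        btf_put(btf);
    }
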
Jerome Marchand 2419f41261 bpf: Zero former ARG_PTR_TO_{LONG,INT} args in case of error
JIRA: https://issues.redhat.com/browse/RHEL-63880

commit 4b3786a6c5397dc220b1483d8e2f4867743e966f
Author: Daniel Borkmann <daniel@iogearbox.net>
Date:   Fri Sep 13 21:17:50 2024 +0200

    bpf: Zero former ARG_PTR_TO_{LONG,INT} args in case of error

    For all non-tracing helpers which formerly had ARG_PTR_TO_{LONG,INT} as input
    arguments, zero the value for the case of an error as otherwise it could leak
    memory. For tracing, it is not needed given CAP_PERFMON can already read all
    kernel memory anyway, hence bpf_get_func_arg() and bpf_get_func_ret() are
    skipped here.

    Also, the MTU helpers mtu_len pointer value is being written but also read.
    Technically, the MEM_UNINIT should not be there in order to always force init.
    Removing MEM_UNINIT needs more verifier rework though: MEM_UNINIT right now
    implies two things actually: i) write into memory, ii) memory does not have
    to be initialized. If we lift MEM_UNINIT, it then becomes: i) read into memory,
    ii) memory must be initialized. This means that for bpf_*_check_mtu() we'd be
    re-adding the issue we're trying to fix, that is, it would then be able to
    write back into things like .rodata BPF maps. Follow-up work will rework the
    MEM_UNINIT semantics such that the intent can be better expressed. For now
    just clear the *mtu_len on error path which can be lifted later again.

    Fixes: 8a67f2de9b1d ("bpf: expose bpf_strtol and bpf_strtoul to all program types")
    Fixes: d7a4cb9b67 ("bpf: Introduce bpf_strtol and bpf_strtoul helpers")
    Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    Link: https://lore.kernel.org/bpf/e5edd241-59e7-5e39-0ee5-a51e31b6840a@iogearbox.net
    Link: https://lore.kernel.org/r/20240913191754.13290-5-daniel@iogearbox.net
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
2025-01-21 11:27:06 +01:00
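
A hedged sketch of the error-path change for one such helper (shape based on bpf_strtol; internals simplified):

    BPF_CALL_4(bpf_strtol, const char *, buf, size_t, buf_len, u64, flags,
               s64 *, res)
    {
        long long _res;
        int err;

        *res = 0;  /* zero up front so an error cannot leak stale memory */
        err = __bpf_strtoll(buf, buf_len, flags, &_res);
        if (err < 0)
            return err;
        if (_res != (long)_res)
            return -ERANGE;
        *res = _res;
        return err;
    }
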
Jerome Marchand e12894e8b8 bpf: Fix helper writes to read-only maps
JIRA: https://issues.redhat.com/browse/RHEL-63880

commit 32556ce93bc45c730829083cb60f95a2728ea48b
Author: Daniel Borkmann <daniel@iogearbox.net>
Date:   Fri Sep 13 21:17:48 2024 +0200

    bpf: Fix helper writes to read-only maps

    Lonial found an issue that despite user- and BPF-side frozen BPF map
    (like in case of .rodata), it was still possible to write into it from
    a BPF program side through specific helpers having ARG_PTR_TO_{LONG,INT}
    as arguments.

    In check_func_arg(), when the argument is as mentioned, meta->raw_mode
    is never set. Later, in check_helper_mem_access(), under the case of
    PTR_TO_MAP_VALUE as register base type, BPF_READ is assumed for the
    subsequent call to check_map_access_type(), and given the BPF map is
    read-only it succeeds.

    The helpers really need to be annotated as ARG_PTR_TO_{LONG,INT} | MEM_UNINIT
    when results are written into them as opposed to read out of them. The
    latter indicates that it's okay to pass a pointer to uninitialized memory
    as the memory is written to anyway.

    However, ARG_PTR_TO_{LONG,INT} is a special case of ARG_PTR_TO_FIXED_SIZE_MEM
    just with additional alignment requirement. So it is better to just get
    rid of the ARG_PTR_TO_{LONG,INT} special cases altogether and reuse the
    fixed size memory types. For this, add MEM_ALIGNED to additionally ensure
    alignment given these helpers write directly into the args via *<ptr> = val.
    The .arg*_size has been initialized reflecting the actual sizeof(*<ptr>).

    MEM_ALIGNED can only be used in combination with MEM_FIXED_SIZE annotated
    argument types, since in !MEM_FIXED_SIZE cases the verifier does not know
    the buffer size a priori and therefore cannot blindly write *<ptr> = val.

    Fixes: 57c3bb725a ("bpf: Introduce ARG_PTR_TO_{INT,LONG} arg types")
    Reported-by: Lonial Con <kongln9170@gmail.com>
    Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    Acked-by: Andrii Nakryiko <andrii@kernel.org>
    Acked-by: Shung-Hsi Yu <shung-hsi.yu@suse.com>
    Link: https://lore.kernel.org/r/20240913191754.13290-3-daniel@iogearbox.net
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
2025-01-21 11:27:06 +01:00
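
For illustration, the replacement annotation looks roughly like this (a representative shape per the description, not a verbatim quote):

    const struct bpf_func_proto bpf_strtol_proto = {
        .func      = bpf_strtol,
        .ret_type  = RET_INTEGER,
        .arg1_type = ARG_PTR_TO_MEM | MEM_RDONLY,
        .arg2_type = ARG_CONST_SIZE,
        .arg3_type = ARG_ANYTHING,
        /* fixed-size, aligned output buffer replaces ARG_PTR_TO_LONG */
        .arg4_type = ARG_PTR_TO_FIXED_SIZE_MEM | MEM_UNINIT | MEM_ALIGNED,
        .arg4_size = sizeof(s64),
    };
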
Jerome Marchand c60ecac934 bpf: change int cmd argument in __sys_bpf into typed enum bpf_cmd
JIRA: https://issues.redhat.com/browse/RHEL-63880

commit 2db2b8cb8f96bb1def9904abbc859d95e3fbf99c
Author: Andrii Nakryiko <andrii@kernel.org>
Date:   Thu Sep 5 14:05:20 2024 -0700

    bpf: change int cmd argument in __sys_bpf into typed enum bpf_cmd

    This improves BTF data recorded about this function and makes
    debugging/tracing better, because the command can now be displayed as
    a symbolic name instead of an obscure number.

    Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
    Link: https://lore.kernel.org/r/20240905210520.2252984-1-andrii@kernel.org
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
2025-01-21 11:27:04 +01:00
Jerome Marchand dfaf1f971e bpf: Let callers of btf_parse_kptr() track life cycle of prog btf
JIRA: https://issues.redhat.com/browse/RHEL-63880

commit c5ef53420f46c9ca6badca4f4cabacd76de8091e
Author: Amery Hung <amery.hung@bytedance.com>
Date:   Tue Aug 13 21:24:20 2024 +0000

    bpf: Let callers of btf_parse_kptr() track life cycle of prog btf

    btf_parse_kptr() and btf_record_free() do btf_get() and btf_put()
    respectively when working on the btf_record in a program or map that
    has kptr fields. If the kptr is from program BTF, since both callers
    have already tracked the life cycle of the program BTF, it is safe to
    remove the btf_get() and btf_put().

    This change prevents a memory leak of program BTF later, when we start
    searching for kptr fields while building the btf_record for a program.
    The leak can happen when the btf fd is closed. The btf_put() corresponding to the
    btf_get() in btf_parse_kptr() was supposed to be called by
    btf_record_free() in btf_free_struct_meta_tab() in btf_free(). However,
    it will never happen since the invocation of btf_free() depends on the
    refcount of the btf to become 0 in the first place.

    Acked-by: Martin KaFai Lau <martin.lau@kernel.org>
    Acked-by: Hou Tao <houtao1@huawei.com>
    Signed-off-by: Amery Hung <amery.hung@bytedance.com>
    Link: https://lore.kernel.org/r/20240813212424.2871455-2-amery.hung@bytedance.com
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
2025-01-21 11:27:00 +01:00
Viktor Malik 902af8c933
bpf: export bpf_link_inc_not_zero.
JIRA: https://issues.redhat.com/browse/RHEL-30774

commit 67c3e8353f45c27800eecc46e00e8272f063f7d1
Author: Kui-Feng Lee <thinker.li@gmail.com>
Date:   Wed May 29 23:59:42 2024 -0700

    bpf: export bpf_link_inc_not_zero.
    
    bpf_link_inc_not_zero() will be used by kernel modules.  We will use it in
    bpf_testmod.c later.
    
    Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com>
    Link: https://lore.kernel.org/r/20240530065946.979330-5-thinker.li@gmail.com
    Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>

Signed-off-by: Viktor Malik <vmalik@redhat.com>
2024-11-26 14:40:01 +01:00
Viktor Malik bb34399157
bpf: support epoll from bpf struct_ops links.
JIRA: https://issues.redhat.com/browse/RHEL-30774

commit 1adddc97aa44c8783f9f0276ea70854d56f9f6df
Author: Kui-Feng Lee <thinker.li@gmail.com>
Date:   Wed May 29 23:59:41 2024 -0700

    bpf: support epoll from bpf struct_ops links.
    
    Add epoll support to bpf struct_ops links to trigger an EPOLLHUP event
    upon detachment.
    
    This patch implements the "poll" of the "struct file_operations" for BPF
    links and introduces a new "poll" operator in the "struct bpf_link_ops". By
    implementing "poll" of "struct bpf_link_ops" for the links of struct_ops,
    the file descriptor of a struct_ops link can be added to an epoll file
    descriptor to receive EPOLLHUP events.
    
    Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com>
    Link: https://lore.kernel.org/r/20240530065946.979330-4-thinker.li@gmail.com
    Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>

Signed-off-by: Viktor Malik <vmalik@redhat.com>
2024-11-26 14:40:01 +01:00
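
From user space, the new capability can be exercised roughly as follows (hedged sketch; link_fd is a struct_ops link fd obtained elsewhere, error handling elided):

    #include <sys/epoll.h>

    int wait_for_detach(int link_fd)
    {
        struct epoll_event ev = { .events = EPOLLHUP, .data.fd = link_fd };
        int epfd = epoll_create1(0);

        epoll_ctl(epfd, EPOLL_CTL_ADD, link_fd, &ev);
        /* blocks until the struct_ops link is detached (EPOLLHUP) */
        return epoll_wait(epfd, &ev, 1, -1);
    }
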
Viktor Malik f508483e5d
bpf: Add BPF_PROG_TYPE_CGROUP_SKB attach type enforcement in BPF_LINK_CREATE
JIRA: https://issues.redhat.com/browse/RHEL-30773
JIRA: https://issues.redhat.com/browse/RHEL-64874
CVE: CVE-2024-38564

commit 543576ec15b17c0c93301ac8297333c7b6e84ac7
Author: Stanislav Fomichev <sdf@google.com>
Date:   Fri Apr 26 16:16:18 2024 -0700

    bpf: Add BPF_PROG_TYPE_CGROUP_SKB attach type enforcement in BPF_LINK_CREATE

    bpf_prog_attach uses attach_type_to_prog_type to enforce proper
    attach type for BPF_PROG_TYPE_CGROUP_SKB. link_create uses
    bpf_prog_get and relies on bpf_prog_attach_check_attach_type
    to properly verify prog_type <> attach_type association.

    Add missing attach_type enforcement for the link_create case.
    Otherwise, it's currently possible to attach cgroup_skb prog
    types to other cgroup hooks.

    Fixes: af6eea5743 ("bpf: Implement bpf_link-based cgroup BPF program attachment")
    Link: https://lore.kernel.org/bpf/0000000000004792a90615a1dde0@google.com/
    Reported-by: syzbot+838346b979830606c854@syzkaller.appspotmail.com
    Signed-off-by: Stanislav Fomichev <sdf@google.com>
    Acked-by: Eduard Zingerman <eddyz87@gmail.com>
    Link: https://lore.kernel.org/r/20240426231621.2716876-2-sdf@google.com
    Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>

Signed-off-by: Viktor Malik <vmalik@redhat.com>
2024-11-13 09:38:45 +01:00
Viktor Malik 2fe8fa12a2
bpf: Add support for kprobe session attach
JIRA: https://issues.redhat.com/browse/RHEL-30773

commit 535a3692ba7245792e6f23654507865d4293c850
Author: Jiri Olsa <jolsa@kernel.org>
Date:   Tue Apr 30 13:28:24 2024 +0200

    bpf: Add support for kprobe session attach
    
    Add support for attaching a bpf program to both the entry and return
    probe of the same function. This is a common use case which at the
    moment requires creating two kprobe multi links.

    Add a new BPF_TRACE_KPROBE_SESSION attach type that instructs the
    kernel to attach a single link program to both the entry and exit probe.

    It's possible to control execution of the bpf program on the return
    probe simply by returning zero or non-zero from the entry bpf program
    execution, to execute or skip the bpf program on the return probe
    respectively.
    
    Signed-off-by: Jiri Olsa <jolsa@kernel.org>
    Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
    Link: https://lore.kernel.org/bpf/20240430112830.1184228-2-jolsa@kernel.org

Signed-off-by: Viktor Malik <vmalik@redhat.com>
2024-11-11 07:44:56 +01:00
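
On the BPF side, a session program might look like this sketch (the SEC() name and the bpf_session_is_return() kfunc follow the conventions introduced alongside this series; treat both as assumptions here):

    #include "vmlinux.h"
    #include <bpf/bpf_helpers.h>

    /* kfunc assumed from the same series */
    extern bool bpf_session_is_return(void) __ksym;

    char LICENSE[] SEC("license") = "GPL";

    SEC("kprobe.session/ksys_read")
    int handle(struct pt_regs *ctx)
    {
        if (bpf_session_is_return())
            return 0;               /* return-probe half */

        /* entry half: returning non-zero suppresses the return probe */
        return 0;
    }
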
Viktor Malik 1969d68a5e
bpf: allow struct bpf_wq to be embedded in arraymaps and hashmaps
JIRA: https://issues.redhat.com/browse/RHEL-30773

commit 246331e3f1eac905170a923f0ec76725c2558232
Author: Benjamin Tissoires <bentiss@kernel.org>
Date:   Sat Apr 20 11:09:09 2024 +0200

    bpf: allow struct bpf_wq to be embedded in arraymaps and hashmaps
    
    Currently bpf_wq_cancel_and_free() is just a placeholder as there is
    no memory allocation for bpf_wq just yet.

    Again, this duplicates the bpf_timer approach.
    
    Signed-off-by: Benjamin Tissoires <bentiss@kernel.org>
    Link: https://lore.kernel.org/r/20240420-bpf_wq-v2-9-6c986a5a741f@kernel.org
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Signed-off-by: Viktor Malik <vmalik@redhat.com>
2024-11-11 07:44:50 +01:00
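
A hedged sketch of what this enables on the BPF side, mirroring the bpf_timer embedding pattern (map shape illustrative):

    #include "vmlinux.h"
    #include <bpf/bpf_helpers.h>

    struct elem {
        int data;
        struct bpf_wq wq;       /* now allowed inside array/hash map values */
    };

    struct {
        __uint(type, BPF_MAP_TYPE_HASH);
        __uint(max_entries, 64);
        __type(key, int);
        __type(value, struct elem);
    } wq_map SEC(".maps");
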
Viktor Malik 109d9b01f8
bpf: add support for bpf_wq user type
JIRA: https://issues.redhat.com/browse/RHEL-30773

commit d56b63cf0c0f71e1b2e04dd8220b408f049e67ff
Author: Benjamin Tissoires <bentiss@kernel.org>
Date:   Sat Apr 20 11:09:05 2024 +0200

    bpf: add support for bpf_wq user type
    
    Mostly a copy/paste from the bpf_timer API, without the initialization
    and free, as they will be done in a separate patch.
    
    Signed-off-by: Benjamin Tissoires <bentiss@kernel.org>
    Link: https://lore.kernel.org/r/20240420-bpf_wq-v2-5-6c986a5a741f@kernel.org
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Signed-off-by: Viktor Malik <vmalik@redhat.com>
2024-11-11 07:44:49 +01:00
Viktor Malik 6cd5a45b58
bpf: support BPF cookie in raw tracepoint (raw_tp, tp_btf) programs
JIRA: https://issues.redhat.com/browse/RHEL-30773

commit 68ca5d4eebb8c4de246ee5f634eee26bc689562d
Author: Andrii Nakryiko <andrii@kernel.org>
Date:   Tue Mar 19 16:38:50 2024 -0700

    bpf: support BPF cookie in raw tracepoint (raw_tp, tp_btf) programs
    
    Wire up BPF cookie for raw tracepoint programs (both BTF and non-BTF
    aware variants). This brings them up to par w.r.t. BPF cookie usage
    with classic tracepoint and fentry/fexit programs.
    
    Acked-by: Stanislav Fomichev <sdf@google.com>
    Acked-by: Eduard Zingerman <eddyz87@gmail.com>
    Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
    Message-ID: <20240319233852.1977493-4-andrii@kernel.org>
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Signed-off-by: Viktor Malik <vmalik@redhat.com>
2024-11-07 13:58:31 +01:00
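
At run time, a raw_tp/tp_btf program can read the value back with bpf_get_attach_cookie(), assuming the helper is wired up for raw_tp as this series does; a minimal sketch (the attachment side, where the cookie is supplied, is elided):

    #include "vmlinux.h"
    #include <bpf/bpf_helpers.h>
    #include <bpf/bpf_tracing.h>

    char LICENSE[] SEC("license") = "GPL";

    SEC("raw_tp/task_rename")
    int BPF_PROG(on_task_rename)
    {
        /* returns the cookie supplied when the raw_tp link was created */
        __u64 cookie = bpf_get_attach_cookie(ctx);

        bpf_printk("cookie=%llu", cookie);
        return 0;
    }
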
Viktor Malik c1ad04e563
bpf: pass whole link instead of prog when triggering raw tracepoint
JIRA: https://issues.redhat.com/browse/RHEL-30773

commit d4dfc5700e867b22ab94f960f9a9972696a637d5
Author: Andrii Nakryiko <andrii@kernel.org>
Date:   Tue Mar 19 16:38:49 2024 -0700

    bpf: pass whole link instead of prog when triggering raw tracepoint
    
    Instead of passing prog as an argument to the bpf_trace_runX() helpers
    that are called from tracepoint triggering calls, store the BPF link
    itself (struct bpf_raw_tp_link for raw tracepoints). This will allow
    passing extra information like the BPF cookie into raw tracepoint
    registration.
    
    Instead of replacing `struct bpf_prog *prog = __data;` with
    corresponding `struct bpf_raw_tp_link *link = __data;` assignment in
    `__bpf_trace_##call` I just passed `__data` through into underlying
    bpf_trace_runX() call. This works well because we implicitly cast `void *`,
    and it also avoids naming clashes with arguments coming from
    tracepoint's "proto" list. We could have run into the same problem with
    "prog", we just happened to not have a tracepoint that has "prog" input
    argument. We are less lucky with "link", as there are tracepoints using
    "link" argument name already. So instead of trying to avoid naming
    conflicts, let's just remove the intermediate local variable. It doesn't
    hurt readability; either way it's a bit of a maze of calls and macros
    that requires careful reading.
    
    Acked-by: Stanislav Fomichev <sdf@google.com>
    Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
    Message-ID: <20240319233852.1977493-3-andrii@kernel.org>
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Signed-off-by: Viktor Malik <vmalik@redhat.com>
2024-11-07 13:58:31 +01:00
Jerome Marchand 8a75de1c4a bpf: Fix a potential use-after-free in bpf_link_free()
JIRA: https://issues.redhat.com/browse/RHEL-23649

commit 2884dc7d08d98a89d8d65121524bb7533183a63a
Author: Cong Wang <cong.wang@bytedance.com>
Date:   Sun Jun 2 11:27:03 2024 -0700

    bpf: Fix a potential use-after-free in bpf_link_free()

    After commit 1a80dbcb2dba, bpf_link can be freed by
    link->ops->dealloc_deferred, but the code still tests and uses
    link->ops->dealloc afterward, which leads to a use-after-free as
    reported by syzbot. Actually, one of them should be sufficient, so
    just call one of them instead of both. Also add a WARN_ON() in case
    of any problematic implementation.

    Fixes: 1a80dbcb2dba ("bpf: support deferring bpf_link dealloc to after RCU grace period")
    Reported-by: syzbot+1989ee16d94720836244@syzkaller.appspotmail.com
    Signed-off-by: Cong Wang <cong.wang@bytedance.com>
    Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    Acked-by: Jiri Olsa <jolsa@kernel.org>
    Link: https://lore.kernel.org/bpf/20240602182703.207276-1-xiyou.wangcong@gmail.com

Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
2024-10-15 10:49:18 +02:00
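
The fix boils down to making the two callbacks mutually exclusive; a hedged sketch of the resulting shape inside bpf_link_free():

    /* a link should implement dealloc OR dealloc_deferred, never both */
    WARN_ON(ops->dealloc && ops->dealloc_deferred);

    if (ops->dealloc_deferred) {
        /* deallocation runs after the required RCU grace period(s) */
        call_rcu(&link->rcu, bpf_link_defer_dealloc_rcu_gp);
    } else if (ops->dealloc) {
        ops->dealloc(link);
    }
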
Jerome Marchand d340b4ed1c bpf: support deferring bpf_link dealloc to after RCU grace period
JIRA: https://issues.redhat.com/browse/RHEL-23649

commit 1a80dbcb2dbaf6e4c216e62e30fa7d3daa8001ce
Author: Andrii Nakryiko <andrii@kernel.org>
Date:   Wed Mar 27 22:24:26 2024 -0700

    bpf: support deferring bpf_link dealloc to after RCU grace period

    BPF link for some program types is passed as a "context" which can be
    used by those BPF programs to look up additional information. E.g., for
    multi-kprobes and multi-uprobes, link is used to fetch BPF cookie values.

    Because of this runtime dependency, when bpf_link refcnt drops to zero
    there could still be active BPF programs running accessing link data.

    This patch adds generic support to defer bpf_link dealloc callback to
    after RCU GP, if requested. This is done by exposing two different
    deallocation callbacks, one synchronous and one deferred. If deferred
    one is provided, bpf_link_free() will schedule dealloc_deferred()
    callback to happen after RCU GP.

    BPF is using two flavors of RCU: "classic" non-sleepable one and RCU
    tasks trace one. The latter is used when sleepable BPF programs are
    used. bpf_link_free() accommodates that by checking underlying BPF
    program's sleepable flag, and goes either through normal RCU GP only for
    non-sleepable, or through RCU tasks trace GP *and* then normal RCU GP
    (taking into account rcu_trace_implies_rcu_gp() optimization), if BPF
    program is sleepable.

    We use this for multi-kprobe and multi-uprobe links, which dereference
    link during program run. We also preventively switch raw_tp link to use
    deferred dealloc callback, as upcoming changes in bpf-next tree expose
    raw_tp link data (specifically, cookie value) to BPF program at runtime
    as well.

    Fixes: 0dcac2725406 ("bpf: Add multi kprobe link")
    Fixes: 89ae89f53d20 ("bpf: Add multi uprobe link")
    Reported-by: syzbot+981935d9485a560bfbcb@syzkaller.appspotmail.com
    Reported-by: syzbot+2cb5a6c573e98db598cc@syzkaller.appspotmail.com
    Reported-by: syzbot+62d8b26793e8a2bd0516@syzkaller.appspotmail.com
    Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
    Acked-by: Jiri Olsa <jolsa@kernel.org>
    Link: https://lore.kernel.org/r/20240328052426.3042617-2-andrii@kernel.org
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
2024-10-15 10:49:17 +02:00
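
A sketch of the grace-period selection described above (callback names per the deferred-dealloc design; simplified):

    if (sleepable)
        /* RCU tasks trace GP first, then a normal RCU GP chained from the
         * callback, honoring the rcu_trace_implies_rcu_gp() optimization */
        call_rcu_tasks_trace(&link->rcu, bpf_link_defer_dealloc_mult_rcu_gp);
    else
        call_rcu(&link->rcu, bpf_link_defer_dealloc_rcu_gp);
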
Jerome Marchand 0803f1f621 bpf: move sleepable flag from bpf_prog_aux to bpf_prog
JIRA: https://issues.redhat.com/browse/RHEL-23649

commit 66c8473135c62f478301a0e5b3012f203562dfa6
Author: Andrii Nakryiko <andrii@kernel.org>
Date:   Fri Mar 8 16:47:39 2024 -0800

    bpf: move sleepable flag from bpf_prog_aux to bpf_prog

    prog->aux->sleepable is checked very frequently as part of (some) BPF
    program run hot paths. So this extra aux indirection seems wasteful and
    on busy systems might cause unnecessary memory cache misses.

    Let's move sleepable flag into prog itself to eliminate unnecessary
    pointer dereference.

    Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
    Acked-by: Jiri Olsa <jolsa@kernel.org>
    Message-ID: <20240309004739.2961431-1-andrii@kernel.org>
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
2024-10-15 10:49:16 +02:00
Jerome Marchand ead0c14432 bpf: Recognize addr_space_cast instruction in the verifier.
JIRA: https://issues.redhat.com/browse/RHEL-23649

commit 6082b6c328b5486da2b356eae94b8b83c98b5565
Author: Alexei Starovoitov <ast@kernel.org>
Date:   Thu Mar 7 17:08:03 2024 -0800

    bpf: Recognize addr_space_cast instruction in the verifier.

    rY = addr_space_cast(rX, 0, 1) tells the verifier that rY->type = PTR_TO_ARENA.
    Any further operations on PTR_TO_ARENA register have to be in 32-bit domain.

    The verifier will mark load/store through PTR_TO_ARENA with PROBE_MEM32.
    JIT will generate them as kern_vm_start + 32bit_addr memory accesses.

    rY = addr_space_cast(rX, 1, 0) tells the verifier that rY->type = unknown scalar.
    If arena->map_flags has BPF_F_NO_USER_CONV set then convert cast_user to mov32 as well.
    Otherwise JIT will convert it to:
      rY = (u32)rX;
      if (rY)
         rY |= arena->user_vm_start & ~(u64)~0U;

    Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
    Link: https://lore.kernel.org/bpf/20240308010812.89848-6-alexei.starovoitov@gmail.com

Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
2024-10-15 10:49:15 +02:00
Jerome Marchand a245935eb8 bpf: Introduce bpf_arena.
JIRA: https://issues.redhat.com/browse/RHEL-23649

commit 317460317a02a1af512697e6e964298dedd8a163
Author: Alexei Starovoitov <ast@kernel.org>
Date:   Thu Mar 7 17:07:59 2024 -0800

    bpf: Introduce bpf_arena.

    Introduce bpf_arena, which is a sparse shared memory region between the bpf
    program and user space.

    Use cases:
    1. User space mmap-s bpf_arena and uses it as a traditional mmap-ed
       anonymous region, like memcached or any key/value storage. The bpf
       program implements an in-kernel accelerator. XDP prog can search for
       a key in bpf_arena and return a value without going to user space.
    2. The bpf program builds arbitrary data structures in bpf_arena (hash
       tables, rb-trees, sparse arrays), while user space consumes it.
    3. bpf_arena is a "heap" of memory from the bpf program's point of view.
       The user space may mmap it, but bpf program will not convert pointers
       to user base at run-time to improve bpf program speed.

    Initially, the kernel vm_area and user vma are not populated. User space
    can fault in pages within the range. While servicing a page fault,
    bpf_arena logic will insert a new page into the kernel and user vmas. The
    bpf program can allocate pages from that region via
    bpf_arena_alloc_pages(). This kernel function will insert pages into the
    kernel vm_area. The subsequent fault-in from user space will populate that
    page into the user vma. The BPF_F_SEGV_ON_FAULT flag at arena creation time
    can be used to prevent fault-in from user space. In such a case, if a page
    is not allocated by the bpf program and not present in the kernel vm_area,
    the user process will segfault. This is useful for use cases 2 and 3 above.

    bpf_arena_alloc_pages() is similar to user space mmap(). It allocates pages
    either at a specific address within the arena or allocates a range with the
    maple tree. bpf_arena_free_pages() is analogous to munmap(), which frees
    pages and removes the range from the kernel vm_area and from user process
    vmas.

    bpf_arena can be used as a bpf program "heap" of up to 4GB. The speed of
    bpf program is more important than ease of sharing with user space. This is
    use case 3. In such a case, the BPF_F_NO_USER_CONV flag is recommended.
    It will tell the verifier to treat the rX = bpf_arena_cast_user(rY)
    instruction as a 32-bit move wX = wY, which will improve bpf prog
    performance. Otherwise, bpf_arena_cast_user is translated by JIT to
    conditionally add the upper 32 bits of user vm_start (if the pointer is not
    NULL) to arena pointers before they are stored into memory. This way, user
    space sees them as valid 64-bit pointers.

    Diff https://github.com/llvm/llvm-project/pull/84410 enables the LLVM BPF
    backend to generate the bpf_addr_space_cast() instruction to cast pointers
    between address_space(1) which is reserved for bpf_arena pointers and
    default address space zero. All arena pointers in a bpf program written in
    C language are tagged as __attribute__((address_space(1))). Hence, clang
    provides helpful diagnostics when pointers cross address space. Libbpf and
    the kernel support only address_space == 1. All other address space
    identifiers are reserved.

    rX = bpf_addr_space_cast(rY, /* dst_as */ 1, /* src_as */ 0) tells the
    verifier that rX->type = PTR_TO_ARENA. Any further operations on
    PTR_TO_ARENA register have to be in the 32-bit domain. The verifier will
    mark load/store through PTR_TO_ARENA with PROBE_MEM32. JIT will generate
    them as kern_vm_start + 32bit_addr memory accesses. The behavior is similar
    to copy_from_kernel_nofault() except that no address checks are necessary.
    The address is guaranteed to be in the 4GB range. If the page is not
    present, the destination register is zeroed on read, and the operation is
    ignored on write.

    rX = bpf_addr_space_cast(rY, 0, 1) tells the verifier that rX->type =
    unknown scalar. If arena->map_flags has BPF_F_NO_USER_CONV set, then the
    verifier converts such cast instructions to mov32. Otherwise, JIT will emit
    native code equivalent to:
    rX = (u32)rY;
    if (rY)
      rX |= clear_lo32_bits(arena->user_vm_start); /* replace hi32 bits in rX */

    After such conversion, the pointer becomes a valid user pointer within
    bpf_arena range. The user process can access data structures created in
    bpf_arena without any additional computations. For example, a linked list
    built by a bpf program can be walked natively by user space.

    Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
    Reviewed-by: Barret Rhoden <brho@google.com>
    Link: https://lore.kernel.org/bpf/20240308010812.89848-2-alexei.starovoitov@gmail.com

Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
2024-10-15 10:49:15 +02:00
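
A hedged sketch of defining an arena and tagging pointers into it on the BPF side (field values are illustrative; the address_space(1) attribute is per the text above):

    #include "vmlinux.h"
    #include <bpf/bpf_helpers.h>

    struct {
        __uint(type, BPF_MAP_TYPE_ARENA);
        __uint(map_flags, BPF_F_MMAPABLE);  /* user space may mmap the arena */
        __uint(max_entries, 10);            /* arena size, in pages */
    } arena SEC(".maps");

    #define __arena __attribute__((address_space(1)))

    /* pointers into the shared region carry the arena address space tag */
    int __arena *shared_counter;
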
Jerome Marchand 5a03a0c5ce bpf: Plumb get_unmapped_area() callback into bpf_map_ops
JIRA: https://issues.redhat.com/browse/RHEL-23649

commit cf2c2e4a3d910270903d50462aaa75140cdb2c96
Author: Alexei Starovoitov <ast@kernel.org>
Date:   Wed Mar 6 19:12:25 2024 -0800

    bpf: Plumb get_unmapped_area() callback into bpf_map_ops

    Subsequent patches introduce bpf_arena that imposes special alignment
    requirements on address selection.

    Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    Link: https://lore.kernel.org/r/20240307031228.42896-4-alexei.starovoitov@gmail.com
    Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>

Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
2024-10-15 10:49:14 +02:00
Jerome Marchand 51a5593860 bpf,lsm: Refactor bpf_map_alloc/bpf_map_free LSM hooks
JIRA: https://issues.redhat.com/browse/RHEL-23649

commit a2431c7eabcf9bd5a1e7a1f7ecded40fdda4a8c5
Author: Andrii Nakryiko <andrii@kernel.org>
Date:   Tue Jan 23 18:21:07 2024 -0800

    bpf,lsm: Refactor bpf_map_alloc/bpf_map_free LSM hooks

    Similarly to bpf_prog_alloc LSM hook, rename and extend bpf_map_alloc
    hook into bpf_map_create, taking not just struct bpf_map, but also
    bpf_attr and bpf_token, to give a fuller context to LSMs.

    Unlike bpf_prog_alloc, there is no need to move the hook around, as it
    currently is firing right before allocating BPF map ID and FD, which
    seems to be a sweet spot.

    But like the bpf_prog_alloc/bpf_prog_free combo, make sure that the
    bpf_map_free LSM hook is called even if the bpf_map_create hook returned
    an error: if a few LSMs are combined together, it could be that one LSM
    successfully allocated a security blob for its needs while a subsequent
    LSM rejected the BPF map creation. The former LSM would still need to
    free up its LSM blob, so we need to ensure security_bpf_map_free() is
    called regardless of the outcome.

    Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    Acked-by: Paul Moore <paul@paul-moore.com>
    Link: https://lore.kernel.org/bpf/20240124022127.2379740-11-andrii@kernel.org

Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
2024-10-15 10:49:03 +02:00
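
The resulting hook pair, per the description (signatures sketched from the commit summary; details may be elided):

    /* fires right before BPF map ID and FD allocation */
    int security_bpf_map_create(struct bpf_map *map, union bpf_attr *attr,
                                struct bpf_token *token);

    /* always fires on teardown, even if bpf_map_create rejected the map */
    void security_bpf_map_free(struct bpf_map *map);
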
Jerome Marchand 8b98ff8bb2 bpf,lsm: Refactor bpf_prog_alloc/bpf_prog_free LSM hooks
JIRA: https://issues.redhat.com/browse/RHEL-23649

commit 1b67772e4e3f16cd647b229cae95fc06d120be08
Author: Andrii Nakryiko <andrii@kernel.org>
Date:   Tue Jan 23 18:21:06 2024 -0800

    bpf,lsm: Refactor bpf_prog_alloc/bpf_prog_free LSM hooks

    Based on upstream discussion ([0]), rework existing
    bpf_prog_alloc_security LSM hook. Rename it to bpf_prog_load and instead
    of passing bpf_prog_aux, pass proper bpf_prog pointer for a full BPF
    program struct. Also, we pass bpf_attr union with all the user-provided
    arguments for BPF_PROG_LOAD command.  This will give LSMs as much
    information as we can basically provide.

    The hook is also BPF token-aware now, and optional bpf_token struct is
    passed as a third argument. bpf_prog_load LSM hook is called after
    a bunch of sanity checks were performed, bpf_prog and bpf_prog_aux were
    allocated and filled out, but right before performing full-fledged BPF
    verification step.

    bpf_prog_free LSM hook is now accepting struct bpf_prog argument, for
    consistency. SELinux code is adjusted to all new names, types, and
    signatures.

    Note, given that bpf_prog_load (previously bpf_prog_alloc) hook can be
    used by some LSMs to allocate extra security blob, but also by other
    LSMs to reject BPF program loading, we need to make sure that
    bpf_prog_free LSM hook is called after bpf_prog_load/bpf_prog_alloc one
    *even* if the hook itself returned error. If we don't do that, we run
    the risk of leaking memory. This seems to be possible today when
    combining SELinux and BPF LSM, as one example, depending on their
    relative ordering.

    Also, for BPF LSM setup, add bpf_prog_load and bpf_prog_free to
    sleepable LSM hooks list, as they are both executed in sleepable
    context. Also drop the bpf_prog_load hook from the untrusted list, as
    the refcount issue (or anything else) that originally forced us to add
    it there in c0c852dd1876 ("bpf: Do not mark certain LSM hook arguments
    as trusted") no longer applies. We now trigger this hook much later and
    it should not be an issue anymore.

      [0] https://lore.kernel.org/bpf/9fe88aef7deabbe87d3fc38c4aea3c69.paul@paul-moore.com/

    Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    Acked-by: Paul Moore <paul@paul-moore.com>
    Link: https://lore.kernel.org/bpf/20240124022127.2379740-10-andrii@kernel.org

Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
2024-10-15 10:49:03 +02:00
Jerome Marchand d1c16d1138 bpf: Take into account BPF token when fetching helper protos
JIRA: https://issues.redhat.com/browse/RHEL-23649

Conflicts: Context change due to missing commit 9a675ba55a96 ("net,
bpf: Add a warning if NAPI cb missed xdp_do_flush().")

commit bbc1d24724e110b86a1a7c3c1724ce0d62cc1e2e
Author: Andrii Nakryiko <andrii@kernel.org>
Date:   Tue Jan 23 18:21:04 2024 -0800

    bpf: Take into account BPF token when fetching helper protos

    Instead of performing unconditional system-wide bpf_capable() and
    perfmon_capable() calls inside bpf_base_func_proto() function (and other
    similar ones) to determine eligibility of a given BPF helper for a given
    program, use previously recorded BPF token during BPF_PROG_LOAD command
    handling to inform the decision.

    Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    Link: https://lore.kernel.org/bpf/20240124022127.2379740-8-andrii@kernel.org

Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
2024-10-15 10:49:03 +02:00
Jerome Marchand 13b9927298 bpf: Add BPF token support to BPF_PROG_LOAD command
JIRA: https://issues.redhat.com/browse/RHEL-23649

commit caf8f28e036c4ba1e823355da6c0c01c39e70ab9
Author: Andrii Nakryiko <andrii@kernel.org>
Date:   Tue Jan 23 18:21:03 2024 -0800

    bpf: Add BPF token support to BPF_PROG_LOAD command

    Add basic support of BPF token to BPF_PROG_LOAD. BPF_F_TOKEN_FD flag
    should be set in prog_flags field when providing prog_token_fd.

    Wire through a set of allowed BPF program types and attach types,
    derived from BPF FS at BPF token creation time. Then make sure we
    perform bpf_token_capable() checks everywhere where it's relevant.

    Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    Link: https://lore.kernel.org/bpf/20240124022127.2379740-7-andrii@kernel.org

Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
2024-10-15 10:49:03 +02:00
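
A hedged user-space sketch of supplying a token at load time (ptr_to_u64() is a hypothetical cast helper; headers and error handling elided):

    union bpf_attr attr = {};

    attr.prog_type = BPF_PROG_TYPE_XDP;
    attr.insns = ptr_to_u64(insns);
    attr.insn_cnt = insn_cnt;
    attr.license = ptr_to_u64("GPL");
    attr.prog_flags = BPF_F_TOKEN_FD;   /* required when passing a token */
    attr.prog_token_fd = token_fd;

    int prog_fd = syscall(__NR_bpf, BPF_PROG_LOAD, &attr, sizeof(attr));
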
Jerome Marchand c6470fbf17 bpf: Add BPF token support to BPF_BTF_LOAD command
JIRA: https://issues.redhat.com/browse/RHEL-23649

commit 9ea7c4bf17e39d463eb4782f948f401d9764b1b3
Author: Andrii Nakryiko <andrii@kernel.org>
Date:   Tue Jan 23 18:21:02 2024 -0800

    bpf: Add BPF token support to BPF_BTF_LOAD command

    Accept BPF token FD in BPF_BTF_LOAD command to allow BTF data loading
    through delegated BPF token. BPF_F_TOKEN_FD flag has to be specified
    when passing BPF token FD. Given BPF_BTF_LOAD command didn't have flags
    field before, we also add btf_flags field.

    BTF loading is a pretty straightforward operation, so as long as the BPF
    token is created with allow_cmds granting the BPF_BTF_LOAD command, the
    kernel proceeds to parse the BTF data and create the BTF object.

    Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    Link: https://lore.kernel.org/bpf/20240124022127.2379740-6-andrii@kernel.org

Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
2024-10-15 10:49:03 +02:00
Jerome Marchand cb1e5415cf bpf: Add BPF token support to BPF_MAP_CREATE command
JIRA: https://issues.redhat.com/browse/RHEL-23649

commit a177fc2bf6fd83704854feaf7aae926b1df4f0b9
Author: Andrii Nakryiko <andrii@kernel.org>
Date:   Tue Jan 23 18:21:01 2024 -0800

    bpf: Add BPF token support to BPF_MAP_CREATE command

    Allow providing token_fd for the BPF_MAP_CREATE command to allow controlled
    BPF map creation from an unprivileged process through a delegated BPF
    token. A new BPF_F_TOKEN_FD flag is added and must be specified together
    with the BPF token FD for the BPF_MAP_CREATE command.

    Wire through a set of allowed BPF map types to the BPF token, derived from
    BPF FS at BPF token creation time. This, in combination with allowed_cmds,
    allows creating a narrowly-focused BPF token (controlled by a privileged
    agent) with a restrictive set of BPF maps that the application can attempt
    to create.

    Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    Link: https://lore.kernel.org/bpf/20240124022127.2379740-5-andrii@kernel.org

Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
2024-10-15 10:49:03 +02:00
Jerome Marchand a761731cac bpf: Introduce BPF token object
JIRA: https://issues.redhat.com/browse/RHEL-23649

commit 35f96de04127d332a5c5e8a155d31f452f88c76d
Author: Andrii Nakryiko <andrii@kernel.org>
Date:   Tue Jan 23 18:21:00 2024 -0800

    bpf: Introduce BPF token object

    Add new kind of BPF kernel object, BPF token. BPF token is meant to
    allow delegating privileged BPF functionality, like loading a BPF
    program or creating a BPF map, from privileged process to a *trusted*
    unprivileged process, all while having a good amount of control over which
    privileged operations could be performed using provided BPF token.

    This is achieved through mounting BPF FS instance with extra delegation
    mount options, which determine what operations are delegatable, and also
    constraining it to the owning user namespace (as mentioned in the
    previous patch).

    BPF token itself is just a derivative from BPF FS and can be created
    through a new bpf() syscall command, BPF_TOKEN_CREATE, which accepts BPF
    FS FD, which can be attained through open() API by opening BPF FS mount
    point. Currently, a BPF token "inherits" delegated command, map type,
    prog type, and attach type bit sets from BPF FS as is. In the future,
    having a BPF token as a separate object with its own FD, we can allow
    further restricting a BPF token's allowable set of things, either at
    creation time or after the fact, allowing the process to guard itself
    further from unintentionally trying to load undesired kinds of BPF
    programs. But for now we keep things simple and just copy bit sets as is.

    When BPF token is created from BPF FS mount, we take reference to the
    BPF super block's owning user namespace, and then use that namespace for
    checking all the {CAP_BPF, CAP_PERFMON, CAP_NET_ADMIN, CAP_SYS_ADMIN}
    capabilities that are normally only checked against init userns (using
    capable()), but now we check them using ns_capable() instead (if BPF
    token is provided). See bpf_token_capable() for details.

    Such setup means that BPF token in itself is not sufficient to grant BPF
    functionality. User namespaced process has to *also* have necessary
    combination of capabilities inside that user namespace. So while
    previously CAP_BPF was useless when granted within user namespace, now
    it gains a meaning and allows container managers and sys admins to have
    a flexible control over which processes can and need to use BPF
    functionality within the user namespace (i.e., container in practice).
    And BPF FS delegation mount options and derived BPF tokens serve as
    a per-container "flag" to grant overall ability to use bpf() (plus further
    restrict on which parts of bpf() syscalls are treated as namespaced).

    Note also, the BPF_TOKEN_CREATE command itself requires ns_capable(CAP_BPF)
    within the BPF FS owning user namespace, rounding out the ns_capable()
    story of the BPF token. Creating a BPF token in the init user namespace is
    currently not supported, given a BPF token doesn't have any effect in the
    init user namespace anyway.

    Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    Acked-by: Christian Brauner <brauner@kernel.org>
    Link: https://lore.kernel.org/bpf/20240124022127.2379740-4-andrii@kernel.org

Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
2024-10-15 10:49:03 +02:00
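
End to end, creating a token from a delegating BPF FS mount looks roughly like this sketch (mount options, path, and open flags are illustrative; headers and error handling elided):

    /* privileged side (simplified):
     *   mount -t bpf -o delegate_cmds=any,delegate_maps=any bpffs /sys/fs/bpf
     */
    int bpffs_fd = open("/sys/fs/bpf", O_RDONLY);
    union bpf_attr attr = {};

    attr.token_create.bpffs_fd = bpffs_fd;
    int token_fd = syscall(__NR_bpf, BPF_TOKEN_CREATE, &attr, sizeof(attr));
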
Jerome Marchand d48ee3f4e9 bpf: Align CAP_NET_ADMIN checks with bpf_capable() approach
JIRA: https://issues.redhat.com/browse/RHEL-23649

commit ed1ad5a7415de8be121055e7ab1303d2be5407e0
Author: Andrii Nakryiko <andrii@kernel.org>
Date:   Tue Jan 23 18:20:58 2024 -0800

    bpf: Align CAP_NET_ADMIN checks with bpf_capable() approach

    Within BPF syscall handling code CAP_NET_ADMIN checks stand out a bit
    compared to CAP_BPF and CAP_PERFMON checks. For the latter, CAP_BPF or
    CAP_PERFMON are checked first, but if they are not set, CAP_SYS_ADMIN
    takes over and grants whatever part of BPF syscall is required.

    Similar kind of checks that involve CAP_NET_ADMIN are not so consistent.
    One out of four uses does follow the CAP_BPF/CAP_PERFMON model: during
    BPF_PROG_LOAD, if the type of BPF program is "network-related", either
    CAP_NET_ADMIN or CAP_SYS_ADMIN is required to proceed.

    But in three other cases CAP_NET_ADMIN is required even if CAP_SYS_ADMIN
    is set:
      - when creating DEVMAP/XSKMAP/CPUMAP maps;
      - when attaching CGROUP_SKB programs;
      - when handling the BPF_PROG_QUERY command.

    This patch is changing the latter three cases to follow BPF_PROG_LOAD
    model, that is allowing to proceed under either CAP_NET_ADMIN or
    CAP_SYS_ADMIN.

    This also makes it cleaner in subsequent BPF token patches to switch
    wholesale to a generic bpf_token_capable(int cap) check, which always
    falls back to CAP_SYS_ADMIN if the requested capability is missing.

    Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    Acked-by: Yafang Shao <laoar.shao@gmail.com>
    Link: https://lore.kernel.org/bpf/20240124022127.2379740-2-andrii@kernel.org

Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
2024-10-15 10:49:02 +02:00
Jerome Marchand c7472ae0b9 bpf: pass attached BTF to the bpf_struct_ops subsystem
JIRA: https://issues.redhat.com/browse/RHEL-23649

commit fcc2c1fb0651477c8ed78a3a293c175ccd70697a
Author: Kui-Feng Lee <thinker.li@gmail.com>
Date:   Fri Jan 19 14:49:59 2024 -0800

    bpf: pass attached BTF to the bpf_struct_ops subsystem

    Pass the fd of a btf from the userspace to the bpf() syscall, and then
    convert the fd into a btf. The btf is generated from the module that
    defines the target BPF struct_ops type.

    In order to inform the kernel about the module that defines the target
    struct_ops type, the userspace program needs to provide a btf fd for the
    respective module's btf. This btf contains essential information on the
    types defined within the module, including the target struct_ops type.

    A btf fd must be provided to the kernel for struct_ops maps and for the bpf
    programs attached to those maps.

    In the case of the bpf programs, the attach_btf_obj_fd parameter is passed
    as part of the bpf_attr and is converted into a btf. This btf is then
    stored in the prog->aux->attach_btf field. Here, we just let the verifier
    access attach_btf directly.

    In the case of struct_ops maps, a btf fd is passed as value_type_btf_obj_fd
    of bpf_attr. The bpf_struct_ops_map_alloc() function converts the fd to a
    btf and stores it as st_map->btf. A flag BPF_F_VTYPE_BTF_OBJ_FD is added
    for map_flags to indicate that the value of value_type_btf_obj_fd is set.

    Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com>
    Link: https://lore.kernel.org/r/20240119225005.668602-9-thinker.li@gmail.com
    Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>

Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
2024-10-15 10:49:02 +02:00
Jerome Marchand d160999ffe bpf: pass btf object id in bpf_map_info.
JIRA: https://issues.redhat.com/browse/RHEL-23649

commit 1338b93346587a2a6ac79bbcf55ef5b357745573
Author: Kui-Feng Lee <thinker.li@gmail.com>
Date:   Fri Jan 19 14:49:57 2024 -0800

    bpf: pass btf object id in bpf_map_info.

    Include the btf object id (btf_obj_id) in bpf_map_info so that tools (e.g.
    bpftool struct_ops dump) know the correct btf from the kernel to look up
    type information of struct_ops types.

    Since struct_ops types can be defined and registered in a module, the
    type information of a struct_ops type is defined in the btf of the
    module defining it. The userspace tools need to know which btf is for
    the module defining a struct_ops type.

    Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com>
    Link: https://lore.kernel.org/r/20240119225005.668602-7-thinker.li@gmail.com
    Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>

Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
2024-10-15 10:49:01 +02:00
Jerome Marchand 863aa0ddc0 bpf: Add cookie to perf_event bpf_link_info records
JIRA: https://issues.redhat.com/browse/RHEL-23649

commit d5c16492c66fbfca85f36e42363d32212df5927b
Author: Jiri Olsa <jolsa@kernel.org>
Date:   Fri Jan 19 12:04:58 2024 +0100

    bpf: Add cookie to perf_event bpf_link_info records

    At the moment we don't store the cookie for perf_event probes,
    while we do that for the rest of the probes.

    Add cookie fields to the struct bpf_link_info perf event
    probe records:

      perf_event.uprobe
      perf_event.kprobe
      perf_event.tracepoint
      perf_event.perf_event

    And add the code to store the cookie in the bpf_link_info struct.

    Signed-off-by: Jiri Olsa <jolsa@kernel.org>
    Acked-by: Song Liu <song@kernel.org>
    Acked-by: Yafang Shao <laoar.shao@gmail.com>
    Link: https://lore.kernel.org/r/20240119110505.400573-2-jolsa@kernel.org
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
2024-10-15 10:49:01 +02:00
Viktor Malik 457621714e
bpf: Fix re-attachment branch in bpf_tracing_prog_attach
JIRA: https://issues.redhat.com/browse/RHEL-23644

JIRA: https://issues.redhat.com/browse/RHEL-26486
CVE: CVE-2024-26591

commit 715d82ba636cb3629a6e18a33bb9dbe53f9936ee
Author: Jiri Olsa <olsajiri@gmail.com>
Date:   Wed Jan 3 20:05:46 2024 +0100

    bpf: Fix re-attachment branch in bpf_tracing_prog_attach

    The following case can cause a crash due to missing attach_btf:

    1) load rawtp program
    2) load fentry program with rawtp as target_fd
    3) create tracing link for fentry program with target_fd = 0
    4) repeat 3

    In the end we have:

    - prog->aux->dst_trampoline == NULL
    - tgt_prog == NULL (because we did not provide target_fd to link_create)
    - prog->aux->attach_btf == NULL (the program was loaded with attach_prog_fd=X)
    - the program was loaded for tgt_prog but we have no way to find out which one

        BUG: kernel NULL pointer dereference, address: 0000000000000058
        Call Trace:
         <TASK>
         ? __die+0x20/0x70
         ? page_fault_oops+0x15b/0x430
         ? fixup_exception+0x22/0x330
         ? exc_page_fault+0x6f/0x170
         ? asm_exc_page_fault+0x22/0x30
         ? bpf_tracing_prog_attach+0x279/0x560
         ? btf_obj_id+0x5/0x10
         bpf_tracing_prog_attach+0x439/0x560
         __sys_bpf+0x1cf4/0x2de0
         __x64_sys_bpf+0x1c/0x30
         do_syscall_64+0x41/0xf0
         entry_SYSCALL_64_after_hwframe+0x6e/0x76

    Return -EINVAL in this situation.

    Fixes: f3a9507554 ("bpf: Allow trampoline re-attach for tracing and lsm programs")
    Cc: stable@vger.kernel.org
    Signed-off-by: Jiri Olsa <olsajiri@gmail.com>
    Acked-by: Jiri Olsa <olsajiri@gmail.com>
    Acked-by: Song Liu <song@kernel.org>
    Signed-off-by: Dmitrii Dolgov <9erthalion6@gmail.com>
    Link: https://lore.kernel.org/r/20240103190559.14750-4-9erthalion6@gmail.com
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Signed-off-by: Viktor Malik <vmalik@redhat.com>
2024-06-25 11:07:40 +02:00
Viktor Malik 8bd4507020
bpf: Relax tracing prog recursive attach rules
JIRA: https://issues.redhat.com/browse/RHEL-23644

commit 19bfcdf9498aa968ea293417fbbc39e523527ca8
Author: Dmitrii Dolgov <9erthalion6@gmail.com>
Date:   Wed Jan 3 20:05:44 2024 +0100

    bpf: Relax tracing prog recursive attach rules
    
    Currently, it's not allowed to attach an fentry/fexit prog to another
    fentry/fexit. At the same time it's not uncommon to see a tracing
    program with lots of logic in use, and the attachment limitation
    prevents usage of fentry/fexit for performance analysis (e.g. with
    the "bpftool prog profile" command) in this case. An example could be
    the falcosecurity libs project that uses tp_btf tracing programs.
    
    Following the corresponding discussion [1], the reason for that is to
    avoid call cycles between tracing progs without introducing more complex
    solutions. But currently it seems impossible to load and attach tracing
    programs in a way that would form such a cycle. The limitation comes
    from the fact that attach_prog_fd is specified at prog load time (thus
    making it impossible to attach to a program loaded after it in this
    way), as well as from tracing progs not implementing link_detach.
    
    Replace "no same type" requirement with verification that no more than
    one level of attachment nesting is allowed. In this way only one
    fentry/fexit program could be attached to another fentry/fexit to cover
    the profiling use case, and still no cycle could be formed. To implement this,
    add a new field into bpf_prog_aux to track nested attachment for tracing
    programs.
    
    [1]: https://lore.kernel.org/bpf/20191108064039.2041889-16-ast@kernel.org/
    
    Acked-by: Jiri Olsa <olsajiri@gmail.com>
    Acked-by: Song Liu <song@kernel.org>
    Signed-off-by: Dmitrii Dolgov <9erthalion6@gmail.com>
    Link: https://lore.kernel.org/r/20240103190559.14750-2-9erthalion6@gmail.com
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Signed-off-by: Viktor Malik <vmalik@redhat.com>
2024-06-25 11:07:40 +02:00
Viktor Malik 9680ef97a0
Revert BPF token-related functionality
JIRA: https://issues.redhat.com/browse/RHEL-23644

commit d17aff807f845cf93926c28705216639c7279110
Author: Andrii Nakryiko <andrii@kernel.org>
Date:   Tue Dec 19 07:37:35 2023 -0800

    Revert BPF token-related functionality

    This patch includes the following reverts (one conflicting BPF FS
    patch and three token patch sets, represented by merge commits):
      - revert 0f5d5454c723 "Merge branch 'bpf-fs-mount-options-parsing-follow-ups'";
      - revert 750e785796bb "bpf: Support uid and gid when mounting bpffs";
      - revert 733763285acf "Merge branch 'bpf-token-support-in-libbpf-s-bpf-object'";
      - revert c35919dcce28 "Merge branch 'bpf-token-and-bpf-fs-based-delegation'".

    Link: https://lore.kernel.org/bpf/CAHk-=wg7JuFYwGy=GOMbRCtOL+jwSQsdUaBsRWkDVYbxipbM5A@mail.gmail.com
    Signed-off-by: Andrii Nakryiko <andrii@kernel.org>

Signed-off-by: Viktor Malik <vmalik@redhat.com>
2024-06-25 11:07:29 +02:00
Viktor Malik 5574e3e96d
bpf: Fix a race condition between btf_put() and map_free()
JIRA: https://issues.redhat.com/browse/RHEL-23644

commit 59e5791f59dd83e8aa72a4e74217eabb6e8cfd90
Author: Yonghong Song <yonghong.song@linux.dev>
Date:   Thu Dec 14 12:38:15 2023 -0800

    bpf: Fix a race condition between btf_put() and map_free()
    
    When running `./test_progs -j` in my local vm with latest kernel,
    I once hit a kasan error like below:
    
      [ 1887.184724] BUG: KASAN: slab-use-after-free in bpf_rb_root_free+0x1f8/0x2b0
      [ 1887.185599] Read of size 4 at addr ffff888106806910 by task kworker/u12:2/2830
      [ 1887.186498]
      [ 1887.186712] CPU: 3 PID: 2830 Comm: kworker/u12:2 Tainted: G           OEL     6.7.0-rc3-00699-g90679706d486-dirty #494
      [ 1887.188034] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
      [ 1887.189618] Workqueue: events_unbound bpf_map_free_deferred
      [ 1887.190341] Call Trace:
      [ 1887.190666]  <TASK>
      [ 1887.190949]  dump_stack_lvl+0xac/0xe0
      [ 1887.191423]  ? nf_tcp_handle_invalid+0x1b0/0x1b0
      [ 1887.192019]  ? panic+0x3c0/0x3c0
      [ 1887.192449]  print_report+0x14f/0x720
      [ 1887.192930]  ? preempt_count_sub+0x1c/0xd0
      [ 1887.193459]  ? __virt_addr_valid+0xac/0x120
      [ 1887.194004]  ? bpf_rb_root_free+0x1f8/0x2b0
      [ 1887.194572]  kasan_report+0xc3/0x100
      [ 1887.195085]  ? bpf_rb_root_free+0x1f8/0x2b0
      [ 1887.195668]  bpf_rb_root_free+0x1f8/0x2b0
      [ 1887.196183]  ? __bpf_obj_drop_impl+0xb0/0xb0
      [ 1887.196736]  ? preempt_count_sub+0x1c/0xd0
      [ 1887.197270]  ? preempt_count_sub+0x1c/0xd0
      [ 1887.197802]  ? _raw_spin_unlock+0x1f/0x40
      [ 1887.198319]  bpf_obj_free_fields+0x1d4/0x260
      [ 1887.198883]  array_map_free+0x1a3/0x260
      [ 1887.199380]  bpf_map_free_deferred+0x7b/0xe0
      [ 1887.199943]  process_scheduled_works+0x3a2/0x6c0
      [ 1887.200549]  worker_thread+0x633/0x890
      [ 1887.201047]  ? __kthread_parkme+0xd7/0xf0
      [ 1887.201574]  ? kthread+0x102/0x1d0
      [ 1887.202020]  kthread+0x1ab/0x1d0
      [ 1887.202447]  ? pr_cont_work+0x270/0x270
      [ 1887.202954]  ? kthread_blkcg+0x50/0x50
      [ 1887.203444]  ret_from_fork+0x34/0x50
      [ 1887.203914]  ? kthread_blkcg+0x50/0x50
      [ 1887.204397]  ret_from_fork_asm+0x11/0x20
      [ 1887.204913]  </TASK>
      [ 1887.205209]
      [ 1887.205416] Allocated by task 2197:
      [ 1887.205881]  kasan_set_track+0x3f/0x60
      [ 1887.206366]  __kasan_kmalloc+0x6e/0x80
      [ 1887.206856]  __kmalloc+0xac/0x1a0
      [ 1887.207293]  btf_parse_fields+0xa15/0x1480
      [ 1887.207836]  btf_parse_struct_metas+0x566/0x670
      [ 1887.208387]  btf_new_fd+0x294/0x4d0
      [ 1887.208851]  __sys_bpf+0x4ba/0x600
      [ 1887.209292]  __x64_sys_bpf+0x41/0x50
      [ 1887.209762]  do_syscall_64+0x4c/0xf0
      [ 1887.210222]  entry_SYSCALL_64_after_hwframe+0x63/0x6b
      [ 1887.210868]
      [ 1887.211074] Freed by task 36:
      [ 1887.211460]  kasan_set_track+0x3f/0x60
      [ 1887.211951]  kasan_save_free_info+0x28/0x40
      [ 1887.212485]  ____kasan_slab_free+0x101/0x180
      [ 1887.213027]  __kmem_cache_free+0xe4/0x210
      [ 1887.213514]  btf_free+0x5b/0x130
      [ 1887.213918]  rcu_core+0x638/0xcc0
      [ 1887.214347]  __do_softirq+0x114/0x37e
    
    The error happens at bpf_rb_root_free+0x1f8/0x2b0:
    
      00000000000034c0 <bpf_rb_root_free>:
      ; {
        34c0: f3 0f 1e fa                   endbr64
        34c4: e8 00 00 00 00                callq   0x34c9 <bpf_rb_root_free+0x9>
        34c9: 55                            pushq   %rbp
        34ca: 48 89 e5                      movq    %rsp, %rbp
      ...
      ;       if (rec && rec->refcount_off >= 0 &&
        36aa: 4d 85 ed                      testq   %r13, %r13
        36ad: 74 a9                         je      0x3658 <bpf_rb_root_free+0x198>
        36af: 49 8d 7d 10                   leaq    0x10(%r13), %rdi
        36b3: e8 00 00 00 00                callq   0x36b8 <bpf_rb_root_free+0x1f8>
                                            <==== kasan function
        36b8: 45 8b 7d 10                   movl    0x10(%r13), %r15d
                                            <==== use-after-free load
        36bc: 45 85 ff                      testl   %r15d, %r15d
        36bf: 78 8c                         js      0x364d <bpf_rb_root_free+0x18d>
    
    So the problem is at rec->refcount_off in the above.
    
    I did some source code analysis and found the reason.
                                      CPU A                        CPU B
      bpf_map_put:
        ...
        btf_put with rcu callback
        ...
        bpf_map_free_deferred
          with system_unbound_wq
        ...                          ...                           ...
        ...                          btf_free_rcu:                 ...
        ...                          ...                           bpf_map_free_deferred:
        ...                          ...
        ...         --------->       btf_struct_metas_free()
        ...         | race condition ...
        ...         --------->                                     map->ops->map_free()
        ...
        ...                          btf->struct_meta_tab = NULL
    
    In the above, map_free() corresponds to array_map_free(), which
    eventually calls bpf_rb_root_free(), and that in turn calls:
      ...
      __bpf_obj_drop_impl(obj, field->graph_root.value_rec, false);
      ...
    
    Here, 'value_rec' is assigned in btf_check_and_fixup_fields() with the
    following code:
    
      meta = btf_find_struct_meta(btf, btf_id);
      if (!meta)
        return -EFAULT;
      rec->fields[i].graph_root.value_rec = meta->record;
    
    So basically, 'value_rec' is a pointer to a record in struct_metas_tab,
    and it is possible that this particular record has already been freed by
    btf_struct_metas_free(), hence the kasan error here.
    
    Actually it is very hard to reproduce the failure with the current
    bpf/bpf-next code; I only got the above error once. To make the issue
    easier to reproduce, I added a delay in bpf_map_free_deferred() to
    postpone map->ops->map_free(), which significantly increased
    reproducibility.
    
      diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
      index 5e43ddd1b83f..aae5b5213e93 100644
      --- a/kernel/bpf/syscall.c
      +++ b/kernel/bpf/syscall.c
      @@ -695,6 +695,7 @@ static void bpf_map_free_deferred(struct work_struct *work)
            struct bpf_map *map = container_of(work, struct bpf_map, work);
            struct btf_record *rec = map->record;
    
      +     mdelay(100);
            security_bpf_map_free(map);
            bpf_map_release_memcg(map);
            /* implementation dependent freeing */
    
    Hou also provided test cases ([1]) for easily reproducing the above issue.
    
    There are two ways to fix the issue: v1 of the patch ([2]) moves
    btf_put() after the map_free callback, and v5 ([3]) uses a kptr-style
    fix which tries to take a btf reference during map_check_btf(). Each
    approach has its pros and cons. The first approach delays the freeing
    of the btf, while the second needs to acquire the reference depending
    on context, which makes the logic less elegant and may complicate
    things with future new data structures. Alexei suggested in [4] going
    back to v1, which is what this patch does.
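
    A minimal sketch of the v1-style fix (paraphrased, not the verbatim
    upstream diff): the btf reference is now dropped in the deferred
    worker, after map->ops->map_free() has run, so the struct metas stay
    alive while the records are freed:

      static void bpf_map_free_deferred(struct work_struct *work)
      {
              struct bpf_map *map = container_of(work, struct bpf_map, work);
              struct btf_record *rec = map->record;
              struct btf *btf = map->btf;

              security_bpf_map_free(map);
              bpf_map_release_memcg(map);
              /* implementation dependent freeing; may follow value_rec
               * pointers into btf's struct_metas_tab
               */
              map->ops->map_free(map);
              btf_record_free(rec);
              /* dropped here instead of in bpf_map_put(), so the btf
               * (and its struct metas) outlives map_free()
               */
              btf_put(btf);
      }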
    
    Reran './test_progs -j' with the above mdelay() hack a couple of times
    and didn't observe the error for the above rb_root test cases. Running
    Hou's test ([1]) is also successful.
    
      [1] https://lore.kernel.org/bpf/20231207141500.917136-1-houtao@huaweicloud.com/
      [2] v1: https://lore.kernel.org/bpf/20231204173946.3066377-1-yonghong.song@linux.dev/
      [3] v5: https://lore.kernel.org/bpf/20231208041621.2968241-1-yonghong.song@linux.dev/
      [4] v4: https://lore.kernel.org/bpf/CAADnVQJ3FiXUhZJwX_81sjZvSYYKCFB3BT6P8D59RS2Gu+0Z7g@mail.gmail.com/
    
    Cc: Hou Tao <houtao@huaweicloud.com>
    Fixes: 958cf2e273f0 ("bpf: Introduce bpf_obj_new")
    Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
    Link: https://lore.kernel.org/r/20231214203815.1469107-1-yonghong.song@linux.dev
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Signed-off-by: Viktor Malik <vmalik@redhat.com>
2024-06-25 10:52:28 +02:00
Viktor Malik 53cf4b3c47
bpf: Reduce the scope of rcu_read_lock when updating fd map
JIRA: https://issues.redhat.com/browse/RHEL-23644

commit 8f82583f9527b3be9d70d9a5d1f33435e29d0480
Author: Hou Tao <houtao1@huawei.com>
Date:   Thu Dec 14 12:30:09 2023 +0800

    bpf: Reduce the scope of rcu_read_lock when updating fd map
    
    There is no rcu-read-lock requirement for ops->map_fd_get_ptr() or
    ops->map_fd_put_ptr(), so don't use rcu_read_lock() for these two
    callbacks.
    
    For bpf_fd_array_map_update_elem(), accessing array->ptrs doesn't need
    rcu-read-lock because array->ptrs must still be allocated. For
    bpf_fd_htab_map_update_elem(), htab_map_update_elem() only requires
    rcu-read-lock to be held to avoid the WARN_ON_ONCE(), so only use
    rcu_read_lock() during the invocation of htab_map_update_elem().
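
    A sketch of the resulting update path for the htab-of-maps case
    (simplified from the upstream diff; error handling trimmed, and the
    exact callback signatures vary across kernel versions):

      static long bpf_fd_htab_map_update_elem(struct bpf_map *map, struct file *map_file,
                                              void *key, void *value, u64 map_flags)
      {
              void *ptr;
              int ret;

              /* no rcu_read_lock() needed around map_fd_get_ptr() */
              ptr = map->ops->map_fd_get_ptr(map, map_file, *(int *)value);
              if (IS_ERR(ptr))
                      return PTR_ERR(ptr);

              /* only htab_map_update_elem() needs rcu_read_lock() held,
               * to satisfy its WARN_ON_ONCE() check
               */
              rcu_read_lock();
              ret = htab_map_update_elem(map, key, &ptr, map_flags);
              rcu_read_unlock();
              if (ret)
                      map->ops->map_fd_put_ptr(map, ptr, false);
              return ret;
      }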
    
    Acked-by: Yonghong Song <yonghong.song@linux.dev>
    Signed-off-by: Hou Tao <houtao1@huawei.com>
    Link: https://lore.kernel.org/r/20231214043010.3458072-2-houtao@huaweicloud.com
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Signed-off-by: Viktor Malik <vmalik@redhat.com>
2024-06-25 10:52:27 +02:00
Viktor Malik f8ac6e60f5
bpf: Update the comments in maybe_wait_bpf_programs()
JIRA: https://issues.redhat.com/browse/RHEL-23644

commit 2a0c6b41eec90c2a138ea8b574836744783c67ff
Author: Hou Tao <houtao1@huawei.com>
Date:   Mon Dec 11 16:34:47 2023 +0800

    bpf: Update the comments in maybe_wait_bpf_programs()
    
    Since commit 638e4b825d ("bpf: Allows per-cpu maps and map-in-map in
    sleepable programs"), sleepable BPF programs can also use map-in-map,
    but maybe_wait_bpf_programs() doesn't handle it accordingly. The main
    reason is that using synchronize_rcu_tasks_trace() to wait for the
    completions of these sleepable BPF programs may incur a very long delay
    and userspace may think it is hung, so the wait for sleepable BPF
    programs is skipped. Update the comments in maybe_wait_bpf_programs()
    to reflect the reason.
    
    Signed-off-by: Hou Tao <houtao1@huawei.com>
    Acked-by: Yonghong Song <yonghong.song@linux.dev>
    Acked-by: John Fastabend <john.fastabend@gmail.com>
    Link: https://lore.kernel.org/r/20231211083447.1921178-1-houtao@huaweicloud.com
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Signed-off-by: Viktor Malik <vmalik@redhat.com>
2024-06-25 10:52:27 +02:00
Viktor Malik f63cdbdc7c
bpf: Set uattr->batch.count as zero before batched update or deletion
JIRA: https://issues.redhat.com/browse/RHEL-23644

commit 06e5c999f10269a532304e89a6adb2fbfeb0593c
Author: Hou Tao <houtao1@huawei.com>
Date:   Fri Dec 8 18:23:53 2023 +0800

    bpf: Set uattr->batch.count as zero before batched update or deletion
    
    generic_map_{delete,update}_batch() doesn't set uattr->batch.count to
    zero before it tries to allocate memory for the key. If the memory
    allocation fails, the value of uattr->batch.count will be incorrect.

    Fix it by setting uattr->batch.count to zero before the batched update
    or deletion.
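
    A sketch of the fix (paraphrased): report a count of zero to userspace
    before the key allocation can fail, so the field never keeps a stale
    user-supplied value:

      /* in generic_map_{delete,update}_batch(); uattr is the
       * union bpf_attr __user * pointer handed in by the syscall
       */
      if (put_user(0, &uattr->batch.count))
              return -EFAULT;

      key = kvmalloc(map->key_size, GFP_USER | __GFP_NOWARN);
      if (!key)
              return -ENOMEM;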
    
    Signed-off-by: Hou Tao <houtao1@huawei.com>
    Link: https://lore.kernel.org/r/20231208102355.2628918-6-houtao@huaweicloud.com
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Signed-off-by: Viktor Malik <vmalik@redhat.com>
2024-06-25 10:52:19 +02:00
Viktor Malik 0d26a4e9f1
bpf: Only call maybe_wait_bpf_programs() when map operation succeeds
JIRA: https://issues.redhat.com/browse/RHEL-23644

commit 67ad2c73ff29b32bd09135ec07c26e59490dbb3b
Author: Hou Tao <houtao1@huawei.com>
Date:   Fri Dec 8 18:23:52 2023 +0800

    bpf: Only call maybe_wait_bpf_programs() when map operation succeeds
    
    There is no need to call maybe_wait_bpf_programs() if the update or
    deletion operation fails, so only call maybe_wait_bpf_programs() when
    the update or deletion operation succeeds.
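
    The resulting pattern, sketched (not the verbatim diff):

      err = bpf_map_update_value(map, f.file, key, value, attr->flags);
      if (!err)
              /* only wait for running programs once the map
               * actually changed
               */
              maybe_wait_bpf_programs(map);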
    
    Signed-off-by: Hou Tao <houtao1@huawei.com>
    Link: https://lore.kernel.org/r/20231208102355.2628918-5-houtao@huaweicloud.com
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Signed-off-by: Viktor Malik <vmalik@redhat.com>
2024-06-25 10:52:19 +02:00
Viktor Malik 5746898221
bpf: Add missed maybe_wait_bpf_programs() for htab of maps
JIRA: https://issues.redhat.com/browse/RHEL-23644

commit 012772581d040607ac1f981f47f6afd2336b4580
Author: Hou Tao <houtao1@huawei.com>
Date:   Fri Dec 8 18:23:51 2023 +0800

    bpf: Add missed maybe_wait_bpf_programs() for htab of maps
    
    When doing batched lookup and deletion operations on htab of maps,
    maybe_wait_bpf_programs() is needed to ensure all programs don't use the
    inner map after the bpf syscall returns.
    
    Instead of adding the wait in __htab_map_lookup_and_delete_batch(), add
    it in bpf_map_do_batch() and also remove the call to
    maybe_wait_bpf_programs() from generic_map_{delete,update}_batch().
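
    Sketched shape of the common exit path in bpf_map_do_batch()
    (paraphrased; has_write is true for every batched command that can
    modify the map):

      err_put:
              if (has_write) {
                      /* one wait covers update, delete and
                       * lookup_and_delete batches, including htab of maps
                       */
                      maybe_wait_bpf_programs(map);
                      bpf_map_write_active_dec(map);
              }
              fdput(f);
              return err;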
    
    Signed-off-by: Hou Tao <houtao1@huawei.com>
    Link: https://lore.kernel.org/r/20231208102355.2628918-4-houtao@huaweicloud.com
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Signed-off-by: Viktor Malik <vmalik@redhat.com>
2024-06-25 10:52:19 +02:00
Viktor Malik 0a74271bab
bpf: Call maybe_wait_bpf_programs() only once for generic_map_update_batch()
JIRA: https://issues.redhat.com/browse/RHEL-23644

commit 37ba5b59d6adfa08926acd3a833608487a18c2ef
Author: Hou Tao <houtao1@huawei.com>
Date:   Fri Dec 8 18:23:50 2023 +0800

    bpf: Call maybe_wait_bpf_programs() only once for generic_map_update_batch()
    
    Just like commit 9087c6ff8dfe ("bpf: Call maybe_wait_bpf_programs() only
    once from generic_map_delete_batch()"), there is no need to call
    maybe_wait_bpf_programs() for each update in a batched update, so only
    call it once in generic_map_update_batch().
    
    Signed-off-by: Hou Tao <houtao1@huawei.com>
    Link: https://lore.kernel.org/r/20231208102355.2628918-3-houtao@huaweicloud.com
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Signed-off-by: Viktor Malik <vmalik@redhat.com>
2024-06-25 10:52:18 +02:00
Viktor Malik 63d3423f21
bpf: Remove unnecessary wait from bpf_map_copy_value()
JIRA: https://issues.redhat.com/browse/RHEL-23644

commit c26f2a8901393c9f81909da0a4324587092bd3a3
Author: Hou Tao <houtao1@huawei.com>
Date:   Fri Dec 8 18:23:49 2023 +0800

    bpf: Remove unnecessary wait from bpf_map_copy_value()
    
    Both map_lookup_elem() and generic_map_lookup_batch() use
    bpf_map_copy_value() to look up and copy the value, and there is no
    update operation in bpf_map_copy_value(), so just remove the invocation
    of maybe_wait_bpf_programs() from it.
    
    Fixes: 15c14a3dca ("bpf: Add bpf_map_{value_size, update_value, map_copy_value} functions")
    Signed-off-by: Hou Tao <houtao1@huawei.com>
    Link: https://lore.kernel.org/r/20231208102355.2628918-2-houtao@huaweicloud.com
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Signed-off-by: Viktor Malik <vmalik@redhat.com>
2024-06-25 10:52:18 +02:00
Viktor Malik 4d09c8bfa4
bpf,lsm: refactor bpf_map_alloc/bpf_map_free LSM hooks
JIRA: https://issues.redhat.com/browse/RHEL-23644

commit 66d636d70a79c1d37e3eea67ab50969e6aaef983
Author: Andrii Nakryiko <andrii@kernel.org>
Date:   Thu Nov 30 10:52:22 2023 -0800

    bpf,lsm: refactor bpf_map_alloc/bpf_map_free LSM hooks

    Similarly to the bpf_prog_alloc LSM hook, rename and extend the
    bpf_map_alloc hook into bpf_map_create, taking not just struct bpf_map,
    but also bpf_attr and bpf_token, to give a fuller context to LSMs.

    Unlike bpf_prog_alloc, there is no need to move the hook around, as it
    currently fires right before allocating the BPF map ID and FD, which
    seems to be a sweet spot.

    But like the bpf_prog_alloc/bpf_prog_free combo, make sure that the
    bpf_map_free LSM hook is called even if the bpf_map_create hook
    returned an error: if a few LSMs are combined together, one LSM may
    have successfully allocated a security blob for its needs while a
    subsequent LSM rejected the BPF map creation. The former LSM would
    still need to free its blob, so we need to ensure that
    security_bpf_map_free() is called regardless of the outcome.
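
    The control flow in map_create(), sketched (paraphrased from the
    upstream patch; label names may differ):

      err = security_bpf_map_create(map, attr, token);
      if (err)
              /* unwind still goes through the free hook, so an LSM
               * that already allocated its blob can release it
               */
              goto free_map_sec;

      err = bpf_map_alloc_id(map);
      if (err)
              goto free_map_sec;
      ...
      free_map_sec:
              security_bpf_map_free(map);
      free_map:
              ...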

    Acked-by: Paul Moore <paul@paul-moore.com>
    Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
    Link: https://lore.kernel.org/r/20231130185229.2688956-11-andrii@kernel.org
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Signed-off-by: Viktor Malik <vmalik@redhat.com>
2024-06-25 10:52:10 +02:00
Viktor Malik 5b685d2084
bpf,lsm: refactor bpf_prog_alloc/bpf_prog_free LSM hooks
JIRA: https://issues.redhat.com/browse/RHEL-23644

commit c3dd6e94df7193f33f45d33303f5e85afb2a72dc
Author: Andrii Nakryiko <andrii@kernel.org>
Date:   Thu Nov 30 10:52:21 2023 -0800

    bpf,lsm: refactor bpf_prog_alloc/bpf_prog_free LSM hooks

    Based on upstream discussion ([0]), rework the existing
    bpf_prog_alloc_security LSM hook. Rename it to bpf_prog_load and,
    instead of passing bpf_prog_aux, pass a proper bpf_prog pointer for the
    full BPF program struct. Also, pass the bpf_attr union with all the
    user-provided arguments for the BPF_PROG_LOAD command. This gives LSMs
    as much information as we can provide.

    The hook is also BPF token-aware now, and an optional bpf_token struct
    is passed as a third argument. The bpf_prog_load LSM hook is called
    after a bunch of sanity checks have been performed and bpf_prog and
    bpf_prog_aux have been allocated and filled out, but right before the
    full-fledged BPF verification step.

    The bpf_prog_free LSM hook now accepts a struct bpf_prog argument, for
    consistency. SELinux code is adjusted to all the new names, types, and
    signatures.
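
    The reworked hook pair, roughly as declared in lsm_hook_defs.h
    (sketched; exact qualifiers may differ by kernel version):

      LSM_HOOK(int, 0, bpf_prog_load, struct bpf_prog *prog,
               union bpf_attr *attr, struct bpf_token *token)
      LSM_HOOK(void, LSM_RET_VOID, bpf_prog_free, struct bpf_prog *prog)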

    Note, given that the bpf_prog_load (previously bpf_prog_alloc) hook can
    be used by some LSMs to allocate an extra security blob, but also by
    other LSMs to reject BPF program loading, we need to make sure that the
    bpf_prog_free LSM hook is called after the bpf_prog_load/bpf_prog_alloc
    one *even* if the hook itself returned an error. If we don't do that,
    we run the risk of leaking memory. This seems to be possible today when
    combining SELinux and the BPF LSM, as one example, depending on their
    relative ordering.

    Also, for the BPF LSM setup, add bpf_prog_load and bpf_prog_free to the
    sleepable LSM hooks list, as they are both executed in sleepable
    context. Also drop the bpf_prog_load hook from the untrusted list, as
    there is no issue with refcounting or anything else anymore that
    originally forced us to add it to the untrusted list in c0c852dd1876
    ("bpf: Do not mark certain LSM hook arguments as trusted"). We now
    trigger this hook much later and it should not be an issue anymore.

      [0] https://lore.kernel.org/bpf/9fe88aef7deabbe87d3fc38c4aea3c69.paul@paul-moore.com/

    Acked-by: Paul Moore <paul@paul-moore.com>
    Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
    Link: https://lore.kernel.org/r/20231130185229.2688956-10-andrii@kernel.org
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Signed-off-by: Viktor Malik <vmalik@redhat.com>
2024-06-25 10:52:10 +02:00
Viktor Malik 3e424bf42b
bpf: take into account BPF token when fetching helper protos
JIRA: https://issues.redhat.com/browse/RHEL-23644

commit 4cbb270e115bc197ff2046aeb54cc951666b16ec
Author: Andrii Nakryiko <andrii@kernel.org>
Date:   Thu Nov 30 10:52:19 2023 -0800

    bpf: take into account BPF token when fetching helper protos
    
    Instead of performing unconditional system-wide bpf_capable() and
    perfmon_capable() calls inside the bpf_base_func_proto() function (and
    other similar ones) to determine the eligibility of a given BPF helper
    for a given program, use the BPF token recorded during BPF_PROG_LOAD
    command handling to inform the decision.
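
    Sketch of the resulting check in bpf_base_func_proto() (paraphrased;
    the prog pointer carries the token recorded at load time, and
    bpf_token_capable() falls back to plain capability checks when the
    token is NULL):

      /* previously: if (!bpf_capable()) return NULL; */
      if (!bpf_token_capable(prog->aux->token, CAP_BPF))
              return NULL;
      ...
      /* previously: if (!perfmon_capable()) return NULL; */
      if (!bpf_token_capable(prog->aux->token, CAP_PERFMON))
              return NULL;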
    
    Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
    Link: https://lore.kernel.org/r/20231130185229.2688956-8-andrii@kernel.org
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Signed-off-by: Viktor Malik <vmalik@redhat.com>
2024-06-25 10:52:09 +02:00
Viktor Malik 1b699c9ae7
bpf: add BPF token support to BPF_PROG_LOAD command
JIRA: https://issues.redhat.com/browse/RHEL-23644

commit e1cef620f598853a90f17701fcb1057a6768f7b8
Author: Andrii Nakryiko <andrii@kernel.org>
Date:   Thu Nov 30 10:52:18 2023 -0800

    bpf: add BPF token support to BPF_PROG_LOAD command
    
    Add basic support for BPF token to BPF_PROG_LOAD. Wire through a set of
    allowed BPF program types and attach types, derived from the BPF FS at
    BPF token creation time. Then make sure we perform bpf_token_capable()
    checks wherever relevant.
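
    The per-token gate, roughly as implemented in kernel/bpf/token.c
    (sketched from the upstream API):

      bool bpf_token_allow_prog_type(const struct bpf_token *token,
                                     enum bpf_prog_type prog_type,
                                     enum bpf_attach_type attach_type)
      {
              if (!token || prog_type >= __MAX_BPF_PROG_TYPE)
                      return false;

              /* both masks were derived from BPF FS delegation mount
               * options when the token was created
               */
              return (token->allowed_prog_types & (1ULL << prog_type)) &&
                     (token->allowed_attach_types & (1ULL << attach_type));
      }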
    
    Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
    Link: https://lore.kernel.org/r/20231130185229.2688956-7-andrii@kernel.org
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Signed-off-by: Viktor Malik <vmalik@redhat.com>
2024-06-25 10:52:09 +02:00
Viktor Malik 01e2359746
bpf: add BPF token support to BPF_BTF_LOAD command
JIRA: https://issues.redhat.com/browse/RHEL-23644

commit ee54b1a910e4d49c9a104f31ae3f5b979131adf8
Author: Andrii Nakryiko <andrii@kernel.org>
Date:   Thu Nov 30 10:52:17 2023 -0800

    bpf: add BPF token support to BPF_BTF_LOAD command
    
    Accept a BPF token FD in the BPF_BTF_LOAD command to allow BTF data
    loading through a delegated BPF token. BTF loading is a pretty
    straightforward operation, so as long as the BPF token is created with
    allow_cmds granting the BPF_BTF_LOAD command, the kernel proceeds to
    parsing the BTF data and creating a BTF object.
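
    Sketch of the BPF_BTF_LOAD-side handling (paraphrased; field and
    helper names follow the upstream token API, details may differ by
    kernel version):

      if (attr->btf_token_fd) {
              token = bpf_token_get_from_fd(attr->btf_token_fd);
              if (IS_ERR(token))
                      return PTR_ERR(token);
              /* a token that doesn't delegate BPF_BTF_LOAD is
               * treated as if no token was supplied at all
               */
              if (!bpf_token_allow_cmd(token, BPF_BTF_LOAD)) {
                      bpf_token_put(token);
                      token = NULL;
              }
      }

      if (!bpf_token_capable(token, CAP_BPF)) {
              bpf_token_put(token);
              return -EPERM;
      }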
    
    Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
    Link: https://lore.kernel.org/r/20231130185229.2688956-6-andrii@kernel.org
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Signed-off-by: Viktor Malik <vmalik@redhat.com>
2024-06-25 10:52:09 +02:00
Viktor Malik 02d0a61b79
bpf: add BPF token support to BPF_MAP_CREATE command
JIRA: https://issues.redhat.com/browse/RHEL-23644

commit 688b7270b3cb75e8ac78123d719967db40336e5b
Author: Andrii Nakryiko <andrii@kernel.org>
Date:   Thu Nov 30 10:52:16 2023 -0800

    bpf: add BPF token support to BPF_MAP_CREATE command
    
    Allow providing token_fd for the BPF_MAP_CREATE command to allow
    controlled BPF map creation from an unprivileged process through a
    delegated BPF token.

    Wire through a set of allowed BPF map types to the BPF token, derived
    from the BPF FS at BPF token creation time. This, in combination with
    allowed_cmds, makes it possible to create a narrowly-focused BPF token
    (controlled by a privileged agent) with a restrictive set of BPF maps
    that an application can attempt to create.
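
    The map-type gate, roughly as implemented in kernel/bpf/token.c
    (sketched from the upstream API):

      bool bpf_token_allow_map_type(const struct bpf_token *token,
                                    enum bpf_map_type type)
      {
              if (!token || type >= __MAX_BPF_MAP_TYPE)
                      return false;

              /* allowed_maps is a bitmask derived from the BPF FS
               * delegate_maps mount option at token creation time
               */
              return token->allowed_maps & (1ULL << type);
      }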
    
    Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
    Link: https://lore.kernel.org/r/20231130185229.2688956-5-andrii@kernel.org
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Signed-off-by: Viktor Malik <vmalik@redhat.com>
2024-06-25 10:52:09 +02:00