82 Commits

Author SHA1 Message Date
Jeff Moyer 63317c47dc net: skbuff: drop the word head from skb cache
JIRA: https://issues.redhat.com/browse/RHEL-64867

commit 025a785ff083729819dc82ac81baf190cb4aee5c
Author: Jakub Kicinski <kuba@kernel.org>
Date:   Wed Feb 8 22:06:42 2023 -0800

    net: skbuff: drop the word head from skb cache
    
    skbuff_head_cache is misnamed (perhaps for historical reasons?)
    because it does not hold heads. Head is the buffer which skb->data
    points to, and also where shinfo lives. struct sk_buff is a metadata
    structure, not the head.
    
    Eric recently added skb_small_head_cache (which allocates actual
    head buffers), let that serve as an excuse to finally clean this up :)
    
    Leave the user-space visible name intact, it could possibly be uAPI.
    
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Reviewed-by: Eric Dumazet <edumazet@google.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
2024-11-28 16:03:44 -05:00
Felix Maurer 0d4127c3f8 xdp: fix invalid wait context of page_pool_destroy()
JIRA: https://issues.redhat.com/browse/RHEL-65205
JIRA: https://issues.redhat.com/browse/RHEL-54828
CVE: CVE-2024-43834

commit 59a931c5b732ca5fc2ca727f5a72aeabaafa85ec
Author: Taehee Yoo <ap420073@gmail.com>
Date:   Fri Jul 12 09:51:16 2024 +0000

    xdp: fix invalid wait context of page_pool_destroy()

    If the driver uses a page pool, it creates a page pool with
    page_pool_create().
    The reference count of a page pool is 1 by default.
    A page pool is destroyed only when its reference count reaches 0.
    page_pool_destroy() is used to destroy a page pool; it decreases the
    reference count.
    When a page pool is destroyed, ->disconnect() is called, which is
    mem_allocator_disconnect().
    This function internally acquires mutex_lock().

    If the driver uses XDP, it registers a memory model with
    xdp_rxq_info_reg_mem_model().
    The xdp_rxq_info_reg_mem_model() internally increases a page pool
    reference count if a memory model is a page pool.
    Now the reference count is 2.

    To destroy a page pool, the driver should call both page_pool_destroy()
    and xdp_unreg_mem_model().
    The xdp_unreg_mem_model() internally calls page_pool_destroy().
    Only page_pool_destroy() decreases a reference count.

    If a driver calls page_pool_destroy() and then xdp_unreg_mem_model(), we
    will face an invalid wait context warning, because xdp_unreg_mem_model()
    calls page_pool_destroy() while holding rcu_read_lock() and
    page_pool_destroy() internally acquires a mutex.

    Splat looks like:
    =============================
    [ BUG: Invalid wait context ]
    6.10.0-rc6+ #4 Tainted: G W
    -----------------------------
    ethtool/1806 is trying to lock:
    ffffffff90387b90 (mem_id_lock){+.+.}-{4:4}, at: mem_allocator_disconnect+0x73/0x150
    other info that might help us debug this:
    context-{5:5}
    3 locks held by ethtool/1806:
    stack backtrace:
    CPU: 0 PID: 1806 Comm: ethtool Tainted: G W 6.10.0-rc6+ #4 f916f41f172891c800f2fed
    Hardware name: ASUS System Product Name/PRIME Z690-P D4, BIOS 0603 11/01/2021
    Call Trace:
    <TASK>
    dump_stack_lvl+0x7e/0xc0
    __lock_acquire+0x1681/0x4de0
    ? _printk+0x64/0xe0
    ? __pfx_mark_lock.part.0+0x10/0x10
    ? __pfx___lock_acquire+0x10/0x10
    lock_acquire+0x1b3/0x580
    ? mem_allocator_disconnect+0x73/0x150
    ? __wake_up_klogd.part.0+0x16/0xc0
    ? __pfx_lock_acquire+0x10/0x10
    ? dump_stack_lvl+0x91/0xc0
    __mutex_lock+0x15c/0x1690
    ? mem_allocator_disconnect+0x73/0x150
    ? __pfx_prb_read_valid+0x10/0x10
    ? mem_allocator_disconnect+0x73/0x150
    ? __pfx_llist_add_batch+0x10/0x10
    ? console_unlock+0x193/0x1b0
    ? lockdep_hardirqs_on+0xbe/0x140
    ? __pfx___mutex_lock+0x10/0x10
    ? tick_nohz_tick_stopped+0x16/0x90
    ? __irq_work_queue_local+0x1e5/0x330
    ? irq_work_queue+0x39/0x50
    ? __wake_up_klogd.part.0+0x79/0xc0
    ? mem_allocator_disconnect+0x73/0x150
    mem_allocator_disconnect+0x73/0x150
    ? __pfx_mem_allocator_disconnect+0x10/0x10
    ? mark_held_locks+0xa5/0xf0
    ? rcu_is_watching+0x11/0xb0
    page_pool_release+0x36e/0x6d0
    page_pool_destroy+0xd7/0x440
    xdp_unreg_mem_model+0x1a7/0x2a0
    ? __pfx_xdp_unreg_mem_model+0x10/0x10
    ? kfree+0x125/0x370
    ? bnxt_free_ring.isra.0+0x2eb/0x500
    ? bnxt_free_mem+0x5ac/0x2500
    xdp_rxq_info_unreg+0x4a/0xd0
    bnxt_free_mem+0x1356/0x2500
    bnxt_close_nic+0xf0/0x3b0
    ? __pfx_bnxt_close_nic+0x10/0x10
    ? ethnl_parse_bit+0x2c6/0x6d0
    ? __pfx___nla_validate_parse+0x10/0x10
    ? __pfx_ethnl_parse_bit+0x10/0x10
    bnxt_set_features+0x2a8/0x3e0
    __netdev_update_features+0x4dc/0x1370
    ? ethnl_parse_bitset+0x4ff/0x750
    ? __pfx_ethnl_parse_bitset+0x10/0x10
    ? __pfx___netdev_update_features+0x10/0x10
    ? mark_held_locks+0xa5/0xf0
    ? _raw_spin_unlock_irqrestore+0x42/0x70
    ? __pm_runtime_resume+0x7d/0x110
    ethnl_set_features+0x32d/0xa20

    To fix this problem, use rhashtable_lookup_fast() instead of
    rhashtable_lookup() with rcu_read_lock().
    Using xa without rcu_read_lock() here is safe.
    xa is freed by __xdp_mem_allocator_rcu_free() and this is called by
    call_rcu() of mem_xa_remove().
    The mem_xa_remove() is called by page_pool_destroy() if a reference
    count reaches 0.
    The xa is already well protected by the reference count mechanism in
    the control plane.
    So removing rcu_read_lock() for page_pool_destroy() is safe.

    Fixes: c3f812cea0 ("page_pool: do not release pool until inflight == 0.")
    Signed-off-by: Taehee Yoo <ap420073@gmail.com>
    Reviewed-by: Jakub Kicinski <kuba@kernel.org>
    Link: https://patch.msgid.link/20240712095116.3801586-1-ap420073@gmail.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Felix Maurer <fmaurer@redhat.com>
2024-11-06 19:04:36 +01:00
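The reference counting described in the page_pool_destroy() fix above can be sketched as a toy userspace model. This is only an illustration under stated assumptions: `struct toy_pool` and every `toy_*` function are hypothetical stand-ins, not kernel code, and the `disconnected` flag merely marks the point where the real code would call ->disconnect() and take a mutex.

```c
#include <stdbool.h>

/* Toy stand-in for a page pool; NOT the kernel's struct page_pool. */
struct toy_pool {
    int  user_cnt;      /* reference count */
    bool disconnected;  /* set once the last reference is dropped */
};

/* Models page_pool_create(): a new pool starts with one reference. */
static void toy_pool_create(struct toy_pool *p)
{
    p->user_cnt = 1;
    p->disconnected = false;
}

/* Models the extra reference taken by xdp_rxq_info_reg_mem_model(). */
static void toy_reg_mem_model(struct toy_pool *p)
{
    p->user_cnt++;
}

/* Models page_pool_destroy(): drop one reference, disconnect at zero. */
static void toy_pool_destroy(struct toy_pool *p)
{
    if (--p->user_cnt == 0)
        p->disconnected = true; /* in the kernel, this step may sleep */
}

/* Models xdp_unreg_mem_model(), which calls page_pool_destroy() internally. */
static void toy_unreg_mem_model(struct toy_pool *p)
{
    toy_pool_destroy(p);
}
```

In this model, whichever of the two teardown calls runs last is the one that reaches zero and disconnects, which is why the sleeping step must not sit inside an RCU read-side section.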
Rado Vrbovsky 14b4cc02eb Merge: BPF 6.9 rebase
MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/5142

Rebase BPF subsystem to upstream version 6.9

JIRA: https://issues.redhat.com/browse/RHEL-23649

Signed-off-by: Jerome Marchand <jmarchan@redhat.com>

Approved-by: Viktor Malik <vmalik@redhat.com>
Approved-by: Chris von Recklinghausen <crecklin@redhat.com>
Approved-by: Rafael Aquini <raquini@redhat.com>
Approved-by: Mark Salter <msalter@redhat.com>
Approved-by: Toke Høiland-Jørgensen <toke@redhat.com>
Approved-by: CKI KWF Bot <cki-ci-bot+kwf-gitlab-com@redhat.com>

Merged-by: Rado Vrbovsky <rvrbovsk@redhat.com>
2024-10-30 07:25:08 +00:00
Jerome Marchand 563e3eb7e7 bpf: treewide: Annotate BPF kfuncs in BTF
JIRA: https://issues.redhat.com/browse/RHEL-23649

Conflicts: Multiple conflicts due to missing kfuncs. All sections were
switched to use the new macro except bpf_mptcp_fmodret_ids, which still
uses BTF_SET8_* upstream. I don't know why. That might be an upstream
oversight.

commit 6f3189f38a3e995232e028a4c341164c4aca1b20
Author: Daniel Xu <dxu@dxuuu.xyz>
Date:   Sun Jan 28 18:24:08 2024 -0700

    bpf: treewide: Annotate BPF kfuncs in BTF

    This commit marks kfuncs as such inside the .BTF_ids section. The upshot
    of these annotations is that we'll be able to automatically generate
    kfunc prototypes for downstream users. The process is as follows:

    1. In source, use BTF_KFUNCS_START/END macro pair to mark kfuncs
    2. During build, pahole injects into BTF a "bpf_kfunc" BTF_DECL_TAG for
       each function inside BTF_KFUNCS sets
    3. At runtime, vmlinux or module BTF is made available in sysfs
    4. At runtime, bpftool (or similar) can look at provided BTF and
       generate appropriate prototypes for functions with "bpf_kfunc" tag

    To ensure future kfuncs are similarly tagged, we now also return an error
    inside kfunc registration for untagged kfuncs. For vmlinux kfuncs,
    we also WARN(), as initcall machinery does not handle errors.

    Signed-off-by: Daniel Xu <dxu@dxuuu.xyz>
    Acked-by: Benjamin Tissoires <bentiss@kernel.org>
    Link: https://lore.kernel.org/r/e55150ceecbf0a5d961e608941165c0bee7bc943.1706491398.git.dxu@dxuuu.xyz
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
2024-10-15 10:49:07 +02:00
CKI Backport Bot 06e6eba3c7 xdp: Remove WARN() from __xdp_reg_mem_model()
JIRA: https://issues.redhat.com/browse/RHEL-51584
CVE: CVE-2024-42082

commit 7e9f79428372c6eab92271390851be34ab26bfb4
Author: Daniil Dulov <d.dulov@aladdin.ru>
Date:   Mon Jun 24 11:07:47 2024 +0300

    xdp: Remove WARN() from __xdp_reg_mem_model()

    syzkaller reports a warning in __xdp_reg_mem_model().

    The warning occurs only if __mem_id_init_hash_table() returns an error. It
    returns the error in two cases:

      1. memory allocation fails;
      2. rhashtable_init() fails when some fields of rhashtable_params
         struct are not initialized properly.

    The second case cannot happen since there is a static const rhashtable_params
    struct with valid fields. So, warning is only triggered when there is a
    problem with memory allocation.

    Thus, there is no sense in using WARN() to handle this error and it can be
    safely removed.

    WARNING: CPU: 0 PID: 5065 at net/core/xdp.c:299 __xdp_reg_mem_model+0x2d9/0x650 net/core/xdp.c:299

    CPU: 0 PID: 5065 Comm: syz-executor883 Not tainted 6.8.0-syzkaller-05271-gf99c5f563c17 #0
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/27/2024
    RIP: 0010:__xdp_reg_mem_model+0x2d9/0x650 net/core/xdp.c:299

    Call Trace:
     xdp_reg_mem_model+0x22/0x40 net/core/xdp.c:344
     xdp_test_run_setup net/bpf/test_run.c:188 [inline]
     bpf_test_run_xdp_live+0x365/0x1e90 net/bpf/test_run.c:377
     bpf_prog_test_run_xdp+0x813/0x11b0 net/bpf/test_run.c:1267
     bpf_prog_test_run+0x33a/0x3b0 kernel/bpf/syscall.c:4240
     __sys_bpf+0x48d/0x810 kernel/bpf/syscall.c:5649
     __do_sys_bpf kernel/bpf/syscall.c:5738 [inline]
     __se_sys_bpf kernel/bpf/syscall.c:5736 [inline]
     __x64_sys_bpf+0x7c/0x90 kernel/bpf/syscall.c:5736
     do_syscall_64+0xfb/0x240
     entry_SYSCALL_64_after_hwframe+0x6d/0x75

    Found by Linux Verification Center (linuxtesting.org) with syzkaller.

    Fixes: 8d5d885275 ("xdp: rhashtable with allocator ID to pointer mapping")
    Signed-off-by: Daniil Dulov <d.dulov@aladdin.ru>
    Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    Acked-by: Jesper Dangaard Brouer <hawk@kernel.org>
    Link: https://lore.kernel.org/all/20240617162708.492159-1-d.dulov@aladdin.ru
    Link: https://lore.kernel.org/bpf/20240624080747.36858-1-d.dulov@aladdin.ru

Signed-off-by: CKI Backport Bot <cki-ci-bot+cki-gitlab-backport-bot@redhat.com>
2024-07-30 17:09:16 +00:00
Petr Oros ed77f2d928 xdp: Add VLAN tag hint
JIRA: https://issues.redhat.com/browse/RHEL-31890

Conflicts:
- adjusted conflict that was resolved upstream by
  commit 753c8608f3e579 ("Merge tag 'for-netdev' of
  https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next")
- xmo_rx_vlan_tag is placed in netdevice.h instead of xdp.h due to
  missing 680ee0456a57 ("net: invert the netdevice.h vs xdp.h dependency")

Upstream commit(s):
commit e6795330f88b4f643c649a02662d47b779340535
Author: Larysa Zaremba <larysa.zaremba@intel.com>
Date:   Tue Dec 5 22:08:38 2023 +0100

    xdp: Add VLAN tag hint

    Implement functionality that enables drivers to expose VLAN tag
    to XDP code.

    VLAN tag is represented by 2 variables:
    - protocol ID, which is passed to bpf code in BE
    - VLAN TCI, in host byte order

    Acked-by: Stanislav Fomichev <sdf@google.com>
    Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com>
    Acked-by: Jesper Dangaard Brouer <hawk@kernel.org>
    Link: https://lore.kernel.org/r/20231205210847.28460-10-larysa.zaremba@intel.com
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Signed-off-by: Petr Oros <poros@redhat.com>
2024-06-05 17:53:56 +02:00
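As a rough illustration of the two values described in the VLAN tag hint commit above, the sketch below splits a TCI in host byte order into its standard 802.1Q fields and converts the big-endian protocol ID for comparison. `struct vlan_fields` and `decode_vlan_hint()` are hypothetical names for this example, not part of the kernel or BPF API.

```c
#include <arpa/inet.h>  /* ntohs() */
#include <stdint.h>

/* Hypothetical container for the decoded hint. */
struct vlan_fields {
    uint16_t proto; /* EtherType in host byte order, e.g. 0x8100 */
    uint8_t  pcp;   /* priority code point, top 3 bits of the TCI */
    uint8_t  dei;   /* drop eligible indicator, 1 bit */
    uint16_t vid;   /* VLAN ID, low 12 bits of the TCI */
};

/* proto_be arrives in network (big-endian) order, tci in host order. */
static struct vlan_fields decode_vlan_hint(uint16_t proto_be, uint16_t tci)
{
    struct vlan_fields f;

    f.proto = ntohs(proto_be);
    f.pcp   = (uint8_t)(tci >> 13);
    f.dei   = (uint8_t)((tci >> 12) & 0x1);
    f.vid   = (uint16_t)(tci & 0x0fff);
    return f;
}
```

For example, a TCI of 0xA064 decodes to priority 5, DEI 0 and VLAN ID 100.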
Petr Oros 168fe84036 xdp: remove unused {__,}xdp_release_frame()
JIRA: https://issues.redhat.com/browse/RHEL-31941

Upstream commit(s):
commit d4e492338d11937c55841b1279287280d6e35894
Author: Alexander Lobakin <aleksander.lobakin@intel.com>
Date:   Mon Mar 13 22:55:53 2023 +0100

    xdp: remove unused {__,}xdp_release_frame()

    __xdp_build_skb_from_frame() was the last user of
    {__,}xdp_release_frame(), which detaches pages from the page_pool.
    All the consumers now recycle Page Pool skbs and page, except mlx5,
    stmmac and tsnep drivers, which use page_pool_release_page() directly
    (might change one day). It's safe to assume this functionality is not
    needed anymore and can be removed (in favor of recycling).

    Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
    Link: https://lore.kernel.org/r/20230313215553.1045175-5-aleksander.lobakin@intel.com
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Signed-off-by: Petr Oros <poros@redhat.com>
2024-05-16 19:27:53 +02:00
Petr Oros e4a44eea27 xdp: recycle Page Pool backed skbs built from XDP frames
JIRA: https://issues.redhat.com/browse/RHEL-31941

Upstream commit(s):
commit 9c94bbf9a87b264294f42e6cc0f76d87854733ec
Author: Alexander Lobakin <aleksander.lobakin@intel.com>
Date:   Mon Mar 13 22:55:52 2023 +0100

    xdp: recycle Page Pool backed skbs built from XDP frames

    __xdp_build_skb_from_frame() state(d):

    /* Until page_pool get SKB return path, release DMA here */

    Page Pool got skb pages recycling in April 2021, but missed this
    function.

    xdp_release_frame() is relevant only for Page Pool backed frames and it
    detaches the page from the corresponding page_pool in order to make it
    freeable via page_frag_free(). It can instead just mark the output skb
    as eligible for recycling if the frame is backed by a pp. No change for
    other memory model types (the same condition check as before).
    cpumap redirect and veth on Page Pool drivers now become zero-alloc (or
    almost).

    Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
    Link: https://lore.kernel.org/r/20230313215553.1045175-4-aleksander.lobakin@intel.com
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Signed-off-by: Petr Oros <poros@redhat.com>
2024-05-16 19:27:53 +02:00
Lucas Zampieri 3681c87369 Merge: CNB95: bpf: expose information about netdev xdp-metadata kfunc support
MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/4041

JIRA: https://issues.redhat.com/browse/RHEL-31945  
Tested: compile only  
Depends: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/3939

Commits:
```
2e06c57d66d3 ("xdp: use trusted arguments in XDP hints kfuncs")
fc45c5b642db ("bpf: make it easier to add new metadata kfunc")
a9c2a608549b ("bpf: expose information about supported xdp metadata kfunc")
0c6c9b105ee9 ("tools: ynl: extend netdev sample to dump xdp-rx-metadata-features")
0629f22ec130 ("ynl: netdev: drop unnecessary enum-as-flags")
9fea94d3a8ca ("tools: ynl: fix converting flags to names after recent cleanup")
```

Signed-off-by: Jose Ignacio Tornos Martinez <jtornosm@redhat.com>

Approved-by: Felix Maurer <fmaurer@redhat.com>
Approved-by: Petr Oros <poros@redhat.com>
Approved-by: Ivan Vecera <ivecera@redhat.com>

Merged-by: Lucas Zampieri <lzampier@redhat.com>
2024-05-08 20:17:31 +00:00
Jose Ignacio Tornos Martinez f0e3be5e78 bpf: expose information about supported xdp metadata kfunc
JIRA: https://issues.redhat.com/browse/RHEL-31945

commit a9c2a608549bb1a2363d289d63907640afcf22af
Author: Stanislav Fomichev <sdf@google.com>
Date:   Wed Sep 13 10:13:49 2023 -0700

    bpf: expose information about supported xdp metadata kfunc
    
    Add new xdp-rx-metadata-features member to netdev netlink
    which exports a bitmask of supported kfuncs. Most of the patch
    is autogenerated (headers), the only relevant part is netdev.yaml
    and the changes in netdev-genl.c to marshal into netlink.
    
    Example output on veth:
    
    $ ip link add veth0 type veth peer name veth1 # ifndex == 12
    $ ./tools/net/ynl/samples/netdev 12
    
    Select ifc ($ifindex; or 0 = dump; or -2 ntf check): 12
       veth1[12]    xdp-features (23): basic redirect rx-sg xdp-rx-metadata-features (3): timestamp hash xdp-zc-max-segs=0
    
    Cc: netdev@vger.kernel.org
    Cc: Willem de Bruijn <willemb@google.com>
    Signed-off-by: Stanislav Fomichev <sdf@google.com>
    Link: https://lore.kernel.org/r/20230913171350.369987-3-sdf@google.com
    Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>

Signed-off-by: Jose Ignacio Tornos Martinez <jtornosm@redhat.com>
2024-04-22 13:45:53 +02:00
Jose Ignacio Tornos Martinez 26daeb6ed4 bpf: make it easier to add new metadata kfunc
JIRA: https://issues.redhat.com/browse/RHEL-31945

commit fc45c5b642dbcac3bb10f4f904e4b863233e5369
Author: Stanislav Fomichev <sdf@google.com>
Date:   Wed Sep 13 10:13:48 2023 -0700

    bpf: make it easier to add new metadata kfunc
    
    No functional changes.
    
    Instead of having hand-crafted code in bpf_dev_bound_resolve_kfunc,
    move kfunc <> xmo handler relationship into XDP_METADATA_KFUNC_xxx.
    This way, any time new kfunc is added, we don't have to touch
    bpf_dev_bound_resolve_kfunc.
    
    Also document XDP_METADATA_KFUNC_xxx arguments since we now have
    more than two and it might be confusing what is what.
    
    Cc: netdev@vger.kernel.org
    Cc: Willem de Bruijn <willemb@google.com>
    Signed-off-by: Stanislav Fomichev <sdf@google.com>
    Link: https://lore.kernel.org/r/20230913171350.369987-2-sdf@google.com
    Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>

Signed-off-by: Jose Ignacio Tornos Martinez <jtornosm@redhat.com>
2024-04-22 13:45:53 +02:00
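The idea in the commit above, keeping the kfunc-to-handler relationship in a single macro list so that a resolver never needs hand-written cases, is the classic X-macro pattern. A minimal userspace sketch follows; every `DEMO_*` and `demo_*` name and the three-argument row layout are assumptions made for illustration and do not reproduce the kernel's XDP_METADATA_KFUNC_xxx definition.

```c
#include <string.h>

/* Hypothetical table: one row per kfunc as (enum id, kfunc name, handler name). */
#define DEMO_METADATA_KFUNC_xxx \
    DEMO_KFUNC(DEMO_KFUNC_RX_TIMESTAMP, "demo_rx_timestamp", "xmo_demo_timestamp") \
    DEMO_KFUNC(DEMO_KFUNC_RX_HASH,      "demo_rx_hash",      "xmo_demo_hash")

/* Expansion 1: generate the enum of kfunc ids from the table. */
enum demo_kfunc_id {
#define DEMO_KFUNC(id, kfunc, handler) id,
    DEMO_METADATA_KFUNC_xxx
#undef DEMO_KFUNC
    DEMO_KFUNC_MAX,
};

/* Expansion 2: generate the name-to-handler lookup from the same table. */
static const char *demo_resolve_kfunc(const char *name)
{
#define DEMO_KFUNC(id, kfunc, handler) \
    if (strcmp(name, kfunc) == 0) return handler;
    DEMO_METADATA_KFUNC_xxx
#undef DEMO_KFUNC
    return NULL; /* unknown kfunc */
}
```

Adding a new row to the table extends both the enum and the resolver without touching either definition.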
Jose Ignacio Tornos Martinez d5927b53c0 xdp: use trusted arguments in XDP hints kfuncs
JIRA: https://issues.redhat.com/browse/RHEL-31945

commit 2e06c57d66d3f6c26faa5f5b479fb3add34ce85a
Author: Larysa Zaremba <larysa.zaremba@intel.com>
Date:   Tue Jul 11 12:59:26 2023 +0200

    xdp: use trusted arguments in XDP hints kfuncs
    
    Currently, verifier does not reject XDP programs that pass NULL pointer to
    hints functions. At the same time, this case is not handled in any driver
    implementation (including veth). For example, changing
    
    bpf_xdp_metadata_rx_timestamp(ctx, &timestamp);
    
    to
    
    bpf_xdp_metadata_rx_timestamp(ctx, NULL);
    
    in xdp_metadata test successfully crashes the system.
    
    Add KF_TRUSTED_ARGS flag to hints kfunc definitions, so driver code
    does not have to worry about getting invalid pointers.
    
    Fixes: 3d76a4d3d4e5 ("bpf: XDP metadata RX kfuncs")
    Reported-by: Stanislav Fomichev <sdf@google.com>
    Closes: https://lore.kernel.org/bpf/ZKWo0BbpLfkZHbyE@google.com/
    Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com>
    Acked-by: Jesper Dangaard Brouer <hawk@kernel.org>
    Acked-by: Stanislav Fomichev <sdf@google.com>
    Link: https://lore.kernel.org/r/20230711105930.29170-1-larysa.zaremba@intel.com
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Signed-off-by: Jose Ignacio Tornos Martinez <jtornosm@redhat.com>
2024-04-22 13:45:53 +02:00
Artem Savkov 1e9cbbe0f6 bpf: Add __bpf_kfunc_{start,end}_defs macros
JIRA: https://issues.redhat.com/browse/RHEL-23643

Upstream Status: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

Conflicts: missing xdp commits, missing vma_task iterator

commit 391145ba2accc48b596f3d438af1a6255b62a555
Author: Dave Marchevsky <davemarchevsky@fb.com>
Date:   Tue Oct 31 14:56:24 2023 -0700

    bpf: Add __bpf_kfunc_{start,end}_defs macros

    BPF kfuncs are meant to be called from BPF programs. Accordingly, most
    kfuncs are not called from anywhere in the kernel, which the
    -Wmissing-prototypes warning is unhappy about. We've peppered
    __diag_ignore_all("-Wmissing-prototypes", ... everywhere kfuncs are
    defined in the codebase to suppress this warning.

    This patch adds two macros meant to bound one or many kfunc definitions.
    All existing kfunc definitions which use these __diag calls to suppress
    -Wmissing-prototypes are migrated to use the newly-introduced macros.
    A new __diag_ignore_all - for "-Wmissing-declarations" - is added to the
    __bpf_kfunc_start_defs macro based on feedback from Andrii on an earlier
    version of this patch [0] and another recent mailing list thread [1].

    In the future we might need to ignore different warnings or do other
    kfunc-specific things. This change will make it easier to make such
    modifications for all kfunc defs.

      [0]: https://lore.kernel.org/bpf/CAEf4BzaE5dRWtK6RPLnjTW-MW9sx9K3Fn6uwqCTChK2Dcb1Xig@mail.gmail.com/
      [1]: https://lore.kernel.org/bpf/ZT+2qCc%2FaXep0%2FLf@krava/

    Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
    Suggested-by: Andrii Nakryiko <andrii@kernel.org>
    Acked-by: Andrii Nakryiko <andrii@kernel.org>
    Cc: Jiri Olsa <olsajiri@gmail.com>
    Acked-by: Jiri Olsa <jolsa@kernel.org>
    Acked-by: David Vernet <void@manifault.com>
    Acked-by: Yafang Shao <laoar.shao@gmail.com>
    Link: https://lore.kernel.org/r/20231031215625.2343848-1-davemarchevsky@fb.com
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Signed-off-by: Artem Savkov <asavkov@redhat.com>
2024-03-27 11:23:42 +01:00
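The macro pair described in the commit above can be approximated in userspace with compiler diagnostic pragmas. This is a sketch only: `demo_kfunc_start_defs()`, `demo_kfunc_end_defs()` and `demo_kfunc_add()` are hypothetical names, and the kernel's real macros are built on its own __diag_* helpers rather than on raw `_Pragma`.

```c
/* Hypothetical analogue of a "start defs" macro: save the diagnostic state
 * and silence the two warnings that externally visible, prototype-less
 * functions would otherwise trigger. */
#define demo_kfunc_start_defs() \
    _Pragma("GCC diagnostic push") \
    _Pragma("GCC diagnostic ignored \"-Wmissing-prototypes\"") \
    _Pragma("GCC diagnostic ignored \"-Wmissing-declarations\"")

/* Hypothetical analogue of the "end defs" macro: restore the saved state. */
#define demo_kfunc_end_defs() \
    _Pragma("GCC diagnostic pop")

demo_kfunc_start_defs()

/* Non-static function with no prior prototype, as a kfunc would be. */
int demo_kfunc_add(int a, int b)
{
    return a + b;
}

demo_kfunc_end_defs()
```

Because the suppression lives in one macro, ignoring an additional warning later means editing a single definition instead of every site that defines such functions.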
Petr Oros 8333fb8ac8 page_pool: split types and declarations from page_pool.h
JIRA: https://issues.redhat.com/browse/RHEL-16983

Conflicts:
- net/core/skbuff.c:
   adjusted context conflict due to missing 78476d315e1905 ("mctp: Add flow
   extension to skb")
- drivers/net/ethernet/hisilicon/hns3/hns3_enet.h:
   adjusted context conflict due to missing 87a9b2fd9288c5 ("net: hns3: add
   support for TX push mode")
- drivers/net/ethernet/mediatek/mtk_eth_soc.[c|h]
   Chunks omitted due to lack of page_pool support in driver. Missing
   upstream commit 23233e577ef973 ("net: ethernet: mtk_eth_soc: rely on
   page_pool for single page buffers")
- drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c
   adjusted context conflict due to missing 67f245c2ec0af1 ("mlx5:
   bpf_xdp_metadata_rx_hash add xdp rss hash type")
- drivers/net/ethernet/microsoft/mana/mana_en.c
   adjusted context conflict due to missing 92272ec4107ef4 ("eth: add
   missing xdp.h includes in drivers")
- drivers/net/veth.c
   Chunks omitted due to missing 0ebab78cbcbfd6 ("net: veth: add page_pool
   for page recycling")
- Unmerged paths (missing in RHEL):
   drivers/net/ethernet/engleder/tsnep_main.c,
   drivers/net/ethernet/microchip/lan966x/lan966x_fdma.c,
   drivers/net/ethernet/microchip/lan966x/lan966x_main.h,
   drivers/net/ethernet/wangxun/libwx/wx_lib.c

Upstream commit(s):
commit a9ca9f9ceff382b58b488248f0c0da9e157f5d06
Author: Yunsheng Lin <linyunsheng@huawei.com>
Date:   Fri Aug 4 20:05:24 2023 +0200

    page_pool: split types and declarations from page_pool.h

    Split types and pure function declarations from page_pool.h
    and add them in page_pool/types.h, so that C sources can
    include page_pool.h and headers should generally only include
    page_pool/types.h as suggested by Jakub.
    Rename page_pool.h to page_pool/helpers.h to have both in
    one place.

    Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
    Suggested-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
    Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
    Link: https://lore.kernel.org/r/20230804180529.2483231-2-aleksander.lobakin@intel.com
    [Jakub: change microsoft/mana, fix kdoc paths in Documentation]
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Petr Oros <poros@redhat.com>
2023-11-30 19:11:24 +01:00
Felix Maurer d0892c6775 xdp: rss hash types representation
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2178930
Conflicts: Skipping the driver parts because the code has not yet been
           backported but will go in through another patchset.

commit 0cd917a4a8ace70ff9082d797c899f6bf10de910
Author: Jesper Dangaard Brouer <brouer@redhat.com>
Date:   Wed Apr 12 21:48:40 2023 +0200

    xdp: rss hash types representation

    The RSS hash type specifies what portion of packet data NIC hardware used
    when calculating RSS hash value. The RSS types are focused on Internet
    traffic protocols at OSI layers L3 and L4. L2 (e.g. ARP) often gets hash
    value zero and no RSS type. L3 is focused on IPv4 vs. IPv6, and L4 is
    primarily TCP vs. UDP, but some hardware supports SCTP.

    Hardware RSS types are encoded differently for each hardware NIC. Most
    hardware represents the RSS hash type as a number. Determining L3 vs. L4
    often requires a mapping table, as there often isn't a pattern or sorting
    according to OSI layer.

    The patch introduces an XDP RSS hash type (enum xdp_rss_hash_type) that
    contains both BITs for the L3/L4 types, and combinations to be used by
    drivers for their mapping tables. The enum xdp_rss_type_bits gets exposed
    to BPF via BTF, and it is up to the BPF programmer to match using these
    defines.

    This proposal changes the kfunc API bpf_xdp_metadata_rx_hash() by adding
    a pointer value argument to provide the RSS hash type.
    Change the signature for all xmo_rx_hash calls in drivers to make it compile.

    The RSS type implementations for each driver come as separate patches.

    Fixes: 3d76a4d3d4e5 ("bpf: XDP metadata RX kfuncs")
    Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
    Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
    Acked-by: Stanislav Fomichev <sdf@google.com>
    Link: https://lore.kernel.org/r/168132892042.340624.582563003880565460.stgit@firesoul
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Signed-off-by: Felix Maurer <fmaurer@redhat.com>
2023-06-14 10:44:31 +02:00
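The combination of per-layer bits and a per-NIC mapping table described in the RSS hash type commit above can be sketched as follows. All `DEMO_*` and `demo_*` names, the bit positions and the table contents are hypothetical and chosen only for illustration; they are not the values of the kernel's enum.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bit layout; not the kernel's enum values. */
enum demo_rss_bits {
    DEMO_RSS_L3_IPV4 = 1u << 0,
    DEMO_RSS_L3_IPV6 = 1u << 1,
    DEMO_RSS_L4_TCP  = 1u << 2,
    DEMO_RSS_L4_UDP  = 1u << 3,
    DEMO_RSS_L4_SCTP = 1u << 4,
    DEMO_RSS_L4_ANY  = DEMO_RSS_L4_TCP | DEMO_RSS_L4_UDP | DEMO_RSS_L4_SCTP,
};

/* Hypothetical per-NIC table: hardware type number -> L3/L4 bits. */
static const uint32_t demo_hw_to_rss[] = {
    0,                                   /* hw type 0: no hash (e.g. L2) */
    DEMO_RSS_L3_IPV4,                    /* hw type 1: IPv4 only         */
    DEMO_RSS_L3_IPV4 | DEMO_RSS_L4_TCP,  /* hw type 2: IPv4 + TCP        */
    DEMO_RSS_L3_IPV6 | DEMO_RSS_L4_UDP,  /* hw type 3: IPv6 + UDP        */
};

/* Translate a hardware-specific number into the common bit encoding. */
static uint32_t demo_rss_type(unsigned int hw_type)
{
    size_t n = sizeof(demo_hw_to_rss) / sizeof(demo_hw_to_rss[0]);

    return hw_type < n ? demo_hw_to_rss[hw_type] : 0;
}

/* A consumer can then test for any L4 hash without knowing the NIC. */
static bool demo_rss_is_l4(uint32_t rss_type)
{
    return (rss_type & DEMO_RSS_L4_ANY) != 0;
}
```

The table absorbs the hardware-specific numbering, so code that consumes the hash type only ever tests the common bits.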
Felix Maurer d7f00d2ea1 xdp: bpf_xdp_metadata use EOPNOTSUPP for no driver support
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2178930
Conflicts: Removed the mlx4 and mlx5 driver hunks. The modified functions
           have not yet been backported but will go in through a separate
           patch set.

commit 915efd8a446b74442039d31689d5d863caf82517
Author: Jesper Dangaard Brouer <brouer@redhat.com>
Date:   Tue Mar 21 14:52:31 2023 +0100

    xdp: bpf_xdp_metadata use EOPNOTSUPP for no driver support

    When a driver doesn't implement a bpf_xdp_metadata kfunc, the fallback
    implementation returns EOPNOTSUPP, which indicates that the device driver
    doesn't implement this kfunc.

    Currently many drivers also return EOPNOTSUPP when the hint isn't
    available, which is ambiguous from an API point of view. Instead
    change drivers to return ENODATA in these cases.

    There can be natural cases why a driver doesn't provide any hardware
    info for a specific hint, even on a frame to frame basis (e.g. PTP).
    Let's keep these cases as separate return codes.

    When describing the return values, adjust the function kernel-doc layout
    to get proper rendering for the return values.

    Fixes: ab46182d0dcb ("net/mlx4_en: Support RX XDP metadata")
    Fixes: bc8d405b1ba9 ("net/mlx5e: Support RX XDP metadata")
    Fixes: 306531f0249f ("veth: Support RX XDP metadata")
    Fixes: 3d76a4d3d4e5 ("bpf: XDP metadata RX kfuncs")
    Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
    Acked-by: Stanislav Fomichev <sdf@google.com>
    Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
    Acked-by: Tariq Toukan <tariqt@nvidia.com>
    Link: https://lore.kernel.org/r/167940675120.2718408.8176058626864184420.stgit@firesoul
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Signed-off-by: Felix Maurer <fmaurer@redhat.com>
2023-06-14 10:44:30 +02:00
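The distinction drawn in the commit above lets a caller tell "this driver never provides the hint" apart from "no data for this particular frame". A minimal sketch of such a classification follows, assuming the usual kernel convention that errors are returned as negative errno values; `enum hint_status` and `classify_hint_ret()` are hypothetical names for this example.

```c
#include <errno.h>

/* Hypothetical classification of a metadata kfunc return value. */
enum hint_status {
    HINT_OK,           /* hint value is valid                   */
    HINT_NO_DATA,      /* driver supports it, nothing available */
    HINT_UNSUPPORTED,  /* driver does not implement the kfunc   */
    HINT_ERROR,        /* any other failure                     */
};

/* Assumes negative errno values, as is conventional in the kernel. */
static enum hint_status classify_hint_ret(int ret)
{
    if (ret == 0)
        return HINT_OK;
    if (ret == -ENODATA)
        return HINT_NO_DATA;
    if (ret == -EOPNOTSUPP)
        return HINT_UNSUPPORTED;
    return HINT_ERROR;
}
```

A program could, for instance, stop requesting a hint for the rest of its run after seeing the unsupported status, while simply skipping the current frame when no data is available.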
Felix Maurer bc71097ca8 net: xdp: don't call notifiers during driver init
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2178930

commit 769639c1fe8a98129aa97c8ee981639db1e8955c
Author: Jakub Kicinski <kuba@kernel.org>
Date:   Thu Mar 16 15:02:34 2023 -0700

    net: xdp: don't call notifiers during driver init

    Drivers will commonly perform feature setting during init; if they use
    the xdp_set_features_flag() helper they'll likely run into an ASSERT_RTNL()
    inside call_netdevice_notifiers_info().

    Don't call the notifier until the device is actually registered.
    Nothing should be tracking the device until it's registered and
    after its unregistration has started.

    Fixes: 4d5ab0ad964d ("net/mlx5e: take into account device reconfiguration for xdp_features flag")
    Link: https://lore.kernel.org/r/20230316220234.598091-1-kuba@kernel.org
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Felix Maurer <fmaurer@redhat.com>
2023-06-14 10:44:25 +02:00
Felix Maurer 71d260b25a xdp: add xdp_set_features_flag utility routine
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2178930

commit f85949f98206b3b11d92d695cea4efda6a81f00e
Author: Lorenzo Bianconi <lorenzo@kernel.org>
Date:   Thu Mar 9 13:25:27 2023 +0100

    xdp: add xdp_set_features_flag utility routine

    Introduce the xdp_set_features_flag utility routine in order to
    dynamically update xdp_features according to the dynamic hw configuration
    via ethtool (e.g. changing number of hw rx/tx queues).
    Add xdp_clear_features_flag() in order to clear all xdp_feature flags.

    Reviewed-by: Shay Agroskin <shayagr@amazon.com>
    Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Felix Maurer <fmaurer@redhat.com>
2023-06-14 10:44:23 +02:00
Felix Maurer d892a11ed2 drivers: net: turn on XDP features
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2178930
Conflicts:
- drivers/net/ethernet/engleder/tsnep_main.c: We don't have this driver
- drivers/net/ethernet/fungible/funeth/funeth_main.c: We don't have this
  driver
- drivers/net/ethernet/aquantia/atlantic/aq_nic.c: left out because it
  does not have XDP support
- drivers/net/ethernet/mediatek/mtk_eth_soc.c: left out because mtk_eth_soc
  does not have XDP support
- drivers/net/ethernet/freescale/dpaa/dpaa_eth.c: left out because driver
  is not enabled
- drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.c: left out because driver
  is not enabled
- drivers/net/ethernet/freescale/enetc/enetc_pf.c: left out because driver
  is not enabled
- drivers/net/ethernet/marvell/mvneta.c: left out because driver is not
  enabled
- drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c: left out because driver
  is not enabled
- drivers/net/ethernet/socionext/netsec.c: left out because driver is not
  enabled
- drivers/net/ethernet/ti/cpsw.c: left out because driver is not enabled
- drivers/net/ethernet/ti/cpsw_new.c: left out because driver is not
  enabled
- drivers/net/ethernet/netronome/nfp/nfp_net_common.c: Context difference
  due to missing 67d2656b48f1 ("nfp: support RX VLAN ctag/stag strip") and
  7de8b691615f ("nfp: enable TSO by default for nfp netdev")
- drivers/net/ethernet/intel/ice/ice_main.c: Merge conflict upstream
  between 5b246e533d01 ("ice: split probe into smaller functions") and this
  commit. Resolved the same way as upstream in de4287336794 (pull-request:
  bpf-next 2023-02-11): only add the XDP features to the existing
  ice_cfg_netdev() function.
- drivers/net/ethernet/intel/i40e/i40e_main.c: Code difference because the
  driver does not have support for frags/multi-buff. The last parameter of
  xdp_features_set_redirect_target indicates if frags are supported for Tx,
  we change the code to set it to false.

Omitted-fix: 1dc55923296d ("net: mvneta: do not set xdp_features for hw
buffer devices")
mvneta is not enabled and hunks in this commit are skipped
Omitted-fix: 481e96fc1307 ("mvpp2: take care of xdp_features when
reconfiguring queues")
mvpp2 is not enabled and hunks in this commit are skipped
Omitted-fix: e4ac7cc6e5a4 ("net: fec: turn on XDP features")
fec does not have XDP support

commit 66c0e13ad236c74ea88c7c1518f3cef7f372e3da
Author: Marek Majtyka <alardam@gmail.com>
Date:   Wed Feb 1 11:24:18 2023 +0100

    drivers: net: turn on XDP features

    A summary of the flags being set for various drivers is given below.
    Note that XDP_F_REDIRECT_TARGET and XDP_F_FRAG_TARGET are features
    that can be turned off and on at runtime. This means that these flags
    may be set and unset under RTNL lock protection by the driver. Hence,
    READ_ONCE must be used by code loading the flag value.

    Also, these flags are not used for synchronization against the availability
    of XDP resources on a device. They are merely a hint, and hence the read
    may race with the actual teardown of XDP resources on the device. This
    may change in the future, e.g. operations taking a reference on the XDP
    resources of the driver, and in turn inhibiting turning off these flags.
    However, for now, they can only be used as a hint to check whether a device
    supports becoming a redirection target.

    Turn 'hw-offload' feature flag on for:
     - netronome (nfp)
     - netdevsim.

    Turn 'native' and 'zerocopy' feature flags on for:
     - intel (i40e, ice, ixgbe, igc)
     - mellanox (mlx5)
     - stmmac
     - netronome (nfp)

    Turn 'native' feature flags on for:
     - amazon (ena)
     - broadcom (bnxt)
     - freescale (dpaa, dpaa2, enetc)
     - funeth
     - intel (igb)
     - marvell (mvneta, mvpp2, octeontx2)
     - mellanox (mlx4)
     - mtk_eth_soc
     - qlogic (qede)
     - sfc
     - socionext (netsec)
     - ti (cpsw)
     - tap
     - tsnep
     - veth
     - xen
     - virtio_net.

    Turn 'basic' (tx, pass, aborted and drop) feature flags on for:
     - netronome (nfp)
     - cavium (thunder)
     - hyperv.

    Turn 'redirect_target' feature flag on for:
     - amazon (ena)
     - broadcom (bnxt)
     - freescale (dpaa, dpaa2)
     - intel (i40e, ice, igb, ixgbe)
     - ti (cpsw)
     - marvell (mvneta, mvpp2)
     - sfc
     - socionext (netsec)
     - qlogic (qede)
     - mellanox (mlx5)
     - tap
     - veth
     - virtio_net
     - xen

    Reviewed-by: Gerhard Engleder <gerhard@engleder-embedded.com>
    Reviewed-by: Simon Horman <simon.horman@corigine.com>
    Acked-by: Stanislav Fomichev <sdf@google.com>
    Acked-by: Jakub Kicinski <kuba@kernel.org>
    Co-developed-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
    Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
    Co-developed-by: Lorenzo Bianconi <lorenzo@kernel.org>
    Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
    Signed-off-by: Marek Majtyka <alardam@gmail.com>
    Link: https://lore.kernel.org/r/3eca9fafb308462f7edb1f58e451d59209aa07eb.1675245258.git.lorenzo@kernel.org
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Signed-off-by: Felix Maurer <fmaurer@redhat.com>
2023-06-14 10:33:43 +02:00
Viktor Malik 23c9904275 bpf: Add __bpf_kfunc tag to all kfuncs
Bugzilla: https://bugzilla.redhat.com/2178930

commit 400031e05adfcef9e80eca80bdfc3f4b63658be4
Author: David Vernet <void@manifault.com>
Date:   Wed Feb 1 11:30:15 2023 -0600

    bpf: Add __bpf_kfunc tag to all kfuncs

    Now that we have the __bpf_kfunc tag, we should add it to all
    existing kfuncs to ensure that they'll never be elided in LTO builds.

    Signed-off-by: David Vernet <void@manifault.com>
    Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    Acked-by: Stanislav Fomichev <sdf@google.com>
    Link: https://lore.kernel.org/bpf/20230201173016.342758-4-void@manifault.com

Signed-off-by: Viktor Malik <vmalik@redhat.com>
2023-06-13 22:45:20 +02:00
Felix Maurer 58671712a5 bpf: XDP metadata RX kfuncs
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2178930
Conflicts:
- include/linux/netdevice.h: Context difference due to missing 97dc7cd92ac6
  ("ptp: Support late timestamp determination")

commit 3d76a4d3d4e591af3e789698affaad88a5a8e8ab
Author: Stanislav Fomichev <sdf@google.com>
Date:   Thu Jan 19 14:15:26 2023 -0800

    bpf: XDP metadata RX kfuncs

    Define a new kfunc set (xdp_metadata_kfunc_ids) which implements all possible
    XDP metadata kfuncs. Not all devices have to implement them. If kfunc is not
    supported by the target device, the default implementation is called instead.
    The verifier, at load time, replaces a call to the generic kfunc with a call
    to the per-device one. Per-device kfunc pointers are stored in separate
    struct xdp_metadata_ops.

    Cc: John Fastabend <john.fastabend@gmail.com>
    Cc: David Ahern <dsahern@gmail.com>
    Cc: Martin KaFai Lau <martin.lau@linux.dev>
    Cc: Jakub Kicinski <kuba@kernel.org>
    Cc: Willem de Bruijn <willemb@google.com>
    Cc: Jesper Dangaard Brouer <brouer@redhat.com>
    Cc: Anatoly Burakov <anatoly.burakov@intel.com>
    Cc: Alexander Lobakin <alexandr.lobakin@intel.com>
    Cc: Magnus Karlsson <magnus.karlsson@gmail.com>
    Cc: Maryam Tahhan <mtahhan@redhat.com>
    Cc: xdp-hints@xdp-project.net
    Cc: netdev@vger.kernel.org
    Signed-off-by: Stanislav Fomichev <sdf@google.com>
    Link: https://lore.kernel.org/r/20230119221536.3349901-8-sdf@google.com
    Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>

Signed-off-by: Felix Maurer <fmaurer@redhat.com>
2023-06-13 22:45:14 +02:00
Felix Maurer c426bc2d42 xdp: improve page_pool xdp_return performance
Bugzilla: https://bugzilla.redhat.com/2166911

commit fb33ec016b8710281343ce73bec92bfe54bad4fa
Author: Jesper Dangaard Brouer <brouer@redhat.com>
Date:   Wed Sep 21 19:05:32 2022 +0200

    xdp: improve page_pool xdp_return performance
    
    During LPC2022 I met up with my page_pool co-maintainer Ilias. When
    discussing page_pool code we realised/remembered certain optimizations
    had not been fully utilised.
    
    Since commit c07aea3ef4 ("mm: add a signature in struct page") struct
    page has a direct pointer to the page_pool object this page was
    allocated from.
    
    Thus, with this info it is possible to skip the rhashtable_lookup to
    find the page_pool object in __xdp_return().
    
    The rcu_read_lock can be removed as it was tied to xdp_mem_allocator.
    The page_pool object is still safe to access as it tracks inflight pages
    and (potentially) schedules final release from a work queue.
    
    Created a micro benchmark of XDP redirecting from mlx5 into veth with
    XDP_DROP bpf-prog on the peer veth device. This increased performance
    6.5% from approx 8.45Mpps to 9Mpps corresponding to using 7 nanosec
    (27 cycles at 3.8GHz) less per packet.
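
The per-packet figures quoted above can be re-derived from the two packet rates. The following is only a sketch of that arithmetic; the helper names are hypothetical and are not part of the patch.

```c
#include <assert.h>

/* Time spent per packet, in nanoseconds, at a rate given in Mpps.
 * 1e9 ns per second divided by (mpps * 1e6) packets per second. */
static double ns_per_packet(double mpps)
{
	assert(mpps > 0.0);
	return 1000.0 / mpps;
}

/* Nanoseconds saved per packet when the rate rises from before to after. */
static double ns_saved(double before_mpps, double after_mpps)
{
	return ns_per_packet(before_mpps) - ns_per_packet(after_mpps);
}

/* Convert a duration in nanoseconds to CPU cycles at a clock given in GHz. */
static double ns_to_cycles(double ns, double ghz)
{
	return ns * ghz;
}
```

Calling ns_saved(8.45, 9.0) gives a little over 7 ns, and ns_to_cycles() of that value at 3.8 GHz gives roughly 27 cycles, in line with the numbers in the message.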
    
    Suggested-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
    Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
    Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
    Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
    Link: https://lore.kernel.org/r/166377993287.1737053.10258297257583703949.stgit@firesoul
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Felix Maurer <fmaurer@redhat.com>
2023-03-06 14:54:36 +01:00
Jiri Benc a978280b10 veth: Rework veth_xdp_rcv_skb in order to accept non-linear skb
Bugzilla: https://bugzilla.redhat.com/2120966

commit 718a18a0c8a67f97781e40bdef7cdd055c430996
Author: Lorenzo Bianconi <lorenzo@kernel.org>
Date:   Fri Mar 11 10:14:19 2022 +0100

    veth: Rework veth_xdp_rcv_skb in order to accept non-linear skb

    Introduce veth_convert_skb_to_xdp_buff routine in order to
    convert a non-linear skb into a xdp buffer. If the received skb
    is cloned or shared, veth_convert_skb_to_xdp_buff will copy it
    in a new skb composed of order-0 pages for the linear and the
    fragmented area. Moreover veth_convert_skb_to_xdp_buff guarantees
    we have enough headroom for xdp.
    This is a preliminary patch to allow attaching xdp programs with frags
    support on veth devices.

    Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
    Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
    Acked-by: John Fastabend <john.fastabend@gmail.com>
    Link: https://lore.kernel.org/bpf/8d228b106bc1903571afd1d77e797bffe9a5ea7c.1646989407.git.lorenzo@kernel.org

Signed-off-by: Jiri Benc <jbenc@redhat.com>
2022-10-25 14:58:01 +02:00
Jiri Benc c613bd606c bpf: add frags support to the bpf_xdp_adjust_tail() API
Bugzilla: https://bugzilla.redhat.com/2120966

commit bf25146a5595269810b1f47d048f114c5ff9f544
Author: Eelco Chaudron <echaudro@redhat.com>
Date:   Fri Jan 21 11:09:55 2022 +0100

    bpf: add frags support to the bpf_xdp_adjust_tail() API

    This change adds support for tail growing and shrinking for XDP frags.

    When called on a non-linear packet with a grow request, it will work
    on the last fragment of the packet. So the maximum grow size is the
    last fragment's tailroom, i.e. no new buffer will be allocated.
    An XDP frags capable driver is expected to set frag_size in xdp_rxq_info
    data structure to notify the XDP core the fragment size.
    frag_size set to 0 is interpreted by the XDP core as tail growing is
    not allowed.
    Introduce __xdp_rxq_info_reg utility routine to initialize frag_size field.

    When shrinking, it will work from the last fragment, all the way down to
    the base buffer depending on the shrinking size. It's important to mention
    that once you shrink, the affected fragment(s) are freed, so you cannot
    grow again to the original size.
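
The grow and shrink rules described above can be illustrated with a small user-space model. This is a sketch under stated assumptions, not the kernel implementation: every toy_* name is hypothetical, and the frag_size == 0 case is not modelled.

```c
#include <assert.h>
#include <errno.h>

#define TOY_MAX_FRAGS 4

/* Toy model of a multi-buffer packet; all names here are hypothetical. */
struct toy_frag {
	unsigned int len;       /* bytes currently used in the fragment */
	unsigned int tailroom;  /* free bytes after the used area */
};

struct toy_pkt {
	unsigned int linear_len;            /* bytes in the base buffer */
	unsigned int nr_frags;
	struct toy_frag frags[TOY_MAX_FRAGS];
};

static unsigned int toy_pkt_len(const struct toy_pkt *p)
{
	unsigned int total = p->linear_len;

	for (unsigned int i = 0; i < p->nr_frags; i++)
		total += p->frags[i].len;
	return total;
}

/* Grow: only the last fragment's tailroom is available, nothing is allocated. */
static int toy_grow_tail(struct toy_pkt *p, unsigned int bytes)
{
	struct toy_frag *last;

	assert(p->nr_frags > 0 && p->nr_frags <= TOY_MAX_FRAGS);
	last = &p->frags[p->nr_frags - 1];
	if (bytes > last->tailroom)
		return -EINVAL;
	last->len += bytes;
	last->tailroom -= bytes;
	return 0;
}

/* Shrink: consume from the last fragment toward the base buffer. */
static int toy_shrink_tail(struct toy_pkt *p, unsigned int bytes)
{
	if (bytes >= toy_pkt_len(p))
		return -EINVAL;
	while (bytes > 0 && p->nr_frags > 0) {
		struct toy_frag *last = &p->frags[p->nr_frags - 1];

		if (bytes >= last->len) {
			bytes -= last->len;
			p->nr_frags--;          /* fragment is gone for good */
		} else {
			last->len -= bytes;
			last->tailroom += bytes;
			bytes = 0;
		}
	}
	p->linear_len -= bytes; /* the remainder comes out of the base buffer */
	return 0;
}
```

In this model, shrinking past a whole fragment drops it, so a later grow request is bounded by the tailroom of whichever fragment is now last.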

    Acked-by: Toke Hoiland-Jorgensen <toke@redhat.com>
    Acked-by: John Fastabend <john.fastabend@gmail.com>
    Acked-by: Jakub Kicinski <kuba@kernel.org>
    Co-developed-by: Lorenzo Bianconi <lorenzo@kernel.org>
    Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
    Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
    Link: https://lore.kernel.org/r/eabda3485dda4f2f158b477729337327e609461d.1642758637.git.lorenzo@kernel.org
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Signed-off-by: Jiri Benc <jbenc@redhat.com>
2022-10-25 14:57:42 +02:00
Jiri Benc 405bb459e4 xdp: add frags support to xdp_return_{buff/frame}
Bugzilla: https://bugzilla.redhat.com/2120966

commit 7c48cb0176c6d6d3b55029f7ff4ffa05faee6446
Author: Lorenzo Bianconi <lorenzo@kernel.org>
Date:   Fri Jan 21 11:09:50 2022 +0100

    xdp: add frags support to xdp_return_{buff/frame}

    Take into account whether the received xdp_buff/xdp_frame is non-linear
    when recycling/returning the frame memory to the allocator or into
    xdp_frame_bulk.

    Acked-by: Toke Hoiland-Jorgensen <toke@redhat.com>
    Acked-by: John Fastabend <john.fastabend@gmail.com>
    Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
    Link: https://lore.kernel.org/r/a961069febc868508ce1bdf5e53a343eb4e57cb2.1642758637.git.lorenzo@kernel.org
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Signed-off-by: Jiri Benc <jbenc@redhat.com>
2022-10-25 14:57:42 +02:00
Jiri Benc 3489f595b8 net: xdp: add xdp_update_skb_shared_info utility routine
Bugzilla: https://bugzilla.redhat.com/2120966

commit d65a1906b31246492449eafe9cace188cb59e26c
Author: Lorenzo Bianconi <lorenzo@kernel.org>
Date:   Fri Jan 21 11:09:48 2022 +0100

    net: xdp: add xdp_update_skb_shared_info utility routine

    Introduce the xdp_update_skb_shared_info routine to update the frags array
    metadata in the skb_shared_info data structure when converting an
    xdp_buff or xdp_frame to an skb.
    Given the current skb_shared_info layout in xdp_frame/xdp_buff and the
    xdp frags support, there is no need to run skb_add_rx_frag() and reset the
    frags array when converting the buffer to an skb: the frags array is in the
    same position for xdp_buff/xdp_frame and for the skb, so only the memory
    metadata needs updating.
    Introduce XDP_FLAGS_PF_MEMALLOC flag in xdp_buff_flags in order to mark
    the xdp_buff or xdp_frame as under memory-pressure if pages of the frags array
    are under memory pressure. Doing so we can avoid looping over all fragments in
    xdp_update_skb_shared_info routine. The driver is expected to set the
    flag constructing the xdp_buffer using xdp_buff_set_frag_pfmemalloc
    utility routine.
    Rely on xdp_update_skb_shared_info in __xdp_build_skb_from_frame routine
    converting the non-linear xdp_frame to a skb after performing a XDP_REDIRECT.

    Acked-by: Toke Hoiland-Jorgensen <toke@redhat.com>
    Acked-by: John Fastabend <john.fastabend@gmail.com>
    Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
    Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
    Link: https://lore.kernel.org/r/bfd23fb8a8d7438724f7819c567cdf99ffd6226f.1642758637.git.lorenzo@kernel.org
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Signed-off-by: Jiri Benc <jbenc@redhat.com>
2022-10-25 14:57:42 +02:00
Felix Maurer fda9490a74 xdp: xdp_mem_allocator can be NULL in trace_mem_connect().
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2071620

commit e0ae713023a9d09d6e1b454bdc8e8c1dd32c586e
Author: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Date:   Wed Mar 9 23:13:45 2022 +0100

    xdp: xdp_mem_allocator can be NULL in trace_mem_connect().

    Since the commit mentioned below __xdp_reg_mem_model() can return a NULL
    pointer. This pointer is dereferenced in trace_mem_connect() which leads
    to segfault.

    The trace points (mem_connect + mem_disconnect) were put in place to
    pair connect/disconnect using the IDs. The ID is only assigned if
    __xdp_reg_mem_model() does not return NULL. That connect trace point is
    of no use if there is no ID.

    Skip that connect trace point if xdp_alloc is NULL.

    [ Toke Høiland-Jørgensen delivered the reasoning for skipping the trace
      point ]

    Fixes: 4a48ef70b93b8 ("xdp: Allow registering memory model without rxq reference")
    Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
    Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
    Link: https://lore.kernel.org/r/YikmmXsffE+QajTB@linutronix.de
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Felix Maurer <fmaurer@redhat.com>
2022-08-24 12:53:58 +02:00
Felix Maurer 9d38cafdba page_pool: Store the XDP mem id
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2071620

commit 64693ec7774e471f817a725686d93903e919a2e5
Author: Toke Høiland-Jørgensen <toke@redhat.com>
Date:   Mon Jan 3 16:08:08 2022 +0100

    page_pool: Store the XDP mem id

    Store the XDP mem ID inside the page_pool struct so it can be retrieved
    later for use in bpf_prog_run().

    Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
    Link: https://lore.kernel.org/bpf/20220103150812.87914-4-toke@redhat.com

Signed-off-by: Felix Maurer <fmaurer@redhat.com>
2022-08-24 12:53:57 +02:00
Felix Maurer 244135f84b xdp: Allow registering memory model without rxq reference
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2071620

commit 4a48ef70b93b8c7ed5190adfca18849e76387b80
Author: Toke Høiland-Jørgensen <toke@redhat.com>
Date:   Mon Jan 3 16:08:06 2022 +0100

    xdp: Allow registering memory model without rxq reference

    The functions that register an XDP memory model take a struct xdp_rxq as
    parameter, but the RXQ is not actually used for anything other than pulling
    out the struct xdp_mem_info that it embeds. So refactor the register
    functions and export variants that just take a pointer to the xdp_mem_info.

    This is in preparation for enabling XDP_REDIRECT in bpf_prog_run(), using a
    page_pool instance that is not connected to any network device.

    Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    Link: https://lore.kernel.org/bpf/20220103150812.87914-2-toke@redhat.com

Signed-off-by: Felix Maurer <fmaurer@redhat.com>
2022-08-24 12:53:57 +02:00
Felix Maurer 45f5a48cf5 xdp: move the if dev statements to the first
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2071620

commit f85b244ee395c774a0039c176f46fc0d3747a0ae
Author: Yajun Deng <yajun.deng@linux.dev>
Date:   Fri Dec 17 17:25:45 2021 +0800

    xdp: move the if dev statements to the first

    The xdp_rxq_info_unreg() called by xdp_rxq_info_reg() is meaningless when
    dev is NULL, so move the if dev statements to the first.

    Signed-off-by: Yajun Deng <yajun.deng@linux.dev>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Felix Maurer <fmaurer@redhat.com>
2022-08-24 12:53:56 +02:00
Felix Maurer 0bd8e47a1a xdp: Remove redundant warning
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2071619

commit b859a360d88d5ad239d46978c78fe2b63dd9efe5
Author: Yajun Deng <yajun.deng@linux.dev>
Date:   Wed Oct 27 09:38:56 2021 +0800

    xdp: Remove redundant warning

    There is a warning in xdp_rxq_info_unreg_mem_model() when reg_state isn't
    equal to REG_STATE_REGISTERED, so the warning in xdp_rxq_info_unreg() is
    redundant.

    Signed-off-by: Yajun Deng <yajun.deng@linux.dev>
    Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
    Link: https://lore.kernel.org/r/20211027013856.1866-1-yajun.deng@linux.dev
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Felix Maurer <fmaurer@redhat.com>
2022-06-07 20:22:40 +02:00
Jakub Kicinski a78cae2476 xdp: Move the rxq_info.mem clearing to unreg_mem_model()
xdp_rxq_info_unreg() implicitly calls xdp_rxq_info_unreg_mem_model().
This may well be confusing to the driver authors, and lead to double free
if they call xdp_rxq_info_unreg_mem_model() before xdp_rxq_info_unreg()
(when mem model type == MEM_TYPE_PAGE_POOL).

In fact error path of mvpp2_rxq_init() seems to currently do exactly that.

The double free will result in refcount underflow in page_pool_destroy().
Make the interface a little more programmer friendly by clearing type and
id so that xdp_rxq_info_unreg_mem_model() can be called multiple times.
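
Why clearing type and id makes a repeated call harmless can be shown with a small user-space model. This is a sketch only: the toy_* types, the field layout and the refcount handling are assumptions made for illustration and do not mirror the kernel structures.

```c
#include <assert.h>

/* Toy stand-ins; names and values are hypothetical. */
enum toy_mem_type { TOY_MEM_NONE = 0, TOY_MEM_PAGE_POOL };

struct toy_pool {
	int refcnt;
};

struct toy_mem_info {
	enum toy_mem_type type;
	unsigned int id;
	struct toy_pool *pool;
};

/* Drop the reference held by the memory model registration. */
static void toy_pool_put(struct toy_pool *pool)
{
	assert(pool->refcnt > 0);   /* would be a refcount underflow otherwise */
	pool->refcnt--;
}

/* Safe to call more than once: type and id are cleared after the first call. */
static void toy_unreg_mem_model(struct toy_mem_info *mem)
{
	if (mem->type == TOY_MEM_PAGE_POOL && mem->id != 0)
		toy_pool_put(mem->pool);
	mem->type = TOY_MEM_NONE;
	mem->id = 0;
}
```

A second toy_unreg_mem_model() call on the same structure sees a cleared type and id and returns without touching the pool.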

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20210625221612.2637086-1-kuba@kernel.org
2021-06-28 23:07:59 +02:00
Hangbin Liu e624d4ed4a xdp: Extend xdp_redirect_map with broadcast support
This patch adds two flags BPF_F_BROADCAST and BPF_F_EXCLUDE_INGRESS to
extend xdp_redirect_map for broadcast support.

With BPF_F_BROADCAST the packet will be broadcast to all the interfaces
in the map. With BPF_F_EXCLUDE_INGRESS the ingress interface will be
excluded when broadcasting.

When getting the devices in dev hash map via dev_map_hash_get_next_key(),
there is a possibility that we fall back to the first key when a device
was removed. This will duplicate packets on some interfaces. So just walk
the whole buckets to avoid this issue. For dev array map, we also walk the
whole map to find valid interfaces.
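
The selection rule for an array-style map can be sketched in user space as follows. This is an illustrative model only: the TOY_F_* flag values and the function name are hypothetical, and a slot holding 0 is assumed to mean "empty".

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag values for this sketch only. */
#define TOY_F_BROADCAST        (1u << 0)
#define TOY_F_EXCLUDE_INGRESS  (1u << 1)

/*
 * Walk every slot of an array-style device map and collect the ifindexes a
 * packet would be sent to. Returns the number of targets written to out[].
 */
static size_t toy_broadcast_targets(const int *map, size_t map_len,
				    int ingress_ifindex, unsigned int flags,
				    int *out, size_t out_len)
{
	size_t n = 0;

	assert(map && out);
	if (!(flags & TOY_F_BROADCAST))
		return 0;
	for (size_t i = 0; i < map_len && n < out_len; i++) {
		if (map[i] == 0)
			continue;       /* empty slot */
		if ((flags & TOY_F_EXCLUDE_INGRESS) && map[i] == ingress_ifindex)
			continue;       /* do not send back to the ingress device */
		out[n++] = map[i];
	}
	return n;
}
```

Because each slot is visited exactly once, no interface can be selected twice, which is the property the full walk is meant to provide.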

Function bpf_clear_redirect_map() was removed in
commit ee75aef23a ("bpf, xdp: Restructure redirect actions").
Add it back as we need to use ri->map again.

With test topology:
  +-------------------+             +-------------------+
  | Host A (i40e 10G) |  ---------- | eno1(i40e 10G)    |
  +-------------------+             |                   |
                                    |   Host B          |
  +-------------------+             |                   |
  | Host C (i40e 10G) |  ---------- | eno2(i40e 10G)    |
  +-------------------+             |                   |
                                    |          +------+ |
                                    | veth0 -- | Peer | |
                                    | veth1 -- |      | |
                                    | veth2 -- |  NS  | |
                                    |          +------+ |
                                    +-------------------+

On Host A:
 # pktgen/pktgen_sample03_burst_single_flow.sh -i eno1 -d $dst_ip -m $dst_mac -s 64

On Host B(Intel(R) Xeon(R) CPU E5-2690 v3 @ 2.60GHz, 128G Memory):
Use xdp_redirect_map and xdp_redirect_map_multi in samples/bpf for testing.
All the veth peers in the NS have a XDP_DROP program loaded. The
forward_map max_entries in xdp_redirect_map_multi is modified to 4.

Testing the performance impact on the regular xdp_redirect path with and
without patch (to check impact of additional check for broadcast mode):

5.12 rc4         | redirect_map        i40e->i40e      |    2.0M |  9.7M
5.12 rc4         | redirect_map        i40e->veth      |    1.7M | 11.8M
5.12 rc4 + patch | redirect_map        i40e->i40e      |    2.0M |  9.6M
5.12 rc4 + patch | redirect_map        i40e->veth      |    1.7M | 11.7M

Testing the performance when cloning packets with the redirect_map_multi
test, using a redirect map size of 4, filled with 1-3 devices:

5.12 rc4 + patch | redirect_map multi  i40e->veth (x1) |    1.7M | 11.4M
5.12 rc4 + patch | redirect_map multi  i40e->veth (x2) |    1.1M |  4.3M
5.12 rc4 + patch | redirect_map multi  i40e->veth (x3) |    0.8M |  2.6M

Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Link: https://lore.kernel.org/bpf/20210519090747.1655268-3-liuhangbin@gmail.com
2021-05-26 09:46:16 +02:00
Ong Boon Leong 622d13694b xdp: fix xdp_return_frame() kernel BUG throw for page_pool memory model
xdp_return_frame() may be called outside of NAPI context to return
xdpf back to page_pool. xdp_return_frame() calls __xdp_return() with
napi_direct = false. For the page_pool memory model, __xdp_return() calls
xdp_return_frame_no_direct() unconditionally, and the false negative
kernel BUG below was thrown under a preempt-rt build:

[  430.450355] BUG: using smp_processor_id() in preemptible [00000000] code: modprobe/3884
[  430.451678] caller is __xdp_return+0x1ff/0x2e0
[  430.452111] CPU: 0 PID: 3884 Comm: modprobe Tainted: G     U      E     5.12.0-rc2+ #45

Changes in v2:
 - This patch fixes the issue by making sure xdp_return_frame_no_direct() is
   only called if napi_direct = true, as recommended by
   Jesper Dangaard Brouer. Thanks!

Fixes: 2539650fad ("xdp: Helpers for disabling napi_direct of xdp_return_frame")
Signed-off-by: Ong Boon Leong <boon.leong.ong@intel.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-31 15:15:23 -07:00
Lorenzo Bianconi 65e6dcf733 net, veth: Alloc skb in bulk for ndo_xdp_xmit
Split ndo_xdp_xmit and ndo_start_xmit use cases in veth_xdp_rcv routine
in order to alloc skbs in bulk for XDP_PASS verdict.

Introduce xdp_alloc_skb_bulk utility routine to alloc skb bulk list.
The proposed approach has been tested in the following scenario:

eth (ixgbe) --> XDP_REDIRECT --> veth0 --> (remote-ns) veth1 --> XDP_PASS

XDP_REDIRECT: xdp_redirect_map bpf sample
XDP_PASS: xdp_rxq_info bpf sample

traffic generator: pkt_gen sending udp traffic on a remote device

bpf-next master: ~3.64Mpps
bpf-next + skb bulking allocation: ~3.79Mpps

Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Toshiaki Makita <toshiaki.makita1@gmail.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Link: https://lore.kernel.org/bpf/a14a30d3c06fff24e13f836c733d80efc0bd6eb5.1611957532.git.lorenzo@kernel.org
2021-02-04 01:00:07 +01:00
Lorenzo Bianconi 89f479f0ec net, xdp: Introduce xdp_build_skb_from_frame utility routine
Introduce xdp_build_skb_from_frame utility routine to build the skb
from xdp_frame. Unlike __xdp_build_skb_from_frame,
xdp_build_skb_from_frame will allocate the skb object. Rely on
xdp_build_skb_from_frame in veth driver.
Add missing xdp metadata support in the veth_xdp_rcv_one routine.

Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Toshiaki Makita <toshiaki.makita1@gmail.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Link: https://lore.kernel.org/bpf/94ade9e853162ae1947941965193190da97457bc.1610475660.git.lorenzo@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2021-01-20 14:10:35 -08:00
Lorenzo Bianconi 97a0e1ea7b net, xdp: Introduce __xdp_build_skb_from_frame utility routine
Introduce __xdp_build_skb_from_frame utility routine to build
the skb from xdp_frame. Rely on __xdp_build_skb_from_frame in
cpumap code.

Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Link: https://lore.kernel.org/bpf/4f9f4c6b3dd3933770c617eb6689dbc0c6e25863.1610475660.git.lorenzo@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2021-01-20 14:10:35 -08:00
Jakub Kicinski 46d5e62dd3 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
xdp_return_frame_bulk() needs to pass a xdp_buff
to __xdp_return().

strlcpy got converted to strscpy but here it makes no
functional difference, so just keep the right code.

Conflicts:
	net/netfilter/nf_tables_api.c

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-12-11 22:29:38 -08:00
Toke Høiland-Jørgensen 998f172962 xdp: Remove the xdp_attachment_flags_ok() callback
Since commit 7f0a838254 ("bpf, xdp: Maintain info on attached XDP BPF
programs in net_device"), the XDP program attachment info is now maintained
in the core code. This interacts badly with the xdp_attachment_flags_ok()
check that prevents unloading an XDP program with different load flags than
it was loaded with. In practice, two kinds of failures are seen:

- An XDP program loaded without specifying a mode (and which then ends up
  in driver mode) cannot be unloaded if the program mode is specified on
  unload.

- The dev_xdp_uninstall() hook always calls the driver callback with the
  mode set to the type of the program but an empty flags argument, which
  means the flags_ok() check prevents the program from being removed,
  leading to bpf prog reference leaks.

The original reason this check was added was to avoid ambiguity when
multiple programs were loaded. With the way the checks are done in the core
now, this is quite simple to enforce in the core code, so let's add a check
there and get rid of the xdp_attachment_flags_ok() callback entirely.

Fixes: 7f0a838254 ("bpf, xdp: Maintain info on attached XDP BPF programs in net_device")
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Link: https://lore.kernel.org/bpf/160752225751.110217.10267659521308669050.stgit@toke.dk
2020-12-09 16:27:42 +01:00
Björn Töpel b02e5a0ebb xsk: Propagate napi_id to XDP socket Rx path
Add napi_id to the xdp_rxq_info structure, and make sure the XDP
socket picks up the napi_id in the Rx path. The napi_id is used to find
the corresponding NAPI structure for socket busy polling.

Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://lore.kernel.org/bpf/20201130185205.196029-7-bjorn.topel@gmail.com
2020-12-01 00:09:25 +01:00
Björn Töpel ed1182dc00 xdp: Handle MEM_TYPE_XSK_BUFF_POOL correctly in xdp_return_buff()
It turns out that there does exist a path where xdp_return_buff() is
being passed an XDP buffer of type MEM_TYPE_XSK_BUFF_POOL. This path
is when AF_XDP zero-copy mode is enabled, and a buffer is redirected
to a DEVMAP with an attached XDP program that drops the buffer.

This change simply puts the handling of MEM_TYPE_XSK_BUFF_POOL back
into xdp_return_buff().

Fixes: 82c41671ca ("xdp: Simplify xdp_return_{frame, frame_rx_napi, buff}")
Reported-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Link: https://lore.kernel.org/bpf/20201127171726.123627-1-bjorn.topel@gmail.com
2020-11-30 23:00:26 +01:00
Lorenzo Bianconi 7886244736 net: page_pool: Add bulk support for ptr_ring
Introduce the capability to batch page_pool ptr_ring refill since it is
usually run inside the driver NAPI tx completion loop.

Suggested-by: Jesper Dangaard Brouer <brouer@redhat.com>
Co-developed-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Link: https://lore.kernel.org/bpf/08dd249c9522c001313f520796faa777c4089e1c.1605267335.git.lorenzo@kernel.org
2020-11-14 02:29:00 +01:00
Lorenzo Bianconi 8965398713 net: xdp: Introduce bulking for xdp tx return path
XDP bulk APIs introduce a defer/flush mechanism to return
pages belonging to the same xdp_mem_allocator object
(identified via the mem.id field) in bulk to optimize
I-cache and D-cache since xdp_return_frame is usually run
inside the driver NAPI tx completion loop.
The bulk queue size is set to 16 to be aligned to how
XDP_REDIRECT bulking works. The bulk is flushed when
it is full or when mem.id changes.
xdp_frame_bulk is usually stored/allocated on the function
call-stack to avoid locking penalties.
Current implementation considers only page_pool memory model.
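
The defer/flush behaviour described above can be sketched with a small user-space model. It is an illustration under stated assumptions, not the kernel code: the toy_* names are hypothetical, the flush only counts instead of handing pages back to an allocator, and an initial mem id of 0 is assumed to mean "no allocator yet".

```c
#include <assert.h>

#define TOY_BULK_MAX 16   /* bulk queue size stated in the commit message */

/* Hypothetical stand-in for a bulk queue kept on the caller's stack. */
struct toy_bulk {
	unsigned int mem_id;    /* allocator the queued frames belong to */
	unsigned int count;
	void *q[TOY_BULK_MAX];
	unsigned int flushes;   /* bookkeeping for this sketch only */
};

static void toy_bulk_init(struct toy_bulk *bq)
{
	bq->mem_id = 0;
	bq->count = 0;
	bq->flushes = 0;
}

static void toy_bulk_flush(struct toy_bulk *bq)
{
	if (bq->count == 0)
		return;
	/* A real implementation would return q[0..count) to the allocator. */
	bq->count = 0;
	bq->flushes++;
}

/* Queue one frame; flush first if the queue is full or the mem id changes. */
static void toy_bulk_return(struct toy_bulk *bq, void *frame, unsigned int mem_id)
{
	assert(frame);
	if (bq->count == TOY_BULK_MAX)
		toy_bulk_flush(bq);
	if (mem_id != bq->mem_id) {
		toy_bulk_flush(bq);
		bq->mem_id = mem_id;
	}
	bq->q[bq->count++] = frame;
}
```

A caller would initialise the queue on its stack, feed it frames from the completion loop, and issue one final toy_bulk_flush() after the loop so that no frames are left queued.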

Suggested-by: Jesper Dangaard Brouer <brouer@redhat.com>
Co-developed-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Link: https://lore.kernel.org/bpf/e190c03eac71b20c8407ae0fc2c399eda7835f49.1605267335.git.lorenzo@kernel.org
2020-11-14 02:28:59 +01:00
Andrii Nakryiko e8407fdeb9 bpf, xdp: Remove XDP_QUERY_PROG and XDP_QUERY_PROG_HW XDP commands
Now that BPF program/link management is centralized in generic net_device
code, kernel code never queries program id from drivers, so
XDP_QUERY_PROG/XDP_QUERY_PROG_HW commands are unnecessary.

This patch removes all the implementations of those commands in kernel, along
with xdp_attachment_query().

This patch was compile-tested on allyesconfig.

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200722064603.3350758-10-andriin@fb.com
2020-07-25 20:37:02 -07:00
Hangbin Liu 3ff2351651 xdp: Handle frame_sz in xdp_convert_zc_to_xdp_frame()
In commit 34cc0b338a we only handled the frame_sz in convert_to_xdp_frame().
This patch will also handle frame_sz in xdp_convert_zc_to_xdp_frame().

Fixes: 34cc0b338a ("xdp: Xdp_frame add member frame_sz and handle in convert_to_xdp_frame")
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20200616103518.2963410-1-liuhangbin@gmail.com
2020-06-17 09:58:15 -07:00
Björn Töpel 82c41671ca xdp: Simplify xdp_return_{frame, frame_rx_napi, buff}
The xdp_return_{frame,frame_rx_napi,buff} functions are never used,
except in xdp_convert_zc_to_xdp_frame(), by the MEM_TYPE_XSK_BUFF_POOL
memory type.

To simplify and reduce code, change so that
xdp_convert_zc_to_xdp_frame() calls xsk_buff_free() directly since the
type is known, and remove MEM_TYPE_XSK_BUFF_POOL from the switch
statement in __xdp_return() function.

Suggested-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200520192103.355233-14-bjorn.topel@gmail.com
2020-05-21 17:31:27 -07:00
Björn Töpel 0807892ecb xsk: Remove MEM_TYPE_ZERO_COPY and corresponding code
There are no users of MEM_TYPE_ZERO_COPY. Remove all corresponding
code, including the "handle" member of struct xdp_buff.

rfc->v1: Fixed spelling in commit message. (Björn)

Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200520192103.355233-13-bjorn.topel@gmail.com
2020-05-21 17:31:27 -07:00
Björn Töpel 2b43470add xsk: Introduce AF_XDP buffer allocation API
In order to simplify AF_XDP zero-copy enablement for NIC driver
developers, a new AF_XDP buffer allocation API is added. The
implementation is based on a single core (single producer/consumer)
buffer pool for the AF_XDP UMEM.

A buffer is allocated using the xsk_buff_alloc() function, and
returned using xsk_buff_free(). If a buffer is disassociated from the
pool, e.g. when a buffer is passed to an AF_XDP socket, the buffer is
said to be released. Currently, the release function is only used by
the AF_XDP internals and not visible to the driver.
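
The alloc/free pairing described above can be illustrated with a small userspace model. This is only a sketch under stated assumptions: the names model_pool, model_buff, model_buff_alloc() and model_buff_free() are hypothetical and are not the kernel API, and the pool is a plain array-backed free list with no locking, standing in for a single producer/consumer pool.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_POOL_SIZE 4 /* hypothetical pool capacity */

/* Hypothetical stand-in for a buffer handed to a driver. */
struct model_buff {
    int id;
};

/* Hypothetical single-threaded pool: a stack of free buffers. */
struct model_pool {
    struct model_buff bufs[MODEL_POOL_SIZE];
    struct model_buff *free_list[MODEL_POOL_SIZE];
    size_t free_cnt;
};

/* Put every buffer on the free list. */
static void model_pool_init(struct model_pool *pool)
{
    assert(pool != NULL);
    for (size_t i = 0; i < MODEL_POOL_SIZE; i++) {
        pool->bufs[i].id = (int)i;
        pool->free_list[i] = &pool->bufs[i];
    }
    pool->free_cnt = MODEL_POOL_SIZE;
}

/* Take a buffer from the pool; returns NULL when the pool is empty. */
static struct model_buff *model_buff_alloc(struct model_pool *pool)
{
    if (pool->free_cnt == 0)
        return NULL;
    return pool->free_list[--pool->free_cnt];
}

/* Return a previously allocated buffer to the pool. */
static void model_buff_free(struct model_pool *pool, struct model_buff *buf)
{
    assert(pool->free_cnt < MODEL_POOL_SIZE);
    pool->free_list[pool->free_cnt++] = buf;
}
```

In this model, a buffer that is allocated but never passed back to model_buff_free() corresponds loosely to a "released" buffer: it has left the pool's ownership.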

Drivers using this API should register the XDP memory model with the
new MEM_TYPE_XSK_BUFF_POOL type.

The API is defined in net/xdp_sock_drv.h.

The buffer type is struct xdp_buff, and follows the lifetime of
regular xdp_buffs, i.e. the lifetime of an xdp_buff is restricted to
a NAPI context. In other words, the API is not replacing xdp_frames.

In addition to introducing the API and implementations, the AF_XDP
core is migrated to use the new APIs.

rfc->v1: Fixed build errors/warnings for m68k and riscv. (kbuild test
         robot)
         Added headroom/chunk size getter. (Maxim/Björn)

v1->v2: Swapped SoBs. (Maxim)

v2->v3: Initialize struct xdp_buff member frame_sz. (Björn)
        Add API to query the DMA address of a frame. (Maxim)
        Do DMA sync for CPU till the end of the frame to handle
        possible growth (frame_sz). (Maxim)

Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200520192103.355233-6-bjorn.topel@gmail.com
2020-05-21 17:31:26 -07:00
Jesper Dangaard Brouer 34cc0b338a xdp: Xdp_frame add member frame_sz and handle in convert_to_xdp_frame
Use the hole in struct xdp_frame when adding member frame_sz, which keeps
the struct the same size (32 bytes).

Drivers ixgbe and sfc had bug cases where the necessary/expected
tailroom was not reserved. This can lead to some hard to catch memory
corruption issues. Having the driver's frame_sz, this can be detected when
the packet length/end via xdp->data_end exceeds the xdp_data_hard_end
pointer, which accounts for the reserved tailroom.

When this driver issue is detected, simply fail the conversion with NULL,
which results in feedback to the driver (failing xdp_do_redirect()), causing
the driver to drop the packet. Given the lack of consistent XDP stats, this
can be hard to troubleshoot. And given this is a driver bug, we want to
generate some more noise in the form of a WARN stack dump (to ID the driver
code that inlined convert_to_xdp_frame).
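
The detection described above can be sketched as a userspace model. The names model_xdp_buff, model_data_hard_end() and model_frame_fits() are hypothetical, and MODEL_TAILROOM is an assumed placeholder for the reserved tailroom; the real value depends on the kernel configuration. The sketch only shows the comparison, not the actual conversion or the warning.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed placeholder for the tailroom reserved at the end of a frame. */
#define MODEL_TAILROOM 320u

/* Simplified model of the buffer fields named in the commit message. */
struct model_xdp_buff {
    uint8_t *data_hard_start; /* start of the driver's frame */
    uint8_t *data;            /* start of packet data */
    uint8_t *data_end;        /* end of packet data */
    uint32_t frame_sz;        /* total frame size reported by the driver */
};

/* End of the usable area: end of the frame minus the reserved tailroom. */
static uint8_t *model_data_hard_end(const struct model_xdp_buff *xdp)
{
    assert(xdp->frame_sz >= MODEL_TAILROOM);
    return xdp->data_hard_start + xdp->frame_sz - MODEL_TAILROOM;
}

/* Returns 1 if the packet stays within the usable area, 0 if data_end
 * runs into the reserved tailroom (the case the commit rejects). */
static int model_frame_fits(const struct model_xdp_buff *xdp)
{
    return xdp->data_end <= model_data_hard_end(xdp);
}
```

A caller modelling the conversion would return NULL whenever model_frame_fits() reports 0, which is the point at which the commit also emits its warning.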

Inlining the WARN macro is problematic, because it adds an asm
instruction (ud2 on Intel CPUs) that influences instruction cache
prefetching. Thus, introduce xdp_warn and macro XDP_WARN, to avoid this
and at the same time make identifying the function and line of this
inlined function easier.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://lore.kernel.org/bpf/158945337313.97035.10015729316710496600.stgit@firesoul
2020-05-14 21:21:54 -07:00
Ilias Apalodimas 458de8a97f net: page_pool: API cleanup and comments
Functions starting with __ usually indicate those which are exported,
but should not be called directly. Update some of those declared in the
API and make it more readable.

page_pool_unmap_page() and page_pool_release_page() were doing
exactly the same thing calling __page_pool_clean_page().  Let's
rename __page_pool_clean_page() to page_pool_release_page() and
export it in order to show up on perf logs and get rid of
page_pool_unmap_page().

Finally rename __page_pool_put_page() to page_pool_put_page() since we
can now directly call it from drivers and rename the existing
page_pool_put_page() to page_pool_put_full_page() since they do the same
thing but the latter is trying to sync the full DMA area.

This patch also updates netsec, mvneta and stmmac drivers which use
those functions.

Suggested-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-20 10:09:25 -08:00