Commit Graph

87 Commits

Author SHA1 Message Date
Rado Vrbovsky f55e4a4e81 Merge: CNB96: page_pool: update to v6.12
MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/5432

JIRA: https://issues.redhat.com/browse/RHEL-57765

Updating page_pool to upstream v6.12 where necessary to enable driver
updates.

Signed-off-by: Felix Maurer <fmaurer@redhat.com>

Approved-by: Ivan Vecera <ivecera@redhat.com>
Approved-by: Petr Oros <poros@redhat.com>
Approved-by: CKI KWF Bot <cki-ci-bot+kwf-gitlab-com@redhat.com>

Merged-by: Rado Vrbovsky <rvrbovsk@redhat.com>
2024-11-27 11:19:28 +00:00
Paolo Abeni ba209d7616 net: page_pool: fix warning code
JIRA: https://issues.redhat.com/browse/RHEL-62849
Tested: LNST, Tier1

Upstream commit:
commit 946b6c48cca48591fb495508c5dbfade767173d0
Author: Johannes Berg <johannes.berg@intel.com>
Date:   Fri Jul 5 13:42:06 2024 +0200

    net: page_pool: fix warning code

    WARN_ON_ONCE("string") doesn't really do what appears to
    be intended, so fix that.

    Signed-off-by: Johannes Berg <johannes.berg@intel.com>
    Fixes: 90de47f020db ("page_pool: fragment API support for 32-bit arch with 64-bit DMA")
    Link: https://patch.msgid.link/20240705134221.2f4de205caa1.I28496dc0f2ced580282d1fb892048017c4491e21@changeid
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-11-15 09:21:34 +01:00
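The fix above targets a common C pitfall: WARN_ON_ONCE() fires when its argument is truthy, and a string literal is a non-NULL pointer, so WARN_ON_ONCE("message") warns unconditionally instead of printing a message on a bad condition. A minimal userspace sketch of the pitfall, with simplified stand-ins for the kernel macros:

```c
#include <stdio.h>
#include <stdbool.h>

static int warn_count; /* how many warnings actually fired */

/* Simplified stand-in for the kernel macro: warn once, return the condition. */
#define WARN_ON_ONCE(cond) ({                        \
    static bool __warned;                            \
    bool __ret = !!(cond);                           \
    if (__ret && !__warned) {                        \
        __warned = true;                             \
        warn_count++;                                \
        fprintf(stderr, "WARNING: %s\n", #cond);     \
    }                                                \
    __ret;                                           \
})

/* Simplified: the real WARN_ONCE() also prints the format string. */
#define WARN_ONCE(cond, fmt) ((void)(fmt), WARN_ON_ONCE(cond))

/* Wrong: the string literal is a non-NULL pointer, i.e. always true,
 * so this warns unconditionally and the "message" is never a message. */
static int buggy_check(long dma_addr_ok)
{
    (void)dma_addr_ok;
    return WARN_ON_ONCE("DMA address overflows the field");
}

/* Right: warn on a real condition, with the text as the message. */
static int fixed_check(long dma_addr_ok)
{
    return WARN_ONCE(!dma_addr_ok, "DMA address overflows the field");
}
```

The buggy form returns true (and warns) regardless of the input, while the fixed form only trips when the condition is actually violated.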
Felix Maurer ab82ee54a3 page_pool: export page_pool_disable_direct_recycling()
JIRA: https://issues.redhat.com/browse/RHEL-57765
Conflicts:
- net/core/page_pool.c: Context difference due to missing 4a96a4e807c3
  ("page_pool: check for PP direct cache locality later")

commit d7f39aee79f04eeaa42085728423501b33ac5be5
Author: David Wei <dw@davidwei.uk>
Date:   Wed Jun 26 20:01:59 2024 -0700

    page_pool: export page_pool_disable_direct_recycling()

    56ef27e3 unexported page_pool_unlink_napi() and renamed it to
    page_pool_disable_direct_recycling(). This is because there was no
    in-tree user of page_pool_unlink_napi().

    Since then, the Rx queue API and an implementation in bnxt got
    merged. The bnxt implementation broadly follows these steps:
    allocate new queue memory + page pool, stop the old rx queue,
    swap, then destroy the old queue memory + page pool.

    The existing NAPI instance is re-used, so when the old page pool,
    which is no longer used but still linked to this shared NAPI
    instance, is destroyed, it triggers warnings.

    In my initial patches I unlinked a page pool from a NAPI instance
    directly. Instead, export page_pool_disable_direct_recycling() and call
    that instead to avoid having a driver touch a core struct.

    Suggested-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: David Wei <dw@davidwei.uk>
    Reviewed-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Signed-off-by: Felix Maurer <fmaurer@redhat.com>
2024-11-06 18:32:15 +01:00
Felix Maurer d4192ad886 page_pool: check for DMA sync shortcut earlier
JIRA: https://issues.redhat.com/browse/RHEL-57765
Conflicts:
- net/core/page_pool.c: upstream ef9226cd56b7 ("page_pool: constify some
  read-only function arguments") and this commit happened in parallel
  leading to conflicts in which args of
  {,__}page_pool_dma_sync_for_device() were const; this was resolved in
  daa121128a2d ("Merge tag 'dma-mapping-6.10-2024-05-20' of
  git://git.infradead.org/users/hch/dma-mapping"). Fixing accordingly in
  the backport: page and page_pool args should be const.

commit 4321de4497b24fbf22389331f4ecd4039a451aa9
Author: Alexander Lobakin <aleksander.lobakin@intel.com>
Date:   Tue May 7 13:20:25 2024 +0200

    page_pool: check for DMA sync shortcut earlier

    We can save a couple more function calls in the Page Pool code if we
    check for dma_need_sync() earlier, just when we test pp->p.dma_sync.
    Move both these checks into an inline wrapper and call the PP wrapper
    over the generic DMA sync function only when both are true.
    You can't cache the result of dma_need_sync() in &page_pool, as it
    may change at any time if a SWIOTLB buffer is allocated or mapped.

    Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
    Signed-off-by: Christoph Hellwig <hch@lst.de>

Signed-off-by: Felix Maurer <fmaurer@redhat.com>
2024-11-06 18:32:15 +01:00
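The idea of the commit above can be sketched in userspace: fold the cheap pp->p.dma_sync flag test and dma_need_sync() into one inline wrapper, so the out-of-line sync routine is only called when both are true. All names and the dma_need_sync() criterion below are simplified stand-ins, not the kernel implementation:

```c
#include <stdbool.h>

static int sync_calls; /* counts how often the expensive path runs */

struct page_pool {
    bool dma_sync; /* driver requested DMA syncing (pp->p.dma_sync) */
};

/* Stand-in for dma_need_sync(): false e.g. for coherent mappings.
 * The odd/even test is a fake criterion for this sketch only. */
static bool dma_need_sync(unsigned long long dma_addr)
{
    return (dma_addr & 1) != 0;
}

/* The expensive, out-of-line sync; in the kernel this ends up calling
 * the generic DMA sync machinery. */
static void __page_pool_dma_sync_for_device(const struct page_pool *pool,
                                            unsigned long long dma_addr,
                                            unsigned int dma_sync_size)
{
    (void)pool; (void)dma_addr; (void)dma_sync_size;
    sync_calls++;
}

/* The inline wrapper: test both conditions before paying for the call.
 * Note the const pool argument, matching the backport conflict note. */
static inline void page_pool_dma_sync_for_device(const struct page_pool *pool,
                                                 unsigned long long dma_addr,
                                                 unsigned int dma_sync_size)
{
    if (pool->dma_sync && dma_need_sync(dma_addr))
        __page_pool_dma_sync_for_device(pool, dma_addr, dma_sync_size);
}
```

With the flag off, or with a mapping that never needs syncing, the function call is skipped entirely.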
Felix Maurer ea81551570 page_pool: don't use driver-set flags field directly
JIRA: https://issues.redhat.com/browse/RHEL-57765

commit 403f11ac9ab72fc3bee0b8c80c16e33212ea8cd9
Author: Alexander Lobakin <aleksander.lobakin@intel.com>
Date:   Tue May 7 13:20:24 2024 +0200

    page_pool: don't use driver-set flags field directly
    
    page_pool::p is driver-defined params, copied directly from the
    structure passed to page_pool_create(). The structure isn't meant
    to be modified by the Page Pool core code and this even might look
    confusing[0][1].
    In order to be able to alter some flags, let's define our own, internal
    fields the same way as the already existing one (::has_init_callback).
    They are defined as bits in the driver-set params; leave them as
    bits here as well, so as not to waste a byte per bit. Almost 30
    bits are still free for future extensions.
    We could've defined only new flags here or only the ones we may need
    to alter, but checking some flags in one place while others in another
    doesn't sound convenient or intuitive. ::flags passed by the driver can
    now go to the "slow" PP params.
    
    Suggested-by: Jakub Kicinski <kuba@kernel.org>
    Link[0]: https://lore.kernel.org/netdev/20230703133207.4f0c54ce@kernel.org
    Suggested-by: Alexander Duyck <alexanderduyck@fb.com>
    Link[1]: https://lore.kernel.org/netdev/CAKgT0UfZCGnWgOH96E4GV3ZP6LLbROHM7SHE8NKwq+exX+Gk_Q@mail.gmail.com
    Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
    Signed-off-by: Christoph Hellwig <hch@lst.de>

Signed-off-by: Felix Maurer <fmaurer@redhat.com>
2024-11-06 18:32:15 +01:00
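The separation described above can be illustrated with a simplified userspace sketch: the driver-set params are copied at create time and treated as read-only, while the bits the core may flip at runtime live in internal one-bit fields, like ::has_init_callback already did. The flag names mirror the kernel's; the struct layout is illustrative:

```c
#include <stdbool.h>

#define PP_FLAG_DMA_MAP      (1U << 0)
#define PP_FLAG_DMA_SYNC_DEV (1U << 1)

struct page_pool_params {
    unsigned int flags; /* driver-set, never modified by the core */
};

struct page_pool {
    /* internal copies, alterable by the core without touching params */
    bool dma_map:1;
    bool dma_sync:1;
    struct page_pool_params slow; /* driver-set params, read-only */
};

/* Copy the driver's flag bits into the pool-owned fields at init. */
static void page_pool_init(struct page_pool *pool,
                           const struct page_pool_params *params)
{
    pool->slow = *params;
    pool->dma_map = !!(params->flags & PP_FLAG_DMA_MAP);
    pool->dma_sync = !!(params->flags & PP_FLAG_DMA_SYNC_DEV);
}
```

The core can now clear e.g. pool->dma_sync at runtime while the driver's original params stay byte-for-byte intact.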
Felix Maurer e3fdce3bee page_pool: make sure frag API fields don't span between cachelines
JIRA: https://issues.redhat.com/browse/RHEL-57765

commit 1f20a5769446a1acae67ac9e63d07a594829a789
Author: Alexander Lobakin <aleksander.lobakin@intel.com>
Date:   Tue May 7 13:20:23 2024 +0200

    page_pool: make sure frag API fields don't span between cachelines
    
    After commit 5027ec19f104 ("net: page_pool: split the page_pool_params
    into fast and slow") that made &page_pool contain only "hot" params at
    the start, cacheline boundary chops frag API fields group in the middle
    again.
    To not bother with this each time fast params get expanded or shrunk,
    let's just align them to `4 * sizeof(long)`, the closest upper pow-2 to
    their actual size (2 longs + 1 int). This ensures 16-byte alignment for
    the 32-bit architectures and 32-byte alignment for the 64-bit ones,
    excluding unnecessary false-sharing.
    ::page_state_hold_cnt is used quite intensively on hotpath no matter if
    frag API is used, so move it to the newly created hole in the first
    cacheline.
    
    Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
    Signed-off-by: Christoph Hellwig <hch@lst.de>

Signed-off-by: Felix Maurer <fmaurer@redhat.com>
2024-11-06 18:32:15 +01:00
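The alignment trick above can be shown with an illustrative (not the real) struct: group the frag API fields and align the group to 4 * sizeof(long), the closest power of two above their actual size (2 longs + 1 int), so the group can never straddle a cacheline on either 32-bit (16B) or 64-bit (32B):

```c
#include <stdalign.h>
#include <stddef.h>

/* Illustrative stand-in for the frag API field group. */
struct pp_frag_group {
    long frag_users;
    long frag_page;   /* stands in for a struct page pointer */
    unsigned int frag_offset;
};

/* Illustrative pool layout: fast-path params up front, the hold
 * counter moved into the hole the alignment creates, then the
 * aligned frag group. */
struct page_pool_sketch {
    unsigned int fast_param;
    unsigned int page_state_hold_cnt;
    alignas(4 * sizeof(long)) struct pp_frag_group frag;
};
```

Since 4 * sizeof(long) divides the cacheline size on common systems, any group whose offset is a multiple of it and whose size fits within it stays on one cacheline.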
Felix Maurer e46011f644 page_pool: constify some read-only function arguments
JIRA: https://issues.redhat.com/browse/RHEL-57765

commit ef9226cd56b718c79184a3466d32984a51cb449c
Author: Alexander Lobakin <aleksander.lobakin@intel.com>
Date:   Thu Apr 18 13:36:11 2024 +0200

    page_pool: constify some read-only function arguments
    
    There are several functions taking pointers to data they don't modify.
    This includes statistics fetching, page and page_pool parameters, etc.
    Constify the pointers, so that call sites will be able to pass const
    pointers as well.
    No functional changes, no visible changes in functions sizes.
    
    Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
    Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
    Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

Signed-off-by: Felix Maurer <fmaurer@redhat.com>
2024-11-06 18:32:15 +01:00
Felix Maurer 48b3e0a401 page_pool: try direct bulk recycling
JIRA: https://issues.redhat.com/browse/RHEL-57765

commit 39806b96c89ae5d52092c8f86393ecbfaae26697
Author: Alexander Lobakin <aleksander.lobakin@intel.com>
Date:   Fri Mar 29 17:55:07 2024 +0100

    page_pool: try direct bulk recycling
    
    Now that the checks for direct recycling possibility live inside the
    Page Pool core, reuse them when performing bulk recycling.
    page_pool_put_page_bulk() can be called from process context as
    well; page_pool_napi_local() takes care of this at the very
    beginning.
    Under high .ndo_xdp_xmit() traffic load, the win is 2-3% Pps assuming
    the sending driver uses xdp_return_frame_bulk() on Tx completion.
    
    Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
    Link: https://lore.kernel.org/r/20240329165507.3240110-3-aleksander.lobakin@intel.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Felix Maurer <fmaurer@redhat.com>
2024-11-06 18:32:15 +01:00
Felix Maurer 8643c21aa1 page_pool: check for PP direct cache locality later
JIRA: https://issues.redhat.com/browse/RHEL-57765
Conflicts:
- Context differences (missing skb_cow_data_for_xdp) due to missing
  e6d5dbdd20aa ("xdp: add multi-buff support for xdp running in generic
  mode")
- net/core/skbuff.c: context difference (condition moved to function) due
  to missing 8cfa2dee325f ("skbuff: Add a function to check if a page
  belongs to page_pool") with no functional changes
- net/core/skbuff.c: context difference (missing skb_kfree_head) due to
  missing bf9f1baa279f ("net: add dedicated kmem_cache for typical/small
  skb->head"); this can appear in Revumatic as if skb_free_head was
  moved, but that isn't true; the hunks are just reordered (check the
  line numbers)

commit 4a96a4e807c390a9d91b450ebe04eeb2e0ecc076
Author: Alexander Lobakin <aleksander.lobakin@intel.com>
Date:   Fri Mar 29 17:55:06 2024 +0100

    page_pool: check for PP direct cache locality later

    Since we have pool->p.napi (Jakub) and pool->cpuid (Lorenzo) to check
    whether it's safe to use direct recycling, we can use both globally for
    each page instead of relying solely on @allow_direct argument.
    Let's assume that @allow_direct means "I'm sure it's local, don't waste
    time rechecking this" and when it's false, try the mentioned params to
    still recycle the page directly. If neither is true, we'll lose some
    CPU cycles, but then it surely won't be hotpath. On the other hand,
    paths where it's possible to use direct cache, but not possible to
    safely set @allow_direct, will benefit from this move.
    The whole propagation of @napi_safe through a dozen of skb freeing
    functions can now go away, which saves us some stack space.

    Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
    Link: https://lore.kernel.org/r/20240329165507.3240110-2-aleksander.lobakin@intel.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Felix Maurer <fmaurer@redhat.com>
2024-11-06 18:18:24 +01:00
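The decision logic described above can be sketched as follows: @allow_direct means "caller is sure recycling is local"; when it is false, the core now rechecks locality itself (pool->cpuid for percpu pools, or the CPU the pool's NAPI instance runs on) instead of always taking the slow path. Field and function names here are loose stand-ins for the kernel's:

```c
#include <stdbool.h>

struct page_pool {
    int cpuid;    /* owning CPU for percpu pools, else -1 */
    int napi_cpu; /* CPU of the associated NAPI instance, else -1 */
};

/* Is the current CPU the one allowed to touch the per-CPU cache? */
static bool page_pool_napi_local(const struct page_pool *pool, int cur_cpu)
{
    return cur_cpu == pool->cpuid || cur_cpu == pool->napi_cpu;
}

/* Returns true when the page may go into the lockless per-CPU cache;
 * false means the page falls back to the locked ptr_ring. */
static bool recycle_direct(const struct page_pool *pool, bool allow_direct,
                           int cur_cpu)
{
    if (!allow_direct)
        allow_direct = page_pool_napi_local(pool, cur_cpu);
    return allow_direct;
}
```

Callers that can't prove locality pass false and still get direct recycling when the check happens to succeed; only genuinely remote frees pay for the slow path.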
Felix Maurer e8f1b5267f net: page_pool: factor out page_pool recycle check
JIRA: https://issues.redhat.com/browse/RHEL-57765

commit 46f40172b68154106cae660c90c7801b61080892
Author: Mina Almasry <almasrymina@google.com>
Date:   Fri Mar 8 12:44:58 2024 -0800

    net: page_pool: factor out page_pool recycle check
    
    The check is duplicated in 2 places, factor it out into a common helper.
    
    Signed-off-by: Mina Almasry <almasrymina@google.com>
    Reviewed-by: Yunsheng Lin <linyunsheng@huawei.com>
    Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
    Link: https://lore.kernel.org/r/20240308204500.1112858-1-almasrymina@google.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Felix Maurer <fmaurer@redhat.com>
2024-10-21 16:37:42 +02:00
Felix Maurer 2e7d822903 net: page_pool: fix recycle stats for system page_pool allocator
JIRA: https://issues.redhat.com/browse/RHEL-57765
Conflicts:
- net/core/page_pool.c: context difference due to missing aaf153aecef1
  ("page_pool: halve BIAS_MAX for multiple user references of a fragment")

commit f853fa5c54e7a0364a52125074dedeaf2c7ddace
Author: Lorenzo Bianconi <lorenzo@kernel.org>
Date:   Fri Feb 16 10:25:43 2024 +0100

    net: page_pool: fix recycle stats for system page_pool allocator

    Use global percpu page_pool_recycle_stats counter for system page_pool
    allocator instead of allocating a separate percpu variable for each
    (also percpu) page pool instance.

    Reviewed-by: Toke Hoiland-Jorgensen <toke@redhat.com>
    Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
    Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com>
    Link: https://lore.kernel.org/r/87f572425e98faea3da45f76c3c68815c01a20ee.1708075412.git.lorenzo@kernel.org
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Felix Maurer <fmaurer@redhat.com>
2024-10-21 16:37:42 +02:00
Felix Maurer edee1c1e12 page_pool: disable direct recycling based on pool->cpuid on destroy
JIRA: https://issues.redhat.com/browse/RHEL-57765

commit 56ef27e3abe6d6453b1f4f6127041f3a65d7cbc9
Author: Alexander Lobakin <aleksander.lobakin@intel.com>
Date:   Thu Feb 15 12:39:05 2024 +0100

    page_pool: disable direct recycling based on pool->cpuid on destroy
    
    Now that direct recycling is performed based on pool->cpuid when set,
    memory leaks are possible:
    
    1. A pool is destroyed.
    2. Alloc cache is emptied (it's done only once).
    3. pool->cpuid is still set.
    4. napi_pp_put_page() does direct recycling based on pool->cpuid.
    5. Now alloc cache is not empty, but it won't ever be freed.
    
    In order to avoid that, rewrite pool->cpuid to -1 when unlinking NAPI to
    make sure no direct recycling will be possible after emptying the cache.
    This involves a bit of overhead as pool->cpuid now must be accessed
    via READ_ONCE() to avoid partial reads.
    Rename page_pool_unlink_napi() -> page_pool_disable_direct_recycling()
    to reflect what it actually does and unexport it.
    
    Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
    Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
    Link: https://lore.kernel.org/r/20240215113905.96817-1-aleksander.lobakin@intel.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Felix Maurer <fmaurer@redhat.com>
2024-10-21 16:37:42 +02:00
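The leak scenario and fix above can be sketched in userspace: on destroy, pool->cpuid is rewritten to -1 before the alloc cache is drained for the last time, so a concurrent napi_pp_put_page() can no longer refill it; cpuid is then read via READ_ONCE() to avoid torn reads (a volatile access stands in for it here). Everything below is a simplified illustration, not the kernel code:

```c
#include <stdbool.h>

struct page_pool {
    int cpuid;
    int cached; /* pages in the alloc cache, which is drained only once */
};

#define READ_ONCE(x) (*(volatile __typeof__(x) *)&(x))

/* Renamed from page_pool_unlink_napi(); also clears cpuid so direct
 * recycling can never happen after the final cache drain. */
static void page_pool_disable_direct_recycling(struct page_pool *pool)
{
    pool->cpuid = -1; /* WRITE_ONCE() in the kernel */
}

/* Direct recycling path: refills the per-CPU cache only while cpuid
 * still matches; otherwise the page goes to the (drained) ptr_ring. */
static bool napi_pp_put_page(struct page_pool *pool, int cur_cpu)
{
    if (READ_ONCE(pool->cpuid) == cur_cpu) {
        pool->cached++;
        return true;
    }
    return false;
}
```

Once cpuid is -1, the alloc cache can no longer grow after its one-time drain, which is exactly the leak the commit closes.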
Wander Lairson Costa de7a3b7b85 net: add generic percpu page_pool allocator
JIRA: https://issues.redhat.com/browse/RHEL-9145

Conflicts: we already have 490a79faf95e ("net: introduce include/net/rps.h")

commit 2b0cfa6e49566c8fa6759734cf821aa6e8271a9e
Author: Lorenzo Bianconi <lorenzo@kernel.org>
Date:   Mon Feb 12 10:50:54 2024 +0100

    net: add generic percpu page_pool allocator

    Introduce a generic percpu page_pool allocator.
    Moreover, add page_pool_create_percpu() and a cpuid field in the
    page_pool struct in order to recycle the page in the page_pool
    "hot" cache if napi_pp_put_page() is running on the same cpu.
    This is a preliminary patch to add xdp multi-buff support for xdp running
    in generic mode.

    Acked-by: Jesper Dangaard Brouer <hawk@kernel.org>
    Reviewed-by: Toke Hoiland-Jorgensen <toke@redhat.com>
    Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
    Link: https://lore.kernel.org/r/80bc4285228b6f4220cd03de1999d86e46e3fcbd.1707729884.git.lorenzo@kernel.org
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Wander Lairson Costa <wander@redhat.com>
2024-09-16 16:04:27 -03:00
Petr Oros 77a2f42c86 page_pool: transition to reference count management after page draining
JIRA: https://issues.redhat.com/browse/RHEL-31941

Upstream commit(s):
commit 0a149ab78ee220c75eef797abea7a29f4490e226
Author: Liang Chen <liangchen.linux@gmail.com>
Date:   Tue Dec 12 12:46:11 2023 +0800

    page_pool: transition to reference count management after page draining

    To support multiple users referencing the same fragment,
    'pp_frag_count' is renamed to 'pp_ref_count', transitioning pp pages
    from fragment management to reference count management after draining
    based on the suggestion from [1].

    The idea is that the concept of fragmenting exists before the page is
    drained, and all related functions retain their current names.
    However, once the page is drained, its management shifts to being
    governed by 'pp_ref_count'. Therefore, all functions associated with
    that lifecycle stage of a pp page are renamed.

    [1]
    http://lore.kernel.org/netdev/f71d9448-70c8-8793-dc9a-0eb48a570300@huawei.com

    Signed-off-by: Liang Chen <liangchen.linux@gmail.com>
    Reviewed-by: Yunsheng Lin <linyunsheng@huawei.com>
    Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
    Reviewed-by: Mina Almasry <almasrymina@google.com>
    Link: https://lore.kernel.org/r/20231212044614.42733-2-liangchen.linux@gmail.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Petr Oros <poros@redhat.com>
2024-05-16 19:27:56 +02:00
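The renamed lifecycle above can be sketched simply: while a page is being fragmented the counter is managed by the frag API, and once drained it behaves as a plain reference count, so the post-drain helpers carry *_ref_* names. The sketch below uses plain longs where the kernel uses atomics:

```c
/* Userspace stand-in for the relevant part of struct page;
 * the field was renamed from pp_frag_count. */
struct page {
    long pp_ref_count;
};

/* Frag-stage API (name unchanged): set up the initial count. */
static void page_pool_fragment_page(struct page *page, long nr)
{
    page->pp_ref_count = nr; /* atomic_long_set() in the kernel */
}

/* Post-drain API (renamed): drop nr references and return how many
 * remain; 0 means the caller held the last one. */
static long page_pool_unref_page(struct page *page, long nr)
{
    page->pp_ref_count -= nr; /* atomic_long_sub_return() in the kernel */
    return page->pp_ref_count;
}
```

Multiple users can now each hold a reference on the same fragment and release independently; the last one to reach zero returns the page.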
Petr Oros e777596229 net: page_pool: factor out releasing DMA from releasing the page
JIRA: https://issues.redhat.com/browse/RHEL-31941

Upstream commit(s):
commit c3f687d8dfeb33cffbb8f47c30002babfc4895d2
Author: Jakub Kicinski <kuba@kernel.org>
Date:   Thu Dec 7 16:52:32 2023 -0800

    net: page_pool: factor out releasing DMA from releasing the page

    Releasing the DMA mapping will be useful for other types
    of pages, so factor it out. Make sure compiler inlines it,
    to avoid any regressions.

    Signed-off-by: Mina Almasry <almasrymina@google.com>
    Reviewed-by: Shakeel Butt <shakeelb@google.com>
    Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Petr Oros <poros@redhat.com>
2024-05-16 19:27:56 +02:00
Petr Oros a58931e73e net: page_pool: mute the periodic warning for visible page pools
JIRA: https://issues.redhat.com/browse/RHEL-31941

Upstream commit(s):
commit be0096676e230b43730b8936ac393d155b4e3262
Author: Jakub Kicinski <kuba@kernel.org>
Date:   Sun Nov 26 15:07:39 2023 -0800

    net: page_pool: mute the periodic warning for visible page pools

    Mute the periodic "stalled pool shutdown" warning if the page pool
    is visible to user space. Rolling out a driver using page pools
    to just a few hundred hosts at Meta surfaces applications which
    fail to reap their broken sockets. Obviously it's best if the
    applications are fixed, but we don't generally print warnings
    for application resource leaks. Admins can now depend on the
    netlink interface for getting page pool info to detect buggy
    apps.

    While at it throw in the ID of the pool into the message,
    in rare cases (pools from destroyed netns) this will make
    finding the pool with a debugger easier.

    Reviewed-by: Eric Dumazet <edumazet@google.com>
    Acked-by: Jesper Dangaard Brouer <hawk@kernel.org>
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Signed-off-by: Petr Oros <poros@redhat.com>
2024-05-16 19:27:56 +02:00
Petr Oros 3d7a175988 net: page_pool: expose page pool stats via netlink
JIRA: https://issues.redhat.com/browse/RHEL-31941

Upstream commit(s):
commit d49010adae737638447369a4eff8f1aab736b076
Author: Jakub Kicinski <kuba@kernel.org>
Date:   Sun Nov 26 15:07:38 2023 -0800

    net: page_pool: expose page pool stats via netlink

    Dump the stats into netlink. More clever approaches
    like dumping the stats per-CPU for each CPU individually
    to see where the packets get consumed can be implemented
    in the future.

    A trimmed example from a real (but recently booted) system:

    $ ./cli.py --no-schema --spec netlink/specs/netdev.yaml \
               --dump page-pool-stats-get
    [{'info': {'id': 19, 'ifindex': 2},
      'alloc-empty': 48,
      'alloc-fast': 3024,
      'alloc-refill': 0,
      'alloc-slow': 48,
      'alloc-slow-high-order': 0,
      'alloc-waive': 0,
      'recycle-cache-full': 0,
      'recycle-cached': 0,
      'recycle-released-refcnt': 0,
      'recycle-ring': 0,
      'recycle-ring-full': 0},
     {'info': {'id': 18, 'ifindex': 2},
      'alloc-empty': 66,
      'alloc-fast': 11811,
      'alloc-refill': 35,
      'alloc-slow': 66,
      'alloc-slow-high-order': 0,
      'alloc-waive': 0,
      'recycle-cache-full': 1145,
      'recycle-cached': 6541,
      'recycle-released-refcnt': 0,
      'recycle-ring': 1275,
      'recycle-ring-full': 0},
     {'info': {'id': 17, 'ifindex': 2},
      'alloc-empty': 73,
      'alloc-fast': 62099,
      'alloc-refill': 413,
    ...

    Acked-by: Jesper Dangaard Brouer <hawk@kernel.org>
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Signed-off-by: Petr Oros <poros@redhat.com>
2024-05-16 19:27:55 +02:00
Petr Oros 3c3422e0e9 net: page_pool: report when page pool was destroyed
JIRA: https://issues.redhat.com/browse/RHEL-31941

Upstream commit(s):
commit 69cb4952b6f6a226c1c0a7ca400398aaa8f75cf2
Author: Jakub Kicinski <kuba@kernel.org>
Date:   Sun Nov 26 15:07:37 2023 -0800

    net: page_pool: report when page pool was destroyed

    Report when page pool was destroyed. Together with the inflight
    / memory use reporting this can serve as a replacement for the
    warning about leaked page pools we currently print to dmesg.

    Example output for a fake leaked page pool using some hacks
    in netdevsim (one "live" pool, and one "leaked" on the same dev):

    $ ./cli.py --no-schema --spec netlink/specs/netdev.yaml \
               --dump page-pool-get
    [{'id': 2, 'ifindex': 3},
     {'id': 1, 'ifindex': 3, 'destroyed': 133, 'inflight': 1}]

    Tested-by: Dragos Tatulea <dtatulea@nvidia.com>
    Reviewed-by: Eric Dumazet <edumazet@google.com>
    Acked-by: Jesper Dangaard Brouer <hawk@kernel.org>
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Signed-off-by: Petr Oros <poros@redhat.com>
2024-05-16 19:27:55 +02:00
Petr Oros acb72c024e net: page_pool: report amount of memory held by page pools
JIRA: https://issues.redhat.com/browse/RHEL-31941

Upstream commit(s):
commit 7aee8429eedd0970d8add2fb5b856bfc5f5f1fc1
Author: Jakub Kicinski <kuba@kernel.org>
Date:   Sun Nov 26 15:07:36 2023 -0800

    net: page_pool: report amount of memory held by page pools

    Advanced deployments need the ability to check memory use
    of various system components. It makes it possible to make informed
    decisions about memory allocation and to find regressions and leaks.

    Report memory use of page pools. Report both number of references
    and bytes held.

    Acked-by: Jesper Dangaard Brouer <hawk@kernel.org>
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Signed-off-by: Petr Oros <poros@redhat.com>
2024-05-16 19:27:55 +02:00
Petr Oros a4e8ab078c net: page_pool: id the page pools
JIRA: https://issues.redhat.com/browse/RHEL-31941

Upstream commit(s):
commit f17c69649c698e4df3cfe0010b7bbf142dec3e40
Author: Jakub Kicinski <kuba@kernel.org>
Date:   Sun Nov 26 15:07:29 2023 -0800

    net: page_pool: id the page pools

    To give ourselves the flexibility of creating netlink commands
    and ability to refer to page pool instances in uAPIs create
    IDs for page pools.

    Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
    Reviewed-by: Eric Dumazet <edumazet@google.com>
    Acked-by: Jesper Dangaard Brouer <hawk@kernel.org>
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Reviewed-by: Shakeel Butt <shakeelb@google.com>
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Signed-off-by: Petr Oros <poros@redhat.com>
2024-05-16 19:27:55 +02:00
Petr Oros 843f234bb8 net: page_pool: factor out uninit
JIRA: https://issues.redhat.com/browse/RHEL-31941

Upstream commit(s):
commit 23cfaf67ba5d2f013d2576b8a9173c45a4a7f895
Author: Jakub Kicinski <kuba@kernel.org>
Date:   Sun Nov 26 15:07:28 2023 -0800

    net: page_pool: factor out uninit

    We'll soon (next change in the series) need a fuller unwind path
    in page_pool_create() so create the inverse of page_pool_init().

    Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
    Reviewed-by: Eric Dumazet <edumazet@google.com>
    Acked-by: Jesper Dangaard Brouer <hawk@kernel.org>
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Reviewed-by: Shakeel Butt <shakeelb@google.com>
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Signed-off-by: Petr Oros <poros@redhat.com>
2024-05-16 19:27:55 +02:00
Petr Oros 9221d8eca7 net: page_pool: avoid touching slow on the fastpath
JIRA: https://issues.redhat.com/browse/RHEL-31941

Upstream commit(s):
commit 2da0cac1e9494f34c5a3438e5c4c7e662e1b7445
Author: Jakub Kicinski <kuba@kernel.org>
Date:   Mon Nov 20 16:00:35 2023 -0800

    net: page_pool: avoid touching slow on the fastpath

    To fully benefit from the previous commit, add one byte of state
    in the first cache line recording whether we need to look at
    the slow part.

    The packing isn't all that impressive right now; we create
    a 7B hole. I'm expecting Olek's rework will reshuffle this,
    anyway.

    Acked-by: Jesper Dangaard Brouer <hawk@kernel.org>
    Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
    Reviewed-by: Mina Almasry <almasrymina@google.com>
    Link: https://lore.kernel.org/r/20231121000048.789613-3-kuba@kernel.org
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Petr Oros <poros@redhat.com>
2024-05-16 19:27:55 +02:00
Petr Oros 9fda55f7c3 net: page_pool: split the page_pool_params into fast and slow
JIRA: https://issues.redhat.com/browse/RHEL-31941

Upstream commit(s):
commit 5027ec19f1049a07df5b0a37b1f462514cf2724b
Author: Jakub Kicinski <kuba@kernel.org>
Date:   Mon Nov 20 16:00:34 2023 -0800

    net: page_pool: split the page_pool_params into fast and slow

    struct page_pool is rather performance critical and we use
    16B of the first cache line to store 2 pointers used only
    by test code. Future patches will add more informational
    (non-fast path) attributes.

    It's convenient for the user of the API to not have to worry
    which fields are fast and which are slow path. Use struct
    groups to split the params into the two categories internally.

    Acked-by: Jesper Dangaard Brouer <hawk@kernel.org>
    Reviewed-by: Mina Almasry <almasrymina@google.com>
    Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
    Link: https://lore.kernel.org/r/20231121000048.789613-2-kuba@kernel.org
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Petr Oros <poros@redhat.com>
2024-05-16 19:27:55 +02:00
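The fast/slow split above relies on the kernel's struct_group() trick, which a userspace copy can demonstrate: each member exists both as a direct field and inside a named group, so fast-path code keeps writing pool->p.order while management code can treat "fast" and "slow" as single objects. The member selection below is illustrative, not the real params layout:

```c
/* Userspace copy of the kernel's struct_group() idea: a union of an
 * anonymous struct and an identically laid out named struct. */
#define struct_group(NAME, ...) \
    union { struct { __VA_ARGS__ }; struct { __VA_ARGS__ } NAME; }

struct page_pool_params_sketch {
    struct_group(fast,
        unsigned int order;
        unsigned int pool_size;
        int nid;
    );
    struct_group(slow,
        void *netdev; /* informational only, not read on the fast path */
        void *napi;
    );
};
```

Because both structs in the union have identical layout, p.order and p.fast.order alias the same storage; copying params in page_pool_init() can handle the two groups separately without the API user ever caring which field is which.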
Petr Oros 60c4332e7f page_pool: introduce page_pool_alloc() API
JIRA: https://issues.redhat.com/browse/RHEL-31941

Upstream commit(s):
commit de97502e16fc406a74edee8359612e518986cf59
Author: Yunsheng Lin <linyunsheng@huawei.com>
Date:   Fri Oct 20 17:59:50 2023 +0800

    page_pool: introduce page_pool_alloc() API

    Currently page pool supports the below use cases:
    use case 1: allocate page without page splitting using the
                page_pool_alloc_pages() API if the driver knows
                that the memory it needs is always bigger than
                half of the page allocated from the page pool.
    use case 2: allocate page frag with page splitting using the
                page_pool_alloc_frag() API if the driver knows
                that the memory it needs is always smaller than
                or equal to half of the page allocated from the
                page pool.

    There are emerging use cases [1] & [2] that are a mix of the
    above two: the driver doesn't know the size of the memory it
    needs beforehand, so the driver may use something like below to
    allocate memory with the least memory utilization and performance
    penalty:

    if (size << 1 > max_size)
            page = page_pool_alloc_pages();
    else
            page = page_pool_alloc_frag();

    To avoid the driver doing something like above, add the
    page_pool_alloc() API to support the above use case, and update
    the true size of memory that is actually allocated by updating
    '*size' back to the driver in order to avoid exacerbating
    truesize underestimate problem.

    Rename page_pool_free() which is used in the destroy process to
    __page_pool_destroy() to avoid confusion with the newly added
    API.

    1. https://lore.kernel.org/all/d3ae6bd3537fbce379382ac6a42f67e22f27ece2.1683896626.git.lorenzo@kernel.org/
    2. https://lore.kernel.org/all/20230526054621.18371-3-liangchen.linux@gmail.com/

    Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
    Link: https://lore.kernel.org/r/20231020095952.11055-4-linyunsheng@huawei.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Petr Oros <poros@redhat.com>
2024-05-16 19:27:54 +02:00
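The combined helper described above can be sketched like this: pick whole-page vs frag allocation based on the requested size, and write the true size back so the caller's truesize accounting stays honest. The constants and return type are simplifications for the sketch, not the kernel API:

```c
#define PAGE_SIZE_SK 4096u /* stand-in for PAGE_SIZE */

enum alloc_kind { ALLOC_PAGE, ALLOC_FRAG };

/* On return, *size holds the true size of the memory handed out. */
static enum alloc_kind page_pool_alloc(unsigned int *size,
                                       unsigned int *offset)
{
    unsigned int max_size = PAGE_SIZE_SK;

    if ((*size << 1) > max_size) {
        /* More than half a page requested: splitting can't help, so
         * hand out a whole page and report the full truesize. */
        *size = max_size;
        *offset = 0;
        return ALLOC_PAGE;
    }
    /* Otherwise take a fragment; a real pool would also round the
     * frag up when the remainder couldn't serve another request. */
    *offset = 0;
    return ALLOC_FRAG;
}
```

The driver no longer needs the open-coded `size << 1 > max_size` branch from the commit message; it just calls the one helper and uses the updated *size.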
Petr Oros 6c5988280c page_pool: remove PP_FLAG_PAGE_FRAG
JIRA: https://issues.redhat.com/browse/RHEL-31941

Conflicts:
- drivers/net/ethernet/hisilicon/hns3/hns3_enet.c: chunk skipped due to
  missing 93188e9642c3ce ("net: hns3: support skb's frag page recycling
  based on page pool")
- drivers/net/ethernet/marvell/octeontx2/nic/otx2_common.c chunk skipped
  due to missing b2e3406a38f0f4 ("octeontx2-pf: Add support for page
  pool")

Upstream commit(s):
commit 09d96ee5674a0eaa800c664353756ecc45c4a87f
Author: Yunsheng Lin <linyunsheng@huawei.com>
Date:   Fri Oct 20 17:59:49 2023 +0800

    page_pool: remove PP_FLAG_PAGE_FRAG

    PP_FLAG_PAGE_FRAG is not really needed after pp_frag_count
    handling is unified and page_pool_alloc_frag() is supported
    in 32-bit arch with 64-bit DMA, so remove it.

    Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
    Link: https://lore.kernel.org/r/20231020095952.11055-3-linyunsheng@huawei.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Petr Oros <poros@redhat.com>
2024-05-16 19:27:54 +02:00
Petr Oros 62b3753c2d page_pool: unify frag_count handling in page_pool_is_last_frag()
JIRA: https://issues.redhat.com/browse/RHEL-31941

Upstream commit(s):
commit 58d53d8f7da63dd13903bec0a40b3009a841b61b
Author: Yunsheng Lin <linyunsheng@huawei.com>
Date:   Fri Oct 20 17:59:48 2023 +0800

    page_pool: unify frag_count handling in page_pool_is_last_frag()

    Currently when page_pool_create() is called with the
    PP_FLAG_PAGE_FRAG flag, page_pool_alloc_pages() is only
    allowed to be called under the below constraints:
    1. page_pool_fragment_page() needs to be called to set up
       page->pp_frag_count immediately.
    2. page_pool_defrag_page() often needs to be called to drain
       page->pp_frag_count when no user will be holding on to
       that page any more.

    Those constraints exist in order to support a page to be
    split into multi fragments.

    And those constraints have some overhead because of the
    cache line dirtying/bouncing and atomic update.

    Those constraints are unavoidable when we need a page to be
    split into more than one fragment, but there are also cases
    where we want to avoid the above constraints and their
    overhead when a page can't be split, as it can only hold one
    fragment as requested by the user, depending on the different
    use cases:
    use case 1: allocate page without page splitting.
    use case 2: allocate page with page splitting.
    use case 3: allocate page with or without page splitting
                depending on the fragment size.

    Currently page pool only provides the page_pool_alloc_pages()
    and page_pool_alloc_frag() APIs to enable cases 1 & 2
    separately, so we cannot use a combination of them to enable
    case 3; it is not possible yet because of the per-page_pool
    flag PP_FLAG_PAGE_FRAG.

    So in order to allow allocating an unsplit page without the
    overhead of a split page, while still allowing allocation of
    split pages, we need to remove the per-page_pool flag in
    page_pool_is_last_frag(). As best as I can think of, there
    are two methods to do so:
    1. Add a per-page flag/bit to indicate whether a page is
       split or not, which means we might need to update that
       flag/bit every time the page is recycled, dirtying the
       cache line of 'struct page' for use case 1.
    2. Unify the page->pp_frag_count handling for both split and
       unsplit pages by assuming all pages in the page pool are
       split into one big fragment initially.

    As page pool already supports use case 1 without dirtying the
    cache line of 'struct page' whenever a page is recyclable, we
    need to support the above use case 3 with minimal overhead,
    especially without adding any noticeable overhead for use
    case 1. Since we are already doing an optimization by not
    updating pp_frag_count in page_pool_defrag_page() for the
    last fragment user, this patch chooses to unify the
    pp_frag_count handling to support the above use case 3.

    Micro-benchmark testing in [1] shows no noticeable performance
    degradation with this patch applied, and provides some
    justification for unifying the frag_count handling.

    1. https://lore.kernel.org/all/bf2591f8-7b3c-4480-bb2c-31dc9da1d6ac@huawei.com/
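The unified handling described in the commit message can be sketched in user space. This is plain Python with made-up names (`FakePage`, `fragment_page`, etc.); the real kernel code operates atomically on page->pp_frag_count, but the control flow is the same:

```python
# Sketch of the unified pp_frag_count scheme: every page is assumed to
# be split into one big fragment at allocation time, and the last
# fragment user skips the counter update (the existing optimization).

class FakePage:
    def __init__(self):
        self.pp_frag_count = 0

def fragment_page(page, nr):
    page.pp_frag_count = nr

def defrag_page(page, nr):
    # Last-user optimization: a caller holding all remaining fragments
    # needs no counter update at all.
    if page.pp_frag_count == nr:
        return 0
    page.pp_frag_count -= nr
    return page.pp_frag_count

def is_last_frag(page):
    # An unsplit page simply carries pp_frag_count == 1 and takes the
    # same path as the last user of a split page; no per-pool flag.
    return defrag_page(page, 1) == 0

unsplit = FakePage()
fragment_page(unsplit, 1)          # use case 1: no page splitting
assert is_last_frag(unsplit)

split = FakePage()
fragment_page(split, 3)            # use case 2: page split three ways
assert not is_last_frag(split)
assert not is_last_frag(split)
assert is_last_frag(split)
```

Note how use case 3 falls out for free: the allocator can hand out either an unsplit page (count 1) or a split page (count N) and the recycle path stays identical.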

    Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
    Link: https://lore.kernel.org/r/20231020095952.11055-2-linyunsheng@huawei.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Petr Oros <poros@redhat.com>
2024-05-16 19:27:54 +02:00
Petr Oros 1eb93c611d page_pool: fragment API support for 32-bit arch with 64-bit DMA
JIRA: https://issues.redhat.com/browse/RHEL-31941

Upstream commit(s):
commit 90de47f020db086f7929e09f64efd0cf627d6869
Author: Yunsheng Lin <linyunsheng@huawei.com>
Date:   Fri Oct 13 14:48:21 2023 +0800

    page_pool: fragment API support for 32-bit arch with 64-bit DMA

    Currently page_pool_alloc_frag() is not supported on 32-bit
    arches with 64-bit DMA because of the overlap between
    pp_frag_count and dma_addr_upper in 'struct page' on those
    arches. This combination seems to be quite common, see [1],
    which means drivers may need to handle it when using the
    fragment API.

    It is assumed that the combination of the above arches with an
    address space >16TB does not exist: all those arches have a
    64-bit equivalent, and it seems logical to use the 64-bit
    version on a system with a large address space. It is also
    assumed that the dma address is page aligned when we are dma
    mapping a page aligned buffer, see [2].

    That means we're storing 12 zero bits at the lower end of a
    dma address, so we can reuse those bits on the above arches to
    support 32b+12b, which is 16TB of memory.

    If we have made a wrong assumption, a warning is emitted so
    that users can report it to us.

    1. https://lore.kernel.org/all/20211117075652.58299-1-linyunsheng@huawei.com/
    2. https://lore.kernel.org/all/20230818145145.4b357c89@kernel.org/
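The 32b+12b trick can be illustrated with a small user-space sketch. This is plain Python with hypothetical helper names; the `PAGE_SHIFT` of 12 corresponds to 4K pages, and the kernel patch warns rather than raises when the assumption is violated:

```python
# A page-aligned DMA address has PAGE_SHIFT low zero bits, so an
# address of up to 32 + 12 = 44 bits (16TB) fits in a 32-bit field.
PAGE_SHIFT = 12

def pack_dma_addr(addr):
    """Store a page-aligned DMA address in 32 bits, or fail loudly."""
    assert addr & ((1 << PAGE_SHIFT) - 1) == 0, "not page aligned"
    packed = addr >> PAGE_SHIFT
    if packed > 0xFFFFFFFF:
        # the kernel patch emits a warning here instead of raising
        raise OverflowError("DMA address beyond 32b+12b (16TB)")
    return packed

def unpack_dma_addr(packed):
    return packed << PAGE_SHIFT

addr = 0x0000_0ABC_DEF0_1000        # a 44-bit, page-aligned address
assert unpack_dma_addr(pack_dma_addr(addr)) == addr
```

The round trip is lossless precisely because the low 12 bits were guaranteed to be zero in the first place.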

    Tested-by: Alexander Lobakin <aleksander.lobakin@intel.com>
    Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
    Link: https://lore.kernel.org/r/20231013064827.61135-2-linyunsheng@huawei.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Petr Oros <poros@redhat.com>
2024-05-16 19:27:54 +02:00
Petr Oros 29bd06e0f7 page_pool: add a lockdep check for recycling in hardirq
JIRA: https://issues.redhat.com/browse/RHEL-31941

Upstream commit(s):
commit ff4e538c8c3e675a15e1e49509c55951832e0451
Author: Jakub Kicinski <kuba@kernel.org>
Date:   Fri Aug 4 20:05:28 2023 +0200

    page_pool: add a lockdep check for recycling in hardirq

    Page pool use in hardirq is prohibited; add debug checks
    to catch misuses. IIRC we previously discussed using
    DEBUG_NET_WARN_ON_ONCE() for this, but there were concerns
    that people will have DEBUG_NET enabled in perf testing.
    I don't think anyone enables lockdep in perf testing,
    so use lockdep to avoid pushback and arguing :)

    Acked-by: Jesper Dangaard Brouer <hawk@kernel.org>
    Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
    Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
    Link: https://lore.kernel.org/r/20230804180529.2483231-6-aleksander.lobakin@intel.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Petr Oros <poros@redhat.com>
2024-05-16 19:27:54 +02:00
Petr Oros 163398a3e1 net: page_pool: merge page_pool_release_page() with page_pool_return_page()
JIRA: https://issues.redhat.com/browse/RHEL-31941

Upstream commit(s):
commit 07e0c7d3179da5d06132f3d71b740aa91bde52aa
Author: Jakub Kicinski <kuba@kernel.org>
Date:   Wed Jul 19 18:04:09 2023 -0700

    net: page_pool: merge page_pool_release_page() with page_pool_return_page()

    Now that page_pool_release_page() is not exported we can
    merge it with page_pool_return_page(). I believe that
    the "Do not replace this with page_pool_return_page()"
    comment was there in case page_pool_return_page() was
    not inlined, to avoid two function calls.

    Acked-by: Jesper Dangaard Brouer <hawk@kernel.org>
    Reviewed-by: Yunsheng Lin <linyunsheng@huawei.com>
    Link: https://lore.kernel.org/r/20230720010409.1967072-5-kuba@kernel.org
    Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com>
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Petr Oros <poros@redhat.com>
2024-05-16 19:27:53 +02:00
Petr Oros 59ed484266 net: page_pool: hide page_pool_release_page()
JIRA: https://issues.redhat.com/browse/RHEL-31941

Conflicts:
- adjusted conflicts due to already applied 82e896d992fa63 ("docs:
  net: page_pool: use kdoc to avoid duplicating the information")
  and a9ca9f9ceff382 ("page_pool: split types and declarations from
  page_pool.h")

Upstream commit(s):
commit 535b9c61bdef6017228c708128b7849a476f8da5
Author: Jakub Kicinski <kuba@kernel.org>
Date:   Wed Jul 19 18:04:08 2023 -0700

    net: page_pool: hide page_pool_release_page()

    There seem to be no users calling page_pool_release_page()
    for legitimate reasons; all of them simply haven't been
    converted to skb-based recycling yet. Previous changes
    converted them. Update the docs, and unexport the function.

    Link: https://lore.kernel.org/r/20230720010409.1967072-4-kuba@kernel.org
    Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com>
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Petr Oros <poros@redhat.com>
2024-05-16 19:27:53 +02:00
Petr Oros 4534a6bc69 page_pool: add DMA_ATTR_WEAK_ORDERING on all mappings
JIRA: https://issues.redhat.com/browse/RHEL-31941

Upstream commit(s):
commit 8e4c62c7d980eaf0f64c1c0ef0c80f5685af0fb6
Author: Jakub Kicinski <kuba@kernel.org>
Date:   Mon Apr 17 08:28:05 2023 -0700

    page_pool: add DMA_ATTR_WEAK_ORDERING on all mappings

    Commit c519fe9a4f ("bnxt: add dma mapping attributes") added
    DMA_ATTR_WEAK_ORDERING to DMA attrs on bnxt. It has since spread
    to a few more drivers (possibly as a copy'n'paste).

    DMA_ATTR_WEAK_ORDERING only seems to matter on Sparc and PowerPC/cell,
    the rarity of these platforms is likely why we never bothered adding
    the attribute in the page pool, even though it should be safe to add.

    To make the page pool migration in drivers which set this flag less
    of a risk (of regressing the precious sparc database workloads or
    whatever needed this) let's add DMA_ATTR_WEAK_ORDERING on all
    page pool DMA mappings.

    We could make this a driver opt-in but frankly I don't think it's
    worth complicating the API. I can't think of a reason why device
    accesses to packet memory would have to be ordered.

    Acked-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
    Acked-by: Somnath Kotur <somnath.kotur@broadcom.com>
    Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com>
    Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
    Link: https://lore.kernel.org/r/20230417152805.331865-1-kuba@kernel.org
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Petr Oros <poros@redhat.com>
2024-05-16 19:27:53 +02:00
Sabrina Dubroca 06fe287412 net: skbuff: don't include <net/page_pool/types.h> to <linux/skbuff.h>
JIRA: https://issues.redhat.com/browse/RHEL-31751

Conflicts: context around #include in net/core/skbuff.c

commit 75eaf63ea7afeafd026ffef03bdc69e31f10829b
Author: Alexander Lobakin <aleksander.lobakin@intel.com>
Date:   Fri Aug 4 20:05:25 2023 +0200

    net: skbuff: don't include <net/page_pool/types.h> to <linux/skbuff.h>

    Currently, touching <net/page_pool/types.h> triggers a rebuild of more
    than half of the kernel. That's because it's included in
    <linux/skbuff.h>. And each new include to page_pool/types.h adds more
    [useless] data for the toolchain to process per each source file from
    that pile.

    In commit 6a5bcd84e8 ("page_pool: Allow drivers to hint on SKB
    recycling"), Matteo included it to be able to call a couple of functions
    defined there. Then, in commit 57f05bc2ab24 ("page_pool: keep pp info as
    long as page pool owns the page") one of the calls was removed, so only
    one was left. It's the call to page_pool_return_skb_page() in
    napi_frag_unref(). The function is external and doesn't have any
    dependencies. Having the very niche page_pool_types.h included only
    for that looks like overkill.

    As %PP_SIGNATURE is not local to page_pool.c (was only in the
    early submissions), nothing holds this function there. Teleport
    page_pool_return_skb_page() to skbuff.c, just next to the main consumer,
    skb_pp_recycle(), and rename it to napi_pp_put_page(), as it doesn't
    work with skbs at all and the former name tells nothing. The #if guards
    here are only to not compile and have it in the vmlinux when not needed
    -- both call sites are already guarded.
    Now, touching page_pool_types.h only triggers rebuilding of the drivers
    using it and a couple of core networking files.

    Suggested-by: Jakub Kicinski <kuba@kernel.org> # make skbuff.h less heavy
    Suggested-by: Alexander Duyck <alexanderduyck@fb.com> # move to skbuff.c
    Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
    Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
    Link: https://lore.kernel.org/r/20230804180529.2483231-3-aleksander.lobakin@intel.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Sabrina Dubroca <sdubroca@redhat.com>
2024-04-11 10:04:27 +02:00
Scott Weaver 2a096a138f Merge: bpf: backport fixes from upstream (phase 1)
MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/3337

JIRA: https://issues.redhat.com/browse/RHEL-15913

Backporting relevant fixes from upstream.

Signed-off-by: Felix Maurer <fmaurer@redhat.com>

Approved-by: Artem Savkov <asavkov@redhat.com>
Approved-by: Toke Høiland-Jørgensen <toke@redhat.com>

Signed-off-by: Scott Weaver <scweaver@redhat.com>
2024-01-22 12:01:20 -05:00
Petr Oros 8333fb8ac8 page_pool: split types and declarations from page_pool.h
JIRA: https://issues.redhat.com/browse/RHEL-16983

Conflicts:
- net/core/skbuff.c:
   adjusted context conflict due to missing 78476d315e1905 ("mctp: Add flow
   extension to skb")
- drivers/net/ethernet/hisilicon/hns3/hns3_enet.h:
   adjusted context conflict due to missing 87a9b2fd9288c5 ("net: hns3: add
   support for TX push mode")
- drivers/net/ethernet/mediatek/mtk_eth_soc.[c|h]
   Chunks omitted due to lack of page_pool support in the driver. Missing
   upstream commit 23233e577ef973 ("net: ethernet: mtk_eth_soc: rely on
   page_pool for single page buffers")
- drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c
   adjusted context conflict due to missing 67f245c2ec0af1 ("mlx5:
   bpf_xdp_metadata_rx_hash add xdp rss hash type")
- drivers/net/ethernet/microsoft/mana/mana_en.c
   adjusted context conflict due to missing 92272ec4107ef4 ("eth: add
   missing xdp.h includes in drivers")
- drivers/net/veth.c
   Chunks omitted due to missing 0ebab78cbcbfd6 ("net: veth: add page_pool
   for page recycling")
- Unmerged paths (missing in RHEL):
   drivers/net/ethernet/engleder/tsnep_main.c,
   drivers/net/ethernet/microchip/lan966x/lan966x_fdma.c,
   drivers/net/ethernet/microchip/lan966x/lan966x_main.h,
   drivers/net/ethernet/wangxun/libwx/wx_lib.c

Upstream commit(s):
commit a9ca9f9ceff382b58b488248f0c0da9e157f5d06
Author: Yunsheng Lin <linyunsheng@huawei.com>
Date:   Fri Aug 4 20:05:24 2023 +0200

    page_pool: split types and declarations from page_pool.h

    Split types and pure function declarations from page_pool.h
    and add them in page_pool/types.h, so that C sources can
    include page_pool.h and headers should generally only include
    page_pool/types.h, as suggested by Jakub.
    Rename page_pool.h to page_pool/helpers.h to have both in
    one place.

    Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
    Suggested-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
    Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
    Link: https://lore.kernel.org/r/20230804180529.2483231-2-aleksander.lobakin@intel.com
    [Jakub: change microsoft/mana, fix kdoc paths in Documentation]
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Petr Oros <poros@redhat.com>
2023-11-30 19:11:24 +01:00
Petr Oros 4dc9eab8bd docs: net: page_pool: use kdoc to avoid duplicating the information
JIRA: https://issues.redhat.com/browse/RHEL-16983

Conflicts:
- adjusted conflict in Documentation/networking/page_pool.rst due to
  missing 535b9c61bdef60 net: page_pool: hide page_pool_release_page()

Upstream commit(s):
commit 82e896d992fa631cda1f63239fd47b3ab781ffa6
Author: Jakub Kicinski <kuba@kernel.org>
Date:   Wed Aug 2 09:18:21 2023 -0700

    docs: net: page_pool: use kdoc to avoid duplicating the information

    All struct members of the driver-facing APIs are documented twice,
    in the code and under Documentation. This is a bit tedious.

    I also get the feeling that a lot of developers will read the header
    when coding, rather than the doc. Bring the two a little closer
    together by using kdoc for structs and functions.

    Using kdoc also gives us links (mentioning a function or struct
    in the text gets replaced by a link to its doc).

    Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
    Tested-by: Randy Dunlap <rdunlap@infradead.org>
    Acked-by: Jesper Dangaard Brouer <hawk@kernel.org>
    Link: https://lore.kernel.org/r/20230802161821.3621985-3-kuba@kernel.org
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Petr Oros <poros@redhat.com>
2023-11-30 14:37:59 +01:00
Felix Maurer 9c2bc8bcad net: page_pool: add missing free_percpu when page_pool_init fail
JIRA: https://issues.redhat.com/browse/RHEL-15913

commit 8ffbd1669ed1d58939d6e878dffaa2f60bf961a4
Author: Jian Shen <shenjian15@huawei.com>
Date:   Mon Oct 30 17:12:56 2023 +0800

    net: page_pool: add missing free_percpu when page_pool_init fail
    
    When ptr_ring_init() returns failure in page_pool_init(), free_percpu()
    is not called to free pool->recycle_stats, which may cause memory
    leak.
    
    Fixes: ad6fa1e1ab1b ("page_pool: Add recycle stats")
    Signed-off-by: Jian Shen <shenjian15@huawei.com>
    Signed-off-by: Jijie Shao <shaojijie@huawei.com>
    Reviewed-by: Yunsheng Lin <linyunsheng@huawei.com>
    Reviewed-by: Jiri Pirko <jiri@nvidia.com>
    Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
    Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
    Link: https://lore.kernel.org/r/20231030091256.2915394-1-shaojijie@huawei.com
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Signed-off-by: Felix Maurer <fmaurer@redhat.com>
2023-11-07 15:32:02 +01:00
Ivan Vecera 734b571259 page_pool: unlink from napi during destroy
JIRA: https://issues.redhat.com/browse/RHEL-12613

commit dd64b232deb8d48812a2ea739d1fedaeaffb59ed
Author: Jakub Kicinski <kuba@kernel.org>
Date:   Wed Apr 19 11:20:06 2023 -0700

    page_pool: unlink from napi during destroy

    Jesper points out that we must prevent recycling into cache
    after page_pool_destroy() is called, because page_pool_destroy()
    is not synchronized with recycling (some pages may still be
    outstanding when destroy() gets called).

    I assumed this will not happen because NAPI can't be scheduled
    if its page pool is being destroyed. But I missed the fact that
    NAPI may get reused. For instance when user changes ring configuration
    driver may allocate a new page pool, stop NAPI, swap, start NAPI,
    and then destroy the old pool. The NAPI is running so old page
    pool will think it can recycle to the cache, but the consumer
    at that point is the destroy() path, not NAPI.

    To avoid extra synchronization let the drivers do "unlinking"
    during the "swap" stage while NAPI is indeed disabled.

    Fixes: 8c48eea3adf3 ("page_pool: allow caching from safely localized NAPI")
    Reported-by: Jesper Dangaard Brouer <jbrouer@redhat.com>
    Link: https://lore.kernel.org/all/e8df2654-6a5b-3c92-489d-2fe5e444135f@redhat.com/
    Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
    Link: https://lore.kernel.org/r/20230419182006.719923-1-kuba@kernel.org
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Ivan Vecera <ivecera@redhat.com>
2023-10-31 15:09:26 +01:00
Ivan Vecera d80ce17d20 page_pool: allow caching from safely localized NAPI
JIRA: https://issues.redhat.com/browse/RHEL-12613

Conflicts:
- simple context conflict in net/core/dev.c due to absence of commit
  8b43fd3d1d7d8 ("net: optimize ____napi_schedule() to avoid extra
  NET_RX_SOFTIRQ") that is out of scope of this series

commit 8c48eea3adf3119e0a3fc57bd31f6966f26ee784
Author: Jakub Kicinski <kuba@kernel.org>
Date:   Wed Apr 12 21:26:04 2023 -0700

    page_pool: allow caching from safely localized NAPI

    Recent patches to mlx5 mentioned a regression when moving from
    driver local page pool to only using the generic page pool code.
    Page pool has two recycling paths (1) direct one, which runs in
    safe NAPI context (basically consumer context, so producing
    can be lockless); and (2) via a ptr_ring, which takes a spin
    lock because the freeing can happen from any CPU; producer
    and consumer may run concurrently.

    Since the page pool code was added, Eric introduced a revised version
    of deferred skb freeing. TCP skbs are now usually returned to the CPU
    which allocated them, and freed in softirq context. This places the
    freeing (producing of pages back to the pool) enticingly close to
    the allocation (consumer).

    If we can prove that we're freeing in the same softirq context in which
    the consumer NAPI will run - lockless use of the cache is perfectly fine,
    no need for the lock.

    Let drivers link the page pool to a NAPI instance. If the NAPI instance
    is scheduled on the same CPU on which we're freeing - place the pages
    in the direct cache.

    With that and patched bnxt (XDP enabled to engage the page pool, sigh,
    bnxt really needs page pool work :() I see a 2.6% perf boost with
    a TCP stream test (app on a different physical core than softirq).

    The CPU use of relevant functions decreases as expected:

      page_pool_refill_alloc_cache   1.17% -> 0%
      _raw_spin_lock                 2.41% -> 0.98%

    Only consider lockless path to be safe when NAPI is scheduled
    - in practice this should cover majority if not all of steady state
    workloads. It's usually the NAPI kicking in that causes the skb flush.

    The main case we'll miss out on is when application runs on the same
    CPU as NAPI. In that case we don't use the deferred skb free path.
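The recycling decision described above boils down to a small predicate. A minimal sketch (plain Python, hypothetical names; the kernel tracks this through the pool's linked NAPI instance and its scheduling state):

```python
# Direct, lockless cache use is only safe when the freeing happens on
# the CPU where the pool's NAPI is currently scheduled; otherwise the
# page must take the spin-locked ptr_ring path.

def can_recycle_direct(napi, current_cpu, allow_direct):
    # Caller already known to be in the pool's NAPI context.
    if allow_direct:
        return True
    # Otherwise, only recycle directly when the pool's NAPI is
    # scheduled on the CPU doing the freeing.
    return (napi is not None
            and napi["scheduled"]
            and napi["cpu"] == current_cpu)

napi = {"scheduled": True, "cpu": 2}
assert can_recycle_direct(napi, 2, False)      # same CPU: lockless cache
assert not can_recycle_direct(napi, 3, False)  # remote CPU: ptr_ring
napi["scheduled"] = False
assert not can_recycle_direct(napi, 2, False)  # NAPI idle: not safe
```

Requiring NAPI to be scheduled is what makes the check conservative: it covers the steady-state case where the NAPI kick triggers the skb flush, at the cost of missing the app-on-same-CPU case noted above.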

    Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
    Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
    Tested-by: Dragos Tatulea <dtatulea@nvidia.com>
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Ivan Vecera <ivecera@redhat.com>
2023-10-31 15:09:26 +01:00
Felix Maurer ce4cb58f61 page_pool: fix inconsistency for page_pool_ring_[un]lock()
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2218483
commit 368d3cb406cdd074d1df2ad9ec06d1bfcb664882
Author: Yunsheng Lin <linyunsheng@huawei.com>
Date:   Mon May 22 11:17:14 2023 +0800

    page_pool: fix inconsistency for page_pool_ring_[un]lock()
    
    page_pool_ring_[un]lock() use in_softirq() to decide which
    spin lock variant to use, and when they are called in the
    context with in_softirq() being false, spin_lock_bh() is
    called in page_pool_ring_lock() while spin_unlock() is
    called in page_pool_ring_unlock(), because spin_lock_bh()
    has disabled the softirq in page_pool_ring_lock(), which
    causes inconsistency for spin lock pair calling.
    
    This patch fixes it by returning in_softirq state from
    page_pool_producer_lock(), and use it to decide which
    spin lock variant to use in page_pool_producer_unlock().
    
    As pool->ring has both a producer and a consumer lock,
    rename the helpers to page_pool_producer_[un]lock() to
    reflect the actual usage. Also move them to page_pool.c,
    as they are only used there, and remove the 'inline', as
    the compiler may have a better idea of whether to inline
    or not.
    
    Fixes: 7886244736 ("net: page_pool: Add bulk support for ptr_ring")
    Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
    Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
    Acked-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
    Link: https://lore.kernel.org/r/20230522031714.5089-1-linyunsheng@huawei.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Felix Maurer <fmaurer@redhat.com>
2023-06-29 12:37:10 +02:00
Felix Maurer a9ac84a3b0 net: page_pool: use in_softirq() instead
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2178930

commit 542bcea4be866b14b3a5c8e90773329066656c43
Author: Qingfang DENG <qingfang.deng@siflower.com.cn>
Date:   Fri Feb 3 09:16:11 2023 +0800

    net: page_pool: use in_softirq() instead

    We use BH context only for synchronization, so we don't care if it's
    actually serving softirq or not.

    As a side note, in case of threaded NAPI, in_serving_softirq() will
    return false because it's in process context with BH off, making
    page_pool_recycle_in_cache() unreachable.

    Signed-off-by: Qingfang DENG <qingfang.deng@siflower.com.cn>
    Tested-by: Felix Fietkau <nbd@nbd.name>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Felix Maurer <fmaurer@redhat.com>
2023-06-13 22:45:48 +02:00
Chris von Recklinghausen 15416c6b7e mm/swap: convert __put_page() to __folio_put()
Bugzilla: https://bugzilla.redhat.com/2160210

commit 8d29c7036f5ff360ea1f51b9fed5d909be7c8094
Author: Matthew Wilcox (Oracle) <willy@infradead.org>
Date:   Fri Jun 17 18:50:13 2022 +0100

    mm/swap: convert __put_page() to __folio_put()

    Saves 11 bytes of text by removing a check of PageTail.

    Link: https://lkml.kernel.org/r/20220617175020.717127-16-willy@infradead.org
    Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2023-03-24 11:19:20 -04:00
Felix Maurer 1cc26eafb4 net: page_pool: optimize page pool page allocation in NUMA scenario
Bugzilla: https://bugzilla.redhat.com/2137876

commit d810d367ec40a1031173a447bd0146cf48e98733
Author: Jie Wang <wangjie125@huawei.com>
Date:   Tue Jul 5 19:35:15 2022 +0800

    net: page_pool: optimize page pool page allocation in NUMA scenario
    
    Currently, NIC packet receiving performance based on page pool
    deteriorates occasionally. To analyze the causes of this problem,
    page allocation stats were collected. Here are the stats when NIC
    rx performance deteriorates:
    
    bandwidth(Gbits/s)		16.8		6.91
    rx_pp_alloc_fast		13794308	21141869
    rx_pp_alloc_slow		108625		166481
    rx_pp_alloc_slow_h		0		0
    rx_pp_alloc_empty		8192		8192
    rx_pp_alloc_refill		0		0
    rx_pp_alloc_waive		100433		158289
    rx_pp_recycle_cached		0		0
    rx_pp_recycle_cache_full	0		0
    rx_pp_recycle_ring		362400		420281
    rx_pp_recycle_ring_full		6064893		9709724
    rx_pp_recycle_released_ref	0		0
    
    The rx_pp_alloc_waive count indicates that a large number of pages'
    NUMA nodes are inconsistent with the NIC device's NUMA node.
    Therefore these pages can't be reused by the page pool. As a result,
    many new pages would be allocated by __page_pool_alloc_pages_slow,
    which is time consuming. This causes the NIC rx performance
    fluctuations.
    
    The main reason for the large number of NUMA-mismatched pages in the
    page pool is that page pool uses alloc_pages_bulk_array to allocate
    the original pages. This function is not suitable for page
    allocation in NUMA scenarios. So this patch uses
    alloc_pages_bulk_array_node, which has a NUMA id input parameter, to
    ensure NUMA consistency between the NIC device and the allocated
    pages.
    
    Repeated NIC rx performance tests were performed 40 times. NIC rx
    bandwidth is higher and more stable compared to the data above. Here
    are three test stats; the rx_pp_alloc_waive count is zero and
    rx_pp_alloc_slow, which indicates pages allocated from the slow
    path, is relatively low.
    
    bandwidth(Gbits/s)		93		93.9		93.8
    rx_pp_alloc_fast		60066264	61266386	60938254
    rx_pp_alloc_slow		16512		16517		16539
    rx_pp_alloc_slow_ho		0		0		0
    rx_pp_alloc_empty		16512		16517		16539
    rx_pp_alloc_refill		473841		481910		481585
    rx_pp_alloc_waive		0		0		0
    rx_pp_recycle_cached		0		0		0
    rx_pp_recycle_cache_full	0		0		0
    rx_pp_recycle_ring		29754145	30358243	30194023
    rx_pp_recycle_ring_full		0		0		0
    rx_pp_recycle_released_ref	0		0		0
    
    Signed-off-by: Jie Wang <wangjie125@huawei.com>
    Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
    Acked-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
    Link: https://lore.kernel.org/r/20220705113515.54342-1-huangguangbin2@huawei.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Felix Maurer <fmaurer@redhat.com>
2023-01-05 15:46:51 +01:00
Felix Maurer 88509dacf2 net: page_pool: add page allocation stats for two fast page allocate path
Bugzilla: https://bugzilla.redhat.com/2120968

commit 0f6deac3a07958195173119627502350925dce78
Author: Jie Wang <wangjie125@huawei.com>
Date:   Thu May 12 14:56:31 2022 +0800

    net: page_pool: add page allocation stats for two fast page allocate path
    
    Currently, page pool allocation stats can be used to analyze RX
    performance degradation problems. These stats only count pages
    allocated from page_pool_alloc_pages. But NIC drivers such as hns3
    use page_pool_dev_alloc_frag to allocate pages, so page stats in
    this API should also be counted.
    
    Signed-off-by: Jie Wang <wangjie125@huawei.com>
    Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Felix Maurer <fmaurer@redhat.com>
2022-11-30 12:47:10 +02:00
Felix Maurer e4c2252001 net: page_pool: introduce ethtool stats
Bugzilla: https://bugzilla.redhat.com/2120968

commit f3c5264f452a5b0ac1de1f2f657efbabdea3c76a
Author: Lorenzo Bianconi <lorenzo@kernel.org>
Date:   Tue Apr 12 18:31:58 2022 +0200

    net: page_pool: introduce ethtool stats
    
    Introduce page_pool APIs to report stats through ethtool and reduce
    duplicated code in each driver.
    
    Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
    Reviewed-by: Jakub Kicinski <kuba@kernel.org>
    Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Felix Maurer <fmaurer@redhat.com>
2022-11-30 12:47:09 +02:00
Yauheni Kaliuta 3c9b8c39bc page_pool: Add recycle stats to page_pool_put_page_bulk
Bugzilla: https://bugzilla.redhat.com/2120968

commit 590032a4d2133ecc10d3078a8db1d85a4842f12c
Author: Lorenzo Bianconi <lorenzo@kernel.org>
Date:   Mon Apr 11 16:05:26 2022 +0200

    page_pool: Add recycle stats to page_pool_put_page_bulk
    
    Add missing recycle stats to page_pool_put_page_bulk routine.
    
    Reviewed-by: Joe Damato <jdamato@fastly.com>
    Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
    Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
    Link: https://lore.kernel.org/r/3712178b51c007cfaed910ea80e68f00c916b1fa.1649685634.git.lorenzo@kernel.org
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Signed-off-by: Yauheni Kaliuta <ykaliuta@redhat.com>
2022-11-28 16:48:59 +02:00
Jiri Benc b52705c258 page_pool: Add function to batch and return stats
Bugzilla: https://bugzilla.redhat.com/2120966

commit 6b95e3388b1ea0ca63500c5a6e39162dbf828433
Author: Joe Damato <jdamato@fastly.com>
Date:   Tue Mar 1 23:55:49 2022 -0800

    page_pool: Add function to batch and return stats

    Adds a function page_pool_get_stats which can be used by drivers to obtain
    stats for a specified page_pool.

    Signed-off-by: Joe Damato <jdamato@fastly.com>
    Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
    Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Jiri Benc <jbenc@redhat.com>
2022-10-25 14:57:59 +02:00
Jiri Benc 249dfc0fd8 page_pool: Add recycle stats
Bugzilla: https://bugzilla.redhat.com/2120966

commit ad6fa1e1ab1b8164f1ba296b1b4dc556a483bcad
Author: Joe Damato <jdamato@fastly.com>
Date:   Tue Mar 1 23:55:48 2022 -0800

    page_pool: Add recycle stats

    Add per-cpu stats tracking page pool recycling events:
    	- cached: recycling placed page in the page pool cache
    	- cache_full: page pool cache was full
    	- ring: page placed into the ptr ring
    	- ring_full: page released from page pool because the ptr ring was full
    	- released_refcnt: page released (and not recycled) because refcnt > 1

    Signed-off-by: Joe Damato <jdamato@fastly.com>
    Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
    Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Jiri Benc <jbenc@redhat.com>
2022-10-25 14:57:59 +02:00
Jiri Benc ff85690598 page_pool: Add allocation stats
Bugzilla: https://bugzilla.redhat.com/2120966

commit 8610037e8106b48c79cfe0afb92b2b2466e51c3d
Author: Joe Damato <jdamato@fastly.com>
Date:   Tue Mar 1 23:55:47 2022 -0800

    page_pool: Add allocation stats

    Add per-pool statistics counters for the allocation path of a page pool.
    These stats are incremented in softirq context, so no locking or per-cpu
    variables are needed.

    This code is disabled by default and a kernel config option is provided for
    users who wish to enable them.

    The statistics added are:
    	- fast: successful fast path allocations
    	- slow: slow path order-0 allocations
    	- slow_high_order: slow path high order allocations
    	- empty: ptr ring is empty, so a slow path allocation was forced.
    	- refill: an allocation which triggered a refill of the cache
    	- waive: pages obtained from the ptr ring that cannot be added to
    	  the cache due to a NUMA mismatch.
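How these counters relate to the allocation path can be sketched in user space. This is plain Python with an illustrative pool layout (the `cache`/`ring`/`nid` dict fields are assumptions, not the kernel's structures):

```python
# Toy allocation path mirroring which counter is bumped where:
# fast   - cache hit; refill - cache refilled from the ptr ring;
# waive  - ring page dropped on NUMA mismatch;
# empty/slow - nothing usable, slow path allocation forced.

def alloc(pool, stats):
    # fast path: serve straight from the per-pool cache
    if pool["cache"]:
        stats["fast"] += 1
        return pool["cache"].pop()
    # try to refill the cache from the ptr ring
    if pool["ring"]:
        stats["refill"] += 1
        while pool["ring"]:
            page = pool["ring"].pop()
            if page["nid"] != pool["nid"]:
                stats["waive"] += 1    # NUMA mismatch: can't cache it
                continue
            pool["cache"].append(page)
        if pool["cache"]:
            return pool["cache"].pop()
    # nothing usable: a slow path allocation is forced
    stats["empty"] += 1
    stats["slow"] += 1
    return {"nid": pool["nid"]}

pool = {"nid": 0, "cache": [], "ring": [{"nid": 1}, {"nid": 0}]}
stats = dict.fromkeys(("fast", "slow", "empty", "refill", "waive"), 0)

page = alloc(pool, stats)   # refill from the ring, waiving the remote page
pool["cache"].append(page)  # pretend the page was recycled into the cache
alloc(pool, stats)          # now a fast path hit
alloc(pool, stats)          # cache and ring empty: slow path forced
assert stats == {"fast": 1, "slow": 1, "empty": 1, "refill": 1, "waive": 1}
```

Since all of this runs in softirq context in the kernel, plain (non-atomic) counters suffice, which is why the stats can be compiled out at no cost.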

    Signed-off-by: Joe Damato <jdamato@fastly.com>
    Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
    Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Jiri Benc <jbenc@redhat.com>
2022-10-25 14:57:59 +02:00
Jiri Benc 0700a9a4b5 page_pool: Refactor page_pool to enable fragmenting after allocation
Bugzilla: https://bugzilla.redhat.com/2120966

commit 52cc6ffc0ab2c61a76127b9347567fc97c15582f
Author: Alexander Duyck <alexanderduyck@fb.com>
Date:   Mon Jan 31 08:40:01 2022 -0800

    page_pool: Refactor page_pool to enable fragmenting after allocation

    This change is meant to permit a driver to perform "fragmenting" of the
    page from within the driver instead of the current model which requires
    pre-partitioning the page. The main motivation behind this is to support
    use cases where the page will be split up by the driver after DMA instead
    of before.

    With this change it becomes possible to start using page pool to replace
    some of the existing use cases where multiple references were being used
    for a single page, but the number needed was unknown as the size could be
    dynamic.

    For example, with this code it would be possible to do something like
    the following to handle allocation:
      page = page_pool_alloc_pages();
      if (!page)
        return NULL;
      page_pool_fragment_page(page, DRIVER_PAGECNT_BIAS_MAX);
      rx_buf->page = page;
      rx_buf->pagecnt_bias = DRIVER_PAGECNT_BIAS_MAX;

    Then we would process a received buffer by handling it with:
      rx_buf->pagecnt_bias--;

    Once the page has been fully consumed we could then flush the remaining
    instances with:
      if (page_pool_defrag_page(page, rx_buf->pagecnt_bias))
        continue;
      page_pool_put_defragged_page(pool, page, -1, !!budget);

    The general idea is that we want to have the ability to allocate a page
    with excess fragment count and then trim off the unneeded fragments.
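
The bias scheme above can be exercised in a self-contained user-space sketch. `struct fake_page`, `fragment_page`, and `defrag_page` below are illustrative stand-ins for the page_pool helpers; following the kernel convention, the defrag helper returns the remaining fragment count, so a nonzero return means other users still hold fragments and the page cannot be released yet.

```c
#include <assert.h>

/* Sketch of post-allocation fragmenting: a page starts with an
 * oversized fragment count, buffers are handed out by decrementing a
 * driver-side bias, and leftover fragments are trimmed at flush time. */
struct fake_page {
	long pp_frag_count; /* stand-in for the per-page fragment counter */
};

#define BIAS_MAX 1024L /* illustrative DRIVER_PAGECNT_BIAS_MAX */

static void fragment_page(struct fake_page *p, long nr)
{
	/* mirrors page_pool_fragment_page(): arm the excess count */
	p->pp_frag_count = nr;
}

static long defrag_page(struct fake_page *p, long nr)
{
	/* trim nr fragments; remaining count of 0 means releasable */
	p->pp_frag_count -= nr;
	return p->pp_frag_count;
}
```

For example, after handing three buffers to the stack (three decrements of the driver's `pagecnt_bias`), trimming the leftover bias leaves a fragment count of 3, one per outstanding buffer, and the page is only releasable once those are returned.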

    Signed-off-by: Alexander Duyck <alexanderduyck@fb.com>
    Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Jiri Benc <jbenc@redhat.com>
2022-10-25 14:57:54 +02:00
Felix Maurer 559e95e23d page_pool: remove spinlock in page_pool_refill_alloc_cache()
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2071620

commit 07b17f0f7485bcbc7902cf6f56a89f5b716344bd
Author: Yunsheng Lin <linyunsheng@huawei.com>
Date:   Fri Jan 7 17:00:42 2022 +0800

    page_pool: remove spinlock in page_pool_refill_alloc_cache()

    page_pool_refill_alloc_cache() is only called by
    __page_pool_get_cached(), which assumes non-concurrent access (as
    the comment in __page_pool_get_cached() notes), and ptr_ring
    already allows concurrent access between consumer and producer,
    so remove the spinlock in page_pool_refill_alloc_cache().
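
The property being relied on can be shown with a minimal single-producer/single-consumer ring. This is an illustrative sketch, not the kernel's ptr_ring (which additionally handles memory barriers and multi-producer locking): when the producer only writes `head` and the consumer only writes `tail`, a consumer that is already serialized, as the softirq-context caller here is, needs no extra spinlock.

```c
#include <assert.h>
#include <stddef.h>

#define RING_SZ 8u /* power of two */

/* Minimal SPSC ring: one index per side, no shared locks. */
struct spsc_ring {
	void *slot[RING_SZ];
	unsigned int head; /* written only by the producer */
	unsigned int tail; /* written only by the consumer */
};

static int ring_produce(struct spsc_ring *r, void *p)
{
	if (r->head - r->tail == RING_SZ)
		return -1; /* full */
	r->slot[r->head % RING_SZ] = p;
	r->head++; /* the kernel orders this store with barriers */
	return 0;
}

static void *ring_consume(struct spsc_ring *r)
{
	void *p;

	if (r->head == r->tail)
		return NULL; /* empty */
	p = r->slot[r->tail % RING_SZ];
	r->tail++;
	return p;
}
```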

    Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
    Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
    Link: https://lore.kernel.org/r/20220107090042.13605-1-linyunsheng@huawei.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Felix Maurer <fmaurer@redhat.com>
2022-08-24 12:53:57 +02:00