MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/6240
virtio-net: fix overflow inside virtnet_rq_alloc
JIRA: https://issues.redhat.com/browse/RHEL-73638
CVE: CVE-2024-57843
Upstream: Merged
commit 6aacd1484468361d1d04badfe75f264fa5314864
Author: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Date: Tue Oct 29 16:46:12 2024 +0800
virtio-net: fix overflow inside virtnet_rq_alloc
When the frag holds only a single page, this may lead to a regression in
the VM. In particular, if the sysctl net.core.high_order_alloc_disable is
set to 1, the frag always gets a single page on refill.
This can cause reliable crashes or scp failures (for example, when copying
a 100M file to the VM).
The issue is that the virtnet_rq_dma takes up 16 bytes at the beginning
of a new frag. When the frag size is larger than PAGE_SIZE,
everything is fine. However, if the frag is only one page and the
total size of the buffer and virtnet_rq_dma is larger than one page, an
overflow may occur.
Commit f9dac92ba908 ("virtio_ring: enable premapped mode whatever
use_dma_api") introduced this problem. Some commits were reverted to
work around it in the last Linux release. Now we re-enable the feature
and fix the bug directly.
Here, when the frag is too small, we reduce the buffer length so that the
buffer and virtnet_rq_dma fit inside the frag.
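The arithmetic behind the overflow can be illustrated with a small userspace sketch. This is not the driver code: the helper name and the 4096-byte page size are assumptions made for illustration, and only the 16-byte header size comes from the description above.

```c
#include <stddef.h>

#define SKETCH_PAGE_SIZE   4096u  /* assumed page size */
#define SKETCH_RQ_DMA_SIZE 16u    /* header size quoted in the commit message */

/*
 * Hypothetical helper: given a frag of frag_size bytes whose first
 * SKETCH_RQ_DMA_SIZE bytes hold the DMA bookkeeping header, return the
 * largest buffer length (at most want_len) that still fits in the frag.
 */
static size_t sketch_clamp_buf_len(size_t frag_size, size_t want_len)
{
    size_t room;

    if (frag_size <= SKETCH_RQ_DMA_SIZE)
        return 0;                       /* no room left for payload */

    room = frag_size - SKETCH_RQ_DMA_SIZE;
    return want_len > room ? room : want_len;
}
```

With a one-page frag, a request for a full page is trimmed to 4080 bytes, while a 32 KiB frag leaves the request untouched.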
Reported-by: "Si-Wei Liu" <si-wei.liu@oracle.com>
Tested-by: Darren Kenny <darren.kenny@oracle.com>
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Jon Maloy <jmaloy@redhat.com>
Approved-by: Laurent Vivier <lvivier@redhat.com>
Approved-by: Eugenio Pérez <eperezma@redhat.com>
Approved-by: CKI KWF Bot <cki-ci-bot+kwf-gitlab-com@redhat.com>
Merged-by: Patrick Talbert <ptalbert@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-68253
CVE: CVE-2024-53082
Upstream: Merged
commit 3f7d9c1964fcd16d02a8a9d4fd6f6cb60c4cc530
Author: Philo Lu <lulie@linux.alibaba.com>
Date: Mon Nov 4 16:57:04 2024 +0800
virtio_net: Add hash_key_length check
Add a hash_key_length check in virtnet_probe() to avoid possible
out-of-bounds errors when setting/reading the hash key.
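A minimal sketch of this kind of probe-time check, in plain userspace C. The struct, the function name, and the 40-byte maximum are assumptions for illustration and are not taken from the driver.

```c
#include <errno.h>
#include <stddef.h>

#define SKETCH_MAX_KEY_SIZE 40  /* assumed size of the driver-side key storage */

/* Hypothetical driver-side storage for the RSS hash key. */
struct sketch_rss {
    unsigned char key[SKETCH_MAX_KEY_SIZE];
    size_t key_len;
};

/*
 * Reject a device-reported key length that does not fit the local buffer,
 * so that later reads and writes of the key cannot run past its end.
 */
static int sketch_probe_key_len(struct sketch_rss *rss, size_t device_key_len)
{
    if (device_key_len > sizeof(rss->key))
        return -EINVAL;

    rss->key_len = device_key_len;
    return 0;
}
```

Validating once at probe time means every later user of key_len can rely on it being within bounds.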
Fixes: c7114b1249fa ("drivers/net/virtio_net: Added basic RSS support.")
Signed-off-by: Philo Lu <lulie@linux.alibaba.com>
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Joe Damato <jdamato@fastly.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Jon Maloy <jmaloy@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-3230
For the rq, there are three cases in which buffers are obtained from the
virtio core:
1. virtqueue_get_buf{,_ctx}
2. virtqueue_detach_unused_buf
3. callback for virtqueue_resize
But in commit 295525e29a5b ("virtio_net: merge dma operations when
filling mergeable buffers"), I missed the dma unmap for case #3.
That leaks memory, because the pages referred to by the unused buffers
are never released.
Running a script such as the following drives the system to OOM:
    while true
    do
        ethtool -G ens4 rx 128
        ethtool -G ens4 rx 256
        free -m
    done
Fixes: 295525e29a5b ("virtio_net: merge dma operations when filling mergeable buffers")
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Link: https://lore.kernel.org/r/20231226094333.47740-1-xuanzhuo@linux.alibaba.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
(cherry picked from commit 2311e06b9bf3d44e15f9175af177a782806f688f)
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Conflicts:
in drivers/net/virtio_net.c
Contextual conflict in drivers/net/virtio_net.c because we don't have
struct virtio_net_common_hdr downstream, introduced by
dae64749db25 ("virtio_net: Introduce skb_vnet_common_hdr to avoid
typecasting")
JIRA: https://issues.redhat.com/browse/RHEL-3230
Use DEV_STATS_INC() and DEV_STATS_READ() which provide
atomicity on paths that can be used concurrently.
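The idea can be sketched in userspace with C11 atomics. This is an analogy only: the kernel macros are implemented differently, and the struct and function names here are hypothetical.

```c
#include <stdatomic.h>

/* Hypothetical per-device counter that several contexts may bump at once. */
struct sketch_stats {
    atomic_ulong rx_dropped;
};

/* Atomic increment: concurrent callers cannot lose updates. */
static void sketch_stats_inc(struct sketch_stats *s)
{
    atomic_fetch_add_explicit(&s->rx_dropped, 1, memory_order_relaxed);
}

/* Atomic read: never observes a torn value. */
static unsigned long sketch_stats_read(struct sketch_stats *s)
{
    return atomic_load_explicit(&s->rx_dropped, memory_order_relaxed);
}
```

A plain `counter++` is a read-modify-write sequence, so two CPUs executing it concurrently can both store the same value; the atomic form avoids that.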
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit d12a26b74fb77434b73fe39022266c4b00907219)
Signed-off-by: Eric Auger <eric.auger@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-3230
Commit 295525e29a5b ("virtio_net: merge dma operations when filling
mergeable buffers") unmaps the buffer with DMA_ATTR_SKIP_CPU_SYNC when
the dma->ref is zero. We do that with DMA_ATTR_SKIP_CPU_SYNC, because we
do not want to do the sync for the entire page_frag. But that misses the
sync for the current area.
This patch does cpu sync regardless of whether the ref is zero or not.
Fixes: 295525e29a5b ("virtio_net: merge dma operations when filling mergeable buffers")
Reported-by: Michael Roth <michael.roth@amd.com>
Closes: http://lore.kernel.org/all/20230926130451.axgodaa6tvwqs3ut@amd.com
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit 5720c43d5216b5dbd9ab25595f7c61e55d36d4fc)
Signed-off-by: Eric Auger <eric.auger@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-3230
Currently, the virtio core performs a dma operation for each buffer,
even though the same page may be operated on multiple times.
With this patch, the driver performs the dma operation and manages the
dma address itself, based on the premapped feature of the virtio core.
This way, only one dma operation is needed for the pages of the
allocated frag. This is beneficial for iommu devices.
kernel command line: intel_iommu=on iommu.passthrough=0
       | strict=0   | strict=1
Before |  775496pps | 428614pps
After  | 1109316pps | 742853pps
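The saving comes from mapping a frag once and sharing that mapping between all buffers carved from it. A minimal userspace sketch of such reference counting follows; the names are hypothetical, the counters stand in for real DMA calls, and the driver's actual bookkeeping differs in detail.

```c
/* Hypothetical bookkeeping for one frag shared by several rx buffers. */
struct sketch_frag_dma {
    unsigned int ref;          /* buffers currently using the mapping */
    unsigned int map_calls;    /* stand-in for DMA map operations */
    unsigned int unmap_calls;  /* stand-in for DMA unmap operations */
};

/* Carve a buffer: map only if no mapping is currently live. */
static void sketch_buf_alloc(struct sketch_frag_dma *d)
{
    if (d->ref == 0)
        d->map_calls++;
    d->ref++;
}

/* Return a buffer: unmap only when the last user goes away. */
static void sketch_buf_free(struct sketch_frag_dma *d)
{
    if (d->ref == 0)
        return;                /* nothing outstanding */
    if (--d->ref == 0)
        d->unmap_calls++;
}
```

Carving four buffers from one frag costs one map and one unmap in this model, instead of four of each when every buffer is mapped individually.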
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Message-Id: <20230810123057.43407-13-xuanzhuo@linux.alibaba.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
(cherry picked from commit 295525e29a5b5694a6e96864f0c1365f79639863)
Signed-off-by: Eric Auger <eric.auger@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-31889
Conflicts:
- drivers/net/ethernet/fungible/funeth/funeth_ethtool.c
hunk removed as the file does not exist in RHEL
commit fb6e30a72539ce28c1323aef4190d35aac106f6f
Author: Ahmed Zaki <ahmed.zaki@intel.com>
Date: Tue Dec 12 17:33:14 2023 -0700
net: ethtool: pass a pointer to parameters to get/set_rxfh ethtool ops
The get/set_rxfh ethtool ops currently take the rxfh (RSS) parameters
as direct function arguments. This will force us to change the API (and
all drivers' functions) every time some new parameters are added.
This is part 1/2 of the fix, as suggested in [1]:
- First simplify the code by always providing a pointer to all params
(indir, key and func); the fact that some of them may be NULL seems
like a weird historic thing or a premature optimization.
It will simplify the drivers if all pointers are always present.
- Then make the functions take a dev pointer, and a pointer to a
single struct wrapping all arguments. The set_* should also take
an extack.
Link: https://lore.kernel.org/netdev/20231121152906.2dd5f487@kernel.org/ [1]
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Suggested-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Ahmed Zaki <ahmed.zaki@intel.com>
Link: https://lore.kernel.org/r/20231213003321.605376-2-ahmed.zaki@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-31916
Conflicts:
* include/linux/netdevice.h
Adjusted due to KABI reservations made by RHEL
commit 3b3a52715a ("net: exclude BPF/XDP from kABI")
commit 49e47a5b6145d86c30022fe0e949bbb24bae28ba
Author: Jakub Kicinski <kuba@kernel.org>
Date: Wed Aug 2 18:02:29 2023 -0700
net: move struct netdev_rx_queue out of netdevice.h
struct netdev_rx_queue is touched in only a few places
and having it defined in netdevice.h brings in the dependency
on xdp.h, because struct xdp_rxq_info gets embedded in
struct netdev_rx_queue.
In prep for removal of xdp.h from netdevice.h move all
the netdev_rx_queue stuff to a new header.
We could technically break the new header up to avoid
the sysfs.h include but it's so rarely included it
doesn't seem to be worth it at this point.
Reviewed-by: Amritha Nambiar <amritha.nambiar@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Acked-by: Jesper Dangaard Brouer <hawk@kernel.org>
Link: https://lore.kernel.org/r/20230803010230.1755386-3-kuba@kernel.org
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-12625
Conflicts:
* drivers/net/ethernet/freescale/enetc/enetc.c
- context due to missing 8feb020f92a5 ("net: ethernet: enetc: unlock
XDP_REDIRECT for XDP non-linear buffers")
* drivers/net/ethernet/fungible/funeth/funeth_rx.c
- removed hunk for non-existing file
* drivers/net/ethernet/marvell/mvneta.c
- context due to missing 76a676947b56 ("net: mvneta: update frags bit
before passing the xdp buffer to eBPF layer")
* drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
- adjusted due to missing 27602319e328 ("net/mlx5e: RX, Take shared
info fragment addition into a function")
commit b51f4113ebb02011f0ca86abc3134b28d2071b6a
Author: Yunsheng Lin <linyunsheng@huawei.com>
Date: Thu May 11 09:12:12 2023 +0800
net: introduce and use skb_frag_fill_page_desc()
Most users use __skb_frag_set_page()/skb_frag_off_set()/
skb_frag_size_set() to fill the page desc for a skb frag.
Introduce skb_frag_fill_page_desc() to do that.
net/bpf/test_run.c does not call skb_frag_off_set() to
set the offset, "copy_from_user(page_address(page), ...)"
and 'shinfo' being part of the 'data' kzalloced in
bpf_test_init() suggest that it is assuming offset to be
initialized as zero, so call skb_frag_fill_page_desc()
with offset being zero for this case.
Also, skb_frag_set_page() is not used anymore, so remove
it.
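The shape of such a helper can be sketched in userspace as follows. The struct below is a simplified stand-in with hypothetical field names, not the kernel's skb frag type.

```c
#include <stddef.h>

/* Simplified stand-in for a page fragment descriptor (names are assumptions). */
struct sketch_frag {
    void *page;
    unsigned int offset;
    unsigned int size;
};

/*
 * Fill all three fields in one call, replacing three separate setter
 * calls and making it impossible to forget the offset.
 */
static inline void sketch_frag_fill_page_desc(struct sketch_frag *frag,
                                              void *page,
                                              unsigned int offset,
                                              unsigned int size)
{
    frag->page = page;
    frag->offset = offset;
    frag->size = size;
}
```

A caller that previously relied on zero-initialized memory for the offset now passes 0 explicitly, which is the situation described for net/bpf/test_run.c above.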
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-346
commit accc1bf23068c1cdc4c2b015320ba856e210dd98
Author: Brett Creeley <brett.creeley@amd.com>
Date: Mon Jun 5 12:59:25 2023 -0700
virtio_net: use control_buf for coalesce params
Commit 699b045a8e43 ("net: virtio_net: notifications coalescing
support") added coalescing command support for virtio_net. However,
the coalesce commands are using buffers on the stack, which is causing
the device to see DMA errors. There should also be a complaint from
check_for_stack() in debug_dma_map_xyz(). Fix this by adding and using
coalesce params from the control_buf struct, which aligns with other
commands.
Cc: stable@vger.kernel.org
Fixes: 699b045a8e43 ("net: virtio_net: notifications coalescing support")
Reviewed-by: Shannon Nelson <shannon.nelson@amd.com>
Signed-off-by: Allen Hubbe <allen.hubbe@amd.com>
Signed-off-by: Brett Creeley <brett.creeley@amd.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Link: https://lore.kernel.org/r/20230605195925.51625-1-brett.creeley@amd.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-346
commit 5306623a9826aa7d63b32c6a3803c798a765474d
Author: Feng Liu <feliu@nvidia.com>
Date: Fri May 12 11:18:12 2023 -0400
virtio_net: Fix error unwinding of XDP initialization
When initializing XDP in virtnet_open(), the XDP initialization of some
rq may hit an error, causing the net device open to fail. However, the
previous rqs have already initialized XDP and enabled NAPI, which is not
the expected behavior. We need to roll back the previous rq initialization
to avoid leaks in the error unwinding of the init code.
Also extract helper functions for disabling and enabling queue pairs.
Use the newly introduced disable helper in the error unwinding and in
virtnet_close. Use the enable helper in virtnet_open.
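The unwinding pattern can be sketched in userspace like this. All names are hypothetical and the enable step is a stub; the point is only that a failure at queue pair i disables pairs i-1 down to 0 before returning.

```c
#include <stdbool.h>

#define SKETCH_MAX_QP 8

/* Hypothetical device with a fault-injection knob for the sketch. */
struct sketch_dev {
    int nqp;                       /* number of queue pairs */
    bool enabled[SKETCH_MAX_QP];
    int fail_at;                   /* index whose enable fails, -1 for none */
};

static int sketch_enable_qp(struct sketch_dev *d, int i)
{
    if (i == d->fail_at)
        return -1;
    d->enabled[i] = true;
    return 0;
}

static void sketch_disable_qp(struct sketch_dev *d, int i)
{
    d->enabled[i] = false;
}

static int sketch_open(struct sketch_dev *d)
{
    int i, err;

    for (i = 0; i < d->nqp; i++) {
        err = sketch_enable_qp(d, i);
        if (err < 0)
            goto err_unwind;
    }
    return 0;

err_unwind:
    /* Roll back only the pairs that were enabled, in reverse order. */
    for (i--; i >= 0; i--)
        sketch_disable_qp(d, i);
    return err;
}
```

Sharing the disable helper between the error path and the close path keeps the two teardown sequences from drifting apart.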
Fixes: 754b8a21a9 ("virtio_net: setup xdp_rxq_info")
Signed-off-by: Feng Liu <feliu@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: William Tu <witu@nvidia.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-346
commit 21e26a71f5d3cae2551abcda318263a6a8c15206
Author: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Date: Mon May 8 14:14:17 2023 +0800
virtio_net: introduce virtnet_build_skb()
This logic is used in multiple places, now we separate it into
a helper.
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-346
commit aef76506bc64bbf567490cbe437c26f1aadeee90
Author: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Date: Mon May 8 14:14:15 2023 +0800
virtio_net: small: remove skip_xdp
Because the skb build code is not shared between xdp and non-xdp, and
the xdp code in receive_small() is simpler, "skip_xdp" is not needed.
We can remove it.
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-346
commit 7af70fc169bd254aea780115b0f355956f84902b
Author: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Date: Mon May 8 14:14:14 2023 +0800
virtio_net: small: avoid code duplication in xdp scenarios
Avoid recalculating some variables (headroom and so on) when
processing xdp.
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-346
commit fc8ce84b09bcc7306f3128f783967ecbfa617207
Author: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Date: Mon May 8 14:14:13 2023 +0800
virtio_net: small: remove the delta
In the case of XDP-PASS, skb_reserve uses the "delta" to stay compatible
with non-XDP. Now that this code is not shared between xdp and non-xdp,
we can remove this logic.
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-346
commit c5f3e72f04c02cc2a0671adbb16224e61dc4bd8a
Author: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Date: Mon May 8 14:14:12 2023 +0800
virtio_net: introduce receive_small_xdp()
The purpose of this patch is to simplify receive_small().
Move all the XDP logic for small buffers into a separate function.
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-346
commit 59ba3b1a88a8251d8fd0ef847afb59aa83a47126
Author: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Date: Mon May 8 14:14:11 2023 +0800
virtio_net: merge: remove skip_xdp
Now that the logic of the mergeable xdp processing is simple, we can
remove skip_xdp.
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-346
commit d8f2835a4746f26523cb512dc17e2b0a00dd31a9
Author: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Date: Mon May 8 14:14:10 2023 +0800
virtio_net: introduce receive_mergeable_xdp()
The purpose of this patch is to simplify the receive_mergeable().
Separate all the logic of XDP into a function.
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-346
commit 4cb00b13c064088352a4f2ca4a8279010ad218a8
Author: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Date: Mon May 8 14:14:09 2023 +0800
virtio_net: virtnet_build_xdp_buff_mrg() auto release xdp shinfo
virtnet_build_xdp_buff_mrg() now releases the xdp shinfo automatically,
so the caller no longer needs to take care of the xdp shinfo.
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-346
commit 80f50f918c6e2d3f46aae717e6b04271298ab1c0
Author: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Date: Mon May 8 14:14:08 2023 +0800
virtio_net: separate the logic of freeing the rest mergeable buf
This patch introduces a new function that frees the remaining mergeable
bufs. The subsequent patch will reuse this function.
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-346
commit bb2c1e9e75be4a059fa795aac58b805091c9dae7
Author: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Date: Mon May 8 14:14:07 2023 +0800
virtio_net: separate the logic of freeing xdp shinfo
This patch introduces a new function that releases the
xdp shinfo. The subsequent patch will reuse this function.
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-346
commit 00765f8ed74240419091a1708c195405b55fe243
Author: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Date: Mon May 8 14:14:06 2023 +0800
virtio_net: introduce virtnet_xdp_handler() to seprate the logic of run xdp
At present, there are two similar pieces of logic that run the XDP prog.
Therefore, this patch separates out the code that executes XDP, which
makes later maintenance easier.
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-346
commit dbe4fec2447dd215964aad88b0e06f96c6958ee9
Author: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Date: Mon May 8 14:14:05 2023 +0800
virtio_net: optimize mergeable_xdp_get_buf()
To ease review, the previous patch did not modify the logic. This patch
applies some optimizations on top of it:
* remove some repeated logic in this function.
* add a fast check for passing without any alloc.
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-346
commit ad4858beb824aeba53deeae660ea7fab9e624bc0
Author: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Date: Mon May 8 14:14:04 2023 +0800
virtio_net: introduce mergeable_xdp_get_buf()
Separating the logic of preparation for xdp from receive_mergeable.
The purpose of this is to simplify the logic of execution of XDP.
The main logic here is that when headroom is insufficient, we need to
allocate a new page and calculate offset. It should be noted that if
there is new page, the variable page will refer to the new page.
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-346
commit 363d8ce4b94719a87dad865e2829f1dba0f7ef71
Author: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Date: Mon May 8 14:14:03 2023 +0800
virtio_net: mergeable xdp: put old page immediately
In the xdp implementation of virtio-net mergeable, the code always checks
whether two pages are in use and selects one page to release. This
complicates the handling of each action and requires care.
Throughout the process, the following principles apply:
* If xdp_page is used (PASS, TX, Redirect), then we release the old
page.
* If it is a drop case, we release both. The old page obtained from
buf is released inside err_xdp, and xdp_page needs to be released by us.
But in fact, when we allocate a new page, we can release the old page
immediately. Then only one page is in use, and we just need to release
the new page in the drop case. On the drop path, err_xdp will release the
variable "page", so we only need to let "page" point to the new xdp_page
in advance.
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-346
commit f8bb5104394560e29017c25bcade4c6b7aabd108
Author: Wenliang Wang <wangwenliang.1995@bytedance.com>
Date: Thu May 4 10:27:06 2023 +0800
virtio_net: suppress cpu stall when free_unused_bufs
For the multi-queue and large ring-size use case, the following error
occurred in free_unused_bufs:
rcu: INFO: rcu_sched self-detected stall on CPU.
Fixes: 986a4f4d45 ("virtio_net: multiqueue support")
Signed-off-by: Wenliang Wang <wangwenliang.1995@bytedance.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-346
commit 853618d5886bf94812f31228091cd37d308230f7
Author: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Date: Fri Apr 14 14:08:35 2023 +0800
virtio_net: bugfix overflow inside xdp_linearize_page()
Here we copy the data from the original buf to the new page, but we do
not check whether the copy may overflow.
If the received size (including vnethdr) is greater than 3840
(PAGE_SIZE - VIRTIO_XDP_HEADROOM), the memcpy will overflow.
This is entirely possible when the MTU is large, such as 4096. In our
test environment, this causes a crash. Since the crash is caused by the
overwritten memory, the trace is not meaningful, so I do not include it.
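A minimal userspace sketch of the missing bounds check. The 4096-byte page size and the 256-byte headroom are assumptions consistent with the 3840 figure above; the function name is hypothetical and the sketch ignores any tailroom the real code may reserve.

```c
#include <stddef.h>
#include <string.h>

#define SKETCH_PAGE_SIZE    4096u  /* assumed page size */
#define SKETCH_XDP_HEADROOM 256u   /* assumed; 4096 - 256 = 3840 */

/*
 * Copy len bytes of received data into a fresh page after the headroom.
 * Returns -1 instead of copying when the data would run past the page.
 */
static int sketch_linearize(unsigned char *page, const unsigned char *src,
                            size_t len)
{
    if (len > SKETCH_PAGE_SIZE - SKETCH_XDP_HEADROOM)
        return -1;             /* would overflow the destination page */

    memcpy(page + SKETCH_XDP_HEADROOM, src, len);
    return 0;
}
```

The check has to run before the memcpy; once the copy has written past the page, the damage is done and the later crash site no longer points at the cause.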
Fixes: 72979a6c35 ("virtio_net: xdp, add slowpath case for non contiguous buffers")
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-346
commit 1a3bd6eabae35afc5c6dbe2651f21467cf8ad3fd
Author: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Date: Wed Mar 15 09:52:23 2023 +0800
virtio_net: free xdp shinfo frags when build_skb_from_xdp_buff() fails
build_skb_from_xdp_buff() may return NULL, in this case
we need to free the frags of xdp shinfo.
Fixes: fab89bafa95b ("virtio-net: support multi-buffer xdp")
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-346
commit be50da3e9d4ad1958f7b11322d44d94d5c25a4c1
Author: Jiri Pirko <jiri@nvidia.com>
Date: Thu Mar 9 10:45:59 2023 +0100
net: virtio_net: implement exact header length guest feature
The virtio spec introduced a feature, VIRTIO_NET_F_GUEST_HDRLEN, which,
when set, implies that the device benefits from knowing the exact size
of the header. For compatibility, to signal to the device that
the header is reliable, the driver also needs to set this feature.
Without this feature set by the driver, the device has to figure
out the header size itself.
Quoting the original virtio spec:
"hdr_len is a hint to the device as to how much of the header needs to
be kept to copy into each packet"
It might not be clear to the reader what "a hint" means: whether it is
"maybe like that" or "exactly like that". This feature makes it
crystal clear and lets the device count on hdr_len being filled in
with the exact length of the header.
Also note the spec already has following note about hdr_len:
"Due to various bugs in implementations, this field is not useful
as a guarantee of the transport header size."
Without this feature the device needs to parse the header in core
data path handling. Accurate information helps the device to eliminate
such header parsing and directly use the hardware accelerators
for GSO operation.
virtio_net_hdr_from_skb() sets hdr_len to skb_headlen(skb).
The driver already complies by filling in the correct value. Introduce the
feature and advertise it.
Note that virtio spec also includes following note for device
implementation:
"Caution should be taken by the implementation so as to prevent
a malicious driver from attacking the device by setting
an incorrect hdr_len."
There is a plan to support this feature in our emulated device.
A device of SolidRun offers this feature bit. They claim this feature
will save the device a few cycles for every GSO packet.
Link: https://docs.oasis-open.org/virtio/virtio/v1.2/cs01/virtio-v1.2-cs01.html#x1-230006x3
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Alvaro Karsz <alvaro.karsz@solid-run.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Link: https://lore.kernel.org/r/20230309094559.917857-1-jiri@resnulli.us
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-346
commit cd1c604aa1d8c641f5edcb58b76352d4eba06ec1
Author: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Date: Wed Mar 8 10:49:35 2023 +0800
virtio_net: add checking sq is full inside xdp xmit
If the queue used for xdp xmit is not an independent queue, then once xdp
xmit has used all the descriptors, a transmit from __dev_queue_xmit() may
encounter the following error:
net ens4: Unexpected TXQ (0) queue failure: -28
This patch adds a check for whether the sq is full in xdp xmit.
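The idea can be sketched in userspace as follows. The names and the threshold of 19 descriptors are assumptions for illustration; the sketch only shows that the "is the queue nearly full" check runs after the xdp transmit path has consumed descriptors, so the regular transmit path is stopped before it can fail.

```c
#include <stdbool.h>

#define SKETCH_MIN_FREE 19u  /* assumed worst-case descriptors for one packet */

/* Hypothetical send queue shared by the stack and by xdp xmit. */
struct sketch_sq {
    unsigned int num_free;   /* free descriptors in the ring */
    bool stopped;            /* true when the stack must not transmit */
};

/* Stop the queue when a worst-case packet might no longer fit. */
static void sketch_check_sq_full(struct sketch_sq *sq)
{
    if (sq->num_free < SKETCH_MIN_FREE)
        sq->stopped = true;
}

/* xdp transmit: consume descriptors, then run the same fullness check. */
static bool sketch_xdp_xmit(struct sketch_sq *sq, unsigned int ndesc)
{
    if (ndesc > sq->num_free)
        return false;
    sq->num_free -= ndesc;
    sketch_check_sq_full(sq);
    return true;
}
```

Without the check in the xdp path, the queue would still look runnable to the stack even though xdp had drained it.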
Fixes: 56434a01b1 ("virtio_net: add XDP_TX support")
Reported-by: Yichun Zhang <yichun@openresty.com>
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-346
commit b8ef4809bc7faa22e63de921ef56de21ed191af0
Author: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Date: Wed Mar 8 10:49:34 2023 +0800
virtio_net: separate the logic of checking whether sq is full
Separate the logic of checking whether sq is full. The subsequent patch
will reuse this func.
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-346
commit 25074a44ac4e5dae5b4a25dcb9bbfcbd00f15ae2
Author: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Date: Wed Mar 8 10:49:33 2023 +0800
virtio_net: reorder some funcs
The purpose of this is to facilitate the subsequent addition of new
functions without introducing a separate declaration.
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-346
commit 27369c9c2b722617063d6b80c758ab153f1d95d4
Author: Parav Pandit <parav@nvidia.com>
Date: Fri Feb 3 15:37:38 2023 +0200
virtio-net: Maintain reverse cleanup order
To make the code easier to audit, keep the device stop()
sequence a mirror of the device open() sequence.
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-346
commit 63b114042d8a9c02d9939889177c36dbdb17a588
Author: Parav Pandit <parav@nvidia.com>
Date: Thu Feb 2 18:35:16 2023 +0200
virtio-net: Keep stop() to follow mirror sequence of open()
Cited commit in fixes tag frees rxq xdp info while RQ NAPI is
still enabled and packet processing may be ongoing.
Follow the mirror sequence of open() in the stop() callback.
This ensures that when rxq info is unregistered, no rx
packet processing is ongoing.
Fixes: 754b8a21a9 ("virtio_net: setup xdp_rxq_info")
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Parav Pandit <parav@nvidia.com>
Link: https://lore.kernel.org/r/20230202163516.12559-1-parav@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-346
commit 981f14d42a7f1610292a1ef0f7cd00138fff361d
Author: Heng Qi <hengqi@linux.alibaba.com>
Date: Tue Jan 31 16:50:04 2023 +0800
virtio-net: fix possible unsigned integer overflow
When the single-buffer xdp is loaded and after xdp_linearize_page()
is called, *num_buf becomes 0 and (*num_buf - 1) may overflow into
a large integer in virtnet_build_xdp_buff_mrg(), resulting in
unexpected packet dropping.
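The wrap-around can be reproduced in a few lines of userspace C. The use of unsigned int and the function names are assumptions for illustration; the driver's actual types and fix may differ.

```c
#include <limits.h>

/* Unsafe: with num_buf == 0 the subtraction wraps to UINT_MAX. */
static unsigned int sketch_extra_bufs_unsafe(unsigned int num_buf)
{
    return num_buf - 1;
}

/* Safe: treat "no buffers left" as zero extra buffers. */
static unsigned int sketch_extra_bufs_safe(unsigned int num_buf)
{
    return num_buf > 0 ? num_buf - 1 : 0;
}
```

Any later comparison of the wrapped value against a small limit fails, which matches the unexpected packet drops described above.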
Fixes: ef75cb51f139 ("virtio-net: build xdp_buff with multi buffers")
Signed-off-by: Heng Qi <hengqi@linux.alibaba.com>
Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Link: https://lore.kernel.org/r/20230131085004.98687-1-hengqi@linux.alibaba.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-346
commit ad7e615f646c9b5b2cf655cdfb9d91a28db4f25a
Author: Magnus Karlsson <magnus.karlsson@intel.com>
Date: Wed Jan 25 08:48:59 2023 +0100
virtio-net: execute xdp_do_flush() before napi_complete_done()
Make sure that xdp_do_flush() is always executed before
napi_complete_done(). This is important for two reasons. First, a
redirect to an XSKMAP assumes that a call to xdp_do_redirect() from
napi context X on CPU Y will be followed by a xdp_do_flush() from the
same napi context and CPU. This is not guaranteed if the
napi_complete_done() is executed before xdp_do_flush(), as it tells
the napi logic that it is fine to schedule napi context X on another
CPU. Details from a production system triggering this bug using the
veth driver can be found following the first link below.
The second reason is that the XDP_REDIRECT logic in itself relies on
being inside a single NAPI instance through to the xdp_do_flush() call
for RCU protection of all in-kernel data structures. Details can be
found in the second link below.
Fixes: 186b3c998c ("virtio-net: support XDP_REDIRECT")
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://lore.kernel.org/r/20221220185903.1105011-1-sbohrer@cloudflare.com
Link: https://lore.kernel.org/all/20210624160609.292325-1-toke@redhat.com/
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-346
commit d0671115869d19ec76d658c4bf86d3211a8ea121
Author: Parav Pandit <parav@nvidia.com>
Date: Mon Jan 23 05:55:11 2023 +0200
virtio-net: Reduce debug name field size to 16 bytes
The virtio queue index can be at most 65535, so 16 bytes are enough to
store the vq name with the existing string prefix.
With this change, the send queue struct shrinks by 24 bytes and the
receive queue struct by a whole cache line (64 bytes), because alignment
padding is saved as well.
Pahole results before:
    pahole -s drivers/net/virtio_net.o | \
        grep -e "send_queue" -e "receive_queue"
    send_queue      1112    0
    receive_queue   1280    1
Pahole results after:
    pahole -s drivers/net/virtio_net.o | \
        grep -e "send_queue" -e "receive_queue"
    send_queue      1088    0
    receive_queue   1216    1
Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-346
commit d71ebe8114b4bf622804b810f5e274069060a174
Author: Jason Wang <jasowang@redhat.com>
Date: Tue Jan 17 11:47:07 2023 +0800
virtio-net: correctly enable callback during start_xmit
Commit a7766ef18b33 ("virtio_net: disable cb aggressively") enables
virtqueue callback via the following statement:
        do {
                if (use_napi)
                        virtqueue_disable_cb(sq->vq);
                free_old_xmit_skbs(sq, false);
        } while (use_napi && kick &&
                 unlikely(!virtqueue_enable_cb_delayed(sq->vq)));
When NAPI is used and kick is false, the callback won't be enabled
here. And when the virtqueue is about to be full, the tx will be
disabled, but we still don't enable tx interrupt which will cause a TX
hang. This could be observed when using pktgen with burst enabled.
To be consistent with the logic that tries to disable the callback only
for NAPI, fix this by trying to enable the delayed callback only when
NAPI is enabled and the queue is about to be full.
Fixes: a7766ef18b ("virtio_net: disable cb aggressively")
Signed-off-by: Jason Wang <jasowang@redhat.com>
Tested-by: Laurent Vivier <lvivier@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-346
commit eb1d929f1551f226f59e38465c542df9071166d6
Author: Parav Pandit <parav@nvidia.com>
Date: Mon Jan 16 22:27:08 2023 +0200
virtio_net: Reuse buffer free function
The virtnet_rq_free_unused_buf() helper that frees a buffer already
exists. Avoid code duplication by reusing it.
Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Signed-off-by: Parav Pandit <parav@nvidia.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-346
commit fab89bafa95b6f333b0e2dca0ae07a0b3600e954
Author: Heng Qi <hengqi@linux.alibaba.com>
Date: Sat Jan 14 16:22:29 2023 +0800
virtio-net: support multi-buffer xdp
The driver can pass the skb to the stack via build_skb_from_xdp_buff().
It forwards multi-buffer packets through the send queue on XDP_TX and
XDP_REDIRECT, and drops the references to the multiple pages on
XDP_DROP.
Signed-off-by: Heng Qi <hengqi@linux.alibaba.com>
Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-346
commit 18117a842ab029df139f776bf31eebff6d0e0730
Author: Heng Qi <hengqi@linux.alibaba.com>
Date: Sat Jan 14 16:22:28 2023 +0800
virtio-net: remove xdp related info from page_to_skb()
To make the construction of xdp_buff clearer, we remove the xdp
processing interleaved with page_to_skb(). Now, the xdp logic and the
building of an skb from xdp are separate and independent.
Signed-off-by: Heng Qi <hengqi@linux.alibaba.com>
Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-346
commit b26aa481b4b710051dd663c89ed31f705a7a67eb
Author: Heng Qi <hengqi@linux.alibaba.com>
Date: Sat Jan 14 16:22:27 2023 +0800
virtio-net: build skb from multi-buffer xdp
This converts the xdp_buff directly to a skb, including
multi-buffer and single buffer xdp. We'll isolate the
construction of skb based on xdp from page_to_skb().
Signed-off-by: Heng Qi <hengqi@linux.alibaba.com>
Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-346
commit 97717e8dbda1dede65c4df12891332502df632f3
Author: Heng Qi <hengqi@linux.alibaba.com>
Date: Sat Jan 14 16:22:26 2023 +0800
virtio-net: transmit the multi-buffer xdp
This serves as the basis for XDP_TX and XDP_REDIRECT
to send a multi-buffer xdp_frame.
Signed-off-by: Heng Qi <hengqi@linux.alibaba.com>
Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-346
commit 22174f79a44baf5e46faafff1d7b21363431b93a
Author: Heng Qi <hengqi@linux.alibaba.com>
Date: Sat Jan 14 16:22:25 2023 +0800
virtio-net: construct multi-buffer xdp in mergeable
Build multi-buffer xdp using virtnet_build_xdp_buff_mrg().
For buffers prefilled before xdp is set, we will probably use vq reset
in the future. In addition, virtio-net currently uses compound pages,
and bpf_xdp_frags_increase_tail() needs to calculate the tailroom of
the last frag. That calculation involves the offset within the
corresponding page and yields a negative value, so we disable tail
increase by not setting xdp_rxq->frag_size.
Signed-off-by: Heng Qi <hengqi@linux.alibaba.com>
Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-346
commit ef75cb51f13941cf7633227ade43c4d86ecbc336
Author: Heng Qi <hengqi@linux.alibaba.com>
Date: Sat Jan 14 16:22:24 2023 +0800
virtio-net: build xdp_buff with multi buffers
Support xdp for multi buffer packets in mergeable mode.
The first buffer becomes the linear part of the xdp_buff, and the
remaining buffers become non-linear fragments stored in the struct
skb_shared_info located in the tailroom of the xdp_buff.
Let 'truesize' return to its literal meaning, that is, when
xdp is set, it includes the length of headroom and tailroom.
Signed-off-by: Heng Qi <hengqi@linux.alibaba.com>
Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-346
commit 50bd14bc98fa0c86ea1e688d93ef1ffe8f1572a0
Author: Heng Qi <hengqi@linux.alibaba.com>
Date: Sat Jan 14 16:22:23 2023 +0800
virtio-net: update bytes calculation for xdp_frame
Update the related byte accounting for xdp_frame as a basis
for multi-buffer xdp transmission.
Signed-off-by: Heng Qi <hengqi@linux.alibaba.com>
Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Laurent Vivier <lvivier@redhat.com>