JIRA: https://issues.redhat.com/browse/RHEL-14902
commit b7c4f5730a9fa258c8e79f6387a03f3a95c681a2
Author: Sabrina Dubroca <sd@queasysnail.net>
Date: Fri Oct 20 16:00:55 2023 +0200
tls: don't reset prot->aad_size and prot->tail_size for TLS_HW
Prior to commit 1a074f7618e8 ("tls: also use init_prot_info in
tls_set_device_offload"), setting TLS_HW on TX didn't touch
prot->aad_size and prot->tail_size. They are set to 0 during context
allocation (tls_prot_info is embedded in tls_context, kzalloc'd by
tls_ctx_create).
When the RX key is configured, tls_set_sw_offload is called (for both
TLS_SW and TLS_HW). If the TX key is configured in TLS_HW mode after
the RX key has been installed, init_prot_info will now overwrite the
correct values of aad_size and tail_size, breaking SW decryption and
causing -EBADMSG errors to be returned to userspace.
Since TLS_HW doesn't use aad_size and tail_size at all (for TLS1.2,
tail_size is always 0, and aad_size is equal to TLS_HEADER_SIZE +
rec_seq_size), we can simply drop this hunk.
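(For TLS 1.2 AES-GCM-128, for instance, that works out to the familiar
13B AAD: TLS_HEADER_SIZE (5B) plus the 8B record sequence number.)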
Fixes: 1a074f7618e8 ("tls: also use init_prot_info in tls_set_device_offload")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Tested-by: Ran Rozenstein <ranro@nvidia.com>
Link: https://lore.kernel.org/r/979d2f89a6a994d5bb49cae49a80be54150d094d.1697653889.git.sd@queasysnail.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sabrina Dubroca <sdubroca@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-14902
commit 9f0c8245516bc30cff770c3a69a6baaf8eef8810
Author: Sabrina Dubroca <sd@queasysnail.net>
Date: Mon Oct 9 22:50:54 2023 +0200
tls: use fixed size for tls_offload_context_{tx,rx}.driver_state
driver_state is a flex array, but is always allocated by the tls core
to a fixed size (TLS_DRIVER_STATE_SIZE_{TX,RX}). Simplify the code by
making that size explicit so that sizeof(struct
tls_offload_context_{tx,rx}) works.
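Illustrative sketch of the resulting layout (sizes as in the upstream
defines; treat the field details as an approximation):

    #define TLS_DRIVER_STATE_SIZE_TX    16
    #define TLS_DRIVER_STATE_SIZE_RX    8

    struct tls_offload_context_tx {
            /* ... core TX offload state ... */
            u8 driver_state[TLS_DRIVER_STATE_SIZE_TX] __aligned(8);
    };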
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sabrina Dubroca <sdubroca@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-14902
commit 4f4866991847738a216bb5920b3d3902cee13fd0
Author: Sabrina Dubroca <sd@queasysnail.net>
Date: Mon Oct 9 22:50:51 2023 +0200
tls: remove tls_context argument from tls_set_device_offload
It's not really needed since we end up refetching it as tls_ctx. We
can also remove the NULL check, since we have already dereferenced ctx
in do_tls_setsockopt_conf.
While at it, fix up the reverse xmas tree ordering.
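The resulting change is just the signature (sketch; the next patch does
the same for tls_set_sw_offload):

    /* before */
    int tls_set_device_offload(struct sock *sk, struct tls_context *ctx);

    /* after: the context is refetched internally via tls_get_ctx(sk) */
    int tls_set_device_offload(struct sock *sk);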
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sabrina Dubroca <sdubroca@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-14902
commit b6a30ec9239a1fa1a622608176bb78646a539608
Author: Sabrina Dubroca <sd@queasysnail.net>
Date: Mon Oct 9 22:50:50 2023 +0200
tls: remove tls_context argument from tls_set_sw_offload
It's not really needed since we end up refetching it as tls_ctx. We
can also remove the NULL check, since we have already dereferenced ctx
in do_tls_setsockopt_conf.
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sabrina Dubroca <sdubroca@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-14902
commit 1a074f7618e8b82a7cebf45df6e005d2284446ce
Author: Sabrina Dubroca <sd@queasysnail.net>
Date: Mon Oct 9 22:50:48 2023 +0200
tls: also use init_prot_info in tls_set_device_offload
Most values are shared. Nonce size turns out to be equal to IV size
for all offloadable ciphers.
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sabrina Dubroca <sdubroca@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-14902
commit 1c1cb3110d7ed2897e65d9a352a8fb709723e057
Author: Sabrina Dubroca <sd@queasysnail.net>
Date: Mon Oct 9 22:50:45 2023 +0200
tls: store iv directly within cipher_context
TLS_MAX_IV_SIZE + TLS_MAX_SALT_SIZE is only 20B, so we don't lose much
in cipher_context's size by embedding it, and we can simplify the init
code a bit.
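Together with the rec_seq change in the next patch, cipher_context ends
up with both buffers embedded, roughly (sketch):

    struct cipher_context {
            char iv[TLS_MAX_IV_SIZE + TLS_MAX_SALT_SIZE];
            char rec_seq[TLS_MAX_REC_SEQ_SIZE];
    };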
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sabrina Dubroca <sdubroca@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-14902
commit 6d5029e54700b2427581513c533232b02ce05043
Author: Sabrina Dubroca <sd@queasysnail.net>
Date: Mon Oct 9 22:50:43 2023 +0200
tls: store rec_seq directly within cipher_context
TLS_MAX_REC_SEQ_SIZE is 8B; we gain nothing by using kmalloc.
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sabrina Dubroca <sdubroca@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-14902
commit 8f1d532b4a49e196696b0aa150962d7ce96985e4
Author: Sabrina Dubroca <sd@queasysnail.net>
Date: Mon Oct 9 22:50:42 2023 +0200
tls: drop unnecessary cipher_type checks in tls offload
We should never reach tls_device_reencrypt, tls_enc_record, or
tls_enc_skb with a cipher_type that can't be offloaded. Replace those
checks with a DEBUG_NET_WARN_ON_ONCE, and use cipher_desc instead of
hard-coding offloadable cipher types.
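A sketch of the replacement check (the 'offloadable' field name is
assumed from tls_cipher_desc):

    cipher_desc = get_cipher_desc(tls_ctx->crypto_recv.info.cipher_type);
    DEBUG_NET_WARN_ON_ONCE(!cipher_desc || !cipher_desc->offloadable);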
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sabrina Dubroca <sdubroca@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-14902
commit 3524dd4d5f1fb9e75fdfaf280822a34fa82059bd
Author: Sabrina Dubroca <sd@queasysnail.net>
Date: Fri Aug 25 23:35:15 2023 +0200
tls: expand use of tls_cipher_desc in tls_set_device_offload
tls_set_device_offload is already getting iv and rec_seq sizes from
tls_cipher_desc. We can now also check if the cipher_type coming from
userspace is valid and can be offloaded.
We can also remove the runtime check on rec_seq, since we validate it
at compile time.
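The validation at the top of tls_set_device_offload then becomes,
approximately:

    cipher_desc = get_cipher_desc(crypto_info->cipher_type);
    if (!cipher_desc || !cipher_desc->offloadable) {
            rc = -EINVAL;
            goto release_netdev;
    }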
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Link: https://lore.kernel.org/r/8ab71b8eca856c7aaf981a45fe91ac649eb0e2e9.1692977948.git.sd@queasysnail.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sabrina Dubroca <sdubroca@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-14902
commit 8db44ab26bebe969851468bea6072d9a094b2ace
Author: Sabrina Dubroca <sd@queasysnail.net>
Date: Fri Aug 25 23:35:12 2023 +0200
tls: rename tls_cipher_size_desc to tls_cipher_desc
We're going to add other fields to it to fully describe a cipher, so
the "_size" name won't match the contents.
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Link: https://lore.kernel.org/r/76ca6c7686bd6d1534dfa188fb0f1f6fabebc791.1692977948.git.sd@queasysnail.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sabrina Dubroca <sdubroca@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-14902
commit 037303d6760751fdb95ba62cf448ecbc1ac29c98
Author: Sabrina Dubroca <sd@queasysnail.net>
Date: Fri Aug 25 23:35:11 2023 +0200
tls: reduce size of tls_cipher_size_desc
tls_cipher_size_desc indexes ciphers by their type, but we're not
using indices 0..50 of the array. Each struct tls_cipher_size_desc is
20B, so that's a lot of unused memory. We can reindex the array
starting at the lowest used cipher_type.
Introduce the get_cipher_size_desc helper to find the right item and
avoid out-of-bounds accesses, and make tls_cipher_size_desc's size
explicit so that gcc reminds us to update TLS_CIPHER_MIN/MAX when we
add a new cipher.
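A sketch of the helper's shape:

    static const struct tls_cipher_size_desc *get_cipher_size_desc(u16 cipher_type)
    {
            if (cipher_type < TLS_CIPHER_MIN || cipher_type > TLS_CIPHER_MAX)
                    return NULL;
            return &tls_cipher_size_desc[cipher_type - TLS_CIPHER_MIN];
    }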
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Link: https://lore.kernel.org/r/5e054e370e240247a5d37881a1cd93a67c15f4ca.1692977948.git.sd@queasysnail.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sabrina Dubroca <sdubroca@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-14902
commit 6b47808f223c70ff564f9b363446d2a5fa1e05b2
Author: Jakub Kicinski <kuba@kernel.org>
Date: Fri Aug 4 15:59:51 2023 -0700
net: tls: avoid discarding data on record close
TLS records end with a 16B tag. For TLS device offload we only
need to make space for this tag in the stream, the device will
generate and replace it with the actual calculated tag.
Long time ago the code would just re-reference the head frag
which mostly worked but was suboptimal because it prevented TCP
from combining the record into a single skb frag. I'm not sure
if it was correct as the first frag may be shorter than the tag.
The commit under Fixes tried to replace that with using the page
frag and if the allocation failed rolling back the data, if record
was long enough. It achieves better fragment coalescing but is
also buggy.
We don't roll back the iterator, so unless we're at the end of
send we'll skip the data we designated as tag and start the
next record as if the rollback never happened.
There's also the possibility that the record was constructed
with MSG_MORE and the data came from a different syscall and
we already told the user space that we "got it".
Allocate a single dummy page and use it as fallback.
Found by code inspection, and proven by forcing allocation
failures.
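The record-close path then looks roughly like this (sketch; dummy_page
is the page allocated once at init):

    struct page_frag dummy_tag_frag;

    /* the device overwrites the tag, we only need placeholder bytes */
    if (unlikely(!skb_page_frag_refill(prot->tag_size, pfrag,
                                       sk->sk_allocation))) {
            dummy_tag_frag.page = dummy_page;
            dummy_tag_frag.offset = 0;
            pfrag = &dummy_tag_frag;
    }
    tls_append_frag(record, pfrag, prot->tag_size);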
Fixes: e7b159a48b ("net/tls: remove the record tail optimization")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sabrina Dubroca <sdubroca@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-14902
Conflicts:
- skip the funeth changes, driver missing in rhel9
- mlx5: include the changes to mlx5e_ktls_handle_tx_skb that were
skipped while backporting 94ce3b64c62d and never picked up by the
driver maintainer
commit ed3c9a2fcab3b60b0766eb5d7566fd3b10df9a8e
Author: Jakub Kicinski <kuba@kernel.org>
Date: Tue Jun 13 13:50:06 2023 -0700
net: tls: make the offload check helper take skb not socket
All callers of tls_is_sk_tx_device_offloaded() currently do
an equivalent of:
if (skb->sk && tls_is_sk_tx_device_offloaded(skb->sk))
Have the helper accept skb and do the skb->sk check locally.
Two drivers have local static inlines with similar wrappers
already.
While at it change the ifdef condition to TLS_DEVICE.
Only TLS_DEVICE selects SOCK_VALIDATE_XMIT, so the two are
equivalent. This makes removing the duplicated IS_ENABLED()
check in funeth more obviously correct.
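The reworked helper, approximately (sketch of the upstream shape):

    static inline bool tls_is_skb_tx_device_offloaded(const struct sk_buff *skb)
    {
    #ifdef CONFIG_TLS_DEVICE
            struct sock *sk = skb->sk;

            return sk && sk_fullsock(sk) &&
                   (smp_load_acquire(&sk->sk_validate_xmit_skb) ==
                    &tls_validate_xmit_skb);
    #else
            return false;
    #endif
    }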
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Acked-by: Maxim Mikityanskiy <maxtram95@gmail.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Acked-by: Tariq Toukan <tariqt@nvidia.com>
Acked-by: Dimitris Michailidis <dmichail@fungible.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sabrina Dubroca <sdubroca@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-7936
Conflicts: RHEL9 still has ->sendpage so we can't take (b848b26c667
net: Kill MSG_SENDPAGE_NOTLAST).
commit c004b0e00c94322a2f82a8b0b7711ed938097774
Author: Hannes Reinecke <hare@suse.de>
Date: Wed Jul 26 21:15:52 2023 +0200
net/tls: handle MSG_EOR for tls_device TX flow
tls_push_data() handles MSG_MORE, but bails out on MSG_EOR.
Seeing that MSG_EOR is basically the opposite of MSG_MORE,
this patch adds handling for MSG_EOR by treating it as the
absence of MSG_MORE.
Consequently, we should return an error when both are set.
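Sketch of the resulting checks in tls_push_data():

    /* MSG_MORE and MSG_EOR contradict each other */
    if (unlikely((flags & (MSG_MORE | MSG_EOR)) == (MSG_MORE | MSG_EOR)))
            return -EINVAL;

    more = !!(flags & MSG_MORE);    /* MSG_EOR simply leaves this false */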
Signed-off-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Link: https://lore.kernel.org/r/20230726191556.41714-3-hare@suse.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Jeffrey Layton <jlayton@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-12625
Conflicts:
* drivers/net/ethernet/freescale/enetc/enetc.c
- context due to missing 8feb020f92a5 ("net: ethernet: enetc: unlock
XDP_REDIRECT for XDP non-linear buffers")
* drivers/net/ethernet/fungible/funeth/funeth_rx.c
- removed hunk for non-existing file
* drivers/net/ethernet/marvell/mvneta.c
- context due to missing 76a676947b56 ("net: mvneta: update frags bit
before passing the xdp buffer to eBPF layer")
* drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
- adjusted due to missing 27602319e328 ("net/mlx5e: RX, Take shared
info fragment addition into a function")
commit b51f4113ebb02011f0ca86abc3134b28d2071b6a
Author: Yunsheng Lin <linyunsheng@huawei.com>
Date: Thu May 11 09:12:12 2023 +0800
net: introduce and use skb_frag_fill_page_desc()
Most users use __skb_frag_set_page()/skb_frag_off_set()/
skb_frag_size_set() to fill the page desc for a skb frag.
Introduce skb_frag_fill_page_desc() to do that.
net/bpf/test_run.c does not call skb_frag_off_set() to
set the offset, "copy_from_user(page_address(page), ...)"
and 'shinfo' being part of the 'data' kzalloced in
bpf_test_init() suggest that it is assuming offset to be
initialized as zero, so call skb_frag_fill_page_desc()
with offset being zero for this case.
Also, skb_frag_set_page() is not used anymore, so remove
it.
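The new helper is essentially (sketch; the bv_* field names are
assumed):

    static inline void skb_frag_fill_page_desc(skb_frag_t *frag,
                                               struct page *page,
                                               int off, int size)
    {
            frag->bv_page = page;
            frag->bv_offset = off;
            skb_frag_size_set(frag, size);
    }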
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2219775
Tested: tls selftests
commit eca9bfafee3a0487e59c59201ae14c7594ba940a
Author: Jakub Kicinski <kuba@kernel.org>
Date: Tue May 16 18:50:41 2023 -0700
tls: rx: strp: preserve decryption status of skbs when needed
When receive buffer is small we try to copy out the data from
TCP into a skb maintained by TLS to prevent connection from
stalling. Unfortunately if a single record is made up of a mix
of decrypted and non-decrypted skbs combining them into a single
skb leads to loss of decryption status, resulting in decryption
errors or data corruption.
Similarly when trying to use TCP receive queue directly we need
to make sure that all the skbs within the record have the same
status. If we don't, the mixed status will be detected correctly
but we'll CoW the anchor, again collapsing it into a single paged
skb without decrypted status preserved. So the "fixup" code will
not know which parts of skb to re-encrypt.
Fixes: 84c61fe1a75b ("tls: rx: do not use the standard strparser")
Tested-by: Shai Amiram <samiram@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sabrina Dubroca <sdubroca@redhat.com>
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2219775
Tested: tls selftests
commit b3a03b540e3cf62a255213d084d76d71c02793d5
Author: Jakub Kicinski <kuba@kernel.org>
Date: Tue May 16 18:50:36 2023 -0700
tls: rx: device: fix checking decryption status
skb->len covers the entire skb, including the frag_list.
In fact we're guaranteed that rxm->full_len <= skb->len,
so since the change under Fixes we were not checking decrypt
status of any skb but the first.
Note that the skb_pagelen() added here may feel a bit costly,
but it's removed by subsequent fixes, anyway.
Reported-by: Tariq Toukan <tariqt@nvidia.com>
Fixes: 86b259f6f888 ("tls: rx: device: bound the frag walk")
Tested-by: Shai Amiram <samiram@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sabrina Dubroca <sdubroca@redhat.com>
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2183538
Tested: tls selftests
commit e539a105f947b9db470fec39fe91d85fe737a432
Author: Jakub Kicinski <kuba@kernel.org>
Date: Sat Mar 4 11:26:10 2023 -0800
net: tls: fix device-offloaded sendpage straddling records
Adrien reports that incorrect data is transmitted when a single
page straddles multiple records. We would transmit the same
data in all iterations of the loop.
Reported-by: Adrien Moulin <amoulin@corp.free.fr>
Link: https://lore.kernel.org/all/61481278.42813558.1677845235112.JavaMail.zimbra@corp.free.fr
Fixes: c1318b39c7d3 ("tls: Add opt-in zerocopy mode of sendfile()")
Tested-by: Adrien Moulin <amoulin@corp.free.fr>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Acked-by: Maxim Mikityanskiy <maxtram95@gmail.com>
Link: https://lore.kernel.org/r/20230304192610.3818098-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sabrina Dubroca <sdubroca@redhat.com>
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2183538
Tested: tls selftests
commit 56e5a6d3aa91ed7b5b231c84180d449ce2313f61
Author: Gal Pressman <gal@nvidia.com>
Date: Tue Sep 20 16:01:49 2022 +0300
net/tls: Support 256 bit keys with TX device offload
Add the missing clause for 256 bit keys in tls_set_device_offload(), and
the needed adjustments in tls_device_fallback.c.
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Gal Pressman <gal@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sabrina Dubroca <sdubroca@redhat.com>
Tested: selftests
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2143700
Conflicts: skipped the mlx5e_ktls_handle_tx_skb bits, we don't have
that code yet, it will come through the driver rebase
commit 94ce3b64c62d4b628cf85cd0d9a370aca8f7e43a
Author: Maxim Mikityanskiy <maximmi@nvidia.com>
Date: Wed Aug 10 11:16:02 2022 +0300
net/tls: Use RCU API to access tls_ctx->netdev
Currently, tls_device_down synchronizes with tls_device_resync_rx using
RCU; however, the pointer to netdev is stored using WRITE_ONCE and
loaded using READ_ONCE.
Although such an approach is technically correct (rcu_dereference is
essentially a READ_ONCE, and rcu_assign_pointer uses WRITE_ONCE to store
NULL), using special RCU helpers for pointers is more valid, as it
includes additional checks and might change the implementation
transparently to the callers.
Mark the netdev pointer as __rcu and use the correct RCU helpers to
access it. For non-concurrent access pass the right conditions that
guarantee safe access (locks taken, refcount value). Also use the
correct helper in mlx5e, where even READ_ONCE was missing.
The transition to RCU exposes existing issues, fixed by this commit:
1. bond_tls_device_xmit could read netdev twice, and it could become
NULL the second time, after the NULL check passed.
2. Drivers shouldn't stop processing the last packet if tls_device_down
just set netdev to NULL, before tls_dev_del was called. This prevents a
possible packet drop when transitioning to the fallback software mode.
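Typical access pattern after the change (sketch):

    struct net_device *netdev;

    rcu_read_lock();
    netdev = rcu_dereference(tls_ctx->netdev);
    if (netdev)
            netdev->tlsdev_ops->tls_dev_resync(netdev, sk, seq, rcd_sn,
                                               TLS_OFFLOAD_CTX_DIR_RX);
    rcu_read_unlock();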
Fixes: 89df6a8104 ("net/bonding: Implement TLS TX device offload")
Fixes: c55dcdd435 ("net/tls: Fix use-after-free after the TLS device goes down and up")
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Link: https://lore.kernel.org/r/20220810081602.1435800-1-maximmi@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sabrina Dubroca <sdubroca@redhat.com>
Tested: selftests
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2143700
commit 86b259f6f8880237899fbf4f940303b3987dffa9
Author: Jakub Kicinski <kuba@kernel.org>
Date: Tue Aug 9 10:55:43 2022 -0700
tls: rx: device: bound the frag walk
We can't do skb_walk_frags() on the input skb, because
the input skb is really just a pointer to the tcp read
queue. We need to bound the "is decrypted" check by the
amount of data in the message.
Note that the walk in tls_device_reencrypt() is after a
CoW so the skb there is safe to walk. Actually in the
current implementation it can't have frags at all, but
whatever, maybe one day it will.
Reported-by: Tariq Toukan <tariqt@nvidia.com>
Fixes: 84c61fe1a75b ("tls: rx: do not use the standard strparser")
Tested-by: Ran Rozenstein <ranro@nvidia.com>
Link: https://lore.kernel.org/r/20220809175544.354343-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sabrina Dubroca <sdubroca@redhat.com>
Tested: selftests
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2143700
commit d81c7cdd7a6ddffcc8c00c991e3d6e24db84bd9e
Author: Tariq Toukan <tariqt@nvidia.com>
Date: Mon Aug 1 14:24:44 2022 +0300
net/tls: Remove redundant workqueue flush before destroy
destroy_workqueue() safely destroys the workqueue after draining it.
No need for the explicit call to flush_workqueue(). Remove it.
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://lore.kernel.org/r/20220801112444.26175-1-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sabrina Dubroca <sdubroca@redhat.com>
Tested: selftests
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2143700
commit 7adc91e0c93901a0eeeea10665d0feb48ffde2d4
Author: Tariq Toukan <tariqt@nvidia.com>
Date: Wed Jul 27 12:43:42 2022 +0300
net/tls: Multi-threaded calls to TX tls_dev_del
Multiple TLS device-offloaded contexts can be added in parallel via
concurrent calls to .tls_dev_add, while calls to .tls_dev_del are
sequential in tls_device_gc_task.
This is not a sustainable behavior. This creates a rate gap between add
and del operations (addition rate outperforms the deletion rate). When
running for enough time, the TLS device resources could get exhausted,
failing to offload new connections.
Replace the single-threaded garbage collector work with a per-context
alternative, so they can be handled on several cores in parallel. Use
a new dedicated destruct workqueue for this.
Tested with mlx5 device:
Before: 22141 add/sec, 103 del/sec
After: 11684 add/sec, 11684 del/sec
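Sketch of the per-context mechanics (names assumed from the upstream
patch):

    /* once, at module init */
    destruct_wq = alloc_workqueue("ktls_device_destruct", 0, 0);

    /* per TX context, at offload setup */
    INIT_WORK(&offload_ctx->destruct_work, tls_device_tx_del_task);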
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sabrina Dubroca <sdubroca@redhat.com>
Tested: selftests
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2143700
commit 113671b255ee3b9f5585a6d496ef0e675e698698
Author: Tariq Toukan <tariqt@nvidia.com>
Date: Wed Jul 27 12:43:41 2022 +0300
net/tls: Perform immediate device ctx cleanup when possible
TLS context destructor can be run in atomic context. Cleanup operations
for device-offloaded contexts could require access and interaction with
the device callbacks, which might sleep. Hence, the cleanup of such
contexts must be deferred and completed inside an async work.
For all others, this is not necessary, as cleanup is atomic. Invoke
cleanup immediately for them, avoiding queueing redundant gc work.
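Destruction then branches roughly like this (sketch):

    async_cleanup = tls_ctx->netdev && tls_ctx->tx_conf == TLS_HW;
    if (async_cleanup)
            queue_work(destruct_wq, &offload_ctx->destruct_work);
    else
            tls_device_free_ctx(tls_ctx);   /* atomic, safe to run inline */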
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sabrina Dubroca <sdubroca@redhat.com>
Tested: selftests
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2143700
commit 8b3c59a7a0bed6fe365755ac211dcf94fdac81b4
Author: Jakub Kicinski <kuba@kernel.org>
Date: Fri Jul 22 16:50:32 2022 -0700
tls: rx: device: add input CoW helper
Wrap the remaining skb_cow_data() into a helper, so it's easier
to replace down the lane. The new version will change the skb
so make sure relevant pointers get reloaded after the call.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sabrina Dubroca <sdubroca@redhat.com>
Tested: selftests
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2143700
commit f6336724a4d4220c89a4ec38bca84b03b178b1a3
Author: Maxim Mikityanskiy <maximmi@nvidia.com>
Date: Thu Jul 21 12:11:27 2022 +0300
net/tls: Remove the context from the list in tls_device_down
tls_device_down takes a reference on all contexts it's going to move to
the degraded state (software fallback). If sk_destruct runs afterwards,
it can reduce the reference counter back to 1 and return early without
destroying the context. Then tls_device_down will release the reference
it took and call tls_device_free_ctx. However, the context will still
stay in tls_device_down_list forever. The list will contain an item,
memory for which is released, making a memory corruption possible.
Fix the above bug by properly removing the context from all lists before
any call to tls_device_free_ctx.
Fixes: 3740651bf7e2 ("tls: Fix context leak on tls_device_down")
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sabrina Dubroca <sdubroca@redhat.com>
Tested: selftests
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2143700
commit 541cc48be3b141e8529fef05ad6cedbca83f9e80
Author: Jakub Kicinski <kuba@kernel.org>
Date: Thu Jul 14 22:22:30 2022 -0700
tls: rx: read the input skb from ctx->recv_pkt
Callers always pass ctx->recv_pkt into decrypt_skb_update(),
and it propagates it to its callees. This may give someone
the false impression that those functions can accept any valid
skb containing a TLS record. That's not the case, the record
sequence number is read from the context, and they can only
take the next record coming out of the strp.
Let the functions get the skb from the context instead of
passing it in. This will also make it cleaner to return
a different skb than ctx->recv_pkt as the decrypted one
later on.
Since we're touching the definition of decrypt_skb_update()
use this as an opportunity to rename it.
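Roughly, the entry point becomes (sketch; the callee name is
illustrative):

    static int tls_rx_one_record(struct sock *sk, struct iov_iter *dest,
                                 struct tls_decrypt_arg *darg)
    {
            /* the input record now always comes from the context */
            struct sk_buff *skb = tls_sw_ctx_rx(tls_get_ctx(sk))->recv_pkt;

            /* decrypt as before; helper name illustrative */
            return tls_decrypt_record(sk, skb, dest, darg);
    }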
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sabrina Dubroca <sdubroca@redhat.com>
Tested: selftests
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2143700
commit f08d8c1bb97c48f24a82afaa2fd8c140f8d3da8b
Author: Tariq Toukan <tariqt@nvidia.com>
Date: Fri Jul 15 11:42:16 2022 +0300
net/tls: Fix race in TLS device down flow
Socket destruction flow and tls_device_down function sync against each
other using tls_device_lock and the context refcount, to guarantee the
device resources are freed via tls_dev_del() by the end of
tls_device_down.
In the following unfortunate flow, this won't happen:
- refcount is decreased to zero in tls_device_sk_destruct.
- tls_device_down starts, skips the context as refcount is zero, going
all the way until it flushes the gc work, and returns without freeing
the device resources.
- only then, tls_device_queue_ctx_destruction is called, queues the gc
work and frees the context's device resources.
Solve it by decreasing the refcount in the socket's destruction flow
under the tls_device_lock, for perfect synchronization. This does not
slow down the common likely destructor flow, in which both the refcount
is decreased and the spinlock is acquired, anyway.
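Sketch of the synchronized destruction path:

    static void tls_device_queue_ctx_destruction(struct tls_context *ctx)
    {
            unsigned long flags;

            spin_lock_irqsave(&tls_device_lock, flags);
            if (unlikely(!refcount_dec_and_test(&ctx->refcount)))
                    goto unlock;

            list_move_tail(&ctx->list, &tls_device_gc_list);

            /* schedule_work inside the spinlock so that tls_device_down
             * waits for that work
             */
            schedule_work(&tls_device_gc_work);
    unlock:
            spin_unlock_irqrestore(&tls_device_lock, flags);
    }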
Fixes: e8f6979981 ("net/tls: Add generic NIC offload infrastructure")
Reviewed-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sabrina Dubroca <sdubroca@redhat.com>
Tested: selftests
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2143700
Conflicts: tls_sw_recvmsg still has the nonblock argument, missing
commit ec095263a965 ("net: remove noblock parameter from recvmsg()
entities")
commit 5879031423089b2e19b769f30fc618af742264c3
Author: Jakub Kicinski <kuba@kernel.org>
Date: Thu Jul 7 18:03:13 2022 -0700
tls: create an internal header
include/net/tls.h is getting a little long, and is probably hard
for driver authors to navigate. Split out the internals into a
header which will live under net/tls/. While at it move some
static inlines with a single user into the source files, add
a few tls_ prefixes and fix spelling of 'proccess'.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sabrina Dubroca <sdubroca@redhat.com>
Tested: selftests
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2143700
commit c1318b39c7d36bd5139a9c71044ff2b2d3c6f9d8
Author: Boris Pismenny <borisp@nvidia.com>
Date: Wed May 18 12:27:31 2022 +0300
tls: Add opt-in zerocopy mode of sendfile()
TLS device offload copies sendfile data to a bounce buffer before
transmitting. This allows maintaining a valid MAC on TLS records when
the file contents change and a part of TLS record has to be
retransmitted on TCP level.
In many common use cases (like serving static files over HTTPS) the file
contents are not changed on the fly. In many use cases breaking the
connection is totally acceptable if the file is changed during
transmission, because it would be received corrupted in any case.
This commit allows optimizing performance for such use cases by
providing a new optional mode of TLS sendfile(), in which the extra copy
is skipped. Removing this copy improves performance significantly, as
TLS and TCP sendfile perform the same operations, and the only overhead
is TLS header/trailer insertion.
The new mode can only be enabled with the new socket option named
TLS_TX_ZEROCOPY_SENDFILE on a per-socket basis. It preserves backwards
compatibility with existing applications that rely on the copying
behavior.
The new mode is safe, meaning that unsolicited modifications of the file
being sent can't break integrity of the kernel. The worst thing that can
happen is sending a corrupted TLS record, which is in any case not
forbidden when using regular TCP sockets.
Sockets other than TLS device offload are not affected by the new socket
option. The actual status of zerocopy sendfile can be queried with
sock_diag.
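From userspace, opting in is a single setsockopt() on an established
kTLS socket (illustrative):

    #include <sys/socket.h>
    #include <linux/tls.h>

    static int enable_zerocopy_sendfile(int sk)
    {
            int one = 1;

            /* only takes effect for TLS device offload TX */
            return setsockopt(sk, SOL_TLS, TLS_TX_ZEROCOPY_SENDFILE,
                              &one, sizeof(one));
    }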
Performance numbers in a single-core test with 24 HTTPS streams on
nginx, under 100% CPU load:
* non-zerocopy: 33.6 Gbit/s
* zerocopy: 79.92 Gbit/s
CPU: Intel(R) Xeon(R) Platinum 8380 CPU @ 2.30GHz
Signed-off-by: Boris Pismenny <borisp@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Link: https://lore.kernel.org/r/20220518092731.1243494-1-maximmi@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Sabrina Dubroca <sdubroca@redhat.com>
Tested: selftests
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2143700
commit 3740651bf7e200109dd42d5b2fb22226b26f960a
Author: Maxim Mikityanskiy <maximmi@nvidia.com>
Date: Thu May 12 12:18:30 2022 +0300
tls: Fix context leak on tls_device_down
The commit cited below claims to fix a use-after-free condition after
tls_device_down. Apparently, the description wasn't fully accurate. The
context stayed alive, but ctx->netdev became NULL, and the offload was
torn down without a proper fallback, so a bug was present, but a
different kind of bug.
Due to misunderstanding of the issue, the original patch dropped the
refcount_dec_and_test line for the context to avoid the alleged
premature deallocation. That line has to be restored, because it matches
the refcount_inc_not_zero from the same function, otherwise the contexts
that survived tls_device_down are leaked.
This patch fixes the described issue by restoring refcount_dec_and_test.
After this change, there is no leak anymore, and the fallback to
software kTLS still works.
Fixes: c55dcdd435 ("net/tls: Fix use-after-free after the TLS device goes down and up")
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://lore.kernel.org/r/20220512091830.678684-1-maximmi@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sabrina Dubroca <sdubroca@redhat.com>
Tested: selftests
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2143700
commit a0df71948e9548de819a6f1da68f5f1742258a52
Author: Maxim Mikityanskiy <maximmi@nvidia.com>
Date: Tue Apr 26 18:49:49 2022 +0300
tls: Skip tls_append_frag on zero copy size
Calling tls_append_frag when max_open_record_len == record->len might
add an empty fragment to the TLS record if the call happens to land on
a page boundary. Normally tls_append_frag coalesces the zero-sized
fragment into the previous one, but not if it's on a page boundary.
If a resync happens then, the mlx5 driver posts dump WQEs in
tx_post_resync_dump, and the empty fragment may become a data segment
with byte_count == 0, which will confuse the NIC and lead to a CQE
error.
This commit fixes the described issue by skipping tls_append_frag on
zero size to avoid adding empty fragments. The fix is not in the driver,
because an empty fragment is hardly the desired behavior.
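The fix is essentially a guard around the append (sketch):

    /* don't create empty fragments on page boundaries */
    if (copy)
            tls_append_frag(record, &zc_pfrag, copy);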
Fixes: e8f6979981 ("net/tls: Add generic NIC offload infrastructure")
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://lore.kernel.org/r/20220426154949.159055-1-maximmi@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sabrina Dubroca <sdubroca@redhat.com>
Tested: selftests
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2143700
commit 71471ca32505afa7c3f7f6a8268716e1ddb81cd4
Author: Jakub Kicinski <kuba@kernel.org>
Date: Thu Apr 7 20:38:23 2022 -0700
tls: hw: rx: use return value of tls_device_decrypted() to carry status
Instead of tls_device poking into the internals of the message,
return 1 from tls_device_decrypted() if the device handled
the decryption.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sabrina Dubroca <sdubroca@redhat.com>
Tested: selftests
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2143700
commit 7dc59c33d62c4520a119051d4486c214ef5caa23
Author: Jakub Kicinski <kuba@kernel.org>
Date: Thu Apr 7 20:38:17 2022 -0700
tls: rx: don't store the decryption status in socket context
Similar justification to previous change, the information
about decryption status belongs in the skb.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sabrina Dubroca <sdubroca@redhat.com>
Tested: selftests
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2143700
commit b1a6f56b6506c2cecef301b5c3804be656a8c334
Author: Ziyang Xuan <william.xuanziyang@huawei.com>
Date: Sat Mar 19 11:15:20 2022 +0800
net/tls: optimize judgement processes in tls_set_device_offload()
HW offload takes priority when TLS TX/RX offload is enabled via
setsockopt(). However, whether the netdevice supports NETIF_F_HW_TLS_TX
is only checked late in tls_set_device_offload(), after some memory
allocations have already been done; on failure we must release that
memory, which is redundant work.
Move the NETIF_F_HW_TLS_TX check forward, and move the
start_marker_record and offload_ctx allocations slightly later. Thus,
we get a simpler error handling path.
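After the reordering, the early check looks roughly like:

    if (!(netdev->features & NETIF_F_HW_TLS_TX)) {
            rc = -EOPNOTSUPP;
            goto release_netdev;
    }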
Signed-off-by: Ziyang Xuan <william.xuanziyang@huawei.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sabrina Dubroca <sdubroca@redhat.com>
This is a prerequisite patch; the next one enables recycling of
skbs and fragments. Add an extra argument on __skb_frag_unref() to
handle recycling, and update the current users of the function with that.
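Sketch of the extended helper (the recycling hook is illustrative):

    static inline void __skb_frag_unref(skb_frag_t *frag, bool recycle)
    {
            struct page *page = skb_frag_page(frag);

    #ifdef CONFIG_PAGE_POOL
            if (recycle && page_pool_return_skb_page(page))
                    return;
    #endif
            put_page(page);
    }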
Signed-off-by: Matteo Croce <mcroce@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When a netdev with active TLS offload goes down, tls_device_down is
called to stop the offload and tear down the TLS context. However, the
socket stays alive, and it still points to the TLS context, which is now
deallocated. If a netdev goes up, while the connection is still active,
and the data flow resumes after a number of TCP retransmissions, it will
lead to a use-after-free of the TLS context.
This commit addresses this bug by keeping the context alive until its
normal destruction, and implements the necessary fallbacks, so that the
connection can resume in software (non-offloaded) kTLS mode.
On the TX side tls_sw_fallback is used to encrypt all packets. The RX
side already has all the necessary fallbacks, because receiving
non-decrypted packets is supported. The thing needed on the RX side is
to block resync requests, which are normally produced after receiving
non-decrypted packets.
The necessary synchronization is implemented for a graceful teardown:
first the fallbacks are deployed, then the driver resources are released
(it used to be possible to have a tls_dev_resync after tls_dev_del).
A new flag called TLS_RX_DEV_DEGRADED is added to indicate the fallback
mode. It's used to skip the RX resync logic completely, as it becomes
useless, and some objects may be released (for example, resync_async,
which is allocated and freed by the driver).
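Deploying the fallbacks in tls_device_down looks roughly like (sketch):

    if (ctx->tx_conf == TLS_HW)
            WRITE_ONCE(sk->sk_validate_xmit_skb,
                       tls_validate_xmit_skb_sw);
    if (ctx->rx_conf == TLS_HW)
            set_bit(TLS_RX_DEV_DEGRADED, &ctx->flags);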
Fixes: e8f6979981 ("net/tls: Add generic NIC offload infrastructure")
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
RCU synchronization is guaranteed to finish in finite time, unlike a
busy loop that polls a flag. This patch is a preparation for the bugfix
in the next patch, where the same synchronize_net() call will also be
used to sync with the TX datapath.
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
record is being initialized to ctx->open_record but this is never
read as record is overwritten later on. Remove the redundant
initialization.
Cleans up the following clang-analyzer warning:
net/tls/tls_device.c:421:26: warning: Value stored to 'record' during
its initialization is never read [clang-analyzer-deadcode.DeadStores].
Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
s/beggining/beginning/
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Wang Hai <wanghai38@huawei.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
In the tls_dev_event handler, ignore the tlsdev_ops requirement for
bond interfaces; these ops do not exist there, as the interaction is
done directly with the lower device.
Also, make the validate function pass when it's called with the upper
bond interface.
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Boris Pismenny <borisp@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Do not call the tls_dev_ops of upper devices. Instead, ask them
for the proper lowest device and communicate with it directly.
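get_netdev_for_sock() then reduces to roughly (sketch):

    static struct net_device *get_netdev_for_sock(struct sock *sk)
    {
            struct dst_entry *dst = sk_dst_get(sk);
            struct net_device *netdev = NULL;

            if (likely(dst)) {
                    netdev = netdev_sk_get_lowest_dev(dst->dev, sk);
                    dev_hold(netdev);
            }

            dst_release(dst);

            return netdev;
    }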
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Boris Pismenny <borisp@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Recent changes made to remove AES constants started using the
protocol-aware salt_size. ctx->prot_info's salt_size is filled in the
tls sw case, but not in tls offload mode, and this was working so far
only because a hard-coded value was used.
Fixes: 6942a284fb ("net/tls: make inline helpers protocol-aware")
Signed-off-by: Rohit Maheshwari <rohitm@chelsio.com>
Link: https://lore.kernel.org/r/20201201090752.27355-1-rohitm@chelsio.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Trivial conflict in CAN, keep the net-next + the byteswap wrapper.
Conflicts:
drivers/net/can/usb/gs_usb.c
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Inline functions defined in tls.h have a lot of AES-specific
constants. Remove these constants and change the argument to struct
tls_prot_info to have access to the cipher type in later patches.
Signed-off-by: Vadim Fedorenko <vfedorenko@novek.ru>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
tls_device_offload_cleanup_rx doesn't clear tls_ctx->netdev after
calling tls_dev_del if TLS TX offload is also enabled. Clearing
tls_ctx->netdev gets postponed until tls_device_gc_task. It leaves a
time frame when tls_device_down may get called and call tls_dev_del for
RX one extra time, confusing the driver, which may lead to a crash.
This patch corrects this racy behavior by adding a flag to prevent
tls_device_down from calling tls_dev_del the second time.
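Sketch of the guarded teardown (flag name as in the upstream patch):

    if (tls_ctx->rx_conf == TLS_HW &&
        !test_bit(TLS_RX_DEV_CLOSED, &tls_ctx->flags))
            netdev->tlsdev_ops->tls_dev_del(netdev, tls_ctx,
                                            TLS_OFFLOAD_CTX_DIR_RX);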
Fixes: e8f6979981 ("net/tls: Add generic NIC offload infrastructure")
Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Link: https://lore.kernel.org/r/20201125221810.69870-1-saeedm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In async_resync mode, we log the TCP seq of records until the async request
is completed. Later, in case one of the logged seqs matches the resync
request, we return it, together with its record serial number. Before this
fix, we mistakenly returned the serial number of the current record
instead.
Fixes: ed9b7646b0 ("net/tls: Add asynchronous resync")
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Boris Pismenny <borisp@nvidia.com>
Link: https://lore.kernel.org/r/20201115131448.2702-1-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>