Commit Graph

328 Commits

Author SHA1 Message Date
Davide Caratti cc733700c9 mptcp: pm: ADD_ADDR 0 is not a new address
JIRA: https://issues.redhat.com/browse/RHEL-62871
Upstream Status: net.git commit 57f86203b41c98b322119dfdbb1ec54ce5e3369b

commit 57f86203b41c98b322119dfdbb1ec54ce5e3369b
Author: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Date:   Wed Aug 28 08:14:37 2024 +0200

    mptcp: pm: ADD_ADDR 0 is not a new address

    The ADD_ADDR 0 with the address from the initial subflow should not be
    considered as a new address: this is not something new. If the host
    receives it, it simply means that the address is available again.

    When receiving an ADD_ADDR for the ID 0, the PM already doesn't consider
    it as new by not incrementing the 'add_addr_accepted' counter. But the
    'accept_addr' might not be set if the limit has already been reached:
    this can be bypassed in this case. But before that, it is important to
    check that this ADD_ADDR for the ID 0 is for the same address as the
    initial subflow. If not, this is not something that should happen, and
    the ADD_ADDR can be ignored.

    Note that if an ADD_ADDR is received while there is already a subflow
    opened using the same address, this ADD_ADDR is ignored as well. It
    means that if multiple ADD_ADDR for ID 0 are received, there will not be
    any duplicated subflows created by the client.
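
The acceptance rule described above can be sketched as a small decision function. This is a simplified userspace model, not the kernel's path-manager code. `struct addr`, `on_add_addr()` and the string comparison of addresses are hypothetical. The duplicate-subflow check mentioned in the last paragraph is not modelled.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical, simplified model of an announced address. */
struct addr {
	unsigned char id;	/* address ID carried by ADD_ADDR */
	char ip[46];		/* textual IPv4/IPv6 address */
};

enum add_addr_action { ADD_ADDR_IGNORE, ADD_ADDR_ACCEPT };

/* 'accepted' models the add_addr_accepted counter, 'limit' its maximum. */
static enum add_addr_action
on_add_addr(const struct addr *ann, const char *initial_remote_ip,
	    unsigned int *accepted, unsigned int limit)
{
	assert(ann && initial_remote_ip && accepted);

	if (ann->id == 0) {
		/* ID 0 must refer to the initial subflow's address. */
		if (strcmp(ann->ip, initial_remote_ip) != 0)
			return ADD_ADDR_IGNORE;
		/* Same address: not new, so neither the limit nor the
		 * counter applies. */
		return ADD_ADDR_ACCEPT;
	}

	/* Any other ID is a new address and consumes a slot. */
	if (*accepted >= limit)
		return ADD_ADDR_IGNORE;
	(*accepted)++;
	return ADD_ADDR_ACCEPT;
}
```

With the limit already reached, a matching ADD_ADDR 0 is still accepted, and the counter is left unchanged.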

    Fixes: d0876b2284 ("mptcp: add the incoming RM_ADDR support")
    Cc: stable@vger.kernel.org
    Reviewed-by: Mat Martineau <martineau@kernel.org>
    Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Signed-off-by: Davide Caratti <dcaratti@redhat.com>
2024-11-12 10:19:01 +01:00
Davide Caratti 2af9843d3b mptcp: avoid duplicated SUB_CLOSED events
JIRA: https://issues.redhat.com/browse/RHEL-62871
Upstream Status: net.git commit d82809b6c5f2676b382f77a5cbeb1a5d91ed2235

commit d82809b6c5f2676b382f77a5cbeb1a5d91ed2235
Author: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Date:   Wed Aug 28 08:14:35 2024 +0200

    mptcp: avoid duplicated SUB_CLOSED events

    The initial subflow might have already been closed, but still be in the
    connection list. When the worker is instructed to close the subflows
    that have been marked as closed, it might then try to close the initial
    subflow again.

    A consequence of that is that the SUB_CLOSED event can be seen twice:

      # ip mptcp endpoint
      1.1.1.1 id 1 subflow dev eth0
      2.2.2.2 id 2 subflow dev eth1

      # ip mptcp monitor &
      [         CREATED] remid=0 locid=0 saddr4=1.1.1.1 daddr4=9.9.9.9
      [     ESTABLISHED] remid=0 locid=0 saddr4=1.1.1.1 daddr4=9.9.9.9
      [  SF_ESTABLISHED] remid=0 locid=2 saddr4=2.2.2.2 daddr4=9.9.9.9

      # ip mptcp endpoint delete id 1
      [       SF_CLOSED] remid=0 locid=0 saddr4=1.1.1.1 daddr4=9.9.9.9
      [       SF_CLOSED] remid=0 locid=0 saddr4=1.1.1.1 daddr4=9.9.9.9

    The first one is coming from mptcp_pm_nl_rm_subflow_received(), and the
    second one from __mptcp_close_subflow().

    To avoid doing the post-closed processing twice, the subflow is now
    marked as closed the first time.

    Note that it is not enough to check if we are dealing with the first
    subflow and check its sk_state: the subflow might have been reset or
    closed before calling mptcp_close_ssk().
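
The "mark it closed the first time" approach can be sketched with an explicit per-subflow flag. The flag is tested before any post-close work runs. This is an illustrative model only: `struct subflow`, `close_subflow()` and the event counter are hypothetical names. The model does not rely on socket state, which, as noted above, is not sufficient.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical subflow model: 'closed' records that post-close
 * processing (including the SUB_CLOSED event) already ran. */
struct subflow {
	bool closed;
};

static int sub_closed_events;	/* counts emitted SUB_CLOSED events */

static void emit_sub_closed(struct subflow *sf)
{
	(void)sf;
	sub_closed_events++;
}

/* Both close paths (e.g. RM_ADDR handling and the worker) call this;
 * marking the subflow the first time makes later calls no-ops. */
static void close_subflow(struct subflow *sf)
{
	assert(sf);
	if (sf->closed)
		return;
	sf->closed = true;
	emit_sub_closed(sf);
}
```

Calling `close_subflow()` twice on the same subflow emits a single event.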

    Fixes: b911c97c7d ("mptcp: add netlink event support")
    Cc: stable@vger.kernel.org
    Tested-by: Arınç ÜNAL <arinc.unal@arinc9.com>
    Reviewed-by: Mat Martineau <martineau@kernel.org>
    Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Signed-off-by: Davide Caratti <dcaratti@redhat.com>
2024-11-12 10:19:01 +01:00
Davide Caratti e1742e5983 mptcp: pr_debug: add missing \n at the end
JIRA: https://issues.redhat.com/browse/RHEL-62871
Upstream Status: net.git commit cb41b195e634d3f1ecfcd845314e64fd4bb3c7aa
Conflicts:
 - net/mptcp/protocol.c: preserve the old version of inet_csk_accept()
   as we don't have upstream commit 92ef0fd55ac8 ("net: change proto and
   proto_ops accept type")

commit cb41b195e634d3f1ecfcd845314e64fd4bb3c7aa
Author: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Date:   Mon Aug 26 19:11:21 2024 +0200

    mptcp: pr_debug: add missing \n at the end

    pr_debug() calls have been added in various places in MPTCP code to
    help developers debug some situations. With the dynamic debug feature,
    it is easy to enable all or some of them, and to ask users to reproduce
    issues with extra debug.

    Many of these pr_debug() calls don't end with a new line, while no
    'pr_cont()' is used in MPTCP code. So the goal was not to display
    multiple debug messages on one line: the '\n' was not omitted on
    purpose.
    Not having the new line at the end causes these messages to be printed
    with a delay, when something else needs to be printed. This issue is not
    visible when many messages need to be printed, but it is annoying and
    confusing when only specific messages are expected, e.g.

      # echo "func mptcp_pm_add_addr_echoed +fmp" \
            > /sys/kernel/debug/dynamic_debug/control
      # ./mptcp_join.sh "signal address"; \
            echo "$(awk '{print $1}' /proc/uptime) - end"; \
            sleep 5s; \
            echo "$(awk '{print $1}' /proc/uptime) - restart"; \
            ./mptcp_join.sh "signal address"
      013 signal address
          (...)
      10.75 - end
      15.76 - restart
      013 signal address
      [  10.367935] mptcp:mptcp_pm_add_addr_echoed: MPTCP: msk=(...)
          (...)

      => a delay of 5 seconds: printed with a 10.36 ts, but after 'restart'
         which was printed at the 15.76 ts.

    The 'Fixes' tag here below points to the first pr_debug() used without
    '\n' in net/mptcp. This patch could be split in many small ones, with
    different Fixes tag, but it doesn't seem worth it, because it is easy to
    re-generate this patch with this simple 'sed' command:

      git grep -l pr_debug -- net/mptcp |
        xargs sed -i "s/\(pr_debug(\".*[^n]\)\(\"[,)]\)/\1\\\n\2/g"

    So in case of conflicts, simply drop the modifications, and launch this
    command.

    Fixes: f870fa0b57 ("mptcp: Add MPTCP socket stubs")
    Cc: stable@vger.kernel.org
    Reviewed-by: Geliang Tang <geliang@kernel.org>
    Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
    Link: https://patch.msgid.link/20240826-net-mptcp-close-extra-sf-fin-v1-4-905199fe1172@kernel.org
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Davide Caratti <dcaratti@redhat.com>
2024-11-12 10:19:01 +01:00
Davide Caratti 543ced49ac mptcp: pm: remove mptcp_pm_remove_subflow()
JIRA: https://issues.redhat.com/browse/RHEL-62871
Upstream Status: net.git commit f448451aa62d54be16acb0034223c17e0d12bc69

commit f448451aa62d54be16acb0034223c17e0d12bc69
Author: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Date:   Mon Aug 19 21:45:25 2024 +0200

    mptcp: pm: remove mptcp_pm_remove_subflow()

    This helper is confusing. It is in pm.c, but it is specific to the
    in-kernel PM and it cannot be used by the userspace one. Also, it simply
    calls one in-kernel specific function with the PM lock, while the
    similar mptcp_pm_remove_addr() helper requires the PM lock.

    What's left is the pr_debug(), which is not that useful, because a
    similar one is present in the only function called by this helper:

      mptcp_pm_nl_rm_subflow_received()

    After these modifications, this helper can be marked as 'static', and
    the lock can be taken only once in mptcp_pm_flush_addrs_and_subflows().

    Note that it is not a bug fix, but it will help backporting the
    following commits.

    Fixes: 0ee4261a36 ("mptcp: implement mptcp_pm_remove_subflow")
    Cc: stable@vger.kernel.org
    Reviewed-by: Mat Martineau <martineau@kernel.org>
    Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
    Link: https://patch.msgid.link/20240819-net-mptcp-pm-reusing-id-v1-7-38035d40de5b@kernel.org
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Davide Caratti <dcaratti@redhat.com>
2024-11-12 10:19:00 +01:00
Davide Caratti 47b04e47f7 mptcp: pm: fix backup support in signal endpoints
JIRA: https://issues.redhat.com/browse/RHEL-62871
Upstream Status: net.git commit 6834097fc38c5416701c793da94558cea49c0a1f
Conflicts:
  - net/mptcp/protocol.h: context mismatch because of missing upstream
    commit 9ae7846c4b6b ("mptcp: dump addrs in userspace pm list")

commit 6834097fc38c5416701c793da94558cea49c0a1f
Author: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Date:   Sat Jul 27 12:01:28 2024 +0200

    mptcp: pm: fix backup support in signal endpoints

    There was support for the backup flag in signal endpoints, but only
    when the endpoint's flag was changed during a connection. If an
    endpoint with the signal and backup flags was already present, the
    MP_JOIN reply did not contain the backup flag as expected.

    It is confusing to have this inconsistent behaviour. On the other hand,
    the infrastructure to set the backup flag in the SYN + ACK + MP_JOIN was
    already there, it was just never set before. Now when requesting the
    local ID from the path-manager, the backup status is also requested.

    Note that when the userspace PM is used, the backup flag can be set if
    the local address was already used before with a backup flag, e.g. if
    the address was announced with the 'backup' flag, or a subflow was
    created with the 'backup' flag.

    Fixes: 4596a2c1b7 ("mptcp: allow creating non-backup subflows")
    Cc: stable@vger.kernel.org
    Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/507
    Reviewed-by: Mat Martineau <martineau@kernel.org>
    Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Signed-off-by: Davide Caratti <dcaratti@redhat.com>
2024-11-12 10:18:59 +01:00
Davide Caratti 228314749e mptcp: distinguish rcv vs sent backup flag in requests
JIRA: https://issues.redhat.com/browse/RHEL-62871
Upstream Status: net.git commit efd340bf3d7779a3a8ec954d8ec0fb8a10f24982

commit efd340bf3d7779a3a8ec954d8ec0fb8a10f24982
Author: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Date:   Sat Jul 27 12:01:24 2024 +0200

    mptcp: distinguish rcv vs sent backup flag in requests

    When sending an MP_JOIN + SYN + ACK, it is possible to mark the subflow
    as 'backup' by setting the flag with the same name. Before this patch,
    the backup was set if the other peer set it in its MP_JOIN + SYN
    request.

    It is not correct: the backup flag should be set in the MPJ+SYN+ACK only
    if the host asks for it, and not mirroring what was done by the other
    peer. It is then required to have a dedicated bit for each direction,
    similar to what is done in the subflow context.
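
The "dedicated bit for each direction" idea can be sketched as follows. This is a hypothetical model: `struct join_req`, `rcv_backup` and `snd_backup` are illustrative names, not the kernel's request-socket fields. It contrasts the old mirroring behaviour with the fixed one.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical MP_JOIN request model with one bit per direction. */
struct join_req {
	bool rcv_backup;	/* peer set backup in its MP_JOIN + SYN */
	bool snd_backup;	/* local host wants backup in the SYN + ACK */
};

/* Old behaviour: the reply mirrored whatever the peer sent. */
static bool synack_backup_mirrored(const struct join_req *r)
{
	return r->rcv_backup;
}

/* Fixed behaviour: only the local preference decides. */
static bool synack_backup(const struct join_req *r)
{
	return r->snd_backup;
}
```

Keeping the two bits separate means that the peer's request can no longer leak into what this host advertises.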

    Fixes: f296234c98 ("mptcp: Add handling of incoming MP_JOIN requests")
    Cc: stable@vger.kernel.org
    Reviewed-by: Mat Martineau <martineau@kernel.org>
    Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Signed-off-by: Davide Caratti <dcaratti@redhat.com>
2024-11-12 10:18:59 +01:00
Davide Caratti 89f3742fd2 mptcp: move mptcp_pm_gen.h's include
JIRA: https://issues.redhat.com/browse/RHEL-62871
Upstream Status: net.git commit 76a86686e3f0ca68b555131ceefa141a57340ed0

commit 76a86686e3f0ca68b555131ceefa141a57340ed0
Author: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Date:   Mon May 13 18:13:31 2024 -0700

    mptcp: move mptcp_pm_gen.h's include

    Nothing from protocol.h depends on mptcp_pm_gen.h, only code from
    pm_netlink.c and pm_userspace.c depends on it.

    So this include can be moved where it is needed to avoid an "unused
    includes" warning.

    Reviewed-by: Geliang Tang <geliang@kernel.org>
    Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
    Signed-off-by: Mat Martineau <martineau@kernel.org>
    Link: https://lore.kernel.org/r/20240514011335.176158-8-martineau@kernel.org
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Davide Caratti <dcaratti@redhat.com>
2024-11-12 10:18:59 +01:00
Davide Caratti 7768261cdf mptcp: fix full TCP keep-alive support
JIRA: https://issues.redhat.com/browse/RHEL-62871
Upstream Status: net.git commit bd11dc4fb969ec148e50cd87f88a78246dbc4d0b

commit bd11dc4fb969ec148e50cd87f88a78246dbc4d0b
Author: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Date:   Mon May 13 18:13:26 2024 -0700

    mptcp: fix full TCP keep-alive support

    SO_KEEPALIVE support has been added a while ago, as part of a series
    "adding SOL_SOCKET" support. To have a full control of this keep-alive
    feature, it is important to also support TCP_KEEP* socket options at the
    SOL_TCP level.

    Supporting them on the setsockopt() part is easy, it is just a matter of
    remembering each value in the MPTCP sock structure, and calling
    tcp_sock_set_keep*() helpers on each subflow. If the value is not
    modified (0), calling these helpers will not do anything. For the
    getsockopt() part, the corresponding value from the MPTCP sock structure
    or the default one is simply returned. All of this is very similar to
    other TCP_* socket options supported by MPTCP.
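
The setsockopt()/getsockopt() scheme described above can be modelled in a few lines, shown here for TCP_KEEPIDLE only. Every struct and function name is hypothetical. The default of 7200 seconds assumes the stock `net.ipv4.tcp_keepalive_time` sysctl.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_SUBFLOWS 4
#define DEFAULT_KEEPIDLE 7200	/* assumed stock tcp_keepalive_time, seconds */

/* Hypothetical per-subflow setting. */
struct subflow_ka {
	int keepidle;
};

/* Hypothetical MPTCP socket: 0 means "never set by the user". */
struct msk_ka {
	int keepidle;
	struct subflow_ka sf[MAX_SUBFLOWS];
	size_t nr_subflows;
};

/* setsockopt(SOL_TCP, TCP_KEEPIDLE): remember, then apply to each subflow. */
static void msk_set_keepidle(struct msk_ka *msk, int secs)
{
	assert(secs > 0);
	msk->keepidle = secs;
	for (size_t i = 0; i < msk->nr_subflows; i++)
		msk->sf[i].keepidle = secs;
}

/* When a subflow joins later: propagate only a user-modified value. */
static void msk_sync_new_subflow(const struct msk_ka *msk,
				 struct subflow_ka *sf)
{
	if (msk->keepidle)
		sf->keepidle = msk->keepidle;
}

/* getsockopt(SOL_TCP, TCP_KEEPIDLE): stored value, or the default. */
static int msk_get_keepidle(const struct msk_ka *msk)
{
	return msk->keepidle ? msk->keepidle : DEFAULT_KEEPIDLE;
}
```

TCP_KEEPINTVL and TCP_KEEPCNT would follow the same store-then-propagate pattern.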

    It looks important for kernels supporting SO_KEEPALIVE to also support
    the TCP_KEEP* options: some apps seem to (wrongly) assume that if the
    former is supported, the latter are supported too. Also, the lack of
    this simple and isolated change is preventing MPTCP support in some
    apps, and in libraries like GoLang's [1]. This is why this patch is
    seen as a fix.

    Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/383
    Fixes: 1b3e7ede13 ("mptcp: setsockopt: handle SO_KEEPALIVE and SO_PRIORITY")
    Link: https://github.com/golang/go/issues/56539 [1]
    Acked-by: Paolo Abeni <pabeni@redhat.com>
    Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
    Signed-off-by: Mat Martineau <martineau@kernel.org>
    Link: https://lore.kernel.org/r/20240514011335.176158-3-martineau@kernel.org
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Davide Caratti <dcaratti@redhat.com>
2024-11-12 10:18:58 +01:00
Antoine Tenart 289cc4c6aa mptcp: introducing a helper into active reset logic
JIRA: https://issues.redhat.com/browse/RHEL-48648
Upstream Status: linux.git
Conflicts:
- Context difference due to missing upstream commit d5dfbfa2f88e
  ("mptcp: drop duplicate header inclusions") in c9s.

commit 215d40248bde5562a21e4c6cdeaeca0495c9365a
Author: Jason Xing <kernelxing@tencent.com>
Date:   Thu Apr 25 11:13:39 2024 +0800

    mptcp: introducing a helper into active reset logic

    Since we have mapped every mptcp reset reason definition in enum
    sk_rst_reason, introducing a new helper can cover some missing places
    where we have already set the subflow->reset_reason.

    Note: using SK_RST_REASON_NOT_SPECIFIED is the same as
    SK_RST_REASON_MPTCP_RST_EUNSPEC. They are both unknown. So we can convert
    it directly.
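
The conversion helper can be sketched as a switch from the RFC 8684 MP_TCPRST reason codes to the reset-reason enum. The MPTCP_RST_* values follow RFC 8684. The `SK_RST_REASON_*` constants below are a local stand-in with illustrative values, not the kernel's `enum sk_rst_reason`. `mptcp_to_rst_reason()` is a hypothetical name.

```c
#include <assert.h>

/* MP_TCPRST reason codes from RFC 8684 (kernel names MPTCP_RST_*). */
enum {
	MPTCP_RST_EUNSPEC    = 0x00,
	MPTCP_RST_EMPTCP     = 0x01,
	MPTCP_RST_ERESOURCE  = 0x02,
	MPTCP_RST_EPROHIBIT  = 0x03,
	MPTCP_RST_EWQ2BIG    = 0x04,
	MPTCP_RST_EBADPERF   = 0x05,
	MPTCP_RST_EMIDDLEBOX = 0x06,
};

/* Local stand-in for enum sk_rst_reason; values are illustrative. */
enum rst_reason_sketch {
	SK_RST_REASON_NOT_SPECIFIED,
	SK_RST_REASON_MPTCP_RST_EUNSPEC,
	SK_RST_REASON_MPTCP_RST_EMPTCP,
	SK_RST_REASON_MPTCP_RST_ERESOURCE,
	SK_RST_REASON_MPTCP_RST_EPROHIBIT,
	SK_RST_REASON_MPTCP_RST_EWQ2BIG,
	SK_RST_REASON_MPTCP_RST_EBADPERF,
	SK_RST_REASON_MPTCP_RST_EMIDDLEBOX,
};

static enum rst_reason_sketch mptcp_to_rst_reason(unsigned int code)
{
	switch (code) {
	case MPTCP_RST_EMPTCP:     return SK_RST_REASON_MPTCP_RST_EMPTCP;
	case MPTCP_RST_ERESOURCE:  return SK_RST_REASON_MPTCP_RST_ERESOURCE;
	case MPTCP_RST_EPROHIBIT:  return SK_RST_REASON_MPTCP_RST_EPROHIBIT;
	case MPTCP_RST_EWQ2BIG:    return SK_RST_REASON_MPTCP_RST_EWQ2BIG;
	case MPTCP_RST_EBADPERF:   return SK_RST_REASON_MPTCP_RST_EBADPERF;
	case MPTCP_RST_EMIDDLEBOX: return SK_RST_REASON_MPTCP_RST_EMIDDLEBOX;
	case MPTCP_RST_EUNSPEC:
	default:
		/* "not specified" and "unspecified" are both unknown. */
		return SK_RST_REASON_MPTCP_RST_EUNSPEC;
	}
}
```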

    Suggested-by: Paolo Abeni <pabeni@redhat.com>
    Signed-off-by: Jason Xing <kernelxing@tencent.com>
    Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
    Reviewed-by: Eric Dumazet <edumazet@google.com>
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Signed-off-by: Antoine Tenart <atenart@redhat.com>
2024-07-16 17:29:41 +02:00
Antoine Tenart ce222ff93a mptcp: support rstreason for passive reset
JIRA: https://issues.redhat.com/browse/RHEL-48648
Upstream Status: linux.git

commit 3e140491dd80d8643261a21efde3ce2ff6fb9fdf
Author: Jason Xing <kernelxing@tencent.com>
Date:   Thu Apr 25 11:13:38 2024 +0800

    mptcp: support rstreason for passive reset

    It relies on the reset option carried in the skb, as described in RFC
    8684. Reusing this logic saves effort. This patch replaces most of the prior
    NOT_SPECIFIED reasons.
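
For reference, the reason code a passive reset relies on sits in the last byte of the 4-byte MP_TCPRST option defined by RFC 8684. The sketch below parses one such option from a raw byte buffer. The macro names mirror the kernel's, but they are defined locally here, and `mp_tcprst_reason()` is a hypothetical helper, not a kernel function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TCPOPT_MPTCP      30	/* MPTCP option kind (RFC 8684) */
#define MPTCPOPT_RST      0x8	/* MP_TCPRST subtype */
#define TCPOLEN_MPTCP_RST 4

/* Layout: byte 0 kind, byte 1 length,
 * byte 2 subtype (high nibble) | U V W T flags (low nibble),
 * byte 3 reason code.
 * Returns the reason code, or -1 if this is not a valid MP_TCPRST. */
static int mp_tcprst_reason(const uint8_t *opt, size_t len)
{
	if (len < TCPOLEN_MPTCP_RST || opt[0] != TCPOPT_MPTCP ||
	    opt[1] != TCPOLEN_MPTCP_RST || (opt[2] >> 4) != MPTCPOPT_RST)
		return -1;
	return opt[3];
}

/* 'T' (lowest flag bit): the reported error is transient. */
static int mp_tcprst_transient(const uint8_t *opt)
{
	return opt[2] & 0x1;
}
```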

    Signed-off-by: Jason Xing <kernelxing@tencent.com>
    Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
    Reviewed-by: Eric Dumazet <edumazet@google.com>
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Signed-off-by: Antoine Tenart <atenart@redhat.com>
2024-07-16 17:29:41 +02:00
Paolo Abeni 8bedbddae5 mptcp: implement TCP_NOTSENT_LOWAT support
JIRA: https://issues.redhat.com/browse/RHEL-28492
Tested: LNST, Tier1

Upstream commit:
commit 29b5e5ef87397963ca38d3eec0d296ad1c979bbc
Author: Paolo Abeni <pabeni@redhat.com>
Date:   Fri Mar 1 18:43:46 2024 +0100

    mptcp: implement TCP_NOTSENT_LOWAT support

    Add support for this socket option, storing the user-space provided
    value in a new msk field and using that data to implement the
    _mptcp_stream_memory_free() helper, similar to the TCP one.

    To avoid adding more indirect calls in the fast path, open-code
    a variant of sk_stream_memory_free() in mptcp_sendmsg() and add
    direct calls to the mptcp stream memory free helper where possible.
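
A TCP-style "stream memory free" check can be sketched as shown below. It is modelled on the `(notsent_bytes << wake) < lowat` test used by TCP. The struct and function names are hypothetical. Treating an unset value as `UINT32_MAX` mirrors the TCP sysctl default and is an assumption here. The shift is widened to 64 bits in this sketch to avoid overflow.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical MPTCP-level TX state. */
struct msk_tx {
	uint32_t write_seq;	/* next byte the app will queue */
	uint32_t snd_nxt;	/* next byte to be sent */
	uint32_t notsent_lowat;	/* user value; UINT32_MAX when unset */
};

/* Writable while the not-yet-sent backlog stays below the low-water
 * mark. With wake != 0 the backlog is doubled, so a sleeping writer is
 * only woken once the backlog drops below half of the mark. */
static bool stream_memory_free(const struct msk_tx *msk, int wake)
{
	uint32_t notsent = msk->write_seq - msk->snd_nxt;	/* wrap-safe */

	return ((uint64_t)notsent << wake) < msk->notsent_lowat;
}
```

With a mark of 100 and 50 unsent bytes, the socket counts as writable, but a sleeping writer is not yet woken.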

    Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/464
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    Reviewed-by: Mat Martineau <martineau@kernel.org>
    Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-05-10 17:14:43 +02:00
Paolo Abeni 1e769f2a59 mptcp: cleanup writer wake-up
JIRA: https://issues.redhat.com/browse/RHEL-28492
Tested: LNST, Tier1

Upstream commit:
commit 037db6ea57da7a134a8183dead92d64ef92babee
Author: Paolo Abeni <pabeni@redhat.com>
Date:   Fri Mar 1 18:43:44 2024 +0100

    mptcp: cleanup writer wake-up

    After commit 5cf92bbadc ("mptcp: re-enable sndbuf autotune"), the
    MPTCP_NOSPACE bit is redundant: it is always set and cleared together with
    SOCK_NOSPACE.

    Let's drop the former and always rely on the latter, dropping a bunch
    of useless code.

    Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    Reviewed-by: Mat Martineau <martineau@kernel.org>
    Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-05-10 17:14:11 +02:00
Paolo Abeni bca5c4b24f mptcp: check the protocol in mptcp_sk() with DEBUG_NET
JIRA: https://issues.redhat.com/browse/RHEL-28492
Tested: LNST, Tier1

Upstream commit:
commit 14d29ec5302caac945267b9586fad01ecddc700c
Author: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Date:   Fri Feb 23 21:17:56 2024 +0100

    mptcp: check the protocol in mptcp_sk() with DEBUG_NET

    Fuzzers and static checkers might not detect when mptcp_sk() is used
    with a non mptcp_sock structure.

    This is similar to the parent commit, where it is easy to use mptcp_sk()
    with a TCP sock, e.g. with a subflow sk.

    So a new simple check is done when CONFIG_DEBUG_NET is enabled to tell
    kernel devs when a non-MPTCP socket is being used as an MPTCP one.
    'mptcp_sk()' macro is then defined differently: with an extra WARN to
    complain when an unexpected socket is being used.

    Reviewed-by: Mat Martineau <martineau@kernel.org>
    Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
    Link: https://lore.kernel.org/r/20240223-upstream-net-next-20240223-misc-improvements-v1-4-b6c8a10396bd@kernel.org
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-05-10 17:14:01 +02:00
Paolo Abeni 7f93f0d2fe mptcp: check the protocol in tcp_sk() with DEBUG_NET
JIRA: https://issues.redhat.com/browse/RHEL-28492
Tested: LNST, Tier1

Upstream commit:
commit dcc03f270d1e2f4b9715537d8deb734bd019e187
Author: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Date:   Fri Feb 23 21:17:55 2024 +0100

    mptcp: check the protocol in tcp_sk() with DEBUG_NET

    Fuzzers and static checkers might not detect when tcp_sk() is used with
    a non tcp_sock structure.

    This kind of mistake already happened a few times with MPTCP: when
    wrongly using TCP-specific helpers with mptcp_sock pointers. On the
    other hand, there are many 'tcp_xxx()' helpers that are taking a 'struct
    sock' pointer as arguments, and some of them are only looking at fields
    from 'struct sock', and nothing from 'struct tcp_sock'. It is then
    tempting to use them with a 'struct mptcp_sock'.

    So a new simple check is done when CONFIG_DEBUG_NET is enabled to tell
    kernel devs when a non-TCP socket is being used as a TCP one. 'tcp_sk()'
    macro is then re-defined to add a WARN when an unexpected socket is
    being used.
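
The checked-cast idea can be illustrated in userspace. The downcast itself stays a container_of-style pointer adjustment. A warning fires when the protocol field says the socket is not TCP. The reduced `struct sock`/`struct tcp_sock` layouts and `tcp_sk_checked()` are hypothetical. The kernel version uses its own WARN machinery under CONFIG_DEBUG_NET. The same pattern applies to mptcp_sk() with IPPROTO_MPTCP.

```c
#include <assert.h>
#include <netinet/in.h>	/* IPPROTO_TCP */
#include <stddef.h>
#include <stdio.h>

#ifndef IPPROTO_MPTCP
#define IPPROTO_MPTCP 262
#endif

/* Hypothetical, heavily reduced socket layouts. */
struct sock { int sk_protocol; };
struct tcp_sock { struct sock sk; unsigned int snd_cwnd; };

static int warn_count;

/* Debug-style checked downcast: same cast, plus a warning when the
 * socket is not actually a TCP one. */
static struct tcp_sock *tcp_sk_checked(struct sock *sk)
{
	if (sk->sk_protocol != IPPROTO_TCP) {
		warn_count++;
		fprintf(stderr, "WARN: tcp_sk() on protocol %d\n",
			sk->sk_protocol);
	}
	return (struct tcp_sock *)((char *)sk - offsetof(struct tcp_sock, sk));
}
```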

    Reviewed-by: Mat Martineau <martineau@kernel.org>
    Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
    Link: https://lore.kernel.org/r/20240223-upstream-net-next-20240223-misc-improvements-v1-3-b6c8a10396bd@kernel.org
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-05-10 17:13:38 +02:00
Paolo Abeni 3ee96e4d72 mptcp: annotate lockless access for the tx path
JIRA: https://issues.redhat.com/browse/RHEL-28492
Tested: LNST, Tier1

Upstream commit:
commit d440a4e27acdede686b974b62a6b2b2bd7914437
Author: Paolo Abeni <pabeni@redhat.com>
Date:   Fri Feb 2 12:40:08 2024 +0100

    mptcp: annotate lockless access for the tx path

    The mptcp-level TX path info (write_seq, bytes_sent, snd_nxt) is under
    the msk socket lock protection, and is accessed lockless in a few spots.

    Always mark the write operations with WRITE_ONCE and the read operations
    outside the lock with READ_ONCE, and drop the annotation for reads
    under that lock.

    To simplify the annotations move mptcp_pending_data_fin_ack() from
    __mptcp_data_acked() to __mptcp_clean_una(), under the msk socket
    lock, where such call would belong.
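
The annotation rule above can be sketched in userspace with volatile-cast macros in the style of the kernel's READ_ONCE()/WRITE_ONCE(). These local macros rely on the GCC/Clang `__typeof__` extension. `struct msk_seq` and both helpers are hypothetical. Portable userspace code would normally use C11 atomics instead.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/* Userspace stand-ins for the kernel's ONCE accessors (GCC/Clang). */
#define READ_ONCE(x)     (*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, v) (*(volatile __typeof__(x) *)&(x) = (v))

struct msk_seq {
	pthread_mutex_t lock;	/* models the msk socket lock */
	uint64_t write_seq;
	uint64_t snd_nxt;
};

/* Writer: always under the lock, always WRITE_ONCE, because lockless
 * readers may run concurrently. The plain read is fine: lock held. */
static void msk_queue_bytes(struct msk_seq *msk, uint64_t len)
{
	pthread_mutex_lock(&msk->lock);
	WRITE_ONCE(msk->write_seq, msk->write_seq + len);
	pthread_mutex_unlock(&msk->lock);
}

/* Lockless reader: READ_ONCE on every access outside the lock. */
static uint64_t msk_notsent_lockless(struct msk_seq *msk)
{
	return READ_ONCE(msk->write_seq) - READ_ONCE(msk->snd_nxt);
}
```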

    Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    Reviewed-by: Mat Martineau <martineau@kernel.org>
    Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-05-10 17:12:26 +02:00
Paolo Abeni 84e68e9b9b mptcp: annotate access for msk keys
JIRA: https://issues.redhat.com/browse/RHEL-28492
Tested: LNST, Tier1

Upstream commit:
commit 1c09d7cbb57abcea66148923cef717cc7ab35704
Author: Paolo Abeni <pabeni@redhat.com>
Date:   Fri Feb 2 12:40:07 2024 +0100

    mptcp: annotate access for msk keys

    Both the local and the remote key follow the same locking
    schema; put in place the proper ONCE accessors.

    Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    Reviewed-by: Mat Martineau <martineau@kernel.org>
    Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-05-10 17:12:17 +02:00
Paolo Abeni 3fdd6eb96a mptcp: add CurrEstab MIB counter support
JIRA: https://issues.redhat.com/browse/RHEL-28492
Tested: LNST, Tier1

Upstream commit:
commit d9cd27b8cd191133e287e5de107f971136abe8a2
Author: Geliang Tang <geliang.tang@linux.dev>
Date:   Fri Dec 22 13:47:22 2023 +0100

    mptcp: add CurrEstab MIB counter support

    Add a new MIB counter named MPTCP_MIB_CURRESTAB to count current
    established MPTCP connections, similar to TCP_MIB_CURRESTAB. This is
    useful to quickly list the number of MPTCP connections without having to
    iterate over all of them.

    This patch adds a new helper function mptcp_set_state(): if the state
    switches to or from the ESTABLISHED state, this newly added counter is
    incremented or decremented accordingly. This helper is going to be used
    in the following patch.

    Similar to MPTCP_INC_STATS(), a new helper called MPTCP_DEC_STATS() is
    also needed to decrement a MIB counter.
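
The transition rule can be sketched as a small state setter that adjusts a global counter. This is a userspace model. The enum, the counter variable and `set_state()` are hypothetical stand-ins for the socket states, MPTCP_MIB_CURRESTAB and mptcp_set_state().

```c
#include <assert.h>

enum msk_state { ST_CLOSE, ST_SYN_SENT, ST_ESTABLISHED, ST_FIN_WAIT1 };

static long curr_estab;		/* models MPTCP_MIB_CURRESTAB */

struct msk_st { enum msk_state state; };

/* Only transitions into or out of ESTABLISHED touch the counter;
 * ESTABLISHED -> ESTABLISHED is a no-op for the counter. */
static void set_state(struct msk_st *msk, enum msk_state new_state)
{
	enum msk_state old = msk->state;

	if (old != ST_ESTABLISHED && new_state == ST_ESTABLISHED)
		curr_estab++;		/* MPTCP_INC_STATS() analogue */
	else if (old == ST_ESTABLISHED && new_state != ST_ESTABLISHED)
		curr_estab--;		/* MPTCP_DEC_STATS() analogue */
	msk->state = new_state;
	assert(curr_estab >= 0);
}
```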

    Signed-off-by: Geliang Tang <geliang.tang@linux.dev>
    Acked-by: Paolo Abeni <pabeni@redhat.com>
    Reviewed-by: Matthieu Baerts <matttbe@kernel.org>
    Signed-off-by: Matthieu Baerts <matttbe@kernel.org>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-05-10 16:54:09 +02:00
Paolo Abeni 0ce1a943ac mptcp: add mptcpi_subflows_total counter
JIRA: https://issues.redhat.com/browse/RHEL-28492
Tested: LNST, Tier1

Upstream commit:
commit 6ebf6f90ab4ac09a76172a6d387e8819d3259595
Author: Geliang Tang <geliang.tang@suse.com>
Date:   Tue Nov 28 15:18:45 2023 -0800

    mptcp: add mptcpi_subflows_total counter

    If the initial subflow has been removed, we cannot know without checking
    other counters, e.g. ss -ti <filter> | grep -c tcp-ulp-mptcp or
    getsockopt(SOL_MPTCP, MPTCP_FULL_INFO, ...) (or others except MPTCP_INFO
    of course) and then check mptcp_subflow_data->num_subflows to get the
    total amount of subflows.

    This patch adds a new counter mptcpi_subflows_total in mptcp_info to
    store the total amount of subflows, including the initial one. A new
    helper __mptcp_has_initial_subflow() is added to check whether the
    initial subflow has been removed or not. With this helper, we can then
    compute the total amount of subflows from mptcp_info by doing something
    like:

        mptcpi_subflows_total = mptcpi_subflows +
                __mptcp_has_initial_subflow(msk).
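
The formula above amounts to adding one when the initial subflow is still present. In this minimal sketch, `struct info_snapshot` is hypothetical. The sketch assumes, as the formula implies, that `mptcpi_subflows` counts only the additional subflows.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical snapshot of the relevant mptcp_info values. */
struct info_snapshot {
	unsigned char subflows;		/* mptcpi_subflows: additional ones */
	bool has_initial_subflow;	/* __mptcp_has_initial_subflow(msk) */
};

static unsigned int subflows_total(const struct info_snapshot *s)
{
	return s->subflows + (s->has_initial_subflow ? 1u : 0u);
}
```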

    Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/428
    Reviewed-by: Matthieu Baerts <matttbe@kernel.org>
    Signed-off-by: Geliang Tang <geliang.tang@suse.com>
    Signed-off-by: Mat Martineau <martineau@kernel.org>
    Link: https://lore.kernel.org/r/20231128-send-net-next-2023107-v4-1-8d6b94150f6b@kernel.org
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-05-10 16:32:42 +02:00
Davide Caratti f5533c7240 mptcp: fix potential wake-up event loss
JIRA: https://issues.redhat.com/browse/RHEL-32669
Upstream Status: net.git commit b111d8fbd2cbc63e05f3adfbbe0d4df655dfcc5b

commit b111d8fbd2cbc63e05f3adfbbe0d4df655dfcc5b
Author: Paolo Abeni <pabeni@redhat.com>
Date:   Fri Feb 23 17:14:16 2024 +0100

    mptcp: fix potential wake-up event loss

    After the blamed commit below, the send buffer auto-tuning can
    happen after mptcp_propagate_sndbuf() completes, via
    the delegated action infrastructure.

    We must check for write space even after such change or we risk
    missing the wake-up event.

    Fixes: 8005184fd1ca ("mptcp: refactor sndbuf auto-tuning")
    Cc: stable@vger.kernel.org
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    Reviewed-by: Mat Martineau <martineau@kernel.org>
    Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
    Link: https://lore.kernel.org/r/20240223-upstream-net-20240223-misc-fixes-v1-6-162e87e48497@kernel.org
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Davide Caratti <dcaratti@redhat.com>
2024-04-18 17:25:36 +02:00
Davide Caratti 574bf2a8a0 mptcp: fix data races on local_id
JIRA: https://issues.redhat.com/browse/RHEL-32669
Upstream Status: net.git commit a7cfe776637004a4c938fde78be4bd608c32c3ef

commit a7cfe776637004a4c938fde78be4bd608c32c3ef
Author: Paolo Abeni <pabeni@redhat.com>
Date:   Thu Feb 15 19:25:31 2024 +0100

    mptcp: fix data races on local_id

    The local address id is accessed lockless by the NL PM, add
    all the required ONCE annotations. There is a caveat: the local
    id can be initialized late in the subflow life-cycle, and its
    validity is controlled by the local_id_valid flag.

    Remove such flag and encode the validity in the local_id field
    itself with a negative value before initialization. That allows
    accessing the field consistently with a single read operation.
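
Encoding validity in the value itself can be sketched as follows. A negative id means "not initialized", so a single READ_ONCE yields both the validity and the value. `struct subflow_id` and the helpers are hypothetical. The ONCE macros are userspace stand-ins that rely on the GCC/Clang `__typeof__` extension.

```c
#include <assert.h>
#include <stdbool.h>

#define READ_ONCE(x)     (*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, v) (*(volatile __typeof__(x) *)&(x) = (v))

/* Hypothetical subflow: local_id < 0 replaces a local_id_valid flag. */
struct subflow_id {
	int local_id;
};

static void subflow_init(struct subflow_id *sf)
{
	WRITE_ONCE(sf->local_id, -1);
}

static void subflow_set_local_id(struct subflow_id *sf, int id)
{
	assert(id >= 0);
	WRITE_ONCE(sf->local_id, id);
}

/* One read gives a consistent view: a lockless reader can never pair
 * "valid" with a stale id, as it could with two separate fields. */
static bool subflow_get_local_id(struct subflow_id *sf, int *id)
{
	int v = READ_ONCE(sf->local_id);

	if (v < 0)
		return false;
	*id = v;
	return true;
}
```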

    Fixes: 0ee4261a36 ("mptcp: implement mptcp_pm_remove_subflow")
    Cc: stable@vger.kernel.org
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    Reviewed-by: Mat Martineau <martineau@kernel.org>
    Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Davide Caratti <dcaratti@redhat.com>
2024-04-18 17:25:35 +02:00
Davide Caratti 86cc54e974 mptcp: really cope with fastopen race
JIRA: https://issues.redhat.com/browse/RHEL-32669
JIRA: https://issues.redhat.com/browse/RHEL-31604
CVE: CVE-2024-26708
Upstream Status: net.git commit 337cebbd850f94147cee05252778f8f78b8c337f

commit 337cebbd850f94147cee05252778f8f78b8c337f
Author: Paolo Abeni <pabeni@redhat.com>
Date:   Thu Feb 8 19:03:54 2024 +0100

    mptcp: really cope with fastopen race

    Fastopen and PM-trigger subflow shutdown can race, as reported by
    syzkaller.

    In my first attempt to close such race, I missed the fact that
    the subflow status can change again before the subflow_state_change
    callback is invoked.

    Address the issue by additionally coping with all the states directly
    reachable from TCP_FIN_WAIT1.
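
Kernel code usually tests "is the socket in one of these states" with a TCPF_* bitmask. The sketch below shows that idiom with the Linux TCP state numbering. The state set is illustrative: FIN_WAIT1 plus the states that RFC 793 processing of an incoming ACK, FIN or reset can move it to. It is not necessarily the exact set used by the upstream fix, and `past_fin_wait1()` is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>

/* Linux TCP state numbering (include/net/tcp_states.h). */
enum {
	TCP_ESTABLISHED = 1, TCP_SYN_SENT, TCP_SYN_RECV, TCP_FIN_WAIT1,
	TCP_FIN_WAIT2, TCP_TIME_WAIT, TCP_CLOSE, TCP_CLOSE_WAIT,
	TCP_LAST_ACK, TCP_LISTEN, TCP_CLOSING,
};
#define TCPF(s) (1u << (s))

/* Illustrative set: FIN_WAIT1 and the states reachable from it
 * (ACK of FIN, peer FIN, both, or a reset). */
static bool past_fin_wait1(int state)
{
	return TCPF(state) & (TCPF(TCP_FIN_WAIT1) | TCPF(TCP_FIN_WAIT2) |
			      TCPF(TCP_CLOSING) | TCPF(TCP_TIME_WAIT) |
			      TCPF(TCP_CLOSE));
}
```

A single mask test covers every listed state, so a callback that runs after the state has moved on still matches.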

    Fixes: 1e777f39b4d7 ("mptcp: add MSG_FASTOPEN sendmsg flag support")
    Fixes: 4fd19a307016 ("mptcp: fix inconsistent state on fastopen race")
    Cc: stable@vger.kernel.org
    Reported-by: syzbot+c53d4d3ddb327e80bc51@syzkaller.appspotmail.com
    Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/458
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    Reviewed-by: Mat Martineau <martineau@kernel.org>
    Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Davide Caratti <dcaratti@redhat.com>
2024-04-18 17:25:35 +02:00
Davide Caratti d91c423354 mptcp: corner case locking for rx path fields initialization
JIRA: https://issues.redhat.com/browse/RHEL-32669
Upstream Status: net.git commit e4a0fa47e816e186f6b4c0055d07eeec42d11871

commit e4a0fa47e816e186f6b4c0055d07eeec42d11871
Author: Paolo Abeni <pabeni@redhat.com>
Date:   Thu Feb 8 19:03:52 2024 +0100

    mptcp: corner case locking for rx path fields initialization

    Most MPTCP-level related fields are under the mptcp data lock
    protection, but are written one-off without such lock at MPC
    complete time, both for the client and the server.

    Leverage the mptcp_propagate_state() infrastructure to move such
    initialization under the proper lock client-wise.

    The server side critical init steps are done by
    mptcp_subflow_fully_established(): ensure the caller properly holds the
    relevant lock, and avoid acquiring the same lock in the nested scopes.

    There are no real potential races, as write access to such fields
    is implicitly serialized by the MPTCP state machine; the primary
    goal is consistency.

    Fixes: d22f4988ff ("mptcp: process MP_CAPABLE data option")
    Cc: stable@vger.kernel.org
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    Reviewed-by: Mat Martineau <martineau@kernel.org>
    Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Davide Caratti <dcaratti@redhat.com>
2024-04-18 17:25:35 +02:00
Davide Caratti 85d75b136a mptcp: fix rcv space initialization
JIRA: https://issues.redhat.com/browse/RHEL-32669
Upstream Status: net.git commit 013e3179dbd2bc756ce1dd90354abac62f65b739

commit 013e3179dbd2bc756ce1dd90354abac62f65b739
Author: Paolo Abeni <pabeni@redhat.com>
Date:   Thu Feb 8 19:03:50 2024 +0100

    mptcp: fix rcv space initialization

    mptcp_rcv_space_init() is supposed to happen under the msk socket
    lock, but the active msk socket does that without such protection.

    Leverage the existing mptcp_propagate_state() helper to that extent.
    We need to ensure mptcp_rcv_space_init will happen before
    mptcp_rcv_space_adjust(), and the release_cb does not assure that:
    explicitly check for such condition.

    While at it, move the wnd_end initialization out of mptcp_rcv_space_init(),
    as it never belonged there.

    Note that the race does not produce ill effects in practice, but this
    change allows cleaning up and better defining the locking model.

    Fixes: a6b118febb ("mptcp: add receive buffer auto-tuning")
    Cc: stable@vger.kernel.org
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    Reviewed-by: Mat Martineau <martineau@kernel.org>
    Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Davide Caratti <dcaratti@redhat.com>
2024-04-18 17:25:35 +02:00
Davide Caratti 5f289e9896 mptcp: drop the push_pending field
JIRA: https://issues.redhat.com/browse/RHEL-32669
Upstream Status: net.git commit bdd70eb68913c960acb895b00a8c62eb64715b1f

commit bdd70eb68913c960acb895b00a8c62eb64715b1f
Author: Paolo Abeni <pabeni@redhat.com>
Date:   Thu Feb 8 19:03:49 2024 +0100

    mptcp: drop the push_pending field

    This field exists to avoid acquiring the data lock in a few spots,
    but it adds complexity to the already non-trivial locking scheme.

    All the relevant call sites (mptcp-level re-injection, setting socket
    options) are slow paths: drop the field in favor of 'cb_flags', adding
    the relevant locking.

    This patch could be seen as an improvement, instead of a fix. But it
    simplifies the next patch. The 'Fixes' tag has been added to help having
    this series backported to stable.

    Fixes: e9d09baca676 ("mptcp: avoid atomic bit manipulation when possible")
    Cc: stable@vger.kernel.org
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    Reviewed-by: Mat Martineau <martineau@kernel.org>
    Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Davide Caratti <dcaratti@redhat.com>
2024-04-18 17:25:35 +02:00
Davide Caratti 4884840e1c mptcp: fix inconsistent state on fastopen race
JIRA: https://issues.redhat.com/browse/RHEL-21753
Upstream Status: net.git commit 4fd19a30701659af5839b7bd19d1f05f05933ebe

commit 4fd19a30701659af5839b7bd19d1f05f05933ebe
Author: Paolo Abeni <pabeni@redhat.com>
Date:   Fri Dec 15 17:04:25 2023 +0100

    mptcp: fix inconsistent state on fastopen race

    The netlink PM can race with fastopen self-connect attempts, shutting
    down the first subflow via:

    MPTCP_PM_CMD_DEL_ADDR -> mptcp_nl_remove_id_zero_address ->
      mptcp_pm_nl_rm_subflow_received -> mptcp_close_ssk

    and transitioning that subflow to FIN_WAIT1 status before the syn-ack
    packet is processed. The MPTCP code does not react to such a state change,
    leaving the connection in non-fallback status and the subflow handshake
    incomplete, triggering the following splat:

      WARNING: CPU: 0 PID: 10630 at net/mptcp/subflow.c:1405 subflow_data_ready+0x39f/0x690 net/mptcp/subflow.c:1405
      Modules linked in:
      CPU: 0 PID: 10630 Comm: kworker/u4:11 Not tainted 6.6.0-syzkaller-14500-g1c41041124bd #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/09/2023
      Workqueue: bat_events batadv_nc_worker
      RIP: 0010:subflow_data_ready+0x39f/0x690 net/mptcp/subflow.c:1405
      Code: 18 89 ee e8 e3 d2 21 f7 40 84 ed 75 1f e8 a9 d7 21 f7 44 89 fe bf 07 00 00 00 e8 0c d3 21 f7 41 83 ff 07 74 07 e8 91 d7 21 f7 <0f> 0b e8 8a d7 21 f7 48 89 df e8 d2 b2 ff ff 31 ff 89 c5 89 c6 e8
      RSP: 0018:ffffc90000007448 EFLAGS: 00010246
      RAX: 0000000000000000 RBX: ffff888031efc700 RCX: ffffffff8a65baf4
      RDX: ffff888043222140 RSI: ffffffff8a65baff RDI: 0000000000000005
      RBP: 0000000000000000 R08: 0000000000000005 R09: 0000000000000007
      R10: 000000000000000b R11: 0000000000000000 R12: 1ffff92000000e89
      R13: ffff88807a534d80 R14: ffff888021c11a00 R15: 000000000000000b
      FS:  0000000000000000(0000) GS:ffff8880b9800000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00007fa19a0ffc81 CR3: 000000007a2db000 CR4: 00000000003506f0
      DR0: 000000000000d8dd DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      Call Trace:
       <IRQ>
       tcp_data_ready+0x14c/0x5b0 net/ipv4/tcp_input.c:5128
       tcp_data_queue+0x19c3/0x5190 net/ipv4/tcp_input.c:5208
       tcp_rcv_state_process+0x11ef/0x4e10 net/ipv4/tcp_input.c:6844
       tcp_v4_do_rcv+0x369/0xa10 net/ipv4/tcp_ipv4.c:1929
       tcp_v4_rcv+0x3888/0x3b30 net/ipv4/tcp_ipv4.c:2329
       ip_protocol_deliver_rcu+0x9f/0x480 net/ipv4/ip_input.c:205
       ip_local_deliver_finish+0x2e4/0x510 net/ipv4/ip_input.c:233
       NF_HOOK include/linux/netfilter.h:314 [inline]
       NF_HOOK include/linux/netfilter.h:308 [inline]
       ip_local_deliver+0x1b6/0x550 net/ipv4/ip_input.c:254
       dst_input include/net/dst.h:461 [inline]
       ip_rcv_finish+0x1c4/0x2e0 net/ipv4/ip_input.c:449
       NF_HOOK include/linux/netfilter.h:314 [inline]
       NF_HOOK include/linux/netfilter.h:308 [inline]
       ip_rcv+0xce/0x440 net/ipv4/ip_input.c:569
       __netif_receive_skb_one_core+0x115/0x180 net/core/dev.c:5527
       __netif_receive_skb+0x1f/0x1b0 net/core/dev.c:5641
       process_backlog+0x101/0x6b0 net/core/dev.c:5969
       __napi_poll.constprop.0+0xb4/0x540 net/core/dev.c:6531
       napi_poll net/core/dev.c:6600 [inline]
       net_rx_action+0x956/0xe90 net/core/dev.c:6733
       __do_softirq+0x21a/0x968 kernel/softirq.c:553
       do_softirq kernel/softirq.c:454 [inline]
       do_softirq+0xaa/0xe0 kernel/softirq.c:441
       </IRQ>
       <TASK>
       __local_bh_enable_ip+0xf8/0x120 kernel/softirq.c:381
       spin_unlock_bh include/linux/spinlock.h:396 [inline]
       batadv_nc_purge_paths+0x1ce/0x3c0 net/batman-adv/network-coding.c:471
       batadv_nc_worker+0x9b1/0x10e0 net/batman-adv/network-coding.c:722
       process_one_work+0x884/0x15c0 kernel/workqueue.c:2630
       process_scheduled_works kernel/workqueue.c:2703 [inline]
       worker_thread+0x8b9/0x1290 kernel/workqueue.c:2784
       kthread+0x33c/0x440 kernel/kthread.c:388
       ret_from_fork+0x45/0x80 arch/x86/kernel/process.c:147
       ret_from_fork_asm+0x11/0x20 arch/x86/entry/entry_64.S:242
       </TASK>

    To address the issue, catch the racing subflow state change and
    use it to cause the MPTCP fallback. Such fallback is also used to
    cause the first subflow state propagation to the msk socket via
    mptcp_set_connected(). After this change, the first subflow can
    additionally propagate the TCP_FIN_WAIT1 state, so rename the
    helper accordingly.

    Finally, if the state propagation is delayed to the msk release
    callback, the first subflow can change to a different state in between.
    Cache the relevant target state in a new msk-level field and use
    such value to update the msk state at release time.
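
    A hypothetical model of the deferred propagation described above, not
    kernel code: when the msk is owned by the user, the target state is
    cached in a separate field and applied from the release callback, so a
    later subflow state change cannot alter what gets propagated. All type,
    field and function names are assumptions.

```c
#include <stdbool.h>

enum msk_state_model { ST_SYN_SENT, ST_ESTABLISHED, ST_FIN_WAIT1, ST_CLOSE };

struct msk_model {
	enum msk_state_model state;          /* msk-level state */
	bool owned_by_user;                  /* models sock_owned_by_user() */
	bool propagate_pending;              /* deferred-work flag */
	enum msk_state_model pending_state;  /* cached target state */
};

/* Called when the first subflow reaches a state the msk must mirror. */
static void propagate_state(struct msk_model *msk, enum msk_state_model target)
{
	if (!msk->owned_by_user) {
		msk->state = target;         /* lock is free: apply now */
		return;
	}
	/* Defer, but remember the state observed now rather than whatever
	 * the subflow happens to be in when the lock is released.
	 */
	msk->pending_state = target;
	msk->propagate_pending = true;
}

/* Models the msk release callback, run when the owner drops the lock. */
static void release_cb(struct msk_model *msk)
{
	msk->owned_by_user = false;
	if (msk->propagate_pending) {
		msk->propagate_pending = false;
		msk->state = msk->pending_state;
	}
}
```
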

    Fixes: 1e777f39b4d7 ("mptcp: add MSG_FASTOPEN sendmsg flag support")
    Cc: stable@vger.kernel.org
    Reported-by: <syzbot+c53d4d3ddb327e80bc51@syzkaller.appspotmail.com>
    Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/458
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    Reviewed-by: Mat Martineau <martineau@kernel.org>
    Signed-off-by: Matthieu Baerts <matttbe@kernel.org>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Davide Caratti <dcaratti@redhat.com>
2024-01-16 14:58:22 +01:00
Paolo Abeni 50b6c14311 mptcp: use mptcp_check_fallback helper
JIRA: https://issues.redhat.com/browse/RHEL-15036
Tested: LNST, Tier1

Upstream commit:
commit 83d580ddbe1b3297c346b24070c23fcf6698393c
Author: Geliang Tang <geliang.tang@suse.com>
Date:   Wed Oct 25 16:37:06 2023 -0700

    mptcp: use mptcp_check_fallback helper

    Use __mptcp_check_fallback() helper defined in net/mptcp/protocol.h,
    instead of open-coding it in both __mptcp_do_fallback() and
    mptcp_diag_fill_info().

    Reviewed-by: Matthieu Baerts <matttbe@kernel.org>
    Signed-off-by: Geliang Tang <geliang.tang@suse.com>
    Signed-off-by: Mat Martineau <martineau@kernel.org>
    Link: https://lore.kernel.org/r/20231025-send-net-next-20231025-v1-5-db8f25f798eb@kernel.org
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-10-31 21:50:28 +01:00
Paolo Abeni 0db7537e33 mptcp: drop useless ssk in pm_subflow_check_next
JIRA: https://issues.redhat.com/browse/RHEL-15036
Tested: LNST, Tier1

Upstream commit:
commit 74cbb0c65b2963c1f1b51e2426cf0774ed828bc0
Author: Geliang Tang <geliang.tang@suse.com>
Date:   Wed Oct 25 16:37:05 2023 -0700

    mptcp: drop useless ssk in pm_subflow_check_next

    The code using the 'ssk' parameter of mptcp_pm_subflow_check_next() was
    dropped in commit 95d686517884 ("mptcp: fix subflow accounting on close").
    So drop this now-useless 'ssk' parameter.

    Reviewed-by: Matthieu Baerts <matttbe@kernel.org>
    Signed-off-by: Geliang Tang <geliang.tang@suse.com>
    Signed-off-by: Mat Martineau <martineau@kernel.org>
    Link: https://lore.kernel.org/r/20231025-send-net-next-20231025-v1-4-db8f25f798eb@kernel.org
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-10-31 21:50:28 +01:00
Paolo Abeni 315bfdb0cc mptcp: refactor sndbuf auto-tuning
JIRA: https://issues.redhat.com/browse/RHEL-15036
Tested: LNST, Tier1

Upstream commit:
commit 8005184fd1ca6aeb3fea36f4eb9463fc1b90c114
Author: Paolo Abeni <pabeni@redhat.com>
Date:   Mon Oct 23 13:44:42 2023 -0700

    mptcp: refactor sndbuf auto-tuning

    The MPTCP protocol accounts the data enqueued on all the subflows
    against the main socket send buffer, while the send buffer auto-tuning
    algorithm sets the main socket send buffer size as the max size among
    the subflows.

    That causes bad performance when at least one subflow is sndbuf
    limited, e.g. due to very high latency, as the MPTCP scheduler can't
    even fill such a buffer.

    Change the send-buffer auto-tuning algorithm to compute the main socket
    send buffer size as the sum of all the subflows' buffer sizes.
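
    For illustration, the old and new sizing rules can be compared in a few
    lines of standalone C. The helper names and the plain array of
    per-subflow sndbuf values are assumptions; the kernel walks the subflow
    list instead.

```c
#include <stddef.h>

/* Old rule: msk send buffer = largest subflow send buffer. */
static long sndbuf_max(const long *subflow_sndbuf, size_t n)
{
	long max = 0;

	for (size_t i = 0; i < n; i++)
		if (subflow_sndbuf[i] > max)
			max = subflow_sndbuf[i];
	return max;
}

/* New rule: msk send buffer = sum of all subflow send buffers, matching
 * the fact that data queued on every subflow is charged to the msk.
 */
static long sndbuf_sum(const long *subflow_sndbuf, size_t n)
{
	long sum = 0;

	for (size_t i = 0; i < n; i++)
		sum += subflow_sndbuf[i];
	return sum;
}
```

    With two subflows at 4 MiB and 1 MiB, the max rule yields 4 MiB even
    though up to 5 MiB can be queued across the subflows; the sum rule
    yields 5 MiB.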

    Reviewed-by: Mat Martineau <martineau@kernel.org>
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    Signed-off-by: Mat Martineau <martineau@kernel.org>
    Link: https://lore.kernel.org/r/20231023-send-net-next-20231023-2-v1-9-9dc60939d371@kernel.org
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-10-31 21:50:28 +01:00
Paolo Abeni 680f3b10d8 mptcp: give rcvlowat some love
JIRA: https://issues.redhat.com/browse/RHEL-15036
Tested: LNST, Tier1
Conflicts: different context for mptcp_stream_ops, as rhel lacks the \
  upstream commit dc97391e6610 ("sock: Remove ->sendpage*() in favour of \
  sendmsg(MSG_SPLICE_PAGES)")

Upstream commit:
commit 5684ab1a0effbfeb706f47d85785f653005b97b1
Author: Paolo Abeni <pabeni@redhat.com>
Date:   Mon Oct 23 13:44:38 2023 -0700

    mptcp: give rcvlowat some love

    The MPTCP protocol allows setting sk_rcvlowat, but the value there
    is currently ignored.

    Additionally, the default subflow sk_rcvlowat basically disables
    per-subflow delayed ack: the MPTCP protocol moves the incoming data from
    the subflows into the msk socket as soon as the TCP stack invokes the
    subflow data_ready callback. Later, when __tcp_ack_snd_check() takes
    action, the subflow-level copied_seq matches rcv_nxt, and that mandates
    an immediate ack.

    Let the mptcp receive path be aware of such a threshold, explicitly
    tracking the amount of data available to be read and checking it
    against sk_rcvlowat in mptcp_poll() and before waking up readers.

    Additionally, implement the set_rcvlowat() callback, to properly handle
    the rcvbuf auto-tuning on sk_rcvlowat changes.

    Finally, to properly handle delayed ack, force the subflow-level
    threshold to 0 and instead explicitly ask for an immediate ack when the
    msk-level threshold is not reached.
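
    A simplified, hypothetical model of the msk-level readiness check:
    'avail' stands for the tracked amount of data ready to be read, and an
    SO_RCVLOWAT value of 0 behaves like 1. The immediate-ack decision
    mirrors the last paragraph literally; it is a sketch, not the upstream
    logic.

```c
#include <stdbool.h>
#include <stdint.h>

/* Readers are woken (and poll reports EPOLLIN) only once the msk-level
 * amount of readable data reaches the low-water mark.
 */
static bool msk_epollin_ready(uint64_t avail, int rcvlowat)
{
	uint64_t target = rcvlowat > 1 ? (uint64_t)rcvlowat : 1;

	return avail >= target;
}

/* With the subflow-level threshold forced to 0, the msk decides: when its
 * own threshold is not reached, ask for an immediate ack.
 */
static bool msk_need_immediate_ack(uint64_t avail, int rcvlowat)
{
	return !msk_epollin_ready(avail, rcvlowat);
}
```
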

    Reviewed-by: Mat Martineau <martineau@kernel.org>
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    Signed-off-by: Mat Martineau <martineau@kernel.org>
    Link: https://lore.kernel.org/r/20231023-send-net-next-20231023-2-v1-5-9dc60939d371@kernel.org
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-10-31 21:50:28 +01:00
Paolo Abeni 377775994c mptcp: use plain bool instead of custom binary enum
JIRA: https://issues.redhat.com/browse/RHEL-15036
Tested: LNST, Tier1

Upstream commit:
commit f1f26512a9bf18f7a4c0d59df113a49f39d7d4b6
Author: Paolo Abeni <pabeni@redhat.com>
Date:   Mon Oct 23 13:44:36 2023 -0700

    mptcp: use plain bool instead of custom binary enum

    The 'data_avail' subflow field is already used as plain boolean,
    drop the custom binary enum type and switch to bool.

    No functional change intended.

    Reviewed-by: Mat Martineau <martineau@kernel.org>
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    Signed-off-by: Mat Martineau <martineau@kernel.org>
    Link: https://lore.kernel.org/r/20231023-send-net-next-20231023-2-v1-3-9dc60939d371@kernel.org
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-10-31 21:50:28 +01:00
Paolo Abeni 0874ffceee mptcp: add a new sysctl for make after break timeout
JIRA: https://issues.redhat.com/browse/RHEL-15036
Tested: LNST, Tier1

Upstream commit:
commit d866ae9aaa4325f1097e8b7a50f202348ca89b87
Author: Paolo Abeni <pabeni@redhat.com>
Date:   Mon Oct 23 13:44:34 2023 -0700

    mptcp: add a new sysctl for make after break timeout

    The MPTCP protocol allows sockets with no alive subflows to stay
    in ESTABLISHED status for a user-defined timeout, to allow for
    later subflow creation.

    Currently such timeout is a constant, TCP_TIMEWAIT_LEN. Let
    user space configure it via a newly added sysctl, to better cope
    with busy servers and to simplify (and speed up) the relevant
    packetdrill tests.

    Note that the new knob does not apply to orphaned MPTCP sockets
    waiting for the data_fin handshake completion: they always wait
    TCP_TIMEWAIT_LEN.
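
    A hypothetical sketch of the timeout selection after this change:
    orphaned sockets still waiting for the DATA_FIN handshake keep the
    fixed TCP_TIMEWAIT_LEN (60 seconds), while every other socket uses the
    configurable value. The function and parameter names are assumptions;
    the sysctl's actual name is not stated in this message.

```c
#include <stdbool.h>

#define TIMEWAIT_LEN_MS 60000U  /* TCP_TIMEWAIT_LEN is 60 seconds */

/* How long an msk with no alive subflows may linger before closing. */
static unsigned int msk_close_timeout_ms(bool orphaned, bool data_fin_pending,
					 unsigned int sysctl_close_timeout_ms)
{
	/* The new knob does not apply here: keep the historical value. */
	if (orphaned && data_fin_pending)
		return TIMEWAIT_LEN_MS;

	return sysctl_close_timeout_ms;
}
```
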

    Reviewed-by: Mat Martineau <martineau@kernel.org>
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    Signed-off-by: Mat Martineau <martineau@kernel.org>
    Link: https://lore.kernel.org/r/20231023-send-net-next-20231023-2-v1-1-9dc60939d371@kernel.org
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-10-31 21:50:28 +01:00
Paolo Abeni 14659f5b9c net: mptcp: use policy generated by YAML spec
JIRA: https://issues.redhat.com/browse/RHEL-15036
Tested: LNST, Tier1

Upstream commit:
commit aab4d8564947f391674391e5c346d7f6f1c49f89
Author: Davide Caratti <dcaratti@redhat.com>
Date:   Mon Oct 23 11:17:11 2023 -0700

    net: mptcp: use policy generated by YAML spec

    generated with:

     $ ./tools/net/ynl/ynl-gen-c.py --mode kernel \
     > --spec Documentation/netlink/specs/mptcp.yaml --source \
     > -o net/mptcp/mptcp_pm_gen.c
     $ ./tools/net/ynl/ynl-gen-c.py --mode kernel \
     > --spec Documentation/netlink/specs/mptcp.yaml --header \
     > -o net/mptcp/mptcp_pm_gen.h

    Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/340
    Acked-by: Paolo Abeni <pabeni@redhat.com>
    Signed-off-by: Davide Caratti <dcaratti@redhat.com>
    Signed-off-by: Mat Martineau <martineau@kernel.org>
    Link: https://lore.kernel.org/r/20231023-send-net-next-20231023-1-v2-7-16b1f701f900@kernel.org
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-10-31 21:50:28 +01:00
Paolo Abeni 2fc7f6b9ec net: mptcp: rename netlink handlers to mptcp_pm_nl_<blah>_{doit,dumpit}
JIRA: https://issues.redhat.com/browse/RHEL-15036
Tested: LNST, Tier1

Upstream commit:
commit 1e07938e29c587eaae069f6c624daa4c2a56331c
Author: Davide Caratti <dcaratti@redhat.com>
Date:   Mon Oct 23 11:17:10 2023 -0700

    net: mptcp: rename netlink handlers to mptcp_pm_nl_<blah>_{doit,dumpit}

    so that they will match names generated from YAML spec.

    Link: https://github.com/multipath-tcp/mptcp_net-next/issues/340
    Suggested-by: Paolo Abeni <pabeni@redhat.com>
    Acked-by: Paolo Abeni <pabeni@redhat.com>
    Signed-off-by: Davide Caratti <dcaratti@redhat.com>
    Signed-off-by: Mat Martineau <martineau@kernel.org>
    Link: https://lore.kernel.org/r/20231023-send-net-next-20231023-1-v2-6-16b1f701f900@kernel.org
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-10-31 21:50:28 +01:00
Paolo Abeni a6ca5b3c21 mptcp: fix delegated action races
JIRA: https://issues.redhat.com/browse/RHEL-15036
Tested: LNST, Tier1

Upstream commit:
commit a5efdbcece83af94180e8d7c0a6e22947318499d
Author: Paolo Abeni <pabeni@redhat.com>
Date:   Wed Oct 4 13:38:11 2023 -0700

    mptcp: fix delegated action races

    The delegated action infrastructure is prone to the following
    race: different CPUs can try to schedule different delegated
    actions on the same subflow at the same time.

    Each of them will check different bits via mptcp_subflow_delegate(),
    and will try to schedule the action on the related per-cpu napi
    instance.

    Depending on the timing, both can observe an empty delegated list
    node, causing the same entry to be added simultaneously on two different
    lists.

    The root cause is that the delegated actions infra does not provide
    a single synchronization point. Address the issue by reserving an
    additional bit to mark the subflow as scheduled for delegation.
    Acquiring such a bit guarantees that the caller owns the delegated list
    node and can safely schedule the subflow.

    Clear such bit only when the subflow scheduling is completed, ensuring
    a proper barrier is in place.

    Additionally, swap the meaning of the delegated_action bitmask, to allow
    the usage of the existing helper to set multiple bits at once.
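
    The single synchronization point can be sketched with C11 atomics: one
    atomic fetch-or sets the requested action bits together with a
    "scheduled" bit, and only the caller that observes the scheduled bit
    previously clear may enqueue the node. This is a userspace model with
    hypothetical names; the fetch-or stands in for the kernel's multi-bit
    helper, and a counter replaces the per-CPU NAPI list.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical bit layout: bit 0 marks "scheduled for delegation", the
 * remaining bits are individual pending actions.
 */
#define DELEGATE_SCHEDULED (1UL << 0)
#define DELEGATE_SEND      (1UL << 1)
#define DELEGATE_ACK       (1UL << 2)

struct subflow_model {
	atomic_ulong delegated_status;
	int enqueue_count;  /* times the node was put on a list; owner-only */
};

static void subflow_model_init(struct subflow_model *sf)
{
	atomic_init(&sf->delegated_status, 0UL);
	sf->enqueue_count = 0;
}

/* Set action bits and the SCHEDULED bit in one atomic read-modify-write.
 * Only the caller that flips SCHEDULED from 0 to 1 owns the list node.
 */
static bool subflow_delegate(struct subflow_model *sf, unsigned long actions)
{
	unsigned long old = atomic_fetch_or(&sf->delegated_status,
					    actions | DELEGATE_SCHEDULED);

	if (old & DELEGATE_SCHEDULED)
		return false;       /* node already owned: bits are recorded */

	sf->enqueue_count++;        /* stands in for list_add_tail() */
	return true;
}

/* Consumer: take all pending actions and drop ownership in one step, so a
 * new delegation can enqueue the node only after processing completes.
 */
static unsigned long subflow_process(struct subflow_model *sf)
{
	unsigned long status = atomic_exchange(&sf->delegated_status, 0UL);

	return status & ~DELEGATE_SCHEDULED;
}
```

    A second delegation arriving before processing completes only adds its
    action bit; it never enqueues the same node a second time.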

    Fixes: bcd97734318d ("mptcp: use delegate action to schedule 3rd ack retrans")
    Cc: stable@vger.kernel.org
    Reviewed-by: Mat Martineau <martineau@kernel.org>
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    Signed-off-by: Mat Martineau <martineau@kernel.org>
    Link: https://lore.kernel.org/r/20231004-send-net-20231004-v1-1-28de4ac663ae@kernel.org
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-10-31 21:50:28 +01:00
Paolo Abeni f30f25d39e mptcp: fix dangling connection hang-up
JIRA: https://issues.redhat.com/browse/RHEL-15036
Tested: LNST, Tier1

Upstream commit:
commit 27e5ccc2d5a50ed61bb73153edb1066104b108b3
Author: Paolo Abeni <pabeni@redhat.com>
Date:   Sat Sep 16 12:52:49 2023 +0200

    mptcp: fix dangling connection hang-up

    According to RFC 8684 section 3.3:

      A connection is not closed unless [...] or an implementation-specific
      connection-level send timeout.

    Currently the MPTCP protocol does not implement such a timeout, and
    connections timing out at the TCP level never move to the close state.

    Introduce a catch-up condition at subflow close time to move the
    MPTCP socket to close, too.

    That additionally allows removing similar existing logic inside the
    worker.

    Finally, allow some additional timeout for plain ESTABLISHED mptcp
    sockets, as the protocol allows creating new subflows even at that
    point and making the connection functional again.

    This issue has actually been present since the beginning, but it is
    basically impossible to solve without a long chain of functional
    pre-requisites topped by commit bbd49d114d57 ("mptcp: consolidate
    transition to TCP_CLOSE in mptcp_do_fastclose()"). When backporting
    this patch, please backport that other commit as well.

    Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/430
    Fixes: e16163b6e2 ("mptcp: refactor shutdown and close")
    Cc: stable@vger.kernel.org
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
    Reviewed-by: Mat Martineau <martineau@kernel.org>
    Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-10-31 21:50:28 +01:00
Paolo Abeni 0dd32cbf12 mptcp: rename timer related helper to less confusing names
JIRA: https://issues.redhat.com/browse/RHEL-15036
Tested: LNST, Tier1

Upstream commit:
commit f6909dc1c1f4452879278128012da6c76bc186a5
Author: Paolo Abeni <pabeni@redhat.com>
Date:   Sat Sep 16 12:52:48 2023 +0200

    mptcp: rename timer related helper to less confusing names

    The msk socket uses two different timeouts to track close-related
    events and retransmissions. The existing helpers do not indicate
    clearly which timer they actually touch, making the related code
    quite confusing.

    Rename the existing helpers to avoid such confusion. No
    functional change intended.

    This patch is linked to the next one ("mptcp: fix dangling connection
    hang-up"). The two patches are supposed to be backported together.

    Cc: stable@vger.kernel.org # v5.11+
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
    Reviewed-by: Mat Martineau <martineau@kernel.org>
    Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-10-31 21:50:28 +01:00
Paolo Abeni cc61893ec8 mptcp: register default scheduler
JIRA: https://issues.redhat.com/browse/RHEL-15036
Tested: LNST, Tier1

Upstream commit:
commit ed1ad86b8527f8f864df3c182adbfcd12a445de6
Author: Geliang Tang <geliang.tang@suse.com>
Date:   Mon Aug 21 15:25:21 2023 -0700

    mptcp: register default scheduler

    This patch defines the default packet scheduler mptcp_sched_default.
    Register it in mptcp_sched_init(), which is invoked in mptcp_proto_init().
    Skip deleting this default scheduler in mptcp_unregister_scheduler().

    Set msk->sched to the default scheduler when the input parameter of
    mptcp_init_sched() is NULL.

    Invoke mptcp_sched_default_get_subflow in get_send() and get_retrans()
    if the default scheduler is set or msk->sched is NULL.

    Acked-by: Paolo Abeni <pabeni@redhat.com>
    Signed-off-by: Geliang Tang <geliang.tang@suse.com>
    Signed-off-by: Mat Martineau <martineau@kernel.org>
    Link: https://lore.kernel.org/r/20230821-upstream-net-next-20230818-v1-10-0c860fb256a8@kernel.org
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-10-31 21:50:01 +01:00
Paolo Abeni 60003c4ba6 mptcp: add scheduler wrappers
JIRA: https://issues.redhat.com/browse/RHEL-15036
Tested: LNST, Tier1

Upstream commit:
commit 07336a87fe871518a7b3508e29a21ca1735b3edc
Author: Geliang Tang <geliang.tang@suse.com>
Date:   Mon Aug 21 15:25:18 2023 -0700

    mptcp: add scheduler wrappers

    This patch defines two packet scheduler wrappers, mptcp_sched_get_send()
    and mptcp_sched_get_retrans(), which invoke get_subflow() of msk->sched.

    Set data->reinject to true in mptcp_sched_get_retrans(), and set it to
    false in mptcp_sched_get_send().

    If msk->sched is NULL, use default functions mptcp_subflow_get_send()
    and mptcp_subflow_get_retrans() to send data.
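
    A hypothetical standalone model of the two wrappers: each sets the
    reinject flag, then calls the attached scheduler's get_subflow() or
    falls back to a built-in default when no scheduler is set. Subflows are
    represented by plain indices, and every name here is an assumption
    rather than the upstream signature.

```c
#include <stdbool.h>
#include <stddef.h>

struct sched_data {
	bool reinject;      /* true when picking a subflow to retransmit on */
};

struct sched_ops {
	const char *name;
	/* returns the chosen subflow index, or -1 if none is available */
	int (*get_subflow)(void *msk, struct sched_data *data);
};

struct msk_sched_model {
	const struct sched_ops *sched;   /* NULL: use built-in defaults */
};

/* Built-in fallbacks, standing in for the default send/retrans pickers. */
static int default_get_send(void *msk)    { (void)msk; return 0; }
static int default_get_retrans(void *msk) { (void)msk; return 1; }

/* Example scheduler giving distinct answers for send and retransmit. */
static int example_get_subflow(void *msk, struct sched_data *data)
{
	(void)msk;
	return data->reinject ? 3 : 2;
}
static const struct sched_ops example_sched = { "example", example_get_subflow };

static int sched_get_send(struct msk_sched_model *msk)
{
	struct sched_data data = { .reinject = false };

	if (!msk->sched)
		return default_get_send(msk);
	return msk->sched->get_subflow(msk, &data);
}

static int sched_get_retrans(struct msk_sched_model *msk)
{
	struct sched_data data = { .reinject = true };

	if (!msk->sched)
		return default_get_retrans(msk);
	return msk->sched->get_subflow(msk, &data);
}
```
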

    Reviewed-by: Mat Martineau <martineau@kernel.org>
    Signed-off-by: Geliang Tang <geliang.tang@suse.com>
    Signed-off-by: Mat Martineau <martineau@kernel.org>
    Link: https://lore.kernel.org/r/20230821-upstream-net-next-20230818-v1-7-0c860fb256a8@kernel.org
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-10-31 21:50:01 +01:00
Paolo Abeni 09d27aa2b0 mptcp: add scheduled in mptcp_subflow_context
JIRA: https://issues.redhat.com/browse/RHEL-15036
Tested: LNST, Tier1

Upstream commit:
commit fce68b03086fd00eb5a8ba4744f36f0d007d0f9d
Author: Geliang Tang <geliang.tang@suse.com>
Date:   Mon Aug 21 15:25:17 2023 -0700

    mptcp: add scheduled in mptcp_subflow_context

    This patch adds a new member scheduled in struct mptcp_subflow_context,
    which will be set in the MPTCP scheduler context when the scheduler
    picks this subflow to send data.

    Add a new helper mptcp_subflow_set_scheduled() to set this flag using
    WRITE_ONCE().

    Reviewed-by: Mat Martineau <martineau@kernel.org>
    Signed-off-by: Geliang Tang <geliang.tang@suse.com>
    Signed-off-by: Mat Martineau <martineau@kernel.org>
    Link: https://lore.kernel.org/r/20230821-upstream-net-next-20230818-v1-6-0c860fb256a8@kernel.org
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-10-31 21:50:01 +01:00
Paolo Abeni e690e9e26a mptcp: add sched in mptcp_sock
JIRA: https://issues.redhat.com/browse/RHEL-15036
Tested: LNST, Tier1

Upstream commit:
commit 1730b2b2c5a5a886007b247366aebe0976dc8881
Author: Geliang Tang <geliang.tang@suse.com>
Date:   Mon Aug 21 15:25:16 2023 -0700

    mptcp: add sched in mptcp_sock

    This patch adds a new struct member, sched, in struct mptcp_sock,
    and two helpers, mptcp_init_sched() and mptcp_release_sched(), to
    init and release it.

    Init it with the sysctl scheduler in mptcp_init_sock(), copy the
    scheduler from the parent in mptcp_sk_clone(), and release it in
    __mptcp_destroy_sock().

    Acked-by: Paolo Abeni <pabeni@redhat.com>
    Reviewed-by: Mat Martineau <martineau@kernel.org>
    Signed-off-by: Geliang Tang <geliang.tang@suse.com>
    Signed-off-by: Mat Martineau <martineau@kernel.org>
    Link: https://lore.kernel.org/r/20230821-upstream-net-next-20230818-v1-5-0c860fb256a8@kernel.org
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-10-31 21:50:01 +01:00
Paolo Abeni 7060756d39 mptcp: add a new sysctl scheduler
JIRA: https://issues.redhat.com/browse/RHEL-15036
Tested: LNST, Tier1

Upstream commit:
commit e3b2870b6d220d1cbd2d52d7acc9f0de9fdfeccf
Author: Geliang Tang <geliang.tang@suse.com>
Date:   Mon Aug 21 15:25:15 2023 -0700

    mptcp: add a new sysctl scheduler

    This patch adds a new sysctl, named scheduler, to support the selection
    of different schedulers. Export the mptcp_get_scheduler helper to get
    this sysctl.

    Acked-by: Paolo Abeni <pabeni@redhat.com>
    Reviewed-by: Mat Martineau <martineau@kernel.org>
    Signed-off-by: Geliang Tang <geliang.tang@suse.com>
    Signed-off-by: Mat Martineau <martineau@kernel.org>
    Link: https://lore.kernel.org/r/20230821-upstream-net-next-20230818-v1-4-0c860fb256a8@kernel.org
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-10-31 21:50:01 +01:00
Paolo Abeni 32774b335d mptcp: add struct mptcp_sched_ops
JIRA: https://issues.redhat.com/browse/RHEL-15036
Tested: LNST, Tier1

Upstream commit:
commit 740ebe35bd3f5c4ff8ec60e5e521e47ea8f5492c
Author: Geliang Tang <geliang.tang@suse.com>
Date:   Mon Aug 21 15:25:14 2023 -0700

    mptcp: add struct mptcp_sched_ops

    This patch defines struct mptcp_sched_ops, which has three struct members,
    name, owner and list, and three function pointers: init(), release() and
    get_subflow().

    The scheduler function get_subflow() has a struct mptcp_sched_data
    parameter, which contains a reinject flag (retransmission or not), the
    number of subflows and an array of mptcp_subflow_context.

    Add the scheduler registering, unregistering and finding functions to add,
    delete and find a packet scheduler on the global list mptcp_sched_list.
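
    A minimal registry sketch, assuming a singly linked list in place of
    the kernel's list_head and no locking, which a real implementation
    would need. Registration rejects entries without get_subflow() or with
    a duplicate name; the struct layout, function names and error codes
    are illustrative assumptions.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define SCHED_NAME_MAX 16

struct sched_ops_model {
	char name[SCHED_NAME_MAX];
	int (*get_subflow)(void *msk);   /* mandatory callback */
	struct sched_ops_model *next;    /* stand-in for list_head */
};

/* Global registry head, standing in for mptcp_sched_list. */
static struct sched_ops_model *sched_list;

static struct sched_ops_model *sched_find(const char *name)
{
	for (struct sched_ops_model *s = sched_list; s; s = s->next)
		if (strncmp(s->name, name, SCHED_NAME_MAX) == 0)
			return s;
	return NULL;
}

static int sched_register(struct sched_ops_model *s)
{
	if (!s->get_subflow)
		return -EINVAL;
	if (sched_find(s->name))
		return -EEXIST;
	s->next = sched_list;
	sched_list = s;
	return 0;
}

static void sched_unregister(struct sched_ops_model *s)
{
	for (struct sched_ops_model **pp = &sched_list; *pp; pp = &(*pp)->next) {
		if (*pp == s) {
			*pp = s->next;
			s->next = NULL;
			return;
		}
	}
}

/* Trivial example callback: always pick the first subflow. */
static int pick_first(void *msk)
{
	(void)msk;
	return 0;
}
```
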

    Acked-by: Paolo Abeni <pabeni@redhat.com>
    Reviewed-by: Mat Martineau <martineau@kernel.org>
    Signed-off-by: Geliang Tang <geliang.tang@suse.com>
    Signed-off-by: Mat Martineau <martineau@kernel.org>
    Link: https://lore.kernel.org/r/20230821-upstream-net-next-20230818-v1-3-0c860fb256a8@kernel.org
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-10-31 21:50:01 +01:00
Paolo Abeni 8dd804842e mptcp: drop last_snd and MPTCP_RESET_SCHEDULER
JIRA: https://issues.redhat.com/browse/RHEL-15036
Tested: LNST, Tier1

Upstream commit:
commit ebc1e08f01ebedbf962e6417bbf6952bd4ca2142
Author: Geliang Tang <geliang.tang@suse.com>
Date:   Mon Aug 21 15:25:13 2023 -0700

    mptcp: drop last_snd and MPTCP_RESET_SCHEDULER

    Since the burst check conditions have moved out of the function
    mptcp_subflow_get_send(), all uses of msk->last_snd are now useless.
    This patch drops them as well as the macro MPTCP_RESET_SCHEDULER.

    Reviewed-by: Mat Martineau <martineau@kernel.org>
    Signed-off-by: Geliang Tang <geliang.tang@suse.com>
    Signed-off-by: Mat Martineau <martineau@kernel.org>
    Link: https://lore.kernel.org/r/20230821-upstream-net-next-20230818-v1-2-0c860fb256a8@kernel.org
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-10-31 21:50:01 +01:00
Paolo Abeni 42f2df3a6d mptcp: get rid of msk->subflow
JIRA: https://issues.redhat.com/browse/RHEL-15036
Tested: LNST, Tier1

Upstream commit:
commit 39880bd808ad2ddfb9b7fee129568c3b814f0609
Author: Paolo Abeni <pabeni@redhat.com>
Date:   Fri Aug 11 17:57:26 2023 +0200

    mptcp: get rid of msk->subflow

    This field is now used only as a flag to control the first subflow
    deletion at close() time. Introduce a new bit flag for that and finally
    drop the mentioned field.

    As an intended side effect, now the first subflow sock is not freed
    before close() even for passive sockets. The msk has no open/active
    subflows if the first one is closed and the subflow list is singular;
    update the state check in mptcp_stream_accept() accordingly.

    Among other benefits, the subflow removal reduces the amount of memory
    used on the client side for each mptcp connection, allows passive sockets
    to go through successful accept()/disconnect()/connect() and makes the
    returned error code consistent for failing passive and active sockets.

    Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/290
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    Reviewed-by: Mat Martineau <martineau@kernel.org>
    Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-10-31 21:50:01 +01:00
Paolo Abeni 7179368744 mptcp: change the mpc check helper to return a sk
JIRA: https://issues.redhat.com/browse/RHEL-15036
Tested: LNST, Tier1

Upstream commit:
commit 3f326a821b99812edb6d3c24bcb78377cae6e432
Author: Paolo Abeni <pabeni@redhat.com>
Date:   Fri Aug 11 17:57:25 2023 +0200

    mptcp: change the mpc check helper to return a sk

    After the previous patch, the __mptcp_nmpc_socket helper is used
    only to ensure that the MPTCP socket is in a suitable status, that
    is, the mptcp capable handshake has not started yet.

    Change the return value to the relevant subflow sock, to finally
    remove the last references to first subflow socket in the MPTCP stack.

    As a bonus, we can get rid of a few local variables in different
    functions.

    No functional change intended.

    Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    Reviewed-by: Mat Martineau <martineau@kernel.org>
    Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-10-31 21:50:01 +01:00
Paolo Abeni d3e6c27ef7 mptcp: fix disconnect vs accept race
JIRA: https://issues.redhat.com/browse/RHEL-15036
Tested: LNST, Tier1

Upstream commit:
commit 511b90e39250135a7f900f1c3afbce25543018a2
Author: Paolo Abeni <pabeni@redhat.com>
Date:   Thu Aug 3 18:27:30 2023 +0200

    mptcp: fix disconnect vs accept race

    Despite commit 0ad529d9fd2b ("mptcp: fix possible divide by zero in
    recvmsg()"), the mptcp protocol is still prone to a race between
    disconnect() (or shutdown) and accept.

    The root cause is that the mentioned commit checks the msk-level
    flag, but mptcp_stream_accept() does not acquire the msk-level lock,
    as it can rely directly on the first subflow lock.

    As reported by Christoph, that can lead to a race where an msk
    socket is accepted after mptcp_subflow_queue_clean() releases
    the listener socket lock and just before it takes destructive
    actions, leading to the following splat:

    BUG: kernel NULL pointer dereference, address: 0000000000000012
    PGD 5a4ca067 P4D 5a4ca067 PUD 37d4c067 PMD 0
    Oops: 0000 [#1] PREEMPT SMP
    CPU: 2 PID: 10955 Comm: syz-executor.5 Not tainted 6.5.0-rc1-gdc7b257ee5dd #37
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-2.el7 04/01/2014
    RIP: 0010:mptcp_stream_accept+0x1ee/0x2f0 include/net/inet_sock.h:330
    Code: 0a 09 00 48 8b 1b 4c 39 e3 74 07 e8 bc 7c 7f fe eb a1 e8 b5 7c 7f fe 4c 8b 6c 24 08 eb 05 e8 a9 7c 7f fe 49 8b 85 d8 09 00 00 <0f> b6 40 12 88 44 24 07 0f b6 6c 24 07 bf 07 00 00 00 89 ee e8 89
    RSP: 0018:ffffc90000d07dc0 EFLAGS: 00010293
    RAX: 0000000000000000 RBX: ffff888037e8d020 RCX: ffff88803b093300
    RDX: 0000000000000000 RSI: ffffffff833822c5 RDI: ffffffff8333896a
    RBP: 0000607f82031520 R08: ffff88803b093300 R09: 0000000000000000
    R10: 0000000000000000 R11: 0000000000003e83 R12: ffff888037e8d020
    R13: ffff888037e8c680 R14: ffff888009af7900 R15: ffff888009af6880
    FS:  00007fc26d708640(0000) GS:ffff88807dd00000(0000) knlGS:0000000000000000
    CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 0000000000000012 CR3: 0000000066bc5001 CR4: 0000000000370ee0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
    Call Trace:
     <TASK>
     do_accept+0x1ae/0x260 net/socket.c:1872
     __sys_accept4+0x9b/0x110 net/socket.c:1913
     __do_sys_accept4 net/socket.c:1954 [inline]
     __se_sys_accept4 net/socket.c:1951 [inline]
     __x64_sys_accept4+0x20/0x30 net/socket.c:1951
     do_syscall_x64 arch/x86/entry/common.c:50 [inline]
     do_syscall_64+0x47/0xa0 arch/x86/entry/common.c:80
     entry_SYSCALL_64_after_hwframe+0x6e/0xd8

    Address the issue by temporarily removing the pending request sockets
    from the accept queue, so that a racing accept() can't touch them.

    After depleting the msk (the ssks still exist, as plain TCP sockets),
    re-insert them into the accept queue, so that a later inet_csk_listen_stop()
    will complete the TCP socket disposal.
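The detach/teardown/re-insert pattern described above can be sketched in C on a toy FIFO queue. This is a single-threaded model under stated assumptions: struct pending_req, struct accept_queue and the queue_* helpers are hypothetical, and the points where the kernel would hold or drop the listener lock are marked only in comments. Once the list is detached, a racing queue_accept() finds an empty queue instead of a half-destroyed msk.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical pending request: a subflow with msk-level state attached. */
struct pending_req {
	int id;
	bool msk_released;            /* msk-level state torn down */
	struct pending_req *next;
};

/* Hypothetical FIFO accept queue, protected by the listener lock. */
struct accept_queue {
	struct pending_req *head;
	struct pending_req *tail;
};

static void queue_add(struct accept_queue *q, struct pending_req *r)
{
	r->next = NULL;
	if (q->tail)
		q->tail->next = r;
	else
		q->head = r;
	q->tail = r;
}

/* What a racing accept() does: pop the oldest request, or NULL if empty. */
static struct pending_req *queue_accept(struct accept_queue *q)
{
	struct pending_req *r = q->head;

	if (r) {
		q->head = r->next;
		if (!q->head)
			q->tail = NULL;
		r->next = NULL;
	}
	return r;
}

static void queue_clean(struct accept_queue *q)
{
	struct pending_req *head, *tail, *r;

	/* Step 1 (listener lock held): detach the whole pending list. */
	head = q->head;
	tail = q->tail;
	assert(!head == !tail);
	q->head = q->tail = NULL;

	/* Step 2 (lock dropped): destructive msk-level teardown. A racing
	 * accept() now sees an empty queue. */
	for (r = head; r; r = r->next)
		r->msk_released = true;

	/* Step 3 (lock re-acquired): splice the plain subflows back in front,
	 * so the final TCP-level cleanup still finds them. */
	if (head) {
		tail->next = q->head;
		q->head = head;
		if (!q->tail)
			q->tail = tail;
	}
}
```

Splicing the saved head/tail back in one operation keeps the original order and avoids re-walking the list under the lock.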

    Fixes: 2a6a870e44dd ("mptcp: stops worker on unaccepted sockets at listener close")
    Cc: stable@vger.kernel.org
    Reported-by: Christoph Paasch <cpaasch@apple.com>
    Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/423
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
    Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
    Link: https://lore.kernel.org/r/20230803-upstream-net-20230803-misc-fixes-6-5-v1-4-6671b1ab11cc@tessares.net
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-10-31 21:50:01 +01:00
Paolo Abeni 2ff38709c3 mptcp: fix rcv buffer auto-tuning
JIRA: https://issues.redhat.com/browse/RHEL-15036
Tested: LNST, Tier1

Upstream commit:
commit b8dc6d6ce93142ccd4c976003bb6c25d63aac2ce
Author: Paolo Abeni <pabeni@redhat.com>
Date:   Thu Jul 20 20:47:50 2023 +0200

    mptcp: fix rcv buffer auto-tuning

    The MPTCP code relies on the assumption that the tcp_win_from_space()
    helper does not use any TCP-specific field, and thus works correctly
    when operating on an MPTCP socket.

    The commit dfa2f0483360 ("tcp: get rid of sysctl_tcp_adv_win_scale")
    broke this assumption, and as a consequence most MPTCP connections stall
    on zero-window events due to auto-tuning changing the rcv buffer size
    quite randomly.

    Address the issue by syncing the MPTCP auto-tuning code with the TCP
    one again. To achieve that, factor out the window size logic into
    socket-independent helpers, and reuse them in mptcp_rcv_space_adjust().
    The MPTCP-level scaling_ratio is selected as the minimum among all
    the subflows, as a worst-case estimate.
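A minimal C sketch of socket-independent window helpers and the worst-case ratio selection follows. It assumes the ratio is a fraction of 256 (a shift of 8), which is an assumption about the scaling scheme rather than something stated in this log; win_from_space(), space_from_win(), msk_scaling_ratio() and RMEM_TO_WIN_SCALE are hypothetical names. Because the helpers take the ratio as a plain argument, the same code serves a single TCP subflow or the MPTCP socket.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: scaling_ratio is expressed in 1/256 units. */
#define RMEM_TO_WIN_SCALE 8

/* Usable receive window for a given buffer space. */
static int win_from_space(uint8_t scaling_ratio, int space)
{
	int64_t scaled = (int64_t)space * scaling_ratio;

	return (int)(scaled >> RMEM_TO_WIN_SCALE);
}

/* Inverse: buffer space needed to advertise 'win' bytes. */
static int space_from_win(uint8_t scaling_ratio, int win)
{
	uint64_t val = (uint64_t)win << RMEM_TO_WIN_SCALE;

	assert(scaling_ratio != 0);
	return (int)(val / scaling_ratio);
}

/* MPTCP-level ratio: the minimum over all subflows (worst case).
 * Returning UINT8_MAX when there are no subflows is a hypothetical default. */
static uint8_t msk_scaling_ratio(const uint8_t *subflow_ratios, size_t n)
{
	uint8_t ratio = UINT8_MAX;
	size_t i;

	for (i = 0; i < n; i++)
		if (subflow_ratios[i] < ratio)
			ratio = subflow_ratios[i];
	return ratio;
}
```

With subflow ratios of 128, 200 and 96, the msk uses 96, so a 1024-byte buffer is credited with a 384-byte window. Underestimating is safer than advertising window the slowest subflow cannot back.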

    Fixes: dfa2f0483360 ("tcp: get rid of sysctl_tcp_adv_win_scale")
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    Co-developed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
    Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
    Reviewed-by: Eric Dumazet <edumazet@google.com>
    Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
    Link: https://lore.kernel.org/r/20230720-upstream-net-next-20230720-mptcp-fix-rcv-buffer-auto-tuning-v1-1-175ef12b8380@tessares.net
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-10-31 21:50:01 +01:00
Paolo Abeni b3feb89fbf mptcp: pass addr to mptcp_pm_alloc_anno_list
JIRA: https://issues.redhat.com/browse/RHEL-15036
Tested: LNST, Tier1

Upstream commit:
commit 528cb5f2a1e859522f36f091f29f5c81ec6d4a4c
Author: Geliang Tang <geliang.tang@suse.com>
Date:   Tue Jun 20 18:30:22 2023 +0200

    mptcp: pass addr to mptcp_pm_alloc_anno_list

    Pass the addr parameter to mptcp_pm_alloc_anno_list() instead of entry.
    This reduces the scope: mptcp_pm_alloc_anno_list() only accesses
    "entry->addr", so it can be restricted to a pointer to "addr".
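The narrower signature can be illustrated with a small C sketch. All types and the alloc_anno() helper are hypothetical models, not the MPTCP path-manager structures: the callee receives only a const pointer to the address, so it cannot read or modify the other fields of the endpoint entry.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical address and endpoint types. */
struct addr_info {
	unsigned char id;
	unsigned int ip;
	unsigned short port;
};

struct pm_entry {
	struct addr_info addr;
	unsigned int flags;   /* not needed by the allocator */
	int ifindex;          /* not needed by the allocator */
};

struct anno_entry {
	struct addr_info addr;
	struct anno_entry *next;
};

/* Narrow signature: only the address is visible to the callee. */
static struct anno_entry *alloc_anno(struct anno_entry **list,
				     const struct addr_info *addr)
{
	struct anno_entry *e = calloc(1, sizeof(*e));

	if (!e)
		return NULL;
	e->addr = *addr;
	e->next = *list;
	*list = e;
	return e;
}
```

A caller holding a full entry passes `&entry->addr`.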

    Signed-off-by: Geliang Tang <geliang.tang@suse.com>
    Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
    Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-10-31 21:50:01 +01:00
Paolo Abeni 676a0f3e24 mptcp: add subflow unique id
JIRA: https://issues.redhat.com/browse/RHEL-15036
Tested: LNST, Tier1

Upstream commit:
commit 6f06b4d4d1cc676a3f9d947f931ec3866b6c4f6c
Author: Paolo Abeni <pabeni@redhat.com>
Date:   Tue Jun 20 18:30:17 2023 +0200

    mptcp: add subflow unique id

    User space needs to properly account for the data received/sent by
    individual subflows. When additional subflows are created and/or
    closed during the MPTCP socket lifetime, the information currently
    exposed via MPTCP_TCPINFO is not enough: subflows are identified only
    by their sequential position inside the info dumps, and that changes
    with the above-mentioned events.

    To solve the above problem, this patch introduces a new subflow
    identifier that is unique within the scope of the given MPTCP socket.

    The initial subflow gets id 1 and the other subflows get incremental
    values at join time.
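The numbering scheme can be sketched in C with a per-socket counter. struct msk_model, struct subflow_model, msk_init() and subflow_assign_id() are hypothetical names, not the kernel fields. Because the counter only moves forward, a closed subflow's id is never reused, so user space can match counters across dumps even as subflows come and go.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-MPTCP-socket state: the next id to hand out. */
struct msk_model {
	uint32_t next_subflow_id;
};

/* Hypothetical subflow carrying its unique id. */
struct subflow_model {
	uint32_t id;
};

static void msk_init(struct msk_model *msk)
{
	msk->next_subflow_id = 1;   /* the initial subflow gets id 1 */
}

/* Called for the initial subflow and for each join. Ids only grow, so a
 * closed subflow's id is never recycled within this MPTCP socket. */
static void subflow_assign_id(struct msk_model *msk, struct subflow_model *sf)
{
	sf->id = msk->next_subflow_id++;
}
```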

    Link: https://github.com/multipath-tcp/mptcp_net-next/issues/388
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
    Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-10-31 21:50:01 +01:00
Paolo Abeni 17ded46adb mptcp: track some aggregate data counters
JIRA: https://issues.redhat.com/browse/RHEL-15036
Tested: LNST, Tier1

Upstream commit:
commit 38967f424b5be79c4c676712e5640d846efd07e3
Author: Paolo Abeni <pabeni@redhat.com>
Date:   Tue Jun 20 18:30:15 2023 +0200

    mptcp: track some aggregate data counters

    Currently there are no data transfer counters accounting for all
    the subflows used by a given MPTCP socket. User space can compute
    such figures by aggregating the subflow info, but that is inaccurate
    if any subflow is closed before the MPTCP socket itself.

    Add the new counters to the MPTCP socket itself and expose them
    via the existing diag and sockopt interfaces. While touching
    mptcp_diag_fill_info(), acquire the relevant locks before fetching
    the msk data, to ensure better data consistency.
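The accounting problem can be sketched in C. The model assumes, for illustration only, a fixed array of subflows and a single bytes_sent counter; every name here is hypothetical. Summing the live subflows loses whatever a closed subflow carried, while a counter owned by the MPTCP socket keeps the full total.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_SUBFLOWS 4   /* hypothetical fixed bound for the sketch */

struct subflow_stats {
	bool alive;
	uint64_t bytes_sent;
};

struct msk_stats {
	struct subflow_stats sf[MAX_SUBFLOWS];
	uint64_t bytes_sent;   /* aggregate counter owned by the MPTCP socket */
};

/* Account a transmission on both the subflow and the msk-level counter. */
static void account_send(struct msk_stats *m, int idx, uint64_t len)
{
	m->sf[idx].bytes_sent += len;
	m->bytes_sent += len;
}

/* After close, the per-subflow info is no longer reported. */
static void subflow_close(struct msk_stats *m, int idx)
{
	m->sf[idx].alive = false;
	m->sf[idx].bytes_sent = 0;
}

/* What user space could compute from the dump of live subflows alone. */
static uint64_t sum_live_subflows(const struct msk_stats *m)
{
	uint64_t total = 0;

	for (int i = 0; i < MAX_SUBFLOWS; i++)
		if (m->sf[i].alive)
			total += m->sf[i].bytes_sent;
	return total;
}
```

After sending 100 bytes on one subflow and 50 on another, then closing the second, the live sum reports 100 while the msk-level counter still reports 150.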

    Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/385
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
    Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-10-31 21:50:01 +01:00