328 Commits

Author SHA1 Message Date
Paolo Abeni 2e85e0460f mptcp: drop legacy code around RX EOF
JIRA: https://issues.redhat.com/browse/RHEL-15036
Tested: LNST, Tier1

Upstream commit:
commit b7535cfed223a9f02f9530853616f197b386d775
Author: Paolo Abeni <pabeni@redhat.com>
Date:   Tue Jun 20 18:24:22 2023 +0200

    mptcp: drop legacy code around RX EOF

    Thanks to the previous patch -- "mptcp: consolidate fallback and non
    fallback state machine" -- we can finally drop the "temporary hack"
    used to detect rx eof.

    Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    Reviewed-by: Mat Martineau <martineau@kernel.org>
    Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-10-31 21:50:01 +01:00
Paolo Abeni e48947e112 mptcp: unify pm set_flags interfaces
JIRA: https://issues.redhat.com/browse/RHEL-15036
Tested: LNST, Tier1

Upstream commit:
commit 6ba7ce89905c5d5cdb4ff9ff7c763a6a1d31f48d
Author: Geliang Tang <geliang.tang@suse.com>
Date:   Thu Jun 8 15:20:52 2023 +0200

    mptcp: unify pm set_flags interfaces

    This patch unifies the three PM set_flags() interfaces:

    mptcp_pm_nl_set_flags() in mptcp/pm_netlink.c for the in-kernel PM and
    mptcp_userspace_pm_set_flags() in mptcp/pm_userspace.c for the
    userspace PM.

    They'll be switched in the common PM interface mptcp_pm_set_flags() in
    mptcp/pm.c based on whether token is NULL or not.

    Signed-off-by: Geliang Tang <geliang.tang@suse.com>
    Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
    Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
    Reviewed-by: Larysa Zaremba <larysa.zaremba@intel.com>
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-10-31 21:50:01 +01:00
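The dispatch described in the "unify pm set_flags interfaces" commit can be pictured with a small user-space sketch. Everything named `demo_*` below is hypothetical and only models the idea; the real handlers take kernel-specific arguments. The sketch also assumes that a non-NULL token selects the userspace PM path, which is how the commit message reads.

```c
#include <stddef.h>

/* Hypothetical stand-ins for illustration; not the kernel's types. */
enum demo_backend { DEMO_NONE, DEMO_IN_KERNEL_PM, DEMO_USERSPACE_PM };

struct demo_addr {
    unsigned int flags;
    enum demo_backend handled_by;
};

/* Models the in-kernel PM handler (pm_netlink.c in the commit message). */
static int demo_nl_set_flags(struct demo_addr *addr, unsigned int flags)
{
    addr->flags = flags;
    addr->handled_by = DEMO_IN_KERNEL_PM;
    return 0;
}

/* Models the userspace PM handler (pm_userspace.c in the commit message). */
static int demo_userspace_set_flags(const char *token, struct demo_addr *addr,
                                    unsigned int flags)
{
    (void)token; /* a real handler would look up the connection by token */
    addr->flags = flags;
    addr->handled_by = DEMO_USERSPACE_PM;
    return 0;
}

/* Common entry point: assumed here that a non-NULL token means userspace PM. */
int demo_pm_set_flags(const char *token, struct demo_addr *addr,
                      unsigned int flags)
{
    if (token)
        return demo_userspace_set_flags(token, addr, flags);
    return demo_nl_set_flags(addr, flags);
}
```

Callers only ever see `demo_pm_set_flags()`, so the choice of path manager stays in one place.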
Paolo Abeni 6b11f7289e mptcp: unify pm get_flags_and_ifindex_by_id
JIRA: https://issues.redhat.com/browse/RHEL-15036
Tested: LNST, Tier1

Upstream commit:
commit f40be0db0b7680c2e9f0b1289788542813ba0f00
Author: Geliang Tang <geliang.tang@suse.com>
Date:   Thu Jun 8 15:20:51 2023 +0200

    mptcp: unify pm get_flags_and_ifindex_by_id

    This patch unifies the three PM get_flags_and_ifindex_by_id() interfaces:

    mptcp_pm_nl_get_flags_and_ifindex_by_id() in mptcp/pm_netlink.c for the
    in-kernel PM and mptcp_userspace_pm_get_flags_and_ifindex_by_id() in
    mptcp/pm_userspace.c for the userspace PM.

    They'll be switched in the common PM interface
    mptcp_pm_get_flags_and_ifindex_by_id() in mptcp/pm.c based on whether
    mptcp_pm_is_userspace() or not.

    Signed-off-by: Geliang Tang <geliang.tang@suse.com>
    Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
    Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
    Reviewed-by: Larysa Zaremba <larysa.zaremba@intel.com>
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-10-31 21:50:01 +01:00
Paolo Abeni 0c1019c093 mptcp: unify pm get_local_id interfaces
JIRA: https://issues.redhat.com/browse/RHEL-15036
Tested: LNST, Tier1

Upstream commit:
commit 9bbec87ecfe8a5c06710100a93e6b7e66f2cbbaf
Author: Geliang Tang <geliang.tang@suse.com>
Date:   Thu Jun 8 15:20:50 2023 +0200

    mptcp: unify pm get_local_id interfaces

    This patch unifies the three PM get_local_id() interfaces:

    mptcp_pm_nl_get_local_id() in mptcp/pm_netlink.c for the in-kernel PM and
    mptcp_userspace_pm_get_local_id() in mptcp/pm_userspace.c for the
    userspace PM.

    They'll be switched in the common PM interface mptcp_pm_get_local_id()
    in mptcp/pm.c based on whether mptcp_pm_is_userspace() or not.

    Also put together the declarations of these three functions in protocol.h.

    Signed-off-by: Geliang Tang <geliang.tang@suse.com>
    Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
    Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
    Reviewed-by: Larysa Zaremba <larysa.zaremba@intel.com>
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-10-31 21:50:01 +01:00
Paolo Abeni a8b8109ac5 mptcp: export local_address
JIRA: https://issues.redhat.com/browse/RHEL-15036
Tested: LNST, Tier1

Upstream commit:
commit dc886bce753cc2cf3c88ec5c7a6880a4e17d65ba
Author: Geliang Tang <geliang.tang@suse.com>
Date:   Thu Jun 8 15:20:49 2023 +0200

    mptcp: export local_address

    Rename local_address() with "mptcp_" prefix and export it in protocol.h.

    This function will be re-used in the common PM code (pm.c) in the
    following commit.

    Signed-off-by: Geliang Tang <geliang.tang@suse.com>
    Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
    Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
    Reviewed-by: Larysa Zaremba <larysa.zaremba@intel.com>
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-10-31 21:50:01 +01:00
Paolo Abeni 9bc8761d86 mptcp: only send RM_ADDR in nl_cmd_remove
JIRA: https://issues.redhat.com/browse/RHEL-15036
Tested: LNST, Tier1

Upstream commit:
commit 8b1c94da1e481090f24127b2c420b0c0b0421ce3
Author: Geliang Tang <geliang.tang@suse.com>
Date:   Sun Jun 4 20:25:17 2023 -0700

    mptcp: only send RM_ADDR in nl_cmd_remove

    The specifications from [1] about the "REMOVE" command say:

        Announce that an address has been lost to the peer

    It was then only supposed to send an RM_ADDR and not try to delete
    the associated subflows.

    A new helper mptcp_pm_remove_addrs() is then introduced to do just
    that, compared to mptcp_pm_remove_addrs_and_subflows() also removing
    subflows.

    To delete a subflow, the userspace daemon can use the "SUB_DESTROY"
    command, see mptcp_nl_cmd_sf_destroy().

    Fixes: d9a4594edabf ("mptcp: netlink: Add MPTCP_PM_CMD_REMOVE")
    Link: https://github.com/multipath-tcp/mptcp/blob/mptcp_v0.96/include/uapi/linux/mptcp.h [1]
    Cc: stable@vger.kernel.org
    Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
    Signed-off-by: Geliang Tang <geliang.tang@suse.com>
    Signed-off-by: Mat Martineau <martineau@kernel.org>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-10-31 21:50:01 +01:00
Paolo Abeni 69436586d5 mptcp: consolidate passive msk socket initialization
JIRA: https://issues.redhat.com/browse/RHEL-15036
Tested: LNST, Tier1

Upstream commit:
commit 7e8b88ec35eef363040e08d99536d2bebef83774
Author: Paolo Abeni <pabeni@redhat.com>
Date:   Wed May 31 12:37:05 2023 -0700

    mptcp: consolidate passive msk socket initialization

    When the msk socket is cloned at MPC handshake time, a few
    fields are initialized in a racy way outside mptcp_sk_clone()
    and the msk socket lock.

    The above is due to historical reasons: before commit a88d0092b24b
    ("mptcp: simplify subflow_syn_recv_sock()"), the first subflow socket
    carrying all the needed data was not yet available at msk creation
    time.

    We can now refactor the code by moving the missing initialization bit
    under the socket lock, removing the init race and avoiding some
    code duplication.

    This will also simplify the next patch, as all msk->first write
    accesses are now under the msk socket lock.

    Fixes: 0397c6d85f ("mptcp: keep unaccepted MPC subflow into join list")
    Reviewed-by: Mat Martineau <martineau@kernel.org>
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    Signed-off-by: Mat Martineau <martineau@kernel.org>
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-10-31 21:50:01 +01:00
Paolo Abeni 00e09e5a0f mptcp: add annotations around msk->subflow accesses
JIRA: https://issues.redhat.com/browse/RHEL-15036
Tested: LNST, Tier1

Upstream commit:
commit 5b825727d0871b23e8867f6371183e61628b4a26
Author: Paolo Abeni <pabeni@redhat.com>
Date:   Wed May 31 12:37:04 2023 -0700

    mptcp: add annotations around msk->subflow accesses

    MPTCP can access the first subflow socket in a few spots
    outside the socket lock scope. That is actually safe, as MPTCP
    will delete the socket itself only after the msk sock close().

    Still, such accesses cause a few KCSAN splats, as reported
    by Christoph. Silence the harmless warnings by adding a few
    annotations around the relevant accesses.

    Fixes: 71ba088ce0aa ("mptcp: cleanup accept and poll")
    Reported-by: Christoph Paasch <cpaasch@apple.com>
    Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/402
    Reviewed-by: Mat Martineau <martineau@kernel.org>
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    Signed-off-by: Mat Martineau <martineau@kernel.org>
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-10-31 21:50:01 +01:00
Paolo Abeni 0d048350f2 mptcp: fix connect timeout handling
JIRA: https://issues.redhat.com/browse/RHEL-15036
Tested: LNST, Tier1

Upstream commit:
commit 786fc12457268cc9b555dde6c22ae7300d4b40e1
Author: Paolo Abeni <pabeni@redhat.com>
Date:   Wed May 31 12:37:03 2023 -0700

    mptcp: fix connect timeout handling

    Ondrej reported a functional issue WRT timeout handling on connect
    with a nice reproducer.

    The problem is that the current mptcp connect waits for both the
    MPTCP socket level timeout, and the first subflow socket timeout.
    The latter is not influenced/touched by the exposed setsockopt().

    Overall the above makes the SO_SNDTIMEO a no-op on connect.

    Since mptcp_connect is invoked via inet_stream_connect and the
    latter properly handles the MPTCP level timeout, we can address the
    issue by making the nested subflow level connect always non-blocking.

    This also allows simplifying the code a bit, dropping an ugly hack
    to handle the fastopen and custom proto_ops connect.

    The issue predates the blamed commit below, but the current resolution
    requires the infrastructure introduced there.

    Fixes: 54f1944ed6d2 ("mptcp: factor out mptcp_connect()")
    Reported-by: Ondrej Mosnacek <omosnace@redhat.com>
    Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/399
    Cc: stable@vger.kernel.org
    Reviewed-by: Mat Martineau <martineau@kernel.org>
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    Signed-off-by: Mat Martineau <martineau@kernel.org>
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-10-31 21:50:01 +01:00
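From the application side, the timeout discussed in the "fix connect timeout handling" commit is the ordinary SO_SNDTIMEO socket option. The following is a minimal user-space sketch, not taken from the commit: the `demo_*` helpers are hypothetical, and the fallback definition of `IPPROTO_MPTCP` is only there for older libc headers that lack it.

```c
#include <netinet/in.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

#ifndef IPPROTO_MPTCP
#define IPPROTO_MPTCP 262 /* value from the Linux UAPI headers */
#endif

/* Hypothetical helper: bound blocking sends, and connect(), to `seconds`. */
int demo_set_send_timeout(int fd, long seconds)
{
    struct timeval tv = { .tv_sec = seconds, .tv_usec = 0 };

    return setsockopt(fd, SOL_SOCKET, SO_SNDTIMEO, &tv, sizeof(tv));
}

/* Hypothetical helper: open an MPTCP socket with the timeout already set.
 * Returns the descriptor, or -1 if MPTCP is unavailable or setsockopt fails. */
int demo_mptcp_socket_with_timeout(long seconds)
{
    int fd = socket(AF_INET, SOCK_STREAM, IPPROTO_MPTCP);

    if (fd < 0)
        return -1;
    if (demo_set_send_timeout(fd, seconds) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

With the fix applied, a blocking connect() on such a socket is expected to give up once the configured timeout expires, instead of also waiting on the first subflow's own timeout.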
Paolo Abeni 25e3a8b044 mptcp: preserve const qualifier in mptcp_sk()
JIRA: https://issues.redhat.com/browse/RHEL-15036
Tested: LNST, Tier1

Upstream commit:
commit 403a40f2304d4730a780ab9d6a2b93d1e4ac39d2
Author: Eric Dumazet <edumazet@google.com>
Date:   Fri Mar 17 15:55:38 2023 +0000

    mptcp: preserve const qualifier in mptcp_sk()

    We can change mptcp_sk() to propagate its argument const qualifier,
    thanks to container_of_const().

    We need to change a few things to avoid build errors:

    mptcp_set_datafin_timeout() and mptcp_rtx_head() have to accept
    non-const sk pointers.

    @msk local variable in mptcp_pending_tail() must be const.

    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Cc: Matthieu Baerts <matthieu.baerts@tessares.net>
    Reviewed-by: Simon Horman <simon.horman@corigine.com>
    Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-10-31 21:50:01 +01:00
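The const propagation mentioned in the "preserve const qualifier in mptcp_sk()" commit can be illustrated outside the kernel with C11 `_Generic`. This is a sketch under assumptions: the `demo_*` names are hypothetical, the macros only imitate the idea behind container_of_const(), and `__typeof__` is a GNU C extension.

```c
#include <stddef.h>

/* Hypothetical types: an outer object embedding an inner one. */
struct demo_sock { int state; };
struct demo_msk { struct demo_sock sk; int token; };

/* Plain container_of: always yields a non-const pointer. */
#define demo_container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Const-preserving variant: a const input selects the const result type. */
#define demo_container_of_const(ptr, type, member)                  \
    _Generic((ptr),                                                 \
        const __typeof__(*(ptr)) *:                                 \
            (const type *)demo_container_of(ptr, type, member),     \
        default: (type *)demo_container_of(ptr, type, member))

#define demo_msk_of(ptr) demo_container_of_const(ptr, struct demo_msk, sk)

/* A read-only accessor can now keep const all the way through. */
int demo_read_token(const struct demo_sock *sk)
{
    const struct demo_msk *msk = demo_msk_of(sk);

    return msk->token;
}
```

Passing a `const struct demo_sock *` yields a `const struct demo_msk *`, so an accidental write through the result becomes a compile-time error rather than a silent cast.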
Davide Caratti d3fddd50ec mptcp: fix accept vs worker race
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2193330
Upstream Status: net.git commit 63740448a32e

commit 63740448a32eb662e05894425b47bcc5814136f4
Author: Paolo Abeni <pabeni@redhat.com>
Date:   Mon Apr 17 16:00:41 2023 +0200

    mptcp: fix accept vs worker race

    The mptcp worker and mptcp_accept() can race, as reported by Christoph:

    refcount_t: addition on 0; use-after-free.
    WARNING: CPU: 1 PID: 14351 at lib/refcount.c:25 refcount_warn_saturate+0x105/0x1b0 lib/refcount.c:25
    Modules linked in:
    CPU: 1 PID: 14351 Comm: syz-executor.2 Not tainted 6.3.0-rc1-gde5e8fd0123c #11
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-2.el7 04/01/2014
    RIP: 0010:refcount_warn_saturate+0x105/0x1b0 lib/refcount.c:25
    Code: 02 31 ff 89 de e8 1b f0 a7 ff 84 db 0f 85 6e ff ff ff e8 3e f5 a7 ff 48 c7 c7 d8 c7 34 83 c6 05 6d 2d 0f 02 01 e8 cb 3d 90 ff <0f> 0b e9 4f ff ff ff e8 1f f5 a7 ff 0f b6 1d 54 2d 0f 02 31 ff 89
    RSP: 0018:ffffc90000a47bf8 EFLAGS: 00010282
    RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
    RDX: ffff88802eae98c0 RSI: ffffffff81097d4f RDI: 0000000000000001
    RBP: ffff88802e712180 R08: 0000000000000001 R09: 0000000000000000
    R10: 0000000000000001 R11: ffff88802eaea148 R12: ffff88802e712100
    R13: ffff88802e712a88 R14: ffff888005cb93a8 R15: ffff88802e712a88
    FS:  0000000000000000(0000) GS:ffff88803ed00000(0000) knlGS:0000000000000000
    CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 00007f277fd89120 CR3: 0000000035486002 CR4: 0000000000370ee0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
    Call Trace:
     <TASK>
     __refcount_add include/linux/refcount.h:199 [inline]
     __refcount_inc include/linux/refcount.h:250 [inline]
     refcount_inc include/linux/refcount.h:267 [inline]
     sock_hold include/net/sock.h:775 [inline]
     __mptcp_close+0x4c6/0x4d0 net/mptcp/protocol.c:3051
     mptcp_close+0x24/0xe0 net/mptcp/protocol.c:3072
     inet_release+0x56/0xa0 net/ipv4/af_inet.c:429
     __sock_release+0x51/0xf0 net/socket.c:653
     sock_close+0x18/0x20 net/socket.c:1395
     __fput+0x113/0x430 fs/file_table.c:321
     task_work_run+0x96/0x100 kernel/task_work.c:179
     exit_task_work include/linux/task_work.h:38 [inline]
     do_exit+0x4fc/0x10c0 kernel/exit.c:869
     do_group_exit+0x51/0xf0 kernel/exit.c:1019
     get_signal+0x12b0/0x1390 kernel/signal.c:2859
     arch_do_signal_or_restart+0x25/0x260 arch/x86/kernel/signal.c:306
     exit_to_user_mode_loop kernel/entry/common.c:168 [inline]
     exit_to_user_mode_prepare+0x131/0x1a0 kernel/entry/common.c:203
     __syscall_exit_to_user_mode_work kernel/entry/common.c:285 [inline]
     syscall_exit_to_user_mode+0x19/0x40 kernel/entry/common.c:296
     do_syscall_64+0x46/0x90 arch/x86/entry/common.c:86
     entry_SYSCALL_64_after_hwframe+0x72/0xdc
    RIP: 0033:0x7fec4b4926a9
    Code: Unable to access opcode bytes at 0x7fec4b49267f.
    RSP: 002b:00007fec49f9dd78 EFLAGS: 00000246 ORIG_RAX: 00000000000000ca
    RAX: fffffffffffffe00 RBX: 00000000006bc058 RCX: 00007fec4b4926a9
    RDX: 0000000000000000 RSI: 0000000000000080 RDI: 00000000006bc058
    RBP: 00000000006bc050 R08: 00000000007df998 R09: 00000000007df998
    R10: 0000000000000000 R11: 0000000000000246 R12: 00000000006bc05c
    R13: fffffffffffffea8 R14: 000000000000000b R15: 000000000001fe40
     </TASK>

    The root cause is that the worker can force the first mptcp subflow
    to fall back to TCP, actually deleting the unaccepted msk socket.

    We can explicitly prevent the race by delaying the unaccepted msk deletion
    at listener shutdown time. In case the closed subflow is later accepted,
    just drop the mptcp context and let the user-space deal with the
    paired mptcp socket.

    Fixes: b6985b9b8295 ("mptcp: use the workqueue to destroy unaccepted sockets")
    Cc: stable@vger.kernel.org
    Reported-by: Christoph Paasch <cpaasch@apple.com>
    Link: https://github.com/multipath-tcp/mptcp_net-next/issues/375
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
    Tested-by: Christoph Paasch <cpaasch@apple.com>
    Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Davide Caratti <dcaratti@redhat.com>
2023-05-11 11:46:50 +02:00
Davide Caratti 992b0ca53f mptcp: stops worker on unaccepted sockets at listener close
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2193330
Upstream Status: net.git commit 2a6a870e44dd

commit 2a6a870e44dd88f1a6a2893c65ef756a9edfb4c7
Author: Paolo Abeni <pabeni@redhat.com>
Date:   Mon Apr 17 16:00:40 2023 +0200

    mptcp: stops worker on unaccepted sockets at listener close

    This is a partial revert of the blamed commit, with a relevant
    change: mptcp_subflow_queue_clean() now just changes the msk
    socket status and stops the worker, so that the UaF issue addressed
    by the blamed commit is not re-introduced.

    The above prevents the mptcp worker from running concurrently with
    inet_csk_listen_stop(), as such race would trigger a warning, as
    reported by Christoph:

    RSP: 002b:00007f784fe09cd8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
    WARNING: CPU: 0 PID: 25807 at net/ipv4/inet_connection_sock.c:1387 inet_csk_listen_stop+0x664/0x870 net/ipv4/inet_connection_sock.c:1387
    RAX: ffffffffffffffda RBX: 00000000006bc050 RCX: 00007f7850afd6a9
    RDX: 0000000000000000 RSI: 0000000020000340 RDI: 0000000000000004
    Modules linked in:
    RBP: 0000000000000002 R08: 0000000000000000 R09: 0000000000000000
    R10: 0000000000000000 R11: 0000000000000246 R12: 00000000006bc05c
    R13: fffffffffffffea8 R14: 00000000006bc050 R15: 000000000001fe40

     </TASK>
    CPU: 0 PID: 25807 Comm: syz-executor.7 Not tainted 6.2.0-g778e54711659 #7
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-2.el7 04/01/2014
    RIP: 0010:inet_csk_listen_stop+0x664/0x870 net/ipv4/inet_connection_sock.c:1387
    RAX: 0000000000000000 RBX: ffff888100dfbd40 RCX: 0000000000000000
    RDX: ffff8881363aab80 RSI: ffffffff81c494f4 RDI: 0000000000000005
    RBP: ffff888126dad080 R08: 0000000000000005 R09: 0000000000000000
    R10: 0000000000000001 R11: 0000000000000000 R12: ffff888100dfe040
    R13: 0000000000000001 R14: 0000000000000000 R15: ffff888100dfbdd8
    FS:  00007f7850a2c800(0000) GS:ffff88813bc00000(0000) knlGS:0000000000000000
    CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 0000001b32d26000 CR3: 000000012fdd8006 CR4: 0000000000770ef0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
    PKRU: 55555554
    Call Trace:
     <TASK>
     __tcp_close+0x5b2/0x620 net/ipv4/tcp.c:2875
     __mptcp_close_ssk+0x145/0x3d0 net/mptcp/protocol.c:2427
     mptcp_destroy_common+0x8a/0x1c0 net/mptcp/protocol.c:3277
     mptcp_destroy+0x41/0x60 net/mptcp/protocol.c:3304
     __mptcp_destroy_sock+0x56/0x140 net/mptcp/protocol.c:2965
     __mptcp_close+0x38f/0x4a0 net/mptcp/protocol.c:3057
     mptcp_close+0x24/0xe0 net/mptcp/protocol.c:3072
     inet_release+0x53/0xa0 net/ipv4/af_inet.c:429
     __sock_release+0x4e/0xf0 net/socket.c:651
     sock_close+0x15/0x20 net/socket.c:1393
     __fput+0xff/0x420 fs/file_table.c:321
     task_work_run+0x8b/0xe0 kernel/task_work.c:179
     resume_user_mode_work include/linux/resume_user_mode.h:49 [inline]
     exit_to_user_mode_loop kernel/entry/common.c:171 [inline]
     exit_to_user_mode_prepare+0x113/0x120 kernel/entry/common.c:203
     __syscall_exit_to_user_mode_work kernel/entry/common.c:285 [inline]
     syscall_exit_to_user_mode+0x1d/0x40 kernel/entry/common.c:296
     do_syscall_64+0x46/0x90 arch/x86/entry/common.c:86
     entry_SYSCALL_64_after_hwframe+0x72/0xdc
    RIP: 0033:0x7f7850af70dc
    RAX: 0000000000000000 RBX: 0000000000000004 RCX: 00007f7850af70dc
    RDX: 00007f7850a2c800 RSI: 0000000000000002 RDI: 0000000000000003
    RBP: 00000000006bd980 R08: 0000000000000000 R09: 00000000000018a0
    R10: 00000000316338a4 R11: 0000000000000293 R12: 0000000000211e31
    R13: 00000000006bc05c R14: 00007f785062c000 R15: 0000000000211af0

    Fixes: 0a3f4f1f9c27 ("mptcp: fix UaF in listener shutdown")
    Cc: stable@vger.kernel.org
    Reported-by: Christoph Paasch <cpaasch@apple.com>
    Link: https://github.com/multipath-tcp/mptcp_net-next/issues/371
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
    Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Davide Caratti <dcaratti@redhat.com>
2023-05-11 11:46:50 +02:00
Davide Caratti 27656b7580 mptcp: make userspace_pm_append_new_local_addr static
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2193330
Upstream Status: net.git commit aa5887dca2d2

commit aa5887dca2d236fc50000e27023d4d78dce3af30
Author: Geliang Tang <geliang.tang@suse.com>
Date:   Fri Apr 14 17:47:06 2023 +0200

    mptcp: make userspace_pm_append_new_local_addr static

    mptcp_userspace_pm_append_new_local_addr() has always exclusively been
    used in pm_userspace.c since its introduction in
    commit 4638de5aefe5 ("mptcp: handle local addrs announced by userspace PMs").

    So make it static.

    Signed-off-by: Geliang Tang <geliang.tang@suse.com>
    Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
    Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Davide Caratti <dcaratti@redhat.com>
2023-05-11 11:46:50 +02:00
Davide Caratti d2b25a0b65 mptcp: move first subflow allocation at mpc access time
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2193330
Upstream Status: net.git commit ddb1a072f858

commit ddb1a072f858704b3555876877ca38c5b103a215
Author: Paolo Abeni <pabeni@redhat.com>
Date:   Fri Apr 14 16:08:03 2023 +0200

    mptcp: move first subflow allocation at mpc access time

    In the long run this will simplify the mptcp code and will
    allow for more consistent behavior. Move the first subflow
    allocation out of the sock->init ops into the __mptcp_nmpc_socket()
    helper.

    Since the first subflow creation can now happen after the first
    setsockopt() we additionally need to invoke mptcp_sockopt_sync()
    on it.

    Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
    Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Davide Caratti <dcaratti@redhat.com>
2023-05-11 11:46:50 +02:00
Davide Caratti f30b232dad mptcp: drop unneeded argument
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2193330
Upstream Status: net.git commit 7a486c443c89

commit 7a486c443c89bd949f7a64e0040f704e02710b3c
Author: Paolo Abeni <pabeni@redhat.com>
Date:   Fri Apr 14 16:08:00 2023 +0200

    mptcp: drop unneeded argument

    After commit 3a236aef280e ("mptcp: refactor passive socket initialization"),
    every mptcp_pm_fully_established() call is always invoked with a
    GFP_ATOMIC argument. We can then drop it.

    Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
    Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Davide Caratti <dcaratti@redhat.com>
2023-05-11 11:46:49 +02:00
Davide Caratti 324cdd1ddb mptcp: fix UaF in listener shutdown
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2193330
Upstream Status: net.git commit 0a3f4f1f9c27

commit 0a3f4f1f9c27215e4ddcd312558342e57b93e518
Author: Paolo Abeni <pabeni@redhat.com>
Date:   Thu Mar 9 15:50:00 2023 +0100

    mptcp: fix UaF in listener shutdown

    As reported by Christoph after having refactored the passive
    socket initialization, the mptcp listener shutdown path is prone
    to a UaF issue.

      BUG: KASAN: use-after-free in _raw_spin_lock_bh+0x73/0xe0
      Write of size 4 at addr ffff88810cb23098 by task syz-executor731/1266

      CPU: 1 PID: 1266 Comm: syz-executor731 Not tainted 6.2.0-rc59af4eaa31c1f6c00c8f1e448ed99a45c66340dd5 #6
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
      Call Trace:
       <TASK>
       dump_stack_lvl+0x6e/0x91
       print_report+0x16a/0x46f
       kasan_report+0xad/0x130
       kasan_check_range+0x14a/0x1a0
       _raw_spin_lock_bh+0x73/0xe0
       subflow_error_report+0x6d/0x110
       sk_error_report+0x3b/0x190
       tcp_disconnect+0x138c/0x1aa0
       inet_child_forget+0x6f/0x2e0
       inet_csk_listen_stop+0x209/0x1060
       __mptcp_close_ssk+0x52d/0x610
       mptcp_destroy_common+0x165/0x640
       mptcp_destroy+0x13/0x80
       __mptcp_destroy_sock+0xe7/0x270
       __mptcp_close+0x70e/0x9b0
       mptcp_close+0x2b/0x150
       inet_release+0xe9/0x1f0
       __sock_release+0xd2/0x280
       sock_close+0x15/0x20
       __fput+0x252/0xa20
       task_work_run+0x169/0x250
       exit_to_user_mode_prepare+0x113/0x120
       syscall_exit_to_user_mode+0x1d/0x40
       do_syscall_64+0x48/0x90
       entry_SYSCALL_64_after_hwframe+0x72/0xdc

    The msk grace period can legitimately expire in between the last
    reference count dropped in mptcp_subflow_queue_clean() and
    the later eventual access in inet_csk_listen_stop().

    After the previous patch we no longer need to special-case
    msk listener socket cleanup: the mptcp worker will process each
    of the unaccepted msk sockets.

    Just drop the now unnecessary code.

    Please note this commit depends on the two parent ones:

      mptcp: refactor passive socket initialization
      mptcp: use the workqueue to destroy unaccepted sockets

    Fixes: 6aeed9045071 ("mptcp: fix race on unaccepted mptcp sockets")
    Cc: stable@vger.kernel.org
    Reported-and-tested-by: Christoph Paasch <cpaasch@apple.com>
    Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/346
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
    Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Davide Caratti <dcaratti@redhat.com>
2023-05-11 11:46:48 +02:00
Davide Caratti 3b214cc5ea mptcp: use the workqueue to destroy unaccepted sockets
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2193330
Upstream Status: net.git commit b6985b9b8295

commit b6985b9b82954caa53f862d6059d06c0526254f0
Author: Paolo Abeni <pabeni@redhat.com>
Date:   Thu Mar 9 15:49:59 2023 +0100

    mptcp: use the workqueue to destroy unaccepted sockets

    Christoph reported a UaF at token lookup time after having
    refactored the passive socket initialization part:

      BUG: KASAN: use-after-free in __token_bucket_busy+0x253/0x260
      Read of size 4 at addr ffff88810698d5b0 by task syz-executor653/3198

      CPU: 1 PID: 3198 Comm: syz-executor653 Not tainted 6.2.0-rc59af4eaa31c1f6c00c8f1e448ed99a45c66340dd5 #6
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
      Call Trace:
       <TASK>
       dump_stack_lvl+0x6e/0x91
       print_report+0x16a/0x46f
       kasan_report+0xad/0x130
       __token_bucket_busy+0x253/0x260
       mptcp_token_new_connect+0x13d/0x490
       mptcp_connect+0x4ed/0x860
       __inet_stream_connect+0x80e/0xd90
       tcp_sendmsg_fastopen+0x3ce/0x710
       mptcp_sendmsg+0xff1/0x1a20
       inet_sendmsg+0x11d/0x140
       __sys_sendto+0x405/0x490
       __x64_sys_sendto+0xdc/0x1b0
       do_syscall_64+0x3b/0x90
       entry_SYSCALL_64_after_hwframe+0x72/0xdc

    We need to properly clean-up all the paired MPTCP-level
    resources and be sure to release the msk last, even when
    the unaccepted subflow is destroyed by the TCP internals
    via inet_child_forget().

    We can re-use the existing MPTCP_WORK_CLOSE_SUBFLOW infra,
    explicitly checking that for the critical scenario: the
    closed subflow is the MPC one, the msk is not accepted and
    eventually going through full cleanup.

    With such a change, __mptcp_destroy_sock() is always called
    on msk sockets, even on accepted ones. We no longer need
    to transiently drop one sk reference at msk clone time.

    Please note this commit depends on the parent one:

      mptcp: refactor passive socket initialization

    Fixes: 58b0991962 ("mptcp: create msk early")
    Cc: stable@vger.kernel.org
    Reported-and-tested-by: Christoph Paasch <cpaasch@apple.com>
    Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/347
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
    Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Davide Caratti <dcaratti@redhat.com>
2023-05-11 11:46:48 +02:00
Davide Caratti 4081e71c20 mptcp: netlink: respect v4/v6-only sockets
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2193330
Upstream Status: net.git commit fb00ee4f3343

commit fb00ee4f3343acb2b9222ca9b73b47dd1e1a8efc
Author: Matthieu Baerts <matthieu.baerts@tessares.net>
Date:   Thu Jan 12 18:42:52 2023 +0100

    mptcp: netlink: respect v4/v6-only sockets

    If an MPTCP socket has been created with AF_INET6 and the IPV6_V6ONLY
    option has been set, the userspace PM would allow creating subflows
    using IPv4 addresses, e.g. mapped in v6.

    The kernel side of userspace PM will also accept creating subflows with
    local and remote addresses having different families. Depending on the
    subflow socket's family, different behaviours are expected:
     - If AF_INET is forced with a v6 address, the kernel will take the last
       byte of the IP and try to connect to that: a new subflow is created
       but to an unexpected address.
     - If AF_INET6 is forced with a v4 address, the kernel will try to
       connect to a v4 address (v4-mapped-v6). A -EBADF error from the
       connect() part is then expected.

    It is then required to check the given families can be accepted. This is
    done by using a new helper for addresses family matching, taking care of
    IPv4 vs IPv4-mapped-IPv6 addresses. This helper will be re-used later by
    the in-kernel path-manager to use mixed IPv4 and IPv6 addresses.

    While at it, a clear error message is now reported if there are some
    conflicts with the families that have been passed by the userspace.

    Fixes: 702c2f646d42 ("mptcp: netlink: allow userspace-driven subflow establishment")
    Cc: stable@vger.kernel.org
    Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
    Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Davide Caratti <dcaratti@redhat.com>
2023-05-11 11:46:46 +02:00
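The family check described in the "netlink: respect v4/v6-only sockets" commit can be approximated in user space as follows. This is a sketch only: `struct demo_addr` and both `demo_*` functions are hypothetical and do not reproduce the kernel helper. It assumes that a v4-mapped IPv6 address counts as IPv4, as the commit message describes.

```c
#include <arpa/inet.h>   /* inet_pton(), useful to callers filling addr6 */
#include <netinet/in.h>
#include <stdbool.h>
#include <sys/socket.h>

/* Hypothetical address record; the kernel uses its own structure. */
struct demo_addr {
    sa_family_t family;     /* AF_INET or AF_INET6 */
    struct in6_addr addr6;  /* only meaningful when family == AF_INET6 */
};

/* Plain IPv4 and IPv4-mapped-IPv6 addresses are both treated as v4. */
static bool demo_is_v4(const struct demo_addr *a)
{
    return a->family == AF_INET || IN6_IS_ADDR_V4MAPPED(&a->addr6);
}

/* Returns true when the local/remote pair is acceptable for the socket. */
bool demo_families_match(sa_family_t sk_family, bool v6only,
                         const struct demo_addr *loc,
                         const struct demo_addr *rem)
{
    bool loc_v4 = demo_is_v4(loc);
    bool rem_v4 = demo_is_v4(rem);

    if (sk_family == AF_INET)   /* v4 socket: both ends must be v4 */
        return loc_v4 && rem_v4;
    if (v6only)                 /* IPV6_V6ONLY: reject v4 and v4-mapped */
        return !loc_v4 && !rem_v4;
    return loc_v4 == rem_v4;    /* dual-stack: both ends must agree */
}
```

Rejecting mismatched pairs up front avoids both failure modes listed in the commit message: a subflow to an unexpected address, and a connect() error surfacing late.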
Davide Caratti 29d9617b30 mptcp: explicitly specify sock family at subflow creation time
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2193330
Upstream Status: net.git commit 6bc1fe7dd748

commit 6bc1fe7dd748ba5e76e7917d110837cafe7b931c
Author: Paolo Abeni <pabeni@redhat.com>
Date:   Thu Jan 12 18:42:51 2023 +0100

    mptcp: explicitly specify sock family at subflow creation time

    Let the caller specify the to-be-created subflow family.

    For a given MPTCP socket created with the AF_INET6 family, the current
    userspace PM can already ask the kernel to create subflows in v4 and v6.
    If "plain" IPv4 addresses are passed to the kernel, they are
    automatically mapped in v6 addresses "by accident". This can be
    problematic because the userspace will need to pass different addresses,
    namely the v4-mapped-v6 ones, to destroy this new subflow.

    On the other hand, if the MPTCP socket has been created with the AF_INET
    family, the command to create a subflow in v6 will be accepted, but the
    result will not be the expected one: the new subflow will be created in
    IPv4 using part of the v6 addresses passed to the kernel.

    No functional change intended for the in-kernel PM where an explicit
    enforcement is currently in place. This arbitrary enforcement will be
    leveraged by other patches in a future version.

    Fixes: 702c2f646d42 ("mptcp: netlink: allow userspace-driven subflow establishment")
    Cc: stable@vger.kernel.org
    Co-developed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
    Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
    Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Davide Caratti <dcaratti@redhat.com>
2023-05-11 11:46:46 +02:00
Davide Caratti 1cf67e4ee5 mptcp: rename 'sk' to 'ssk' in mptcp_token_new_connect()
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2193330
Upstream Status: net.git commit 294de9090938

commit 294de9090938a7959e2757573509abd9ea7bd254
Author: Menglong Dong <imagedong@tencent.com>
Date:   Fri Jan 6 10:57:22 2023 -0800

    mptcp: rename 'sk' to 'ssk' in mptcp_token_new_connect()

    'ssk' is a more appropriate name for the first argument of
    mptcp_token_new_connect().

    Acked-by: Paolo Abeni <pabeni@redhat.com>
    Signed-off-by: Menglong Dong <imagedong@tencent.com>
    Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Davide Caratti <dcaratti@redhat.com>
2023-05-11 11:46:46 +02:00
Davide Caratti f39373fda5 mptcp: add pm listener events
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2193330
Upstream Status: net.git commit f8c9dfbd875b

commit f8c9dfbd875b17fee59c7f1aa35a4944d4e6d810
Author: Geliang Tang <geliang.tang@suse.com>
Date:   Wed Nov 30 15:06:28 2022 +0100

    mptcp: add pm listener events

    This patch adds two new MPTCP netlink event types for PM listening
    socket create and close, named MPTCP_EVENT_LISTENER_CREATED and
    MPTCP_EVENT_LISTENER_CLOSED.

    Add a new function mptcp_event_pm_listener() to push the new events
    with family, port and addr to userspace.

    Invoke mptcp_event_pm_listener() with MPTCP_EVENT_LISTENER_CREATED in
    mptcp_listen() and mptcp_pm_nl_create_listen_socket(), invoke it with
    MPTCP_EVENT_LISTENER_CLOSED in __mptcp_close_ssk().

    Signed-off-by: Geliang Tang <geliang.tang@suse.com>
    Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
    Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Davide Caratti <dcaratti@redhat.com>
2023-05-11 11:46:45 +02:00
Davide Caratti 475d2f3fa2 mptcp: add subflow_v(4,6)_send_synack()
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2193330
Upstream Status: net.git commit 36b122baf6a8

commit 36b122baf6a8bd46b4a591f12f4ed17b22257408
Author: Dmytro Shytyi <dmytro@shytyi.net>
Date:   Fri Nov 25 23:29:51 2022 +0100

    mptcp: add subflow_v(4,6)_send_synack()

    The send_synack() needs to be overridden for MPTCP to support TFO for
    two reasons:

    - There is not enough space in the TCP options if the TFO cookie has
      to be added in the SYN+ACK with other options: MSS (4), SACK OK (2),
      Timestamps (10), Window Scale (3+1), TFO (10+2), MP_CAPABLE (12).
      MPTCPv1 specs -- RFC 8684, section B.1 [1] -- suggest dropping the TCP
      timestamps option in this case.

    - The data received in the SYN has to be handled: the SKB can be
      dequeued from the subflow sk and transferred to the MPTCP sk. Counters
      need to be updated accordingly and the application can be notified at
      the end because some bytes have been received.

    [1] https://www.rfc-editor.org/rfc/rfc8684.html#section-b.1
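
The option budget from the first bullet above can be checked with a little
arithmetic. The sketch below uses the sizes listed in the commit message; the
40-byte limit is the standard TCP option space (60-byte maximum header minus
the 20-byte fixed part). The names are illustrative only.

```python
TCP_MAX_OPTION_SPACE = 40  # bytes available for options in a TCP header

# Sizes in bytes, padding included, as listed in the commit message.
SYNACK_OPTIONS = {
    "MSS": 4,
    "SACK OK": 2,
    "Timestamps": 10,
    "Window Scale": 3 + 1,
    "TFO": 10 + 2,
    "MP_CAPABLE": 12,
}


def option_bytes(options, drop=()):
    """Sum the option sizes, skipping any option named in `drop`."""
    return sum(size for name, size in options.items() if name not in drop)


total = option_bytes(SYNACK_OPTIONS)
without_ts = option_bytes(SYNACK_OPTIONS, drop=("Timestamps",))

if __name__ == "__main__":
    print(total, total <= TCP_MAX_OPTION_SPACE)            # 44 False
    print(without_ts, without_ts <= TCP_MAX_OPTION_SPACE)  # 34 True
```

With all six options the SYN+ACK would need 44 bytes, which does not fit;
dropping the timestamps option brings it down to 34.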

    Co-developed-by: Paolo Abeni <pabeni@redhat.com>
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    Co-developed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
    Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
    Signed-off-by: Dmytro Shytyi <dmytro@shytyi.net>
    Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Davide Caratti <dcaratti@redhat.com>
2023-05-11 11:46:44 +02:00
Davide Caratti f1fc111f49 mptcp: implement delayed seq generation for passive fastopen
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2193330
Upstream Status: net.git commit dfc8d0603033

commit dfc8d06030335a816d81aa92fe5d1f84d06998ad
Author: Dmytro Shytyi <dmytro@shytyi.net>
Date:   Fri Nov 25 23:29:50 2022 +0100

    mptcp: implement delayed seq generation for passive fastopen

    With fastopen in place, the first subflow socket is created before the
    MPC handshake completes, and we need to properly initialize the sequence
    numbers at MPC ACK reception.

    Co-developed-by: Paolo Abeni <pabeni@redhat.com>
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    Co-developed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
    Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
    Signed-off-by: Dmytro Shytyi <dmytro@shytyi.net>
    Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Davide Caratti <dcaratti@redhat.com>
2023-05-11 11:46:44 +02:00
Davide Caratti 32a2c37cac mptcp: consolidate initial ack seq generation
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2193330
Upstream Status: net.git commit b3ea6b272d79

commit b3ea6b272d79a43baaaa9af871ee66f6fda4688f
Author: Paolo Abeni <pabeni@redhat.com>
Date:   Fri Nov 25 23:29:49 2022 +0100

    mptcp: consolidate initial ack seq generation

    Currently the initial ack sequence is generated on demand whenever
    it's requested and the remote key is handy. The relevant code is
    scattered in different places and can lead to multiple, unneeded,
    crypto operations.

    This change consolidates the ack sequence generation code in a single
    helper, storing the sequence number at the subflow level.

    The above additionally saves a few conditionals in the fast path and
    will simplify the upcoming fast-open implementation.
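
The "compute once, store at the subflow level" idea can be sketched as
follows. This is a userspace toy, not the kernel code: the class, the `iasn`
field name and the `hash_ops` counter are hypothetical, and it assumes the
RFC 8684 derivation where the initial data sequence number is the low 64 bits
of the SHA-256 of the (big-endian) key, with the initial ack being that value
plus one.

```python
import hashlib


class ToySubflow:
    """Caches the initial ack sequence so the hash runs at most once."""

    def __init__(self, remote_key: int):
        self.remote_key = remote_key
        self.iasn = None   # cached initial ack sequence number (assumed name)
        self.hash_ops = 0  # counts crypto operations, for illustration only

    def initial_ack_seq(self) -> int:
        if self.iasn is None:
            key_bytes = self.remote_key.to_bytes(8, "big")
            digest = hashlib.sha256(key_bytes).digest()
            self.hash_ops += 1
            idsn = int.from_bytes(digest[-8:], "big")  # low 64 bits
            self.iasn = (idsn + 1) % 2**64
        return self.iasn


if __name__ == "__main__":
    ssk = ToySubflow(0x0123456789ABCDEF)
    print(ssk.initial_ack_seq() == ssk.initial_ack_seq(), ssk.hash_ops)
```

Repeated calls return the stored value, so the hash is computed a single time
however many code paths ask for the sequence number.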

    Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
    Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Davide Caratti <dcaratti@redhat.com>
2023-05-11 11:46:44 +02:00
Davide Caratti 8c08128611 mptcp: fix lockdep false positive
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2161699
Upstream Status: net.git commit fec3adfd754c

commit fec3adfd754ccc99a7230e8ab9f105b65fb07bcc
Author: Paolo Abeni <pabeni@redhat.com>
Date:   Tue Dec 20 11:52:15 2022 -0800

    mptcp: fix lockdep false positive

    MattB reported a lockdep splat in the mptcp listener code cleanup:

     WARNING: possible circular locking dependency detected
     packetdrill/14278 is trying to acquire lock:
     ffff888017d868f0 ((work_completion)(&msk->work)){+.+.}-{0:0}, at: __flush_work (kernel/workqueue.c:3069)

     but task is already holding lock:
     ffff888017d84130 (sk_lock-AF_INET){+.+.}-{0:0}, at: mptcp_close (net/mptcp/protocol.c:2973)

     which lock already depends on the new lock.

     the existing dependency chain (in reverse order) is:

     -> #1 (sk_lock-AF_INET){+.+.}-{0:0}:
            __lock_acquire (kernel/locking/lockdep.c:5055)
            lock_acquire (kernel/locking/lockdep.c:466)
            lock_sock_nested (net/core/sock.c:3463)
            mptcp_worker (net/mptcp/protocol.c:2614)
            process_one_work (kernel/workqueue.c:2294)
            worker_thread (include/linux/list.h:292)
            kthread (kernel/kthread.c:376)
            ret_from_fork (arch/x86/entry/entry_64.S:312)

     -> #0 ((work_completion)(&msk->work)){+.+.}-{0:0}:
            check_prev_add (kernel/locking/lockdep.c:3098)
            validate_chain (kernel/locking/lockdep.c:3217)
            __lock_acquire (kernel/locking/lockdep.c:5055)
            lock_acquire (kernel/locking/lockdep.c:466)
            __flush_work (kernel/workqueue.c:3070)
            __cancel_work_timer (kernel/workqueue.c:3160)
            mptcp_cancel_work (net/mptcp/protocol.c:2758)
            mptcp_subflow_queue_clean (net/mptcp/subflow.c:1817)
            __mptcp_close_ssk (net/mptcp/protocol.c:2363)
            mptcp_destroy_common (net/mptcp/protocol.c:3170)
            mptcp_destroy (include/net/sock.h:1495)
            __mptcp_destroy_sock (net/mptcp/protocol.c:2886)
            __mptcp_close (net/mptcp/protocol.c:2959)
            mptcp_close (net/mptcp/protocol.c:2974)
            inet_release (net/ipv4/af_inet.c:432)
            __sock_release (net/socket.c:651)
            sock_close (net/socket.c:1367)
            __fput (fs/file_table.c:320)
            task_work_run (kernel/task_work.c:181 (discriminator 1))
            exit_to_user_mode_prepare (include/linux/resume_user_mode.h:49)
            syscall_exit_to_user_mode (kernel/entry/common.c:130)
            do_syscall_64 (arch/x86/entry/common.c:87)
            entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:120)

     other info that might help us debug this:

      Possible unsafe locking scenario:

            CPU0                    CPU1
            ----                    ----
       lock(sk_lock-AF_INET);
                                    lock((work_completion)(&msk->work));
                                    lock(sk_lock-AF_INET);
       lock((work_completion)(&msk->work));

      *** DEADLOCK ***

    The report is actually a false positive, since the only existing lock
    nesting is the msk socket lock acquired by the mptcp work.
    cancel_work_sync() is invoked without the relevant socket lock being
    held, but under a different (the msk listener) socket lock.

    We could silence the splat by adding a per workqueue dynamic lockdep
    key, but that looks like overkill. Instead just tell lockdep the msk
    socket lock is not held around cancel_work_sync().

    Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/322
    Fixes: 30e51b923e43 ("mptcp: fix unreleased socket in accept queue")
    Reported-by: Matthieu Baerts <matthieu.baerts@tessares.net>
    Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Davide Caratti <dcaratti@redhat.com>
2023-01-24 15:54:38 +01:00
Davide Caratti 7f25c0d2df mptcp: fix deadlock in fastopen error path
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2161699
Upstream Status: net.git commit 7d803344fdc3

commit 7d803344fdc3e38079fabcf38b1e4cb6f8faa655
Author: Paolo Abeni <pabeni@redhat.com>
Date:   Tue Dec 20 11:52:14 2022 -0800

    mptcp: fix deadlock in fastopen error path

    MatM reported a deadlock at fastopening time:

    INFO: task syz-executor.0:11454 blocked for more than 143 seconds.
          Tainted: G S                 6.1.0-rc5-03226-gdb0157db5153 #1
    "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    task:syz-executor.0  state:D stack:25104 pid:11454 ppid:424    flags:0x00004006
    Call Trace:
     <TASK>
     context_switch kernel/sched/core.c:5191 [inline]
     __schedule+0x5c2/0x1550 kernel/sched/core.c:6503
     schedule+0xe8/0x1c0 kernel/sched/core.c:6579
     __lock_sock+0x142/0x260 net/core/sock.c:2896
     lock_sock_nested+0xdb/0x100 net/core/sock.c:3466
     __mptcp_close_ssk+0x1a3/0x790 net/mptcp/protocol.c:2328
     mptcp_destroy_common+0x16a/0x650 net/mptcp/protocol.c:3171
     mptcp_disconnect+0xb8/0x450 net/mptcp/protocol.c:3019
     __inet_stream_connect+0x897/0xa40 net/ipv4/af_inet.c:720
     tcp_sendmsg_fastopen+0x3dd/0x740 net/ipv4/tcp.c:1200
     mptcp_sendmsg_fastopen net/mptcp/protocol.c:1682 [inline]
     mptcp_sendmsg+0x128a/0x1a50 net/mptcp/protocol.c:1721
     inet6_sendmsg+0x11f/0x150 net/ipv6/af_inet6.c:663
     sock_sendmsg_nosec net/socket.c:714 [inline]
     sock_sendmsg+0xf7/0x190 net/socket.c:734
     ____sys_sendmsg+0x336/0x970 net/socket.c:2476
     ___sys_sendmsg+0x122/0x1c0 net/socket.c:2530
     __sys_sendmmsg+0x18d/0x460 net/socket.c:2616
     __do_sys_sendmmsg net/socket.c:2645 [inline]
     __se_sys_sendmmsg net/socket.c:2642 [inline]
     __x64_sys_sendmmsg+0x9d/0x110 net/socket.c:2642
     do_syscall_x64 arch/x86/entry/common.c:50 [inline]
     do_syscall_64+0x38/0x90 arch/x86/entry/common.c:80
     entry_SYSCALL_64_after_hwframe+0x63/0xcd
    RIP: 0033:0x7f5920a75e7d
    RSP: 002b:00007f59201e8028 EFLAGS: 00000246 ORIG_RAX: 0000000000000133
    RAX: ffffffffffffffda RBX: 00007f5920bb4f80 RCX: 00007f5920a75e7d
    RDX: 0000000000000001 RSI: 0000000020002940 RDI: 0000000000000005
    RBP: 00007f5920ae7593 R08: 0000000000000000 R09: 0000000000000000
    R10: 0000000020004050 R11: 0000000000000246 R12: 0000000000000000
    R13: 000000000000000b R14: 00007f5920bb4f80 R15: 00007f59201c8000
     </TASK>

    In the error path, tcp_sendmsg_fastopen() ends up calling
    mptcp_disconnect(), and the latter tries to close each
    subflow, acquiring the socket lock on each of them.

    At fastopen time, we have a single subflow, and such subflow
    socket lock is already held by the caller, causing the deadlock.

    We already track the 'fastopen in progress' status inside the msk
    socket. Use it to address the issue, making mptcp_disconnect() a
    no op when invoked from the fastopen (error) path and doing the
    relevant cleanup after releasing the subflow socket lock.
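
The shape of the fix can be shown with a toy model. Everything here is
hypothetical (class, method and flag names) and it is only a sketch of the
pattern, not the kernel code: a non-recursive lock stands in for the subflow
socket lock, and a flag turns the disconnect into a no-op while that lock is
held, with the cleanup deferred until after the lock is released.

```python
import threading


class ToyMsk:
    def __init__(self):
        self.subflow_lock = threading.Lock()  # non-recursive, like the toy's socket lock
        self.fastopening = False              # stand-in for the "fastopen in progress" status
        self.cleaned_up = False

    def disconnect(self) -> bool:
        if self.fastopening:
            # The caller already holds subflow_lock: taking it again would
            # block forever, so do nothing here.
            return False
        with self.subflow_lock:
            self.cleaned_up = True
        return True

    def sendmsg_fastopen(self, fail: bool) -> bool:
        with self.subflow_lock:
            self.fastopening = True
            if fail:
                self.disconnect()  # error path, invoked with the lock held
            self.fastopening = False
        if fail:
            self.disconnect()      # deferred cleanup, lock now released
        return not fail


if __name__ == "__main__":
    msk = ToyMsk()
    print(msk.sendmsg_fastopen(fail=True), msk.cleaned_up)
```

Without the flag check, the first `disconnect()` call would try to take a lock
its own caller holds and never return.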

    While at it, rename the fastopen status bit to something
    more meaningful.

    Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/321
    Fixes: fa9e57468aa1 ("mptcp: fix abba deadlock on fastopen")
    Reported-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
    Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Davide Caratti <dcaratti@redhat.com>
2023-01-24 15:54:38 +01:00
Davide Caratti e19c81cb55 mptcp: factor out mptcp_connect()
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2137858
Upstream Status: net.git commit 54f1944ed6d2

commit 54f1944ed6d2554475f39a4921dc5422fa692c4f
Author: Paolo Abeni <pabeni@redhat.com>
Date:   Fri Oct 21 15:58:55 2022 -0700

    mptcp: factor out mptcp_connect()

    The current MPTCP connect implementation duplicates a bit of inet
    code and does not use nor provide a struct proto->connect callback,
    which in turn will not fit the upcoming fastopen implementation.

    Refactor such implementation to use the common helper, moving the
    MPTCP-specific bits into mptcp_connect(). Additionally, avoid an
    indirect call to the subflow connect callback.

    Note that the fastopen call-path invokes mptcp_connect() while already
    holding the subflow socket lock. Explicitly keep track of such path
    via a new MPTCP-level flag and handle the locking accordingly.

    Additionally, track the connect flags in a new msk field to allow
    propagating them to the subflow inet_stream_connect call.

    Fixes: d98a82a6afc7 ("mptcp: handle defer connect in mptcp_sendmsg")
    Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Davide Caratti <dcaratti@redhat.com>
2022-11-08 17:11:00 +01:00
Davide Caratti dea7679002 mptcp: set msk local address earlier
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2137858
Upstream Status: net.git commit e72e4032637f

commit e72e4032637f4646554794ac28a3abecc6c2416d
Author: Paolo Abeni <pabeni@redhat.com>
Date:   Fri Oct 21 15:58:54 2022 -0700

    mptcp: set msk local address earlier

    The mptcp_pm_nl_get_local_id() code assumes that the msk local address
    is available at that point. For passive sockets, we initialize such
    address at accept() time.

    Depending on the running configuration and the user-space timing, a
    passive MPJ subflow can join the msk socket before accept() completes.

    In such case, the PM assigns a wrong local id to the MPJ subflow
    and later PM netlink operations will end up touching the wrong/unexpected
    subflow.

    All the above causes sporadic self-test failures, especially when
    the host is heavily loaded.

    Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/308
    Fixes: 01cacb00b3 ("mptcp: add netlink-based PM")
    Fixes: d045b9eb95a9 ("mptcp: introduce implicit endpoints")
    Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Davide Caratti <dcaratti@redhat.com>
2022-11-08 17:11:00 +01:00
Davide Caratti e58d9d6f36 mptcp: fix unreleased socket in accept queue
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2137858
Upstream Status: net.git commit 30e51b923e43

commit 30e51b923e436b631e8d5b77fa5e318c6b066dc7
Author: Menglong Dong <imagedong@tencent.com>
Date:   Tue Sep 27 12:31:58 2022 -0700

    mptcp: fix unreleased socket in accept queue

    The mptcp socket and its subflow sockets in the accept queue can't be
    released after the process exits.

    When an mptcp socket in listening state is released, the corresponding
    tcp socket is released too, along with the tcp sockets in the unaccepted
    queue. However, only the initial subflow is in the unaccepted queue; the
    joined subflow is not, so it is never released, and therefore the
    corresponding unaccepted mptcp socket is not released either.

    This can be reproduced easily with the following steps:

    1. create two namespaces and a veth pair:
       $ ip netns add mptcp-client
       $ ip netns add mptcp-server
       $ sysctl -w net.ipv4.conf.all.rp_filter=0
       $ ip netns exec mptcp-client sysctl -w net.mptcp.enabled=1
       $ ip netns exec mptcp-server sysctl -w net.mptcp.enabled=1
       $ ip link add red-client netns mptcp-client type veth peer red-server \
         netns mptcp-server
       $ ip -n mptcp-server address add 10.0.0.1/24 dev red-server
       $ ip -n mptcp-server address add 192.168.0.1/24 dev red-server
       $ ip -n mptcp-client address add 10.0.0.2/24 dev red-client
       $ ip -n mptcp-client address add 192.168.0.2/24 dev red-client
       $ ip -n mptcp-server link set red-server up
       $ ip -n mptcp-client link set red-client up

    2. configure the endpoints and limits for client and server:
       $ ip -n mptcp-server mptcp endpoint flush
       $ ip -n mptcp-server mptcp limits set subflow 2 add_addr_accepted 2
       $ ip -n mptcp-client mptcp endpoint flush
       $ ip -n mptcp-client mptcp limits set subflow 2 add_addr_accepted 2
       $ ip -n mptcp-client mptcp endpoint add 192.168.0.2 dev red-client id \
         1 subflow

    3. listen and accept on a port, such as 9999. The nc command used
       here is modified to use the mptcp protocol by default.
       $ ip netns exec mptcp-server nc -l -k -p 9999

    4. open another *two* terminals and use each of them to connect to the
       server with the following command:
       $ ip netns exec mptcp-client nc 10.0.0.1 9999
       Input something after connecting to trigger the connection of the
       second subflow, so that there are two established mptcp connections,
       with the second one still unaccepted.

    5. exit all the nc commands, and check the tcp sockets in the server
       namespace. You will find one tcp socket stuck in CLOSE_WAIT state
       that is never released.

    Fix this by closing all of the unaccepted mptcp sockets in
    mptcp_subflow_queue_clean() with __mptcp_close().

    Now, we can ensure that all unaccepted mptcp sockets will be cleaned by
    __mptcp_close() before they are released, so mptcp_sock_destruct(), which
    is used to clean the unaccepted mptcp socket, is not needed anymore.

    The mptcp selftests were run for this commit, with no new failures.

    Fixes: f296234c98 ("mptcp: Add handling of incoming MP_JOIN requests")
    Fixes: 6aeed9045071 ("mptcp: fix race on unaccepted mptcp sockets")
    Cc: stable@vger.kernel.org
    Reviewed-by: Jiang Biao <benbjiang@tencent.com>
    Reviewed-by: Mengen Sun <mengensun@tencent.com>
    Acked-by: Paolo Abeni <pabeni@redhat.com>
    Signed-off-by: Menglong Dong <imagedong@tencent.com>
    Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Davide Caratti <dcaratti@redhat.com>
2022-11-08 17:10:59 +01:00
Davide Caratti 4844953951 mptcp: factor out __mptcp_close() without socket lock
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2137858
Upstream Status: net.git commit 26d3e21ce1aa

commit 26d3e21ce1aab6cb19069c510fac8e7474445b18
Author: Menglong Dong <imagedong@tencent.com>
Date:   Tue Sep 27 12:31:57 2022 -0700

    mptcp: factor out __mptcp_close() without socket lock

    Factor out __mptcp_close() from mptcp_close(). The caller of
    __mptcp_close() should hold the socket lock, and cancel mptcp work when
    __mptcp_close() returns true.

    This function will be used in the next commit.

    Fixes: f296234c98 ("mptcp: Add handling of incoming MP_JOIN requests")
    Fixes: 6aeed9045071 ("mptcp: fix race on unaccepted mptcp sockets")
    Cc: stable@vger.kernel.org
    Reviewed-by: Jiang Biao <benbjiang@tencent.com>
    Reviewed-by: Mengen Sun <mengensun@tencent.com>
    Acked-by: Paolo Abeni <pabeni@redhat.com>
    Signed-off-by: Menglong Dong <imagedong@tencent.com>
    Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Davide Caratti <dcaratti@redhat.com>
2022-11-08 17:10:59 +01:00
Davide Caratti 67b5e84bd3 mptcp: add mptcp_for_each_subflow_safe helper
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2137858
Upstream Status: net.git commit 5efbf6f7f076

commit 5efbf6f7f076c67d733a09410180cc63a7f4d7bf
Author: Matthieu Baerts <matthieu.baerts@tessares.net>
Date:   Tue Sep 6 22:55:39 2022 +0200

    mptcp: add mptcp_for_each_subflow_safe helper

    Similar to mptcp_for_each_subflow(): this is clearer now that the _safe
    version is used in multiple places.

    Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
    Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Signed-off-by: Davide Caratti <dcaratti@redhat.com>
2022-11-08 17:10:58 +01:00
Davide Caratti 05d76a9f84 mptcp: do not queue data on closed subflows
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2137858
Upstream Status: net.git commit c886d70286bf

commit c886d70286bf3ad411eb3d689328a67f7102c6ae
Author: Paolo Abeni <pabeni@redhat.com>
Date:   Thu Aug 4 17:21:26 2022 -0700

    mptcp: do not queue data on closed subflows

    Dipanjan reported a syzbot splat at close time:

    WARNING: CPU: 1 PID: 10818 at net/ipv4/af_inet.c:153
    inet_sock_destruct+0x6d0/0x8e0 net/ipv4/af_inet.c:153
    Modules linked in: uio_ivshmem(OE) uio(E)
    CPU: 1 PID: 10818 Comm: kworker/1:16 Tainted: G           OE
    5.19.0-rc6-g2eae0556bb9d #2
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
    1.13.0-1ubuntu1.1 04/01/2014
    Workqueue: events mptcp_worker
    RIP: 0010:inet_sock_destruct+0x6d0/0x8e0 net/ipv4/af_inet.c:153
    Code: 21 02 00 00 41 8b 9c 24 28 02 00 00 e9 07 ff ff ff e8 34 4d 91
    f9 89 ee 4c 89 e7 e8 4a 47 60 ff e9 a6 fc ff ff e8 20 4d 91 f9 <0f> 0b
    e9 84 fe ff ff e8 14 4d 91 f9 0f 0b e9 d4 fd ff ff e8 08 4d
    RSP: 0018:ffffc9001b35fa78 EFLAGS: 00010246
    RAX: 0000000000000000 RBX: 00000000002879d0 RCX: ffff8881326f3b00
    RDX: 0000000000000000 RSI: ffff8881326f3b00 RDI: 0000000000000002
    RBP: ffff888179662674 R08: ffffffff87e983a0 R09: 0000000000000000
    R10: 0000000000000005 R11: 00000000000004ea R12: ffff888179662400
    R13: ffff888179662428 R14: 0000000000000001 R15: ffff88817e38e258
    FS:  0000000000000000(0000) GS:ffff8881f5f00000(0000) knlGS:0000000000000000
    CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 0000000020007bc0 CR3: 0000000179592000 CR4: 0000000000150ee0
    Call Trace:
     <TASK>
     __sk_destruct+0x4f/0x8e0 net/core/sock.c:2067
     sk_destruct+0xbd/0xe0 net/core/sock.c:2112
     __sk_free+0xef/0x3d0 net/core/sock.c:2123
     sk_free+0x78/0xa0 net/core/sock.c:2134
     sock_put include/net/sock.h:1927 [inline]
     __mptcp_close_ssk+0x50f/0x780 net/mptcp/protocol.c:2351
     __mptcp_destroy_sock+0x332/0x760 net/mptcp/protocol.c:2828
     mptcp_worker+0x5d2/0xc90 net/mptcp/protocol.c:2586
     process_one_work+0x9cc/0x1650 kernel/workqueue.c:2289
     worker_thread+0x623/0x1070 kernel/workqueue.c:2436
     kthread+0x2e9/0x3a0 kernel/kthread.c:376
     ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:302
     </TASK>

    The root cause of the problem is that an mptcp-level (re)transmit can
    race with mptcp_close() and the packet scheduler checks the subflow
    state before acquiring the socket lock: we can try to (re)transmit on
    an already closed ssk.

    Fix the issue by checking the subflow socket status again under the
    subflow socket lock protection. Additionally add the missing check
    for the fallback-to-tcp case.
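
The "check again under the lock" pattern described above looks roughly like
this in a userspace toy. The names are hypothetical and this is a sketch of
the general technique, not the MPTCP scheduler: the first check is a cheap,
possibly stale, lockless test; the second one, made with the lock held, is the
one that decides.

```python
import threading


class ToySubflow:
    def __init__(self):
        self.lock = threading.Lock()
        self.closed = False
        self.queue = []

    def close(self):
        with self.lock:
            self.closed = True


def try_xmit(ssk: ToySubflow, data: bytes) -> bool:
    if ssk.closed:          # lockless pre-check: may race with close()
        return False
    with ssk.lock:
        if ssk.closed:      # authoritative re-check under the lock
            return False
        ssk.queue.append(data)
        return True


if __name__ == "__main__":
    ssk = ToySubflow()
    print(try_xmit(ssk, b"first"))
    ssk.close()
    print(try_xmit(ssk, b"second"))
```

Because `close()` also takes the lock, no data can be queued once it has
completed, even if the lockless pre-check ran just before it.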

    Fixes: d5f49190de ("mptcp: allow picking different xmit subflows")
    Reported-by: Dipanjan Das <mail.dipanjan.das@gmail.com>
    Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Davide Caratti <dcaratti@redhat.com>
2022-11-08 17:10:58 +01:00
Davide Caratti f01c2a76f0 mptcp: move subflow cleanup in mptcp_destroy_common()
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2137858
Upstream Status: net.git commit c0bf3c6aa444

commit c0bf3c6aa444a5ef44acc57ef6cfa53fd4fc1c9b
Author: Paolo Abeni <pabeni@redhat.com>
Date:   Thu Aug 4 17:21:25 2022 -0700

    mptcp: move subflow cleanup in mptcp_destroy_common()

    If the mptcp socket creation fails due to a CGROUP_INET_SOCK_CREATE
    eBPF program, the MPTCP protocol ends up leaking all the subflows:
    the related cleanup happens in __mptcp_destroy_sock(), which is not
    invoked in such code path.

    Address the issue by moving the subflow sockets cleanup into the
    mptcp_destroy_common() helper, which is invoked in every msk cleanup
    path.

    Additionally get rid of the intermediate list_splice_init step, which
    is an unneeded relic from the past.

    The issue has been present since before the reported root cause commit,
    but any attempt to backport the fix before that hash will require a
    complete rewrite.

    Fixes: e16163b6e2 ("mptcp: refactor shutdown and close")
    Reported-by: Nguyen Dinh Phi <phind.uet@gmail.com>
    Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
    Co-developed-by: Nguyen Dinh Phi <phind.uet@gmail.com>
    Signed-off-by: Nguyen Dinh Phi <phind.uet@gmail.com>
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Davide Caratti <dcaratti@redhat.com>
2022-11-08 17:10:58 +01:00
Davide Caratti 03477f1ea9 mptcp: more accurate MPC endpoint tracking
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2137858
Upstream Status: net.git commit 3ad14f54bd74

commit 3ad14f54bd7448384458e69f0183843f683ecce8
Author: Paolo Abeni <pabeni@redhat.com>
Date:   Mon Jul 11 12:16:32 2022 -0700

    mptcp: more accurate MPC endpoint tracking

    Currently the id accounting for the ID 0 subflow is not correct:
    at creation time we (correctly) mark as unavailable the endpoint
    id corresponding to the MPC subflow source address, while at subflow
    removal time we mark the id 0 as available.

    With this change we explicitly track the endpoint id corresponding
    to the MPC subflow so that we can mark it as available at removal time.
    Additionally this allows deleting the initial subflow via the NL PM by
    specifying the corresponding endpoint id.
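
The accounting change can be pictured with a small toy. The class and field
names are hypothetical and this is only a sketch of the bookkeeping described
above, not the in-kernel PM: the endpoint id used by the initial (MPC) subflow
is remembered at creation time, and that same id is released when local id 0
is removed.

```python
class ToyEndpointIds:
    def __init__(self):
        self.in_use = set()
        self.mpc_endpoint_id = 0  # endpoint id backing the initial subflow (assumed field)

    def create_mpc_subflow(self, endpoint_id: int) -> None:
        self.in_use.add(endpoint_id)
        self.mpc_endpoint_id = endpoint_id

    def remove_subflow(self, local_id: int) -> int:
        # Local id 0 designates the initial subflow: free the endpoint id it
        # really used rather than the literal id 0.
        freed = self.mpc_endpoint_id if local_id == 0 else local_id
        self.in_use.discard(freed)
        return freed


if __name__ == "__main__":
    ids = ToyEndpointIds()
    ids.create_mpc_subflow(3)
    print(ids.remove_subflow(0), sorted(ids.in_use))
```

With the old behaviour sketched in the first paragraph, id 3 would have stayed
marked as in use after the initial subflow went away.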

    Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Davide Caratti <dcaratti@redhat.com>
2022-11-08 17:10:57 +01:00
Davide Caratti 623cf23422 mptcp: introduce and use mptcp_pm_send_ack()
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2137858
Upstream Status: net.git commit f5360e9b314c

commit f5360e9b314caed58970e811ae80a4c351e2ce8a
Author: Paolo Abeni <pabeni@redhat.com>
Date:   Mon Jul 11 12:16:29 2022 -0700

    mptcp: introduce and use mptcp_pm_send_ack()

    The in-kernel PM has a bit of duplicate code related to ack
    generation. Create a new helper factoring out the PM-specific
    needs and use it in a couple of places.

    As a bonus, mptcp_subflow_send_ack() is not used anymore
    outside its own compilation unit and can become static.

    Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Davide Caratti <dcaratti@redhat.com>
2022-11-08 17:10:57 +01:00
Davide Caratti e969e60a59 mptcp: move MPTCPOPT_HMAC_LEN to net/mptcp.h
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2137858
Upstream Status: net.git commit f7657ff4a709

commit f7657ff4a7097eaf5220776456b7d75eb09062cb
Author: Geliang Tang <geliang.tang@suse.com>
Date:   Fri Jul 8 10:14:08 2022 -0700

    mptcp: move MPTCPOPT_HMAC_LEN to net/mptcp.h

    Move macro MPTCPOPT_HMAC_LEN definition from net/mptcp/protocol.h to
    include/net/mptcp.h.

    Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
    Signed-off-by: Geliang Tang <geliang.tang@suse.com>
    Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Davide Caratti <dcaratti@redhat.com>
2022-11-08 17:10:56 +01:00
Davide Caratti 2a0bbd58c8 mptcp: netlink: issue MP_PRIO signals from userspace PMs
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2137858
Upstream Status: net.git commit 892f396c8e68

commit 892f396c8e68faab7f76ff49cf39e9fbbeea4097
Author: Kishen Maloor <kishen.maloor@intel.com>
Date:   Tue Jul 5 14:32:14 2022 -0700

    mptcp: netlink: issue MP_PRIO signals from userspace PMs

    This change updates MPTCP_PM_CMD_SET_FLAGS to allow userspace PMs
    to issue MP_PRIO signals over a specific subflow selected by
    the connection token, local and remote address+port.

    Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/286
    Fixes: 702c2f646d42 ("mptcp: netlink: allow userspace-driven subflow establishment")
    Acked-by: Paolo Abeni <pabeni@redhat.com>
    Signed-off-by: Kishen Maloor <kishen.maloor@intel.com>
    Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Davide Caratti <dcaratti@redhat.com>
2022-11-08 17:10:56 +01:00
Davide Caratti 9b97bad956 mptcp: Acquire the subflow socket lock before modifying MP_PRIO flags
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2137858
Upstream Status: net.git commit a657430260e5

commit a657430260e5437df16004c8c317821d946b5ead
Author: Mat Martineau <mathew.j.martineau@linux.intel.com>
Date:   Tue Jul 5 14:32:13 2022 -0700

    mptcp: Acquire the subflow socket lock before modifying MP_PRIO flags

    When setting up a subflow's flags for sending MP_PRIO MPTCP options, the
    subflow socket lock was not held while reading and modifying several
    struct members that are also read and modified in mptcp_write_options().

    Acquire the subflow socket lock earlier and send the MP_PRIO ACK with
    that lock already acquired. Add a new variant of the
    mptcp_subflow_send_ack() helper to use with the subflow lock held.

    Fixes: 067065422f ("mptcp: add the outgoing MP_PRIO support")
    Acked-by: Paolo Abeni <pabeni@redhat.com>
    Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Davide Caratti <dcaratti@redhat.com>
2022-11-08 17:10:56 +01:00
Davide Caratti 9349cf891c mptcp: fix race on unaccepted mptcp sockets
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2103906
Upstream Status: net-next.git commit 6aeed9045071

commit 6aeed9045071f2252ff4e98fc13d1e304f33e5b0
Author: Paolo Abeni <pabeni@redhat.com>
Date:   Mon Jun 27 18:02:40 2022 -0700

    mptcp: fix race on unaccepted mptcp sockets

    When the listener socket owning the relevant request is closed,
    it frees the unaccepted subflows and that causes later deletion
    of the paired MPTCP sockets.

    The mptcp socket's worker can run in the time interval between such delete
    operations. When that happens, any access to msk->first will cause an UaF
    access, as the subflow cleanup did not clear such field in the mptcp
    socket.

    Address the issue by explicitly traversing the listener socket accept
    queue at close time and performing the needed cleanup on the pending
    msk.

    Note that the locking is a bit tricky, as we need to acquire the msk
    socket lock, while still owning the subflow socket one.

    Fixes: 86e39e04482b ("mptcp: keep track of local endpoint still available for each msk")
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Davide Caratti <dcaratti@redhat.com>
2022-07-05 11:27:03 +02:00
Davide Caratti f3ef7f2471 mptcp: fix shutdown vs fallback race
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2103906
Upstream Status: net-next.git commit d51991e2e314

commit d51991e2e31477853e5b9c1005ac617707077286
Author: Paolo Abeni <pabeni@redhat.com>
Date:   Mon Jun 27 18:02:38 2022 -0700

    mptcp: fix shutdown vs fallback race

    If the MPTCP socket shutdown happens before a fallback
    to TCP, and all the pending data have been already spooled,
    we never close the TCP connection.

    Address the issue by explicitly checking for the critical condition
    at fallback time.

    Fixes: 1e39e5a32ad7 ("mptcp: infinite mapping sending")
    Fixes: 0348c690ed37 ("mptcp: add the fallback check")
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Davide Caratti <dcaratti@redhat.com>
2022-07-05 11:27:03 +02:00
Davide Caratti 672f55fd67 mptcp: invoke MP_FAIL response when needed
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2103906
Upstream Status: net-next.git commit 76a13b315709

commit 76a13b315709b5b65a7b65caf9ede9a8a38d8930
Author: Geliang Tang <geliang.tang@suse.com>
Date:   Mon Jun 27 18:02:37 2022 -0700

    mptcp: invoke MP_FAIL response when needed

    mptcp_mp_fail_no_response shouldn't be invoked on each worker run, it
    should be invoked only when MP_FAIL response timeout occurs.

    This patch refactors the MP_FAIL response logic.

    It leverages the fact that only the MPC/first subflow can gracefully
    fail to avoid unneeded subflows traversal: the failing subflow can
    be only msk->first.

    A new 'fail_tout' field is added to the subflow context to record the
    MP_FAIL response timeout and use such field to reliably share the
    timeout timer between the MP_FAIL event and the MPTCP socket close
    timeout.

    Finally, a new ack is generated to send out MP_FAIL notification as soon
    as we hit the relevant condition, instead of waiting a possibly unbound
    time for the next data packet.

    Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/281
    Fixes: d9fb797046c5 ("mptcp: Do not traverse the subflow connection list without lock")
    Co-developed-by: Paolo Abeni <pabeni@redhat.com>
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    Signed-off-by: Geliang Tang <geliang.tang@suse.com>
    Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Davide Caratti <dcaratti@redhat.com>
2022-07-05 11:27:02 +02:00
Davide Caratti c49ba01bc3 mptcp: Do not traverse the subflow connection list without lock
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2103906
Upstream Status: net-next.git commit d9fb797046c5

commit d9fb797046c596187b97a08ea88b954964cc2d33
Author: Mat Martineau <mathew.j.martineau@linux.intel.com>
Date:   Wed May 18 15:04:45 2022 -0700

    mptcp: Do not traverse the subflow connection list without lock

    The MPTCP socket's conn_list (list of subflows) requires the socket lock
    to access. The MP_FAIL timeout code added such an access, where it would
    check the list of subflows both in timer context and (later) in workqueue
    context where the socket lock is held.

    Rather than check the list twice, remove the check in the timeout
    handler and only depend on the check in the workqueue. Also remove the
    MPTCP_FAIL_NO_RESPONSE flag, since mptcp_mp_fail_no_response() has
    insignificant overhead and can be checked on each worker run.

    Fixes: 49fa1919d6bc ("mptcp: reset subflow when MP_FAIL doesn't respond")
    Reported-by: Paolo Abeni <pabeni@redhat.com>
    Reviewed-by: Paolo Abeni <pabeni@redhat.com>
    Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Davide Caratti <dcaratti@redhat.com>
2022-07-05 11:27:01 +02:00
Davide Caratti 8ac58263ad mptcp: stop using the mptcp_has_another_subflow() helper
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2103906
Upstream Status: net-next.git commit 7b16871f9932

commit 7b16871f9932d8a371488d2967b033387870a747
Author: Paolo Abeni <pabeni@redhat.com>
Date:   Wed May 18 15:04:43 2022 -0700

    mptcp: stop using the mptcp_has_another_subflow() helper

    The mentioned helper requires the msk socket lock, and the
    current callers don't own it and can't acquire it, so the
    access is racy.

    All the current callers are really checking for infinite mapping
    fallback, and the latter condition is explicitly tracked by
    the relevant msk variable: we can safely remove the caller usage
    - and the caller itself.

    The issue is present since MP_FAIL implementation, but the
    fix only applies since the infinite fallback support, hence the
    somewhat unexpected fixes tag.

    Fixes: 0530020a7c8f ("mptcp: track and update contiguous data status")
    Acked-and-tested-by: Geliang Tang <geliang.tang@suse.com>
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Davide Caratti <dcaratti@redhat.com>
2022-07-05 11:27:00 +02:00
Paolo Abeni d18f0d700a mptcp: Do TCP fallback on early DSS checksum failure
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2100072
Tested: LNST, Tier1
Conflicts: different context in subflow_check_data_avail() as \
	rhel-9 already has the upstream commit f8d4bcacff3b \
	("mptcp: infinite mapping receiving")

Upstream commit:
commit ae66fb2ba6c3dcaf8b9612b65aa949a1a4bed150
Author: Mat Martineau <mathew.j.martineau@linux.intel.com>
Date:   Tue May 17 11:02:12 2022 -0700

    mptcp: Do TCP fallback on early DSS checksum failure

    RFC 8684 section 3.7 describes several opportunities for a MPTCP
    connection to "fall back" to regular TCP early in the connection
    process, before it has been confirmed that MPTCP options can be
    successfully propagated on all SYN, SYN/ACK, and data packets. If a peer
    acknowledges the first received data packet with a regular TCP header
    (no MPTCP options), fallback is allowed.

    If the recipient of that first data packet finds a MPTCP DSS checksum
    error, this provides an opportunity to fail gracefully with a TCP
    fallback rather than resetting the connection (as might happen if a
    checksum failure were detected later).

    This commit modifies the checksum failure code to attempt fallback on
    the initial subflow of a MPTCP connection, only if it's a failure in the
    first data mapping. In cases where the peer initiates the connection,
    requests checksums, is the first to send data, and the peer is sending
    incorrect checksums (see
    https://github.com/multipath-tcp/mptcp_net-next/issues/275), this allows
    the connection to proceed as TCP rather than reset.

    Fixes: dd8bcd1768 ("mptcp: validate the data checksum")
    Acked-by: Paolo Abeni <pabeni@redhat.com>
    Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-06-22 17:29:44 +02:00
Paolo Abeni 65330c619e mptcp: fix checksum byte order
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2100072
Tested: LNST, Tier1
Conflicts: context differences because rhel-9 already has
	the upstream commit 1e39e5a32ad7 ("mptcp: infinite mapping sending")

Upstream commit:
commit ba2c89e0ea74a904d5231643245753d77422e7f5
Author: Paolo Abeni <pabeni@redhat.com>
Date:   Tue May 17 11:02:11 2022 -0700

    mptcp: fix checksum byte order

    The MPTCP code typecasts the checksum value to u16 and
    then converts it to big endian while storing the value into
    the MPTCP option.

    As a result, the wire encoding for little endian hosts is
    wrong, and that causes interoperability issues with other
    implementations or hosts with different endianness.

    Address the issue by writing in the packet the unmodified __sum16 value.

    MPTCP checksum is disabled by default; interoperating with systems
    with bad mptcp-level csum encoding should cause fallback to TCP.

    Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/275
    Fixes: c5b39e26d0 ("mptcp: send out checksum for DSS")
    Fixes: 390b95a5fb ("mptcp: receive checksum for DSS")
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-06-22 17:25:54 +02:00
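
The byte-order issue described in "mptcp: fix checksum byte order" above can be illustrated with a small user-space sketch. It is not the kernel code: `csum16()` and `put_csum()` are hypothetical helpers written for this example. The point it shows is that a ones'-complement sum accumulated from native 16-bit loads already has the right byte layout, so storing it unmodified yields the same wire bytes on little and big endian hosts, while an extra conversion would swap the bytes on little endian machines.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: RFC 1071 style ones'-complement checksum.
 * Words are loaded in native byte order, as a __sum16-like value would be. */
static uint16_t csum16(const uint8_t *buf, size_t len)
{
    uint32_t sum = 0;

    assert(len % 2 == 0);            /* sketch only handles even lengths */
    for (size_t i = 0; i < len; i += 2) {
        uint16_t word;

        memcpy(&word, buf + i, sizeof(word)); /* native-order load */
        sum += word;
    }
    while (sum >> 16)                /* fold carries back into 16 bits */
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}

/* Hypothetical helper: write the checksum into the packet unmodified,
 * i.e. without any host-to-network conversion. */
static void put_csum(uint8_t *dst, uint16_t csum)
{
    memcpy(dst, &csum, sizeof(csum));
}
```

Appending the stored checksum to the data and running `csum16()` over the whole buffer should return 0 on any host, which is how a receiver would validate it in this sketch.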
Patrick Talbert 55b0bd82ad Merge: mptcp: better window sharing
MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/888

Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2089885
Tested: LNST, Tier1 and vs bz reproducer

When the MPTCP tput is CPU bound, and the used links are much faster than the CPU limits, the MPTCP tput is unstable, as the MPTCP-level congestion window sharing currently has some glitches: patch 1/5 ensures the sharing affects even the announced window; patches 3/5 and 4/5 take care of concurrent announced window updates. The remaining patches add more MIB counters for introspection's sake.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Approved-by: Florian Westphal <fwestpha@redhat.com>
Approved-by: Davide Caratti <dcaratti@redhat.com>

Signed-off-by: Patrick Talbert <ptalbert@redhat.com>
2022-06-10 09:44:37 +02:00
Paolo Abeni e5694e1194 mptcp: never shrink offered window
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2089885
Tested: LNST, Tier1

Upstream commit:
commit f3589be0c420a3137e5902d15705ced6a36f3f43
Author: Paolo Abeni <pabeni@redhat.com>
Date:   Wed May 4 14:54:07 2022 -0700

    mptcp: never shrink offered window

    As per RFC, the offered MPTCP-level window should never shrink.
    While we currently track the right edge, we don't enforce the
    above constraint on the wire.
    Additionally, concurrent xmit on different subflows can end up in
    an erroneous right edge update.
    Address the above by explicitly updating the announced window and
    protecting the update with an additional atomic operation (sic)

    Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-05-24 18:12:41 +02:00
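
A minimal user-space sketch of the idea in "mptcp: never shrink offered window" above, assuming C11 atomics in place of the kernel primitives. The names `seq_after64()` and `advance_right_edge()` are hypothetical and do not come from the patch. The sketch only shows how a compare-and-exchange loop lets concurrent senders move a shared right edge forward without ever moving it back.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Wraparound-tolerant "a is after b" test for 64-bit sequence numbers. */
static int seq_after64(uint64_t a, uint64_t b)
{
    return (int64_t)(a - b) > 0;
}

/* Hypothetical helper: try to advance the announced right edge to
 * 'candidate'. Returns the edge that is in effect after the call, which
 * is never before the previous one. */
static uint64_t advance_right_edge(_Atomic uint64_t *edge, uint64_t candidate)
{
    uint64_t old;

    assert(edge != NULL);
    old = atomic_load(edge);
    while (seq_after64(candidate, old)) {
        /* On failure 'old' is refreshed with the current value, so the
         * loop re-checks against whatever another thread just stored. */
        if (atomic_compare_exchange_weak(edge, &old, candidate))
            return candidate;
    }
    return old; /* candidate would shrink the window: keep the old edge */
}
```

A caller would announce the returned value rather than its own candidate, so a subflow that computed a smaller window still advertises the edge already offered to the peer.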
Paolo Abeni 0a22740234 mptcp: fix subflow accounting on close
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2076832
Tested: LNST, Tier1

Upstream commit:
commit 95d686517884a403412b000361cee2b08b2ed1e6
Author: Paolo Abeni <pabeni@redhat.com>
Date:   Thu May 12 16:26:41 2022 -0700

    mptcp: fix subflow accounting on close

    If the PM closes a fully established MPJ subflow or the subflow
    creation errors out in its early stage, the subflows counter is
    not bumped accordingly.

    This change adds the missing accounting, additionally taking care
    of updating accordingly the 'accept_subflow' flag.

    Fixes: a88c9e496937 ("mptcp: do not block subflows creation on errors")
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-05-18 18:11:49 +02:00
Davide Caratti 514e615698 mptcp: netlink: allow userspace-driven subflow establishment
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2079368
Upstream Status: net-next.git commit 702c2f646d42

commit 702c2f646d42cfd9e31133d68a8283fea48fd810
Author: Florian Westphal <fw@strlen.de>
Date:   Tue May 3 19:38:56 2022 -0700

    mptcp: netlink: allow userspace-driven subflow establishment

    This allows userspace to tell kernel to add a new subflow to an existing
    mptcp connection.

    Userspace provides the token to identify the mptcp-level connection
    that needs a change in active subflows and the local and remote
    addresses of the new or the to-be-removed subflow.

    MPTCP_PM_CMD_SUBFLOW_CREATE requires the following parameters:
    { token, { loc_id, family, loc_addr4 | loc_addr6 }, { family, rem_addr4 |
    rem_addr6, rem_port } }

    MPTCP_PM_CMD_SUBFLOW_DESTROY requires the following parameters:
    { token, { family, loc_addr4 | loc_addr6, loc_port }, { family, rem_addr4 |
    rem_addr6, rem_port } }

    Acked-by: Paolo Abeni <pabeni@redhat.com>
    Co-developed-by: Kishen Maloor <kishen.maloor@intel.com>
    Signed-off-by: Kishen Maloor <kishen.maloor@intel.com>
    Signed-off-by: Florian Westphal <fw@strlen.de>
    Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Davide Caratti <dcaratti@redhat.com>
2022-05-06 11:03:28 +02:00
Davide Caratti 542d9d2786 mptcp: netlink: Add MPTCP_PM_CMD_REMOVE
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2079368
Upstream Status: net-next.git commit d9a4594edabf

commit d9a4594edabf125dc17dfd52acc722c3de1cb44c
Author: Kishen Maloor <kishen.maloor@intel.com>
Date:   Tue May 3 19:38:54 2022 -0700

    mptcp: netlink: Add MPTCP_PM_CMD_REMOVE

    This change adds a MPTCP netlink command for issuing a
    REMOVE_ADDR signal for an address over the chosen MPTCP
    connection from a userspace path manager.

    The command requires the following parameters: {token, loc_id}.

    Acked-by: Paolo Abeni <pabeni@redhat.com>
    Signed-off-by: Kishen Maloor <kishen.maloor@intel.com>
    Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Davide Caratti <dcaratti@redhat.com>
2022-05-06 11:03:27 +02:00
Davide Caratti 8d930f33fc mptcp: netlink: Add MPTCP_PM_CMD_ANNOUNCE
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2079368
Upstream Status: net-next.git commit 9ab4807c84a4

commit 9ab4807c84a4aacfc9b4f79cc81254035e0ec361
Author: Kishen Maloor <kishen.maloor@intel.com>
Date:   Tue May 3 19:38:52 2022 -0700

    mptcp: netlink: Add MPTCP_PM_CMD_ANNOUNCE

    This change adds a MPTCP netlink interface for issuing
    ADD_ADDR advertisements over the chosen MPTCP connection from a
    userspace path manager.

    The command requires the following parameters:
    { token, { loc_id, family, daddr4 | daddr6 [, dport] } [, if_idx],
    flags[signal] }.

    Acked-by: Paolo Abeni <pabeni@redhat.com>
    Signed-off-by: Kishen Maloor <kishen.maloor@intel.com>
    Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Davide Caratti <dcaratti@redhat.com>
2022-05-06 11:03:27 +02:00
Davide Caratti 9e7e2733fe mptcp: read attributes of addr entries managed by userspace PMs
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2079368
Upstream Status: net-next.git commit 8b20137012d9

commit 8b20137012d9e521736c040328f8979cf0a144d0
Author: Kishen Maloor <kishen.maloor@intel.com>
Date:   Tue May 3 19:38:50 2022 -0700

    mptcp: read attributes of addr entries managed by userspace PMs

    This change introduces a parallel path in the kernel for retrieving
    the local id, flags, if_index for an addr entry in the context of
    an MPTCP connection that's being managed by a userspace PM. The
    userspace and in-kernel PM modes deviate in their procedures for
    obtaining this information.

    Acked-by: Paolo Abeni <pabeni@redhat.com>
    Signed-off-by: Kishen Maloor <kishen.maloor@intel.com>
    Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Davide Caratti <dcaratti@redhat.com>
2022-05-06 11:03:26 +02:00
Davide Caratti 2b73fbfe8a mptcp: handle local addrs announced by userspace PMs
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2079368
Upstream Status: net-next.git commit 4638de5aefe5

commit 4638de5aefe56366726e5107a9da13ce5c84a1b7
Author: Kishen Maloor <kishen.maloor@intel.com>
Date:   Tue May 3 19:38:49 2022 -0700

    mptcp: handle local addrs announced by userspace PMs

    This change adds an internal function to store/retrieve local
    addrs announced by userspace PM implementations to/from its kernel
    context. The function addresses the requirements of three scenarios:
    1) ADD_ADDR announcements (which require that a local id be
    provided), 2) retrieving the local id associated with an address,
    and also where one may need to be assigned, and 3) reissuance of
    ADD_ADDRs when there's a successful match of addr/id.

    The list of all stored local addr entries is held under the
    MPTCP sock structure. Memory for these entries is allocated from
    the sock option buffer, so the list of addrs is bounded by optmem_max.
    The list, if not released via REMOVE_ADDR signals, is ultimately
    freed when the sock is destructed.

    Acked-by: Paolo Abeni <pabeni@redhat.com>
    Signed-off-by: Kishen Maloor <kishen.maloor@intel.com>
    Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Davide Caratti <dcaratti@redhat.com>
2022-05-06 11:03:26 +02:00
Davide Caratti 181396ec56 mptcp: establish subflows from either end of connection
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2079368
Upstream Status: net-next.git commit 70c708e82606

commit 70c708e82606f842dc1bcf0943b8acae30c4088f
Author: Kishen Maloor <kishen.maloor@intel.com>
Date:   Mon May 2 13:52:35 2022 -0700

    mptcp: establish subflows from either end of connection

    This change updates internal logic to permit subflows to be
    established from either the client or server ends of MPTCP
    connections. This symmetry and added flexibility may be
    harnessed by PM implementations running on either end in
    creating new subflows.

    The essence of this change lies in not relying on the
    "server_side" flag (which continues to be available if needed).

    Acked-by: Paolo Abeni <pabeni@redhat.com>
    Signed-off-by: Kishen Maloor <kishen.maloor@intel.com>
    Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Davide Caratti <dcaratti@redhat.com>
2022-05-06 11:03:25 +02:00
Davide Caratti e114e91c29 mptcp: reflect remote port (not 0) in ANNOUNCED events
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2079368
Upstream Status: net-next.git commit d1ace2d9abf3

commit d1ace2d9abf3eb5aaa91621050bfd02695721d18
Author: Kishen Maloor <kishen.maloor@intel.com>
Date:   Mon May 2 13:52:34 2022 -0700

    mptcp: reflect remote port (not 0) in ANNOUNCED events

    Per RFC 8684, if no port is specified in an ADD_ADDR message, MPTCP
    SHOULD attempt to connect to the specified address on the same port
    as the port that is already in use by the subflow on which the
    ADD_ADDR signal was sent.

    To facilitate that, this change reflects the specific remote port in
    use by that subflow in MPTCP_EVENT_ANNOUNCED events.

    Acked-by: Paolo Abeni <pabeni@redhat.com>
    Signed-off-by: Kishen Maloor <kishen.maloor@intel.com>
    Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Davide Caratti <dcaratti@redhat.com>
2022-05-06 11:03:25 +02:00
Davide Caratti 2bd08577c2 mptcp: bypass in-kernel PM restrictions for non-kernel PMs
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2079368
Upstream Status: net-next.git commit 4d25247d3ae4

commit 4d25247d3ae486e6e4c59394487fd01523628234
Author: Kishen Maloor <kishen.maloor@intel.com>
Date:   Mon May 2 13:52:31 2022 -0700

    mptcp: bypass in-kernel PM restrictions for non-kernel PMs

    Current limits on the # of addresses/subflows must apply only to
    in-kernel PM managed sockets. Thus this change removes such
    restrictions on connections overseen by non-kernel (e.g. userspace)
    PMs. This change also ensures that the kernel does not record stats
    inside struct mptcp_pm_data updated along kernel code paths when exercised
    via non-kernel PMs.

    Additionally, address announcements are acknowledged and subflow
    requests are honored only when it's deemed that a userspace path
    manager is active at the time.

    Acked-by: Paolo Abeni <pabeni@redhat.com>
    Signed-off-by: Kishen Maloor <kishen.maloor@intel.com>
    Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Davide Caratti <dcaratti@redhat.com>
2022-05-06 11:03:24 +02:00
Davide Caratti b941ea6459 mptcp: Add a per-namespace sysctl to set the default path manager type
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2079368
Upstream Status: net-next.git commit 6bb63ccc25d4

commit 6bb63ccc25d4a8cb8fe48efeda680cb13f84d1b0
Author: Mat Martineau <mathew.j.martineau@linux.intel.com>
Date:   Wed Apr 27 15:50:01 2022 -0700

    mptcp: Add a per-namespace sysctl to set the default path manager type

    The new net.mptcp.pm_type sysctl determines which path manager will be
    used by each newly-created MPTCP socket.

    v2: Handle builds without CONFIG_SYSCTL
    v3: Clarify logic for type-specific PM init (Geliang Tang and Paolo Abeni)

    Acked-by: Paolo Abeni <pabeni@redhat.com>
    Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Davide Caratti <dcaratti@redhat.com>
2022-05-06 11:03:23 +02:00
Davide Caratti a178c5aed6 mptcp: Bypass kernel PM when userspace PM is enabled
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2079368
Upstream Status: net-next.git commit 14b06811bec6

commit 14b06811bec686af3dbba58141c23b06f3e385bd
Author: Mat Martineau <mathew.j.martineau@linux.intel.com>
Date:   Wed Apr 27 15:49:59 2022 -0700

    mptcp: Bypass kernel PM when userspace PM is enabled

    When a MPTCP connection is managed by a userspace PM, bypass the kernel
    PM for incoming advertisements and subflow events. Netlink events are
    still sent to userspace.

    v2: Remove unneeded check in mptcp_pm_rm_addr_received() (Kishen Maloor)
    v3: Add and use helper function for PM mode (Paolo Abeni)

    Acked-by: Paolo Abeni <pabeni@redhat.com>
    Co-developed-by: Kishen Maloor <kishen.maloor@intel.com>
    Signed-off-by: Kishen Maloor <kishen.maloor@intel.com>
    Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Davide Caratti <dcaratti@redhat.com>
2022-05-06 11:03:23 +02:00
Davide Caratti 4fc2da628e mptcp: Add a member to mptcp_pm_data to track kernel vs userspace mode
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2079368
Upstream Status: net-next.git commit d85a8fde71e2

commit d85a8fde71e245981180698a5a662598682b7524
Author: Mat Martineau <mathew.j.martineau@linux.intel.com>
Date:   Wed Apr 27 15:49:58 2022 -0700

    mptcp: Add a member to mptcp_pm_data to track kernel vs userspace mode

    When adding support for netlink path management commands, the kernel
    needs to know whether paths are being controlled by the in-kernel path
    manager or a userspace PM.

    Acked-by: Paolo Abeni <pabeni@redhat.com>
    Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Davide Caratti <dcaratti@redhat.com>
2022-05-06 11:03:22 +02:00
Davide Caratti d01b15acf3 mptcp: Remove redundant assignments in path manager init
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2079368
Upstream Status: net-next.git commit 9273b9d57995

commit 9273b9d5799598d35f7a2a2df61b8e29102aeac8
Author: Mat Martineau <mathew.j.martineau@linux.intel.com>
Date:   Wed Apr 27 15:49:57 2022 -0700

    mptcp: Remove redundant assignments in path manager init

    A few members of the mptcp_pm_data struct were assigned to hard-coded
    values in mptcp_pm_data_reset(), and then immediately changed in
    mptcp_pm_nl_data_init().

    Instead, flatten all the assignments into mptcp_pm_data_reset().

    v2: Resolve conflicts due to rename of mptcp_pm_data_reset()
    v4: Resolve conflict in mptcp_pm_data_reset()

    Acked-by: Paolo Abeni <pabeni@redhat.com>
    Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Davide Caratti <dcaratti@redhat.com>
2022-05-06 11:03:22 +02:00
Davide Caratti ad8234f4bc mptcp: reset subflow when MP_FAIL doesn't respond
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2079368
Upstream Status: net-next.git commit 49fa1919d6bc

commit 49fa1919d6bcdcf3cf3d080c1943f537f6ed5e70
Author: Geliang Tang <geliang.tang@suse.com>
Date:   Tue Apr 26 14:57:15 2022 -0700

    mptcp: reset subflow when MP_FAIL doesn't respond

    This patch adds a new msk->flags bit MPTCP_FAIL_NO_RESPONSE, then reuses
    sk_timer to trigger a check if we have not received a response from the
    peer after sending MP_FAIL. If the peer doesn't respond properly, reset
    the subflow.

    Signed-off-by: Geliang Tang <geliang.tang@suse.com>
    Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Davide Caratti <dcaratti@redhat.com>
2022-05-06 11:03:21 +02:00
Davide Caratti ca1e9e8d32 mptcp: add MP_FAIL response support
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2079368
Upstream Status: net-next.git commit 9c81be0dbc89

commit 9c81be0dbc89ccb76ce34c3a88425bf3a0d57ebb
Author: Geliang Tang <geliang.tang@suse.com>
Date:   Tue Apr 26 14:57:14 2022 -0700

    mptcp: add MP_FAIL response support

    This patch adds a new struct member mp_fail_response_expect in struct
    mptcp_subflow_context to support MP_FAIL response. In the single subflow
    with checksum error and contiguous data special case, a MP_FAIL is sent
    in response to another MP_FAIL.

    Signed-off-by: Geliang Tang <geliang.tang@suse.com>
    Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Davide Caratti <dcaratti@redhat.com>
2022-05-06 11:03:21 +02:00
Davide Caratti b03c9fee68 mptcp: infinite mapping sending
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2079368
Upstream Status: net-next.git commit 1e39e5a32ad7

commit 1e39e5a32ad7fdd82d6e071aa14ecd511eedc1f7
Author: Geliang Tang <geliang.tang@suse.com>
Date:   Fri Apr 22 14:55:39 2022 -0700

    mptcp: infinite mapping sending

    This patch adds the infinite mapping sending logic.

    Add a new flag send_infinite_map in struct mptcp_subflow_context. Set
    it true when a single contiguous subflow is in use and the
    allow_infinite_fallback flag is true in mptcp_pm_mp_fail_received().

    In mptcp_sendmsg_frag(), if this flag is true, call the new function
    mptcp_update_infinite_map() to set the infinite mapping.

    Add a new flag infinite_map in struct mptcp_ext, set it true in
    mptcp_update_infinite_map(), and check this flag in a new helper
    mptcp_check_infinite_map().

    In mptcp_update_infinite_map(), set data_len to 0, and clear the
    send_infinite_map flag, then do fallback.

    In mptcp_established_options(), use the helper mptcp_check_infinite_map()
    so that the infinite mapping DSS can be sent out in fallback mode.

    Suggested-by: Paolo Abeni <pabeni@redhat.com>
    Signed-off-by: Geliang Tang <geliang.tang@suse.com>
    Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Davide Caratti <dcaratti@redhat.com>
2022-05-06 11:03:20 +02:00
Davide Caratti 545bb78cf0 mptcp: track and update contiguous data status
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2079368
Upstream Status: net-next.git commit 0530020a7c8f

commit 0530020a7c8f2204e784f0dbdc882bbd961fdbde
Author: Geliang Tang <geliang.tang@suse.com>
Date:   Fri Apr 22 14:55:38 2022 -0700

    mptcp: track and update contiguous data status

    This patch adds a new member allow_infinite_fallback in mptcp_sock,
    which is initialized to 'true' when the connection begins and is set
    to 'false' on any retransmit or successful MP_JOIN. Only do infinite
    mapping fallback if there is a single subflow AND there have been no
    retransmissions AND there have never been any MP_JOINs.

    Suggested-by: Paolo Abeni <pabeni@redhat.com>
    Signed-off-by: Geliang Tang <geliang.tang@suse.com>
    Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Davide Caratti <dcaratti@redhat.com>
2022-05-06 11:03:19 +02:00
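
The rule stated in "mptcp: track and update contiguous data status" above can be sketched as a small state holder. This is an illustration only: `struct conn_state` and the `conn_*()` helpers are hypothetical stand-ins, not `struct mptcp_sock` or any kernel function. Only the flag name `allow_infinite_fallback` and the three conditions are taken from the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-connection state. */
struct conn_state {
    bool allow_infinite_fallback; /* cleared for good on retransmit or join */
    unsigned int subflows;        /* subflows currently in use */
};

/* The flag starts out true when the connection begins. */
static void conn_init(struct conn_state *c)
{
    c->allow_infinite_fallback = true;
    c->subflows = 1;
}

/* Any retransmission means the data stream may no longer be contiguous. */
static void conn_on_retransmit(struct conn_state *c)
{
    c->allow_infinite_fallback = false;
}

/* A successful MP_JOIN adds a subflow and rules out fallback from now on. */
static void conn_on_join(struct conn_state *c)
{
    c->allow_infinite_fallback = false;
    c->subflows++;
}

/* Fallback is allowed only with a single subflow, no retransmissions
 * and no MP_JOIN ever seen. */
static bool conn_can_fallback(const struct conn_state *c)
{
    assert(c != NULL);
    return c->allow_infinite_fallback && c->subflows == 1;
}
```

Because the flag is never set back to true, a connection that drops back to one subflow after a join still cannot use infinite mapping fallback in this sketch, matching the "there have never been any MP_JOINs" wording.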
Davide Caratti 31fbe24cf2 mptcp: reset the packet scheduler on incoming MP_PRIO
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2079368
Upstream Status: net-next.git commit 43f5b111d1ff

commit 43f5b111d1ff16161ce60e19aeddb999cb6f0b01
Author: Paolo Abeni <pabeni@redhat.com>
Date:   Fri Apr 8 12:45:55 2022 -0700

    mptcp: reset the packet scheduler on incoming MP_PRIO

    When an incoming MP_PRIO option changes the backup
    status of any subflow, we need to reset the packet
    scheduler status, or the next send could keep using
    the previously selected subflow, without taking into account
    the new priorities.

    Reported-by: Davide Caratti <dcaratti@redhat.com>
    Fixes: 40453a5c61 ("mptcp: add the incoming MP_PRIO support")
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Davide Caratti <dcaratti@redhat.com>
2022-05-06 11:03:17 +02:00
Davide Caratti e2fe3316fb mptcp: strict local address ID selection
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2079368
Upstream Status: net-next.git commit 4cf86ae84c71

commit 4cf86ae84c718333928fd2d43168a1e359a28329
Author: Paolo Abeni <pabeni@redhat.com>
Date:   Mon Mar 7 12:44:37 2022 -0800

    mptcp: strict local address ID selection

    The address ID selection for MPJ subflows created in response
    to incoming ADD_ADDR option is currently unreliable: it happens
    at MPJ socket creation time, when the local address could be
    unknown.

    Additionally, if no local endpoint is available for the local
    address, a new dummy endpoint is created, confusing the user-land.

    This change refactors the code to move the address ID selection inside
    the rebuild_header() helper, when the local address eventually
    selected by the route lookup is finally known. If the address used
    is not mapped by any endpoint - and thus can't be advertised/removed -
    pick the id 0 instead of allocating a new endpoint.

    Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Davide Caratti <dcaratti@redhat.com>
2022-05-06 11:02:18 +02:00
Davide Caratti bc7d02988e mptcp: don't save tcp data_ready and write space callbacks
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2079368
Upstream Status: net-next.git commit 952382c648e5

commit 952382c648e5929b961137840e1c5f65cf0cbef1
Author: Florian Westphal <fw@strlen.de>
Date:   Tue Feb 15 18:11:30 2022 -0800

    mptcp: don't save tcp data_ready and write space callbacks

    Assign the helpers directly rather than save/restore in the context
    structure.

    Signed-off-by: Florian Westphal <fw@strlen.de>
    Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Davide Caratti <dcaratti@redhat.com>
2022-05-06 11:01:10 +02:00
Davide Caratti bdc6206a14 mptcp: constify a bunch of of helpers
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2079368
Upstream Status: net-next.git commit 90d930882139

commit 90d930882139f166ed2551205d6f6d8c50b656fb
Author: Paolo Abeni <pabeni@redhat.com>
Date:   Tue Feb 15 18:11:28 2022 -0800

    mptcp: constify a bunch of of helpers

    A few pm-related helpers don't touch arguments which lack
    the const modifier; let's constify them.

    Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Davide Caratti <dcaratti@redhat.com>
2022-05-06 11:01:10 +02:00
Davide Caratti 05a1f39e8e mptcp: drop port parameter of mptcp_pm_add_addr_signal
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2079368
Upstream Status: net-next.git commit af7939f390de

commit af7939f390de17bde4a10a3bf0e337627fb42591
Author: Geliang Tang <geliang.tang@suse.com>
Date:   Tue Feb 15 18:11:27 2022 -0800

    mptcp: drop port parameter of mptcp_pm_add_addr_signal

    Drop the port parameter of mptcp_pm_add_addr_signal() and reflect it to
    avoid passing too many parameters.

    Signed-off-by: Geliang Tang <geliang.tang@suse.com>
    Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Davide Caratti <dcaratti@redhat.com>
2022-05-06 11:01:09 +02:00
Davide Caratti d006a52562 mptcp: drop unused sk in mptcp_get_options
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2079368
Upstream Status: net-next.git commit 0799e21b5a76

commit 0799e21b5a76d9f14d8a8f024d0b6b9847ad1a03
Author: Geliang Tang <geliang.tang@suse.com>
Date:   Tue Feb 15 18:11:25 2022 -0800

    mptcp: drop unused sk in mptcp_get_options

    The parameter 'sk' became useless since the code using it was dropped
    from mptcp_get_options() in the commit 8d548ea1dd15 ("mptcp: do not set
    unconditionally csum_reqd on incoming opt"). Let's drop it.

    Signed-off-by: Geliang Tang <geliang.tang@suse.com>
    Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Davide Caratti <dcaratti@redhat.com>
2022-05-06 11:01:09 +02:00
Davide Caratti 61a9c05e32 mptcp: Use struct_group() to avoid cross-field memset()
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2079368
Upstream Status: net-next.git commit 63ec72bd5848

commit 63ec72bd58487935a2e40d2cdffe5c9498f1275e
Author: Kees Cook <keescook@chromium.org>
Date:   Thu Jan 20 23:39:35 2022 -0800

    mptcp: Use struct_group() to avoid cross-field memset()

    In preparation for FORTIFY_SOURCE performing compile-time and run-time
    field bounds checking for memcpy(), memmove(), and memset(), avoid
    intentionally writing across neighboring fields.

    Use struct_group() to capture the fields to be reset, so that memset()
    can be appropriately bounds-checked by the compiler.

    Cc: Matthieu Baerts <matthieu.baerts@tessares.net>
    Cc: mptcp@lists.linux.dev
    Signed-off-by: Kees Cook <keescook@chromium.org>
    Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
    Link: https://lore.kernel.org/r/20220121073935.1154263-1-keescook@chromium.org
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Davide Caratti <dcaratti@redhat.com>
2022-05-06 11:01:03 +02:00
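The idea behind "mptcp: Use struct_group() to avoid cross-field memset()" can be illustrated outside the kernel. The sketch below assumes a simplified STRUCT_GROUP() macro written for this example (the kernel's struct_group() is more elaborate), and struct opts_model with its fields is hypothetical. It shows how wrapping the fields to be reset in a named group lets memset() target one well-bounded object instead of writing across neighboring fields.

```c
#include <assert.h>
#include <string.h>

/*
 * Simplified stand-in for the kernel's struct_group(): the same members are
 * reachable directly (anonymous struct) and as one named sub-struct.
 */
#define STRUCT_GROUP(NAME, ...) \
	union { struct { __VA_ARGS__ }; struct { __VA_ARGS__ } NAME; }

/* Hypothetical structure; the field names are made up for illustration. */
struct opts_model {
	int keep_me;			/* must survive a reset */
	STRUCT_GROUP(resettable,
		unsigned short flags;
		unsigned int seq;
		unsigned char id;
	);
	int keep_me_too;		/* must survive a reset */
};

/* The destination and the size both refer to the group, so the bounds match. */
static void opts_model_reset(struct opts_model *o)
{
	memset(&o->resettable, 0, sizeof(o->resettable));
}
```

Callers keep using o->flags or o->seq as before; only the reset path names the group, which is what allows a bounds-checking memset() to verify the write.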
Davide Caratti 1bd92e2317 mptcp: change the parameter of __mptcp_make_csum
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2079368
Upstream Status: net-next.git commit c312ee219100

commit c312ee219100e86143a1d3cc10b367bc43a0e0b8
Author: Geliang Tang <geliang.tang@suse.com>
Date:   Fri Jan 7 11:25:23 2022 -0800

    mptcp: change the parameter of __mptcp_make_csum

    This patch changes the type of the last parameter of __mptcp_make_csum()
    from __sum16 to __wsum, and exports this function in protocol.h.

    Signed-off-by: Geliang Tang <geliang.tang@suse.com>
    Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Davide Caratti <dcaratti@redhat.com>
2022-05-06 11:01:02 +02:00
Davide Caratti f5f301ae9b mptcp: avoid atomic bit manipulation when possible
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2079368
Upstream Status: net-next.git commit e9d09baca676

commit e9d09baca67625cfb41c0f2b547b9dbb4043ae95
Author: Paolo Abeni <pabeni@redhat.com>
Date:   Thu Jan 6 16:20:26 2022 -0800

    mptcp: avoid atomic bit manipulation when possible

    Currently the msk->flags bitmask carries both state for the
    mptcp_release_cb() - mostly touched under the mptcp data lock
    - and other state info touched even outside such lock scope.

    As a consequence, msk->flags is always manipulated with
    atomic operations.

    This change splits such bitmask in two separate fields, so
    that we use plain bit operations when touching the
    cb-related info.

    The MPTCP_PUSH_PENDING bit needs additional care, as it is the
    only CB related field currently accessed either under the mptcp
    data lock or the mptcp socket lock.
    Let's add another mask just for such bit's sake.

    Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Davide Caratti <dcaratti@redhat.com>
2022-05-06 11:01:01 +02:00
Davide Caratti aec546e783 mptcp: cleanup MPJ subflow list handling
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2079368
Upstream Status: net-next.git commit 3e5014909b56

commit 3e5014909b5661b3da59990d72a317a45ba3b284
Author: Paolo Abeni <pabeni@redhat.com>
Date:   Thu Jan 6 16:20:25 2022 -0800

    mptcp: cleanup MPJ subflow list handling

    We can simplify the join list handling leveraging the
    mptcp_release_cb(): if we can acquire the msk socket
    lock at mptcp_finish_join time, move the new subflow
    directly into the conn_list, otherwise place it on join_list and
    let the release_cb process such list.

    Since pending MPJ connections are now always processed
    in a timely way, we can avoid flushing the join list
    every time we have to process all the current subflows.

    Additionally we can now use the mptcp data lock to protect
    the join_list, removing the additional spin lock.

    Finally, since the MPJ handshake is now always finalized under the
    msk socket lock, we can drop the additional synchronization
    between mptcp_finish_join() and mptcp_close().

    Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Davide Caratti <dcaratti@redhat.com>
2022-05-06 11:01:01 +02:00
Davide Caratti fc542a468f mptcp: do not block subflows creation on errors
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2079368
Upstream Status: net-next.git commit a88c9e496937

commit a88c9e49693759f9eb49dcda6c45a0d32b07634c
Author: Paolo Abeni <pabeni@redhat.com>
Date:   Thu Jan 6 16:20:23 2022 -0800

    mptcp: do not block subflows creation on errors

    If the MPTCP configuration allows for the creation of multiple
    subflows, and the first additional subflow never reaches
    the fully established status - e.g. due to packet drops or
    resets - the in-kernel path manager does not move to the
    next subflow.

    This patch introduces a new PM helper to cope with MPJ
    subflow creation failure and delay, and hooks it where appropriate.

    Such helper triggers additional subflow creation, as needed
    and updates the PM subflow counter, if the current one is
    closing.

    Additionally start all the needed additional subflows
    as soon as the MPTCP socket is fully established, so we don't
    have to cope with slow MPJ handshake blocking the next subflow
    creation.

    Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Davide Caratti <dcaratti@redhat.com>
2022-05-06 11:01:00 +02:00
Davide Caratti 9b902002ba mptcp: keep track of local endpoint still available for each msk
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2079368
Upstream Status: net-next.git commit 86e39e04482b

commit 86e39e04482b0aadf3ee3ed5fcf2d63816559d36
Author: Paolo Abeni <pabeni@redhat.com>
Date:   Thu Jan 6 16:20:22 2022 -0800

    mptcp: keep track of local endpoint still available for each msk

    Include into the path manager status a bitmap tracking the list
    of local endpoints still available - not yet used - for the
    relevant mptcp socket.

    Keep such map updated at endpoint creation/deletion time, so
    that we can easily skip already used endpoints at local address
    selection time.

    The endpoint used by the initial subflow is lazily accounted at
    subflow creation time: the usage bitmap is up to date before
    endpoint selection and we avoid such unneeded task in some relevant
    scenarios - e.g. busy servers accepting incoming subflows but
    not creating any additional ones nor announcing additional addresses.

    Overall this allows for fair local endpoints usage in case of
    subflow failure.

    As a side effect, this patch also enforces that each endpoint
    is used at most once for each mptcp connection.

    Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Davide Caratti <dcaratti@redhat.com>
2022-05-06 11:01:00 +02:00
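The per-connection bitmap from "mptcp: keep track of local endpoint still available for each msk" can be sketched as follows. This is a minimal userspace model, not the kernel code: struct ep_map, EP_MAX_ID and the ep_*() helpers are hypothetical, and treating id 0 as reserved for the initial subflow is an assumption of this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define EP_MAX_ID	255				/* assumed id range: 0..255 */
#define EP_WORDS	((EP_MAX_ID + 64) / 64)		/* 4 x 64-bit words */

/* Hypothetical per-connection map of endpoint ids not yet used. */
struct ep_map {
	uint64_t avail[EP_WORDS];
};

/* Endpoint creation: mark the id as available for this connection. */
static void ep_add(struct ep_map *m, unsigned int id)
{
	m->avail[id / 64] |= UINT64_C(1) << (id % 64);
}

/* Endpoint deletion: the id can no longer be selected. */
static void ep_del(struct ep_map *m, unsigned int id)
{
	m->avail[id / 64] &= ~(UINT64_C(1) << (id % 64));
}

static bool ep_test(const struct ep_map *m, unsigned int id)
{
	return (m->avail[id / 64] >> (id % 64)) & 1;
}

/*
 * Local address selection: return the lowest available id and mark it used,
 * or -1 when every endpoint has already been consumed. Id 0 is skipped here
 * on the assumption that it belongs to the initial subflow.
 */
static int ep_pick(struct ep_map *m)
{
	for (unsigned int id = 1; id <= EP_MAX_ID; id++) {
		if (ep_test(m, id)) {
			ep_del(m, id);
			return (int)id;
		}
	}
	return -1;
}
```

Because ep_pick() clears the bit it returns, each endpoint is handed out at most once per connection, which mirrors the side effect noted in the commit message.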
Davide Caratti 4c3efd1830 mptcp: cleanup accept and poll
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2079368
Upstream Status: net-next.git commit 71ba088ce0aa

commit 71ba088ce0aa87370b18a1d35cd742f352d51c24
Author: Paolo Abeni <pabeni@redhat.com>
Date:   Thu Jan 6 16:20:17 2022 -0800

    mptcp: cleanup accept and poll

    After the previous patch, msk->subflow will never be deleted during
    the whole msk lifetime. We no longer need to acquire references to
    it in mptcp_stream_accept() and we can use the listener subflow accept
    queue to simplify mptcp_poll() for listener socket.

    Overall this removes a lock pair and 4 more atomic operations per
    accept().

    Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Davide Caratti <dcaratti@redhat.com>
2022-05-06 11:00:59 +02:00
Davide Caratti cc3a31187c mptcp: full disconnect implementation
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2079368
Upstream Status: net-next.git commit b29fcfb54cd7

commit b29fcfb54cd70caca5b11c80d8d238854938884a
Author: Paolo Abeni <pabeni@redhat.com>
Date:   Thu Jan 6 16:20:16 2022 -0800

    mptcp: full disconnect implementation

    The current mptcp_disconnect() implementation lacks several
    steps, we additionally need to reset the msk socket state
    and flush the subflow list.

    Factor out the needed helper to avoid code duplication.

    Additionally ensure that the initial subflow is disposed
    only after mptcp_close(), just reset it at disconnect time.

    Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Davide Caratti <dcaratti@redhat.com>
2022-05-06 11:00:59 +02:00
Davide Caratti c27f599ab4 mptcp: implement fastclose xmit path
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2079368
Upstream Status: net-next.git commit f284c0c77321

commit f284c0c7732189fa77567dc061c5f4205c4fa05b
Author: Paolo Abeni <pabeni@redhat.com>
Date:   Thu Jan 6 16:20:15 2022 -0800

    mptcp: implement fastclose xmit path

    Allow the MPTCP xmit path to add MP_FASTCLOSE suboption
    on RST egress packets.

    Additionally reorder related options writing to reduce
    the number of conditionals required in the fast path.

    Co-developed-by: Geliang Tang <geliang.tang@suse.com>
    Signed-off-by: Geliang Tang <geliang.tang@suse.com>
    Co-developed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
    Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Davide Caratti <dcaratti@redhat.com>
2022-05-06 11:00:58 +02:00
Paolo Abeni c6b3d91594 mptcp: enforce HoL-blocking estimation
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2028420
Tested: LNST, Tier1

Upstream commit:
commit 3ce0852c86b926aed7bb8c69b09c5ad4ba0a9dfb
Author: Paolo Abeni <pabeni@redhat.com>
Date:   Fri Dec 17 15:37:00 2021 -0800

    mptcp: enforce HoL-blocking estimation

    The MPTCP packet scheduler has sub-optimal behavior with asymmetric
    subflows: if the faster subflow-level cwin is closed, the packet
    scheduler can enqueue "too much" data on a slower subflow.

    When all the data on the faster subflow is acked, if the mptcp-level
    cwin is closed, link utilization becomes suboptimal.

    The solution is implementing blest-like[1] HoL-blocking estimation,
    transmitting only on the subflow with the shorter estimated time to
    flush the queued memory. If such subflow's cwin is closed, we wait
    even if other subflows are available.

    This is much simpler than the original blest implementation, as we
    leverage the pacing rate provided by the TCP socket. To get a more
    accurate estimation for the subflow linger-time, we maintain a
    per-subflow weighted average of such info.

    Additionally drop magic numbers usage in favor of newly defined
    macros and use more meaningful names for status variables.

    [1] http://dl.ifip.org/db/conf/networking/networking2016/1570234725.pdf

    Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/137
    Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-01-12 10:49:56 +01:00
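The selection rule from "mptcp: enforce HoL-blocking estimation" (transmit only on the subflow with the shortest estimated time to flush its queued memory) can be approximated as below. This is a rough sketch under stated assumptions: struct subflow_model, linger_us() and pick_subflow() are hypothetical, the estimate is simply queued bytes divided by pacing rate, and the per-subflow weighted average and the "wait if that subflow's cwin is closed" step from the commit message are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-subflow snapshot; not the kernel's subflow context. */
struct subflow_model {
	uint64_t queued_bytes;	/* memory already queued on the subflow */
	uint64_t pacing_rate;	/* bytes per second, as provided by TCP */
};

/* Estimated time, in microseconds, needed to flush the queued memory. */
static uint64_t linger_us(const struct subflow_model *sf)
{
	return sf->queued_bytes * UINT64_C(1000000) / sf->pacing_rate;
}

/* Return the index of the subflow with the shortest estimate, or -1. */
static int pick_subflow(const struct subflow_model *sf, size_t n)
{
	uint64_t best_linger = UINT64_MAX;
	int best = -1;

	for (size_t i = 0; i < n; i++) {
		uint64_t linger;

		if (!sf[i].pacing_rate)		/* no usable estimate yet */
			continue;
		linger = linger_us(&sf[i]);
		if (linger < best_linger) {
			best_linger = linger;
			best = (int)i;
		}
	}
	return best;
}
```

With a fast subflow holding 100000 queued bytes at 1000000 bytes/s (100 ms) and a slow one holding 10000 bytes at 50000 bytes/s (200 ms), this model keeps sending on the fast subflow even though its queue is larger.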
Paolo Abeni 01ed424297 mptcp: support TCP_CORK and TCP_NODELAY
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2028420
Tested: LNST, Tier1

Upstream commit:
commit 4f6e14bd19d6de7831f31cfb3210f2ea93eeb038
Author: Maxim Galaganov <max@internet.ru>
Date:   Fri Dec 3 14:35:41 2021 -0800

    mptcp: support TCP_CORK and TCP_NODELAY

    First, add cork and nodelay fields to the mptcp_sock structure
    so they can be used in sync_socket_options(), and fill them on setsockopt
    while holding the msk socket lock.

    Then, on setsockopt set proper tcp_sk(ssk)->nonagle values for subflows
    by calling __tcp_sock_set_cork() or __tcp_sock_set_nodelay() on the ssk
    while holding the ssk socket lock.

    tcp_push_pending_frames() will be invoked on the ssk if a cork was cleared
    or nodelay was set. Also set MPTCP_PUSH_PENDING bit by calling
    mptcp_check_and_set_pending(). This will lead to __mptcp_push_pending()
    being called inside mptcp_release_cb() with new tcp_sk(ssk)->nonagle.

    Also add getsockopt support for TCP_CORK and TCP_NODELAY.

    Acked-by: Paolo Abeni <pabeni@redhat.com>
    Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
    Signed-off-by: Maxim Galaganov <max@internet.ru>
    Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-01-12 10:49:56 +01:00
Paolo Abeni 929b68eda3 mptcp: expose mptcp_check_and_set_pending
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2028420
Tested: LNST, Tier1

Upstream commit:
commit 8b38217a2a98df6240c0cddb6f18d04923e24277
Author: Maxim Galaganov <max@internet.ru>
Date:   Fri Dec 3 14:35:40 2021 -0800

    mptcp: expose mptcp_check_and_set_pending

    Expose the mptcp_check_and_set_pending() function for use inside MPTCP
    sockopt code. The next patch will call it when TCP_CORK is cleared or
    TCP_NODELAY is set on the MPTCP socket in order to push pending data
    from mptcp_release_cb().

    Acked-by: Paolo Abeni <pabeni@redhat.com>
    Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
    Signed-off-by: Maxim Galaganov <max@internet.ru>
    Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-01-12 10:49:56 +01:00
Paolo Abeni 622f033915 mptcp: add TCP_INQ cmsg support
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2028420
Tested: LNST, Tier1

Upstream commit:
commit 2c9e77659a0c8d7ce96af3e420914ace1e3f7d21
Author: Florian Westphal <fw@strlen.de>
Date:   Fri Dec 3 14:35:32 2021 -0800

    mptcp: add TCP_INQ cmsg support

    Support the TCP_INQ setsockopt.

    This is a boolean that tells recvmsg path to include the remaining
    in-sequence bytes in the cmsg data.

    v2: do not use CB(skb)->offset, increment map_seq instead (Paolo Abeni)
    v3: adjust CB(skb)->map_seq when taking skb from ofo queue (Paolo Abeni)

    Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/224
    Signed-off-by: Florian Westphal <fw@strlen.de>
    Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-01-12 10:49:55 +01:00
Paolo Abeni a85e3b6259 mptcp: use delegate action to schedule 3rd ack retrans
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2028420
Tested: LNST, Tier1

Upstream commit:
commit bcd97734318d1d87bb237dbc0a60c81237b0ac50
Author: Paolo Abeni <pabeni@redhat.com>
Date:   Fri Nov 19 15:27:55 2021 +0100

    mptcp: use delegate action to schedule 3rd ack retrans

    Scheduling a delack in mptcp_established_options_mp() is
    not a good idea: such function is called by tcp_send_ack() and
    the pending delayed ack will be cleared shortly after by the
    tcp_event_ack_sent() call in __tcp_transmit_skb().

    Instead use the mptcp delegated action infrastructure to
    schedule the delayed ack after the current bh processing completes.

    Additionally move the schedule_3rdack_retransmission() helper
    into protocol.c to avoid making it visible in a different compilation
    unit.

    Fixes: ec3edaa7ca ("mptcp: Add handling of outgoing MP_JOIN requests")
    Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-01-12 10:49:55 +01:00
Paolo Abeni 5c58bd5366 mptcp: allocate fwd memory separately on the rx and tx path
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2028420
Tested: LNST, Tier1

Upstream commit:
commit 6511882cdd82d6cf2178932fa9b78647d130b860
Author: Paolo Abeni <pabeni@redhat.com>
Date:   Tue Oct 26 16:29:15 2021 -0700

    mptcp: allocate fwd memory separately on the rx and tx path

    The whole mptcp receive path is protected by the msk socket
    spinlock. As a consequence, the tx path has to play a few tricks to
    allocate the forward memory without acquiring the spinlock multiple
    times, making the overall TX path quite complex.

    This patch tries to clean up the tx path a bit, using completely
    separated fwd memory allocation for the rx and the tx path.

    The forward memory allocated in the rx path is now accounted in
    msk->rmem_fwd_alloc and is (still) protected by the msk socket spinlock.

    To cope with the above we provide a few MPTCP-specific variants for
    the helpers to charge, uncharge, reclaim and free the forward memory
    in the receive path.

    msk->sk_forward_alloc now accounts only the forward memory for the tx
    path, so we can use the plain core sock helper to manipulate it and drop
    quite a bit of complexity.

    On memory pressure, both rx and tx fwd memories are reclaimed.

    Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-01-12 10:49:55 +01:00
Paolo Abeni a49da5bfe2 mptcp: Make mptcp_pm_nl_mp_prio_send_ack() static
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2028420
Tested: LNST, Tier1

Upstream commit:
commit 3828c514726fce7d97063155c4749eafefd9fbd2
Author: Mat Martineau <mathew.j.martineau@linux.intel.com>
Date:   Fri Oct 15 16:05:52 2021 -0700

    mptcp: Make mptcp_pm_nl_mp_prio_send_ack() static

    This function is only used within pm_netlink.c now.

    Fixes: 067065422f ("mptcp: add the outgoing MP_PRIO support")
    Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
    Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-01-12 10:49:54 +01:00
Paolo Abeni 3fe57bf2c7 mptcp: remove tx_pending_data
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2028420
Tested: LNST, Tier1

Upstream commit:
commit 9e65b6a5aaa3236488b4f4e3e8b914d73124a5a5
Author: Florian Westphal <fw@strlen.de>
Date:   Fri Sep 24 14:12:37 2021 -0700

    mptcp: remove tx_pending_data

    The update on recovery is not correct.

    msk->tx_pending_data += msk->snd_nxt - rtx_head->data_seq;

    will update tx_pending_data multiple times when a subflow is declared
    stale while earlier recovery is still in progress.
    This means that tx_pending_data will still be positive even after
    all data has been transmitted.

    Rather than fix it, remove this field: there are no consumers.
    The outstanding data byte count can be computed either via

     "msk->write_seq - rtx_head->data_seq" or
     "msk->write_seq - msk->snd_una".

    The latter is a more recent/accurate estimate as rtx_head adjustment
    is deferred until mptcp lock can be acquired.

    Acked-by: Paolo Abeni <pabeni@redhat.com>
    Signed-off-by: Florian Westphal <fw@strlen.de>
    Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-01-12 10:49:54 +01:00
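The arithmetic behind "mptcp: remove tx_pending_data" is easy to check in isolation. In this sketch struct seq_model and both helpers are hypothetical; only the field names write_seq, snd_nxt and snd_una and the two expressions come from the commit message. It shows the running counter drifting when recovery runs twice over the same data, while the derived value stays correct.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the sequence state discussed in the commit. */
struct seq_model {
	uint64_t write_seq;	/* next sequence the application will write */
	uint64_t snd_nxt;	/* next sequence to be transmitted */
	uint64_t snd_una;	/* oldest unacknowledged sequence */
	uint64_t tx_pending;	/* the removed counter, kept only to show the drift */
};

/* The removed update: every recovery pass adds the same span again. */
static void recovery_update(struct seq_model *m, uint64_t rtx_head_seq)
{
	m->tx_pending += m->snd_nxt - rtx_head_seq;
}

/*
 * Derived on demand instead of tracked; unsigned subtraction stays correct
 * when the 64-bit sequence space wraps.
 */
static uint64_t outstanding_bytes(const struct seq_model *m)
{
	return m->write_seq - m->snd_una;
}
```

Two recovery passes over 2000 unacknowledged bytes leave tx_pending at 4000 in this model, whereas outstanding_bytes() still reports 2000.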
Paolo Abeni b4b248ecec mptcp: don't return sockets in foreign netns
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2028420
Tested: LNST, Tier1

Upstream commit:
commit ea1300b9df7c8e8b65695a08b8f6aaf4b25fec9c
Author: Florian Westphal <fw@strlen.de>
Date:   Thu Sep 23 17:04:11 2021 -0700

    mptcp: don't return sockets in foreign netns

    mptcp_token_get_sock() may return a mptcp socket that is in
    a different net namespace than the socket that received the token value.

    The mptcp syncookie code path had an explicit check for this;
    this change moves the test into the mptcp_token_get_sock() function.

    Eventually token.c should be converted to pernet storage, but
    such change is not suitable for net tree.

    Fixes: 2c5ebd001d ("mptcp: refactor token container")
    Signed-off-by: Florian Westphal <fw@strlen.de>
    Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-01-12 10:49:53 +01:00
Paolo Abeni 8608d8e85a mptcp: Only send extra TCP acks in eligible socket states
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2028420
Tested: LNST, Tier1

Upstream commit:
commit 340fa6667a696338e707cd5531a9631093d1be29
Author: Mat Martineau <mathew.j.martineau@linux.intel.com>
Date:   Thu Sep 2 11:51:19 2021 -0700

    mptcp: Only send extra TCP acks in eligible socket states

    Recent changes exposed a bug where specifically-timed requests to the
    path manager netlink API could trigger a divide-by-zero in
    __tcp_select_window(), as syzkaller does:

    divide error: 0000 [#1] SMP KASAN NOPTI
    CPU: 0 PID: 9667 Comm: syz-executor.0 Not tainted 5.14.0-rc6+ #3
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014
    RIP: 0010:__tcp_select_window+0x509/0xa60 net/ipv4/tcp_output.c:3016
    Code: 44 89 ff e8 c9 29 e9 fd 45 39 e7 0f 8d 20 ff ff ff e8 db 28 e9 fd 44 89 e3 e9 13 ff ff ff e8 ce 28 e9 fd 44 89 e0 44 89 e3 99 <f7> 7c 24 04 29 d3 e9 fc fe ff ff e8 b7 28 e9 fd 44 89 f1 48 89 ea
    RSP: 0018:ffff888031ccf020 EFLAGS: 00010216
    RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000040000
    RDX: 0000000000000000 RSI: ffff88811532c080 RDI: 0000000000000002
    RBP: 0000000000000000 R08: ffffffff835807c2 R09: 0000000000000000
    R10: 0000000000000004 R11: ffffed1020b92441 R12: 0000000000000000
    R13: 1ffff11006399e08 R14: 0000000000000000 R15: 0000000000000000
    FS:  00007fa4c8344700(0000) GS:ffff88811ae00000(0000) knlGS:0000000000000000
    CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 0000001b2f424000 CR3: 000000003e4e2003 CR4: 0000000000770ef0
    PKRU: 55555554
    Call Trace:
     tcp_select_window net/ipv4/tcp_output.c:264 [inline]
     __tcp_transmit_skb+0xc00/0x37a0 net/ipv4/tcp_output.c:1351
     __tcp_send_ack.part.0+0x3ec/0x760 net/ipv4/tcp_output.c:3972
     __tcp_send_ack net/ipv4/tcp_output.c:3978 [inline]
     tcp_send_ack+0x7d/0xa0 net/ipv4/tcp_output.c:3978
     mptcp_pm_nl_addr_send_ack+0x1ab/0x380 net/mptcp/pm_netlink.c:654
     mptcp_pm_remove_addr+0x161/0x200 net/mptcp/pm.c:58
     mptcp_nl_remove_id_zero_address+0x197/0x460 net/mptcp/pm_netlink.c:1328
     mptcp_nl_cmd_del_addr+0x98b/0xd40 net/mptcp/pm_netlink.c:1359
     genl_family_rcv_msg_doit.isra.0+0x225/0x340 net/netlink/genetlink.c:731
     genl_family_rcv_msg net/netlink/genetlink.c:775 [inline]
     genl_rcv_msg+0x341/0x5b0 net/netlink/genetlink.c:792
     netlink_rcv_skb+0x148/0x430 net/netlink/af_netlink.c:2504
     genl_rcv+0x24/0x40 net/netlink/genetlink.c:803
     netlink_unicast_kernel net/netlink/af_netlink.c:1314 [inline]
     netlink_unicast+0x537/0x750 net/netlink/af_netlink.c:1340
     netlink_sendmsg+0x846/0xd80 net/netlink/af_netlink.c:1929
     sock_sendmsg_nosec net/socket.c:704 [inline]
     sock_sendmsg+0x14e/0x190 net/socket.c:724
     ____sys_sendmsg+0x709/0x870 net/socket.c:2403
     ___sys_sendmsg+0xff/0x170 net/socket.c:2457
     __sys_sendmsg+0xe5/0x1b0 net/socket.c:2486
     do_syscall_x64 arch/x86/entry/common.c:50 [inline]
     do_syscall_64+0x38/0x90 arch/x86/entry/common.c:80
     entry_SYSCALL_64_after_hwframe+0x44/0xae

    mptcp_pm_nl_addr_send_ack() was attempting to send a TCP ACK on the
    first subflow in the MPTCP socket's connection list without validating
    that the subflow was in a suitable connection state. To address this,
    always validate subflow state when sending extra ACKs on subflows
    for address advertisement or subflow priority change.

    Fixes: 84dfe3677a ("mptcp: send out dedicated ADD_ADDR packet")
    Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/229
    Co-developed-by: Paolo Abeni <pabeni@redhat.com>
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
    Acked-by: Geliang Tang <geliangtang@gmail.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-01-12 10:46:08 +01:00
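The fix in "mptcp: Only send extra TCP acks in eligible socket states" boils down to checking the subflow state before calling into the ACK path. The sketch below is a userspace model only: the enum, the mask and both helpers are hypothetical, and the particular set of states treated as ineligible is an assumption of this example rather than something stated in the commit message.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical connection states; values are arbitrary bit positions. */
enum conn_state {
	ST_ESTABLISHED = 1,
	ST_SYN_SENT,
	ST_SYN_RECV,
	ST_FIN_WAIT,
	ST_TIME_WAIT,
	ST_CLOSE,
	ST_LISTEN,
};

#define STF(s)	(1u << (s))

/* Assumed set of states in which an extra ACK must not be sent. */
#define ST_NO_ACK_MASK	(STF(ST_SYN_SENT) | STF(ST_SYN_RECV) | \
			 STF(ST_TIME_WAIT) | STF(ST_CLOSE) | STF(ST_LISTEN))

static bool can_send_ack(enum conn_state s)
{
	return !(STF(s) & ST_NO_ACK_MASK);
}

/* Validate the state first; only then count (i.e. "send") the ACK. */
static bool send_extra_ack(enum conn_state s, unsigned int *acks_sent)
{
	if (!can_send_ack(s))
		return false;
	(*acks_sent)++;
	return true;
}
```

Routing every extra ACK (address advertisement or priority change) through one guarded helper keeps the check in a single place, which is the shape of the fix the commit message describes.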
Paolo Abeni ed12123a0b mptcp: Fix duplicated argument in protocol.h
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2028420
Tested: LNST, Tier1

Upstream commit:
commit 780aa1209f88fd96d40572b62df922662f2b896d
Author: Wan Jiabing <wanjiabing@vivo.com>
Date:   Wed Sep 1 11:19:32 2021 +0800

    mptcp: Fix duplicated argument in protocol.h

    Fix the following coccicheck warning:
    ./net/mptcp/protocol.h:36:50-73: duplicated argument to & or |

    The OPTION_MPTCP_MPJ_SYNACK here is duplicated.
    OPTION_MPTCP_MPJ_ACK should be used here instead.

    Fixes: 74c7dfbee3e18 ("mptcp: consolidate in_opt sub-options fields in a bitmask")
    Signed-off-by: Wan Jiabing <wanjiabing@vivo.com>
    Acked-by: Paolo Abeni <pabeni@redhat.com>
    Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-01-11 11:06:56 +01:00
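The coccicheck warning fixed by "mptcp: Fix duplicated argument in protocol.h" points at a real behavioral gap, which a small example makes visible. The macro names and bit values below are hypothetical stand-ins (the actual definitions live in net/mptcp/protocol.h); the sketch only shows why OR-ing the same flag twice silently drops the third one from the mask.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical option bits; the real values are defined in protocol.h. */
#define OPT_MPJ_SYN	(1u << 0)
#define OPT_MPJ_SYNACK	(1u << 1)
#define OPT_MPJ_ACK	(1u << 2)

/* Duplicated argument: x | x == x, so the ACK bit never makes it in. */
#define OPTS_MPJ_BUGGY	(OPT_MPJ_SYN | OPT_MPJ_SYNACK | OPT_MPJ_SYNACK)

/* Intended mask covering all three MP_JOIN sub-options. */
#define OPTS_MPJ_FIXED	(OPT_MPJ_SYN | OPT_MPJ_SYNACK | OPT_MPJ_ACK)

/* True when the option bit is part of the given group mask. */
static bool in_group(unsigned int mask, unsigned int opt)
{
	return (mask & opt) != 0;
}
```

The compiler accepts both masks without complaint, which is why a semantic checker such as coccicheck is what caught the duplicate.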
Paolo Abeni bf8ff9c2a4 mptcp: consolidate in_opt sub-options fields in a bitmask
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2028420
Tested: LNST, Tier1

Upstream commit:
commit 74c7dfbee3e185b3c3a03f194e25689ed037fa3c
Author: Paolo Abeni <pabeni@redhat.com>
Date:   Thu Aug 26 17:44:52 2021 -0700

    mptcp: consolidate in_opt sub-options fields in a bitmask

    This makes input options processing more consistent with
    output ones and will simplify the next patch.

    Also avoid clearing the suboption field after processing
    it, since it's not needed.

    Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-01-11 11:06:55 +01:00
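The consolidation described in the commit above can be sketched as follows. This is an illustration under assumptions, not the kernel structure: the struct names, field names and flag values are hypothetical. It contrasts one single-bit field per suboption with a single bitmask that can be tested against several suboptions at once.

```c
#include <assert.h>
#include <stdint.h>

/* Sketch of the "before" shape: one single-bit field per suboption. */
struct opts_bitfields {
	unsigned int mp_capable : 1;
	unsigned int mp_join    : 1;
	unsigned int dss        : 1;
};

/* Sketch of the "after" shape: one bitmask holding all suboptions. */
#define SK_OPT_MPC (1u << 0)
#define SK_OPT_MPJ (1u << 1)
#define SK_OPT_DSS (1u << 2)

struct opts_mask {
	uint16_t suboptions;
};

/* With bit-fields, testing a group of suboptions needs one term per field. */
static int handshake_seen_bitfields(const struct opts_bitfields *o)
{
	return o->mp_capable || o->mp_join;
}

/* With a bitmask, the same test is a single AND against a combined mask. */
static int handshake_seen_mask(const struct opts_mask *o)
{
	return (o->suboptions & (SK_OPT_MPC | SK_OPT_MPJ)) != 0;
}

/* Self-check: both representations agree on a simple input. */
static void check_opts_shapes(void)
{
	struct opts_bitfields a = { .mp_join = 1 };
	struct opts_mask b = { .suboptions = SK_OPT_MPJ };

	assert(handshake_seen_bitfields(&a) == handshake_seen_mask(&b));
}
```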
Paolo Abeni 94b08087b8 mptcp: better binary layout for mptcp_options_received
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2028420
Tested: LNST, Tier1

Upstream commit:
commit a086aebae0ebe37e93ed8f6e686ca0d5c4375b44
Author: Paolo Abeni <pabeni@redhat.com>
Date:   Thu Aug 26 17:44:51 2021 -0700

    mptcp: better binary layout for mptcp_options_received

    This change reorders the mptcp_options_received fields
    to shrink the structure a bit and to ensure the most
    frequently used fields are all in the first cacheline.

    Sub-opt specific flags are moved out of the suboptions area,
    and we must now explicitly set them when the relevant
    suboption is parsed.

    There is a notable exception: 'csum_reqd' is used by both DSS
    and MPC suboptions, and keeping such field in the suboptions
    flag area will simplify the next patch.

    Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-01-11 11:06:55 +01:00
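To illustrate the kind of saving the layout commit above refers to, here is a small sketch with two hypothetical structs holding the same members in a different order. The member names are assumptions and do not reproduce mptcp_options_received; only the general effect of ordering on padding is shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Small flags interleaved with wide fields: padding after each flag. */
struct layout_before {
	uint8_t  flag_a;
	uint64_t seq;
	uint8_t  flag_b;
	uint32_t token;
	uint8_t  flag_c;
};

/* Widest fields first, flags grouped at the end: less padding. */
struct layout_after {
	uint64_t seq;
	uint32_t token;
	uint8_t  flag_a;
	uint8_t  flag_b;
	uint8_t  flag_c;
};

/* Self-check: reordering never grows the struct, and the hot field leads. */
static void check_layout(void)
{
	assert(sizeof(struct layout_after) <= sizeof(struct layout_before));
	assert(offsetof(struct layout_after, seq) == 0);
}
```

On a typical x86-64 ABI the first struct occupies 32 bytes and the second 16, though exact sizes depend on the target's alignment rules. In the kernel tree, a tool such as pahole is commonly used to inspect real layouts.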
Paolo Abeni 2939661e97 mptcp: send out MP_FAIL when data checksum fails
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2028420
Tested: LNST, Tier1

Upstream commit:
commit 478d770008b03ed9d74bdc8add2315b7fd124ecc
Author: Geliang Tang <geliangtang@xiaomi.com>
Date:   Tue Aug 24 16:26:17 2021 -0700

    mptcp: send out MP_FAIL when data checksum fails

    When a bad checksum is detected, set the send_mp_fail flag to send out
    the MP_FAIL option.

    Add a new function mptcp_has_another_subflow() to check whether there's
    only a single subflow.

    When multiple subflows are in use, close the affected subflow with a RST
    that includes an MP_FAIL option and discard the data with the bad
    checksum.

    Set the sk_state of the subsocket to TCP_CLOSE, then the flag
    MPTCP_WORK_CLOSE_SUBFLOW will be set in subflow_sched_work_if_closed,
    and the subflow will be closed.

    When a single subflow is in use, this is temporarily handled by sending
    MP_FAIL with a RST too.

    Signed-off-by: Geliang Tang <geliangtang@xiaomi.com>
    Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-01-11 11:06:55 +01:00
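The checksum-failure handling described in the commit above can be sketched as a small decision function. This is a simplified model under assumptions: the struct and function names are hypothetical, and socket state, work scheduling and the real mptcp_has_another_subflow() helper are replaced by plain booleans.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified view of a subflow for this sketch only. */
struct subflow_sketch {
	bool send_mp_fail;   /* MP_FAIL should be emitted */
	bool reset_sent;     /* a RST carrying MP_FAIL goes out */
	bool closed;         /* subflow is closed */
	bool data_discarded; /* data with the bad checksum is dropped */
};

/* Model of the reaction to a bad data checksum. */
static void handle_bad_checksum(struct subflow_sketch *sf,
				bool has_another_subflow)
{
	/* Both cases described in the commit send MP_FAIL with a RST. */
	sf->send_mp_fail = true;
	sf->reset_sent = true;

	if (has_another_subflow) {
		/* Multiple subflows: close this one and discard the bad data. */
		sf->closed = true;
		sf->data_discarded = true;
	}
	/* Single subflow: the commit only states MP_FAIL plus RST, so the
	 * remaining fields are deliberately left untouched in this sketch. */
}

/* Self-check for the multi-subflow path. */
static void check_bad_checksum(void)
{
	struct subflow_sketch sf = { 0 };

	handle_bad_checksum(&sf, true);
	assert(sf.send_mp_fail && sf.reset_sent && sf.closed);
}
```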
Paolo Abeni af8a848af3 mptcp: MP_FAIL suboption receiving
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2028420
Tested: LNST, Tier1

Upstream commit:
commit 5580d41b758af12134d5c6b4c385fc25d0c6bfb0
Author: Geliang Tang <geliangtang@xiaomi.com>
Date:   Tue Aug 24 16:26:16 2021 -0700

    mptcp: MP_FAIL suboption receiving

    This patch added handling for receiving MP_FAIL suboption.

    Add new members mp_fail and fail_seq in struct mptcp_options_received.
    When MP_FAIL suboption is received, set mp_fail to 1 and save the sequence
    number to fail_seq.

    Then invoke mptcp_pm_mp_fail_received to deal with the MP_FAIL suboption.

    Signed-off-by: Geliang Tang <geliangtang@xiaomi.com>
    Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-01-11 11:06:54 +01:00
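A rough sketch of the receive side described in the commit above: parse an MP_FAIL suboption, set mp_fail and store the sequence number in fail_seq. The wire layout assumed here follows RFC 8684 (TCP option kind 30, length 12, subtype 0x6, then a 64-bit data sequence number). The struct and function names are hypothetical, and the kernel's parser is organized differently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TCPOPT_MPTCP_KIND 30   /* TCP option kind assigned to MPTCP */
#define MP_FAIL_SUBTYPE   0x6  /* MP_FAIL subtype per RFC 8684 */
#define MP_FAIL_LEN       12   /* kind + len + subtype/reserved + 8-byte DSN */

/* Hypothetical subset of the received-options state. */
struct rx_opts_sketch {
	unsigned int mp_fail : 1;
	uint64_t fail_seq;
};

/* Parse one option; return 0 on success, -1 if it is not a valid MP_FAIL. */
static int parse_mp_fail(const uint8_t *opt, size_t len,
			 struct rx_opts_sketch *out)
{
	uint64_t seq = 0;
	size_t i;

	if (len < MP_FAIL_LEN || opt[0] != TCPOPT_MPTCP_KIND ||
	    opt[1] != MP_FAIL_LEN)
		return -1;
	if ((opt[2] >> 4) != MP_FAIL_SUBTYPE)
		return -1;

	/* The data sequence number is carried in network byte order. */
	for (i = 0; i < 8; i++)
		seq = (seq << 8) | opt[4 + i];

	out->mp_fail = 1;
	out->fail_seq = seq;
	return 0;
}

/* Self-check: a well-formed option is accepted and decoded. */
static void check_parse_mp_fail(void)
{
	const uint8_t opt[MP_FAIL_LEN] = { 30, 12, 0x60, 0,
					   0, 0, 0, 0, 0, 0, 0x01, 0x02 };
	struct rx_opts_sketch rx = { 0 };

	assert(parse_mp_fail(opt, sizeof(opt), &rx) == 0);
	assert(rx.mp_fail == 1 && rx.fail_seq == 0x0102);
}
```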
Paolo Abeni 5f671fad2f mptcp: MP_FAIL suboption sending
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2028420
Tested: LNST, Tier1

Upstream commit:
commit c25aeb4e095355eec3beb6a2b2b30322bd6d0dd4
Author: Geliang Tang <geliangtang@xiaomi.com>
Date:   Tue Aug 24 16:26:15 2021 -0700

    mptcp: MP_FAIL suboption sending

    This patch added the MP_FAIL suboption sending support.

    Add a new flag named send_mp_fail in struct mptcp_subflow_context. If
    this flag is set, send out MP_FAIL suboption.

    Add a new member fail_seq in struct mptcp_out_options to save the data
    sequence number to put into the MP_FAIL suboption.

    An MP_FAIL option could be included in a RST or on the subflow-level
    ACK.

    Suggested-by: Paolo Abeni <pabeni@redhat.com>
    Signed-off-by: Geliang Tang <geliangtang@xiaomi.com>
    Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-01-11 11:06:54 +01:00
Paolo Abeni 550e689522 mptcp: optimize out option generation
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2028420
Tested: LNST, Tier1

Upstream commit:
commit 1bff1e43a30e2f7500a49d47fd26a425643a6a37
Author: Paolo Abeni <pabeni@redhat.com>
Date:   Tue Aug 24 16:26:13 2021 -0700

    mptcp: optimize out option generation

    Currently we have several protocol constraints on MPTCP options
    generation (e.g. MPC and MPJ subopt are mutually exclusive)
    and some additional ones required by our implementation
    (e.g. almost all ADD_ADDR variants are mutually exclusive with
    everything else).

    We can leverage the above to optimize the out option generation:
    we check DSS/MPC/MPJ presence in a mutually exclusive way,
    avoiding many unneeded conditionals in the common cases.

    Additionally extend the existing constraints on ADD_ADDR opt on
    all subvariants, so that it becomes fully mutually exclusive with
    the above and we can skip another conditional statement for the
    common case.

    This change is also needed by the next patch.

    Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-01-11 11:06:54 +01:00
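The optimization described in the commit above relies on some suboptions never appearing together. A minimal sketch of that idea, assuming hypothetical flag values and an assumed ordering that checks the common DSS case first:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flag values for this sketch only. */
#define GEN_OPT_DSS      (1u << 0)
#define GEN_OPT_MPC      (1u << 1)
#define GEN_OPT_MPJ      (1u << 2)
#define GEN_OPT_ADD_ADDR (1u << 3)

enum gen_written { GEN_NONE, GEN_DSS, GEN_MPC, GEN_MPJ, GEN_ADD_ADDR };

/*
 * Because the four suboptions are treated as mutually exclusive, an
 * else-if chain stops at the first match instead of evaluating one
 * independent conditional per suboption.
 */
static enum gen_written write_exclusive_option(uint16_t suboptions)
{
	if (suboptions & GEN_OPT_DSS)
		return GEN_DSS;       /* common data-transfer case first */
	else if (suboptions & GEN_OPT_MPC)
		return GEN_MPC;
	else if (suboptions & GEN_OPT_MPJ)
		return GEN_MPJ;
	else if (suboptions & GEN_OPT_ADD_ADDR)
		return GEN_ADD_ADDR;
	return GEN_NONE;
}

/* Self-check: the common case is resolved by the first branch. */
static void check_exclusive_option(void)
{
	assert(write_exclusive_option(GEN_OPT_DSS) == GEN_DSS);
	assert(write_exclusive_option(0) == GEN_NONE);
}
```

Suboptions outside this exclusive group are not modeled here.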
Paolo Abeni 70877e6f2d mptcp: remove MPTCP_ADD_ADDR_IPV6 and MPTCP_ADD_ADDR_PORT
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2028420
Tested: LNST, Tier1

Upstream commit:
commit c233ef13907038239303a73ca0565bcc3f3373bc
Author: Yonglong Li <liyonglong@chinatelecom.cn>
Date:   Mon Aug 23 18:05:43 2021 -0700

    mptcp: remove MPTCP_ADD_ADDR_IPV6 and MPTCP_ADD_ADDR_PORT

    MPTCP_ADD_ADDR_IPV6 and MPTCP_ADD_ADDR_PORT are not necessary; we can get
    this info from pm.local or pm.remote.

    Drop mptcp_pm_should_add_signal_ipv6 and mptcp_pm_should_add_signal_port
    too.

    Co-developed-by: Geliang Tang <geliangtang@gmail.com>
    Signed-off-by: Geliang Tang <geliangtang@gmail.com>
    Signed-off-by: Yonglong Li <liyonglong@chinatelecom.cn>
    Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-01-11 11:06:54 +01:00
Paolo Abeni b438d0d51e mptcp: build ADD_ADDR/echo-ADD_ADDR option according pm.add_signal
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2028420
Tested: LNST, Tier1

Upstream commit:
commit f462a446384d0c00c6e457f7e8eb2053b095a2f1
Author: Yonglong Li <liyonglong@chinatelecom.cn>
Date:   Mon Aug 23 18:05:42 2021 -0700

    mptcp: build ADD_ADDR/echo-ADD_ADDR option according pm.add_signal

    According to the MPTCP_ADD_ADDR_SIGNAL or MPTCP_ADD_ADDR_ECHO flag, build
    the ADD_ADDR/ADD_ADDR_ECHO option.

    In mptcp_pm_add_addr_signal(), use opts->addr to save the announced
    ADD_ADDR or ADD_ADDR_ECHO address.

    Co-developed-by: Geliang Tang <geliangtang@gmail.com>
    Signed-off-by: Geliang Tang <geliangtang@gmail.com>
    Co-developed-by: Paolo Abeni <pabeni@redhat.com>
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    Signed-off-by: Yonglong Li <liyonglong@chinatelecom.cn>
    Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-01-11 11:06:53 +01:00
Paolo Abeni 814472654a mptcp: make MPTCP_ADD_ADDR_SIGNAL and MPTCP_ADD_ADDR_ECHO separate
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2028420
Tested: LNST, Tier1

Upstream commit:
commit 18fc1a922e2416998c5d37c26c69aab940c07ffb
Author: Yonglong Li <liyonglong@chinatelecom.cn>
Date:   Mon Aug 23 18:05:40 2021 -0700

    mptcp: make MPTCP_ADD_ADDR_SIGNAL and MPTCP_ADD_ADDR_ECHO separate

    Use MPTCP_ADD_ADDR_SIGNAL only for the action of sending ADD_ADDR, and
    use MPTCP_ADD_ADDR_ECHO only for the action of sending ADD_ADDR echo.

    Use msk->pm.local to save the announced ADD_ADDR address only, and reuse
    msk->pm.remote to save the announced ADD_ADDR_ECHO address.

    To prepare for the next patch.

    Co-developed-by: Geliang Tang <geliangtang@gmail.com>
    Signed-off-by: Geliang Tang <geliangtang@gmail.com>
    Signed-off-by: Yonglong Li <liyonglong@chinatelecom.cn>
    Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-01-11 11:06:53 +01:00
Paolo Abeni dece3b886e mptcp: move drop_other_suboptions check under pm lock
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2028420
Tested: LNST, Tier1

Upstream commit:
commit 1f5e9e2f5fd55fbf9b58ae6fefb021ad1c91b66a
Author: Yonglong Li <liyonglong@chinatelecom.cn>
Date:   Mon Aug 23 18:05:39 2021 -0700

    mptcp: move drop_other_suboptions check under pm lock

    This patch moved the drop_other_suboptions check from
    mptcp_established_options_add_addr() into mptcp_pm_add_addr_signal(),
    doing it under the PM lock to avoid the race between this check and
    mptcp_pm_add_addr_signal().

    For this, added a new parameter for mptcp_pm_add_addr_signal() to get
    the drop_other_suboptions value. And drop the other suboptions after the
    option length check if drop_other_suboptions is true.

    Additionally, always drop the other suboption for TCP pure ack:
    that makes both the code simpler and the MPTCP behaviour more
    consistent.

    Co-developed-by: Geliang Tang <geliangtang@gmail.com>
    Signed-off-by: Geliang Tang <geliangtang@gmail.com>
    Co-developed-by: Paolo Abeni <pabeni@redhat.com>
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    Signed-off-by: Yonglong Li <liyonglong@chinatelecom.cn>
    Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-01-11 11:06:53 +01:00