Commit Graph

150 Commits

Author SHA1 Message Date
Rado Vrbovsky 05df4237af Merge: USB/TBT code rebase of supported drivers to upstream v6.11
MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/5592

JIRA: https://issues.redhat.com/browse/RHEL-59051

CVE: CVE-2024-44960
CVE JIRA: https://issues.redhat.com/browse/RHEL-57138

CVE: CVE-2024-46675
CVE JIRA: https://issues.redhat.com/browse/RHEL-64322

This MR rebases supported USB/TBT drivers to upstream kernel v6.11. By
design, changes in this rebase are limited to supported USB/Thunderbolt
drivers and infrastructure. Tree-wide changes that happen to touch these
drivers are pulled in selectively or partially, where relevant.

Notes:

I) Omits:

Omitted-fix: aefa036be8c2 ("phy: freescale: imx8qm-hsio: Include bitfield.h for FIELD_PREP")
Omitted-fix: 2d6213bd592b ("crypto: spacc - Add ifndef around MIN")
Omitted-fix: b8fc70ab7b5f ("Revert "crypto: spacc - Add SPAcc Skcipher support")
Omitted-fix: bf791751162a ("thunderbolt: Add only on-board retimers when !CONFIG_USB4_DEBUGFS_MARGINING")

II) This MR drops the `rtsx_pci_ms` driver because it became dead code with
commit <c0e5f4e73a71> ("misc: rtsx: Add support for RTS5261") and was
consequently removed later in commit <d0f459259c13> ("memstick:
rtsx_pci_ms: Remove Realtek PCI memstick driver"). The latter commit is
merged here.

III) This MR also includes minmax updates to fix these build and test errors:

1 - Signedness error:

```
drivers/usb/typec/ucsi/ucsi.c: In function 'ucsi_get_pd_message':
./include/linux/build_bug.h:78:41: error: static assertion failed: "min(bytes, (((con->ucsi)->version < 0x0200) ? 0x10 : 0xff)) signedness error, fix types or consider umin() before min_t()"
   78 | #define __static_assert(expr, msg, ...) _Static_assert(expr, msg)
```
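The signedness check exists because C's usual arithmetic conversions turn a mixed `int`/`unsigned int` comparison into an unsigned one. The userspace sketch below uses hypothetical helper names (not kernel API) to show how a naive ternary min() returns the wrong operand for a negative value, and how comparing in one explicitly chosen type, as `min_t()` does, avoids it.

```c
#include <assert.h>

/* Naive min over mixed signedness. The casts spell out what the compiler
 * does implicitly: 'a' becomes unsigned, so -1 turns into UINT_MAX and
 * "loses" to any small unsigned 'b'. */
static unsigned int naive_min(int a, unsigned int b)
{
	return (unsigned int)a < b ? (unsigned int)a : b;
}

/* min_t(int, a, b)-style: convert both operands to one explicit type
 * before comparing (assumes 'b' fits in an int). */
static int min_as_int(int a, unsigned int b)
{
	return a < (int)b ? a : (int)b;
}
```

The error text itself points at `umin()` or `min_t()` as the intended fixes; which one is appropriate depends on whether both operands are known to be non-negative.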

2 - ISO C90 error:

```
drivers/scsi/Makefile:196: FORCE prerequisite is missing
lib/vsprintf.c: In function 'resource_string':
lib/vsprintf.c:1068:9: error: ISO C90 forbids variable length array 'sym' [-Werror=vla]
 1068 |         char sym[max(2*RSRC_BUF_SIZE + DECODED_BUF_SIZE,
      |         ^~~~
```

3 - Oops on drm_gem_shmem CKI testing:

```
Unable to handle kernel paging request at virtual address ffffffff80000000
...
Internal error: Oops: 0000000096000146 [#1] SMP
...
drm_gem_shmem_test_obj_create_private+0x1cc/0x41c [drm_gem_shmem_test]
...
# drm_gem_shmem_test_obj_create_private: try faulted: last line seen drivers/gpu/drm/tests/drm_gem_shmem_test.c:120
# drm_gem_shmem_test_obj_create_private: internal error occurred preventing test case from running: -4
```

Signed-off-by: Desnes Nunes <desnesn@redhat.com>

Approved-by: José Ignacio Tornos Martínez <jtornosm@redhat.com>
Approved-by: Bastien Nocera <bnocera@redhat.com>
Approved-by: Tony Camuso <tcamuso@redhat.com>
Approved-by: Rafael Aquini <raquini@redhat.com>
Approved-by: Chris von Recklinghausen <crecklin@redhat.com>
Approved-by: Ivan Vecera <ivecera@redhat.com>
Approved-by: David Arcari <darcari@redhat.com>
Approved-by: Eric Chanudet <echanude@redhat.com>
Approved-by: Adam Jackson <ajax@redhat.com>
Approved-by: CKI KWF Bot <cki-ci-bot+kwf-gitlab-com@redhat.com>

Merged-by: Rado Vrbovsky <rvrbovsk@redhat.com>
2024-11-25 13:17:44 +00:00
Desnes Nunes 0e26cc08ef minmax: add a few more MIN_T/MAX_T users
JIRA: https://issues.redhat.com/browse/RHEL-59051

commit 4477b39c32fdc03363affef4b11d48391e6dc9ff
Author: Linus Torvalds <torvalds@linux-foundation.org>
Date: Sun, 28 Jul 2024 13:03:48 -0700

  Commit 3a7e02c040b1 ("minmax: avoid overly complicated constant
  expressions in VM code") added the simpler MIN_T/MAX_T macros in order
  to avoid some excessive expansion from the rather complicated regular
  min/max macros.

  The complexity of those macros stems from two issues:

   (a) trying to use them in situations that require a C constant
       expression (in static initializers and for array sizes)

   (b) the type sanity checking

  and MIN_T/MAX_T avoids both of these issues.

  Now, in the whole (long) discussion about all this, it was pointed out
  that the whole type sanity checking is entirely unnecessary for
  min_t/max_t which get a fixed type that the comparison is done in.

  But that still leaves min_t/max_t unnecessarily complicated due to
  worries about the C constant expression case.

  However, it turns out that there really aren't very many cases that use
  min_t/max_t for this, and we can just force-convert those.

  This does exactly that.

  Which in turn will then allow for much simpler implementations of
  min_t()/max_t().  All the usual "macros in all upper case will evaluate
  the arguments multiple times" rules apply.

  We should do all the same things for the regular min/max() vs MIN/MAX()
  cases, but that has the added complexity of various drivers defining
  their own local versions of MIN/MAX, so that needs another level of
  fixes first.
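A minimal sketch of the trade-off described above, assuming simplified upper-case macros in the spirit of MIN_T/MAX_T (these are illustrative definitions, not the ones in include/linux/minmax.h, and BUF_A/BUF_B are hypothetical sizes). Without type checking, the macros stay plain C constant expressions, so they can size a file-scope array; the cost is that arguments may be evaluated more than once.

```c
#include <assert.h>

/* Simplified, type-unchecked helpers: with constant arguments they expand
 * to integer constant expressions. */
#define MIN_T(type, a, b) ((type)(a) < (type)(b) ? (type)(a) : (type)(b))
#define MAX_T(type, a, b) ((type)(a) > (type)(b) ? (type)(a) : (type)(b))

/* Hypothetical buffer sizes, for illustration only. */
#define BUF_A 24
#define BUF_B 40

/* A file-scope array needs a constant size, so this only compiles because
 * MAX_T(...) is a constant expression (no variable length array). */
static char scratch[MAX_T(int, BUF_A, BUF_B)];

static int calls;
static int next_value(void) { return ++calls; }

/* Upper-case macro rules apply: the selected argument is evaluated again
 * when the result is produced, so next_value() runs twice here. */
static int max_of_call_and_zero(void)
{
	return MAX_T(int, next_value(), 0);
}
```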

  Link: https://lore.kernel.org/all/b47fad1d0cf8449886ad148f8c013dae@AcuMS.aculab.com/
  Cc: David Laight <David.Laight@aculab.com>
  Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
  Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Signed-off-by: Desnes Nunes <desnesn@redhat.com>
2024-11-18 10:30:14 -03:00
Antoine Tenart cff87c7d6f inet: annotate devconf data-races
JIRA: https://issues.redhat.com/browse/RHEL-62202
Upstream Status: linux.git

commit 0598f8f3bb77893a13105d47bb7dfe42f1dc1f4e
Author: Eric Dumazet <edumazet@google.com>
Date:   Tue Feb 27 09:24:09 2024 +0000

    inet: annotate devconf data-races

    Add READ_ONCE() in ipv4_devconf_get() and corresponding
    WRITE_ONCE() in ipv4_devconf_set()

    Add IPV4_DEVCONF_RO() and IPV4_DEVCONF_ALL_RO() macros,
    and use them when reading devconf fields.
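READ_ONCE()/WRITE_ONCE() are kernel macros, so the sketch below uses C11 relaxed atomics as a userspace analogue of the same idea: every access to a field that may change concurrently is a single, untorn load or store. The struct and function names are hypothetical; the kernel's ipv4_devconf keeps plain ints and relies on the macros instead.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical per-device config table. */
#define DEVCONF_MAX 8

struct devconf_sketch {
	_Atomic int data[DEVCONF_MAX];
};

/* Writer side, analogous to WRITE_ONCE(): one untorn store. */
static void devconf_set(struct devconf_sketch *cnf, int idx, int val)
{
	atomic_store_explicit(&cnf->data[idx], val, memory_order_relaxed);
}

/* Reader side, analogous to READ_ONCE(): one untorn load, which is not a
 * data race under the C11 memory model even if a writer runs concurrently. */
static int devconf_get(struct devconf_sketch *cnf, int idx)
{
	return atomic_load_explicit(&cnf->data[idx], memory_order_relaxed);
}
```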

    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Reviewed-by: Jiri Pirko <jiri@nvidia.com>
    Link: https://lore.kernel.org/r/20240227092411.2315725-2-edumazet@google.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Antoine Tenart <atenart@redhat.com>
2024-11-14 10:16:47 +01:00
Antoine Tenart 791e96333e net: fix IPSTATS_MIB_OUTPKGS increment in OutForwDatagrams.
JIRA: https://issues.redhat.com/browse/RHEL-17413
Upstream Status: linux.git
Conflicts:
- Context diff due to missing upstream commit 09eed1192cec ("neighbour:
  switch to standard rcu, instead of rcu_bh") in c9s.
- Context diff due to missing upstream commit cd3c74807736 ("ipv6:
  optimise dst refcounting on skb init") in c9s.

commit b4a11b2033b7d3dfdd46592f7036a775b18cecd1
Author: Heng Guo <heng.guo@windriver.com>
Date:   Thu Oct 19 09:20:53 2023 +0800

    net: fix IPSTATS_MIB_OUTPKGS increment in OutForwDatagrams.

    Reproduce environment:
    A network of three Linux VMs is connected as below:
    VM1<---->VM2(latest kernel 6.5.0-rc7)<---->VM3
    VM1: eth0 ip: 192.168.122.207 MTU 1500
    VM2: eth0 ip: 192.168.122.208, eth1 ip: 192.168.123.224 MTU 1500
    VM3: eth0 ip: 192.168.123.240 MTU 1500

    Reproduce:
    VM1 sends 1400 bytes of UDP data to VM3 using scapy with flags=0.
    scapy command:
    send(IP(dst="192.168.123.240",flags=0)/UDP()/str('0'*1400),count=1,
    inter=1.000000)

    Result:
    Before IP data is sent.
    ----------------------------------------------------------------------
    root@qemux86-64:~# cat /proc/net/snmp
    Ip: Forwarding DefaultTTL InReceives InHdrErrors InAddrErrors
      ForwDatagrams InUnknownProtos InDiscards InDelivers OutRequests
      OutDiscards OutNoRoutes ReasmTimeout ReasmReqds ReasmOKs ReasmFails
      FragOKs FragFails FragCreates
    Ip: 1 64 11 0 3 4 0 0 4 7 0 0 0 0 0 0 0 0 0
    ......
    ----------------------------------------------------------------------
    After IP data is sent.
    ----------------------------------------------------------------------
    root@qemux86-64:~# cat /proc/net/snmp
    Ip: Forwarding DefaultTTL InReceives InHdrErrors InAddrErrors
      ForwDatagrams InUnknownProtos InDiscards InDelivers OutRequests
      OutDiscards OutNoRoutes ReasmTimeout ReasmReqds ReasmOKs ReasmFails
      FragOKs FragFails FragCreates
    Ip: 1 64 12 0 3 5 0 0 4 8 0 0 0 0 0 0 0 0 0
    ......
    ----------------------------------------------------------------------
    "ForwDatagrams" increases from 4 to 5 and "OutRequests" also increases
    from 7 to 8.

    Issue description and patch:
    IPSTATS_MIB_OUTPKTS ("OutRequests") is counted together with
    IPSTATS_MIB_OUTOCTETS ("OutOctets") in ip_finish_output2().
    According to RFC 4293, "OutOctets" is counted together with
    "OutTransmits", not "OutRequests", and "OutRequests" does not include
    any datagrams counted in "ForwDatagrams".
    ipSystemStatsOutOctets OBJECT-TYPE
        DESCRIPTION
               "The total number of octets in IP datagrams delivered to the
                lower layers for transmission.  Octets from datagrams
                counted in ipIfStatsOutTransmits MUST be counted here.
    ipSystemStatsOutRequests OBJECT-TYPE
        DESCRIPTION
               "The total number of IP datagrams that local IP user-
                protocols (including ICMP) supplied to IP in requests for
                transmission.  Note that this counter does not include any
                datagrams counted in ipSystemStatsOutForwDatagrams.
    This patch therefore defines IPSTATS_MIB_OUTPKTS as "OutTransmits" and
    adds IPSTATS_MIB_OUTREQUESTS for "OutRequests". It adds the
    IPSTATS_MIB_OUTREQUESTS counter in __ip_local_out() for IPv4 and the
    IPSTATS_MIB_OUT counter in ip6_finish_output2() for IPv6.
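The counting rules above can be modelled in a few lines. This is a toy model with hypothetical struct and function names, not kernel code: locally generated datagrams count as OutRequests, forwarded ones as ForwDatagrams, and both share a transmit step that counts OutTransmits together with OutOctets.

```c
#include <assert.h>

/* Toy model of the counters discussed above (not kernel code). */
struct ipstats_sketch {
	unsigned long out_requests;   /* handed to IP by local protocols */
	unsigned long forw_datagrams; /* forwarded on behalf of other hosts */
	unsigned long out_transmits;  /* passed to the lower layer */
	unsigned long out_octets;     /* octets of those transmitted datagrams */
};

/* Shared transmit step: OutTransmits is counted together with OutOctets. */
static void finish_output(struct ipstats_sketch *s, unsigned int len)
{
	s->out_transmits++;
	s->out_octets += len;
}

/* Locally generated traffic is an OutRequest, then gets transmitted. */
static void local_out(struct ipstats_sketch *s, unsigned int len)
{
	s->out_requests++;
	finish_output(s, len);
}

/* Forwarded traffic counts as ForwDatagrams, never as OutRequests. */
static void forward(struct ipstats_sketch *s, unsigned int len)
{
	s->forw_datagrams++;
	finish_output(s, len);
}
```

Starting from the "before" values in the test result below and forwarding one 1428-byte datagram (1400 bytes of payload plus 8-byte UDP and 20-byte IPv4 headers) reproduces the "after" values: ForwDatagrams 1 to 2, OutRequests unchanged at 3, OutTransmits 4 to 5, and OutOctets 1896 to 3324.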

    Test result with patch:
    Before IP data is sent.
    ----------------------------------------------------------------------
    root@qemux86-64:~# cat /proc/net/snmp
    Ip: Forwarding DefaultTTL InReceives InHdrErrors InAddrErrors
      ForwDatagrams InUnknownProtos InDiscards InDelivers OutRequests
      OutDiscards OutNoRoutes ReasmTimeout ReasmReqds ReasmOKs ReasmFails
      FragOKs FragFails FragCreates OutTransmits
    Ip: 1 64 9 0 5 1 0 0 3 3 0 0 0 0 0 0 0 0 0 4
    ......
    root@qemux86-64:~# cat /proc/net/netstat
    ......
    IpExt: InNoRoutes InTruncatedPkts InMcastPkts OutMcastPkts InBcastPkts
      OutBcastPkts InOctets OutOctets InMcastOctets OutMcastOctets
      InBcastOctets OutBcastOctets InCsumErrors InNoECTPkts InECT1Pkts
      InECT0Pkts InCEPkts ReasmOverlaps
    IpExt: 0 0 0 0 0 0 2976 1896 0 0 0 0 0 9 0 0 0 0
    ----------------------------------------------------------------------
    After IP data is sent.
    ----------------------------------------------------------------------
    root@qemux86-64:~# cat /proc/net/snmp
    Ip: Forwarding DefaultTTL InReceives InHdrErrors InAddrErrors
      ForwDatagrams InUnknownProtos InDiscards InDelivers OutRequests
      OutDiscards OutNoRoutes ReasmTimeout ReasmReqds ReasmOKs ReasmFails
      FragOKs FragFails FragCreates OutTransmits
    Ip: 1 64 10 0 5 2 0 0 3 3 0 0 0 0 0 0 0 0 0 5
    ......
    root@qemux86-64:~# cat /proc/net/netstat
    ......
    IpExt: InNoRoutes InTruncatedPkts InMcastPkts OutMcastPkts InBcastPkts
      OutBcastPkts InOctets OutOctets InMcastOctets OutMcastOctets
      InBcastOctets OutBcastOctets InCsumErrors InNoECTPkts InECT1Pkts
      InECT0Pkts InCEPkts ReasmOverlaps
    IpExt: 0 0 0 0 0 0 4404 3324 0 0 0 0 0 10 0 0 0 0
    ----------------------------------------------------------------------
    "ForwDatagrams" increases from 1 to 2 and "OutRequests" stays at 3.
    "OutTransmits" increases from 4 to 5 and "OutOctets" increases by 1428.

    Signed-off-by: Heng Guo <heng.guo@windriver.com>
    Reviewed-by: Kun Song <Kun.Song@windriver.com>
    Reviewed-by: Filip Pudak <filip.pudak@windriver.com>
    Reviewed-by: David Ahern <dsahern@kernel.org>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Antoine Tenart <atenart@redhat.com>
2023-12-11 11:15:48 +01:00
Jamie Bainbridge b615d9f4e7 icmp: Add counters for rate limits
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2155801
Upstream Status: https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git

commit d0941130c93515411c8d66fc22bdae407b509a6d
Author: Jamie Bainbridge <jamie.bainbridge@gmail.com>
Date:   Wed Jan 25 11:16:52 2023 +1100

    icmp: Add counters for rate limits

    There are multiple ICMP rate limiting mechanisms:

    * Global limits: net.ipv4.icmp_msgs_burst/icmp_msgs_per_sec
    * v4 per-host limits: net.ipv4.icmp_ratelimit/ratemask
    * v6 per-host limits: net.ipv6.icmp_ratelimit/ratemask

    However, when ICMP output is limited, there is no way to tell
    which limit has been hit or even if the limits are responsible
    for the lack of ICMP output.

    Add counters for each of the cases above. As we are within
    local_bh_disable(), use the __INC stats variant.

    Example output:

     # nstat -sz "*RateLimit*"
     IcmpOutRateLimitGlobal          134                0.0
     IcmpOutRateLimitHost            770                0.0
     Icmp6OutRateLimitHost           84                 0.0
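To show where such a counter sits, here is a minimal sketch of a global limiter written as a generic token bucket (refill rate msgs_per_sec, capacity burst) that bumps a counter whenever it refuses to send. The names are hypothetical and this is not the kernel's icmp_global_allow() implementation; only the idea of counting each refusal matches the text above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical global ICMP limiter (generic token bucket). */
struct icmp_global_sketch {
	unsigned int msgs_per_sec;      /* refill rate */
	unsigned int burst;             /* bucket capacity */
	unsigned int credit;            /* tokens currently available */
	unsigned long last_ms;          /* time of last refill, in ms */
	unsigned long ratelimit_global; /* analogous to IcmpOutRateLimitGlobal */
};

static bool icmp_global_allow_sketch(struct icmp_global_sketch *g,
				     unsigned long now_ms)
{
	unsigned long refill = (now_ms - g->last_ms) * g->msgs_per_sec / 1000;

	/* Only advance last_ms when at least one token was earned, so
	 * partial intervals keep accumulating. */
	if (refill > 0) {
		unsigned long credit = g->credit + refill;

		g->credit = credit > g->burst ? g->burst : (unsigned int)credit;
		g->last_ms = now_ms;
	}
	if (g->credit > 0) {
		g->credit--;
		return true;
	}
	g->ratelimit_global++; /* record that *this* limit suppressed output */
	return false;
}
```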

    Signed-off-by: Jamie Bainbridge <jamie.bainbridge@gmail.com>
    Suggested-by: Abhishek Rawal <rawal.abhishek92@gmail.com>
    Link: https://lore.kernel.org/r/273b32241e6b7fdc5c609e6f5ebc68caf3994342.1674605770.git.jamie.bainbridge@gmail.com
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Signed-off-by: Jamie Bainbridge <jbainbri@redhat.com>
2023-02-14 10:21:53 +10:00
Guillaume Nault 27ce10b0b3 ip: Fix data-races around sysctl_ip_default_ttl.
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2149949
Upstream Status: linux.git
Conflicts: The drivers/net/ethernet/netronome/nfp/flower/action.c chunk
           was already backported by Centos Stream commit ab569013af.

commit 8281b7ec5c56b71cb2cc5a1728b41607be66959c
Author: Kuniyuki Iwashima <kuniyu@amazon.com>
Date:   Wed Jul 13 13:51:51 2022 -0700

    ip: Fix data-races around sysctl_ip_default_ttl.

    While reading sysctl_ip_default_ttl, it can be changed concurrently.
    Thus, we need to add READ_ONCE() to its readers.

    Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
    Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Guillaume Nault <gnault@redhat.com>
2022-12-22 11:37:53 +01:00
Davide Caratti d2950bc221 tcp: switch orphan_count to bare per-cpu counters
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2137858
Upstream Status: net.git commit 19757cebf0c5

commit 19757cebf0c5016a1f36f7fe9810a9f0b33c0832
Author: Eric Dumazet <edumazet@google.com>
Date:   Thu Oct 14 06:41:26 2021 -0700

    tcp: switch orphan_count to bare per-cpu counters

    Use of percpu_counter structure to track count of orphaned
    sockets is causing problems on modern hosts with 256 cpus
    or more.

    Stefan Bach reported serious spinlock contention in real workloads,
    which I was able to reproduce with a netfilter rule dropping
    incoming FIN packets.

        53.56%  server  [kernel.kallsyms]      [k] queued_spin_lock_slowpath
                |
                ---queued_spin_lock_slowpath
                   |
                    --53.51%--_raw_spin_lock_irqsave
                              |
                               --53.51%--__percpu_counter_sum
                                         tcp_check_oom
                                         |
                                         |--39.03%--__tcp_close
                                         |          tcp_close
                                         |          inet_release
                                         |          inet6_release
                                         |          sock_close
                                         |          __fput
                                         |          ____fput
                                         |          task_work_run
                                         |          exit_to_usermode_loop
                                         |          do_syscall_64
                                         |          entry_SYSCALL_64_after_hwframe
                                         |          __GI___libc_close
                                         |
                                          --14.48%--tcp_out_of_resources
                                                    tcp_write_timeout
                                                    tcp_retransmit_timer
                                                    tcp_write_timer_handler
                                                    tcp_write_timer
                                                    call_timer_fn
                                                    expire_timers
                                                    __run_timers
                                                    run_timer_softirq
                                                    __softirqentry_text_start

    As explained in commit cf86a086a1 ("net/dst: use a smaller percpu_counter
    batch for dst entries accounting"), default batch size is too big
    for the default value of tcp_max_orphans (262144).

    But even if we reduce batch sizes, there would still be cases
    where the estimated count of orphans is beyond the limit,
    and where tcp_too_many_orphans() has to call the expensive
    percpu_counter_sum_positive().

    One solution is to use plain per-cpu counters, and have
    a timer to periodically refresh this cache.

    Updating this cache every 100ms seems about right; TCP pressure
    state does not change radically over shorter periods.

    percpu_counter was fine 15 years ago, when hosts had fewer
    than 16 CPUs, but not by current standards.
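The scheme above, plain per-CPU counters plus a periodically refreshed total, can be sketched as a single-threaded userspace model. All names are hypothetical; the kernel uses real per-cpu variables and a timer, and individual slots can go negative when a socket is orphaned on one CPU and destroyed on another.

```c
#include <assert.h>

#define NR_CPUS_SKETCH 4

/* Each CPU only touches its own slot, so updates need no shared lock. */
static int orphan_count[NR_CPUS_SKETCH];
static int orphan_cache; /* last computed total */

static void orphan_inc(int cpu) { orphan_count[cpu]++; }
static void orphan_dec(int cpu) { orphan_count[cpu]--; }

/* Run from a periodic timer (every 100ms per the text above): sum all
 * slots once, instead of on every limit check. */
static void orphan_refresh(void)
{
	int sum = 0;

	for (int cpu = 0; cpu < NR_CPUS_SKETCH; cpu++)
		sum += orphan_count[cpu];
	orphan_cache = sum > 0 ? sum : 0;
}

/* Cheap check against the cached total; may lag by up to one period. */
static int too_many_orphans(int max_orphans)
{
	return orphan_cache > max_orphans;
}
```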

    v2: Fix the build issue for CONFIG_CRYPTO_DEV_CHELSIO_TLS=m,
        reported by kernel test robot <lkp@intel.com>
        Remove unused socket argument from tcp_too_many_orphans()

    Fixes: dd24c00191 ("net: Use a percpu_counter for orphan_count")
    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Reported-by: Stefan Bach <sfb@google.com>
    Cc: Neal Cardwell <ncardwell@google.com>
    Acked-by: Neal Cardwell <ncardwell@google.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Davide Caratti <dcaratti@redhat.com>
2022-11-08 17:10:54 +01:00
Kuniyuki Iwashima 55d444b310 tcp: Add stats for socket migration.
This commit adds two stats for the socket migration feature to evaluate the
effectiveness: LINUX_MIB_TCPMIGRATEREQ(SUCCESS|FAILURE).

If the migration fails because of the own_req race between the ACK-receiving
and SYN+ACK-sending paths, we do not increment the failure stat, because
another CPU is then responsible for the req.

Link: https://lore.kernel.org/bpf/CAK6E8=cgFKuGecTzSCSQ8z3YJ_163C0uwO9yRvfDSE7vOe9mJA@mail.gmail.com/
Suggested-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp>
Acked-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-23 12:56:08 -07:00
Eric Dumazet 0d6cd689f9 net: proc: speedup /proc/net/netstat
Use cache friendly helpers to better use cpu caches
while reading /proc/net/netstat

Tested on a platform with 256 threads (AMD Rome)

Before: 305 usec spent in netstat_seq_show()
After: 130 usec spent in netstat_seq_show()
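One way to read "cache friendly helpers" is a change in loop order when folding per-CPU counters. The sketch below assumes that reading and uses a hypothetical 2-D array in place of real per-cpu areas. Both functions compute the same totals; the CPU-major version walks each CPU's counters sequentially instead of striding across CPUs for every field.

```c
#include <assert.h>

#define NCPU 4
#define NFIELDS 6

/* Hypothetical layout: one contiguous row of MIB counters per CPU. */
static unsigned long mib[NCPU][NFIELDS];

/* Field-major: for each field, visit every CPU's row (strided access). */
static void fold_field_major(unsigned long out[NFIELDS])
{
	for (int f = 0; f < NFIELDS; f++) {
		out[f] = 0;
		for (int c = 0; c < NCPU; c++)
			out[f] += mib[c][f];
	}
}

/* CPU-major ("batch"): sweep each CPU's row once, accumulating into the
 * output buffer. Same result, more sequential memory access. */
static void fold_cpu_major(unsigned long out[NFIELDS])
{
	for (int f = 0; f < NFIELDS; f++)
		out[f] = 0;
	for (int c = 0; c < NCPU; c++)
		for (int f = 0; f < NFIELDS; f++)
			out[f] += mib[c][f];
}
```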

Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20210128162145.1703601-1-eric.dumazet@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-29 20:59:53 -08:00
Menglong Dong a3ce2b109a net: udp: introduce UDP_MIB_MEMERRORS for udp_mem
When udp_memory_allocated is at the limit, __udp_enqueue_schedule_skb
returns -ENOBUFS, and the skb is dropped in __udp_queue_rcv_skb
without any counter being incremented. This makes it hard to find out
what happened.

So we introduce UDP_MIB_MEMERRORS to do this job. The change is
compatible with existing users such as netstat:

$ netstat -u -s
Udp:
    0 packets received
    639 packets to unknown port received.
    158689 packet receive errors
    180022 packets sent
    RcvbufErrors: 20930
    MemErrors: 137759
UdpLite:
IpExt:
    InOctets: 257426235
    OutOctets: 257460598
    InNoECTPkts: 181177
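The flow described above can be sketched as a small model: the enqueue step fails with -ENOBUFS at the memory limit, and the receive step now counts that failure instead of dropping silently. Struct and function names are hypothetical stand-ins, not the kernel functions.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical model of the UDP receive path described above. */
struct udp_sketch {
	long mem_allocated;       /* stand-in for udp_memory_allocated */
	long mem_limit;           /* stand-in for the udp_mem limit */
	unsigned long mem_errors; /* stand-in for UDP_MIB_MEMERRORS */
};

/* Enqueue step: refuse with -ENOBUFS when memory is at the limit. */
static int enqueue_sketch(struct udp_sketch *u, long truesize)
{
	if (u->mem_allocated + truesize > u->mem_limit)
		return -ENOBUFS;
	u->mem_allocated += truesize;
	return 0;
}

/* Receive step: the drop now shows up as "MemErrors". */
static void queue_rcv_sketch(struct udp_sketch *u, long truesize)
{
	if (enqueue_sketch(u, truesize) == -ENOBUFS)
		u->mem_errors++; /* previously nothing was counted here */
}
```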

v2:
- Fix some alignment problems

Signed-off-by: Menglong Dong <dong.menglong@zte.com.cn>
Link: https://lore.kernel.org/r/1604627354-43207-1-git-send-email-dong.menglong@zte.com.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-09 15:34:44 -08:00
Priyaranjan Jha ad2b9b0f8d tcp: skip DSACKs with dubious sequence ranges
Currently, we use length of DSACKed range to compute number of
delivered packets. And if sequence range in DSACK is corrupted,
we can get bogus dsacked/acked count, and bogus cwnd.

This patch puts bounds on the DSACKed range and skips updating data
delivery and spurious retransmission information if the DSACK
is unlikely to have been caused by the sender's actions:
- DSACKed range shouldn't be greater than maximum advertised rwnd.
- Total no. of DSACKed segments shouldn't be greater than total
  no. of retransmitted segs. Unlike spurious retransmits, network
  duplicates or corrupted DSACKs shouldn't be counted as delivery.
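The two rules above translate into a simple predicate. This is a hypothetical sketch, not the kernel's code; `total_dsacked_segs` is assumed to already include the segments covered by the DSACK being checked.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical plausibility check following the two rules above. */
static bool dsack_plausible(uint32_t dsack_range, uint32_t max_rwnd,
			    uint32_t total_dsacked_segs,
			    uint32_t total_retrans_segs)
{
	/* Rule 1: a DSACK cannot cover more than the peer ever advertised. */
	if (dsack_range > max_rwnd)
		return false;
	/* Rule 2: there cannot be more duplicates than segments we resent. */
	if (total_dsacked_segs > total_retrans_segs)
		return false;
	return true;
}
```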

Signed-off-by: Priyaranjan Jha <priyarjha@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-24 20:15:45 -07:00
Priyaranjan Jha e3a5a1e8b6 tcp: add SNMP counter for no. of duplicate segments reported by DSACK
There are two existing SNMP counters, TCPDSACKRecv and TCPDSACKOfoRecv,
which are incremented depending on whether the DSACKed range is below
the cumulative ACK sequence number or not. Unfortunately, these both
implicitly assume each DSACK covers only one segment. This makes these
counters unusable for estimating spurious retransmit rates,
or real/non-spurious loss rate.

This patch introduces a new SNMP counter, TCPDSACKRecvSegs, which tracks
the estimated number of duplicate segments based on:
(DSACKed sequence range) / MSS. This counter is usable for estimating
spurious retransmit rates, or real/non-spurious loss rate.
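A minimal sketch of the (DSACKed sequence range) / MSS estimate. TCP sequence numbers are 32-bit and wrap, so the range is computed with unsigned modular subtraction. Rounding up and reporting at least one segment are assumptions of this sketch, and the function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Estimate duplicate segments as (DSACKed sequence range) / MSS. */
static uint32_t dsack_dup_segs(uint32_t start_seq, uint32_t end_seq,
			       uint32_t mss)
{
	uint32_t range = end_seq - start_seq; /* modular: handles wraparound */
	uint32_t segs = range / mss + (range % mss != 0); /* round up */

	return segs ? segs : 1; /* a DSACK reports at least one duplicate */
}
```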

Signed-off-by: Priyaranjan Jha <priyarjha@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-17 12:54:30 -07:00
Florian Westphal fc518953bc mptcp: add and use MIB counter infrastructure
Exported via same /proc file as the Linux TCP MIB counters, so "netstat -s"
or "nstat" will show them automatically.

The MPTCP MIB counters are allocated in a distinct pcpu area in order to
avoid bloating/wasting TCP pcpu memory.

Counters are allocated once the first MPTCP socket is created in a
network namespace and freed on exit.

If no sockets have been allocated, all-zero mptcp counters are shown.

The MIB counter list is taken from the multipath-tcp.org kernel, but
only a few counters have been picked up so far.  The counter list can
be increased at any time later on.

v2 -> v3:
 - remove 'inline' in foo.c files (David S. Miller)

Co-developed-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-29 22:14:49 -07:00
Abdul Kabbani 32efcc06d2 tcp: export count for rehash attempts
Using the IPv6 flow label to swiftly route around congested or
disconnected network paths can greatly improve TCP reliability.

This patch adds SNMP counters and an OPT_STATS counter to track both
host-level and connection-level statistics. Network administrators
can use these counters to better evaluate the impact of this new ability.

Export count for rehash attempts to
1) two SNMP counters: TcpTimeoutRehash (rehash due to timeouts),
   and TcpDuplicateDataRehash (rehash due to receiving duplicate
   packets)
2) Timestamping API SOF_TIMESTAMPING_OPT_STATS.

Signed-off-by: Abdul Kabbani <akabbani@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Kevin(Yudong) Yang <yyd@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-26 15:28:47 +01:00
David S. Miller 13091aa305 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Honestly all the conflicts were simple overlapping changes,
nothing really interesting to report.

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-17 20:20:36 -07:00
Eric Dumazet f070ef2ac6 tcp: tcp_fragment() should apply sane memory limits
Jonathan Looney reported that a malicious peer can force a sender
to fragment its retransmit queue into tiny skbs, inflating memory
usage and/or overflowing 32-bit counters.

TCP allows an application to queue up to sk_sndbuf bytes,
so we need to give some allowance for non malicious splitting
of retransmit queue.

A new SNMP counter is added to monitor how many times TCP
refused to split an skb because the allowance was exceeded.

Note that this counter might increase when applications
use the SO_SNDBUF socket option to lower sk_sndbuf.
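The check can be sketched as a small model. The threshold used here, refusing to split once half of sk_wmem_queued exceeds sk_sndbuf, is our reading of the upstream check and should be treated as an assumption to verify against the actual patch, as should the counter name in the comment. The struct is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical socket model for the memory-limit check. */
struct sock_sketch {
	int sk_wmem_queued;           /* bytes charged to the queues */
	int sk_sndbuf;                /* may be lowered via SO_SNDBUF */
	unsigned long wqueue_too_big; /* assumed name: TCPWqueueTooBig */
};

/* ASSUMED threshold: allow splits until queued memory exceeds twice
 * sk_sndbuf; beyond that, count the refusal and fail the split. */
static bool fragment_allowed(struct sock_sketch *sk)
{
	if ((sk->sk_wmem_queued >> 1) > sk->sk_sndbuf) {
		sk->wqueue_too_big++;
		return false; /* caller would fail the split, e.g. -ENOMEM */
	}
	return true;
}
```

The usage below mirrors the note above: the same queue that was acceptable starts tripping the counter once the application lowers sk_sndbuf.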

CVE-2019-11478 : tcp_fragment, prevent fragmenting a packet when the
	socket is already using more than half the allowed space

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Jonathan Looney <jtl@netflix.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Reviewed-by: Tyler Hicks <tyhicks@canonical.com>
Cc: Bruce Curtis <brucec@netflix.com>
Cc: Jonathan Lemon <jonathan.lemon@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-15 18:47:31 -07:00
David S. Miller a6cdeeb16b Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Some ISDN files that got removed in net-next had some changes
done in mainline, take the removals.

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-07 11:00:14 -07:00
Jason Baron 9092a76d3c tcp: add backup TFO key infrastructure
We would like to be able to rotate TFO keys while minimizing the number of
client cookies that are rejected. Currently, we have only one key which can
be used to generate and validate cookies; thus, if we simply replace this
key, clients can easily have their cookies rejected upon rotation.

We propose supporting both a primary key and a backup key.
The primary key is used to generate as well as to validate cookies.
The backup is only used to validate cookies. Thus, keys can be rotated as:

1) generate new key
2) add new key as the backup key
3) swap the primary and backup key, thus setting the new key as the primary

We don't simply set the new key as the primary key and move the old key to
the backup slot, because the IP may be behind a load balancer and not all
machines behind the load balancer will be updated simultaneously.
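The primary/backup scheme and the three rotation steps above can be sketched as follows. Everything here is hypothetical: real TFO keys are 128-bit and cookies are derived with a keyed cipher, whereas `toy_cookie()` is a non-cryptographic mixing function used only so the example runs. Generation always uses the primary key, and validation accepts either key.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct tfo_keys {
	uint64_t primary;
	uint64_t backup;
	bool has_backup;
};

/* Toy, NON-cryptographic cookie derivation (a bijective 64-bit mix). */
static uint64_t toy_cookie(uint64_t key, uint32_t client_ip)
{
	uint64_t x = key ^ client_ip;

	x ^= x >> 33;
	x *= 0xff51afd7ed558ccdULL;
	x ^= x >> 33;
	return x;
}

/* Cookies are generated with the primary key only. */
static uint64_t tfo_generate(const struct tfo_keys *k, uint32_t ip)
{
	return toy_cookie(k->primary, ip);
}

/* Cookies are accepted if either the primary or backup key made them. */
static bool tfo_validate(const struct tfo_keys *k, uint32_t ip, uint64_t c)
{
	if (c == toy_cookie(k->primary, ip))
		return true;
	return k->has_backup && c == toy_cookie(k->backup, ip);
}

/* Step 2: install the new key as the backup. */
static void tfo_add_backup(struct tfo_keys *k, uint64_t new_key)
{
	k->backup = new_key;
	k->has_backup = true;
}

/* Step 3: swap, making the new key primary and keeping the old as backup. */
static void tfo_swap(struct tfo_keys *k)
{
	uint64_t tmp = k->primary;

	k->primary = k->backup;
	k->backup = tmp;
}
```

A cookie issued under the old key stays valid through both steps, while a naive in-place replacement of the primary key would reject it.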

We make use of this infrastructure in subsequent patches.

Suggested-by: Igor Lubashev <ilubashe@akamai.com>
Signed-off-by: Jason Baron <jbaron@akamai.com>
Signed-off-by: Christoph Paasch <cpaasch@apple.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-30 13:41:26 -07:00
Thomas Gleixner 2874c5fd28 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152
Based on 1 normalized pattern(s):

  this program is free software you can redistribute it and or modify
  it under the terms of the gnu general public license as published by
  the free software foundation either version 2 of the license or at
  your option any later version

extracted by the scancode license scanner the SPDX license identifier

  GPL-2.0-or-later

has been chosen to replace the boilerplate/reference in 3029 file(s).

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Allison Randal <allison@lohutok.net>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190527070032.746973796@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-30 11:26:32 -07:00
Eric Dumazet 4907abc605 net: dynamically allocate fqdir structures
Following patch will add rcu grace period before fqdir
rhashtable destruction, so we need to dynamically allocate
fqdir structures to not force expensive synchronize_rcu() calls
in netns dismantle path.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-26 14:08:05 -07:00
Eric Dumazet 803fdd9968 net: rename struct fqdir fields
Rename the @frags fields from structs netns_ipv4, netns_ipv6,
netns_nf_frag and netns_ieee802154_lowpan to @fqdir

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-26 14:08:05 -07:00
Eric Dumazet 4f693b55c3 tcp: implement coalescing on backlog queue
In case GRO is not as efficient as it should be, or is disabled,
we might have a user thread trapped in __release_sock() while the
softirq handler floods packets up to the point where we have to drop.

This patch balances the work done by the user thread and the softirq
handler, to give __release_sock() more chances to complete its work
before new packets are added to the backlog.

This also helps if we receive many ACK packets, since GRO
does not aggregate them.

This patch brings a ~60% throughput increase on a receiver
without GRO, but the really spectacular gain is the 1000x
reduction in release_sock() latency I measured.
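The core idea, merging a new packet into the backlog tail instead of queueing it separately so __release_sock() has fewer entries to process, can be sketched with sequence ranges alone. This is a hypothetical model: the real patch merges sk_buffs and must also check headers, flags and options before coalescing, which this sketch omits.

```c
#include <assert.h>
#include <stdint.h>

#define BACKLOG_MAX 16

/* Hypothetical backlog entry: a byte range of TCP payload. */
struct seg { uint32_t seq, end_seq; };

struct backlog {
	struct seg q[BACKLOG_MAX];
	int len;
};

/* Returns 1 if coalesced into the tail, 0 if queued separately,
 * -1 if the backlog is full (the caller would drop). */
static int backlog_add(struct backlog *b, struct seg s)
{
	if (b->len > 0 && b->q[b->len - 1].end_seq == s.seq) {
		b->q[b->len - 1].end_seq = s.end_seq; /* contiguous: extend */
		return 1;
	}
	if (b->len == BACKLOG_MAX)
		return -1;
	b->q[b->len++] = s;
	return 0;
}
```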

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-30 13:26:54 -08:00
Peter Oskolkov 7969e5c40d ip: discard IPv4 datagrams with overlapping segments.
This behavior is required in IPv6, and there is little need
to tolerate overlapping fragments in IPv4. This change
simplifies the code and eliminates potential DDoS attack vectors.

Tested: ran ip_defrag selftest (not yet available upstream).
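The policy above, discarding the whole datagram on any overlap instead of trimming overlapping bytes, can be sketched with half-open byte ranges. The structures are hypothetical, and this sketch treats every overlap, including an exact duplicate, as fatal; the real reassembly code's handling of such corner cases may differ.

```c
#include <assert.h>
#include <stdbool.h>

#define FRAGQ_MAX 32

/* Hypothetical fragment: the [offset, end) byte range it carries. */
struct frag { unsigned int offset, end; };

struct fragq {
	struct frag f[FRAGQ_MAX];
	int n;
	bool dropped; /* once set, the whole datagram is discarded */
};

/* Two half-open ranges overlap iff each starts before the other ends. */
static bool overlaps(struct frag a, struct frag b)
{
	return a.offset < b.end && b.offset < a.end;
}

/* Add a fragment; any overlap poisons the entire datagram. */
static bool fragq_add(struct fragq *q, struct frag nf)
{
	if (q->dropped)
		return false;
	for (int i = 0; i < q->n; i++) {
		if (overlaps(q->f[i], nf)) {
			q->dropped = true;
			return false;
		}
	}
	if (q->n == FRAGQ_MAX)
		return false;
	q->f[q->n++] = nf;
	return true;
}
```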

Suggested-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Peter Oskolkov <posk@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Florian Westphal <fw@strlen.de>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-05 17:16:46 -07:00
Yafang Shao ea5d0c3249 tcp: add new SNMP counter for drops when try to queue in rcv queue
When sk_rmem_alloc is larger than the receive buffer and we can't
schedule more memory for it, the skb will be dropped.

In the above situation, if this skb is put into the ofo queue,
LINUX_MIB_TCPOFODROP is incremented to track it.

However, if this skb is put into the receive queue, there is no record.
So a new SNMP counter is introduced to track this behavior.

LINUX_MIB_TCPRCVQDROP:  Number of packets meant to be queued in rcv queue
			but dropped because socket rcvbuf limit hit.

Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-30 18:43:53 +09:00
Yafang Shao fb223502ec tcp: add SNMP counter for zero-window drops
It would be helpful to display drops due to a zero window or
insufficient window space.
So a new SNMP MIB entry is added to track this behavior.
This entry is named LINUX_MIB_TCPZEROWINDOWDROP and published in
/proc/net/netstat in TcpExt line as TCPZeroWindowDrop.

Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-26 11:49:08 +09:00
Linus Torvalds 1c8c5a9d38 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
Pull networking updates from David Miller:

 1) Add Maglev hashing scheduler to IPVS, from Inju Song.

 2) Lots of new TC subsystem tests from Roman Mashak.

 3) Add TCP zero copy receive and fix delayed acks and autotuning with
    SO_RCVLOWAT, from Eric Dumazet.

 4) Add XDP_REDIRECT support to mlx5 driver, from Jesper Dangaard
    Brouer.

 5) Add ttl inherit support to vxlan, from Hangbin Liu.

 6) Properly separate ipv6 routes into their logically independent
    components: fib6_info for the routing table, and fib6_nh for sets of
    nexthops, which thus can be shared. From David Ahern.

 7) Add bpf_xdp_adjust_tail helper, which can be used to generate ICMP
    messages from XDP programs. From Nikita V. Shirokov.

 8) Lots of long overdue cleanups to the r8169 driver, from Heiner
    Kallweit.

 9) Add BTF ("BPF Type Format"), from Martin KaFai Lau.

10) Add traffic condition monitoring to iwlwifi, from Luca Coelho.

11) Plumb extack down into fib_rules, from Roopa Prabhu.

12) Add Flower classifier offload support to igb, from Vinicius Costa
    Gomes.

13) Add UDP GSO support, from Willem de Bruijn.

14) Add documentation for eBPF helpers, from Quentin Monnet.

15) Add TLS tx offload to mlx5, from Ilya Lesokhin.

16) Allow applications to be given the number of bytes available to read
    on a socket via a control message returned from recvmsg(), from
    Soheil Hassas Yeganeh.

17) Add x86_32 eBPF JIT compiler, from Wang YanQing.

18) Add AF_XDP sockets, with zerocopy support infrastructure as well.
    From Björn Töpel.

19) Remove indirect load support from all of the BPF JITs and handle
    these operations in the verifier by translating them into native BPF
    instead. From Daniel Borkmann.

20) Add GRO support to ipv6 gre tunnels, from Eran Ben Elisha.

21) Allow XDP programs to do lookups in the main kernel routing tables
    for forwarding. From David Ahern.

22) Allow drivers to store hardware state into an ELF section of kernel
    dump vmcore files, and use it in cxgb4. From Rahul Lakkireddy.

23) Various RACK and loss detection improvements in TCP, from Yuchung
    Cheng.

24) Add TCP SACK compression, from Eric Dumazet.

25) Add User Mode Helper support and basic bpfilter infrastructure, from
    Alexei Starovoitov.

26) Support ports and protocol values in RTM_GETROUTE, from Roopa
    Prabhu.

27) Support bulking in ->ndo_xdp_xmit() API, from Jesper Dangaard
    Brouer.

28) Add lots of forwarding selftests, from Petr Machata.

29) Add generic network device failover driver, from Sridhar Samudrala.

* ra.kernel.org:/pub/scm/linux/kernel/git/davem/net-next: (1959 commits)
  strparser: Add __strp_unpause and use it in ktls.
  rxrpc: Fix terminal retransmission connection ID to include the channel
  net: hns3: Optimize PF CMDQ interrupt switching process
  net: hns3: Fix for VF mailbox receiving unknown message
  net: hns3: Fix for VF mailbox cannot receiving PF response
  bnx2x: use the right constant
  Revert "net: sched: cls: Fix offloading when ingress dev is vxlan"
  net: dsa: b53: Fix for brcm tag issue in Cygnus SoC
  enic: fix UDP rss bits
  netdev-FAQ: clarify DaveM's position for stable backports
  rtnetlink: validate attributes in do_setlink()
  mlxsw: Add extack messages for port_{un, }split failures
  netdevsim: Add extack error message for devlink reload
  devlink: Add extack to reload and port_{un, }split operations
  net: metrics: add proper netlink validation
  ipmr: fix error path when ipmr_new_table fails
  ip6mr: only set ip6mr_table from setsockopt when ip6mr_new_table succeeds
  net: hns3: remove unused hclgevf_cfg_func_mta_filter
  netfilter: provide udp*_lib_lookup for nf_tproxy
  qed*: Utilize FW 8.37.2.0
  ...
2018-06-06 18:39:49 -07:00
Eric Dumazet 200d95f457 tcp: add TCPAckCompressed SNMP counter
This counter tracks the number of ACK packets that the host has not sent,
thanks to ACK compression.

Sample output :

$ nstat -n;sleep 1;nstat|egrep "IpInReceives|IpOutRequests|TcpInSegs|TcpOutSegs|TcpExtTCPAckCompressed"
IpInReceives                    123250             0.0
IpOutRequests                   3684               0.0
TcpInSegs                       123251             0.0
TcpOutSegs                      3684               0.0
TcpExtTCPAckCompressed          119252             0.0
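A worked reading of the sample, assuming a receive-mostly host whose TcpOutSegs are almost all pure ACKs: without compression roughly 3684 + 119252 = 122936 ACKs would have been sent, so about 97% of them were suppressed. The hypothetical helper below does that arithmetic; the estimate is only as good as the pure-ACK assumption.

```c
/* Estimated share (in percent) of ACKs suppressed by ACK compression.
 * Assumes nearly all TcpOutSegs were pure ACKs, so out_segs + ack_compressed
 * approximates the ACKs that would have been sent without compression. */
static double ack_suppressed_pct(unsigned long out_segs,
				 unsigned long ack_compressed)
{
	unsigned long would_send = out_segs + ack_compressed;

	if (would_send == 0)
		return 0.0;
	return 100.0 * (double)ack_compressed / (double)would_send;
}
```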

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-18 11:40:27 -04:00
Christoph Hellwig 3617d9496c proc: introduce proc_create_net_single
Variant of proc_create_data that directly takes a seq_file show
callback and deals with network namespaces in ->open and ->release.
All callers of proc_create + single_open_net converted over, and
single_{open,release}_net are removed entirely.

Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-05-16 07:24:30 +02:00
Yuchung Cheng feb5f2ec64 tcp: export packets delivery info
Export data delivered and delivered with CE marks to
1) SNMP TCPDelivered and TCPDeliveredCE
2) getsockopt(TCP_INFO)
3) Timestamping API SOF_TIMESTAMPING_OPT_STATS

Note that for SCM_TSTAMP_ACK, the delivery info in
SOF_TIMESTAMPING_OPT_STATS is reported before the info
was fully updated on the ACK.

These stats help applications monitor TCP delivery and ECN status
at the per-host, per-connection, and even per-message level.

Signed-off-by: Yuchung Cheng <ycheng@google.com>
Reviewed-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Soheil Hassas Yeganeh <soheil@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-19 13:05:16 -04:00
Eric Dumazet 3e67f106f6 inet: frags: break the 2GB limit for frags storage
Some users are willing to provision huge amounts of memory to be able
to perform reassembly reasonably well under pressure.

Current memory tracking uses one atomic_t and integers.

Switch to atomic_long_t so that 64bit arches can use more than 2GB,
without any cost for 32bit arches.

Note that this patch avoids an overflow error if high_thresh was set
to ~2GB, since this test in inet_frag_alloc() was never true:

if (... || frag_mem_limit(nf) > nf->high_thresh)

Tested:

$ echo 16000000000 >/proc/sys/net/ipv4/ipfrag_high_thresh

<frag DDOS>

$ grep FRAG /proc/net/sockstat
FRAG: inuse 14705885 memory 16000002880

$ nstat -n ; sleep 1 ; nstat | grep Reas
IpReasmReqds                    3317150            0.0
IpReasmFails                    3317112            0.0
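A simplified userspace model of the change, assuming an LP64 platform where long is 64 bits (the kernel's atomic_long_t is likewise pointer-sized). struct frag_model and its helpers are hypothetical; frag_model_over() mirrors the shape of the frag_mem_limit(nf) > nf->high_thresh test, and int32_over() shows why a signed 32-bit counter can never exceed a threshold of INT32_MAX.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of fragment memory accounting, not kernel code. */
struct frag_model {
	atomic_long mem;	/* bytes held by fragment queues */
	long high_thresh;	/* like ipfrag_high_thresh */
};

static void frag_model_add(struct frag_model *f, long bytes)
{
	atomic_fetch_add(&f->mem, bytes);
}

/* Same shape as "frag_mem_limit(nf) > nf->high_thresh". */
static bool frag_model_over(struct frag_model *f)
{
	return atomic_load(&f->mem) > f->high_thresh;
}

/* With 32-bit signed accounting, no value is greater than INT32_MAX,
 * so a threshold set there makes the limit check dead code. */
static bool int32_over(int32_t mem, int32_t thresh)
{
	return mem > thresh;
}
```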

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-31 23:25:39 -04:00
Eric Dumazet 6befe4a78b inet: frags: remove some helpers
Remove sum_frag_mem_limit(), ip_frag_mem() & ip6_frag_mem().

Also since we use rhashtable we can bring back the number of fragments
in "grep FRAG /proc/net/sockstat /proc/net/sockstat6" that was
removed in commit 434d305405 ("inet: frag: don't account number
of fragment queues")

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-31 23:25:39 -04:00
Kirill Tkhai 2f635ceeb2 net: Drop pernet_operations::async
Synchronous pernet_operations are not allowed anymore.
All are asynchronous. So, drop the structure member.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-27 13:18:09 -04:00
Joe Perches d6444062f8 net: Use octal not symbolic permissions
Prefer the direct use of octal for permissions.

Done with checkpatch -f --types=SYMBOLIC_PERMS --fix-inplace
and some typing.
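As a quick reference for what the conversion means, the symbolic macros map to octal as below. This is a userspace check built only from POSIX sys/stat.h macros; MODEL_S_IRUGO is a hypothetical local stand-in for the kernel's S_IRUGO, which the kernel defines as (S_IRUSR|S_IRGRP|S_IROTH).

```c
#include <sys/stat.h>

/* Rebuild the kernel's S_IRUGO from the POSIX component macros. */
#define MODEL_S_IRUGO (S_IRUSR | S_IRGRP | S_IROTH)

/* Compile-time checks of the symbolic -> octal mapping. */
_Static_assert(MODEL_S_IRUGO == 0444, "S_IRUGO is 0444");
_Static_assert((S_IWUSR | MODEL_S_IRUGO) == 0644, "S_IWUSR|S_IRUGO is 0644");
_Static_assert((S_IRUSR | S_IWUSR) == 0600, "S_IRUSR|S_IWUSR is 0600");
```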

Miscellanea:

o Whitespace neatening around these conversions.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-26 12:07:48 -04:00
Stephen Hemminger 82695b30ff inet: whitespace cleanup
Ran simple script to find/remove trailing whitespace and blank lines
at EOF because that kind of stuff git whines about and editors leave
behind.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-28 11:43:28 -05:00
Kirill Tkhai f84c6821aa net: Convert pernet_subsys, registered from inet_init()
arp_net_ops just adds/removes a /proc entry.

devinet_ops allocates and frees duplicate of init_net tables
and (un)registers sysctl entries.

fib_net_ops allocates and frees pernet tables, creates/destroys
netlink socket and (un)initializes /proc entries. Foreign
pernet_operations do not touch them.

ip_rt_proc_ops only modifies pernet /proc entries.

xfrm_net_ops creates/destroys /proc entries, allocates/frees
pernet statistics, hashes and tables, and (un)initializes
sysctl files. These are not touched by foreign pernet_operations.

xfrm4_net_ops allocates/frees private pernet memory, and
configures sysctls.

sysctl_route_ops creates/destroys sysctls.

rt_genid_ops only initializes fields of just allocated net.

ipv4_inetpeer_ops allocates/frees net private memory.

igmp_net_ops just creates/destroys /proc files and a socket,
which no one else is interested in.

tcp_sk_ops seems to be safe, because tcp_sk_init() does not
depend on any other pernet_operations modifications. Iteration
over hash table in inet_twsk_purge() is made under RCU lock,
and it's safe to iterate the table this way. Removing from
the table happens in inet_twsk_deschedule_put(), but this
function is safe without any external locks, as it's synchronized
internally. There are many examples of it being used in different
contexts. So, it's safe to leave tcp_sk_exit_batch() unlocked.

tcp_net_metrics_ops is synchronized on tcp_metrics_lock and safe.

udplite4_net_ops only creates/destroys pernet /proc file.

icmp_sk_ops creates percpu sockets, not touched by foreign
pernet_operations.

ipmr_net_ops creates/destroys pernet fib tables, (un)registers
fib rules and /proc files. This seems to be safe to execute
in parallel with foreign pernet_operations.

af_inet_ops just sets up default parameters of newly created net.

ipv4_mib_ops creates and destroys pernet percpu statistics.

raw_net_ops, tcp4_net_ops, udp4_net_ops, ping_v4_net_ops
and ip_proc_ops only create/destroy pernet /proc files.

ip4_frags_ops creates and destroys sysctl file.

So, it's safe to make the pernet_operations async.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Acked-by: Andrei Vagin <avagin@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-13 10:36:08 -05:00
Alexey Dobriyan 96890d6252 net: delete /proc THIS_MODULE references
/proc has been ignoring struct file_operations::owner field for 10 years.
Specifically, it started with commit 786d7e1612
("Fix rmmod/read/write races in /proc entries"). Notice the chunk where
inode->i_fop is initialized with proxy struct file_operations for
regular files:

	-               if (de->proc_fops)
	-                       inode->i_fop = de->proc_fops;
	+               if (de->proc_fops) {
	+                       if (S_ISREG(inode->i_mode))
	+                               inode->i_fop = &proc_reg_file_ops;
	+                       else
	+                               inode->i_fop = de->proc_fops;
	+               }

VFS stopped pinning module at this point.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-16 15:01:33 -05:00
Yuchung Cheng 713bafea92 tcp: retire FACK loss detection
FACK loss detection has been disabled by default and the
successor RACK subsumed FACK and can handle reordering better.
This patch removes FACK to simplify TCP loss recovery.

Signed-off-by: Yuchung Cheng <ycheng@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Soheil Hassas Yeganeh <soheil@google.com>
Reviewed-by: Priyaranjan Jha <priyarjha@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-11 18:53:16 +09:00
Florian Westphal 31770e34e4 tcp: Revert "tcp: remove header prediction"
This reverts commit 45f119bf93.

Eric Dumazet says:
  We found at Google a significant regression caused by
  45f119bf93 tcp: remove header prediction

  In typical RPC  (TCP_RR), when a TCP socket receives data, we now call
  tcp_ack() while we used to not call it.

  This touches enough cache lines to cause a slowdown.

So the problem does not seem to be HP removal itself but the tcp_ack()
call. Therefore, it might be possible to remove HP after all, provided
one finds a way to elide tcp_ack() for most cases.

Reported-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-30 11:20:09 -07:00
Florian Westphal 3282e65558 tcp: remove unused mib counters
These counters were used by tcp prequeue and header prediction.
TCPFORWARDRETRANS use was removed in January.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-31 14:37:50 -07:00
Eric Dumazet 0604475119 tcp: add TCPMemoryPressuresChrono counter
DRAM supply shortage and poor memory pressure tracking in the TCP
stack make any change in SO_SNDBUF/SO_RCVBUF (or equivalent autotuning
limits) and tcp_mem[] quite hazardous.

The TCPMemoryPressures SNMP counter is an indication of tcp_mem sysctl
limits being hit, but it only tracks the number of transitions.

If TCP stack behavior under stress were perfect:
1) It would maintain memory usage close to the limit.
2) Memory pressure state would be entered for short times.

We certainly prefer 100 events lasting 10ms compared to one event
lasting 200 seconds.

This patch adds a new SNMP counter tracking cumulative duration of
memory pressure events, given in ms units.

$ cat /proc/sys/net/ipv4/tcp_mem
3088    4117    6176
$ grep TCP /proc/net/sockstat
TCP: inuse 180 orphan 0 tw 2 alloc 234 mem 4140
$ nstat -n ; sleep 10 ; nstat |grep Pressure
TcpExtTCPMemoryPressures        1700
TcpExtTCPMemoryPressuresChrono  5209
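A userspace model of the two counters, with hypothetical names throughout (this is not the kernel implementation): entering pressure records a timestamp and counts a transition, leaving adds the elapsed milliseconds to the cumulative total. A repeated "enter" while already under pressure is not a new event.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace model of the two counters; not the kernel code. */
struct pressure_model {
	bool active;			/* currently under memory pressure */
	uint64_t entered_ms;		/* time pressure was entered */
	uint64_t pressures;		/* like TCPMemoryPressures: entries */
	uint64_t pressures_chrono;	/* like TCPMemoryPressuresChrono: total ms */
};

static void pressure_enter(struct pressure_model *p, uint64_t now_ms)
{
	if (p->active)
		return;		/* already under pressure: not a new event */
	p->active = true;
	p->entered_ms = now_ms;
	p->pressures++;
}

static void pressure_leave(struct pressure_model *p, uint64_t now_ms)
{
	if (!p->active)
		return;
	p->pressures_chrono += now_ms - p->entered_ms;
	p->active = false;
}
```

Reading the two counters together gives the average event length: in the sample, 5209 ms over 1700 transitions is roughly 3 ms per pressure event.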

v2: Used EXPORT_SYMBOL_GPL() instead of EXPORT_SYMBOL() as David
instructed.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-08 11:26:19 -04:00
Wei Wang 46c2fa3987 net/tcp_fastopen: Add snmp counter for blackhole detection
This counter records the number of times the firewall blackhole issue is
detected and active TFO is disabled.

Signed-off-by: Wei Wang <weiwan@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-24 14:27:17 -04:00
Soheil Hassas Yeganeh 4396e46187 tcp: remove tcp_tw_recycle
tcp_tw_recycle was already broken for connections
behind NAT, since the per-destination timestamp is not
monotonically increasing for multiple machines behind
a single destination address.

After the randomization of TCP timestamp offsets
in commit 8a5bd45f6616 (tcp: randomize tcp timestamp offsets
for each connection), tcp_tw_recycle is broken for all
types of connections for the same reason: the timestamps
received from a single machine are no longer monotonically
increasing.
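A toy model of why per-connection offsets break a per-destination check, with hypothetical names throughout: accept_syn() stands in for rejecting a SYN whose TSval is older than the one remembered for that peer, and tsval_for() builds a TSval from one host's clock plus a per-connection offset. The comparison uses the usual wraparound-safe signed-difference idiom.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-peer cache of the last TSval seen. */
struct peer_ts_cache {
	uint32_t last_tsval;
	bool valid;
};

/* Wraparound-safe "a is before b" for 32-bit timestamps. */
static bool ts_before(uint32_t a, uint32_t b)
{
	return (int32_t)(a - b) < 0;
}

/* Reject a new connection whose TSval is older than the cached one. */
static bool accept_syn(struct peer_ts_cache *c, uint32_t tsval)
{
	if (c->valid && ts_before(tsval, c->last_tsval))
		return false;	/* treated as an old duplicate and dropped */
	c->last_tsval = tsval;
	c->valid = true;
	return true;
}

/* TSval a host sends: its clock plus a per-connection offset. */
static uint32_t tsval_for(uint32_t clock, uint32_t conn_offset)
{
	return clock + conn_offset;
}
```

With a zero offset, a later connection from the same host always passes; with independent offsets, a later connection can carry a smaller TSval and be wrongly rejected.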

Remove tcp_tw_recycle, since it is not functional. Also, remove
the PAWSPassive SNMP counter since it is only used for
tcp_tw_recycle, and simplify tcp_v4_route_req and tcp_v6_route_req
since the strict argument is only set when tcp_tw_recycle is
enabled.

Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Cc: Lutz Vieweg <lvml@5t9.de>
Cc: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-16 20:33:56 -07:00
Eric Dumazet 8fe809a992 net: add LINUX_MIB_PFMEMALLOCDROP counter
Debugging issues caused by pfmemalloc is often tedious.

Add a new SNMP counter to more easily diagnose these problems.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Josef Bacik <jbacik@fb.com>
Acked-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-02 23:34:19 -05:00
Eric Dumazet c2a2efbbfc net: remove bh disabling around percpu_counter accesses
Shaohua Li made percpu_counter irq safe in commit 098faf5805
("percpu_counter: make APIs irq safe")

We can safely remove BH disable/enable sections around various
percpu_counter manipulations.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-20 11:27:22 -05:00
Haishuang Yan 1946e672c1 ipv4: Namespaceify tcp_tw_recycle and tcp_max_tw_buckets knob
Applications in different namespaces might require fast recycling of
TIME-WAIT sockets independently of the host.

Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-29 11:38:31 -05:00
Jia He 6d4a741cbb net: Suppress the "Comparison to NULL could be written" warnings
This is to suppress the checkpatch.pl warning "Comparison to NULL
could be written". No functional changes here.

Signed-off-by: Jia He <hejianet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-30 01:50:45 -04:00
Jia He f22d5c4909 proc: Reduce cache miss in snmp_seq_show
This uses the generic interfaces snmp_get_cpu_field{,64}_batch to
aggregate the data by going through all the items of each cpu sequentially.
Then snmp_seq_show is split into 2 parts to avoid the build warning that the
frame size is larger than 1024 bytes.
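A userspace sketch of the loop-order idea only, not of the kernel helpers themselves: assuming each CPU owns one contiguous row of counters (modelled here as a hypothetical 2-D array), summing CPU by CPU walks memory sequentially, while summing item by item jumps between rows. Both orders give the same totals.

```c
#define MODEL_NR_CPUS  4
#define MODEL_NR_ITEMS 3

/* Item-major: for each item, touch every CPU's row (strided accesses). */
static void sum_item_major(unsigned long m[MODEL_NR_CPUS][MODEL_NR_ITEMS],
			   unsigned long out[MODEL_NR_ITEMS])
{
	for (int i = 0; i < MODEL_NR_ITEMS; i++) {
		out[i] = 0;
		for (int c = 0; c < MODEL_NR_CPUS; c++)
			out[i] += m[c][i];
	}
}

/* CPU-major ("batch"): walk each CPU's row sequentially, accumulating
 * into the output buffer. Same result, friendlier to the cache. */
static void sum_cpu_major(unsigned long m[MODEL_NR_CPUS][MODEL_NR_ITEMS],
			  unsigned long out[MODEL_NR_ITEMS])
{
	for (int i = 0; i < MODEL_NR_ITEMS; i++)
		out[i] = 0;
	for (int c = 0; c < MODEL_NR_CPUS; c++)
		for (int i = 0; i < MODEL_NR_ITEMS; i++)
			out[i] += m[c][i];
}
```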

Signed-off-by: Jia He <hejianet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-30 01:50:44 -04:00
Eric Dumazet 72145a68e4 tcp: md5: add LINUX_MIB_TCPMD5FAILURE counter
Adds SNMP counter for drops caused by MD5 mismatches.

The current syslog might help, but a counter is more precise and helps
monitoring.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-25 16:43:11 -07:00
Nikolay Borisov fa50d974d1 ipv4: Namespaceify ip_default_ttl sysctl knob
Signed-off-by: Nikolay Borisov <kernel@kyup.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-16 20:42:54 -05:00
Rick Jones b56ea2985d net: track success and failure of TCP PMTU probing
Track success and failure of TCP PMTU probing.

Signed-off-by: Rick Jones <rick.jones2@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-21 22:36:33 -07:00