Commit Graph

179 Commits

Author SHA1 Message Date
Michal Schmidt 555cb3d84d netdev_features: convert NETIF_F_LLTX to dev->lltx
JIRA: https://issues.redhat.com/browse/RHEL-59091

commit 00d066a4d4edbe559ba6c35153da71d4b2b8a383
Author: Alexander Lobakin <aleksander.lobakin@intel.com>
Date:   Thu Aug 29 14:33:37 2024 +0200

    netdev_features: convert NETIF_F_LLTX to dev->lltx

    NETIF_F_LLTX can't be changed via Ethtool and is not a feature,
    rather an attribute, very similar to IFF_NO_QUEUE (and hot).
    Free one netdev_features_t bit and make it a "hot" private flag.

    Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Conflicts:
	drivers/net/macsec.c
	drivers/net/veth.c
	net/ipv6/ip6_tunnel.c
	- Context.

	drivers/net/amt.c
	drivers/net/netkit.c
	- Non-existent in RHEL 9.

	drivers/net/ethernet/chelsio/cxgb/cxgb2.c
	drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.c
	- Drivers disabled in RHEL 9. Skipped.

	net/dsa/user.c
	- This is slave.c in RHEL 9, but CONFIG_NET_DSA is disabled,
	  so skipped the hunk.

	net/core/net-sysfs.c
	- Code not present because of missing commit 74293ea1c4db
	  ("net: sysfs: Do not create sysfs for non BQL device")

Signed-off-by: Michal Schmidt <mschmidt@redhat.com>
2024-10-03 17:59:44 +02:00
Lucas Zampieri f943c5e738 Merge: net: ease rtnl contention in cleanup paths
MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/4360

JIRA: https://issues.redhat.com/browse/RHEL-29681  
  
Signed-off-by: Antoine Tenart <atenart@redhat.com>

Approved-by: Florian Westphal <fwestpha@redhat.com>
Approved-by: Hangbin Liu <haliu@redhat.com>
Approved-by: CKI KWF Bot <cki-ci-bot+kwf-gitlab-com@redhat.com>

Merged-by: Lucas Zampieri <lzampier@redhat.com>
2024-07-04 12:25:29 +00:00
Ivan Vecera a4a12f7632 ip_tunnel: convert __be16 tunnel flags to bitmaps
JIRA: https://issues.redhat.com/browse/RHEL-40130

Conflicts:
- hunk for non-existing net/ipv4/fou_bpf.c skipped
- conflict in ip_gre.c resolved in the same way as upstream merge
  commit cf1ca1f66d30 ("Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net") did
- simple context conflict ip_tunnel.c due to missing commit
  c4794d22251b9 ("ipv4: tunnels: use DEV_STATS_INC()")
- simple context conflict in ip6_gre.c and ip6_tunnel.c due to missing
  commit 2fad1ba354d4a ("ipv6: tunnels: use DEV_STATS_INC()")
- simple conflict in nft_tunnel.c due to missing ffb3d9a30cc67 ("netfilter:
  nf_tables: use correct integer types")

commit 5832c4a77d6931cebf9ba737129ae8f14b66ee1d
Author: Alexander Lobakin <aleksander.lobakin@intel.com>
Date:   Wed Mar 27 16:23:53 2024 +0100

    ip_tunnel: convert __be16 tunnel flags to bitmaps

    Historically, tunnel flags like TUNNEL_CSUM or TUNNEL_ERSPAN_OPT
    have been defined as __be16. Now all of those 16 bits are occupied
    and there's no more free space for new flags.
    It can't be simply switched to a bigger container with no
    adjustments to the values, since it's an explicit Endian storage,
    and on LE systems (__be16)0x0001 equals to
    (__be64)0x0001000000000000.
    We could probably define new 64-bit flags depending on the
    Endianness, i.e. (__be64)0x0001 on BE and (__be64)0x00010000... on
    LE, but that would introduce an Endianness dependency and spawn a
    ton of Sparse warnings. To mitigate them, all of those places which
    were adjusted with this change would be touched anyway, so why not
    define stuff properly if there's no choice.

    Define IP_TUNNEL_*_BIT counterparts as a bit number instead of the
    value already coded and a fistful of <16 <-> bitmap> converters and
    helpers. The two flags which have a different bit position are
    SIT_ISATAP_BIT and VTI_ISVTI_BIT, as they were defined not as
    __cpu_to_be16(), but as (__force __be16), i.e. had different
    positions on LE and BE. Now they both have strongly defined places.
    Change all __be16 fields which were used to store those flags, to
    IP_TUNNEL_DECLARE_FLAGS() -> DECLARE_BITMAP(__IP_TUNNEL_FLAG_NUM) ->
    unsigned long[1] for now, and replace all TUNNEL_* occurrences to
    their bitmap counterparts. Use the converters in the places which talk
    to the userspace, hardware (NFP) or other hosts (GRE header). The rest
    must explicitly use the new flags only. This must be done at once,
    otherwise there will be too many conversions throughout the code in
    the intermediate commits.
    Finally, disable the old __be16 flags for use in the kernel code
    (except for the two 'irregular' flags mentioned above), to prevent
    any accidental (mis)use of them. For the userspace, nothing is
    changed, only additions were made.

    Most noticeable bloat-o-meter difference (.text):

    vmlinux:        307/-1 (306)
    gre.ko:         62/0 (62)
    ip_gre.ko:      941/-217 (724)  [*]
    ip_tunnel.ko:   390/-900 (-510) [**]
    ip_vti.ko:      138/0 (138)
    ip6_gre.ko:     534/-18 (516)   [*]
    ip6_tunnel.ko:  118/-10 (108)

    [*] gre_flags_to_tnl_flags() grew, but still is inlined
    [**] ip_tunnel_find() got uninlined, hence such decrease

    The average code size increase in non-extreme case is 100-200 bytes
    per module, mostly due to sizeof(long) > sizeof(__be16), as
    %__IP_TUNNEL_FLAG_NUM is less than %BITS_PER_LONG and the compilers
    are able to expand the majority of bitmap_*() calls here into direct
    operations on scalars.

    Reviewed-by: Simon Horman <horms@kernel.org>
    Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Ivan Vecera <ivecera@redhat.com>
2024-06-12 14:49:18 +02:00
Ivan Vecera 844e0ed599 ip_tunnel: use a separate struct to store tunnel params in the kernel
JIRA: https://issues.redhat.com/browse/RHEL-40130

Conflicts:
- context conflict due to __rh_deprecated_ndo_get_devlink_port() in
  net_device_ops that is RHEL specific

commit 117aef12a7b1b797bce9f66b156c65eab850b5b5
Author: Alexander Lobakin <aleksander.lobakin@intel.com>
Date:   Wed Mar 27 16:23:52 2024 +0100

    ip_tunnel: use a separate struct to store tunnel params in the kernel

    Unlike IPv6 tunnels which use purely-kernel __ip6_tnl_parm structure
    to store params inside the kernel, IPv4 tunnel code uses the same
    ip_tunnel_parm which is being used to talk with the userspace.
    This makes it difficult to alter or add any fields or use a
    different format for whatever data.
    Define struct ip_tunnel_parm_kern, a 1:1 copy of ip_tunnel_parm for
    now, and use it throughout the code. Define the pieces, where the copy
    user <-> kernel happens, as standalone functions, and copy the data
    there field-by-field, so that the kernel-side structure could be easily
    modified later on and the users wouldn't have to care about this.

    Reviewed-by: Simon Horman <horms@kernel.org>
    Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Ivan Vecera <ivecera@redhat.com>
2024-06-11 11:22:14 +02:00
Ivan Vecera d26c78ede2 net: Add helper function to parse netlink msg of ip_tunnel_parm
JIRA: https://issues.redhat.com/browse/RHEL-40130

commit b86fca800a6a3d439c454b462f7f067a18234e60
Author: Liu Jian <liujian56@huawei.com>
Date:   Thu Sep 29 21:52:03 2022 +0800

    net: Add helper function to parse netlink msg of ip_tunnel_parm

    Add ip_tunnel_netlink_parms to parse netlink msg of ip_tunnel_parm.
    Reduces duplicate code, no actual functional changes.

    Signed-off-by: Liu Jian <liujian56@huawei.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Ivan Vecera <ivecera@redhat.com>
2024-06-11 11:22:04 +02:00
Ivan Vecera f3b878003f net: Add helper function to parse netlink msg of ip_tunnel_encap
JIRA: https://issues.redhat.com/browse/RHEL-40130

commit 537dd2d9fb9f4aa7939fb4fcf552ebe4f497bd7e
Author: Liu Jian <liujian56@huawei.com>
Date:   Thu Sep 29 21:52:02 2022 +0800

    net: Add helper function to parse netlink msg of ip_tunnel_encap

    Add ip_tunnel_netlink_encap_parms to parse netlink msg of ip_tunnel_encap.
    Reduces duplicate code, no actual functional changes.

    Signed-off-by: Liu Jian <liujian56@huawei.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Ivan Vecera <ivecera@redhat.com>
2024-06-11 11:22:01 +02:00
Antoine Tenart e4ae89b0e1 ip_tunnel: use exit_batch_rtnl() method
JIRA: https://issues.redhat.com/browse/RHEL-29681
Upstream Status: linux.git

commit 9b5b36374ed6953f3efcc82e7cb4c353b9869faf
Author: Eric Dumazet <edumazet@google.com>
Date:   Tue Feb 6 14:43:10 2024 +0000

    ip_tunnel: use exit_batch_rtnl() method

    exit_batch_rtnl() is called while RTNL is held,
    and devices to be unregistered can be queued in the dev_kill_list.

    This saves one rtnl_lock()/rtnl_unlock() pair
    and one unregister_netdevice_many() call.

    This patch takes care of ipip, ip_vti, and ip_gre tunnels.

    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Reviewed-by: Antoine Tenart <atenart@kernel.org>
    Link: https://lore.kernel.org/r/20240206144313.2050392-15-edumazet@google.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Antoine Tenart <atenart@redhat.com>
2024-05-28 15:24:06 +02:00
Ivan Vecera 526929e625 ip_tunnel: use ndo_siocdevprivate
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2008927

commit 3e7a1c7c561ed8508fbdb98ed5708175bbcf7938
Author: Arnd Bergmann <arnd@arndb.de>
Date:   Tue Jul 27 15:45:06 2021 +0200

    ip_tunnel: use ndo_siocdevprivate

    The various ipv4 and ipv6 tunnel drivers each implement a set
    of 12 SIOCDEVPRIVATE commands for managing tunnels. These
    all work correctly in compat mode.

    Move them over to the new .ndo_siocdevprivate operation.

    Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
    Cc: David Ahern <dsahern@kernel.org>
    Cc: Steffen Klassert <steffen.klassert@secunet.com>
    Cc: Herbert Xu <herbert@gondor.apana.org.au>
    Signed-off-by: Arnd Bergmann <arnd@arndb.de>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Ivan Vecera <ivecera@redhat.com>
2021-10-11 15:43:53 +02:00
Guillaume Nault 7ad136fd28 ipip: allow redirecting ipip and mplsip packets to eth devices
Even though ipip transports IPv4 or MPLS packets, it needs to reset the
mac_header pointer, so that other parts of the stack don't mistakenly
access the outer header after the packet has been decapsulated.

This allows to push an Ethernet header to ipip or mplsip packets and
redirect them to an Ethernet device:

  $ tc filter add dev ipip0 ingress matchall         \
      action vlan push_eth dst_mac 00:00:5e:00:53:01 \
                           src_mac 00:00:5e:00:53:00 \
      action mirred egress redirect dev eth0

Without this patch, push_eth refuses to add an ethernet header because
the skb appears to already have a MAC header.

Signed-off-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-28 12:44:17 -07:00
Heiner Kallweit 98d7fc4638 ipv4/ipv6: switch to dev_get_tstats64
Replace ip_tunnel_get_stats64() with the new identical core function
dev_get_tstats64().

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-09 17:50:28 -08:00
Jason A. Donenfeld e53ac93220 net: ipip: implement header_ops->parse_protocol for AF_PACKET
Ipip uses skb->protocol to determine packet type, and bails out if it's
not set. For AF_PACKET injection, we need to support its call chain of:

    packet_sendmsg -> packet_snd -> packet_parse_headers ->
      dev_parse_header_protocol -> parse_protocol

Without a valid parse_protocol, this returns zero, and ipip rejects the
skb. So, this wires up the ip_tunnel handler for layer 3 packets for
that case.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-30 12:29:39 -07:00
David S. Miller 13209a8f73 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
The MSCC bug fix in 'net' had to be slightly adjusted because the
register accesses are done slightly differently in net-next.

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-24 13:47:27 -07:00
Vadim Fedorenko 57ebc8f085 net: ipip: fix wrong address family in init error path
In case of error with MPLS support the code is misusing AF_INET
instead of AF_MPLS.

Fixes: 1b69e7e6c4 ("ipip: support MPLS over IPv4")
Signed-off-by: Vadim Fedorenko <vfedorenko@novek.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-21 17:24:25 -07:00
Christoph Hellwig 607259a695 net: add a new ndo_tunnel_ioctl method
This method is used to properly allow kernel callers of the IPv4 route
management ioctls.  The exsting ip_tunnel_ioctl helper is renamed to
ip_tunnel_ctl to better reflect that it doesn't directly implement ioctls
touching user memory, and is used for the guts of ndo_tunnel_ctl
implementations. A new ip_tunnel_ioctl helper is added that can be wired
up directly to the ndo_do_ioctl method and takes care of the copy to and
from userspace.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-19 15:45:11 -07:00
Haishuang Yan 47d858d0bd ipip: validate header length in ipip_tunnel_xmit
We need the same checks introduced by commit cb9f1b7838
("ip: validate header length on virtual device xmit") for
ipip tunnel.

Fixes: cb9f1b7838 ("ip: validate header length on virtual device xmit")
Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-25 17:23:40 -07:00
Thomas Gleixner 2874c5fd28 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152
Based on 1 normalized pattern(s):

  this program is free software you can redistribute it and or modify
  it under the terms of the gnu general public license as published by
  the free software foundation either version 2 of the license or at
  your option any later version

extracted by the scancode license scanner the SPDX license identifier

  GPL-2.0-or-later

has been chosen to replace the boilerplate/reference in 3029 file(s).

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Allison Randal <allison@lohutok.net>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190527070032.746973796@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-30 11:26:32 -07:00
wenxu c8b34e680a ip_tunnel: Add tnl_update_pmtu in ip_md_tunnel_xmit
Add tnl_update_pmtu in ip_md_tunnel_xmit to dynamic modify
the pmtu which packet send through collect_metadata mode
ip tunnel

Signed-off-by: wenxu <wenxu@ucloud.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-26 09:43:03 -08:00
Stefano Brivio 32bbd8793f net: Convert protocol error handlers from void to int
We'll need this to handle ICMP errors for tunnels without a sending socket
(i.e. FoU and GUE). There, we might have to look up different types of IP
tunnels, registered as network protocols, before we get a match, so we
want this for the error handlers of IPPROTO_IPIP and IPPROTO_IPV6 in both
inet_protos and inet6_protos. These error codes will be used in the next
patch.

For consistency, return sensible error codes in protocol error handlers
whenever handlers can't handle errors because, even if valid, they don't
match a protocol or any of its states.

This has no effect on existing error handling paths.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-08 17:13:08 -08:00
Maciej Żenczykowski 1042caa79e net-ipv4: remove 2 always zero parameters from ipv4_redirect()
(the parameters in question are mark and flow_flags)

Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Maciej Żenczykowski <maze@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-26 20:30:55 -07:00
Maciej Żenczykowski d888f39666 net-ipv4: remove 2 always zero parameters from ipv4_update_pmtu()
(the parameters in question are mark and flow_flags)

Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Maciej Żenczykowski <maze@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-26 20:30:55 -07:00
David S. Miller e1ea2f9856 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Several conflicts here.

NFP driver bug fix adding nfp_netdev_is_nfp_repr() check to
nfp_fl_output() needed some adjustments because the code block is in
an else block now.

Parallel additions to net/pkt_cls.h and net/sch_generic.h

A bug fix in __tcp_retransmit_skb() conflicted with some of
the rbtree changes in net-next.

The tc action RCU callback fixes in 'net' had some overlap with some
of the recent tcf_block reworking.

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-30 21:09:24 +09:00
Xin Long f3594f0a7e ipip: only increase err_count for some certain type icmp in ipip_err
t->err_count is used to count the link failure on tunnel and an err
will be reported to user socket in tx path if t->err_count is not 0.
udp socket could even return EHOSTUNREACH to users.

Since commit fd58156e45 ("IPIP: Use ip-tunneling code.") removed
the 'switch check' for icmp type in ipip_err(), err_count would be
increased by the icmp packet with ICMP_EXC_FRAGTIME code. an link
failure would be reported out due to this.

In Jianlin's case, when receiving ICMP_EXC_FRAGTIME a icmp packet,
udp netperf failed with the err:
  send_data: data send error: No route to host (errno 113)

We expect this error reported from tunnel to socket when receiving
some certain type icmp, but not ICMP_EXC_FRAGTIME, ICMP_SR_FAILED
or ICMP_PARAMETERPROB ones.

This patch is to bring 'switch check' for icmp type back to ipip_err
so that it only reports link failure for the right type icmp, just as
in ipgre_err() and ipip6_err().

Fixes: fd58156e45 ("IPIP: Use ip-tunneling code.")
Reported-by: Jianlin Shi <jishi@redhat.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-27 23:43:31 +09:00
Eric Dumazet 64bc17811b ipv4: speedup ipv6 tunnels dismantle
Implement exit_batch() method to dismantle more devices
per round.

(rtnl_lock() ...
 unregister_netdevice_many() ...
 rtnl_unlock())

Tested:
$ cat add_del_unshare.sh
for i in `seq 1 40`
do
 (for j in `seq 1 100` ; do unshare -n /bin/true >/dev/null ; done) &
done
wait ; grep net_namespace /proc/slabinfo

Before patch :
$ time ./add_del_unshare.sh
net_namespace        126    282   5504    1    2 : tunables    8    4    0 : slabdata    126    282      0

real    1m38.965s
user    0m0.688s
sys     0m37.017s

After patch:
$ time ./add_del_unshare.sh
net_namespace        135    291   5504    1    2 : tunables    8    4    0 : slabdata    135    291      0

real	0m22.117s
user	0m0.728s
sys	0m35.328s

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-19 16:32:24 -07:00
Matthias Schiffer a8b8a889e3 net: add netlink_ext_ack argument to rtnl_link_ops.validate
Add support for extended error reporting.

Signed-off-by: Matthias Schiffer <mschiffer@universe-factory.net>
Acked-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-26 23:13:22 -04:00
Matthias Schiffer ad744b223c net: add netlink_ext_ack argument to rtnl_link_ops.changelink
Add support for extended error reporting.

Signed-off-by: Matthias Schiffer <mschiffer@universe-factory.net>
Acked-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-26 23:13:22 -04:00
Matthias Schiffer 7a3f4a1851 net: add netlink_ext_ack argument to rtnl_link_ops.newlink
Add support for extended error reporting.

Signed-off-by: Matthias Schiffer <mschiffer@universe-factory.net>
Acked-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-26 23:13:21 -04:00
Craig Gallek 9830ad4c6a ip_tunnel: Allow policy-based routing through tunnels
This feature allows the administrator to set an fwmark for
packets traversing a tunnel.  This allows the use of independent
routing tables for tunneled packets without the use of iptables.

There is no concept of per-packet routing decisions through IPv4
tunnels, so this implementation does not need to work with
per-packet route lookups as the v6 implementation may
(with IP6_TNL_F_USE_ORIG_FWMARK).

Further, since the v4 tunnel ioctls share datastructures
(which can not be trivially modified) with the kernel's internal
tunnel configuration structures, the mark attribute must be stored
in the tunnel structure itself and passed as a parameter when
creating or changing tunnel attributes.

Signed-off-by: Craig Gallek <kraig@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-21 13:21:31 -04:00
Linus Torvalds 7c0f6ba682 Replace <asm/uaccess.h> with <linux/uaccess.h> globally
This was entirely automated, using the script by Al:

  PATT='^[[:blank:]]*#[[:blank:]]*include[[:blank:]]*<asm/uaccess.h>'
  sed -i -e "s!$PATT!#include <linux/uaccess.h>!" \
        $(git grep -l "$PATT"|grep -v ^include/linux/uaccess.h)

to do the replacement at the end of the merge window.

Requested-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-24 11:46:01 -08:00
Alexey Dobriyan c7d03a00b5 netns: make struct pernet_operations::id unsigned int
Make struct pernet_operations::id unsigned.

There are 2 reasons to do so:

1)
This field is really an index into an zero based array and
thus is unsigned entity. Using negative value is out-of-bound
access by definition.

2)
On x86_64 unsigned 32-bit data which are mixed with pointers
via array indexing or offsets added or subtracted to pointers
are preffered to signed 32-bit data.

"int" being used as an array index needs to be sign-extended
to 64-bit before being used.

	void f(long *p, int i)
	{
		g(p[i]);
	}

  roughly translates to

	movsx	rsi, esi
	mov	rdi, [rsi+...]
	call 	g

MOVSX is 3 byte instruction which isn't necessary if the variable is
unsigned because x86_64 is zero extending by default.

Now, there is net_generic() function which, you guessed it right, uses
"int" as an array index:

	static inline void *net_generic(const struct net *net, int id)
	{
		...
		ptr = ng->ptr[id - 1];
		...
	}

And this function is used a lot, so those sign extensions add up.

Patch snipes ~1730 bytes on allyesconfig kernel (without all junk
messing with code generation):

	add/remove: 0/0 grow/shrink: 70/598 up/down: 396/-2126 (-1730)

Unfortunately some functions actually grow bigger.
This is a semmingly random artefact of code generation with register
allocator being used differently. gcc decides that some variable
needs to live in new r8+ registers and every access now requires REX
prefix. Or it is shifted into r12, so [r12+0] addressing mode has to be
used which is longer than [r8]

However, overall balance is in negative direction:

	add/remove: 0/0 grow/shrink: 70/598 up/down: 396/-2126 (-1730)
	function                                     old     new   delta
	nfsd4_lock                                  3886    3959     +73
	tipc_link_build_proto_msg                   1096    1140     +44
	mac80211_hwsim_new_radio                    2776    2808     +32
	tipc_mon_rcv                                1032    1058     +26
	svcauth_gss_legacy_init                     1413    1429     +16
	tipc_bcbase_select_primary                   379     392     +13
	nfsd4_exchange_id                           1247    1260     +13
	nfsd4_setclientid_confirm                    782     793     +11
		...
	put_client_renew_locked                      494     480     -14
	ip_set_sockfn_get                            730     716     -14
	geneve_sock_add                              829     813     -16
	nfsd4_sequence_done                          721     703     -18
	nlmclnt_lookup_host                          708     686     -22
	nfsd4_lockt                                 1085    1063     -22
	nfs_get_client                              1077    1050     -27
	tcf_bpf_init                                1106    1076     -30
	nfsd4_encode_fattr                          5997    5930     -67
	Total: Before=154856051, After=154854321, chg -0.00%

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-18 10:59:15 -05:00
Alexei Starovoitov cfc7381b30 ip_tunnel: add collect_md mode to IPIP tunnel
Similar to gre, vxlan, geneve tunnels allow IPIP tunnels to
operate in 'collect metadata' mode.
bpf_skb_[gs]et_tunnel_key() helpers can make use of it right away.
ovs can use it as well in the future (once appropriate ovs-vport
abstractions and user apis are added).
Note that just like in other tunnels we cannot cache the dst,
since tunnel_info metadata can be different for every packet.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Thomas Graf <tgraf@suug.ch>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-17 10:13:07 -04:00
Simon Horman 1b69e7e6c4 ipip: support MPLS over IPv4
Extend the IPIP driver to support MPLS over IPv4. The implementation is an
extension of existing support for IPv4 over IPv4 and is based of multiple
inner-protocol support for the SIT driver.

Signed-off-by: Simon Horman <simon.horman@netronome.com>
Reviewed-by: Dinan Gunawardena <dinan.gunawardena@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-09 17:45:56 -04:00
Tom Herbert 7e13318daa net: define gso types for IPx over IPv4 and IPv6
This patch defines two new GSO definitions SKB_GSO_IPXIP4 and
SKB_GSO_IPXIP6 along with corresponding NETIF_F_GSO_IPXIP4 and
NETIF_F_GSO_IPXIP6. These are used to described IP in IP
tunnel and what the outer protocol is. The inner protocol
can be deduced from other GSO types (e.g. SKB_GSO_TCPV4 and
SKB_GSO_TCPV6). The GSO types of SKB_GSO_IPIP and SKB_GSO_SIT
are removed (these are both instances of SKB_GSO_IPXIP4).
SKB_GSO_IPXIP6 will be used when support for GSO with IP
encapsulation over IPv6 is added.

Signed-off-by: Tom Herbert <tom@herbertland.com>
Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-20 18:03:15 -04:00
Alexander Duyck aed069df09 ip_tunnel_core: iptunnel_handle_offloads returns int and doesn't free skb
This patch updates the IP tunnel core function iptunnel_handle_offloads so
that we return an int and do not free the skb inside the function.  This
actually allows us to clean up several paths in several tunnels so that we
can free the skb at one point in the path without having to have a
secondary path if we are supporting tunnel offloads.

In addition it should resolve some double-free issues I have found in the
tunnels paths as I believe it is possible for us to end up triggering such
an event in the case of fou or gue.

Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-16 19:09:13 -04:00
Jiri Benc 7f290c9435 iptunnel: scrub packet in iptunnel_pull_header
Part of skb_scrub_packet was open coded in iptunnel_pull_header. Let it call
skb_scrub_packet directly instead.

Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-18 14:34:54 -05:00
Edward Cree 6fa79666e2 net: ip_tunnel: remove 'csum_help' argument to iptunnel_handle_offloads
All users now pass false, so we can remove it, and remove the code that
 was conditional upon it.

Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-12 05:52:16 -05:00
David S. Miller c07f30ad68 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2015-12-31 18:20:10 -05:00
Pravin B Shelar 6d3c348a63 ipip: ioctl: Remove superfluous IP-TTL handling.
IP-TTL case is already handled in ip_tunnel_ioctl() API.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-18 16:07:59 -05:00
Nikolay Aleksandrov dfc3b0e891 net: remove unnecessary mroute.h includes
It looks like many files are including mroute.h unnecessarily, so remove
the include. Most importantly remove it from ipv6.

CC: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
CC: Steffen Klassert <steffen.klassert@secunet.com>
CC: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-30 15:26:21 -05:00
Pravin B Shelar 2e15ea390e ip_gre: Add support to collect tunnel metadata.
Following patch create new tunnel flag which enable
tunnel metadata collection on given device.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-10 14:03:54 -07:00
Eric Dumazet 252a8fbe81 ipip: fix one sparse error
make C=2 CF=-D__CHECK_ENDIAN__ net/ipv4/ipip.o
  CHECK   net/ipv4/ipip.c
net/ipv4/ipip.c:254:27: warning: incorrect type in assignment (different base types)
net/ipv4/ipip.c:254:27:    expected restricted __be32 [addressable] [usertype] o_key
net/ipv4/ipip.c:254:27:    got restricted __be16 [addressable] [usertype] i_flags

Fixes: 3b7b514f44 ("ipip: fix a regression in ioctl")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-17 13:08:29 -04:00
Ian Morris 51456b2914 ipv4: coding style: comparison for equality with NULL
The ipv4 code uses a mixture of coding styles. In some instances check
for NULL pointer is done as x == NULL and sometimes as !x. !x is
preferred according to checkpatch and this patch makes the code
consistent by adopting the latter form.

No changes detected by objdiff.

Signed-off-by: Ian Morris <ipm@chirality.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-03 12:11:15 -04:00
Nicolas Dichtel 1e99584b91 ipip,gre,vti,sit: implement ndo_get_iflink
Don't use dev->iflink anymore.

CC: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-02 14:05:00 -04:00
Jiri Benc 67b61f6c13 netlink: implement nla_get_in_addr and nla_get_in6_addr
Those are counterparts to nla_put_in_addr and nla_put_in6_addr.

Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-31 13:58:35 -04:00
Jiri Benc 930345ea63 netlink: implement nla_put_in_addr and nla_put_in6_addr
IP addresses are often stored in netlink attributes. Add generic functions
to do that.

For nla_put_in_addr, it would be nicer to pass struct in_addr but this is
not used universally throughout the kernel, in way too many places __be32 is
used to store IPv4 address.

Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-31 13:58:35 -04:00
Sabrina Dubroca 3e97fa7059 gre/ipip: use be16 variants of netlink functions
encap.sport and encap.dport are __be16, use nla_{get,put}_be16 instead
of nla_{get,put}_u16.

Fixes the sparse warnings:

warning: incorrect type in assignment (different base types)
   expected restricted __be32 [addressable] [usertype] o_key
   got restricted __be16 [addressable] [usertype] i_flags
warning: incorrect type in assignment (different base types)
   expected restricted __be16 [usertype] sport
   got unsigned short
warning: incorrect type in assignment (different base types)
   expected restricted __be16 [usertype] dport
   got unsigned short
warning: incorrect type in argument 3 (different base types)
   expected unsigned short [unsigned] [usertype] value
   got restricted __be16 [usertype] sport
warning: incorrect type in argument 3 (different base types)
   expected unsigned short [unsigned] [usertype] value
   got restricted __be16 [usertype] dport

Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-08 16:28:06 -08:00
Nicolas Dichtel 1728d4fabd tunnels: advertise link netns via netlink
Implement rtnl_link_ops->get_link_net() callback so that IFLA_LINK_NETNSID is
added to rtnetlink messages.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-19 14:32:03 -05:00
Tom Herbert e1b2cb6550 fou: Fix typo in returning flags in netlink
When filling netlink info, dport is being returned as flags. Fix
instances to return correct value.

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-05 22:18:20 -05:00
Eric Dumazet 0287587884 net: better IFF_XMIT_DST_RELEASE support
Testing xmit_more support with netperf and connected UDP sockets,
I found strange dst refcount false sharing.

Current handling of IFF_XMIT_DST_RELEASE is not optimal.

Dropping dst in validate_xmit_skb() is certainly too late in case
packet was queued by cpu X but dequeued by cpu Y

The logical point to take care of drop/force is in __dev_queue_xmit()
before even taking qdisc lock.

As Julian Anastasov pointed out, need for skb_dst() might come from some
packet schedulers or classifiers.

This patch adds new helper to cleanly express needs of various drivers
or qdiscs/classifiers.

Drivers that need skb_dst() in their ndo_start_xmit() should call
following helper in their setup instead of the prior :

	dev->priv_flags &= ~IFF_XMIT_DST_RELEASE;
->
	netif_keep_dst(dev);

Instead of using a single bit, we use two bits, one being
eventually rebuilt in bonding/team drivers.

The other one, is permanent and blocks IFF_XMIT_DST_RELEASE being
rebuilt in bonding/team. Eventually, we could add something
smarter later.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Julian Anastasov <ja@ssi.bg>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-07 13:22:11 -04:00
Tom Herbert 077c5a0948 ipip: Set inner IP protocol in ipip
Call skb_set_inner_ipproto to set inner IP protocol to IPPROTO_IPV4
before tunnel_xmit. This is needed if UDP encapsulation (fou) is
being done.

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-01 21:35:51 -04:00
Tom Herbert 473ab820dd ipip: Setup and TX path for ipip/UDP foo-over-udp encapsulation
Add netlink handling for IP tunnel encapsulation parameters and
and adjustment of MTU for encapsulation.  ip_tunnel_encap is called
from ip_tunnel_xmit to actually perform FOU encapsulation.

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-19 17:15:32 -04:00