linux-kernelorg-stable/net/ipv4
Martin KaFai Lau 2dbb9b9e6d bpf: Introduce BPF_PROG_TYPE_SK_REUSEPORT
This patch adds a BPF_PROG_TYPE_SK_REUSEPORT which can select
a SO_REUSEPORT sk from a BPF_MAP_TYPE_REUSEPORT_ARRAY.  Like other
non SK_FILTER/CGROUP_SKB program, it requires CAP_SYS_ADMIN.

BPF_PROG_TYPE_SK_REUSEPORT introduces "struct sk_reuseport_kern"
to store the bpf context instead of using the skb->cb[48].

At the SO_REUSEPORT sk lookup time, it is in the middle of transiting
from a lower layer (ipv4/ipv6) to a upper layer (udp/tcp).  At this
point,  it is not always clear where the bpf context can be appended
in the skb->cb[48] to avoid saving-and-restoring cb[].  Even putting
aside the difference between ipv4-vs-ipv6 and udp-vs-tcp.  It is not
clear if the lower layer is only ipv4 and ipv6 in the future and
will it not touch the cb[] again before transiting to the upper
layer.

For example, in udp_gro_receive(), it uses the 48 byte NAPI_GRO_CB
instead of IP[6]CB and it may still modify the cb[] after calling
the udp[46]_lib_lookup_skb().  Because of the above reason, if
sk->cb is used for the bpf ctx, saving-and-restoring is needed
and likely the whole 48 bytes cb[] has to be saved and restored.

Instead of saving, setting and restoring the cb[], this patch opts
to create a new "struct sk_reuseport_kern" and setting the needed
values in there.

The new BPF_PROG_TYPE_SK_REUSEPORT and "struct sk_reuseport_(kern|md)"
will serve all ipv4/ipv6 + udp/tcp combinations.  There is no protocol
specific usage at this point and it is also inline with the current
sock_reuseport.c implementation (i.e. no protocol specific requirement).

In "struct sk_reuseport_md", this patch exposes data/data_end/len
with semantic similar to other existing usages.  Together
with "bpf_skb_load_bytes()" and "bpf_skb_load_bytes_relative()",
the bpf prog can peek anywhere in the skb.  The "bind_inany" tells
the bpf prog that the reuseport group is bind-ed to a local
INANY address which cannot be learned from skb.

The new "bind_inany" is added to "struct sock_reuseport" which will be
used when running the new "BPF_PROG_TYPE_SK_REUSEPORT" bpf prog in order
to avoid repeating the "bind INANY" test on
"sk_v6_rcv_saddr/sk->sk_rcv_saddr" every time a bpf prog is run.  It can
only be properly initialized when a "sk->sk_reuseport" enabled sk is
adding to a hashtable (i.e. during "reuseport_alloc()" and
"reuseport_add_sock()").

The new "sk_select_reuseport()" is the main helper that the
bpf prog will use to select a SO_REUSEPORT sk.  It is the only function
that can use the new BPF_MAP_TYPE_REUSEPORT_ARRAY.  As mentioned in
the earlier patch, the validity of a selected sk is checked in
run time in "sk_select_reuseport()".  Doing the check in
verification time is difficult and inflexible (consider the map-in-map
use case).  The runtime check is to compare the selected sk's reuseport_id
with the reuseport_id that we want.  This helper will return -EXXX if the
selected sk cannot serve the incoming request (e.g. reuseport_id
not match).  The bpf prog can decide if it wants to do SK_DROP as its
discretion.

When the bpf prog returns SK_PASS, the kernel will check if a
valid sk has been selected (i.e. "reuse_kern->selected_sk != NULL").
If it does , it will use the selected sk.  If not, the kernel
will select one from "reuse->socks[]" (as before this patch).

The SK_DROP and SK_PASS handling logic will be in the next patch.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-08-11 01:58:46 +02:00
..
bpfilter bpfilter: remove trailing newline 2018-07-24 14:10:42 -07:00
netfilter Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next 2018-07-20 22:28:28 -07:00
Kconfig net: remove blank lines at end of file 2018-07-24 14:10:43 -07:00
Makefile net: remove blank lines at end of file 2018-07-24 14:10:43 -07:00
af_inet.c net: ipv4: Control SKB reprioritization after forwarding 2018-08-01 09:52:30 -07:00
ah4.c
arp.c
cipso_ipv4.c
datagram.c
devinet.c route: add support for directed broadcast forwarding 2018-07-29 12:37:06 -07:00
esp4.c
esp4_offload.c Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next 2018-07-27 09:33:37 -07:00
fib_frontend.c ipv4: remove BUG_ON() from fib_compute_spec_dst 2018-07-28 19:06:12 -07:00
fib_lookup.h
fib_notifier.c
fib_rules.c
fib_semantics.c net: metrics: add proper netlink validation 2018-06-05 12:29:43 -04:00
fib_trie.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next 2018-06-06 18:39:49 -07:00
fou.c Merge ra.kernel.org:/pub/scm/linux/kernel/git/davem/net 2018-07-03 10:29:26 +09:00
gre_demux.c
gre_offload.c Merge ra.kernel.org:/pub/scm/linux/kernel/git/davem/net 2018-07-03 10:29:26 +09:00
icmp.c ipv4: ipcm_cookie initializers 2018-07-07 10:58:49 +09:00
igmp.c Merge ra.kernel.org:/pub/scm/linux/kernel/git/davem/net 2018-08-02 10:55:32 -07:00
inet_connection_sock.c bpf: Introduce BPF_PROG_TYPE_SK_REUSEPORT 2018-08-11 01:58:46 +02:00
inet_diag.c
inet_fragment.c ip: use rb trees for IP frag queue. 2018-08-05 17:16:46 -07:00
inet_hashtables.c bpf: Introduce BPF_PROG_TYPE_SK_REUSEPORT 2018-08-11 01:58:46 +02:00
inet_timewait_sock.c
inetpeer.c
ip_forward.c net: ipv4: Control SKB reprioritization after forwarding 2018-08-01 09:52:30 -07:00
ip_fragment.c ipv4: frags: precedence bug in ip_expire() 2018-08-06 13:15:12 -07:00
ip_gre.c ip_gre: remove redundant variables t_hlen 2018-08-01 09:58:15 -07:00
ip_input.c net: ipv4: fix listify ip_rcv_finish in case of forwarding 2018-07-12 16:40:19 -07:00
ip_options.c
ip_output.c Merge ra.kernel.org:/pub/scm/linux/kernel/git/davem/net 2018-07-24 19:21:58 -07:00
ip_sockglue.c ip: in cmsg IP(V6)_ORIGDSTADDR call pskb_may_pull 2018-07-24 16:35:58 -07:00
ip_tunnel.c ip_tunnel: Fix name string concatenate in __ip_tunnel_create() 2018-06-07 16:27:16 -04:00
ip_tunnel_core.c
ip_vti.c
ipcomp.c
ipconfig.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next 2018-06-06 18:39:49 -07:00
ipip.c
ipmr.c net: ipmr: add support for passing full packet on wrong vif 2018-07-13 14:21:16 -07:00
ipmr_base.c rhashtable: split rhashtable.h 2018-06-22 13:43:27 +09:00
metrics.c net: metrics: add proper netlink validation 2018-06-05 12:29:43 -04:00
netfilter.c netfilter: utils: move nf_ip_checksum* from ipv4 to utils 2018-07-16 17:51:48 +02:00
netlink.c
ping.c net: add helpers checking if socket can be bound to nonlocal address 2018-08-01 09:50:04 -07:00
proc.c ip: discard IPv4 datagrams with overlapping segments. 2018-08-05 17:16:46 -07:00
protocol.c
raw.c ip: remove tx_flags from ipcm_cookie and use same logic for v4 and v6 2018-07-07 10:58:49 +09:00
raw_diag.c
route.c route: add support for directed broadcast forwarding 2018-07-29 12:37:06 -07:00
syncookies.c
sysctl_net_ipv4.c net: ipv4: Notify about changes to ip_forward_update_priority 2018-08-01 09:52:30 -07:00
tcp.c tcp: remove unneeded variable 'err' 2018-08-03 16:52:07 -07:00
tcp_bbr.c Merge ra.kernel.org:/pub/scm/linux/kernel/git/davem/net 2018-08-02 10:55:32 -07:00
tcp_bic.c
tcp_cdg.c
tcp_cong.c
tcp_cubic.c
tcp_dctcp.c tcp: do not delay ACK in DCTCP upon CE status change 2018-07-20 14:32:23 -07:00
tcp_diag.c
tcp_fastopen.c
tcp_highspeed.c
tcp_htcp.c
tcp_hybla.c
tcp_illinois.c
tcp_input.c Merge ra.kernel.org:/pub/scm/linux/kernel/git/davem/net 2018-08-02 10:55:32 -07:00
tcp_ipv4.c Merge ra.kernel.org:/pub/scm/linux/kernel/git/torvalds/linux 2018-07-20 21:17:12 -07:00
tcp_lp.c
tcp_metrics.c
tcp_minisocks.c tcp: use monotonic timestamps for PAWS 2018-07-12 14:50:40 -07:00
tcp_nv.c
tcp_offload.c tcp: Don't coalesce decrypted and encrypted SKBs 2018-07-16 00:12:09 -07:00
tcp_output.c tcp: remove set but not used variable 'skb_size' 2018-08-01 09:57:09 -07:00
tcp_rate.c tcp: expose both send and receive intervals for rate sample 2018-07-11 23:01:56 -07:00
tcp_recovery.c tcp: add stat of data packet reordering events 2018-08-01 09:56:10 -07:00
tcp_scalable.c
tcp_timer.c tcp: make function tcp_retransmit_stamp() static 2018-07-25 16:35:45 -07:00
tcp_ulp.c
tcp_vegas.c
tcp_vegas.h
tcp_veno.c
tcp_westwood.c
tcp_yeah.c
tunnel4.c
udp.c bpf: Introduce BPF_PROG_TYPE_SK_REUSEPORT 2018-08-11 01:58:46 +02:00
udp_diag.c udp: fix rx queue len reported by diag and proc interface 2018-06-08 19:55:15 -04:00
udp_impl.h
udp_offload.c Merge ra.kernel.org:/pub/scm/linux/kernel/git/davem/net 2018-07-03 10:29:26 +09:00
udp_tunnel.c
udplite.c
xfrm4_input.c
xfrm4_mode_beet.c
xfrm4_mode_transport.c
xfrm4_mode_tunnel.c
xfrm4_output.c
xfrm4_policy.c
xfrm4_protocol.c
xfrm4_state.c
xfrm4_tunnel.c