linux-kernelorg-stable/include/uapi/linux
Martin KaFai Lau 85d33df357 bpf: Introduce BPF_MAP_TYPE_STRUCT_OPS
The patch introduces BPF_MAP_TYPE_STRUCT_OPS.  The map value
is a kernel struct with its func ptr implemented in bpf prog.
This new map is the interface to register/unregister/introspect
a bpf implemented kernel struct.

The kernel struct is actually embedded inside another new struct
(or called the "value" struct in the code).  For example,
"struct tcp_congestion_ops" is embbeded in:
struct bpf_struct_ops_tcp_congestion_ops {
	refcount_t refcnt;
	enum bpf_struct_ops_state state;
	struct tcp_congestion_ops data;  /* <-- kernel subsystem struct here */
}
The map value is "struct bpf_struct_ops_tcp_congestion_ops".
The "bpftool map dump" will then be able to show the
state ("inuse"/"tobefree") and the number of subsystem's refcnt (e.g.
number of tcp_sock in the tcp_congestion_ops case).  This "value" struct
is created automatically by a macro.  Having a separate "value" struct
will also make extending "struct bpf_struct_ops_XYZ" easier (e.g. adding
"void (*init)(void)" to "struct bpf_struct_ops_XYZ" to do some
initialization works before registering the struct_ops to the kernel
subsystem).  The libbpf will take care of finding and populating the
"struct bpf_struct_ops_XYZ" from "struct XYZ".

Register a struct_ops to a kernel subsystem:
1. Load all needed BPF_PROG_TYPE_STRUCT_OPS prog(s)
2. Create a BPF_MAP_TYPE_STRUCT_OPS with attr->btf_vmlinux_value_type_id
   set to the btf id "struct bpf_struct_ops_tcp_congestion_ops" of the
   running kernel.
   Instead of reusing the attr->btf_value_type_id,
   btf_vmlinux_value_type_id s added such that attr->btf_fd can still be
   used as the "user" btf which could store other useful sysadmin/debug
   info that may be introduced in the furture,
   e.g. creation-date/compiler-details/map-creator...etc.
3. Create a "struct bpf_struct_ops_tcp_congestion_ops" object as described
   in the running kernel btf.  Populate the value of this object.
   The function ptr should be populated with the prog fds.
4. Call BPF_MAP_UPDATE with the object created in (3) as
   the map value.  The key is always "0".

During BPF_MAP_UPDATE, the code that saves the kernel-func-ptr's
args as an array of u64 is generated.  BPF_MAP_UPDATE also allows
the specific struct_ops to do some final checks in "st_ops->init_member()"
(e.g. ensure all mandatory func ptrs are implemented).
If everything looks good, it will register this kernel struct
to the kernel subsystem.  The map will not allow further update
from this point.

Unregister a struct_ops from the kernel subsystem:
BPF_MAP_DELETE with key "0".

Introspect a struct_ops:
BPF_MAP_LOOKUP_ELEM with key "0".  The map value returned will
have the prog _id_ populated as the func ptr.

The map value state (enum bpf_struct_ops_state) will transit from:
INIT (map created) =>
INUSE (map updated, i.e. reg) =>
TOBEFREE (map value deleted, i.e. unreg)

The kernel subsystem needs to call bpf_struct_ops_get() and
bpf_struct_ops_put() to manage the "refcnt" in the
"struct bpf_struct_ops_XYZ".  This patch uses a separate refcnt
for the purose of tracking the subsystem usage.  Another approach
is to reuse the map->refcnt and then "show" (i.e. during map_lookup)
the subsystem's usage by doing map->refcnt - map->usercnt to filter out
the map-fd/pinned-map usage.  However, that will also tie down the
future semantics of map->refcnt and map->usercnt.

The very first subsystem's refcnt (during reg()) holds one
count to map->refcnt.  When the very last subsystem's refcnt
is gone, it will also release the map->refcnt.  All bpf_prog will be
freed when the map->refcnt reaches 0 (i.e. during map_free()).

Here is how the bpftool map command will look like:
[root@arch-fb-vm1 bpf]# bpftool map show
6: struct_ops  name dctcp  flags 0x0
	key 4B  value 256B  max_entries 1  memlock 4096B
	btf_id 6
[root@arch-fb-vm1 bpf]# bpftool map dump id 6
[{
        "value": {
            "refcnt": {
                "refs": {
                    "counter": 1
                }
            },
            "state": 1,
            "data": {
                "list": {
                    "next": 0,
                    "prev": 0
                },
                "key": 0,
                "flags": 2,
                "init": 24,
                "release": 0,
                "ssthresh": 25,
                "cong_avoid": 30,
                "set_state": 27,
                "cwnd_event": 28,
                "in_ack_event": 26,
                "undo_cwnd": 29,
                "pkts_acked": 0,
                "min_tso_segs": 0,
                "sndbuf_expand": 0,
                "cong_control": 0,
                "get_info": 0,
                "name": [98,112,102,95,100,99,116,99,112,0,0,0,0,0,0,0
                ],
                "owner": 0
            }
        }
    }
]

Misc Notes:
* bpf_struct_ops_map_sys_lookup_elem() is added for syscall lookup.
  It does an inplace update on "*value" instead returning a pointer
  to syscall.c.  Otherwise, it needs a separate copy of "zero" value
  for the BPF_STRUCT_OPS_STATE_INIT to avoid races.

* The bpf_struct_ops_map_delete_elem() is also called without
  preempt_disable() from map_delete_elem().  It is because
  the "->unreg()" may requires sleepable context, e.g.
  the "tcp_unregister_congestion_control()".

* "const" is added to some of the existing "struct btf_func_model *"
  function arg to avoid a compiler warning caused by this patch.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20200109003505.3855919-1-kafai@fb.com
2020-01-09 08:46:18 -08:00
..
android
byteorder
caif
can can: don't use deprecated license identifiers 2019-11-05 12:44:34 +01:00
cifs
dvb
genwqe
hdlc
hsi
iio
isdn
mmc
netfilter Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next 2019-12-30 14:22:11 -08:00
netfilter_arp net, uapi: fix -Wpointer-arith warnings 2019-10-04 14:25:17 -07:00
netfilter_bridge net, uapi: fix -Wpointer-arith warnings 2019-10-04 14:25:17 -07:00
netfilter_ipv4 net, uapi: fix -Wpointer-arith warnings 2019-10-04 14:25:17 -07:00
netfilter_ipv6 net, uapi: fix -Wpointer-arith warnings 2019-10-04 14:25:17 -07:00
nfsd
raid
sched
spi
sunrpc
tc_act net: sched: add erspan option support to act_tunnel_key 2019-11-21 11:44:06 -08:00
tc_ematch
usb
wimax
a.out.h
acct.h
adb.h
adfs_fs.h
affs_hardblocks.h
agpgart.h
aio_abi.h
am437x-vpfe.h
apm_bios.h
arcfb.h
arm_sdei.h
aspeed-lpc-ctrl.h
aspeed-p2a-ctrl.h
atalk.h
atm.h
atm_eni.h
atm_he.h
atm_idt77105.h
atm_nicstar.h
atm_tcp.h
atm_zatm.h
atmapi.h
atmarp.h
atmbr2684.h
atmclip.h
atmdev.h
atmioc.h
atmlec.h
atmmpc.h
atmppp.h
atmsap.h
atmsvc.h
audit.h bpf: Emit audit messages upon successful prog load and unload 2019-12-11 17:41:09 +01:00
auto_dev-ioctl.h
auto_fs.h
auto_fs4.h
auxvec.h
ax25.h
b1lli.h
batadv_packet.h
batman_adv.h
baycom.h
bcache.h
bcm933xx_hcs.h
bfs_fs.h
binfmts.h
blkpg.h
blktrace_api.h
blkzoned.h block: add zone open, close and finish ioctl support 2019-11-07 06:31:50 -07:00
bpf.h bpf: Introduce BPF_MAP_TYPE_STRUCT_OPS 2020-01-09 08:46:18 -08:00
bpf_common.h
bpf_perf_event.h
bpfilter.h
bpqether.h
bsg.h
bt-bmc.h
btf.h libbpf: Support libbpf-provided extern variables 2019-12-15 16:41:12 -08:00
btrfs.h btrfs: add incompat for raid1 with 3, 4 copies 2019-11-18 17:51:49 +01:00
btrfs_tree.h btrfs: add support for 4-copy replication (raid1c4) 2019-11-18 17:51:49 +01:00
can.h can: don't use deprecated license identifiers 2019-11-05 12:44:34 +01:00
capability.h
capi.h
cciss_defs.h
cciss_ioctl.h
cdrom.h
cec-funcs.h media: cec-funcs.h: use new CEC_OP_UI_CMD defines 2019-10-07 07:55:17 -03:00
cec.h media: cec: expose the new connector info API 2019-10-01 17:19:41 -03:00
cgroupstats.h
chio.h scsi: ch: add include guard to chio.h 2019-10-09 22:31:14 -04:00
cm4000_cs.h
cn_proc.h
coda.h
coff.h linux/coff.h: add include guard 2019-09-25 17:51:39 -07:00
connector.h
const.h
coresight-stm.h
cramfs_fs.h
cryptouser.h
cuda.h
cyclades.h y2038: uapi: change __kernel_time_t to __kernel_old_time_t 2019-11-15 14:38:29 +01:00
cycx_cfm.h
dcbnl.h net: Fix misspellings of "configure" and "configuration" 2019-10-28 13:41:01 -07:00
dccp.h
devlink.h Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2019-11-16 21:51:42 -08:00
dlm.h
dlm_device.h
dlm_netlink.h
dlm_plock.h
dlmconstants.h
dm-ioctl.h
dm-log-userspace.h
dma-buf.h
dn.h
dns_resolver.h
dqblk_xfs.h
edd.h
efs_fs_sb.h
elf-em.h
elf-fdpic.h
elf.h
elfcore.h y2038: elfcore: Use __kernel_old_timeval for process times 2019-11-15 14:38:29 +01:00
errno.h
errqueue.h y2038: socket: remove timespec reference in timestamping 2019-11-15 14:38:29 +01:00
erspan.h
ethtool.h ethtool: provide string sets with STRSET_GET request 2019-12-27 16:40:01 -08:00
ethtool_netlink.h ethtool: provide link state with LINKSTATE_GET request 2019-12-27 16:40:02 -08:00
eventpoll.h
fadvise.h
falloc.h
fanotify.h
fb.h
fcntl.h fcntl: fix typo in RWH_WRITE_LIFE_NOT_SET r/w hint name 2019-10-25 14:28:10 -06:00
fd.h
fdreg.h
fib_rules.h
fiemap.h
filter.h
firewire-cdev.h
firewire-constants.h
fou.h
fpga-dfl.h
fs.h
fscrypt.h fscrypt: add support for IV_INO_LBLK_64 policies 2019-11-06 12:34:36 -08:00
fsi.h
fsl_hypervisor.h
fsmap.h
fsverity.h
fuse.h fuse: Add changelog entries for protocols 7.1 - 7.8 2019-10-23 14:26:37 +02:00
futex.h
gameport.h
gen_stats.h net_sched: add TCA_STATS_PKT64 attribute 2019-11-05 18:20:55 -08:00
genetlink.h
gfs2_ondisk.h
gigaset_dev.h
gpio.h gpio: add new SET_CONFIG ioctl() to gpio chardev 2019-11-12 16:30:31 +01:00
gsmmux.h
gtp.h
hash_info.h
hdlc.h
hdlcdrv.h
hdreg.h
hid.h
hiddev.h
hidraw.h
hpet.h
hsr_netlink.h
hw_breakpoint.h
hyperv.h
hysdn_if.h
i2c-dev.h
i2c.h
i2o-dev.h
i8k.h
icmp.h
icmpv6.h
if.h net: rtnetlink: add linkprop commands to add and delete alternative ifnames 2019-10-01 14:47:19 -07:00
if_addr.h
if_addrlabel.h
if_alg.h
if_arcnet.h
if_arp.h
if_bonding.h bonding: rename AD_STATE_* to LACP_STATE_* 2019-12-26 13:09:37 -08:00
if_bridge.h net: bridge: add STP xstats 2019-12-14 20:02:36 -08:00
if_cablemodem.h
if_eql.h
if_ether.h
if_fc.h
if_fddi.h
if_frad.h
if_hippi.h
if_infiniband.h
if_link.h rtnetlink: provide permanent hardware address in RTM_NEWLINK 2019-12-12 17:07:05 -08:00
if_ltalk.h
if_macsec.h
if_packet.h
if_phonet.h
if_plip.h
if_ppp.h
if_pppol2tp.h
if_pppox.h
if_slip.h
if_team.h
if_tun.h
if_tunnel.h
if_vlan.h
if_x25.h
if_xdp.h
ife.h
igmp.h
ila.h
in.h
in6.h
in_route.h
inet_diag.h
inotify.h
input-event-codes.h Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input 2019-12-07 18:33:01 -08:00
input.h
io_uring.h io_uring: ensure we return -EINVAL on unknown opcode 2019-12-11 16:02:32 -07:00
ioctl.h
iommu.h iommu: Introduce guest PASID bind function 2019-10-15 13:34:43 +02:00
ip.h
ip6_tunnel.h
ip_vs.h
ipc.h
ipmi.h
ipmi_bmc.h
ipmi_msgdefs.h
ipsec.h
ipv6.h
ipv6_route.h
ipx.h
irqnr.h
iso_fs.h
isst_if.h
ivtv.h
ivtvfb.h
jffs2.h
joystick.h
kcm.h
kcmp.h
kcov.h kcov: remote coverage support 2019-12-04 19:44:14 -08:00
kd.h
kdev_t.h
kernel-page-flags.h
kernel.h
kernelcapi.h
kexec.h
keyboard.h
keyctl.h
kfd_ioctl.h
kvm.h KVM: PPC: Book3S HV: Support reset of secure guest 2019-11-28 17:02:31 +11:00
kvm_para.h
l2tp.h
libc-compat.h
lightnvm.h
limits.h
lirc.h
llc.h
loop.h
lp.h
lwtunnel.h lwtunnel: add options setting and dumping for erspan 2019-11-06 21:14:22 -08:00
magic.h powerpc/pseries/cmm: Implement balloon compaction 2019-11-13 16:58:01 +11:00
major.h
map_to_7segment.h
matroxfb.h
max2175.h
mdio.h
media-bus-format.h
media.h
mei.h
membarrier.h
memfd.h
mempolicy.h
meye.h
mic_common.h
mic_ioctl.h
mii.h mii: Add helpers for parsing SGMII auto-negotiation 2020-01-05 23:22:32 -08:00
minix_fs.h
mman.h
mmtimer.h
module.h
mount.h
mpls.h
mpls_iptunnel.h
mqueue.h
mroute.h
mroute6.h
msdos_fs.h
msg.h y2038: uapi: change __kernel_time_t to __kernel_old_time_t 2019-11-15 14:38:29 +01:00
mtio.h
n_r3964.h
nbd-netlink.h
nbd.h
ncsi.h
ndctl.h
neighbour.h
net.h
net_dropmon.h
net_namespace.h
net_tstamp.h net: Introduce peer to peer one step PTP time stamping. 2019-12-25 19:51:34 -08:00
netconf.h
netdevice.h
netfilter.h
netfilter_arp.h
netfilter_bridge.h
netfilter_decnet.h
netfilter_ipv4.h
netfilter_ipv6.h
netlink.h
netlink_diag.h
netrom.h
nexthop.h
nfc.h
nfs.h
nfs2.h
nfs3.h
nfs4.h
nfs4_mount.h
nfs_fs.h
nfs_idmap.h
nfs_mount.h
nfsacl.h
nilfs2_api.h
nilfs2_ondisk.h
nl80211.h mac80211: Turn AQL into an NL80211_EXT_FEATURE 2019-12-13 10:34:04 +01:00
nsfs.h
nubus.h
nvme_ioctl.h nvme: change nvme_passthru_cmd64 to explicitly mark rsvd 2019-11-06 06:17:38 +09:00
nvram.h
omap3isp.h
omapfb.h
oom.h
openvswitch.h openvswitch: New MPLS actions for layer 2 tunnelling 2019-12-24 22:24:45 -08:00
packet_diag.h
param.h
parport.h
patchkey.h
pci.h
pci_regs.h Merge branch 'pci/resource' 2019-11-28 08:54:36 -06:00
pcitest.h
perf_event.h perf/aux: Allow using AUX data in perf samples 2019-11-13 11:06:14 +01:00
personality.h
pfkeyv2.h
pg.h block: pg: add header include guard 2019-10-02 20:32:27 -06:00
phantom.h
phonet.h
pkt_cls.h net: sched: allow flower to match erspan options 2019-11-21 11:44:06 -08:00
pkt_sched.h net: sch_ets: Add a new Qdisc 2019-12-18 13:32:29 -08:00
pktcdvd.h
pmu.h
poll.h
posix_acl.h
posix_acl_xattr.h
posix_types.h
ppdev.h
ppp-comp.h
ppp-ioctl.h compat_ioctl: handle PPPIOCGIDLE for 64-bit time_t 2019-10-23 17:23:47 +02:00
ppp_defs.h y2038: syscall implementation cleanups 2019-12-01 14:00:59 -08:00
pps.h
pr.h
prctl.h
psample.h
psci.h
psp-sev.h crypto: ccp - Retry SEV INIT command in case of integrity check failure. 2019-10-26 02:09:58 +11:00
ptp_clock.h ptp: Introduce strict checking of external time stamp options. 2019-11-15 12:48:32 -08:00
ptrace.h
qemu_fw_cfg.h
qnx4_fs.h
qnxtypes.h
qrtr.h
quota.h
radeonfb.h
random.h
raw.h
rds.h
reboot.h
reiserfs_fs.h
reiserfs_xattr.h
resource.h y2038: rusage: use __kernel_old_timeval 2019-11-15 14:38:29 +01:00
rfkill.h
rio_cm_cdev.h
rio_mport_cdev.h
romfs_fs.h
rose.h
route.h
rpmsg.h
rseq.h
rtc.h
rtnetlink.h net: rtnetlink: add linkprop commands to add and delete alternative ifnames 2019-10-01 14:47:19 -07:00
rxrpc.h
scc.h linux/scc.h: make uapi linux/scc.h self-contained 2019-12-04 19:44:12 -08:00
sched.h threads-v5.5 2019-11-25 18:36:49 -08:00
scif_ioctl.h
screen_info.h
sctp.h sctp: add SCTP_PEER_ADDR_THLDS_V2 sockopt 2019-11-08 14:18:32 -08:00
sdla.h
seccomp.h seccomp: rework define for SECCOMP_USER_NOTIF_FLAG_CONTINUE 2019-10-28 12:29:46 -07:00
securebits.h
sed-opal.h block: sed-opal: Add support to read/write opal tables generically 2019-11-04 07:11:31 -07:00
seg6.h
seg6_genl.h
seg6_hmac.h
seg6_iptunnel.h
seg6_local.h
selinux_netlink.h
sem.h y2038: uapi: change __kernel_time_t to __kernel_old_time_t 2019-11-15 14:38:29 +01:00
serial.h
serial_core.h serial: fsl_linflexuart: Be consistent with the name 2019-10-16 06:11:24 -07:00
serial_reg.h
serio.h
shm.h y2038: uapi: change __kernel_time_t to __kernel_old_time_t 2019-11-15 14:38:29 +01:00
signal.h
signalfd.h
smc.h
smc_diag.h
smiapp.h
snmp.h net/tls: add TlsDeviceRxResync statistic 2019-10-05 16:29:00 -07:00
sock_diag.h
socket.h
sockios.h
sonet.h
sonypi.h
sound.h
soundcard.h
stat.h statx: define STATX_ATTR_VERITY 2019-11-13 12:15:34 -08:00
stddef.h
stm.h
string.h
suspend_ioctls.h
swab.h
switchtec_ioctl.h
sync_file.h
synclink.h
sysctl.h
sysinfo.h
target_core_user.h
taskstats.h
tcp.h net: Add device index to tcp_md5sig 2020-01-02 15:51:22 -08:00
tcp_metrics.h
tee.h
termios.h
thermal.h
time.h y2038: uapi: change __kernel_time_t to __kernel_old_time_t 2019-11-15 14:38:29 +01:00
time_types.h y2038: uapi: change __kernel_time_t to __kernel_old_time_t 2019-11-15 14:38:29 +01:00
timerfd.h
times.h
timex.h
tiocl.h
tipc.h tipc: add new AEAD key structure for user API 2019-11-08 14:01:59 -08:00
tipc_config.h net, uapi: fix -Wpointer-arith warnings 2019-10-04 14:25:17 -07:00
tipc_netlink.h tipc: make legacy address flag readable over netlink 2019-12-20 21:18:42 -08:00
tipc_sockets_diag.h
tls.h
toshiba.h
tty.h
tty_flags.h
types.h
udf_fs_i.h
udmabuf.h
udp.h
uhid.h
uinput.h
uio.h
uleds.h
ultrasound.h
un.h
unistd.h
unix_diag.h
usbdevice_fs.h
usbip.h
userfaultfd.h
userio.h
utime.h y2038: uapi: change __kernel_time_t to __kernel_old_time_t 2019-11-15 14:38:29 +01:00
utsname.h
uuid.h
uvcvideo.h
v4l2-common.h
v4l2-controls.h media: add V4L2_CID_UNIT_CELL_SIZE control 2019-10-10 11:37:26 -03:00
v4l2-dv-timings.h
v4l2-mediabus.h
v4l2-subdev.h
vbox_err.h
vbox_vmmdev_types.h
vboxguest.h
veth.h
vfio.h
vfio_ccw.h
vhost.h
vhost_types.h
videodev2.h media: v4l2_core: Add p_area to struct v4l2_ext_control 2019-11-08 07:42:25 +01:00
virtio_9p.h
virtio_balloon.h
virtio_blk.h
virtio_config.h
virtio_console.h
virtio_crypto.h
virtio_fs.h
virtio_gpu.h
virtio_ids.h
virtio_input.h
virtio_iommu.h
virtio_mmio.h
virtio_net.h
virtio_pci.h
virtio_pmem.h
virtio_ring.h net, uapi: fix -Wpointer-arith warnings 2019-10-04 14:25:17 -07:00
virtio_rng.h
virtio_scsi.h
virtio_types.h
virtio_vsock.h
vm_sockets.h vsock: add VMADDR_CID_LOCAL definition 2019-12-11 15:01:23 -08:00
vm_sockets_diag.h
vmcore.h
vsockmon.h
vt.h
vtpm_proxy.h
wait.h
watchdog.h
wimax.h
wireguard.h wireguard: global: fix spelling mistakes in comments 2019-12-16 19:22:22 -08:00
wireless.h
wmi.h
x25.h
xattr.h
xdp_diag.h
xfrm.h
xilinx-v4l2-controls.h
zorro.h
zorro_ids.h