Centos-kernel-stream-9/kernel/bpf/map_in_map.c

151 lines
4.0 KiB
C
Raw Normal View History

// SPDX-License-Identifier: GPL-2.0-only
bpf: Add array of maps support This patch adds a few helper funcs to enable map-in-map support (i.e. outer_map->inner_map). The first outer_map type BPF_MAP_TYPE_ARRAY_OF_MAPS is also added in this patch. The next patch will introduce a hash of maps type. Any bpf map type can be acted as an inner_map. The exception is BPF_MAP_TYPE_PROG_ARRAY because the extra level of indirection makes it harder to verify the owner_prog_type and owner_jited. Multi-level map-in-map is not supported (i.e. map->map is ok but not map->map->map). When adding an inner_map to an outer_map, it currently checks the map_type, key_size, value_size, map_flags, max_entries and ops. The verifier also uses those map's properties to do static analysis. map_flags is needed because we need to ensure BPF_PROG_TYPE_PERF_EVENT is using a preallocated hashtab for the inner_hash also. ops and max_entries are needed to generate inlined map-lookup instructions. For simplicity reason, a simple '==' test is used for both map_flags and max_entries. The equality of ops is implied by the equality of map_type. During outer_map creation time, an inner_map_fd is needed to create an outer_map. However, the inner_map_fd's life time does not depend on the outer_map. The inner_map_fd is merely used to initialize the inner_map_meta of the outer_map. Also, for the outer_map: * It allows element update and delete from syscall * It allows element lookup from bpf_prog The above is similar to the current fd_array pattern. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-22 17:00:33 +00:00
/* Copyright (c) 2017 Facebook
*/
#include <linux/slab.h>
#include <linux/bpf.h>
#include <linux/btf.h>
bpf: Add array of maps support This patch adds a few helper funcs to enable map-in-map support (i.e. outer_map->inner_map). The first outer_map type BPF_MAP_TYPE_ARRAY_OF_MAPS is also added in this patch. The next patch will introduce a hash of maps type. Any bpf map type can be acted as an inner_map. The exception is BPF_MAP_TYPE_PROG_ARRAY because the extra level of indirection makes it harder to verify the owner_prog_type and owner_jited. Multi-level map-in-map is not supported (i.e. map->map is ok but not map->map->map). When adding an inner_map to an outer_map, it currently checks the map_type, key_size, value_size, map_flags, max_entries and ops. The verifier also uses those map's properties to do static analysis. map_flags is needed because we need to ensure BPF_PROG_TYPE_PERF_EVENT is using a preallocated hashtab for the inner_hash also. ops and max_entries are needed to generate inlined map-lookup instructions. For simplicity reason, a simple '==' test is used for both map_flags and max_entries. The equality of ops is implied by the equality of map_type. During outer_map creation time, an inner_map_fd is needed to create an outer_map. However, the inner_map_fd's life time does not depend on the outer_map. The inner_map_fd is merely used to initialize the inner_map_meta of the outer_map. Also, for the outer_map: * It allows element update and delete from syscall * It allows element lookup from bpf_prog The above is similar to the current fd_array pattern. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-22 17:00:33 +00:00
#include "map_in_map.h"
struct bpf_map *bpf_map_meta_alloc(int inner_map_ufd)
{
struct bpf_map *inner_map, *inner_map_meta;
bpf: fix inner map masking to prevent oob under speculation During review I noticed that inner meta map setup for map in map is buggy in that it does not propagate all needed data from the reference map which the verifier is later accessing. In particular one such case is index masking to prevent out of bounds access under speculative execution due to missing the map's unpriv_array/index_mask field propagation. Fix this such that the verifier is generating the correct code for inlined lookups in case of unpriviledged use. Before patch (test_verifier's 'map in map access' dump): # bpftool prog dump xla id 3 0: (62) *(u32 *)(r10 -4) = 0 1: (bf) r2 = r10 2: (07) r2 += -4 3: (18) r1 = map[id:4] 5: (07) r1 += 272 | 6: (61) r0 = *(u32 *)(r2 +0) | 7: (35) if r0 >= 0x1 goto pc+6 | Inlined map in map lookup 8: (54) (u32) r0 &= (u32) 0 | with index masking for 9: (67) r0 <<= 3 | map->unpriv_array. 10: (0f) r0 += r1 | 11: (79) r0 = *(u64 *)(r0 +0) | 12: (15) if r0 == 0x0 goto pc+1 | 13: (05) goto pc+1 | 14: (b7) r0 = 0 | 15: (15) if r0 == 0x0 goto pc+11 16: (62) *(u32 *)(r10 -4) = 0 17: (bf) r2 = r10 18: (07) r2 += -4 19: (bf) r1 = r0 20: (07) r1 += 272 | 21: (61) r0 = *(u32 *)(r2 +0) | Index masking missing (!) 22: (35) if r0 >= 0x1 goto pc+3 | for inner map despite 23: (67) r0 <<= 3 | map->unpriv_array set. 24: (0f) r0 += r1 | 25: (05) goto pc+1 | 26: (b7) r0 = 0 | 27: (b7) r0 = 0 28: (95) exit After patch: # bpftool prog dump xla id 1 0: (62) *(u32 *)(r10 -4) = 0 1: (bf) r2 = r10 2: (07) r2 += -4 3: (18) r1 = map[id:2] 5: (07) r1 += 272 | 6: (61) r0 = *(u32 *)(r2 +0) | 7: (35) if r0 >= 0x1 goto pc+6 | Same inlined map in map lookup 8: (54) (u32) r0 &= (u32) 0 | with index masking due to 9: (67) r0 <<= 3 | map->unpriv_array. 10: (0f) r0 += r1 | 11: (79) r0 = *(u64 *)(r0 +0) | 12: (15) if r0 == 0x0 goto pc+1 | 13: (05) goto pc+1 | 14: (b7) r0 = 0 | 15: (15) if r0 == 0x0 goto pc+12 16: (62) *(u32 *)(r10 -4) = 0 17: (bf) r2 = r10 18: (07) r2 += -4 19: (bf) r1 = r0 20: (07) r1 += 272 | 21: (61) r0 = *(u32 *)(r2 +0) | 22: (35) if r0 >= 0x1 goto pc+4 | Now fixed inlined inner map 23: (54) (u32) r0 &= (u32) 0 | lookup with proper index masking 24: (67) r0 <<= 3 | for map->unpriv_array. 25: (0f) r0 += r1 | 26: (05) goto pc+1 | 27: (b7) r0 = 0 | 28: (b7) r0 = 0 29: (95) exit Fixes: b2157399cc98 ("bpf: prevent out-of-bounds speculation") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-01-17 15:34:45 +00:00
u32 inner_map_meta_size;
bpf: Add array of maps support This patch adds a few helper funcs to enable map-in-map support (i.e. outer_map->inner_map). The first outer_map type BPF_MAP_TYPE_ARRAY_OF_MAPS is also added in this patch. The next patch will introduce a hash of maps type. Any bpf map type can be acted as an inner_map. The exception is BPF_MAP_TYPE_PROG_ARRAY because the extra level of indirection makes it harder to verify the owner_prog_type and owner_jited. Multi-level map-in-map is not supported (i.e. map->map is ok but not map->map->map). When adding an inner_map to an outer_map, it currently checks the map_type, key_size, value_size, map_flags, max_entries and ops. The verifier also uses those map's properties to do static analysis. map_flags is needed because we need to ensure BPF_PROG_TYPE_PERF_EVENT is using a preallocated hashtab for the inner_hash also. ops and max_entries are needed to generate inlined map-lookup instructions. For simplicity reason, a simple '==' test is used for both map_flags and max_entries. The equality of ops is implied by the equality of map_type. During outer_map creation time, an inner_map_fd is needed to create an outer_map. However, the inner_map_fd's life time does not depend on the outer_map. The inner_map_fd is merely used to initialize the inner_map_meta of the outer_map. Also, for the outer_map: * It allows element update and delete from syscall * It allows element lookup from bpf_prog The above is similar to the current fd_array pattern. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-22 17:00:33 +00:00
struct fd f;
int ret;
bpf: Add array of maps support This patch adds a few helper funcs to enable map-in-map support (i.e. outer_map->inner_map). The first outer_map type BPF_MAP_TYPE_ARRAY_OF_MAPS is also added in this patch. The next patch will introduce a hash of maps type. Any bpf map type can be acted as an inner_map. The exception is BPF_MAP_TYPE_PROG_ARRAY because the extra level of indirection makes it harder to verify the owner_prog_type and owner_jited. Multi-level map-in-map is not supported (i.e. map->map is ok but not map->map->map). When adding an inner_map to an outer_map, it currently checks the map_type, key_size, value_size, map_flags, max_entries and ops. The verifier also uses those map's properties to do static analysis. map_flags is needed because we need to ensure BPF_PROG_TYPE_PERF_EVENT is using a preallocated hashtab for the inner_hash also. ops and max_entries are needed to generate inlined map-lookup instructions. For simplicity reason, a simple '==' test is used for both map_flags and max_entries. The equality of ops is implied by the equality of map_type. During outer_map creation time, an inner_map_fd is needed to create an outer_map. However, the inner_map_fd's life time does not depend on the outer_map. The inner_map_fd is merely used to initialize the inner_map_meta of the outer_map. Also, for the outer_map: * It allows element update and delete from syscall * It allows element lookup from bpf_prog The above is similar to the current fd_array pattern. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-22 17:00:33 +00:00
f = fdget(inner_map_ufd);
inner_map = __bpf_map_get(f);
if (IS_ERR(inner_map))
return inner_map;
/* Does not support >1 level map-in-map */
if (inner_map->inner_map_meta) {
ret = -EINVAL;
goto put;
bpf: Add array of maps support This patch adds a few helper funcs to enable map-in-map support (i.e. outer_map->inner_map). The first outer_map type BPF_MAP_TYPE_ARRAY_OF_MAPS is also added in this patch. The next patch will introduce a hash of maps type. Any bpf map type can be acted as an inner_map. The exception is BPF_MAP_TYPE_PROG_ARRAY because the extra level of indirection makes it harder to verify the owner_prog_type and owner_jited. Multi-level map-in-map is not supported (i.e. map->map is ok but not map->map->map). When adding an inner_map to an outer_map, it currently checks the map_type, key_size, value_size, map_flags, max_entries and ops. The verifier also uses those map's properties to do static analysis. map_flags is needed because we need to ensure BPF_PROG_TYPE_PERF_EVENT is using a preallocated hashtab for the inner_hash also. ops and max_entries are needed to generate inlined map-lookup instructions. For simplicity reason, a simple '==' test is used for both map_flags and max_entries. The equality of ops is implied by the equality of map_type. During outer_map creation time, an inner_map_fd is needed to create an outer_map. However, the inner_map_fd's life time does not depend on the outer_map. The inner_map_fd is merely used to initialize the inner_map_meta of the outer_map. Also, for the outer_map: * It allows element update and delete from syscall * It allows element lookup from bpf_prog The above is similar to the current fd_array pattern. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-22 17:00:33 +00:00
}
bpf: Add map_meta_equal map ops Some properties of the inner map is used in the verification time. When an inner map is inserted to an outer map at runtime, bpf_map_meta_equal() is currently used to ensure those properties of the inserting inner map stays the same as the verification time. In particular, the current bpf_map_meta_equal() checks max_entries which turns out to be too restrictive for most of the maps which do not use max_entries during the verification time. It limits the use case that wants to replace a smaller inner map with a larger inner map. There are some maps do use max_entries during verification though. For example, the map_gen_lookup in array_map_ops uses the max_entries to generate the inline lookup code. To accommodate differences between maps, the map_meta_equal is added to bpf_map_ops. Each map-type can decide what to check when its map is used as an inner map during runtime. Also, some map types cannot be used as an inner map and they are currently black listed in bpf_map_meta_alloc() in map_in_map.c. It is not unusual that the new map types may not aware that such blacklist exists. This patch enforces an explicit opt-in and only allows a map to be used as an inner map if it has implemented the map_meta_equal ops. It is based on the discussion in [1]. All maps that support inner map has its map_meta_equal points to bpf_map_meta_equal in this patch. A later patch will relax the max_entries check for most maps. bpf_types.h counts 28 map types. This patch adds 23 ".map_meta_equal" by using coccinelle. -5 for BPF_MAP_TYPE_PROG_ARRAY BPF_MAP_TYPE_(PERCPU)_CGROUP_STORAGE BPF_MAP_TYPE_STRUCT_OPS BPF_MAP_TYPE_ARRAY_OF_MAPS BPF_MAP_TYPE_HASH_OF_MAPS The "if (inner_map->inner_map_meta)" check in bpf_map_meta_alloc() is moved such that the same error is returned. [1]: https://lore.kernel.org/bpf/20200522022342.899756-1-kafai@fb.com/ Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20200828011806.1970400-1-kafai@fb.com
2020-08-28 01:18:06 +00:00
if (!inner_map->ops->map_meta_equal) {
ret = -ENOTSUPP;
goto put;
bpf: Add map_meta_equal map ops Some properties of the inner map is used in the verification time. When an inner map is inserted to an outer map at runtime, bpf_map_meta_equal() is currently used to ensure those properties of the inserting inner map stays the same as the verification time. In particular, the current bpf_map_meta_equal() checks max_entries which turns out to be too restrictive for most of the maps which do not use max_entries during the verification time. It limits the use case that wants to replace a smaller inner map with a larger inner map. There are some maps do use max_entries during verification though. For example, the map_gen_lookup in array_map_ops uses the max_entries to generate the inline lookup code. To accommodate differences between maps, the map_meta_equal is added to bpf_map_ops. Each map-type can decide what to check when its map is used as an inner map during runtime. Also, some map types cannot be used as an inner map and they are currently black listed in bpf_map_meta_alloc() in map_in_map.c. It is not unusual that the new map types may not aware that such blacklist exists. This patch enforces an explicit opt-in and only allows a map to be used as an inner map if it has implemented the map_meta_equal ops. It is based on the discussion in [1]. All maps that support inner map has its map_meta_equal points to bpf_map_meta_equal in this patch. A later patch will relax the max_entries check for most maps. bpf_types.h counts 28 map types. This patch adds 23 ".map_meta_equal" by using coccinelle. -5 for BPF_MAP_TYPE_PROG_ARRAY BPF_MAP_TYPE_(PERCPU)_CGROUP_STORAGE BPF_MAP_TYPE_STRUCT_OPS BPF_MAP_TYPE_ARRAY_OF_MAPS BPF_MAP_TYPE_HASH_OF_MAPS The "if (inner_map->inner_map_meta)" check in bpf_map_meta_alloc() is moved such that the same error is returned. [1]: https://lore.kernel.org/bpf/20200522022342.899756-1-kafai@fb.com/ Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20200828011806.1970400-1-kafai@fb.com
2020-08-28 01:18:06 +00:00
}
bpf: fix inner map masking to prevent oob under speculation During review I noticed that inner meta map setup for map in map is buggy in that it does not propagate all needed data from the reference map which the verifier is later accessing. In particular one such case is index masking to prevent out of bounds access under speculative execution due to missing the map's unpriv_array/index_mask field propagation. Fix this such that the verifier is generating the correct code for inlined lookups in case of unpriviledged use. Before patch (test_verifier's 'map in map access' dump): # bpftool prog dump xla id 3 0: (62) *(u32 *)(r10 -4) = 0 1: (bf) r2 = r10 2: (07) r2 += -4 3: (18) r1 = map[id:4] 5: (07) r1 += 272 | 6: (61) r0 = *(u32 *)(r2 +0) | 7: (35) if r0 >= 0x1 goto pc+6 | Inlined map in map lookup 8: (54) (u32) r0 &= (u32) 0 | with index masking for 9: (67) r0 <<= 3 | map->unpriv_array. 10: (0f) r0 += r1 | 11: (79) r0 = *(u64 *)(r0 +0) | 12: (15) if r0 == 0x0 goto pc+1 | 13: (05) goto pc+1 | 14: (b7) r0 = 0 | 15: (15) if r0 == 0x0 goto pc+11 16: (62) *(u32 *)(r10 -4) = 0 17: (bf) r2 = r10 18: (07) r2 += -4 19: (bf) r1 = r0 20: (07) r1 += 272 | 21: (61) r0 = *(u32 *)(r2 +0) | Index masking missing (!) 22: (35) if r0 >= 0x1 goto pc+3 | for inner map despite 23: (67) r0 <<= 3 | map->unpriv_array set. 24: (0f) r0 += r1 | 25: (05) goto pc+1 | 26: (b7) r0 = 0 | 27: (b7) r0 = 0 28: (95) exit After patch: # bpftool prog dump xla id 1 0: (62) *(u32 *)(r10 -4) = 0 1: (bf) r2 = r10 2: (07) r2 += -4 3: (18) r1 = map[id:2] 5: (07) r1 += 272 | 6: (61) r0 = *(u32 *)(r2 +0) | 7: (35) if r0 >= 0x1 goto pc+6 | Same inlined map in map lookup 8: (54) (u32) r0 &= (u32) 0 | with index masking due to 9: (67) r0 <<= 3 | map->unpriv_array. 10: (0f) r0 += r1 | 11: (79) r0 = *(u64 *)(r0 +0) | 12: (15) if r0 == 0x0 goto pc+1 | 13: (05) goto pc+1 | 14: (b7) r0 = 0 | 15: (15) if r0 == 0x0 goto pc+12 16: (62) *(u32 *)(r10 -4) = 0 17: (bf) r2 = r10 18: (07) r2 += -4 19: (bf) r1 = r0 20: (07) r1 += 272 | 21: (61) r0 = *(u32 *)(r2 +0) | 22: (35) if r0 >= 0x1 goto pc+4 | Now fixed inlined inner map 23: (54) (u32) r0 &= (u32) 0 | lookup with proper index masking 24: (67) r0 <<= 3 | for map->unpriv_array. 25: (0f) r0 += r1 | 26: (05) goto pc+1 | 27: (b7) r0 = 0 | 28: (b7) r0 = 0 29: (95) exit Fixes: b2157399cc98 ("bpf: prevent out-of-bounds speculation") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-01-17 15:34:45 +00:00
inner_map_meta_size = sizeof(*inner_map_meta);
/* In some cases verifier needs to access beyond just base map. */
if (inner_map->ops == &array_map_ops || inner_map->ops == &percpu_array_map_ops)
bpf: fix inner map masking to prevent oob under speculation During review I noticed that inner meta map setup for map in map is buggy in that it does not propagate all needed data from the reference map which the verifier is later accessing. In particular one such case is index masking to prevent out of bounds access under speculative execution due to missing the map's unpriv_array/index_mask field propagation. Fix this such that the verifier is generating the correct code for inlined lookups in case of unpriviledged use. Before patch (test_verifier's 'map in map access' dump): # bpftool prog dump xla id 3 0: (62) *(u32 *)(r10 -4) = 0 1: (bf) r2 = r10 2: (07) r2 += -4 3: (18) r1 = map[id:4] 5: (07) r1 += 272 | 6: (61) r0 = *(u32 *)(r2 +0) | 7: (35) if r0 >= 0x1 goto pc+6 | Inlined map in map lookup 8: (54) (u32) r0 &= (u32) 0 | with index masking for 9: (67) r0 <<= 3 | map->unpriv_array. 10: (0f) r0 += r1 | 11: (79) r0 = *(u64 *)(r0 +0) | 12: (15) if r0 == 0x0 goto pc+1 | 13: (05) goto pc+1 | 14: (b7) r0 = 0 | 15: (15) if r0 == 0x0 goto pc+11 16: (62) *(u32 *)(r10 -4) = 0 17: (bf) r2 = r10 18: (07) r2 += -4 19: (bf) r1 = r0 20: (07) r1 += 272 | 21: (61) r0 = *(u32 *)(r2 +0) | Index masking missing (!) 22: (35) if r0 >= 0x1 goto pc+3 | for inner map despite 23: (67) r0 <<= 3 | map->unpriv_array set. 24: (0f) r0 += r1 | 25: (05) goto pc+1 | 26: (b7) r0 = 0 | 27: (b7) r0 = 0 28: (95) exit After patch: # bpftool prog dump xla id 1 0: (62) *(u32 *)(r10 -4) = 0 1: (bf) r2 = r10 2: (07) r2 += -4 3: (18) r1 = map[id:2] 5: (07) r1 += 272 | 6: (61) r0 = *(u32 *)(r2 +0) | 7: (35) if r0 >= 0x1 goto pc+6 | Same inlined map in map lookup 8: (54) (u32) r0 &= (u32) 0 | with index masking due to 9: (67) r0 <<= 3 | map->unpriv_array. 10: (0f) r0 += r1 | 11: (79) r0 = *(u64 *)(r0 +0) | 12: (15) if r0 == 0x0 goto pc+1 | 13: (05) goto pc+1 | 14: (b7) r0 = 0 | 15: (15) if r0 == 0x0 goto pc+12 16: (62) *(u32 *)(r10 -4) = 0 17: (bf) r2 = r10 18: (07) r2 += -4 19: (bf) r1 = r0 20: (07) r1 += 272 | 21: (61) r0 = *(u32 *)(r2 +0) | 22: (35) if r0 >= 0x1 goto pc+4 | Now fixed inlined inner map 23: (54) (u32) r0 &= (u32) 0 | lookup with proper index masking 24: (67) r0 <<= 3 | for map->unpriv_array. 25: (0f) r0 += r1 | 26: (05) goto pc+1 | 27: (b7) r0 = 0 | 28: (b7) r0 = 0 29: (95) exit Fixes: b2157399cc98 ("bpf: prevent out-of-bounds speculation") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-01-17 15:34:45 +00:00
inner_map_meta_size = sizeof(struct bpf_array);
inner_map_meta = kzalloc(inner_map_meta_size, GFP_USER);
bpf: Add array of maps support This patch adds a few helper funcs to enable map-in-map support (i.e. outer_map->inner_map). The first outer_map type BPF_MAP_TYPE_ARRAY_OF_MAPS is also added in this patch. The next patch will introduce a hash of maps type. Any bpf map type can be acted as an inner_map. The exception is BPF_MAP_TYPE_PROG_ARRAY because the extra level of indirection makes it harder to verify the owner_prog_type and owner_jited. Multi-level map-in-map is not supported (i.e. map->map is ok but not map->map->map). When adding an inner_map to an outer_map, it currently checks the map_type, key_size, value_size, map_flags, max_entries and ops. The verifier also uses those map's properties to do static analysis. map_flags is needed because we need to ensure BPF_PROG_TYPE_PERF_EVENT is using a preallocated hashtab for the inner_hash also. ops and max_entries are needed to generate inlined map-lookup instructions. For simplicity reason, a simple '==' test is used for both map_flags and max_entries. The equality of ops is implied by the equality of map_type. During outer_map creation time, an inner_map_fd is needed to create an outer_map. However, the inner_map_fd's life time does not depend on the outer_map. The inner_map_fd is merely used to initialize the inner_map_meta of the outer_map. Also, for the outer_map: * It allows element update and delete from syscall * It allows element lookup from bpf_prog The above is similar to the current fd_array pattern. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-22 17:00:33 +00:00
if (!inner_map_meta) {
ret = -ENOMEM;
goto put;
bpf: Add array of maps support This patch adds a few helper funcs to enable map-in-map support (i.e. outer_map->inner_map). The first outer_map type BPF_MAP_TYPE_ARRAY_OF_MAPS is also added in this patch. The next patch will introduce a hash of maps type. Any bpf map type can be acted as an inner_map. The exception is BPF_MAP_TYPE_PROG_ARRAY because the extra level of indirection makes it harder to verify the owner_prog_type and owner_jited. Multi-level map-in-map is not supported (i.e. map->map is ok but not map->map->map). When adding an inner_map to an outer_map, it currently checks the map_type, key_size, value_size, map_flags, max_entries and ops. The verifier also uses those map's properties to do static analysis. map_flags is needed because we need to ensure BPF_PROG_TYPE_PERF_EVENT is using a preallocated hashtab for the inner_hash also. ops and max_entries are needed to generate inlined map-lookup instructions. For simplicity reason, a simple '==' test is used for both map_flags and max_entries. The equality of ops is implied by the equality of map_type. During outer_map creation time, an inner_map_fd is needed to create an outer_map. However, the inner_map_fd's life time does not depend on the outer_map. The inner_map_fd is merely used to initialize the inner_map_meta of the outer_map. Also, for the outer_map: * It allows element update and delete from syscall * It allows element lookup from bpf_prog The above is similar to the current fd_array pattern. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-22 17:00:33 +00:00
}
inner_map_meta->map_type = inner_map->map_type;
inner_map_meta->key_size = inner_map->key_size;
inner_map_meta->value_size = inner_map->value_size;
inner_map_meta->map_flags = inner_map->map_flags;
inner_map_meta->max_entries = inner_map->max_entries;
bpf: Refactor kptr_off_tab into btf_record Bugzilla: https://bugzilla.redhat.com/2177177 Conflicts: - Context change from already backported commit 997849c4b969 ("bpf: Zeroing allocated object from slab in bpf memory allocator") - Minor changes from already backported commit 1f6e04a1c7b8 ("bpf: Fix offset calculation error in __copy_map_value and zero_map_value") commit aa3496accc412b3d975e4ee5d06076d73394d8b5 Author: Kumar Kartikeya Dwivedi <memxor@gmail.com> Date: Fri Nov 4 00:39:55 2022 +0530 bpf: Refactor kptr_off_tab into btf_record To prepare the BPF verifier to handle special fields in both map values and program allocated types coming from program BTF, we need to refactor the kptr_off_tab handling code into something more generic and reusable across both cases to avoid code duplication. Later patches also require passing this data to helpers at runtime, so that they can work on user defined types, initialize them, destruct them, etc. The main observation is that both map values and such allocated types point to a type in program BTF, hence they can be handled similarly. We can prepare a field metadata table for both cases and store them in struct bpf_map or struct btf depending on the use case. Hence, refactor the code into generic btf_record and btf_field member structs. The btf_record represents the fields of a specific btf_type in user BTF. The cnt indicates the number of special fields we successfully recognized, and field_mask is a bitmask of fields that were found, to enable quick determination of availability of a certain field. Subsequently, refactor the rest of the code to work with these generic types, remove assumptions about kptr and kptr_off_tab, rename variables to more meaningful names, etc. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20221103191013.1236066-7-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
2023-03-15 09:14:48 +00:00
inner_map_meta->record = btf_record_dup(inner_map->record);
if (IS_ERR(inner_map_meta->record)) {
/* btf_record_dup returns NULL or valid pointer in case of
* invalid/empty/valid, but ERR_PTR in case of errors. During
* equality NULL or IS_ERR is equivalent.
*/
ret = PTR_ERR(inner_map_meta->record);
goto free;
}
bpf: Add comments for map BTF matching requirement for bpf_list_head Bugzilla: https://bugzilla.redhat.com/2177177 commit c22dfdd21592c5d56b49d5fba8de300ad7bf293c Author: Kumar Kartikeya Dwivedi <memxor@gmail.com> Date: Fri Nov 18 07:26:08 2022 +0530 bpf: Add comments for map BTF matching requirement for bpf_list_head The old behavior of bpf_map_meta_equal was that it compared timer_off to be equal (but not spin_lock_off, because that was not allowed), and did memcmp of kptr_off_tab. Now, we memcmp the btf_record of two bpf_map structs, which has all fields. We preserve backwards compat as we kzalloc the array, so if only spin lock and timer exist in map, we only compare offset while the rest of unused members in the btf_field struct are zeroed out. In case of kptr, btf and everything else is of vmlinux or module, so as long type is same it will match, since kernel btf, module, dtor pointer will be same across maps. Now with list_head in the mix, things are a bit complicated. We implicitly add a requirement that both BTFs are same, because struct btf_field_list_head has btf and value_rec members. We obviously shouldn't force BTFs to be equal by default, as that breaks backwards compatibility. Currently it is only implicitly required due to list_head matching struct btf and value_rec member. value_rec points back into a btf_record stashed in the map BTF (btf member of btf_field_list_head). So that pointer and btf member has to match exactly. Document all these subtle details so that things don't break in the future when touching this code. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20221118015614.2013203-19-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
2023-03-20 16:32:47 +00:00
/* Note: We must use the same BTF, as we also used btf_record_dup above
* which relies on BTF being same for both maps, as some members like
* record->fields.list_head have pointers like value_rec pointing into
* inner_map->btf.
*/
if (inner_map->btf) {
btf_get(inner_map->btf);
inner_map_meta->btf = inner_map->btf;
}
bpf: Add array of maps support This patch adds a few helper funcs to enable map-in-map support (i.e. outer_map->inner_map). The first outer_map type BPF_MAP_TYPE_ARRAY_OF_MAPS is also added in this patch. The next patch will introduce a hash of maps type. Any bpf map type can be acted as an inner_map. The exception is BPF_MAP_TYPE_PROG_ARRAY because the extra level of indirection makes it harder to verify the owner_prog_type and owner_jited. Multi-level map-in-map is not supported (i.e. map->map is ok but not map->map->map). When adding an inner_map to an outer_map, it currently checks the map_type, key_size, value_size, map_flags, max_entries and ops. The verifier also uses those map's properties to do static analysis. map_flags is needed because we need to ensure BPF_PROG_TYPE_PERF_EVENT is using a preallocated hashtab for the inner_hash also. ops and max_entries are needed to generate inlined map-lookup instructions. For simplicity reason, a simple '==' test is used for both map_flags and max_entries. The equality of ops is implied by the equality of map_type. During outer_map creation time, an inner_map_fd is needed to create an outer_map. However, the inner_map_fd's life time does not depend on the outer_map. The inner_map_fd is merely used to initialize the inner_map_meta of the outer_map. Also, for the outer_map: * It allows element update and delete from syscall * It allows element lookup from bpf_prog The above is similar to the current fd_array pattern. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-22 17:00:33 +00:00
bpf: fix inner map masking to prevent oob under speculation During review I noticed that inner meta map setup for map in map is buggy in that it does not propagate all needed data from the reference map which the verifier is later accessing. In particular one such case is index masking to prevent out of bounds access under speculative execution due to missing the map's unpriv_array/index_mask field propagation. Fix this such that the verifier is generating the correct code for inlined lookups in case of unpriviledged use. Before patch (test_verifier's 'map in map access' dump): # bpftool prog dump xla id 3 0: (62) *(u32 *)(r10 -4) = 0 1: (bf) r2 = r10 2: (07) r2 += -4 3: (18) r1 = map[id:4] 5: (07) r1 += 272 | 6: (61) r0 = *(u32 *)(r2 +0) | 7: (35) if r0 >= 0x1 goto pc+6 | Inlined map in map lookup 8: (54) (u32) r0 &= (u32) 0 | with index masking for 9: (67) r0 <<= 3 | map->unpriv_array. 10: (0f) r0 += r1 | 11: (79) r0 = *(u64 *)(r0 +0) | 12: (15) if r0 == 0x0 goto pc+1 | 13: (05) goto pc+1 | 14: (b7) r0 = 0 | 15: (15) if r0 == 0x0 goto pc+11 16: (62) *(u32 *)(r10 -4) = 0 17: (bf) r2 = r10 18: (07) r2 += -4 19: (bf) r1 = r0 20: (07) r1 += 272 | 21: (61) r0 = *(u32 *)(r2 +0) | Index masking missing (!) 22: (35) if r0 >= 0x1 goto pc+3 | for inner map despite 23: (67) r0 <<= 3 | map->unpriv_array set. 24: (0f) r0 += r1 | 25: (05) goto pc+1 | 26: (b7) r0 = 0 | 27: (b7) r0 = 0 28: (95) exit After patch: # bpftool prog dump xla id 1 0: (62) *(u32 *)(r10 -4) = 0 1: (bf) r2 = r10 2: (07) r2 += -4 3: (18) r1 = map[id:2] 5: (07) r1 += 272 | 6: (61) r0 = *(u32 *)(r2 +0) | 7: (35) if r0 >= 0x1 goto pc+6 | Same inlined map in map lookup 8: (54) (u32) r0 &= (u32) 0 | with index masking due to 9: (67) r0 <<= 3 | map->unpriv_array. 10: (0f) r0 += r1 | 11: (79) r0 = *(u64 *)(r0 +0) | 12: (15) if r0 == 0x0 goto pc+1 | 13: (05) goto pc+1 | 14: (b7) r0 = 0 | 15: (15) if r0 == 0x0 goto pc+12 16: (62) *(u32 *)(r10 -4) = 0 17: (bf) r2 = r10 18: (07) r2 += -4 19: (bf) r1 = r0 20: (07) r1 += 272 | 21: (61) r0 = *(u32 *)(r2 +0) | 22: (35) if r0 >= 0x1 goto pc+4 | Now fixed inlined inner map 23: (54) (u32) r0 &= (u32) 0 | lookup with proper index masking 24: (67) r0 <<= 3 | for map->unpriv_array. 25: (0f) r0 += r1 | 26: (05) goto pc+1 | 27: (b7) r0 = 0 | 28: (b7) r0 = 0 29: (95) exit Fixes: b2157399cc98 ("bpf: prevent out-of-bounds speculation") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-01-17 15:34:45 +00:00
/* Misc members not needed in bpf_map_meta_equal() check. */
inner_map_meta->ops = inner_map->ops;
if (inner_map->ops == &array_map_ops || inner_map->ops == &percpu_array_map_ops) {
struct bpf_array *inner_array_meta =
container_of(inner_map_meta, struct bpf_array, map);
struct bpf_array *inner_array = container_of(inner_map, struct bpf_array, map);
inner_array_meta->index_mask = inner_array->index_mask;
inner_array_meta->elem_size = inner_array->elem_size;
inner_map_meta->bypass_spec_v1 = inner_map->bypass_spec_v1;
bpf: fix inner map masking to prevent oob under speculation During review I noticed that inner meta map setup for map in map is buggy in that it does not propagate all needed data from the reference map which the verifier is later accessing. In particular one such case is index masking to prevent out of bounds access under speculative execution due to missing the map's unpriv_array/index_mask field propagation. Fix this such that the verifier is generating the correct code for inlined lookups in case of unpriviledged use. Before patch (test_verifier's 'map in map access' dump): # bpftool prog dump xla id 3 0: (62) *(u32 *)(r10 -4) = 0 1: (bf) r2 = r10 2: (07) r2 += -4 3: (18) r1 = map[id:4] 5: (07) r1 += 272 | 6: (61) r0 = *(u32 *)(r2 +0) | 7: (35) if r0 >= 0x1 goto pc+6 | Inlined map in map lookup 8: (54) (u32) r0 &= (u32) 0 | with index masking for 9: (67) r0 <<= 3 | map->unpriv_array. 10: (0f) r0 += r1 | 11: (79) r0 = *(u64 *)(r0 +0) | 12: (15) if r0 == 0x0 goto pc+1 | 13: (05) goto pc+1 | 14: (b7) r0 = 0 | 15: (15) if r0 == 0x0 goto pc+11 16: (62) *(u32 *)(r10 -4) = 0 17: (bf) r2 = r10 18: (07) r2 += -4 19: (bf) r1 = r0 20: (07) r1 += 272 | 21: (61) r0 = *(u32 *)(r2 +0) | Index masking missing (!) 22: (35) if r0 >= 0x1 goto pc+3 | for inner map despite 23: (67) r0 <<= 3 | map->unpriv_array set. 24: (0f) r0 += r1 | 25: (05) goto pc+1 | 26: (b7) r0 = 0 | 27: (b7) r0 = 0 28: (95) exit After patch: # bpftool prog dump xla id 1 0: (62) *(u32 *)(r10 -4) = 0 1: (bf) r2 = r10 2: (07) r2 += -4 3: (18) r1 = map[id:2] 5: (07) r1 += 272 | 6: (61) r0 = *(u32 *)(r2 +0) | 7: (35) if r0 >= 0x1 goto pc+6 | Same inlined map in map lookup 8: (54) (u32) r0 &= (u32) 0 | with index masking due to 9: (67) r0 <<= 3 | map->unpriv_array. 10: (0f) r0 += r1 | 11: (79) r0 = *(u64 *)(r0 +0) | 12: (15) if r0 == 0x0 goto pc+1 | 13: (05) goto pc+1 | 14: (b7) r0 = 0 | 15: (15) if r0 == 0x0 goto pc+12 16: (62) *(u32 *)(r10 -4) = 0 17: (bf) r2 = r10 18: (07) r2 += -4 19: (bf) r1 = r0 20: (07) r1 += 272 | 21: (61) r0 = *(u32 *)(r2 +0) | 22: (35) if r0 >= 0x1 goto pc+4 | Now fixed inlined inner map 23: (54) (u32) r0 &= (u32) 0 | lookup with proper index masking 24: (67) r0 <<= 3 | for map->unpriv_array. 25: (0f) r0 += r1 | 26: (05) goto pc+1 | 27: (b7) r0 = 0 | 28: (b7) r0 = 0 29: (95) exit Fixes: b2157399cc98 ("bpf: prevent out-of-bounds speculation") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-01-17 15:34:45 +00:00
}
bpf: Add array of maps support This patch adds a few helper funcs to enable map-in-map support (i.e. outer_map->inner_map). The first outer_map type BPF_MAP_TYPE_ARRAY_OF_MAPS is also added in this patch. The next patch will introduce a hash of maps type. Any bpf map type can be acted as an inner_map. The exception is BPF_MAP_TYPE_PROG_ARRAY because the extra level of indirection makes it harder to verify the owner_prog_type and owner_jited. Multi-level map-in-map is not supported (i.e. map->map is ok but not map->map->map). When adding an inner_map to an outer_map, it currently checks the map_type, key_size, value_size, map_flags, max_entries and ops. The verifier also uses those map's properties to do static analysis. map_flags is needed because we need to ensure BPF_PROG_TYPE_PERF_EVENT is using a preallocated hashtab for the inner_hash also. ops and max_entries are needed to generate inlined map-lookup instructions. For simplicity reason, a simple '==' test is used for both map_flags and max_entries. The equality of ops is implied by the equality of map_type. During outer_map creation time, an inner_map_fd is needed to create an outer_map. However, the inner_map_fd's life time does not depend on the outer_map. The inner_map_fd is merely used to initialize the inner_map_meta of the outer_map. Also, for the outer_map: * It allows element update and delete from syscall * It allows element lookup from bpf_prog The above is similar to the current fd_array pattern. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-22 17:00:33 +00:00
fdput(f);
return inner_map_meta;
free:
kfree(inner_map_meta);
put:
fdput(f);
return ERR_PTR(ret);
bpf: Add array of maps support This patch adds a few helper funcs to enable map-in-map support (i.e. outer_map->inner_map). The first outer_map type BPF_MAP_TYPE_ARRAY_OF_MAPS is also added in this patch. The next patch will introduce a hash of maps type. Any bpf map type can be acted as an inner_map. The exception is BPF_MAP_TYPE_PROG_ARRAY because the extra level of indirection makes it harder to verify the owner_prog_type and owner_jited. Multi-level map-in-map is not supported (i.e. map->map is ok but not map->map->map). When adding an inner_map to an outer_map, it currently checks the map_type, key_size, value_size, map_flags, max_entries and ops. The verifier also uses those map's properties to do static analysis. map_flags is needed because we need to ensure BPF_PROG_TYPE_PERF_EVENT is using a preallocated hashtab for the inner_hash also. ops and max_entries are needed to generate inlined map-lookup instructions. For simplicity reason, a simple '==' test is used for both map_flags and max_entries. The equality of ops is implied by the equality of map_type. During outer_map creation time, an inner_map_fd is needed to create an outer_map. However, the inner_map_fd's life time does not depend on the outer_map. The inner_map_fd is merely used to initialize the inner_map_meta of the outer_map. Also, for the outer_map: * It allows element update and delete from syscall * It allows element lookup from bpf_prog The above is similar to the current fd_array pattern. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-22 17:00:33 +00:00
}
void bpf_map_meta_free(struct bpf_map *map_meta)
{
bpf: Refactor kptr_off_tab into btf_record Bugzilla: https://bugzilla.redhat.com/2177177 Conflicts: - Context change from already backported commit 997849c4b969 ("bpf: Zeroing allocated object from slab in bpf memory allocator") - Minor changes from already backported commit 1f6e04a1c7b8 ("bpf: Fix offset calculation error in __copy_map_value and zero_map_value") commit aa3496accc412b3d975e4ee5d06076d73394d8b5 Author: Kumar Kartikeya Dwivedi <memxor@gmail.com> Date: Fri Nov 4 00:39:55 2022 +0530 bpf: Refactor kptr_off_tab into btf_record To prepare the BPF verifier to handle special fields in both map values and program allocated types coming from program BTF, we need to refactor the kptr_off_tab handling code into something more generic and reusable across both cases to avoid code duplication. Later patches also require passing this data to helpers at runtime, so that they can work on user defined types, initialize them, destruct them, etc. The main observation is that both map values and such allocated types point to a type in program BTF, hence they can be handled similarly. We can prepare a field metadata table for both cases and store them in struct bpf_map or struct btf depending on the use case. Hence, refactor the code into generic btf_record and btf_field member structs. The btf_record represents the fields of a specific btf_type in user BTF. The cnt indicates the number of special fields we successfully recognized, and field_mask is a bitmask of fields that were found, to enable quick determination of availability of a certain field. Subsequently, refactor the rest of the code to work with these generic types, remove assumptions about kptr and kptr_off_tab, rename variables to more meaningful names, etc. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20221103191013.1236066-7-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
2023-03-15 09:14:48 +00:00
bpf_map_free_record(map_meta);
btf_put(map_meta->btf);
bpf: Add array of maps support This patch adds a few helper funcs to enable map-in-map support (i.e. outer_map->inner_map). The first outer_map type BPF_MAP_TYPE_ARRAY_OF_MAPS is also added in this patch. The next patch will introduce a hash of maps type. Any bpf map type can be acted as an inner_map. The exception is BPF_MAP_TYPE_PROG_ARRAY because the extra level of indirection makes it harder to verify the owner_prog_type and owner_jited. Multi-level map-in-map is not supported (i.e. map->map is ok but not map->map->map). When adding an inner_map to an outer_map, it currently checks the map_type, key_size, value_size, map_flags, max_entries and ops. The verifier also uses those map's properties to do static analysis. map_flags is needed because we need to ensure BPF_PROG_TYPE_PERF_EVENT is using a preallocated hashtab for the inner_hash also. ops and max_entries are needed to generate inlined map-lookup instructions. For simplicity reason, a simple '==' test is used for both map_flags and max_entries. The equality of ops is implied by the equality of map_type. During outer_map creation time, an inner_map_fd is needed to create an outer_map. However, the inner_map_fd's life time does not depend on the outer_map. The inner_map_fd is merely used to initialize the inner_map_meta of the outer_map. Also, for the outer_map: * It allows element update and delete from syscall * It allows element lookup from bpf_prog The above is similar to the current fd_array pattern. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-22 17:00:33 +00:00
kfree(map_meta);
}
bool bpf_map_meta_equal(const struct bpf_map *meta0,
const struct bpf_map *meta1)
{
/* No need to compare ops because it is covered by map_type */
return meta0->map_type == meta1->map_type &&
meta0->key_size == meta1->key_size &&
meta0->value_size == meta1->value_size &&
bpf: Allow storing unreferenced kptr in map Bugzilla: https://bugzilla.redhat.com/2120968 commit 61df10c7799e27807ad5e459eec9d77cddf8bf45 Author: Kumar Kartikeya Dwivedi <memxor@gmail.com> Date: Mon Apr 25 03:18:49 2022 +0530 bpf: Allow storing unreferenced kptr in map This commit introduces a new pointer type 'kptr' which can be embedded in a map value to hold a PTR_TO_BTF_ID stored by a BPF program during its invocation. When storing such a kptr, BPF program's PTR_TO_BTF_ID register must have the same type as in the map value's BTF, and loading a kptr marks the destination register as PTR_TO_BTF_ID with the correct kernel BTF and BTF ID. Such kptr are unreferenced, i.e. by the time another invocation of the BPF program loads this pointer, the object which the pointer points to may not longer exist. Since PTR_TO_BTF_ID loads (using BPF_LDX) are patched to PROBE_MEM loads by the verifier, it would safe to allow user to still access such invalid pointer, but passing such pointers into BPF helpers and kfuncs should not be permitted. A future patch in this series will close this gap. The flexibility offered by allowing programs to dereference such invalid pointers while being safe at runtime frees the verifier from doing complex lifetime tracking. As long as the user may ensure that the object remains valid, it can ensure data read by it from the kernel object is valid. The user indicates that a certain pointer must be treated as kptr capable of accepting stores of PTR_TO_BTF_ID of a certain type, by using a BTF type tag 'kptr' on the pointed to type of the pointer. Then, this information is recorded in the object BTF which will be passed into the kernel by way of map's BTF information. The name and kind from the map value BTF is used to look up the in-kernel type, and the actual BTF and BTF ID is recorded in the map struct in a new kptr_off_tab member. For now, only storing pointers to structs is permitted. An example of this specification is shown below: #define __kptr __attribute__((btf_type_tag("kptr"))) struct map_value { ... struct task_struct __kptr *task; ... }; Then, in a BPF program, user may store PTR_TO_BTF_ID with the type task_struct into the map, and then load it later. Note that the destination register is marked PTR_TO_BTF_ID_OR_NULL, as the verifier cannot know whether the value is NULL or not statically, it must treat all potential loads at that map value offset as loading a possibly NULL pointer. Only BPF_LDX, BPF_STX, and BPF_ST (with insn->imm = 0 to denote NULL) are allowed instructions that can access such a pointer. On BPF_LDX, the destination register is updated to be a PTR_TO_BTF_ID, and on BPF_STX, it is checked whether the source register type is a PTR_TO_BTF_ID with same BTF type as specified in the map BTF. The access size must always be BPF_DW. For the map in map support, the kptr_off_tab for outer map is copied from the inner map's kptr_off_tab. It was chosen to do a deep copy instead of introducing a refcount to kptr_off_tab, because the copy only needs to be done when paramterizing using inner_map_fd in the map in map case, hence would be unnecessary for all other users. It is not permitted to use MAP_FREEZE command and mmap for BPF map having kptrs, similar to the bpf_timer case. A kptr also requires that BPF program has both read and write access to the map (hence both BPF_F_RDONLY_PROG and BPF_F_WRONLY_PROG are disallowed). Note that check_map_access must be called from both check_helper_mem_access and for the BPF instructions, hence the kptr check must distinguish between ACCESS_DIRECT and ACCESS_HELPER, and reject ACCESS_HELPER cases. We rename stack_access_src to bpf_access_src and reuse it for this purpose. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20220424214901.2743946-2-memxor@gmail.com Signed-off-by: Yauheni Kaliuta <ykaliuta@redhat.com>
2022-10-06 10:42:33 +00:00
meta0->map_flags == meta1->map_flags &&
bpf: Refactor kptr_off_tab into btf_record Bugzilla: https://bugzilla.redhat.com/2177177 Conflicts: - Context change from already backported commit 997849c4b969 ("bpf: Zeroing allocated object from slab in bpf memory allocator") - Minor changes from already backported commit 1f6e04a1c7b8 ("bpf: Fix offset calculation error in __copy_map_value and zero_map_value") commit aa3496accc412b3d975e4ee5d06076d73394d8b5 Author: Kumar Kartikeya Dwivedi <memxor@gmail.com> Date: Fri Nov 4 00:39:55 2022 +0530 bpf: Refactor kptr_off_tab into btf_record To prepare the BPF verifier to handle special fields in both map values and program allocated types coming from program BTF, we need to refactor the kptr_off_tab handling code into something more generic and reusable across both cases to avoid code duplication. Later patches also require passing this data to helpers at runtime, so that they can work on user defined types, initialize them, destruct them, etc. The main observation is that both map values and such allocated types point to a type in program BTF, hence they can be handled similarly. We can prepare a field metadata table for both cases and store them in struct bpf_map or struct btf depending on the use case. Hence, refactor the code into generic btf_record and btf_field member structs. The btf_record represents the fields of a specific btf_type in user BTF. The cnt indicates the number of special fields we successfully recognized, and field_mask is a bitmask of fields that were found, to enable quick determination of availability of a certain field. Subsequently, refactor the rest of the code to work with these generic types, remove assumptions about kptr and kptr_off_tab, rename variables to more meaningful names, etc. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20221103191013.1236066-7-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
2023-03-15 09:14:48 +00:00
btf_record_equal(meta0->record, meta1->record);
bpf: Add array of maps support This patch adds a few helper funcs to enable map-in-map support (i.e. outer_map->inner_map). The first outer_map type BPF_MAP_TYPE_ARRAY_OF_MAPS is also added in this patch. The next patch will introduce a hash of maps type. Any bpf map type can be acted as an inner_map. The exception is BPF_MAP_TYPE_PROG_ARRAY because the extra level of indirection makes it harder to verify the owner_prog_type and owner_jited. Multi-level map-in-map is not supported (i.e. map->map is ok but not map->map->map). When adding an inner_map to an outer_map, it currently checks the map_type, key_size, value_size, map_flags, max_entries and ops. The verifier also uses those map's properties to do static analysis. map_flags is needed because we need to ensure BPF_PROG_TYPE_PERF_EVENT is using a preallocated hashtab for the inner_hash also. ops and max_entries are needed to generate inlined map-lookup instructions. For simplicity reason, a simple '==' test is used for both map_flags and max_entries. The equality of ops is implied by the equality of map_type. During outer_map creation time, an inner_map_fd is needed to create an outer_map. However, the inner_map_fd's life time does not depend on the outer_map. The inner_map_fd is merely used to initialize the inner_map_meta of the outer_map. Also, for the outer_map: * It allows element update and delete from syscall * It allows element lookup from bpf_prog The above is similar to the current fd_array pattern. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-22 17:00:33 +00:00
}
void *bpf_map_fd_get_ptr(struct bpf_map *map,
struct file *map_file /* not used */,
int ufd)
{
bpf: Add map_meta_equal map ops Some properties of the inner map is used in the verification time. When an inner map is inserted to an outer map at runtime, bpf_map_meta_equal() is currently used to ensure those properties of the inserting inner map stays the same as the verification time. In particular, the current bpf_map_meta_equal() checks max_entries which turns out to be too restrictive for most of the maps which do not use max_entries during the verification time. It limits the use case that wants to replace a smaller inner map with a larger inner map. There are some maps do use max_entries during verification though. For example, the map_gen_lookup in array_map_ops uses the max_entries to generate the inline lookup code. To accommodate differences between maps, the map_meta_equal is added to bpf_map_ops. Each map-type can decide what to check when its map is used as an inner map during runtime. Also, some map types cannot be used as an inner map and they are currently black listed in bpf_map_meta_alloc() in map_in_map.c. It is not unusual that the new map types may not aware that such blacklist exists. This patch enforces an explicit opt-in and only allows a map to be used as an inner map if it has implemented the map_meta_equal ops. It is based on the discussion in [1]. All maps that support inner map has its map_meta_equal points to bpf_map_meta_equal in this patch. A later patch will relax the max_entries check for most maps. bpf_types.h counts 28 map types. This patch adds 23 ".map_meta_equal" by using coccinelle. -5 for BPF_MAP_TYPE_PROG_ARRAY BPF_MAP_TYPE_(PERCPU)_CGROUP_STORAGE BPF_MAP_TYPE_STRUCT_OPS BPF_MAP_TYPE_ARRAY_OF_MAPS BPF_MAP_TYPE_HASH_OF_MAPS The "if (inner_map->inner_map_meta)" check in bpf_map_meta_alloc() is moved such that the same error is returned. [1]: https://lore.kernel.org/bpf/20200522022342.899756-1-kafai@fb.com/ Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20200828011806.1970400-1-kafai@fb.com
2020-08-28 01:18:06 +00:00
struct bpf_map *inner_map, *inner_map_meta;
bpf: Add array of maps support This patch adds a few helper funcs to enable map-in-map support (i.e. outer_map->inner_map). The first outer_map type BPF_MAP_TYPE_ARRAY_OF_MAPS is also added in this patch. The next patch will introduce a hash of maps type. Any bpf map type can be acted as an inner_map. The exception is BPF_MAP_TYPE_PROG_ARRAY because the extra level of indirection makes it harder to verify the owner_prog_type and owner_jited. Multi-level map-in-map is not supported (i.e. map->map is ok but not map->map->map). When adding an inner_map to an outer_map, it currently checks the map_type, key_size, value_size, map_flags, max_entries and ops. The verifier also uses those map's properties to do static analysis. map_flags is needed because we need to ensure BPF_PROG_TYPE_PERF_EVENT is using a preallocated hashtab for the inner_hash also. ops and max_entries are needed to generate inlined map-lookup instructions. For simplicity reason, a simple '==' test is used for both map_flags and max_entries. The equality of ops is implied by the equality of map_type. During outer_map creation time, an inner_map_fd is needed to create an outer_map. However, the inner_map_fd's life time does not depend on the outer_map. The inner_map_fd is merely used to initialize the inner_map_meta of the outer_map. Also, for the outer_map: * It allows element update and delete from syscall * It allows element lookup from bpf_prog The above is similar to the current fd_array pattern. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-22 17:00:33 +00:00
struct fd f;
f = fdget(ufd);
inner_map = __bpf_map_get(f);
if (IS_ERR(inner_map))
return inner_map;
bpf: Add map_meta_equal map ops Some properties of the inner map is used in the verification time. When an inner map is inserted to an outer map at runtime, bpf_map_meta_equal() is currently used to ensure those properties of the inserting inner map stays the same as the verification time. In particular, the current bpf_map_meta_equal() checks max_entries which turns out to be too restrictive for most of the maps which do not use max_entries during the verification time. It limits the use case that wants to replace a smaller inner map with a larger inner map. There are some maps do use max_entries during verification though. For example, the map_gen_lookup in array_map_ops uses the max_entries to generate the inline lookup code. To accommodate differences between maps, the map_meta_equal is added to bpf_map_ops. Each map-type can decide what to check when its map is used as an inner map during runtime. Also, some map types cannot be used as an inner map and they are currently black listed in bpf_map_meta_alloc() in map_in_map.c. It is not unusual that the new map types may not aware that such blacklist exists. This patch enforces an explicit opt-in and only allows a map to be used as an inner map if it has implemented the map_meta_equal ops. It is based on the discussion in [1]. All maps that support inner map has its map_meta_equal points to bpf_map_meta_equal in this patch. A later patch will relax the max_entries check for most maps. bpf_types.h counts 28 map types. This patch adds 23 ".map_meta_equal" by using coccinelle. -5 for BPF_MAP_TYPE_PROG_ARRAY BPF_MAP_TYPE_(PERCPU)_CGROUP_STORAGE BPF_MAP_TYPE_STRUCT_OPS BPF_MAP_TYPE_ARRAY_OF_MAPS BPF_MAP_TYPE_HASH_OF_MAPS The "if (inner_map->inner_map_meta)" check in bpf_map_meta_alloc() is moved such that the same error is returned. [1]: https://lore.kernel.org/bpf/20200522022342.899756-1-kafai@fb.com/ Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20200828011806.1970400-1-kafai@fb.com
2020-08-28 01:18:06 +00:00
inner_map_meta = map->inner_map_meta;
if (inner_map_meta->ops->map_meta_equal(inner_map_meta, inner_map))
bpf: Switch bpf_map ref counter to atomic64_t so bpf_map_inc() never fails 92117d8443bc ("bpf: fix refcnt overflow") turned refcounting of bpf_map into potentially failing operation, when refcount reaches BPF_MAX_REFCNT limit (32k). Due to using 32-bit counter, it's possible in practice to overflow refcounter and make it wrap around to 0, causing erroneous map free, while there are still references to it, causing use-after-free problems. But having a failing refcounting operations are problematic in some cases. One example is mmap() interface. After establishing initial memory-mapping, user is allowed to arbitrarily map/remap/unmap parts of mapped memory, arbitrarily splitting it into multiple non-contiguous regions. All this happening without any control from the users of mmap subsystem. Rather mmap subsystem sends notifications to original creator of memory mapping through open/close callbacks, which are optionally specified during initial memory mapping creation. These callbacks are used to maintain accurate refcount for bpf_map (see next patch in this series). The problem is that open() callback is not supposed to fail, because memory-mapped resource is set up and properly referenced. This is posing a problem for using memory-mapping with BPF maps. One solution to this is to maintain separate refcount for just memory-mappings and do single bpf_map_inc/bpf_map_put when it goes from/to zero, respectively. There are similar use cases in current work on tcp-bpf, necessitating extra counter as well. This seems like a rather unfortunate and ugly solution that doesn't scale well to various new use cases. Another approach to solve this is to use non-failing refcount_t type, which uses 32-bit counter internally, but, once reaching overflow state at UINT_MAX, stays there. This utlimately causes memory leak, but prevents use after free. But given refcounting is not the most performance-critical operation with BPF maps (it's not used from running BPF program code), we can also just switch to 64-bit counter that can't overflow in practice, potentially disadvantaging 32-bit platforms a tiny bit. This simplifies semantics and allows above described scenarios to not worry about failing refcount increment operation. In terms of struct bpf_map size, we are still good and use the same amount of space: BEFORE (3 cache lines, 8 bytes of padding at the end): struct bpf_map { const struct bpf_map_ops * ops __attribute__((__aligned__(64))); /* 0 8 */ struct bpf_map * inner_map_meta; /* 8 8 */ void * security; /* 16 8 */ enum bpf_map_type map_type; /* 24 4 */ u32 key_size; /* 28 4 */ u32 value_size; /* 32 4 */ u32 max_entries; /* 36 4 */ u32 map_flags; /* 40 4 */ int spin_lock_off; /* 44 4 */ u32 id; /* 48 4 */ int numa_node; /* 52 4 */ u32 btf_key_type_id; /* 56 4 */ u32 btf_value_type_id; /* 60 4 */ /* --- cacheline 1 boundary (64 bytes) --- */ struct btf * btf; /* 64 8 */ struct bpf_map_memory memory; /* 72 16 */ bool unpriv_array; /* 88 1 */ bool frozen; /* 89 1 */ /* XXX 38 bytes hole, try to pack */ /* --- cacheline 2 boundary (128 bytes) --- */ atomic_t refcnt __attribute__((__aligned__(64))); /* 128 4 */ atomic_t usercnt; /* 132 4 */ struct work_struct work; /* 136 32 */ char name[16]; /* 168 16 */ /* size: 192, cachelines: 3, members: 21 */ /* sum members: 146, holes: 1, sum holes: 38 */ /* padding: 8 */ /* forced alignments: 2, forced holes: 1, sum forced holes: 38 */ } __attribute__((__aligned__(64))); AFTER (same 3 cache lines, no extra padding now): struct bpf_map { const struct bpf_map_ops * ops __attribute__((__aligned__(64))); /* 0 8 */ struct bpf_map * inner_map_meta; /* 8 8 */ void * security; /* 16 8 */ enum bpf_map_type map_type; /* 24 4 */ u32 key_size; /* 28 4 */ u32 value_size; /* 32 4 */ u32 max_entries; /* 36 4 */ u32 map_flags; /* 40 4 */ int spin_lock_off; /* 44 4 */ u32 id; /* 48 4 */ int numa_node; /* 52 4 */ u32 btf_key_type_id; /* 56 4 */ u32 btf_value_type_id; /* 60 4 */ /* --- cacheline 1 boundary (64 bytes) --- */ struct btf * btf; /* 64 8 */ struct bpf_map_memory memory; /* 72 16 */ bool unpriv_array; /* 88 1 */ bool frozen; /* 89 1 */ /* XXX 38 bytes hole, try to pack */ /* --- cacheline 2 boundary (128 bytes) --- */ atomic64_t refcnt __attribute__((__aligned__(64))); /* 128 8 */ atomic64_t usercnt; /* 136 8 */ struct work_struct work; /* 144 32 */ char name[16]; /* 176 16 */ /* size: 192, cachelines: 3, members: 21 */ /* sum members: 154, holes: 1, sum holes: 38 */ /* forced alignments: 2, forced holes: 1, sum forced holes: 38 */ } __attribute__((__aligned__(64))); This patch, while modifying all users of bpf_map_inc, also cleans up its interface to match bpf_map_put with separate operations for bpf_map_inc and bpf_map_inc_with_uref (to match bpf_map_put and bpf_map_put_with_uref, respectively). Also, given there are no users of bpf_map_inc_not_zero specifying uref=true, remove uref flag and default to uref=false internally. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/bpf/20191117172806.2195367-2-andriin@fb.com
2019-11-17 17:28:02 +00:00
bpf_map_inc(inner_map);
bpf: Add array of maps support This patch adds a few helper funcs to enable map-in-map support (i.e. outer_map->inner_map). The first outer_map type BPF_MAP_TYPE_ARRAY_OF_MAPS is also added in this patch. The next patch will introduce a hash of maps type. Any bpf map type can be acted as an inner_map. The exception is BPF_MAP_TYPE_PROG_ARRAY because the extra level of indirection makes it harder to verify the owner_prog_type and owner_jited. Multi-level map-in-map is not supported (i.e. map->map is ok but not map->map->map). When adding an inner_map to an outer_map, it currently checks the map_type, key_size, value_size, map_flags, max_entries and ops. The verifier also uses those map's properties to do static analysis. map_flags is needed because we need to ensure BPF_PROG_TYPE_PERF_EVENT is using a preallocated hashtab for the inner_hash also. ops and max_entries are needed to generate inlined map-lookup instructions. For simplicity reason, a simple '==' test is used for both map_flags and max_entries. The equality of ops is implied by the equality of map_type. During outer_map creation time, an inner_map_fd is needed to create an outer_map. However, the inner_map_fd's life time does not depend on the outer_map. The inner_map_fd is merely used to initialize the inner_map_meta of the outer_map. Also, for the outer_map: * It allows element update and delete from syscall * It allows element lookup from bpf_prog The above is similar to the current fd_array pattern. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-22 17:00:33 +00:00
else
inner_map = ERR_PTR(-EINVAL);
fdput(f);
return inner_map;
}
bpf: Add map and need_defer parameters to .map_fd_put_ptr() JIRA: https://issues.redhat.com/browse/RHEL-23644 commit 20c20bd11a0702ce4dc9300c3da58acf551d9725 Author: Hou Tao <houtao1@huawei.com> Date: Mon Dec 4 22:04:20 2023 +0800 bpf: Add map and need_defer parameters to .map_fd_put_ptr() map is the pointer of outer map, and need_defer needs some explanation. need_defer tells the implementation to defer the reference release of the passed element and ensure that the element is still alive before the bpf program, which may manipulate it, exits. The following three cases will invoke map_fd_put_ptr() and different need_defer values will be passed to these callers: 1) release the reference of the old element in the map during map update or map deletion. The release must be deferred, otherwise the bpf program may incur use-after-free problem, so need_defer needs to be true. 2) release the reference of the to-be-added element in the error path of map update. The to-be-added element is not visible to any bpf program, so it is OK to pass false for need_defer parameter. 3) release the references of all elements in the map during map release. Any bpf program which has access to the map must have been exited and released, so need_defer=false will be OK. These two parameters will be used by the following patches to fix the potential use-after-free problem for map-in-map. Signed-off-by: Hou Tao <houtao1@huawei.com> Link: https://lore.kernel.org/r/20231204140425.1480317-3-houtao@huaweicloud.com Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Viktor Malik <vmalik@redhat.com>
2024-04-15 10:59:57 +00:00
void bpf_map_fd_put_ptr(struct bpf_map *map, void *ptr, bool need_defer)
bpf: Add array of maps support This patch adds a few helper funcs to enable map-in-map support (i.e. outer_map->inner_map). The first outer_map type BPF_MAP_TYPE_ARRAY_OF_MAPS is also added in this patch. The next patch will introduce a hash of maps type. Any bpf map type can be acted as an inner_map. The exception is BPF_MAP_TYPE_PROG_ARRAY because the extra level of indirection makes it harder to verify the owner_prog_type and owner_jited. Multi-level map-in-map is not supported (i.e. map->map is ok but not map->map->map). When adding an inner_map to an outer_map, it currently checks the map_type, key_size, value_size, map_flags, max_entries and ops. The verifier also uses those map's properties to do static analysis. map_flags is needed because we need to ensure BPF_PROG_TYPE_PERF_EVENT is using a preallocated hashtab for the inner_hash also. ops and max_entries are needed to generate inlined map-lookup instructions. For simplicity reason, a simple '==' test is used for both map_flags and max_entries. The equality of ops is implied by the equality of map_type. During outer_map creation time, an inner_map_fd is needed to create an outer_map. However, the inner_map_fd's life time does not depend on the outer_map. The inner_map_fd is merely used to initialize the inner_map_meta of the outer_map. Also, for the outer_map: * It allows element update and delete from syscall * It allows element lookup from bpf_prog The above is similar to the current fd_array pattern. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-22 17:00:33 +00:00
{
bpf: Defer the free of inner map when necessary JIRA: https://issues.redhat.com/browse/RHEL-23644 commit 876673364161da50eed6b472d746ef88242b2368 Author: Hou Tao <houtao1@huawei.com> Date: Mon Dec 4 22:04:22 2023 +0800 bpf: Defer the free of inner map when necessary When updating or deleting an inner map in map array or map htab, the map may still be accessed by non-sleepable program or sleepable program. However bpf_map_fd_put_ptr() decreases the ref-counter of the inner map directly through bpf_map_put(), if the ref-counter is the last one (which is true for most cases), the inner map will be freed by ops->map_free() in a kworker. But for now, most .map_free() callbacks don't use synchronize_rcu() or its variants to wait for the elapse of a RCU grace period, so after the invocation of ops->map_free completes, the bpf program which is accessing the inner map may incur use-after-free problem. Fix the free of inner map by invoking bpf_map_free_deferred() after both one RCU grace period and one tasks trace RCU grace period if the inner map has been removed from the outer map before. The deferment is accomplished by using call_rcu() or call_rcu_tasks_trace() when releasing the last ref-counter of bpf map. The newly-added rcu_head field in bpf_map shares the same storage space with work field to reduce the size of bpf_map. Fixes: bba1dc0b55ac ("bpf: Remove redundant synchronize_rcu.") Fixes: 638e4b825d52 ("bpf: Allows per-cpu maps and map-in-map in sleepable programs") Signed-off-by: Hou Tao <houtao1@huawei.com> Link: https://lore.kernel.org/r/20231204140425.1480317-5-houtao@huaweicloud.com Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Viktor Malik <vmalik@redhat.com>
2024-04-15 10:59:57 +00:00
struct bpf_map *inner_map = ptr;
/* Defer the freeing of inner map according to the sleepable attribute
* of bpf program which owns the outer map, so unnecessary waiting for
* RCU tasks trace grace period can be avoided.
bpf: Add array of maps support This patch adds a few helper funcs to enable map-in-map support (i.e. outer_map->inner_map). The first outer_map type BPF_MAP_TYPE_ARRAY_OF_MAPS is also added in this patch. The next patch will introduce a hash of maps type. Any bpf map type can be acted as an inner_map. The exception is BPF_MAP_TYPE_PROG_ARRAY because the extra level of indirection makes it harder to verify the owner_prog_type and owner_jited. Multi-level map-in-map is not supported (i.e. map->map is ok but not map->map->map). When adding an inner_map to an outer_map, it currently checks the map_type, key_size, value_size, map_flags, max_entries and ops. The verifier also uses those map's properties to do static analysis. map_flags is needed because we need to ensure BPF_PROG_TYPE_PERF_EVENT is using a preallocated hashtab for the inner_hash also. ops and max_entries are needed to generate inlined map-lookup instructions. For simplicity reason, a simple '==' test is used for both map_flags and max_entries. The equality of ops is implied by the equality of map_type. During outer_map creation time, an inner_map_fd is needed to create an outer_map. However, the inner_map_fd's life time does not depend on the outer_map. The inner_map_fd is merely used to initialize the inner_map_meta of the outer_map. Also, for the outer_map: * It allows element update and delete from syscall * It allows element lookup from bpf_prog The above is similar to the current fd_array pattern. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-22 17:00:33 +00:00
*/
if (need_defer) {
if (atomic64_read(&map->sleepable_refcnt))
WRITE_ONCE(inner_map->free_after_mult_rcu_gp, true);
else
WRITE_ONCE(inner_map->free_after_rcu_gp, true);
}
bpf: Defer the free of inner map when necessary JIRA: https://issues.redhat.com/browse/RHEL-23644 commit 876673364161da50eed6b472d746ef88242b2368 Author: Hou Tao <houtao1@huawei.com> Date: Mon Dec 4 22:04:22 2023 +0800 bpf: Defer the free of inner map when necessary When updating or deleting an inner map in map array or map htab, the map may still be accessed by non-sleepable program or sleepable program. However bpf_map_fd_put_ptr() decreases the ref-counter of the inner map directly through bpf_map_put(), if the ref-counter is the last one (which is true for most cases), the inner map will be freed by ops->map_free() in a kworker. But for now, most .map_free() callbacks don't use synchronize_rcu() or its variants to wait for the elapse of a RCU grace period, so after the invocation of ops->map_free completes, the bpf program which is accessing the inner map may incur use-after-free problem. Fix the free of inner map by invoking bpf_map_free_deferred() after both one RCU grace period and one tasks trace RCU grace period if the inner map has been removed from the outer map before. The deferment is accomplished by using call_rcu() or call_rcu_tasks_trace() when releasing the last ref-counter of bpf map. The newly-added rcu_head field in bpf_map shares the same storage space with work field to reduce the size of bpf_map. Fixes: bba1dc0b55ac ("bpf: Remove redundant synchronize_rcu.") Fixes: 638e4b825d52 ("bpf: Allows per-cpu maps and map-in-map in sleepable programs") Signed-off-by: Hou Tao <houtao1@huawei.com> Link: https://lore.kernel.org/r/20231204140425.1480317-5-houtao@huaweicloud.com Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Viktor Malik <vmalik@redhat.com>
2024-04-15 10:59:57 +00:00
bpf_map_put(inner_map);
bpf: Add array of maps support This patch adds a few helper funcs to enable map-in-map support (i.e. outer_map->inner_map). The first outer_map type BPF_MAP_TYPE_ARRAY_OF_MAPS is also added in this patch. The next patch will introduce a hash of maps type. Any bpf map type can be acted as an inner_map. The exception is BPF_MAP_TYPE_PROG_ARRAY because the extra level of indirection makes it harder to verify the owner_prog_type and owner_jited. Multi-level map-in-map is not supported (i.e. map->map is ok but not map->map->map). When adding an inner_map to an outer_map, it currently checks the map_type, key_size, value_size, map_flags, max_entries and ops. The verifier also uses those map's properties to do static analysis. map_flags is needed because we need to ensure BPF_PROG_TYPE_PERF_EVENT is using a preallocated hashtab for the inner_hash also. ops and max_entries are needed to generate inlined map-lookup instructions. For simplicity reason, a simple '==' test is used for both map_flags and max_entries. The equality of ops is implied by the equality of map_type. During outer_map creation time, an inner_map_fd is needed to create an outer_map. However, the inner_map_fd's life time does not depend on the outer_map. The inner_map_fd is merely used to initialize the inner_map_meta of the outer_map. Also, for the outer_map: * It allows element update and delete from syscall * It allows element lookup from bpf_prog The above is similar to the current fd_array pattern. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-22 17:00:33 +00:00
}
u32 bpf_map_fd_sys_lookup_elem(void *ptr)
{
return ((struct bpf_map *)ptr)->id;
}