Ubuntu-focal-kernel/kernel
Song Liu af8455e330 bpf/stackmap: Fix deadlock with rq_lock in bpf_get_stack()
BugLink: https://bugs.launchpad.net/bugs/1858428

[ Upstream commit eac9153f2b ]

bpf stackmap with build-id lookup (BPF_F_STACK_BUILD_ID) can trigger A-A
deadlock on rq_lock():

rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
[...]
Call Trace:
 try_to_wake_up+0x1ad/0x590
 wake_up_q+0x54/0x80
 rwsem_wake+0x8a/0xb0
 bpf_get_stack+0x13c/0x150
 bpf_prog_fbdaf42eded9fe46_on_event+0x5e3/0x1000
 bpf_overflow_handler+0x60/0x100
 __perf_event_overflow+0x4f/0xf0
 perf_swevent_overflow+0x99/0xc0
 ___perf_sw_event+0xe7/0x120
 __schedule+0x47d/0x620
 schedule+0x29/0x90
 futex_wait_queue_me+0xb9/0x110
 futex_wait+0x139/0x230
 do_futex+0x2ac/0xa50
 __x64_sys_futex+0x13c/0x180
 do_syscall_64+0x42/0x100
 entry_SYSCALL_64_after_hwframe+0x44/0xa9

This can be reproduced by:
1. Start a multi-threaded program that does parallel mmap() and malloc();
2. taskset the program to 2 CPUs;
3. Attach bpf program to trace_sched_switch and gather stackmap with
   build-id, e.g. with trace.py from bcc tools:
   trace.py -U -p <pid> -s <some-bin,some-lib> t:sched:sched_switch

A sample reproducer is attached at the end.

This could also trigger deadlock with other locks that are nested with
rq_lock.

Fix this by checking whether irqs are disabled. Since rq_lock and all
other nested locks are irq safe, it is safe to do up_read() when irqs are
not disabled. If irqs are disabled, postpone up_read() to irq_work.
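
The decision described above can be sketched in plain userspace C. This is
only an illustration of the "release now or defer" pattern, not the kernel
patch itself: fake_rwsem, sim_irqs_disabled, sim_work, release_or_defer()
and run_deferred_release() are hypothetical stand-ins for the rwsem,
irqs_disabled(), the irq_work slot and its callback, and the sketch assumes
at most one release is pending at a time.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; none of these are kernel APIs. */
struct fake_rwsem {
	int readers;			/* number of read-side holders */
};

struct deferred_release {
	struct fake_rwsem *sem;		/* lock to release later, NULL when idle */
};

static bool sim_irqs_disabled;			/* models irqs_disabled() */
static struct deferred_release sim_work;	/* models a single irq_work slot */

/* Models a successful down_read_trylock(): take one read-side reference. */
static bool fake_down_read_trylock(struct fake_rwsem *sem)
{
	sem->readers++;
	return true;
}

/* Models up_read(): drop one read-side reference. */
static void fake_up_read(struct fake_rwsem *sem)
{
	sem->readers--;
}

/*
 * Release the lock immediately when interrupts are enabled; otherwise
 * park the release so it happens later, outside the unsafe context.
 * Assumption: the slot is idle when we get here.
 */
static void release_or_defer(struct fake_rwsem *sem)
{
	if (!sim_irqs_disabled) {
		fake_up_read(sem);
		return;
	}
	sim_work.sem = sem;
}

/* Models the deferred callback that runs once it is safe to release. */
static void run_deferred_release(void)
{
	if (sim_work.sem) {
		fake_up_read(sim_work.sem);
		sim_work.sem = NULL;
	}
}
```

With sim_irqs_disabled set, release_or_defer() leaves the reader count
untouched until run_deferred_release() is called; with it clear, the count
drops immediately.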

Fixes: 615755a77b ("bpf: extend stackmap to save binary_build_id+offset instead of address")
Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20191014171223.357174-1-songliubraving@fb.com

Reproducer:
============================ 8< ============================

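#include <fcntl.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <time.h>
#include <unistd.h>

/* Placeholder value: the original definition is not shown here; tune as needed. */
#define THREAD_COUNT 64

char *filename;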

void *worker(void *p)
{
        void *ptr;
        int fd;
        char *pptr;

        fd = open(filename, O_RDONLY);
        if (fd < 0)
                return NULL;
        while (1) {
                struct timespec ts = {0, 1000 + rand() % 2000};

                ptr = mmap(NULL, 4096 * 64, PROT_READ, MAP_PRIVATE, fd, 0);
                usleep(1);
                if (ptr == MAP_FAILED) {
                        printf("failed to mmap\n");
                        break;
                }
                munmap(ptr, 4096 * 64);
                usleep(1);
                pptr = malloc(1);
                usleep(1);
                pptr[0] = 1;
                usleep(1);
                free(pptr);
                usleep(1);
                nanosleep(&ts, NULL);
        }
        close(fd);
        return NULL;
}

int main(int argc, char *argv[])
{
        void *ptr;
        int i;
        pthread_t threads[THREAD_COUNT];

        if (argc < 2)
                return 0;

        filename = argv[1];

        for (i = 0; i < THREAD_COUNT; i++) {
                if (pthread_create(threads + i, NULL, worker, NULL)) {
                        fprintf(stderr, "Error creating thread\n");
                        return 0;
                }
        }

        for (i = 0; i < THREAD_COUNT; i++)
                pthread_join(threads[i], NULL);
        return 0;
}
============================ 8< ============================

Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
2020-01-06 08:16:00 -06:00
bpf bpf/stackmap: Fix deadlock with rq_lock in bpf_get_stack() 2020-01-06 08:16:00 -06:00
cgroup cgroup: pids: use atomic64_t for pids->limit 2020-01-06 07:33:23 -06:00
configs
debug UBUNTU: SAUCE: (lockdown) Add a SysRq option to lift kernel lockdown 2019-11-25 14:56:44 +01:00
dma dma-mapping: fix false positivse warnings in dma_common_free_remap() 2019-10-05 10:24:17 +02:00
events UBUNTU: SAUCE: security,perf: Allow further restriction of perf_event_open 2019-11-25 14:56:27 +01:00
gcov um: Enable CONFIG_CONSTRUCTORS 2019-09-15 21:37:13 +02:00
irq UBUNTU: SAUCE: allow IRQs to be irq-threaded by default via config 2019-11-25 14:56:25 +01:00
livepatch
locking UBUNTU: SAUCE: import aufs driver 2019-11-25 14:56:45 +01:00
power UBUNTU: SAUCE: PM / hibernate: memory_bm_find_bit -- tighten node optimisation 2019-11-25 14:56:58 +01:00
printk Merge branch 'for-5.4' into for-linus 2019-09-16 12:54:25 +02:00
rcu Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2019-09-16 17:25:49 -07:00
sched UBUNTU: SAUCE: binder: turn into module 2019-11-25 14:56:36 +01:00
time time: Zero the upper 32-bits in __kernel_timespec on 32-bit 2019-12-16 09:32:14 -06:00
trace tracing: Fix race in perf_trace_buf initialization 2019-10-21 19:38:28 -04:00
.gitignore
Kconfig.freezer
Kconfig.hz
Kconfig.locks
Kconfig.preempt
Makefile Merge branch 'next-integrity' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity 2019-09-27 19:37:27 -07:00
acct.c
async.c
audit.c
audit.h
audit_fsnotify.c
audit_tree.c
audit_watch.c audit_get_nd(): don't unlock parent too early 2019-11-10 11:56:55 -05:00
auditfilter.c
auditsc.c
backtracetest.c
bounds.c
capability.c
compat.c
configs.c
context_tracking.c
cpu.c cpu/speculation: Uninline and export CPU mitigations helpers 2019-11-04 12:22:02 +01:00
cpu_pm.c
crash_core.c
crash_dump.c
cred.c
delayacct.c
dma.c
elfcore.c kernel/elfcore.c: include proper prototypes 2019-09-25 17:51:39 -07:00
exec_domain.c
exit.c futex: Mark the begin of futex exit explicitly 2019-12-05 16:29:58 -06:00
extable.c
fail_function.c
fork.c futex: Split futex_mm_release() for exit/exec 2019-12-05 16:29:58 -06:00
freezer.c Revert "libata, freezer: avoid block device removal while system is frozen" 2019-10-06 09:11:37 -06:00
futex.c futex: Prevent exit livelock 2019-12-05 16:30:00 -06:00
gen_kheaders.sh kheaders: substituting --sort in archive creation 2019-10-17 09:08:19 +09:00
groups.c
hung_task.c
iomem.c
irq_work.c
jump_label.c
kallsyms.c
kcmp.c
kcov.c
kexec.c
kexec_core.c kexec: bail out upon SIGKILL when allocating memory. 2019-09-25 17:51:40 -07:00
kexec_elf.c kexec_elf: support 32 bit ELF files 2019-09-06 23:58:44 +02:00
kexec_file.c Merge branch 'next-lockdown' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security 2019-09-28 08:14:15 -07:00
kexec_internal.h
kheaders.c
kmod.c
kprobes.c Tracing updates: 2019-09-20 11:19:48 -07:00
ksysfs.c
kthread.c UBUNTU: SAUCE: kthread: Do not leave kthread_create() immediately upon SIGKILL. 2019-11-25 14:56:25 +01:00
latencytop.c
module-internal.h
module.c Merge branch 'next-lockdown' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security 2019-09-28 08:14:15 -07:00
module_signature.c
module_signing.c UBUNTU: SAUCE: (lockdown) KEYS: Make use of platform keyring for module signature verify 2019-11-25 14:56:44 +01:00
notifier.c
nsproxy.c
padata.c padata: remove cpu_index from the parallel_queue 2019-09-13 21:15:41 +10:00
panic.c panic: ensure preemption is disabled during panic() 2019-10-07 15:47:19 -07:00
params.c
pid.c
pid_namespace.c
profile.c
ptrace.c
range.c
reboot.c
relay.c
resource.c mm/memory_hotplug.c: use PFN_UP / PFN_DOWN in walk_system_ram_range() 2019-09-24 15:54:09 -07:00
rseq.c
seccomp.c UBUNTU: SAUCE: seccomp: add SECCOMP_USER_NOTIF_FLAG_CONTINUE 2019-11-25 14:56:58 +01:00
signal.c cgroup: freezer: call cgroup_enter_frozen() with preemption disabled in ptrace_stop() 2019-10-11 08:39:57 -07:00
smp.c
smpboot.c
smpboot.h
softirq.c
stackleak.c
stacktrace.c stacktrace: Don't skip first entry on noncurrent tasks 2019-11-04 21:19:25 +01:00
stop_machine.c stop_machine: Avoid potential race behaviour 2019-10-17 12:47:12 +02:00
sys.c UBUNTU: SAUCE: (no-up) add compat_uts_machine= kernel command line override 2019-11-25 14:56:24 +01:00
sys_ni.c
sysctl.c UBUNTU: SAUCE: add a sysctl to disable unprivileged user namespace unsharing 2019-11-25 14:56:26 +01:00
sysctl_binary.c
task_work.c UBUNTU: SAUCE: import aufs driver 2019-11-25 14:56:45 +01:00
taskstats.c
test_kprobes.c
torture.c
tracepoint.c
tsacct.c
ucount.c
uid16.c
uid16.h
umh.c
up.c
user-return-notifier.c
user.c
user_namespace.c UBUNTU: SAUCE: add a sysctl to disable unprivileged user namespace unsharing 2019-11-25 14:56:26 +01:00
utsname.c
utsname_sysctl.c
watchdog.c
watchdog_hld.c
workqueue.c workqueue: Fix missing kfree(rescuer) in destroy_workqueue() 2020-01-06 07:39:24 -06:00
workqueue_internal.h