io_uring/poll: fix poll_refs race with cancelation

Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2068237

commit 12ad3d2d6c5b0131a6052de91360849e3e154846
Author: Lin Ma <linma@zju.edu.cn>
Date:   Fri Nov 25 07:15:54 2022 -0700

    io_uring/poll: fix poll_refs race with cancelation
    
    There is a race condition on poll_refs that can result in a NULL
    pointer dereference. The crash trace looks like:
    
    KASAN: null-ptr-deref in range [0x0000000000000008-0x000000000000000f]
    CPU: 0 PID: 30781 Comm: syz-executor.2 Not tainted 6.0.0-g493ffd6605b2 #1
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014
    RIP: 0010:io_poll_remove_entry io_uring/poll.c:154 [inline]
    RIP: 0010:io_poll_remove_entries+0x171/0x5b4 io_uring/poll.c:190
    Code: ...
    RSP: 0018:ffff88810dfefba0 EFLAGS: 00010202
    RAX: 0000000000000001 RBX: 0000000000000000 RCX: 0000000000040000
    RDX: ffffc900030c4000 RSI: 000000000003ffff RDI: 0000000000040000
    RBP: 0000000000000008 R08: ffffffff9764d3dd R09: fffffbfff3836781
    R10: fffffbfff3836781 R11: 0000000000000000 R12: 1ffff11003422d60
    R13: ffff88801a116b04 R14: ffff88801a116ac0 R15: dffffc0000000000
    FS:  00007f9c07497700(0000) GS:ffff88811a600000(0000) knlGS:0000000000000000
    CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 00007ffb5c00ea98 CR3: 0000000105680005 CR4: 0000000000770ef0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
    PKRU: 55555554
    Call Trace:
     <TASK>
     io_apoll_task_func+0x3f/0xa0 io_uring/poll.c:299
     handle_tw_list io_uring/io_uring.c:1037 [inline]
     tctx_task_work+0x37e/0x4f0 io_uring/io_uring.c:1090
     task_work_run+0x13a/0x1b0 kernel/task_work.c:177
     get_signal+0x2402/0x25a0 kernel/signal.c:2635
     arch_do_signal_or_restart+0x3b/0x660 arch/x86/kernel/signal.c:869
     exit_to_user_mode_loop kernel/entry/common.c:166 [inline]
     exit_to_user_mode_prepare+0xc2/0x160 kernel/entry/common.c:201
     __syscall_exit_to_user_mode_work kernel/entry/common.c:283 [inline]
     syscall_exit_to_user_mode+0x58/0x160 kernel/entry/common.c:294
     entry_SYSCALL_64_after_hwframe+0x63/0xcd
    
    The root cause is a small oversight in io_poll_check_events() when it
    runs concurrently with the poll cancel routine io_poll_cancel_req().
    
    The interleaving that triggers the use-after-free:
    
    CPU0                                       |  CPU1
                                               |
    io_apoll_task_func()                       |  io_poll_cancel_req()
     io_poll_check_events()                    |
      // do while first loop                   |
      v = atomic_read(...)                     |
      // v = poll_refs = 1                     |
      ...                                      |  io_poll_mark_cancelled()
                                               |   atomic_or()
                                               |   // poll_refs = IO_POLL_CANCEL_FLAG | 1
                                               |
      atomic_sub_return(...)                   |
      // poll_refs = IO_POLL_CANCEL_FLAG       |
      // loop continue                         |
                                               |
                                               |  io_poll_execute()
                                               |   io_poll_get_ownership()
                                               |   // poll_refs = IO_POLL_CANCEL_FLAG | 1
                                               |   // gets the ownership
      v = atomic_read(...)                     |
      // poll_refs unchanged                   |
                                               |
      if (v & IO_POLL_CANCEL_FLAG)             |
       return -ECANCELED;                      |
      // io_poll_check_events return           |
      // will go into                          |
      // io_req_complete_failed() free req     |
                                               |
                                               |  io_apoll_task_func()
                                               |  // also goes into io_req_complete_failed()
    
    And the interleaving that triggers the kernel WARNING:
    
    CPU0                                       |  CPU1
                                               |
    io_apoll_task_func()                       |  io_poll_cancel_req()
     io_poll_check_events()                    |
      // do while first loop                   |
      v = atomic_read(...)                     |
      // v = poll_refs = 1                     |
      ...                                      |  io_poll_mark_cancelled()
                                               |   atomic_or()
                                               |   // poll_refs = IO_POLL_CANCEL_FLAG | 1
                                               |
      atomic_sub_return(...)                   |
      // poll_refs = IO_POLL_CANCEL_FLAG       |
      // loop continue                         |
                                               |
      v = atomic_read(...)                     |
      // v = IO_POLL_CANCEL_FLAG               |
                                               |  io_poll_execute()
                                               |   io_poll_get_ownership()
                                               |   // poll_refs = IO_POLL_CANCEL_FLAG | 1
                                               |   // gets the ownership
                                               |
      WARN_ON_ONCE(!(v & IO_POLL_REF_MASK))    |
      // v & IO_POLL_REF_MASK = 0 WARN         |
                                               |
                                               |  io_apoll_task_func()
                                               |  // also goes into io_req_complete_failed()
    
    After reading the source and discussing it with Pavel, the intent of
    the atomic poll refs is that the loop in io_poll_check_events()
    continues only to keep anyone else from grabbing ownership. Therefore,
    this patch simply adds another AND operation so that the loop stops
    when poll_refs is exactly equal to IO_POLL_CANCEL_FLAG. Since
    io_poll_cancel_req() grabs ownership and will eventually make its way
    to io_req_complete_failed(), the req will be reclaimed as expected.
    
    Fixes: aa43477b0402 ("io_uring: poll rework")
    Signed-off-by: Lin Ma <linma@zju.edu.cn>
    Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
    [axboe: tweak description and code style]
    Signed-off-by: Jens Axboe <axboe@kernel.dk>

Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
Jeff Moyer 2022-11-25 07:15:54 -07:00
parent 7675a0e1e0
commit f8442c4d7e
1 changed file with 2 additions and 1 deletion

@@ -308,7 +308,8 @@ static int io_poll_check_events(struct io_kiocb *req, bool *locked)
 		 * Release all references, retry if someone tried to restart
 		 * task_work while we were executing it.
 		 */
-	} while (atomic_sub_return(v & IO_POLL_REF_MASK, &req->poll_refs));
+	} while (atomic_sub_return(v & IO_POLL_REF_MASK, &req->poll_refs) &
+					IO_POLL_REF_MASK);
 
 	return IOU_POLL_NO_ACTION;
 }