Commit Graph

142 Commits

Author SHA1 Message Date
Čestmír Kalina 9361f76aa8 rtmutex: Drop rt_mutex::wait_lock before scheduling
JIRA: https://issues.redhat.com/browse/RHEL-60306

commit d33d26036a0274b472299d7dcdaa5fb34329f91b
Author: Roland Xu <mu001999@outlook.com>
Date: Thu, 15 Aug 2024 10:58:13 +0800

    rt_mutex_handle_deadlock() is called with rt_mutex::wait_lock held.  In the
    good case it returns with the lock held and in the deadlock case it emits a
    warning and goes into an endless scheduling loop with the lock held, which
    triggers the 'scheduling in atomic' warning.

    Unlock rt_mutex::wait_lock in the deadlock case before issuing the warning
    and dropping into the schedule-forever loop.

    [ tglx: Moved unlock before the WARN(), removed the pointless comment,
      	massaged changelog, added Fixes tag ]

    Fixes: 3d5c9340d1 ("rtmutex: Handle deadlock detection smarter")
    Signed-off-by: Roland Xu <mu001999@outlook.com>
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Cc: stable@vger.kernel.org
    Link: https://lore.kernel.org/all/ME0P300MB063599BEF0743B8FA339C2CECC802@ME0P300MB0635.AUSP300.PROD.OUTLOOK.COM

Signed-off-by: Čestmír Kalina <ckalina@redhat.com>
2024-12-18 17:06:50 +01:00
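
For illustration, a condensed sketch of the fixed helper described above, based on the changelog rather than the verbatim upstream diff: the wait_lock is dropped before the warning and the schedule-forever loop, so the loop no longer runs with the raw spinlock held.

    static void __sched rt_mutex_handle_deadlock(int res, int detect_deadlock,
                                                 struct rt_mutex_base *lock,
                                                 struct rt_mutex_waiter *w)
    {
            /* Nothing to do unless deadlock detection reported -EDEADLOCK */
            if (res != -EDEADLOCK || detect_deadlock)
                    return;

            /* Drop wait_lock first: the loop below never returns */
            raw_spin_unlock_irq(&lock->wait_lock);

            WARN(1, "rtmutex deadlock detected\n");

            while (1) {
                    set_current_state(TASK_INTERRUPTIBLE);
                    schedule();
            }
    }
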
Čestmír Kalina 4d2e958bff locking/rtmutex: Use try_cmpxchg_relaxed() in mark_rt_mutex_waiters()
JIRA: https://issues.redhat.com/browse/RHEL-60306

commit ce3576ebd62d99f79c1dc98824e2ef6d6ab68434
Author: Uros Bizjak <ubizjak@gmail.com>
Date: Wed, 24 Jan 2024 11:49:53 +0100

    Use try_cmpxchg() instead of cmpxchg(*ptr, old, new) == old.

    The x86 CMPXCHG instruction returns success in the ZF flag, so this change
    saves a compare after CMPXCHG (and related move instruction in front of CMPXCHG).

    Also, try_cmpxchg() implicitly assigns the old *ptr value to "old" when CMPXCHG
    fails. There is no need to re-read the value in the loop.

    Note that the value from *ptr should be read using READ_ONCE() to prevent
    the compiler from merging, refetching or reordering the read.

    No functional change intended.

    Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
    Signed-off-by: Ingo Molnar <mingo@kernel.org>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Waiman Long <longman@redhat.com>
    Cc: Will Deacon <will.deacon@arm.com>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: Linus Torvalds <torvalds@linux-foundation.org>
    Cc: Paul E. McKenney <paulmck@kernel.org>
    Link: https://lore.kernel.org/r/20240124104953.612063-1-ubizjak@gmail.com

Signed-off-by: Čestmír Kalina <ckalina@redhat.com>
2024-12-16 22:02:24 +01:00
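
For readers unfamiliar with the try_cmpxchg() idiom, a minimal standalone userspace analogue of the mark_rt_mutex_waiters() loop, using GCC's __atomic builtins rather than the kernel primitives: on failure the builtin writes the current value back into the expected-value variable, so the loop needs no separate re-read.

    #include <stdbool.h>
    #include <stdio.h>

    static unsigned long lock_word;

    /* Set the "has waiters" bit, mirroring the mark_rt_mutex_waiters() pattern */
    static void mark_waiters(void)
    {
            unsigned long old = __atomic_load_n(&lock_word, __ATOMIC_RELAXED);

            /* On failure the builtin refreshes 'old' with the current value */
            do {
            } while (!__atomic_compare_exchange_n(&lock_word, &old, old | 1UL,
                                                  true /* weak */,
                                                  __ATOMIC_RELAXED, __ATOMIC_RELAXED));
    }

    int main(void)
    {
            lock_word = 0xf0;
            mark_waiters();
            printf("lock_word = %#lx\n", lock_word);        /* prints 0xf1 */
            return 0;
    }
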
Waiman Long ca8db1144f locking/rtmutex: Add a lockdep assert to catch potential nested blocking
JIRA: https://issues.redhat.com/browse/RHEL-28616

commit 45f67f30a22f264bc7a0a61255c2ee1a838e9403
Author: Thomas Gleixner <tglx@linutronix.de>
Date:   Fri, 8 Sep 2023 18:22:53 +0200

    locking/rtmutex: Add a lockdep assert to catch potential nested blocking

    There used to be a BUG_ON(current->pi_blocked_on) in the lock acquisition
    functions, but that vanished in one of the rtmutex overhauls.

    Bring it back in form of a lockdep assert to catch code paths which take
    rtmutex based locks with current::pi_blocked_on != NULL.

    Reported-by: Crystal Wood <swood@redhat.com>
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Signed-off-by: "Peter Zijlstra (Intel)" <peterz@infradead.org>
    Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Link: https://lkml.kernel.org/r/20230908162254.999499-7-bigeasy@linutronix.de

Signed-off-by: Waiman Long <longman@redhat.com>
2024-03-27 10:06:01 -04:00
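
The assert itself is a one-liner in the lock acquisition fast path; roughly (see upstream 45f67f30a22f for the exact placement):

    static __always_inline int __rt_mutex_lock(struct rt_mutex_base *lock,
                                               unsigned int state)
    {
            /* Blocking on a second rtmutex while already blocked on one is a bug */
            lockdep_assert(!current->pi_blocked_on);

            if (likely(rt_mutex_try_acquire(lock)))
                    return 0;

            return rt_mutex_slowlock(lock, NULL, state);
    }
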
Waiman Long f62c68f20c locking/rtmutex: Use rt_mutex specific scheduler helpers
JIRA: https://issues.redhat.com/browse/RHEL-28616

commit d14f9e930b9073de264c106bf04968286ef9b3a4
Author: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Date:   Fri, 8 Sep 2023 18:22:52 +0200

    locking/rtmutex: Use rt_mutex specific scheduler helpers

    Have rt_mutex use the rt_mutex specific scheduler helpers to avoid
    recursion vs rtlock on the PI state.

    [[ peterz: adapted to new names ]]

    Reported-by: Crystal Wood <swood@redhat.com>
    Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Link: https://lkml.kernel.org/r/20230908162254.999499-6-bigeasy@linutronix.de

Signed-off-by: Waiman Long <longman@redhat.com>
2024-03-27 10:05:58 -04:00
Waiman Long c6a557ade6 locking/rtmutex: Avoid unconditional slowpath for DEBUG_RT_MUTEXES
JIRA: https://issues.redhat.com/browse/RHEL-28616

commit af9f006393b53409be0ca83ae234bef840cdef4a
Author: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Date:   Fri, 8 Sep 2023 18:22:49 +0200

    locking/rtmutex: Avoid unconditional slowpath for DEBUG_RT_MUTEXES

    With DEBUG_RT_MUTEXES enabled the fast-path rt_mutex_cmpxchg_acquire()
    always fails and all lock operations take the slow path.

    Provide a new helper inline rt_mutex_try_acquire() which maps to
    rt_mutex_cmpxchg_acquire() in the non-debug case. For the debug case
    it invokes rt_mutex_slowtrylock() which can acquire a non-contended
    rtmutex under full debug coverage.

    Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Link: https://lkml.kernel.org/r/20230908162254.999499-3-bigeasy@linutronix.de

Signed-off-by: Waiman Long <longman@redhat.com>
2024-03-27 10:05:57 -04:00
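
A sketch of the helper's shape as described above (see upstream af9f006393b5 for the exact code): debug builds go through the slow trylock so the cmpxchg fast path is not silently disabled.

    #ifdef CONFIG_DEBUG_RT_MUTEXES
    /*
     * Under DEBUG_RT_MUTEXES the cmpxchg fast path always fails, so use the
     * slow trylock, which can still take a non-contended lock with full
     * debug checks applied.
     */
    static __always_inline bool rt_mutex_try_acquire(struct rt_mutex_base *lock)
    {
            return rt_mutex_slowtrylock(lock);
    }
    #else
    static __always_inline bool rt_mutex_try_acquire(struct rt_mutex_base *lock)
    {
            return rt_mutex_cmpxchg_acquire(lock, NULL, current);
    }
    #endif
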
Waiman Long 0badc86620 Revert "locking/rtmutex: Submit/resume work explicitly before/after blocking"
JIRA: https://issues.redhat.com/browse/RHEL-28616
Upstream Status: RHEL only

Revert linux-rt-devel specific commit a44b38b17bf3 ("locking/rtmutex:
Submit/resume work explicitly before/after blocking") to prepare for
the submission of upstream equivalent.

Signed-off-by: Waiman Long <longman@redhat.com>
2024-03-27 09:56:35 -04:00
Waiman Long c07eb0516e Revert "locking/rtmutex: Avoid pointless blk_flush_plug() invocations"
JIRA: https://issues.redhat.com/browse/RHEL-28616
Upstream Status: RHEL only

Revert linux-rt-devel specific commit 96c0a06e80cb ("locking/rtmutex:
Avoid pointless blk_flush_plug() invocations") to prepare for the
submission of upstream equivalent.

Signed-off-by: Waiman Long <longman@redhat.com>
2024-03-27 09:56:35 -04:00
Waiman Long 3ad42081d5 Revert "locking/rtmutex: Add a lockdep assert to catch potential nested blocking"
JIRA: https://issues.redhat.com/browse/RHEL-28616
Upstream Status: RHEL only

Revert linux-rt-devel specific commit e2d27efe1923 ("locking/rtmutex:
Add a lockdep assert to catch potential nested blocking") to prepare
for the submission of upstream equivalent.

Signed-off-by: Waiman Long <longman@redhat.com>
2024-03-27 09:56:34 -04:00
Joel Savitz baca3f37f7 locking/rtmutex: Fix task->pi_waiters integrity
JIRA: https://issues.redhat.com/browse/RHEL-5226

commit f7853c34241807bb97673a5e97719123be39a09e
Author: Peter Zijlstra <peterz@infradead.org>
Date:   Fri Jul 7 16:19:09 2023 +0200

    locking/rtmutex: Fix task->pi_waiters integrity

    Henry reported that rt_mutex_adjust_prio_check() has an ordering
    problem and puts the lie to the comment in [7]. Sharing the sort key
    between lock->waiters and owner->pi_waiters *does* create problems,
    since unlike what the comment claims, holding [L] is insufficient.

    Notably, consider:

            A
          /   \
         M1   M2
         |     |
         B     C

    That is, task A owns both M1 and M2, B and C block on them. In this
    case a concurrent chain walk (B & C) will modify their resp. sort keys
    in [7] while holding M1->wait_lock and M2->wait_lock. So holding [L]
    is meaningless, they're different Ls.

    This then gives rise to a race condition between [7] and [11], where
    the requeue of pi_waiters will observe an inconsistent tree order.

            B                               C

      (holds M1->wait_lock,         (holds M2->wait_lock,
       holds B->pi_lock)             holds A->pi_lock)

      [7]
      waiter_update_prio();
      ...
      [8]
      raw_spin_unlock(B->pi_lock);
      ...
      [10]
      raw_spin_lock(A->pi_lock);

                                    [11]
                                    rt_mutex_enqueue_pi();
                                    // observes inconsistent A->pi_waiters
                                    // tree order

    Fixing this means either extending the range of the owner lock from
    [10-13] to [6-13], with the immediate problem that this means [6-8]
    hold both blocked and owner locks, or duplicating the sort key.

    Since the locking in chain walk is horrible enough without having to
    consider pi_lock nesting rules, duplicate the sort key instead.

    By giving each tree their own sort key, the above race becomes
    harmless, if C sees B at the old location, then B will correct things
    (if they need correcting) when it walks up the chain and reaches A.

    Fixes: fb00aca474 ("rtmutex: Turn the plist into an rb-tree")
    Reported-by: Henry Wu <triangletrap12@gmail.com>
    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Acked-by: Thomas Gleixner <tglx@linutronix.de>
    Tested-by: Henry Wu <triangletrap12@gmail.com>
    Link: https://lkml.kernel.org/r/20230707161052.GF2883469%40hirez.programming.kicks-ass.net

Signed-off-by: Joel Savitz <jsavitz@redhat.com>
2024-01-15 10:10:43 -05:00
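
Structurally, "duplicate the sort key" means each rb-tree node in the waiter carries its own copy of the key; a sketch of the resulting layout (see upstream f7853c342418 for the real definitions, the field list here is abbreviated):

    /* One rb-tree node per tree, each with its own copy of the sort key */
    struct rt_waiter_node {
            struct rb_node  entry;
            int             prio;
            u64             deadline;
    };

    struct rt_mutex_waiter {
            struct rt_waiter_node   tree;       /* lock->waiters, under lock->wait_lock */
            struct rt_waiter_node   pi_tree;    /* owner->pi_waiters, under owner->pi_lock */
            struct task_struct      *task;
            struct rt_mutex_base    *lock;
            /* ... */
    };
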
Crystal Wood 31f7062808 locking/rtmutex: Add a lockdep assert to catch potential nested blocking
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2218724

Upstream Status: git://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-rt-devel.git

commit e2d27efe19234c94a42e123dc8122c4f13c9a9ab
Author: Thomas Gleixner <tglx@linutronix.de>
Date:   Thu Apr 27 13:19:37 2023 +0200

    locking/rtmutex: Add a lockdep assert to catch potential nested blocking

    There used to be a BUG_ON(current->pi_blocked_on) in the lock acquisition
    functions, but that vanished in one of the rtmutex overhauls.

    Bring it back in form of a lockdep assert to catch code paths which take
    rtmutex based locks with current::pi_blocked_on != NULL.

    Reported-by: Crystal Wood <swood@redhat.com>
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
    Link: https://lore.kernel.org/r/20230427111937.2745231-5-bigeasy@linutronix.de

Signed-off-by: Crystal Wood <swood@redhat.com>
2023-07-18 17:22:36 -05:00
Crystal Wood fbe16f5d83 locking/rtmutex: Avoid pointless blk_flush_plug() invocations
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2218724

Upstream Status: git://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-rt-devel.git

commit 96c0a06e80cb53788a282e087773b2cfa5525545
Author: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Date:   Thu Apr 27 13:19:36 2023 +0200

    locking/rtmutex: Avoid pointless blk_flush_plug() invocations

    With DEBUG_RT_MUTEXES enabled the fast-path rt_mutex_cmpxchg_acquire()
    always fails and all lock operations take the slow path, which leads to the
    invocation of blk_flush_plug() even if the lock is not contended, which is
    unnecessary and prevents batch processing of requests.

    Provide a new helper inline rt_mutex_try_acquire() which maps to
    rt_mutex_cmpxchg_acquire() in the non-debug case. For the debug case it
    invokes rt_mutex_slowtrylock() which can acquire a non-contended rtmutex
    under full debug coverage.

    Replace the rt_mutex_cmpxchg_acquire() invocations in __rt_mutex_lock() and
    __ww_rt_mutex_lock() with the new helper function, which avoids the
    blk_flush_plug() invocation for the non-contended case and preserves the
    debug mechanism.

    [ tglx: Created a new helper and massaged changelog ]

    Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
    Link: https://lore.kernel.org/r/20230427111937.2745231-4-bigeasy@linutronix.de

Signed-off-by: Crystal Wood <swood@redhat.com>
2023-07-18 17:22:36 -05:00
Crystal Wood 2ef9c3d906 locking/rtmutex: Submit/resume work explicitly before/after blocking
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2218724

Upstream Status: git://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-rt-devel.git

commit a44b38b17bf31d90509125a8d34c9ac8f0dcc886
Author: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Date:   Thu Apr 27 13:19:35 2023 +0200

    locking/rtmutex: Submit/resume work explicitly before/after blocking

    schedule() invokes sched_submit_work() before scheduling and
    sched_resume_work() afterwards to ensure that queued block requests are
    flushed and the (IO)worker machineries can instantiate new workers if
    required. This avoids deadlocks and starvation.

    With rt_mutexes this can lead to a subtle problem:

      When a task blocks on an rtmutex, current::pi_blocked_on points to the
      rtmutex it blocks on. When one of the functions in sched_submit/resume_work()
      then contends on an rtmutex based lock, that would corrupt current::pi_blocked_on.

    Let rtmutex and the RT lock variants which are based on it invoke
    sched_submit/resume_work() explicitly before and after the slowpath so
    it's guaranteed that current::pi_blocked_on cannot be corrupted by blocking
    on two locks.

    This does not apply to the PREEMPT_RT variants of spinlock_t and rwlock_t
    as their scheduling slowpath is separate and cannot invoke the work related
    functions due to potential deadlocks anyway.

    [ tglx: Make it explicit and symmetric. Massage changelog ]

    Fixes: e17ba59b7e8e1 ("locking/rtmutex: Guard regular sleeping locks specific functions")
    Reported-by: Crystal Wood <swood@redhat.com>
    Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
    Link: https://lore.kernel.org/4b4ab374d3e24e6ea8df5cadc4297619a6d945af.camel@redhat.com
    Link: https://lore.kernel.org/r/20230427111937.2745231-3-bigeasy@linutronix.de

Signed-off-by: Crystal Wood <swood@redhat.com>
2023-07-18 17:22:36 -05:00
Joel Savitz 8f01288457 rtmutex: Ensure that the top waiter is always woken up
commit db370a8b9f67ae5f17e3d5482493294467784504
Author: Wander Lairson Costa <wander@redhat.com>
Date:   Thu Feb 2 09:30:20 2023 -0300

    rtmutex: Ensure that the top waiter is always woken up

    Let L1 and L2 be two spinlocks.

    Let T1 be a task holding L1 and blocked on L2. T1, currently, is the top
    waiter of L2.

    Let T2 be the task holding L2.

    Let T3 be a task trying to acquire L1.

    The following events will lead to a state in which the wait queue of L2
    isn't empty, but no task actually holds the lock.

    T1                T2                                  T3
    ==                ==                                  ==

                                                          spin_lock(L1)
                                                          | raw_spin_lock(L1->wait_lock)
                                                          | rtlock_slowlock_locked(L1)
                                                          | | task_blocks_on_rt_mutex(L1, T3)
                                                          | | | orig_waiter->lock = L1
                                                          | | | orig_waiter->task = T3
                                                          | | | raw_spin_unlock(L1->wait_lock)
                                                          | | | rt_mutex_adjust_prio_chain(T1, L1, L2, orig_waiter, T3)
                      spin_unlock(L2)                     | | | |
                      | rt_mutex_slowunlock(L2)           | | | |
                      | | raw_spin_lock(L2->wait_lock)    | | | |
                      | | wakeup(T1)                      | | | |
                      | | raw_spin_unlock(L2->wait_lock)  | | | |
                                                          | | | | waiter = T1->pi_blocked_on
                                                          | | | | waiter == rt_mutex_top_waiter(L2)
                                                          | | | | waiter->task == T1
                                                          | | | | raw_spin_lock(L2->wait_lock)
                                                          | | | | dequeue(L2, waiter)
                                                          | | | | update_prio(waiter, T1)
                                                          | | | | enqueue(L2, waiter)
                                                          | | | | waiter != rt_mutex_top_waiter(L2)
                                                          | | | | L2->owner == NULL
                                                          | | | | wakeup(T1)
                                                          | | | | raw_spin_unlock(L2->wait_lock)
    T1 wakes up
    T1 != top_waiter(L2)
    schedule_rtlock()

    If the deadline of T1 is updated before the call to update_prio(), and the
    new deadline is greater than the deadline of the second top waiter, then
    after the requeue, T1 is no longer the top waiter, and the wrong task is
    woken up which will then go back to sleep because it is not the top waiter.

    This can be reproduced in PREEMPT_RT with stress-ng:

    while true; do
        stress-ng --sched deadline --sched-period 1000000000 \
                --sched-runtime 800000000 --sched-deadline \
                1000000000 --mmapfork 23 -t 20
    done

    A similar issue was pointed out by Thomas versus the cases where the top
    waiter drops out early due to a signal or timeout, which is a general issue
    for all regular rtmutex use cases, e.g. futex.

    The problematic code is in rt_mutex_adjust_prio_chain():

            // Save the top waiter before dequeue/enqueue
            prerequeue_top_waiter = rt_mutex_top_waiter(lock);

            rt_mutex_dequeue(lock, waiter);
            waiter_update_prio(waiter, task);
            rt_mutex_enqueue(lock, waiter);

            // Lock has no owner?
            if (!rt_mutex_owner(lock)) {
                    // Top waiter changed
      ---->         if (prerequeue_top_waiter != rt_mutex_top_waiter(lock))
      ---->                 wake_up_state(waiter->task, waiter->wake_state);

    This only takes into account the case where @waiter is the new top waiter
    due to the requeue operation.

    But it fails to handle the case where @waiter is no longer the top
    waiter due to the requeue operation.

    Ensure that the new top waiter is woken up in all cases so it can take
    over the ownerless lock.

    [ tglx: Amend changelog, add Fixes tag ]

    Fixes: c014ef69b3ac ("locking/rtmutex: Add wake_state to rt_mutex_waiter")
    Signed-off-by: Wander Lairson Costa <wander@redhat.com>
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Cc: stable@vger.kernel.org
    Link: https://lore.kernel.org/r/20230117172649.52465-1-wander@redhat.com
    Link: https://lore.kernel.org/r/20230202123020.14844-1-wander@redhat.com

Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2176147
Signed-off-by: Joel Savitz <jsavitz@redhat.com>
2023-03-07 15:26:28 -05:00
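
The direction of the fix, sketched against the snippet quoted in the changelog above (see upstream db370a8b9f67 for the exact diff): when the lock has no owner, wake whoever is the top waiter after the requeue, not only the requeued @waiter.

            /* Lock has no owner: the post-requeue top waiter must take over */
            if (!rt_mutex_owner(lock)) {
                    struct rt_mutex_waiter *top = rt_mutex_top_waiter(lock);

                    /* Top waiter changed by the requeue above? Wake the new one. */
                    if (top != prerequeue_top_waiter)
                            wake_up_state(top->task, top->wake_state);

                    raw_spin_unlock_irq(&lock->wait_lock);
                    return 0;
            }
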
Joel Savitz 2d216f7bd8 locking: Apply contention tracepoints in the slow path
conflict in kernel/locking/rtmutex.c
	detail: c9s commit c3a495f437 ("rtmutex: Add acquire semantics for rtmutex lock acquisition slow path"), backport of upstream commit 1c0908d8e441, adds a second parameter to fixup_rt_mutex_waiters(), which is not present in upstream commit ee042be16cb4.
	action: keep new call to fixup_rt_mutex_waiters()

commit ee042be16cb455116d0fe99b77c6bc8baf87c8c6
Author: Namhyung Kim <namhyung@kernel.org>
Date:   Tue Mar 22 11:57:09 2022 -0700

    locking: Apply contention tracepoints in the slow path

    Adding the lock contention tracepoints in various lock function slow
    paths.  Note that each arch can define spinlock differently, I only
    added it only to the generic qspinlock for now.

    Signed-off-by: Namhyung Kim <namhyung@kernel.org>
    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Tested-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
    Link: https://lkml.kernel.org/r/20220322185709.141236-3-namhyung@kernel.org

Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2176147
Signed-off-by: Joel Savitz <jsavitz@redhat.com>
2023-03-07 15:26:28 -05:00
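
The pattern applied by this patch is simply to bracket each blocking slow path with the two tracepoints; a hypothetical wrapper only to show where they sit (trace_contention_begin/end and the LCB_F_RT flag are from the upstream series, the wrapper function itself is illustrative):

    /* Illustrative wrapper, not a function that exists in the kernel */
    static int __sched slowlock_with_tracepoints(struct rt_mutex_base *lock,
                                                 unsigned int state)
    {
            int ret;

            trace_contention_begin(lock, LCB_F_RT);    /* about to block */
            ret = rt_mutex_slowlock(lock, NULL, state);
            trace_contention_end(lock, ret);           /* woken up, report result */

            return ret;
    }
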
Brian Masney c3a495f437 rtmutex: Add acquire semantics for rtmutex lock acquisition slow path
Upstream Status: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=1c0908d8e441631f5b8ba433523cf39339ee2ba0
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2163507
Conflicts: Corrected minor context diff

commit 1c0908d8e441631f5b8ba433523cf39339ee2ba0
Author: Mel Gorman <mgorman@techsingularity.net>
Date:   Fri Dec 2 10:02:23 2022 +0000

    rtmutex: Add acquire semantics for rtmutex lock acquisition slow path

    Jan Kara reported the following bug triggering on 6.0.5-rt14 running dbench
    on XFS on arm64.

     kernel BUG at fs/inode.c:625!
     Internal error: Oops - BUG: 0 [#1] PREEMPT_RT SMP
     CPU: 11 PID: 6611 Comm: dbench Tainted: G            E   6.0.0-rt14-rt+ #1
     pc : clear_inode+0xa0/0xc0
     lr : clear_inode+0x38/0xc0
     Call trace:
      clear_inode+0xa0/0xc0
      evict+0x160/0x180
      iput+0x154/0x240
      do_unlinkat+0x184/0x300
      __arm64_sys_unlinkat+0x48/0xc0
      el0_svc_common.constprop.4+0xe4/0x2c0
      do_el0_svc+0xac/0x100
      el0_svc+0x78/0x200
      el0t_64_sync_handler+0x9c/0xc0
      el0t_64_sync+0x19c/0x1a0

    It also affects 6.1-rc7-rt5 and affects a preempt-rt fork of 5.14 so this
    is likely a bug that existed forever and only became visible when ARM
    support was added to preempt-rt. The same problem does not occur on x86-64
    and he also reported that converting sb->s_inode_wblist_lock to
    raw_spinlock_t makes the problem disappear indicating that the RT spinlock
    variant is the problem.

    Which in turn means that RT mutexes on ARM64 and any other weakly ordered
    architecture are affected by this independent of RT.

    Will Deacon observed:

      "I'd be more inclined to be suspicious of the slowpath tbh, as we need to
       make sure that we have acquire semantics on all paths where the lock can
       be taken. Looking at the rtmutex code, this really isn't obvious to me
       -- for example, try_to_take_rt_mutex() appears to be able to return via
       the 'takeit' label without acquire semantics and it looks like we might
       be relying on the caller's subsequent _unlock_ of the wait_lock for
       ordering, but that will give us release semantics which aren't correct."

    Sebastian Andrzej Siewior prototyped a fix that does work based on that
    comment but it was a little bit overkill and added some fences that should
    not be necessary.

    The lock owner is updated with an IRQ-safe raw spinlock held, but the
    spin_unlock does not provide acquire semantics which are needed when
    acquiring a mutex.

    Adds the necessary acquire semantics for lock owner updates in the slow path
    acquisition and the waiter bit logic.

    It successfully completed 10 iterations of the dbench workload while the
    vanilla kernel fails on the first iteration.

    [ bigeasy@linutronix.de: Initial prototype fix ]

    Fixes: 700318d1d7 ("locking/rtmutex: Use acquire/release semantics")
    Fixes: 23f78d4a03 ("[PATCH] pi-futex: rt mutex core")
    Reported-by: Jan Kara <jack@suse.cz>
    Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Cc: stable@vger.kernel.org
    Link: https://lore.kernel.org/r/20221202100223.6mevpbl7i6x5udfd@techsingularity.net

Signed-off-by: Brian Masney <bmasney@redhat.com>
2023-01-23 13:03:46 -05:00
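
A standalone userspace illustration of why acquire ordering matters when taking a lock, using GCC __atomic builtins rather than the kernel helpers: the CAS that installs the owner must have acquire semantics so that accesses inside the critical section cannot be reordered before the acquisition on weakly ordered CPUs such as arm64.

    #include <stdbool.h>
    #include <stddef.h>

    struct fake_lock {
            void *owner;            /* NULL when unlocked */
    };

    static int shared_data;

    static bool try_take(struct fake_lock *l, void *me)
    {
            void *expected = NULL;

            /*
             * Acquire on success: accesses in the critical section cannot be
             * reordered before the CAS that observed owner == NULL. Using a
             * relaxed CAS here is the class of bug fixed by the commit above.
             */
            return __atomic_compare_exchange_n(&l->owner, &expected, me,
                                               false, __ATOMIC_ACQUIRE,
                                               __ATOMIC_RELAXED);
    }

    static void release(struct fake_lock *l)
    {
            /* Release: publish the critical section's writes to the next owner */
            __atomic_store_n(&l->owner, NULL, __ATOMIC_RELEASE);
    }

    int main(void)
    {
            struct fake_lock l = { .owner = NULL };
            int me;

            if (try_take(&l, &me)) {
                    shared_data++;          /* critical section */
                    release(&l);
            }
            return shared_data ? 0 : 1;
    }
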
Waiman Long 1f0d97425a locking/rtmutex: Fix incorrect condition in rtmutex_spin_on_owner()
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2076713
Conflicts: Upstream merge conflict, use resolution listed in merge
	   commit f16cc980d649 ("Merge branch 'locking/urgent' into
	   locking/core").

commit 8f556a326c93213927e683fc32bbf5be1b62540a
Author: Zqiang <qiang1.zhang@intel.com>
Date:   Fri, 17 Dec 2021 15:42:07 +0800

    locking/rtmutex: Fix incorrect condition in rtmutex_spin_on_owner()

    Optimistic spinning needs to be terminated when the spinning waiter is no
    longer the top waiter on the lock, but the condition is negated. It
    terminates if the waiter is the top waiter, which is defeating the whole
    purpose.

    Fixes: c3123c431447 ("locking/rtmutex: Dont dereference waiter lockless")
    Signed-off-by: Zqiang <qiang1.zhang@intel.com>
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Cc: stable@vger.kernel.org
    Link: https://lore.kernel.org/r/20211217074207.77425-1-qiang1.zhang@intel.com

Signed-off-by: Waiman Long <longman@redhat.com>
2022-05-12 08:34:04 -04:00
Waiman Long cf476291f3 locking: Make owner_on_cpu() into <linux/sched.h>
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2076713

commit c0bed69daf4b67809b58cc7cd81a8fa4f45bc161
Author: Kefeng Wang <wangkefeng.wang@huawei.com>
Date:   Fri, 3 Dec 2021 15:59:34 +0800

    locking: Make owner_on_cpu() into <linux/sched.h>

    Move owner_on_cpu() from kernel/locking/rwsem.c into
    include/linux/sched.h under CONFIG_SMP, then use it
    in mutex/rwsem/rtmutex to simplify the code.

    Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Link: https://lore.kernel.org/r/20211203075935.136808-2-wangkefeng.wang@huawei.com

Signed-off-by: Waiman Long <longman@redhat.com>
2022-05-12 08:34:03 -04:00
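
The helper being consolidated is small; roughly (see upstream c0bed69daf4b for the exact definition under CONFIG_SMP):

    #ifdef CONFIG_SMP
    static inline bool owner_on_cpu(struct task_struct *owner)
    {
            /* Skip spinning if the owner is not running, or its (v)CPU is preempted */
            return READ_ONCE(owner->on_cpu) && !vcpu_is_preempted(task_cpu(owner));
    }
    #endif
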
Waiman Long b16109588c locking/rtmutex: Squash self-deadlock check for ww_rt_mutex.
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2076713

commit 02ea9fc96fe976e7f7e067f38b12202f126e3f2f
Author: Peter Zijlstra <peterz@infradead.org>
Date:   Mon, 29 Nov 2021 18:46:46 +0100

    locking/rtmutex: Squash self-deadlock check for ww_rt_mutex.

    Similar to the issues in commits:

      6467822b8cc9 ("locking/rtmutex: Prevent spurious EDEADLK return caused by ww_mutexes")
      a055fcc132d4 ("locking/rtmutex: Return success on deadlock for ww_mutex waiters")

    ww_rt_mutex_lock() should not return EDEADLK without first going through
    the __ww_mutex logic to set the required state. In fact, the chain-walk
    can deal with the spurious cycles (per the above commits) this check
    warns about and is trying to avoid.

    Therefore ignore this test for ww_rt_mutex and simply let things fall
    in place.

    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Link: https://lore.kernel.org/r/20211129174654.668506-4-bigeasy@linutronix.de

Signed-off-by: Waiman Long <longman@redhat.com>
2022-05-12 08:33:53 -04:00
Waiman Long 65d9183f94 rtmutex: Wake up the waiters lockless while dropping the read lock.
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2076713

commit 9321f8152d9a764208c3f0dad49e0c55f293b7ab
Author: Thomas Gleixner <tglx@linutronix.de>
Date:   Tue, 28 Sep 2021 17:00:06 +0200

    rtmutex: Wake up the waiters lockless while dropping the read lock.

    The rw_semaphore and rwlock_t implementation both wake the waiter while
    holding the rt_mutex_base::wait_lock acquired.
    This can be optimized by waking the waiter lockless outside of the
    locked section to avoid a needless contention on the
    rt_mutex_base::wait_lock lock.

    Extend rt_mutex_wake_q_add() to also accept task and state and use it in
    __rwbase_read_unlock().

    Suggested-by: Davidlohr Bueso <dave@stgolabs.net>
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Link: https://lkml.kernel.org/r/20210928150006.597310-3-bigeasy@linutronix.de

Signed-off-by: Waiman Long <longman@redhat.com>
2022-05-12 08:32:18 -04:00
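
The resulting unlock pattern, sketched with the helpers named in this series (see upstream 9321f8152d9a for the exact code): record the wakeup while holding wait_lock, but issue it only after the lock has been dropped.

    static void __sched __rwbase_read_unlock(struct rwbase_rt *rwb, unsigned int state)
    {
            struct rt_mutex_base *rtm = &rwb->rtmutex;
            struct task_struct *owner;
            DEFINE_RT_WAKE_Q(wqh);

            raw_spin_lock_irq(&rtm->wait_lock);
            owner = rt_mutex_owner(rtm);
            /* Only queue the wakeup under wait_lock ... */
            if (owner)
                    rt_mutex_wake_q_add_task(&wqh, owner, state);
            raw_spin_unlock_irq(&rtm->wait_lock);

            /* ... and perform it lockless, after dropping wait_lock */
            rt_mutex_wake_up_q(&wqh);
    }
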
Waiman Long c5f0f13946 rtmutex: Check explicit for TASK_RTLOCK_WAIT.
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2076713

commit 8fe46535e10dbfebad68ad9f2f8260e49f5852c9
Author: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Date:   Tue, 28 Sep 2021 17:00:05 +0200

    rtmutex: Check explicit for TASK_RTLOCK_WAIT.

    rt_mutex_wake_q_add() needs to distinguish between sleeping
    locks (TASK_RTLOCK_WAIT) and normal locks, which use TASK_NORMAL, in order
    to use the proper wake mechanism.

    Instead of checking for != TASK_NORMAL, make it more robust and check
    explicitly for TASK_RTLOCK_WAIT, which is the reason why a different wake
    mechanism is used.

    No functional change.

    Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Link: https://lkml.kernel.org/r/20210928150006.597310-2-bigeasy@linutronix.de

Signed-off-by: Waiman Long <longman@redhat.com>
2022-05-12 08:32:18 -04:00
Waiman Long e21ed8b6d0 locking/rtmutex: Fix ww_mutex deadlock check
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2007032

commit e5480572706da1b2c2dc2c6484eab64f92b9263b
Author: Peter Zijlstra <peterz@infradead.org>
Date:   Wed, 1 Sep 2021 11:44:11 +0200

    locking/rtmutex: Fix ww_mutex deadlock check

    Dan reported that rt_mutex_adjust_prio_chain() can be called with
    .orig_waiter == NULL however commit a055fcc132d4 ("locking/rtmutex: Return
    success on deadlock for ww_mutex waiters") unconditionally dereferences it.

    Since both call-sites that have .orig_waiter == NULL don't care for the
    return value, simply disable the deadlock squash by adding the NULL check.

    Notably, both callers use the deadlock condition as a termination condition
    for the iteration; once detected, it is sure that (de)boosting is done.
    Arguably step [3] would be a more natural termination point, but it's
    dubious whether adding a third deadlock detection state would improve the
    code.

    Fixes: a055fcc132d4 ("locking/rtmutex: Return success on deadlock for ww_mutex waiters")
    Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Acked-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
    Link: https://lore.kernel.org/r/YS9La56fHMiCCo75@hirez.programming.kicks-ass.net

Signed-off-by: Waiman Long <longman@redhat.com>
2021-09-27 16:19:30 -04:00
Waiman Long beb2236d5b locking/rtmutex: Return success on deadlock for ww_mutex waiters
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2007032

commit a055fcc132d4c25b96d1115aea514258810dc6fc
Author: Peter Zijlstra <peterz@infradead.org>
Date:   Thu, 26 Aug 2021 10:48:18 +0200

    locking/rtmutex: Return success on deadlock for ww_mutex waiters

    ww_mutexes can legitimately cause a deadlock situation in the lock graph
    which is resolved afterwards by the wait/wound mechanics. The rtmutex chain
    walk can detect such a deadlock and returns EDEADLK which in turn skips the
    wait/wound mechanism and returns EDEADLK to the caller. That's wrong
    because both lock chains might get EDEADLK or the wrong waiter would back
    out.

    Detect that situation and return 'success' in case that the waiter which
    initiated the chain walk is a ww_mutex with context. This allows the
    wait/wound mechanics to resolve the situation according to the rules.

    [ tglx: Split it apart and added changelog ]

    Reported-by: Sebastian Siewior <bigeasy@linutronix.de>
    Fixes: add461325ec5 ("locking/rtmutex: Extend the rtmutex core to support ww_mutex")
    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Link: https://lore.kernel.org/r/YSeWjCHoK4v5OcOt@hirez.programming.kicks-ass.net

Signed-off-by: Waiman Long <longman@redhat.com>
2021-09-27 16:19:29 -04:00
Waiman Long dd328dbe46 locking/rtmutex: Prevent spurious EDEADLK return caused by ww_mutexes
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2007032

commit 6467822b8cc96e5feda98c7bf5c6329c6a896c91
Author: Peter Zijlstra <peterz@infradead.org>
Date:   Thu, 26 Aug 2021 09:36:53 +0200

    locking/rtmutex: Prevent spurious EDEADLK return caused by ww_mutexes

    rtmutex based ww_mutexes can legitimately create a cycle in the lock graph
    which can be observed by a blocker which didn't cause the problem:

       P1: A, ww_A, ww_B
       P2: ww_B, ww_A
       P3: A

    P3 might therefore be trapped in the ww_mutex induced cycle and run into
    the lock depth limitation of rt_mutex_adjust_prio_chain() which returns
    -EDEADLK to the caller.

    Disable the deadlock detection walk when the chain walk observes a
    ww_mutex to prevent this looping.

    [ tglx: Split it apart and added changelog ]

    Reported-by: Sebastian Siewior <bigeasy@linutronix.de>
    Fixes: add461325ec5 ("locking/rtmutex: Extend the rtmutex core to support ww_mutex")
    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Link: https://lore.kernel.org/r/YSeWjCHoK4v5OcOt@hirez.programming.kicks-ass.net

Signed-off-by: Waiman Long <longman@redhat.com>
2021-09-27 16:19:28 -04:00
Waiman Long b9878279f9 locking/rtmutex: Dequeue waiter on ww_mutex deadlock
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2007032

commit 37e8abff2bebbf9947d6b784f5c75ed48a717089
Author: Thomas Gleixner <tglx@linutronix.de>
Date:   Wed, 25 Aug 2021 12:33:14 +0200

    locking/rtmutex: Dequeue waiter on ww_mutex deadlock

    The rt_mutex based ww_mutex variant queues the new waiter first in the
    lock's rbtree before evaluating the ww_mutex specific conditions which
    might decide that the waiter should back out. This check and conditional
    exit happens before the waiter is enqueued into the PI chain.

    The failure handling at the call site assumes that the waiter, if it is the
    top most waiter on the lock, is queued in the PI chain and then proceeds to
    adjust the unmodified PI chain, which results in RB tree corruption.

    Dequeue the waiter from the lock waiter list in the ww_mutex error exit
    path to prevent this.

    Fixes: add461325ec5 ("locking/rtmutex: Extend the rtmutex core to support ww_mutex")
    Reported-by: Sebastian Siewior <bigeasy@linutronix.de>
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Link: https://lkml.kernel.org/r/20210825102454.042280541@linutronix.de

Signed-off-by: Waiman Long <longman@redhat.com>
2021-09-27 16:19:27 -04:00
Waiman Long c26102c71a locking/rtmutex: Dont dereference waiter lockless
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2007032

commit c3123c431447da99db160264506de9897c003513
Author: Thomas Gleixner <tglx@linutronix.de>
Date:   Wed, 25 Aug 2021 12:33:12 +0200

    locking/rtmutex: Dont dereference waiter lockless

    The new rt_mutex_spin_on_owner() loop checks whether the spinning waiter is
    still the top waiter on the lock by utilizing rt_mutex_top_waiter(), which
    is broken because that function contains a sanity check which dereferences
    the top waiter pointer to check whether the waiter belongs to the
    lock. That's wrong in the lockless spinwait case:

     CPU 0                                                  CPU 1
     rt_mutex_lock(lock)                                    rt_mutex_lock(lock);
       queue(waiter0)
       waiter0 == rt_mutex_top_waiter(lock)
       rt_mutex_spin_on_owner(lock, waiter0) {              queue(waiter1)
                                                            waiter1 == rt_mutex_top_waiter(lock)
                                                            ...
         top_waiter = rt_mutex_top_waiter(lock)
           leftmost = rb_first_cached(&lock->waiters);
                                                            -> signal
                                                            dequeue(waiter1)
                                                            destroy(waiter1)
           w = rb_entry(leftmost, ....)
           BUG_ON(w->lock != lock)   <- UAF

    The BUG_ON() is correct for the case where the caller holds lock->wait_lock
    which guarantees that the leftmost waiter entry cannot vanish. For the
    lockless spinwait case it's broken.

    Create a new helper function which avoids the pointer dereference and just
    compares the leftmost entry pointer with current's waiter pointer to
    validate that current is still eligible for spinning.

    Fixes: 992caf7f1724 ("locking/rtmutex: Add adaptive spinwait mechanism")
    Reported-by: Sebastian Siewior <bigeasy@linutronix.de>
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Link: https://lkml.kernel.org/r/20210825102453.981720644@linutronix.de

Signed-off-by: Waiman Long <longman@redhat.com>
2021-09-27 16:19:27 -04:00
Waiman Long 8f0a29c215 locking/rtmutex: Add adaptive spinwait mechanism
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2007032

commit 992caf7f17243d736fc996770bac6566103778f6
Author: Steven Rostedt <rostedt@goodmis.org>
Date:   Sun, 15 Aug 2021 23:29:25 +0200

    locking/rtmutex: Add adaptive spinwait mechanism

    Going to sleep when locks are contended can be quite inefficient when the
    contention time is short and the lock owner is running on a different CPU.

    The MCS mechanism cannot be used because MCS is strictly FIFO ordered while
    for rtmutex based locks the waiter ordering is priority based.

    Provide a simple adaptive spinwait mechanism which currently restricts the
    spinning to the top priority waiter.

    [ tglx: Provide a contemporary changelog, extended it to all rtmutex based
            locks and updated it to match the other spin on owner implementations ]

    Originally-by: Gregory Haskins <ghaskins@novell.com>
    Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Signed-off-by: Ingo Molnar <mingo@kernel.org>
    Link: https://lore.kernel.org/r/20210815211305.912050691@linutronix.de

Signed-off-by: Waiman Long <longman@redhat.com>
2021-09-27 16:19:25 -04:00
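
The spin loop shape, condensed from the changelog (see upstream 992caf7f1724, plus the two follow-up fixes earlier in this log, for the exact code): keep spinning only while the lock owner is running on another CPU and the spinner is still the top waiter.

    static bool rtmutex_spin_on_owner(struct rt_mutex_base *lock,
                                      struct rt_mutex_waiter *waiter,
                                      struct task_struct *owner)
    {
            bool res = true;

            rcu_read_lock();
            for (;;) {
                    /* Owner changed: stop spinning and retry the acquisition */
                    if (owner != rt_mutex_owner(lock))
                            break;
                    /*
                     * Stop spinning when the owner is not running, we are no
                     * longer the top waiter, or a reschedule is due.
                     */
                    if (!owner_on_cpu(owner) || need_resched() ||
                        !rt_mutex_waiter_is_top_waiter(lock, waiter)) {
                            res = false;
                            break;
                    }
                    cpu_relax();
            }
            rcu_read_unlock();
            return res;
    }
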
Waiman Long 511fea4a2d locking/rtmutex: Implement equal priority lock stealing
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2007032

commit 48eb3f4fcfd35495a8357459aa6fe437aa430b00
Author: Gregory Haskins <ghaskins@novell.com>
Date:   Sun, 15 Aug 2021 23:29:23 +0200

    locking/rtmutex: Implement equal priority lock stealing

    The current logic only allows lock stealing to occur if the current task is
    of higher priority than the pending owner.

    Significant throughput improvements can be gained by allowing the lock
    stealing to include tasks of equal priority when the contended lock is a
    spin_lock or a rw_lock and the tasks are not in an RT scheduling class.

    The assumption was that the system will make faster progress by allowing
    the task already on the CPU to take the lock rather than waiting for the
    system to wake up a different task.

    This does add a degree of unfairness, but in reality no negative side
    effects have been observed in the many years that this has been used in the
    RT kernel.

    [ tglx: Refactored and rewritten several times by Steve Rostedt, Sebastian
            Siewior and myself ]

    Signed-off-by: Gregory Haskins <ghaskins@novell.com>
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Signed-off-by: Ingo Molnar <mingo@kernel.org>
    Link: https://lore.kernel.org/r/20210815211305.857240222@linutronix.de

Signed-off-by: Waiman Long <longman@redhat.com>
2021-09-27 16:19:24 -04:00
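
A sketch of the steal condition described above (field and helper names approximate, see upstream 48eb3f4fcfd3): lateral, equal-priority steals are only allowed for the spin/rwlock build and only for non-RT tasks.

    static __always_inline bool rt_mutex_steal(struct rt_mutex_waiter *waiter,
                                               struct rt_mutex_waiter *top_waiter)
    {
            /* Strictly higher priority always wins */
            if (rt_mutex_waiter_less(waiter, top_waiter))
                    return true;

    #ifdef RT_MUTEX_BUILD_SPINLOCKS
            /*
             * RT and DL tasks are excluded from lateral (equal priority)
             * steals to avoid introducing unbounded latencies.
             */
            if (rt_prio(waiter->prio) || dl_prio(waiter->prio))
                    return false;

            return rt_mutex_waiter_equal(waiter, top_waiter);
    #else
            return false;
    #endif
    }
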
Waiman Long 83850b9f0b locking/rtmutex: Extend the rtmutex core to support ww_mutex
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2007032

commit add461325ec5bc39aa619a1bfcde7245e5f31ac7
Author: Peter Zijlstra <peterz@infradead.org>
Date:   Sun, 15 Aug 2021 23:28:58 +0200

    locking/rtmutex: Extend the rtmutex core to support ww_mutex

    Add a ww acquire context pointer to the waiter and various functions and
    add the ww_mutex related invocations to the proper spots in the locking
    code, similar to the mutex based variant.

    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Signed-off-by: Ingo Molnar <mingo@kernel.org>
    Link: https://lore.kernel.org/r/20210815211304.966139174@linutronix.de

Signed-off-by: Waiman Long <longman@redhat.com>
2021-09-27 16:19:12 -04:00
Waiman Long 21ef327373 locking/rtmutex: Squash !RT tasks to DEFAULT_PRIO
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2007032

commit 715f7f9ece4685157bb59560f6c612340d730ab4
Author: Peter Zijlstra <peterz@infradead.org>
Date:   Sun, 15 Aug 2021 23:28:30 +0200

    locking/rtmutex: Squash !RT tasks to DEFAULT_PRIO

    Ensure all !RT tasks have the same prio such that they end up in FIFO
    order and aren't split up according to nice level.

    The reason why nice levels were taken into account so far is historical. In
    the early days of the rtmutex code it was done to give the PI boosting and
    deboosting a larger coverage.

    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Signed-off-by: Ingo Molnar <mingo@kernel.org>
    Link: https://lore.kernel.org/r/20210815211303.938676930@linutronix.de

Signed-off-by: Waiman Long <longman@redhat.com>
2021-09-27 16:19:00 -04:00
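
The squash is a tiny helper applied when a waiter's sort key is recorded; roughly (see upstream 715f7f9ece46):

    static __always_inline int __waiter_prio(struct task_struct *task)
    {
            int prio = task->prio;

            /* All !RT tasks get the same key: FIFO order, nice level ignored */
            if (!rt_prio(prio))
                    return DEFAULT_PRIO;

            return prio;
    }

    static __always_inline void waiter_update_prio(struct rt_mutex_waiter *waiter,
                                                   struct task_struct *task)
    {
            waiter->prio = __waiter_prio(task);
            waiter->deadline = task->dl.deadline;
    }
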
Waiman Long 3c4688ad75 locking/rtmutex: Provide the spin/rwlock core lock function
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2007032

commit 1c143c4b65da09081d644110e619decc49c9dee4
Author: Thomas Gleixner <tglx@linutronix.de>
Date:   Sun, 15 Aug 2021 23:28:25 +0200

    locking/rtmutex: Provide the spin/rwlock core lock function

    Provide a simplified version of the rtmutex slowlock function which neither
    handles signals nor timeouts, and which is careful about preserving the
    state of the blocked task across the lock operation.

    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Signed-off-by: Ingo Molnar <mingo@kernel.org>
    Link: https://lore.kernel.org/r/20210815211303.770228446@linutronix.de

Signed-off-by: Waiman Long <longman@redhat.com>
2021-09-27 16:18:57 -04:00
Waiman Long 7f6abebfbf locking/rtmutex: Guard regular sleeping locks specific functions
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2007032

commit e17ba59b7e8e1f67e36d8fcc46daa13370efcf11
Author: Thomas Gleixner <tglx@linutronix.de>
Date:   Sun, 15 Aug 2021 23:28:12 +0200

    locking/rtmutex: Guard regular sleeping locks specific functions

    Guard the regular sleeping lock specific functionality, which is used for
    rtmutex on non-RT enabled kernels and for mutex, rtmutex and semaphores on
    RT enabled kernels so the code can be reused for the RT specific
    implementation of spinlocks and rwlocks in a different compilation unit.

    No functional change.

    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Signed-off-by: Ingo Molnar <mingo@kernel.org>
    Link: https://lore.kernel.org/r/20210815211303.311535693@linutronix.de

Signed-off-by: Waiman Long <longman@redhat.com>
2021-09-27 16:18:51 -04:00
Waiman Long a98d615dfb locking/rtmutex: Prepare RT rt_mutex_wake_q for RT locks
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2007032

commit 456cfbc65cd072f4f53936ee5a37eb1447a7d3ba
Author: Thomas Gleixner <tglx@linutronix.de>
Date:   Sun, 15 Aug 2021 23:28:11 +0200

    locking/rtmutex: Prepare RT rt_mutex_wake_q for RT locks

    Add an rtlock_task pointer to rt_mutex_wake_q, which allows handling the RT
    specific wakeup for spin/rwlock waiters. The pointer consumes just 4/8 bytes
    on the stack, so it is provided unconditionally to avoid #ifdeffery all over
    the place.

    This cannot use a regular wake_q, because a task can have concurrent wakeups
    which would make it miss either the lock wakeup or the regular wakeups,
    depending on what gets queued first. Giving task_struct a separate
    wake_q_node for this would be overkill, because only a single task gets
    woken up in the spin/rw_lock unlock path.

    No functional change for non-RT enabled kernels.

    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Signed-off-by: Ingo Molnar <mingo@kernel.org>
    Link: https://lore.kernel.org/r/20210815211303.253614678@linutronix.de

Signed-off-by: Waiman Long <longman@redhat.com>
2021-09-27 16:18:51 -04:00
Waiman Long c0e45115f7 locking/rtmutex: Use rt_mutex_wake_q_head
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2007032

commit 7980aa397cc0968ea3ffee7a985c31c92ad84f81
Author: Thomas Gleixner <tglx@linutronix.de>
Date:   Sun, 15 Aug 2021 23:28:09 +0200

    locking/rtmutex: Use rt_mutex_wake_q_head

    Prepare for the required state aware handling of waiter wakeups via wake_q
    and switch the rtmutex code over to the rtmutex specific wrapper.

    No functional change.

    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Signed-off-by: Ingo Molnar <mingo@kernel.org>
    Link: https://lore.kernel.org/r/20210815211303.197113263@linutronix.de

Signed-off-by: Waiman Long <longman@redhat.com>
2021-09-27 16:18:50 -04:00
Waiman Long 58001d18d1 locking/rtmutex: Provide rt_wake_q_head and helpers
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2007032

commit b576e640ce5e22673e12949cf14ae3cb18d9b859
Author: Thomas Gleixner <tglx@linutronix.de>
Date:   Sun, 15 Aug 2021 23:28:08 +0200

    locking/rtmutex: Provide rt_wake_q_head and helpers

    To handle the difference between wakeups for regular sleeping locks (mutex,
    rtmutex, rw_semaphore) and the wakeups for 'sleeping' spin/rwlocks on
    PREEMPT_RT enabled kernels correctly, it is required to provide a
    wake_q_head construct which allows keeping them separate.

    Provide a wrapper around wake_q_head and the required helpers, which will be
    extended with the state handling later.

    No functional change.

    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Signed-off-by: Ingo Molnar <mingo@kernel.org>
    Link: https://lore.kernel.org/r/20210815211303.139337655@linutronix.de

Signed-off-by: Waiman Long <longman@redhat.com>
2021-09-27 16:18:49 -04:00
Waiman Long 07c9369108 locking/rtmutex: Add wake_state to rt_mutex_waiter
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2007032

commit c014ef69b3acdb8c9e7fc412e96944f4d5c36fa0
Author: Thomas Gleixner <tglx@linutronix.de>
Date:   Sun, 15 Aug 2021 23:28:06 +0200

    locking/rtmutex: Add wake_state to rt_mutex_waiter

    Regular sleeping locks like mutexes, rtmutexes and rw_semaphores are always
    entering and leaving a blocking section with task state == TASK_RUNNING.

    On a non-RT kernel spinlocks and rwlocks never affect the task state, but
    on RT kernels these locks are converted to rtmutex based 'sleeping' locks.

    So in case of contention the task blocks, which requires carefully
    preserving the task state and restoring it after acquiring the lock, taking
    into account regular wakeups for the task which happened while it was
    blocked. This state preservation is achieved by having a separate task state
    for blocking on an RT spin/rwlock and a saved_state field in task_struct,
    along with careful handling of these wakeup scenarios in try_to_wake_up().

    To avoid conditionals in the rtmutex code, store the wake state which has
    to be used for waking a lock waiter in rt_mutex_waiter, which allows
    handling the regular and RT spin/rwlocks by handing it to wake_up_state().

    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Signed-off-by: Ingo Molnar <mingo@kernel.org>
    Link: https://lore.kernel.org/r/20210815211303.079800739@linutronix.de

Signed-off-by: Waiman Long <longman@redhat.com>
2021-09-27 16:18:49 -04:00
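
In practice the waiter simply records the task state to be used at wakeup time; a sketch (the wake-up helper here is hypothetical, only the wake_state field and the wake_up_state() usage follow the commit): TASK_NORMAL for the regular sleeping locks, TASK_RTLOCK_WAIT for the RT 'sleeping' spin/rwlocks.

    struct rt_mutex_waiter {
            /* ... */
            struct task_struct      *task;
            struct rt_mutex_base    *lock;
            unsigned int            wake_state;   /* TASK_NORMAL or TASK_RTLOCK_WAIT */
    };

    /* Hypothetical helper to show the single wakeup call covering both cases */
    static inline void rt_mutex_wake_waiter(struct rt_mutex_waiter *waiter)
    {
            wake_up_state(waiter->task, waiter->wake_state);
    }
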
Waiman Long f90139f12c locking/rtmutex: Provide rt_mutex_slowlock_locked()
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2007032

commit ebbdc41e90ffce8b6bb3cbba1801ede2dd07a89b
Author: Thomas Gleixner <tglx@linutronix.de>
Date:   Sun, 15 Aug 2021 23:28:00 +0200

    locking/rtmutex: Provide rt_mutex_slowlock_locked()

    Split the inner workings of rt_mutex_slowlock() out into a separate
    function, which can be reused by the upcoming RT lock substitutions,
    e.g. for rw_semaphores.

    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Signed-off-by: Ingo Molnar <mingo@kernel.org>
    Link: https://lore.kernel.org/r/20210815211302.841971086@linutronix.de

Signed-off-by: Waiman Long <longman@redhat.com>
2021-09-27 16:18:46 -04:00
Waiman Long 3c29e6cff1 locking/rtmutex: Split out the inner parts of 'struct rtmutex'
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2007032

commit 830e6acc8a1cafe153a0d88f9b2455965b396131
Author: Peter Zijlstra <peterz@infradead.org>
Date:   Sun, 15 Aug 2021 23:27:58 +0200

    locking/rtmutex: Split out the inner parts of 'struct rtmutex'

    RT builds substitutions for rwsem, mutex, spinlock and rwlock around
    rtmutexes. Split the inner working out so each lock substitution can use
    them with the appropriate lockdep annotations. This avoids having an extra
    unused lockdep map in the wrapped rtmutex.

    No functional change.

    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Signed-off-by: Ingo Molnar <mingo@kernel.org>
    Link: https://lore.kernel.org/r/20210815211302.784739994@linutronix.de

Signed-off-by: Waiman Long <longman@redhat.com>
2021-09-27 16:18:45 -04:00
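
Structurally the split looks roughly like this (see upstream 830e6acc8a1c for the real definitions): the core fields move into rt_mutex_base and each lock substitution wraps it with its own lockdep map.

    struct rt_mutex_base {
            raw_spinlock_t          wait_lock;
            struct rb_root_cached   waiters;
            struct task_struct      *owner;
    };

    struct rt_mutex {
            struct rt_mutex_base    rtmutex;
    #ifdef CONFIG_DEBUG_LOCK_ALLOC
            struct lockdep_map      dep_map;
    #endif
    };
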
Waiman Long 41f37aa11c locking/rtmutex: Split API from implementation
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2007032

commit 531ae4b06a737ed5539cd75dc6f6b9a28f900bba
Author: Thomas Gleixner <tglx@linutronix.de>
Date:   Sun, 15 Aug 2021 23:27:57 +0200

    locking/rtmutex: Split API from implementation

    Prepare for reusing the inner functions of rtmutex for RT lock
    substitutions: introduce kernel/locking/rtmutex_api.c and move
    them there.

    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Signed-off-by: Ingo Molnar <mingo@kernel.org>
    Link: https://lore.kernel.org/r/20210815211302.726560996@linutronix.de

Signed-off-by: Waiman Long <longman@redhat.com>
2021-09-27 16:18:44 -04:00
Waiman Long cee965ef5a locking/rtmutex: Switch to from cmpxchg_*() to try_cmpxchg_*()
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2007032

commit 709e0b62869f625afd18edd79f190c38cb39dfb2
Author: Thomas Gleixner <tglx@linutronix.de>
Date:   Sun, 15 Aug 2021 23:27:55 +0200

    locking/rtmutex: Switch to from cmpxchg_*() to try_cmpxchg_*()

    Allows the compiler to generate better code depending on the architecture.

    Suggested-by: Peter Zijlstra <peterz@infradead.org>
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Signed-off-by: Ingo Molnar <mingo@kernel.org>
    Link: https://lore.kernel.org/r/20210815211302.668958502@linutronix.de

Signed-off-by: Waiman Long <longman@redhat.com>
2021-09-27 16:18:43 -04:00
Waiman Long b7b36f9743 locking/rtmutex: Convert macros to inlines
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2007032

commit 785159301bedea25fae9b20cae3d12377246e941
Author: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Date:   Sun, 15 Aug 2021 23:27:54 +0200

    locking/rtmutex: Convert macros to inlines

    Inlines are type-safe...

    Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Signed-off-by: Ingo Molnar <mingo@kernel.org>
    Link: https://lore.kernel.org/r/20210815211302.610830960@linutronix.de

Signed-off-by: Waiman Long <longman@redhat.com>
2021-09-27 16:18:43 -04:00
Waiman Long 448f1816fe locking/rtmutex: Set proper wait context for lockdep
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2007032

commit b41cda03765580caf7723b8c1b672d191c71013f
Author: Thomas Gleixner <tglx@linutronix.de>
Date:   Sun, 15 Aug 2021 23:27:38 +0200

    locking/rtmutex: Set proper wait context for lockdep

    RT mutexes belong to the LD_WAIT_SLEEP class. Make them so.

    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Signed-off-by: Ingo Molnar <mingo@kernel.org>
    Link: https://lore.kernel.org/r/20210815211302.031014562@linutronix.de

Signed-off-by: Waiman Long <longman@redhat.com>
2021-09-27 16:18:36 -04:00
Zhen Lei 07d25971b2 locking/rtmutex: Use the correct rtmutex debugging config option
It's CONFIG_DEBUG_RT_MUTEXES not CONFIG_DEBUG_RT_MUTEX.

Fixes: f7efc4799f ("locking/rtmutex: Inline chainwalk depth check")
Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Will Deacon <will@kernel.org>
Acked-by: Boqun Feng <boqun.feng@gmail.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20210731123011.4555-1-thunder.leizhen@huawei.com
2021-08-10 08:21:52 +02:00
Peter Zijlstra 2f064a59a1 sched: Change task_struct::state
Change the type and name of task_struct::state. Drop the volatile and
shrink it to an 'unsigned int'. Rename it in order to find all uses
such that we can use READ_ONCE/WRITE_ONCE as appropriate.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Daniel Bristot de Oliveira <bristot@redhat.com>
Acked-by: Will Deacon <will@kernel.org>
Acked-by: Daniel Thompson <daniel.thompson@linaro.org>
Link: https://lore.kernel.org/r/20210611082838.550736351@infradead.org
2021-06-18 11:43:09 +02:00
Thomas Gleixner a51a327f3b locking/rtmutex: Clean up signal handling in __rt_mutex_slowlock()
The signal handling in __rt_mutex_slowlock() is open coded.

Use signal_pending_state() instead.

Aside from the cleanup this also prepares for the RT lock substitutions which
require support for TASK_KILLABLE.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20210326153944.533811987@linutronix.de
2021-03-29 15:57:05 +02:00
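
With the cleanup, the wait loop uses one check that covers both TASK_INTERRUPTIBLE and the upcoming TASK_KILLABLE users; roughly, inside __rt_mutex_slowlock()'s loop:

            for (;;) {
                    if (try_to_take_rt_mutex(lock, current, waiter))
                            break;

                    /* signal_pending_state() handles both INTERRUPTIBLE and KILLABLE */
                    if (signal_pending_state(state, current)) {
                            ret = -EINTR;
                            break;
                    }

                    if (timeout && !timeout->task) {
                            ret = -ETIMEDOUT;
                            break;
                    }

                    raw_spin_unlock_irq(&lock->wait_lock);
                    schedule();
                    raw_spin_lock_irq(&lock->wait_lock);
                    set_current_state(state);
            }
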
Thomas Gleixner c2c360ed7f locking/rtmutex: Restrict the trylock WARN_ON() to debug
The warning as written is expensive and not really required for a
production kernel. Make it depend on rt mutex debugging and use !in_task()
for the condition which generates far better code and gives the same
answer.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20210326153944.436565064@linutronix.de
2021-03-29 15:57:04 +02:00
Thomas Gleixner 82cd5b1039 locking/rtmutex: Fix misleading comment in rt_mutex_postunlock()
Preemption is disabled in mark_wakeup_next_waiter(), not in
rt_mutex_slowunlock().

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20210326153944.341734608@linutronix.de
2021-03-29 15:57:04 +02:00
Thomas Gleixner 70c80103aa locking/rtmutex: Consolidate the fast/slowpath invocation
The indirection via a function pointer (which is at least optimized into a
tail call by the compiler) is making the code hard to read.

Clean it up and move the futex related trylock functions down to the futex
section.

Move the wake_q wakeup into rt_mutex_slowunlock(). No point in handing it
to the caller. The futex code uses a different function.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20210326153944.247927548@linutronix.de
2021-03-29 15:57:04 +02:00
Thomas Gleixner d7a2edb890 locking/rtmutex: Make text section and inlining consistent
rtmutex is half __sched and the other half is not. If the compiler decides
to not inline larger static functions then part of the code ends up in the
regular text section.

There are also quite a few performance related small helpers which are
either static or plain inline. Force inline those which make sense and mark
the rest __sched.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20210326153944.152977820@linutronix.de
2021-03-29 15:57:04 +02:00
Thomas Gleixner f5a98866e5 locking/rtmutex: Decrapify __rt_mutex_init()
The conditional debug handling is just another layer of obfuscation. Split
the function so rt_mutex_init_proxy_locked() can invoke the inner init and
__rt_mutex_init() gets the full treatment.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20210326153943.955697588@linutronix.de
2021-03-29 15:57:03 +02:00
Thomas Gleixner f7efc4799f locking/rtmutex: Inline chainwalk depth check
There is no point for this wrapper at all.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20210326153943.754254046@linutronix.de
2021-03-29 15:57:03 +02:00