sched/deadline: Fix bandwidth reclaim equation in GRUB

JIRA: https://issues.redhat.com/browse/RHEL-1536

commit 6a9d623aad89539eca71eb264db6b9d538620ad5
Author: Vineeth Pillai <vineeth@bitbyteword.org>
Date:   Tue May 30 09:55:25 2023 -0400

    sched/deadline: Fix bandwidth reclaim equation in GRUB

    According to the GRUB[1] rule, the runtime is depreciated as:
      "dq = -max{u, (1 - Uinact - Uextra)} dt" (1)

    To guarantee that deadline tasks doesn't starve lower class tasks,
    we do not allocate the full bandwidth of the cpu to deadline tasks.
    Maximum bandwidth usable by deadline tasks is denoted by "Umax".
    Considering Umax, equation (1) becomes:
      "dq = -(max{u, (Umax - Uinact - Uextra)} / Umax) dt" (2)

    Current implementation has a minor bug in equation (2), which this
    patch fixes.

    The reclamation logic is verified by a sample program which creates
    multiple deadline threads and observing their utilization. The tests
    were run on an isolated cpu(isolcpus=3) on a 4 cpu system.

    Tests on 6.3.0
    ==============

    RUN 1: runtime=7ms, deadline=period=10ms, RT capacity = 95%
    TID[693]: RECLAIM=1, (r=7ms, d=10ms, p=10ms), Util: 93.33
    TID[693]: RECLAIM=1, (r=7ms, d=10ms, p=10ms), Util: 93.35

    RUN 2: runtime=1ms, deadline=period=100ms, RT capacity = 95%
    TID[708]: RECLAIM=1, (r=1ms, d=100ms, p=100ms), Util: 16.69
    TID[708]: RECLAIM=1, (r=1ms, d=100ms, p=100ms), Util: 16.69

    RUN 3: 2 tasks
      Task 1: runtime=1ms, deadline=period=10ms
      Task 2: runtime=1ms, deadline=period=100ms
    TID[631]: RECLAIM=1, (r=1ms, d=10ms, p=10ms), Util: 62.67
    TID[632]: RECLAIM=1, (r=1ms, d=100ms, p=100ms), Util: 6.37
    TID[631]: RECLAIM=1, (r=1ms, d=10ms, p=10ms), Util: 62.38
    TID[632]: RECLAIM=1, (r=1ms, d=100ms, p=100ms), Util: 6.23

    As seen above, the reclamation doesn't reclaim the maximum allowed
    bandwidth and as the bandwidth of tasks gets smaller, the reclaimed
    bandwidth also comes down.

    Tests with this patch applied
    =============================

    RUN 1: runtime=7ms, deadline=period=10ms, RT capacity = 95%
    TID[608]: RECLAIM=1, (r=7ms, d=10ms, p=10ms), Util: 95.19
    TID[608]: RECLAIM=1, (r=7ms, d=10ms, p=10ms), Util: 95.16

    RUN 2: runtime=1ms, deadline=period=100ms, RT capacity = 95%
    TID[616]: RECLAIM=1, (r=1ms, d=100ms, p=100ms), Util: 95.27
    TID[616]: RECLAIM=1, (r=1ms, d=100ms, p=100ms), Util: 95.21

    RUN 3: 2 tasks
      Task 1: runtime=1ms, deadline=period=10ms
      Task 2: runtime=1ms, deadline=period=100ms
    TID[620]: RECLAIM=1, (r=1ms, d=10ms, p=10ms), Util: 86.64
    TID[621]: RECLAIM=1, (r=1ms, d=100ms, p=100ms), Util: 8.66
    TID[620]: RECLAIM=1, (r=1ms, d=10ms, p=10ms), Util: 86.45
    TID[621]: RECLAIM=1, (r=1ms, d=100ms, p=100ms), Util: 8.73

    Running tasks on all cpus allowing for migration also showed that
    the utilization is reclaimed to the maximum. Running 10 tasks on
    3 cpus SCHED_FLAG_RECLAIM - top shows:
    %Cpu0  : 94.6 us,  0.0 sy,  0.0 ni,  5.4 id,  0.0 wa
    %Cpu1  : 95.2 us,  0.0 sy,  0.0 ni,  4.8 id,  0.0 wa
    %Cpu2  : 95.8 us,  0.0 sy,  0.0 ni,  4.2 id,  0.0 wa

    [1]: Abeni, Luca & Lipari, Giuseppe & Parri, Andrea & Sun, Youcheng.
         (2015). Parallel and sequential reclaiming in multicore
         real-time global scheduling.

    Signed-off-by: Vineeth Pillai (Google) <vineeth@bitbyteword.org>
    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Reviewed-by: Daniel Bristot de Oliveira <bristot@kernel.org>
    Acked-by: Juri Lelli <juri.lelli@redhat.com>
    Link: https://lore.kernel.org/r/20230530135526.2385378-1-vineeth@bitbyteword.org

Signed-off-by: Phil Auld <pauld@redhat.com>
This commit is contained in:
Phil Auld 2023-09-05 11:45:17 -04:00
parent 2e85e05977
commit 09bb4bb5f4
2 changed files with 29 additions and 27 deletions

View File

@ -1255,43 +1255,39 @@ int dl_runtime_exceeded(struct sched_dl_entity *dl_se)
} }
/* /*
* This function implements the GRUB accounting rule: * This function implements the GRUB accounting rule. According to the
* according to the GRUB reclaiming algorithm, the runtime is * GRUB reclaiming algorithm, the runtime is not decreased as "dq = -dt",
* not decreased as "dq = -dt", but as * but as "dq = -(max{u, (Umax - Uinact - Uextra)} / Umax) dt",
* "dq = -max{u / Umax, (1 - Uinact - Uextra)} dt",
* where u is the utilization of the task, Umax is the maximum reclaimable * where u is the utilization of the task, Umax is the maximum reclaimable
* utilization, Uinact is the (per-runqueue) inactive utilization, computed * utilization, Uinact is the (per-runqueue) inactive utilization, computed
* as the difference between the "total runqueue utilization" and the * as the difference between the "total runqueue utilization" and the
* runqueue active utilization, and Uextra is the (per runqueue) extra * "runqueue active utilization", and Uextra is the (per runqueue) extra
* reclaimable utilization. * reclaimable utilization.
* Since rq->dl.running_bw and rq->dl.this_bw contain utilizations * Since rq->dl.running_bw and rq->dl.this_bw contain utilizations multiplied
* multiplied by 2^BW_SHIFT, the result has to be shifted right by * by 2^BW_SHIFT, the result has to be shifted right by BW_SHIFT.
* BW_SHIFT. * Since rq->dl.bw_ratio contains 1 / Umax multiplied by 2^RATIO_SHIFT, dl_bw
* Since rq->dl.bw_ratio contains 1 / Umax multiplied by 2^RATIO_SHIFT, * is multiped by rq->dl.bw_ratio and shifted right by RATIO_SHIFT.
* dl_bw is multiped by rq->dl.bw_ratio and shifted right by RATIO_SHIFT. * Since delta is a 64 bit variable, to have an overflow its value should be
* Since delta is a 64 bit variable, to have an overflow its value * larger than 2^(64 - 20 - 8), which is more than 64 seconds. So, overflow is
* should be larger than 2^(64 - 20 - 8), which is more than 64 seconds. * not an issue here.
* So, overflow is not an issue here.
*/ */
static u64 grub_reclaim(u64 delta, struct rq *rq, struct sched_dl_entity *dl_se) static u64 grub_reclaim(u64 delta, struct rq *rq, struct sched_dl_entity *dl_se)
{ {
u64 u_inact = rq->dl.this_bw - rq->dl.running_bw; /* Utot - Uact */
u64 u_act; u64 u_act;
u64 u_act_min = (dl_se->dl_bw * rq->dl.bw_ratio) >> RATIO_SHIFT; u64 u_inact = rq->dl.this_bw - rq->dl.running_bw; /* Utot - Uact */
/* /*
* Instead of computing max{u * bw_ratio, (1 - u_inact - u_extra)}, * Instead of computing max{u, (u_max - u_inact - u_extra)}, we
* we compare u_inact + rq->dl.extra_bw with * compare u_inact + u_extra with u_max - u, because u_inact + u_extra
* 1 - (u * rq->dl.bw_ratio >> RATIO_SHIFT), because * can be larger than u_max. So, u_max - u_inact - u_extra would be
* u_inact + rq->dl.extra_bw can be larger than * negative leading to wrong results.
* 1 * (so, 1 - u_inact - rq->dl.extra_bw would be negative
* leading to wrong results)
*/ */
if (u_inact + rq->dl.extra_bw > BW_UNIT - u_act_min) if (u_inact + rq->dl.extra_bw > rq->dl.max_bw - dl_se->dl_bw)
u_act = u_act_min; u_act = dl_se->dl_bw;
else else
u_act = BW_UNIT - u_inact - rq->dl.extra_bw; u_act = rq->dl.max_bw - u_inact - rq->dl.extra_bw;
u_act = (u_act * rq->dl.bw_ratio) >> RATIO_SHIFT;
return (delta * u_act) >> BW_SHIFT; return (delta * u_act) >> BW_SHIFT;
} }
@ -2802,12 +2798,12 @@ static void init_dl_rq_bw_ratio(struct dl_rq *dl_rq)
{ {
if (global_rt_runtime() == RUNTIME_INF) { if (global_rt_runtime() == RUNTIME_INF) {
dl_rq->bw_ratio = 1 << RATIO_SHIFT; dl_rq->bw_ratio = 1 << RATIO_SHIFT;
dl_rq->extra_bw = 1 << BW_SHIFT; dl_rq->max_bw = dl_rq->extra_bw = 1 << BW_SHIFT;
} else { } else {
dl_rq->bw_ratio = to_ratio(global_rt_runtime(), dl_rq->bw_ratio = to_ratio(global_rt_runtime(),
global_rt_period()) >> (BW_SHIFT - RATIO_SHIFT); global_rt_period()) >> (BW_SHIFT - RATIO_SHIFT);
dl_rq->extra_bw = to_ratio(global_rt_period(), dl_rq->max_bw = dl_rq->extra_bw =
global_rt_runtime()); to_ratio(global_rt_period(), global_rt_runtime());
} }
} }

View File

@ -766,6 +766,12 @@ struct dl_rq {
u64 this_bw; u64 this_bw;
u64 extra_bw; u64 extra_bw;
/*
* Maximum available bandwidth for reclaiming by SCHED_FLAG_RECLAIM
* tasks of this rq. Used in calculation of reclaimable bandwidth(GRUB).
*/
u64 max_bw;
/* /*
* Inverse of the fraction of CPU utilization that can be reclaimed * Inverse of the fraction of CPU utilization that can be reclaimed
* by the GRUB algorithm. * by the GRUB algorithm.