Commit Graph

305 Commits

Author SHA1 Message Date
Ivan Vecera 6ad962dd3e net/sched: act_api: deny mismatched skip_sw/skip_hw flags for actions created by classifiers
JIRA: https://issues.redhat.com/browse/RHEL-57768

commit 34d35b4edbbe890a91bec939bfd29ad92517a52b
Author: Vladimir Oltean <vladimir.oltean@nxp.com>
Date:   Thu Oct 17 19:10:48 2024 +0300

    net/sched: act_api: deny mismatched skip_sw/skip_hw flags for actions created by classifiers

    tcf_action_init() has logic for checking mismatches between action and
    filter offload flags (skip_sw/skip_hw). AFAIU, this is intended to run
    on the transition between the new tc_act_bind(flags) returning true (aka
    now gets bound to classifier) and tc_act_bind(act->tcfa_flags) returning
    false (aka action was not bound to classifier before). Otherwise, the
    check is skipped.

    For the case where an action is not standalone, but rather it was
    created by a classifier and is bound to it, tcf_action_init() skips the
    check entirely, and this means it allows mismatched flags to occur.

    Taking the matchall classifier code path as an example (with mirred as
    an action), the reason is the following:

     1 | mall_change()
     2 | -> mall_replace_hw_filter()
     3 |   -> tcf_exts_validate_ex()
     4 |      -> flags |= TCA_ACT_FLAGS_BIND;
     5 |      -> tcf_action_init()
     6 |         -> tcf_action_init_1()
     7 |            -> a_o->init()
     8 |               -> tcf_mirred_init()
     9 |                  -> tcf_idr_create_from_flags()
    10 |                     -> tcf_idr_create()
    11 |                        -> p->tcfa_flags = flags;
    12 |         -> tc_act_bind(flags)
    13 |         -> tc_act_bind(act->tcfa_flags)

    When invoked from tcf_exts_validate_ex() like matchall does (but other
    classifiers validate their extensions as well), tcf_action_init() runs
    in a call path where "flags" always contains TCA_ACT_FLAGS_BIND (set by
    line 4). So line 12 is always true, and line 13 is always true as well.
    No transition ever takes place, and the check is skipped.

    The code was added in this form in commit c86e0209dc77 ("flow_offload:
    validate flags of filter and actions"), but I'm attributing the blame
    even earlier in that series, to when TCA_ACT_FLAGS_SKIP_HW and
    TCA_ACT_FLAGS_SKIP_SW were added to the UAPI.

    Following the development process of this change, the check did not
    always exist in this form. A change took place between v3 [1] and v4 [2],
    AFAIU due to review feedback that it doesn't make sense for action flags
    to be different than classifier flags. I think I agree with that
    feedback, but it was translated into code that omits enforcing this for
    "classic" actions created at the same time with the filters themselves.

    There are 3 more important cases to discuss. First there is this command:

    $ tc qdisc add dev eth0 clsact
    $ tc filter add dev eth0 ingress matchall skip_sw \
            action mirred ingress mirror dev eth1

    which should be allowed, because prior to the concept of dedicated
    action flags, it used to work and it used to mean the action inherited
    the skip_sw/skip_hw flags from the classifier. It's not a mismatch.

    Then we have this command:

    $ tc qdisc add dev eth0 clsact
    $ tc filter add dev eth0 ingress matchall skip_sw \
            action mirred ingress mirror dev eth1 skip_hw

    where there is a mismatch and it should be rejected.

    Finally, we have:

    $ tc qdisc add dev eth0 clsact
    $ tc filter add dev eth0 ingress matchall skip_sw \
            action mirred ingress mirror dev eth1 skip_sw

    where the offload flags coincide, and this should be treated the same as
    the first command based on inheritance, and accepted.
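
The three cases above can be condensed into a small user-space model. This is only a sketch of the accept/reject rule as described in this message, not the kernel patch itself; the `MODEL_*` names and their bit values are hypothetical and chosen for illustration.

```c
#include <stdbool.h>

/* Hypothetical bit values for this model; the kernel's
 * TCA_ACT_FLAGS_SKIP_HW / TCA_ACT_FLAGS_SKIP_SW may be defined differently. */
enum {
	MODEL_SKIP_HW = 1u << 0,
	MODEL_SKIP_SW = 1u << 1,
};

/*
 * Returns true when an action created by a classifier may keep its flags.
 * An action without offload flags inherits the classifier's flags, so it is
 * accepted; an offload flag the classifier does not carry is a mismatch.
 */
static bool model_act_flags_ok(unsigned int filter_flags,
			       unsigned int act_flags)
{
	unsigned int mask = MODEL_SKIP_HW | MODEL_SKIP_SW;
	unsigned int extra = act_flags & ~filter_flags & mask;

	return extra == 0;
}
```

With a skip_sw filter, an action carrying no flags or skip_sw is accepted, and an action carrying skip_hw is rejected, matching the three commands discussed.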

    [1]: https://lore.kernel.org/netdev/20211028110646.13791-9-simon.horman@corigine.com/
    [2]: https://lore.kernel.org/netdev/20211118130805.23897-10-simon.horman@corigine.com/
    Fixes: 7adc57651211 ("flow_offload: add skip_hw and skip_sw to control if offload the action")
    Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
    Reviewed-by: Simon Horman <horms@kernel.org>
    Reviewed-by: Ido Schimmel <idosch@nvidia.com>
    Tested-by: Ido Schimmel <idosch@nvidia.com>
    Link: https://patch.msgid.link/20241017161049.3570037-1-vladimir.oltean@nxp.com
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Signed-off-by: Ivan Vecera <ivecera@redhat.com>
2024-11-22 11:07:15 +01:00
Ivan Vecera 8db36f9b9d net/sched: Load modules via their alias
JIRA: https://issues.redhat.com/browse/RHEL-57767

commit 2c15a5aee2f32e341d1585fa1867eece76a1edb8
Author: Michal Koutný <mkoutny@suse.com>
Date:   Thu Feb 1 14:09:42 2024 +0100

    net/sched: Load modules via their alias

    The cls_,sch_,act_ modules may be loaded lazily during network
    configuration but without user's awareness and control.

    Switch the lazy loading from canonical module names to a module alias.
    This allows finer control over lazy loading, the precedent from
    commit 7f78e03513 ("fs: Limit sys_mount to only request filesystem
    modules.") explains it already:

            Using aliases means user space can control the policy of which
            filesystem^W net/sched modules are auto-loaded by editing
            /etc/modprobe.d/*.conf with blacklist and alias directives.
            Allowing simple, safe, well understood work-arounds to known
            problematic software.

    By default, nothing changes. However, if a specific module is
    blacklisted (its canonical name), it won't be modprobe'd when requested
    under its alias (i.e. kernel auto-loading). It would appear as if the
    given module was unknown.

    The module can still be loaded under its canonical name, which is an
    explicit (privileged) user action.

    Signed-off-by: Michal Koutný <mkoutny@suse.com>
    Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
    Reviewed-by: Jiri Pirko <jiri@nvidia.com>
    Link: https://lore.kernel.org/r/20240201130943.19536-4-mkoutny@suse.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Ivan Vecera <ivecera@redhat.com>
2024-09-06 15:01:42 +02:00
Davide Caratti 60d43fdcbe net/sched: act_api: fix possible infinite loop in tcf_idr_check_alloc()
JIRA: https://issues.redhat.com/browse/RHEL-45534
Upstream Status: net.git commit d864319871b05fadd153e0aede4811ca7008f5d6

commit d864319871b05fadd153e0aede4811ca7008f5d6
Author: David Ruth <druth@chromium.org>
Date:   Fri Jun 14 19:03:26 2024 +0000

    net/sched: act_api: fix possible infinite loop in tcf_idr_check_alloc()

    syzbot found hanging tasks waiting on rtnl_lock [1]

    A reproducer is available in the syzbot bug.

    When a request to add multiple actions with the same index is sent, the
    second request will block forever on the first request. This holds
    rtnl_lock, and causes tasks to hang.

    Return -EAGAIN to prevent infinite looping, while keeping documented
    behavior.
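
A minimal user-space sketch of the idea, assuming a single slot per index and a sentinel that stands in for a reserved slot. All `model_*` names are hypothetical; the real tcf_idr_check_alloc() works on an idr and differs in detail.

```c
#include <errno.h>
#include <stddef.h>

struct model_action { int refcnt; };

/* Sentinel for a slot that is reserved but whose action is not yet
 * published. Hypothetical stand-in for the reserved-entry marker. */
static struct model_action model_busy;
#define MODEL_BUSY (&model_busy)

/*
 * Returns 1 and takes a reference if an action is found, 0 if the slot was
 * free (it is now reserved for the caller), or -EAGAIN if another request
 * already reserved it. It never spins waiting on the placeholder.
 */
static int model_check_alloc(struct model_action **slot)
{
	if (*slot == NULL) {
		*slot = MODEL_BUSY;	/* reserve for the caller */
		return 0;
	}
	if (*slot == MODEL_BUSY)
		return -EAGAIN;		/* caller backs off and retries */
	(*slot)->refcnt++;
	return 1;
}
```

In this model, a second lookup of the same index within one request sees the placeholder left by the first and gets -EAGAIN instead of waiting for a publication that cannot happen until the request completes.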

    [1]

    INFO: task kworker/1:0:5088 blocked for more than 143 seconds.
    Not tainted 6.9.0-rc4-syzkaller-00173-g3cdb45594619 #0
    "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    task:kworker/1:0 state:D stack:23744 pid:5088 tgid:5088 ppid:2 flags:0x00004000
    Workqueue: events_power_efficient reg_check_chans_work
    Call Trace:
    <TASK>
    context_switch kernel/sched/core.c:5409 [inline]
    __schedule+0xf15/0x5d00 kernel/sched/core.c:6746
    __schedule_loop kernel/sched/core.c:6823 [inline]
    schedule+0xe7/0x350 kernel/sched/core.c:6838
    schedule_preempt_disabled+0x13/0x30 kernel/sched/core.c:6895
    __mutex_lock_common kernel/locking/mutex.c:684 [inline]
    __mutex_lock+0x5b8/0x9c0 kernel/locking/mutex.c:752
    wiphy_lock include/net/cfg80211.h:5953 [inline]
    reg_leave_invalid_chans net/wireless/reg.c:2466 [inline]
    reg_check_chans_work+0x10a/0x10e0 net/wireless/reg.c:2481

    Fixes: 0190c1d452 ("net: sched: atomically check-allocate action")
    Reported-by: syzbot+b87c222546179f4513a7@syzkaller.appspotmail.com
    Closes: https://syzkaller.appspot.com/bug?extid=b87c222546179f4513a7
    Signed-off-by: David Ruth <druth@chromium.org>
    Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com>
    Link: https://lore.kernel.org/r/20240614190326.1349786-1-druth@chromium.org
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Signed-off-by: Davide Caratti <dcaratti@redhat.com>
2024-06-28 16:08:15 +02:00
Ivan Vecera d24cd5c4a6 net/sched: simplify tc_action_load_ops parameters
JIRA: https://issues.redhat.com/browse/RHEL-36218

commit 405cd9fc6f44f7a54505019bea60de83f1c58365
Author: Pedro Tammela <pctammela@mojatatu.com>
Date:   Thu Jan 4 21:38:10 2024 -0300

    net/sched: simplify tc_action_load_ops parameters

    Instead of using two bools derived from the flags passed as an argument
    to the parent function of tc_action_load_ops, just pass the flags
    themselves to tc_action_load_ops to simplify its parameters.

    Reviewed-by: Jiri Pirko <jiri@nvidia.com>
    Signed-off-by: Pedro Tammela <pctammela@mojatatu.com>
    Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Ivan Vecera <ivecera@redhat.com>
2024-05-14 13:13:24 +02:00
Ivan Vecera efba56b7a8 net/sched: introduce ACT_P_BOUND return code
JIRA: https://issues.redhat.com/browse/RHEL-36218

commit c2a67de9bb543394aee869d1c68b5fbcd8a89dcb
Author: Pedro Tammela <pctammela@mojatatu.com>
Date:   Fri Dec 29 10:26:41 2023 -0300

    net/sched: introduce ACT_P_BOUND return code

    Bound actions always return '0' and as of today we rely on '0'
    being returned in order to properly skip bound actions in
    tcf_idr_insert_many. In order to further improve maintainability,
    introduce the ACT_P_BOUND return code.

    Actions are updated to return 'ACT_P_BOUND' instead of plain '0'.
    tcf_idr_insert_many is then updated to check for 'ACT_P_BOUND'.
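
As a rough illustration, the check could look like the following user-space model. The enum values and all `model_*` names are assumptions made for this sketch; only the ACT_P_BOUND and ACT_P_CREATED names come from the commit messages in this log.

```c
#include <stddef.h>

/* Hypothetical values for this model. */
enum model_init_result {
	MODEL_ACT_P_BOUND = 0,
	MODEL_ACT_P_CREATED = 1,
};

struct model_action { int inserted; };

/*
 * Publish only newly created actions; bound ones are already visible.
 * Returns how many actions were published.
 */
static int model_insert_many(struct model_action *actions[],
			     const int init_res[], size_t n)
{
	int published = 0;

	for (size_t i = 0; i < n && actions[i]; i++) {
		if (init_res[i] == MODEL_ACT_P_BOUND)
			continue;	/* named code instead of a bare 0 */
		actions[i]->inserted = 1;
		published++;
	}
	return published;
}
```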

    Signed-off-by: Pedro Tammela <pctammela@mojatatu.com>
    Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
    Link: https://lore.kernel.org/r/20231229132642.1489088-1-pctammela@mojatatu.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Ivan Vecera <ivecera@redhat.com>
2024-05-14 13:13:24 +02:00
Ivan Vecera 25391b6030 net: sched: Add initial TC error skb drop reasons
JIRA: https://issues.redhat.com/browse/RHEL-36218

commit 4cf24dc8934074725042c0bd10b91f4d4b5269bb
Author: Victor Nogueira <victor@mojatatu.com>
Date:   Sat Dec 16 17:44:36 2023 -0300

    net: sched: Add initial TC error skb drop reasons

    Continue expanding Daniel's patch by adding new skb drop reasons that
    are idiosyncratic to TC.

    More specifically:

    - SKB_DROP_REASON_TC_COOKIE_ERROR: An error occurred whilst
      processing a tc ext cookie.

    - SKB_DROP_REASON_TC_CHAIN_NOTFOUND: tc chain lookup failed.

    - SKB_DROP_REASON_TC_RECLASSIFY_LOOP: tc exceeded max reclassify loop
      iterations

    Signed-off-by: Victor Nogueira <victor@mojatatu.com>
    Reviewed-by: Simon Horman <horms@kernel.org>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Ivan Vecera <ivecera@redhat.com>
2024-05-14 13:13:23 +02:00
Ivan Vecera 0c3f908699 net: sched: Move drop_reason to struct tc_skb_cb
JIRA: https://issues.redhat.com/browse/RHEL-36218

commit fb2780721ca5e9f78bbe4544b819b929a982df9c
Author: Victor Nogueira <victor@mojatatu.com>
Date:   Sat Dec 16 17:44:34 2023 -0300

    net: sched: Move drop_reason to struct tc_skb_cb

    Move drop_reason from struct tcf_result to skb cb - more specifically to
    struct tc_skb_cb. With that, we'll be able to also set the drop reason for
    the remaining qdiscs (aside from clsact) that do not have access to
    tcf_result when time comes to set the skb drop reason.

    Signed-off-by: Victor Nogueira <victor@mojatatu.com>
    Acked-by: Daniel Borkmann <daniel@iogearbox.net>
    Reviewed-by: Simon Horman <horms@kernel.org>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Ivan Vecera <ivecera@redhat.com>
2024-05-14 13:13:23 +02:00
Ivan Vecera 5dc793543c net/sched: act_api: skip idr replace on bound actions
JIRA: https://issues.redhat.com/browse/RHEL-36218

commit 1dd7f18fc0ed75dad4d5f2ecc84f69c6b62b6a81
Author: Pedro Tammela <pctammela@mojatatu.com>
Date:   Mon Dec 11 15:18:07 2023 -0300

    net/sched: act_api: skip idr replace on bound actions

    tcf_idr_insert_many will replace the allocated -EBUSY pointer in
    tcf_idr_check_alloc with the real action pointer, exposing it
    to all operations. This operation is only needed when the action pointer
    is created (ACT_P_CREATED). For actions which are bound to (returned 0),
    the pointer already resides in the idr making such operation a nop.

    Even though it's a nop, it's still not a cheap operation as internally
    the idr code walks the idr and then does a replace on the appropriate slot.
    So if the action was bound, better skip the idr replace entirely.

    Signed-off-by: Pedro Tammela <pctammela@mojatatu.com>
    Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
    Reviewed-by: Vlad Buslov <vladbu@nvidia.com>
    Link: https://lore.kernel.org/r/20231211181807.96028-3-pctammela@mojatatu.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Ivan Vecera <ivecera@redhat.com>
2024-05-14 13:13:23 +02:00
Ivan Vecera 56ab03de6b net/sched: act_api: rely on rcu in tcf_idr_check_alloc
JIRA: https://issues.redhat.com/browse/RHEL-36218

commit 4b55e86736d5b492cf689125da2600f59c7d2c39
Author: Pedro Tammela <pctammela@mojatatu.com>
Date:   Mon Dec 11 15:18:06 2023 -0300

    net/sched: act_api: rely on rcu in tcf_idr_check_alloc

    Instead of relying only on the idrinfo->lock mutex for
    bind/alloc logic, rely on a combination of rcu + mutex + atomics
    to better scale the case where multiple rtnl-less filters are
    binding to the same action object.

    Action binding happens when an action index is specified explicitly and
    an action with such index exists. Example:
      tc actions add action drop index 1
      tc filter add ... matchall action drop index 1
      tc filter add ... matchall action drop index 1
      tc filter add ... matchall action drop index 1
      tc filter ls ...
         filter protocol all pref 49150 matchall chain 0
         filter protocol all pref 49150 matchall chain 0 handle 0x1
         not_in_hw
               action order 1: gact action drop
                random type none pass val 0
                index 1 ref 4 bind 3

       filter protocol all pref 49151 matchall chain 0
       filter protocol all pref 49151 matchall chain 0 handle 0x1
         not_in_hw
               action order 1: gact action drop
                random type none pass val 0
                index 1 ref 4 bind 3

       filter protocol all pref 49152 matchall chain 0
       filter protocol all pref 49152 matchall chain 0 handle 0x1
         not_in_hw
               action order 1: gact action drop
                random type none pass val 0
                index 1 ref 4 bind 3

    When no index is specified, as before, grab the mutex and allocate
    in the idr the next available id. In this version, as opposed to before,
    it's simplified to store the -EBUSY pointer instead of the previous
    alloc + replace combination.

    When an index is specified, rely on rcu to find if there's an object in
    such index. If there's none, fallback to the above, serializing on the
    mutex and reserving the specified id. If there's one, it can be either an
    -EBUSY pointer, in which case we just try again until it's an action, or
    an actual action.
    Given the rcu guarantees, the action found could be dead and therefore
    we need to bump the refcount if it's not 0, handling the case it's
    in fact 0.

    As bind and the action refcount are already atomics, these increments can
    happen without the mutex protection while many tcf_idr_check_alloc race
    to bind to the same action instance.

    In case binding encounters a parallel delete or add, it will return
    -EAGAIN in order to try again. Both filter and action apis already
    have the retry machinery in-place. In case it's an unlocked filter it
    retries under the rtnl lock.
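
The "bump the refcount if it's not 0" step can be modelled in user space with C11 atomics. This is a sketch of the general technique only; `model_ref_get_unless_zero` is a hypothetical name and the kernel uses its own refcount primitives.

```c
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Take a reference only if the object is still alive (count != 0).
 * Returns false when the count is already 0, so the caller can treat the
 * object as dead and retry (the text above uses -EAGAIN for that).
 */
static bool model_ref_get_unless_zero(atomic_int *ref)
{
	int old = atomic_load(ref);

	while (old != 0) {
		/* On failure, old is refreshed with the current value. */
		if (atomic_compare_exchange_weak(ref, &old, old + 1))
			return true;
	}
	return false;
}
```

Because the increment is a compare-and-swap against a non-zero value, several binders can race on the same object without a mutex, and none of them can revive an object whose count has already dropped to 0.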

    Signed-off-by: Pedro Tammela <pctammela@mojatatu.com>
    Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
    Reviewed-by: Vlad Buslov <vladbu@nvidia.com>
    Link: https://lore.kernel.org/r/20231211181807.96028-2-pctammela@mojatatu.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Ivan Vecera <ivecera@redhat.com>
2024-05-14 13:13:23 +02:00
Ivan Vecera 3bb8d738f5 net/sched: act_api: conditional notification of events
JIRA: https://issues.redhat.com/browse/RHEL-36218

commit 8d4390f51920c1edb2d09d44d918c7940ac51e54
Author: Pedro Tammela <pctammela@mojatatu.com>
Date:   Fri Dec 8 16:28:45 2023 -0300

    net/sched: act_api: conditional notification of events

    As of today tc-action events are unconditionally built and sent to
    RTNLGRP_TC. With the introduction of rtnl_notify_needed we can check
    beforehand whether they are really needed.

    Signed-off-by: Pedro Tammela <pctammela@mojatatu.com>
    Reviewed-by: Jiri Pirko <jiri@nvidia.com>
    Link: https://lore.kernel.org/r/20231208192847.714940-6-pctammela@mojatatu.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Ivan Vecera <ivecera@redhat.com>
2024-05-14 13:13:22 +02:00
Ivan Vecera b65c1ccefc net/sched: act_api: don't open code max()
JIRA: https://issues.redhat.com/browse/RHEL-36218

commit c73724bfde0932cb0cafff2855e8ce81e12fd594
Author: Pedro Tammela <pctammela@mojatatu.com>
Date:   Fri Dec 8 16:28:44 2023 -0300

    net/sched: act_api: don't open code max()

    Use max() in a couple of places that are open coding it with the
    ternary operator.

    Signed-off-by: Pedro Tammela <pctammela@mojatatu.com>
    Reviewed-by: Jiri Pirko <jiri@nvidia.com>
    Link: https://lore.kernel.org/r/20231208192847.714940-5-pctammela@mojatatu.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Ivan Vecera <ivecera@redhat.com>
2024-05-14 13:13:22 +02:00
Ivan Vecera 765db829f7 net/sched: act_api: use tcf_act_for_each_action in tcf_idr_insert_many
JIRA: https://issues.redhat.com/browse/RHEL-36218

commit f9bfc8eb1342c7ddbe1b7be9d1ebd5bc80fb72b0
Author: Pedro Tammela <pctammela@mojatatu.com>
Date:   Fri Dec 1 14:50:15 2023 -0300

    net/sched: act_api: use tcf_act_for_each_action in tcf_idr_insert_many

    The actions array is contiguous, so stop processing whenever a NULL
    is found. This is already the assumption for tcf_action_destroy[1],
    which is called from tcf_actions_init.

    [1] https://elixir.bootlin.com/linux/v6.7-rc3/source/net/sched/act_api.c#L1115

    Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
    Signed-off-by: Pedro Tammela <pctammela@mojatatu.com>
    Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Signed-off-by: Ivan Vecera <ivecera@redhat.com>
2024-05-14 13:13:22 +02:00
Ivan Vecera ae15a9f724 net/sched: act_api: stop loop over ops array on NULL in tcf_action_init
JIRA: https://issues.redhat.com/browse/RHEL-36218

commit e09ac779f736e75eab501b77f2a4f13d245f0a6d
Author: Pedro Tammela <pctammela@mojatatu.com>
Date:   Fri Dec 1 14:50:14 2023 -0300

    net/sched: act_api: stop loop over ops array on NULL in tcf_action_init

    The ops array is contiguous, so stop processing whenever a NULL is found

    Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
    Signed-off-by: Pedro Tammela <pctammela@mojatatu.com>
    Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Signed-off-by: Ivan Vecera <ivecera@redhat.com>
2024-05-14 13:13:22 +02:00
Ivan Vecera c0fbd0d19b net/sched: act_api: avoid non-contiguous action array
JIRA: https://issues.redhat.com/browse/RHEL-36218

commit a0e947c9ccffe47d45aca793d9e7fe4f4494e381
Author: Pedro Tammela <pctammela@mojatatu.com>
Date:   Fri Dec 1 14:50:13 2023 -0300

    net/sched: act_api: avoid non-contiguous action array

    In tcf_action_add, when putting the reference for the bound actions
    it assigns NULLs to just created actions passing a non contiguous
    array to tcf_action_put_many.
    Refactor the code so the actions array is always contiguous.

    Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
    Signed-off-by: Pedro Tammela <pctammela@mojatatu.com>
    Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Signed-off-by: Ivan Vecera <ivecera@redhat.com>
2024-05-14 13:13:22 +02:00
Ivan Vecera 5df5875bcf net/sched: act_api: use tcf_act_for_each_action
JIRA: https://issues.redhat.com/browse/RHEL-36218

commit 3872347e0a16876279bb21642e03842f283f0e38
Author: Pedro Tammela <pctammela@mojatatu.com>
Date:   Fri Dec 1 14:50:12 2023 -0300

    net/sched: act_api: use tcf_act_for_each_action

    Use the auxiliary macro tcf_act_for_each_action in all the
    functions that expect a contiguous action array
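
A user-space sketch of such an iteration helper, assuming a fixed upper bound and a NULL-terminated, contiguous array. The `model_*` names and `MODEL_MAX_ACTIONS` are hypothetical; the kernel macro may be written differently.

```c
#include <stddef.h>

#define MODEL_MAX_ACTIONS 4	/* hypothetical bound for this sketch */

struct model_action { int id; };

/* Walk a contiguous array: stop at the bound or at the first NULL. */
#define model_for_each_action(i, a, actions) \
	for ((i) = 0; (i) < MODEL_MAX_ACTIONS && ((a) = (actions)[i]); (i)++)

/* Sum the ids of all actions before the first NULL entry. */
static int model_sum_ids(struct model_action *actions[])
{
	struct model_action *a;
	int i, sum = 0;

	model_for_each_action(i, a, actions)
		sum += a->id;
	return sum;
}
```

Note that an entry placed after a NULL hole is never visited, which is why the arrays passed to such a helper have to be contiguous.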

    Suggested-by: Marcelo Ricardo Leitner <mleitner@redhat.com>
    Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
    Signed-off-by: Pedro Tammela <pctammela@mojatatu.com>
    Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Signed-off-by: Ivan Vecera <ivecera@redhat.com>
2024-05-14 13:13:22 +02:00
Ivan Vecera 9488742a2f net, sched: Fix SKB_NOT_DROPPED_YET splat under debug config
JIRA: https://issues.redhat.com/browse/RHEL-36218

commit 40cb2fdfed342e7e578d551a073687789f698d89
Author: Jamal Hadi Salim <jhs@mojatatu.com>
Date:   Sat Oct 28 13:16:10 2023 -0400

    net, sched: Fix SKB_NOT_DROPPED_YET splat under debug config

    Getting the following splat [1] with CONFIG_DEBUG_NET=y and this
    reproducer [2]. Problem seems to be that classifiers clear 'struct
    tcf_result::drop_reason', thereby triggering the warning in
    __kfree_skb_reason() due to reason being 'SKB_NOT_DROPPED_YET' (0).

    Fixed by disambiguating a legit error from a verdict with a bogus drop_reason

    [1]
    WARNING: CPU: 0 PID: 181 at net/core/skbuff.c:1082 kfree_skb_reason+0x38/0x130
    Modules linked in:
    CPU: 0 PID: 181 Comm: mausezahn Not tainted 6.6.0-rc6-custom-ge43e6d9582e0 #682
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-1.fc37 04/01/2014
    RIP: 0010:kfree_skb_reason+0x38/0x130
    [...]
    Call Trace:
     <IRQ>
     __netif_receive_skb_core.constprop.0+0x837/0xdb0
     __netif_receive_skb_one_core+0x3c/0x70
     process_backlog+0x95/0x130
     __napi_poll+0x25/0x1b0
     net_rx_action+0x29b/0x310
     __do_softirq+0xc0/0x29b
     do_softirq+0x43/0x60
     </IRQ>

    [2]

    ip link add name veth0 type veth peer name veth1
    ip link set dev veth0 up
    ip link set dev veth1 up
    tc qdisc add dev veth1 clsact
    tc filter add dev veth1 ingress pref 1 proto all flower dst_mac 00:11:22:33:44:55 action drop
    mausezahn veth0 -a own -b 00:11:22:33:44:55 -q -c 1

    Ido reported:

      [...] getting the following splat [1] with CONFIG_DEBUG_NET=y and this
      reproducer [2]. Problem seems to be that classifiers clear 'struct
      tcf_result::drop_reason', thereby triggering the warning in
      __kfree_skb_reason() due to reason being 'SKB_NOT_DROPPED_YET' (0). [...]

      [1]
      WARNING: CPU: 0 PID: 181 at net/core/skbuff.c:1082 kfree_skb_reason+0x38/0x130
      Modules linked in:
      CPU: 0 PID: 181 Comm: mausezahn Not tainted 6.6.0-rc6-custom-ge43e6d9582e0 #682
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-1.fc37 04/01/2014
      RIP: 0010:kfree_skb_reason+0x38/0x130
      [...]
      Call Trace:
       <IRQ>
       __netif_receive_skb_core.constprop.0+0x837/0xdb0
       __netif_receive_skb_one_core+0x3c/0x70
       process_backlog+0x95/0x130
       __napi_poll+0x25/0x1b0
       net_rx_action+0x29b/0x310
       __do_softirq+0xc0/0x29b
       do_softirq+0x43/0x60
       </IRQ>

      [2]
      #!/bin/bash

      ip link add name veth0 type veth peer name veth1
      ip link set dev veth0 up
      ip link set dev veth1 up
      tc qdisc add dev veth1 clsact
      tc filter add dev veth1 ingress pref 1 proto all flower dst_mac 00:11:22:33:44:55 action drop
      mausezahn veth0 -a own -b 00:11:22:33:44:55 -q -c 1

    What happens is that inside most classifiers the tcf_result is copied over
    from a filter template e.g. *res = f->res which then implicitly overrides
    the prior SKB_DROP_REASON_TC_{INGRESS,EGRESS} default drop code which was
    set via sch_handle_{ingress,egress}() for kfree_skb_reason().

    Commit text above copied verbatim from Daniel. The general idea of the patch
    is not very different from what Ido originally posted but instead done at the
    cls_api codepath.
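
The failure mode described above (a struct copy such as *res = f->res silently replacing a previously set default) can be reproduced with a small user-space model. This only illustrates the clobbering; it is not the fix applied by this commit, and all `model_*` names and values are hypothetical.

```c
struct model_result {
	unsigned long class_id;
	int drop_reason;
};

/* Hypothetical values for this sketch. */
enum { MODEL_NOT_DROPPED_YET = 0, MODEL_TC_INGRESS = 1 };

/* Plain struct assignment: every field is replaced, including a
 * drop_reason default that the caller set earlier. */
static void model_classify_clobber(struct model_result *res,
				   const struct model_result *tmpl)
{
	*res = *tmpl;
}

/* Variant that keeps the caller's default across the copy. */
static void model_classify_keep(struct model_result *res,
				const struct model_result *tmpl)
{
	int reason = res->drop_reason;

	*res = *tmpl;
	res->drop_reason = reason;
}
```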

    Fixes: 54a59aed395c ("net, sched: Make tc-related drop reason more flexible")
    Reported-by: Ido Schimmel <idosch@idosch.org>
    Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
    Link: https://lore.kernel.org/netdev/ZTjY959R+AFXf3Xy@shredder
    Reviewed-by: Simon Horman <horms@kernel.org>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Ivan Vecera <ivecera@redhat.com>
2024-05-14 13:13:20 +02:00
Ivan Vecera 548a291150 net: sched: Replace strlcpy with strscpy
JIRA: https://issues.redhat.com/browse/RHEL-1773

commit 989b52cdc84955c2a35bc18f53e3a83edfa6f404
Author: Azeem Shaikh <azeemshaikh38@gmail.com>
Date:   Mon Jul 10 03:07:11 2023 +0000

    net: sched: Replace strlcpy with strscpy

    strlcpy() reads the entire source buffer first.
    This read may exceed the destination size limit.
    This is both inefficient and can lead to linear read
    overflows if a source string is not NUL-terminated [1].
    In an effort to remove strlcpy() completely [2], replace
    strlcpy() here with strscpy().

    Direct replacement is safe here since return value of -errno
    is used to check for truncation instead of sizeof(dest).
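
For readers outside the kernel, the return-value contract relied on here can be modelled in user space as follows. `model_strscpy` is a hypothetical stand-in written for illustration, not the kernel implementation.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy at most size - 1 bytes, always NUL-terminate when size > 0, and
 * return the copied length, or -E2BIG if the source had to be truncated.
 */
static long model_strscpy(char *dst, const char *src, size_t size)
{
	size_t len;

	if (size == 0)
		return -E2BIG;

	/* Bounded scan: never reads more than size bytes of src. */
	for (len = 0; len < size && src[len] != '\0'; len++)
		;

	if (len == size) {	/* src does not fit together with its NUL */
		memcpy(dst, src, size - 1);
		dst[size - 1] = '\0';
		return -E2BIG;
	}
	memcpy(dst, src, len + 1);
	return (long)len;
}
```

A caller checks for truncation by testing for a negative return value rather than comparing the result against the destination size.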

    [1] https://www.kernel.org/doc/html/latest/process/deprecated.html#strlcpy
    [2] https://github.com/KSPP/linux/issues/89

    Signed-off-by: Azeem Shaikh <azeemshaikh38@gmail.com>
    Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Ivan Vecera <ivecera@redhat.com>
2023-10-13 09:03:12 +02:00
Ivan Vecera 8ad2e18e35 net/sched: act_api: use the correct TCA_ACT attributes in dump
JIRA: https://issues.redhat.com/browse/RHEL-1773

commit fcb3a4653bc5fb0525d957db0cc8b413252029f8
Author: Pedro Tammela <pctammela@mojatatu.com>
Date:   Tue Mar 21 19:33:45 2023 -0300

    net/sched: act_api: use the correct TCA_ACT attributes in dump

    4 places in the act api code are using 'TCA_' definitions where they
    should be using 'TCA_ACT_', which is confusing for the reader, although
    functionally they are equivalent.

    Cc: Hangbin Liu <haliu@redhat.com>
    Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com>
    Reviewed-by: Simon Horman <simon.horman@corigine.com>
    Signed-off-by: Pedro Tammela <pctammela@mojatatu.com>
    Acked-by: Hangbin Liu <haliu@redhat.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Ivan Vecera <ivecera@redhat.com>
2023-10-13 09:03:07 +02:00
Ivan Vecera 3cb7a70c5c net/sched: act_api: add specific EXT_WARN_MSG for tc action
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2172886

commit 2f59823fe696caa844249a90bb3f9aeda69cfe5c
Author: Hangbin Liu <liuhangbin@gmail.com>
Date:   Thu Mar 16 11:37:53 2023 +0800

    net/sched: act_api: add specific EXT_WARN_MSG for tc action

    In my previous commit 0349b8779cc9 ("sched: add new attr TCA_EXT_WARN_MSG
    to report tc extact message") I didn't notice that tc actions use a
    different enum from filters. So we can't use TCA_EXT_WARN_MSG directly for
    tc actions. Let's add a TCA_ROOT_EXT_WARN_MSG specifically for tc actions
    and put this param before going to the TCA_ACT_TAB nest.

    Fixes: 0349b8779cc9 ("sched: add new attr TCA_EXT_WARN_MSG to report tc extact message")
    Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
    Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Ivan Vecera <ivecera@redhat.com>
2023-05-10 20:48:55 +02:00
Ivan Vecera 2c004a4ca4 Revert "net/sched: act_api: move TCA_EXT_WARN_MSG to the correct hierarchy"
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2172886

commit 8de2bd02439eb839a452a853c1004c2c45ff6fef
Author: Hangbin Liu <liuhangbin@gmail.com>
Date:   Thu Mar 16 11:37:52 2023 +0800

    Revert "net/sched: act_api: move TCA_EXT_WARN_MSG to the correct hierarchy"

    This reverts commit 923b2e30dc9cd05931da0f64e2e23d040865c035.

    This is not a correct fix as TCA_EXT_WARN_MSG does not belong in the
    TCA_ACT_TAB hierarchy. I didn't notice that TC actions use a different
    enum when adding TCA_EXT_WARN_MSG. To fix the difference I will add a new
    WARN enum in TCA_ROOT_MAX as Jamal suggested.

    Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
    Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Ivan Vecera <ivecera@redhat.com>
2023-05-10 20:48:55 +02:00
Ivan Vecera 6707b92dba net/sched: act_api: move TCA_EXT_WARN_MSG to the correct hierarchy
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2172886

commit 923b2e30dc9cd05931da0f64e2e23d040865c035
Author: Pedro Tammela <pctammela@mojatatu.com>
Date:   Fri Feb 24 14:56:01 2023 -0300

    net/sched: act_api: move TCA_EXT_WARN_MSG to the correct hierarchy

    TCA_EXT_WARN_MSG is currently sitting outside of the expected hierarchy
    for the tc actions code. It should sit within TCA_ACT_TAB.

    Fixes: 0349b8779cc9 ("sched: add new attr TCA_EXT_WARN_MSG to report tc extact message")
    Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com>
    Signed-off-by: Pedro Tammela <pctammela@mojatatu.com>
    Reviewed-by: Simon Horman <simon.horman@corigine.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Ivan Vecera <ivecera@redhat.com>
2023-05-10 20:48:55 +02:00
Ivan Vecera f7627d50bf net/sched: cls_api: Support hardware miss to tc action
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2172886

commit 80cd22c35c9001fe72bf614d29439de41933deca
Author: Paul Blakey <paulb@nvidia.com>
Date:   Sat Feb 18 00:36:14 2023 +0200

    net/sched: cls_api: Support hardware miss to tc action

    For drivers to support partial offload of a filter's action list,
    add support for action miss to specify an action instance to
    continue from in sw.

    CT action in particular can't be fully offloaded, as new connections
    need to be handled in software. This imposes other limitations on
    the actions that can be offloaded together with the CT action, such
    as packet modifications.

    Assign each action on a filter's action list a unique miss_cookie
    which drivers can then use to fill action_miss part of the tc skb
    extension. On getting back this miss_cookie, find the action
    instance with relevant cookie and continue classifying from there.

    Signed-off-by: Paul Blakey <paulb@nvidia.com>
    Reviewed-by: Jiri Pirko <jiri@nvidia.com>
    Reviewed-by: Simon Horman <simon.horman@corigine.com>
    Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
    Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Ivan Vecera <ivecera@redhat.com>
2023-05-10 20:48:55 +02:00
Ivan Vecera a9c4a3bda2 net/sched: Rename user cookie and act cookie
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2172886

Conflicts:
- hunk for mlx5 was skipped as it is not applicable due to absence of
  commit cca7eac13856 ("net/mlx5e: TC, store tc action cookies per attr")

commit db4b49025c0c7116f1d2dfe8d5bbfc983ac054de
Author: Paul Blakey <paulb@nvidia.com>
Date:   Sat Feb 18 00:36:13 2023 +0200

    net/sched: Rename user cookie and act cookie

    struct tc_action->act_cookie is a user defined cookie,
    and the related struct flow_action_entry->act_cookie is
    used as a handle similar to struct flow_cls_offload->cookie.

    Rename tc_action->act_cookie to user_cookie, and
    flow_action_entry->act_cookie to cookie so their names
    would better fit their usage.

    Signed-off-by: Paul Blakey <paulb@nvidia.com>
    Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Ivan Vecera <ivecera@redhat.com>
2023-05-10 20:48:54 +02:00
Ivan Vecera 56a427ce18 net/sched: support per action hw stats
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2172886

commit 5246c896b805b043a87fa78af32a33cbce00de05
Author: Oz Shlomo <ozsh@nvidia.com>
Date:   Sun Feb 12 15:25:16 2023 +0200

    net/sched: support per action hw stats

    There are currently two mechanisms for populating hardware stats:
    1. Using flow_offload api to query the flow's statistics.
       The api assumes that the same stats values apply to all
       the flow's actions.
       This assumption breaks when action drops or jumps over following
       actions.
    2. Using hw_action api to query specific action stats via a driver
       callback method. This api assures the correct action stats for
       the offloaded action, however, it does not apply to the rest of the
       actions in the flow's actions array.

    Extend the flow_offload stats callback to indicate that a per action
    stats update is required.
    Use the existing flow_offload_action api to query the action's hw stats.
    In addition, currently the tc action stats utility only updates hw actions.
    Reuse the existing action stats cb infrastructure to query any action
    stats.

    Signed-off-by: Oz Shlomo <ozsh@nvidia.com>
    Reviewed-by: Simon Horman <simon.horman@corigine.com>
    Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
    Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Signed-off-by: Ivan Vecera <ivecera@redhat.com>
2023-05-10 20:48:53 +02:00
Ivan Vecera 144d78951e net/sched: introduce flow_offload action cookie
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2172886

commit d307b2c6f962ad5d83d7a7df71c2e9c9e4106d82
Author: Oz Shlomo <ozsh@nvidia.com>
Date:   Sun Feb 12 15:25:15 2023 +0200

    net/sched: introduce flow_offload action cookie

    Currently a hardware action is uniquely identified by the <id, hw_index>
    tuple. However, the id is set by the flow_act_setup callback and tc core
    cannot enforce this, and it is possible that a future change could break
    this. In addition, <id, hw_index> are not unique across network namespaces.

    Uniquely identify the action by setting an action cookie by the tc core.
    Use the unique action cookie to query the action's hardware stats.

    Signed-off-by: Oz Shlomo <ozsh@nvidia.com>
    Reviewed-by: Simon Horman <simon.horman@corigine.com>
    Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
    Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Signed-off-by: Ivan Vecera <ivecera@redhat.com>
2023-05-10 20:48:53 +02:00
Ivan Vecera 36d45c0d0a net/sched: optimize action stats api calls
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2172886

commit 8f2ca70c07f4cee68ed6297c1876c28b73c9af21
Author: Oz Shlomo <ozsh@nvidia.com>
Date:   Sun Feb 12 15:25:12 2023 +0200

    net/sched: optimize action stats api calls

    Currently the hw action stats update is called from tcf_exts_hw_stats_update,
    when a tc filter is dumped, and from tcf_action_copy_stats, when a hw
    action is dumped.
    However, the tcf_action_copy_stats is also called from tcf_action_dump.
    As such, the hw action stats update cb is called 3 times for every
    tc flower filter dump.

    Move the tc action hw stats update from tcf_action_copy_stats to
    tcf_dump_walker to update the hw action stats when tc action is dumped.

    Signed-off-by: Oz Shlomo <ozsh@nvidia.com>
    Reviewed-by: Simon Horman <simon.horman@corigine.com>
    Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
    Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Signed-off-by: Ivan Vecera <ivecera@redhat.com>
2023-05-10 20:48:53 +02:00
Ivan Vecera 8f592ab576 sched: add new attr TCA_EXT_WARN_MSG to report tc extact message
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2172886

commit 0349b8779cc949ad9e6aced32672ee48cf79b497
Author: Hangbin Liu <liuhangbin@gmail.com>
Date:   Fri Jan 13 11:43:53 2023 +0800

    sched: add new attr TCA_EXT_WARN_MSG to report tc extact message

    We report the extack message via netlink_ack() if there is an error. But
    if the rule is not to be exclusively executed by the hardware, extack is not
    passed along and offloading failures don't get logged.

    In commit 81c7288b17 ("sched: cls: enable verbose logging") Marcelo
    enabled cls to log verbose info for offloading failures, which helps
    improve Open vSwitch debuggability when using flower offloading.

    It would also be helpful if userspace monitor tools, like "tc monitor",
    could log this kind of message, as it doesn't require vswitchd log level
    adjustment. Let's add a new tc attribute to report the extack message so
    the monitor program can receive the failures, e.g.

      # tc monitor
      added chain dev enp3s0f1np1 parent ffff: chain 0
      added filter dev enp3s0f1np1 ingress protocol all pref 49152 flower chain 0 handle 0x1
        ct_state +trk+new
        not_in_hw
              action order 1: gact action drop
               random type none pass val 0
               index 1 ref 1 bind 1

      Warning: mlx5_core: matching on ct_state +new isn't supported.

    In this patch I only report the extack message on add/del operations.
    It doesn't look like we need to report the extack message on get/dump
    operations.

    Note that this message is not only reported to multicast groups; it can
    also be reported via unicast, which may affect the behavior of current
    userspace tools.

    Suggested-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
    Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
    Acked-by: Jakub Kicinski <kuba@kernel.org>
    Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
    Link: https://lore.kernel.org/r/20230113034353.2766735-1-liuhangbin@gmail.com
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Signed-off-by: Ivan Vecera <ivecera@redhat.com>
2023-05-10 20:48:49 +02:00
Ivan Vecera d9626e9657 net/sched: avoid indirect act functions on retpoline kernels
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2172886

commit 871cf386dd16705b1e08942efd02c58801293d01
Author: Pedro Tammela <pctammela@mojatatu.com>
Date:   Tue Dec 6 10:55:12 2022 -0300

    net/sched: avoid indirect act functions on retpoline kernels

    Expose the necessary tc act functions and wire up act_api to use
    direct calls in retpoline kernels.

    Signed-off-by: Pedro Tammela <pctammela@mojatatu.com>
    Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com>
    Reviewed-by: Victor Nogueira <victor@mojatatu.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Ivan Vecera <ivecera@redhat.com>
2023-05-10 20:48:49 +02:00
Ivan Vecera 3b0e2d2b94 net: sched: act_api: implement generic walker and search for tc action
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2172886

commit fae52d9323384c32283777e893bf85293588ce62
Author: Zhengchao Shao <shaozhengchao@huawei.com>
Date:   Thu Sep 8 12:14:34 2022 +0800

    net: sched: act_api: implement generic walker and search for tc action

    Now that tc_action_net can be obtained from the net_id stored in
    tc_action_ops, add the __tcf_generic_walker() and __tcf_idr_search()
    helpers to execute the generic walk/search functions.

    Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
    Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Ivan Vecera <ivecera@redhat.com>
2023-05-10 20:48:40 +02:00
Ivan Vecera f852ea3df4 net/sched: act_api: Notify user space if any actions were flushed before error
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2139170

commit 76b39b94382f9e0a639e1c70c3253de248cc4c83
Author: Victor Nogueira <victor@mojatatu.com>
Date:   Thu Jun 23 11:07:41 2022 -0300

    net/sched: act_api: Notify user space if any actions were flushed before error

    If during an action flush operation one of the actions is still being
    referenced, the flush operation is aborted and the kernel returns to
    user space with an error. However, if the kernel was able to flush, for
    example, 3 actions and failed on the fourth, the kernel will not notify
    user space that it deleted 3 actions before failing.

    This patch fixes that behaviour by notifying user space of how many
    actions were deleted before flush failed and by setting extack with a
    message describing what happened.

    Fixes: 55334a5db5 ("net_sched: act: refuse to remove bound action outside")
    Signed-off-by: Victor Nogueira <victor@mojatatu.com>
    Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Ivan Vecera <ivecera@redhat.com>
2022-11-13 16:59:02 +01:00
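Note on f852ea3df4 above: the behaviour after the fix can be modelled in a few lines of Python. This is a hedged userspace sketch, not the kernel implementation. The dict layout ("index", "bindcnt"), the function name flush_actions() and the wording of the message are hypothetical.

```python
def flush_actions(actions):
    """Flush actions in order, stopping at the first one still referenced.

    `actions` is a list of dicts with hypothetical keys "index" and "bindcnt".
    Returns (remaining, deleted, message). `message` is None on full success;
    otherwise it plays the role of the extack text, and `deleted` tells user
    space how many actions were removed before the failure.
    """
    remaining = list(actions)
    deleted = 0
    for act in actions:
        if act["bindcnt"] > 0:
            # Abort, but still report the partial progress.
            msg = (f"flush aborted at action {act['index']}: "
                   f"{deleted} action(s) deleted before the failure")
            return remaining, deleted, msg
        remaining.remove(act)
        deleted += 1
    return remaining, deleted, None
```

With four actions where only the fourth is still bound, the model reports three deletions together with a message, which corresponds to the "3 actions before failing" case described in the commit.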
Ivan Vecera ae16730e38 net/sched: act_api: Add extack to offload_act_setup() callback
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2139170

commit c2ccf84ecb715bb81dc7f51e69d680a95bf055ae
Author: Ido Schimmel <idosch@nvidia.com>
Date:   Thu Apr 7 10:35:22 2022 +0300

    net/sched: act_api: Add extack to offload_act_setup() callback

    The callback is used by various actions to populate the flow action
    structure prior to offload. Pass extack to this callback so that the
    various actions will be able to report accurate error messages to user
    space.

    Signed-off-by: Ido Schimmel <idosch@nvidia.com>
    Reviewed-by: Petr Machata <petrm@nvidia.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Ivan Vecera <ivecera@redhat.com>
2022-11-13 16:59:00 +01:00
Ivan Vecera c5a7c40092 flow_offload: improve extack msg for user when adding invalid filter
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2090410

commit d922a99b96d0030f2e7e8128e98f29123172bd03
Author: Baowen Zheng <baowen.zheng@corigine.com>
Date:   Wed Mar 2 11:29:29 2022 +0800

    flow_offload: improve extack msg for user when adding invalid filter

    Add an extack message so that the user gets an exact error message when
    adding an invalid filter whose flags conflict with those of its TC action.

    The previous implementation just returned EINVAL, which is confusing for
    the user.

    Signed-off-by: Baowen Zheng <baowen.zheng@corigine.com>
    Reviewed-by: Roi Dayan <roid@nvidia.com>
    Link: https://lore.kernel.org/r/1646191769-17761-1-git-send-email-baowen.zheng@corigine.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Ivan Vecera <ivecera@redhat.com>
2022-06-06 16:32:39 +02:00
Ivan Vecera a6833ccbf0 net: sched: avoid newline at end of message in NL_SET_ERR_MSG_MOD
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2090410

commit ecf4a24cf97838fb0b78d4ede0f91d80b058289c
Author: Wan Jiabing <wanjiabing@vivo.com>
Date:   Wed Feb 23 10:34:19 2022 +0800

    net: sched: avoid newline at end of message in NL_SET_ERR_MSG_MOD

    Fix following coccicheck warning:
    ./net/sched/act_api.c:277:7-49: WARNING avoid newline at end of message
    in NL_SET_ERR_MSG_MOD

    Signed-off-by: Wan Jiabing <wanjiabing@vivo.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Ivan Vecera <ivecera@redhat.com>
2022-06-06 16:32:00 +02:00
Ivan Vecera 1867b6a896 net: sched: limit TC_ACT_REPEAT loops
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2090410

commit 5740d068909676d4bdb5c9c00c37a83df7728909
Author: Eric Dumazet <edumazet@google.com>
Date:   Tue Feb 15 15:53:05 2022 -0800

    net: sched: limit TC_ACT_REPEAT loops

    We have been living dangerously, at the mercy of malicious users,
    abusing TC_ACT_REPEAT, as shown by this syzbot report [1].

    Add an arbitrary limit (32) to the number of times an action can
    return TC_ACT_REPEAT.

    v2: switch the limit to 32 instead of 10.
        Use net_warn_ratelimited() instead of pr_err_once().

    [1] (C repro available on demand)

    rcu: INFO: rcu_preempt self-detected stall on CPU
    rcu:    1-...!: (10500 ticks this GP) idle=021/1/0x4000000000000000 softirq=5592/5592 fqs=0
            (t=10502 jiffies g=5305 q=190)
    rcu: rcu_preempt kthread timer wakeup didn't happen for 10502 jiffies! g5305 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402
    rcu:    Possible timer handling issue on cpu=0 timer-softirq=3527
    rcu: rcu_preempt kthread starved for 10505 jiffies! g5305 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 ->cpu=0
    rcu:    Unless rcu_preempt kthread gets sufficient CPU time, OOM is now expected behavior.
    rcu: RCU grace-period kthread stack dump:
    task:rcu_preempt     state:I stack:29344 pid:   14 ppid:     2 flags:0x00004000
    Call Trace:
     <TASK>
     context_switch kernel/sched/core.c:4986 [inline]
     __schedule+0xab2/0x4db0 kernel/sched/core.c:6295
     schedule+0xd2/0x260 kernel/sched/core.c:6368
     schedule_timeout+0x14a/0x2a0 kernel/time/timer.c:1881
     rcu_gp_fqs_loop+0x186/0x810 kernel/rcu/tree.c:1963
     rcu_gp_kthread+0x1de/0x320 kernel/rcu/tree.c:2136
     kthread+0x2e9/0x3a0 kernel/kthread.c:377
     ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295
     </TASK>
    rcu: Stack dump where RCU GP kthread last ran:
    Sending NMI from CPU 1 to CPUs 0:
    NMI backtrace for cpu 0
    CPU: 0 PID: 3646 Comm: syz-executor358 Not tainted 5.17.0-rc3-syzkaller-00149-gbf8e59fd315f #0
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
    RIP: 0010:rep_nop arch/x86/include/asm/vdso/processor.h:13 [inline]
    RIP: 0010:cpu_relax arch/x86/include/asm/vdso/processor.h:18 [inline]
    RIP: 0010:pv_wait_head_or_lock kernel/locking/qspinlock_paravirt.h:437 [inline]
    RIP: 0010:__pv_queued_spin_lock_slowpath+0x3b8/0xb40 kernel/locking/qspinlock.c:508
    Code: 48 89 eb c6 45 01 01 41 bc 00 80 00 00 48 c1 e9 03 83 e3 07 41 be 01 00 00 00 48 b8 00 00 00 00 00 fc ff df 4c 8d 2c 01 eb 0c <f3> 90 41 83 ec 01 0f 84 72 04 00 00 41 0f b6 45 00 38 d8 7f 08 84
    RSP: 0018:ffffc9000283f1b0 EFLAGS: 00000206
    RAX: 0000000000000003 RBX: 0000000000000000 RCX: 1ffff1100fc0071e
    RDX: 0000000000000001 RSI: 0000000000000201 RDI: 0000000000000000
    RBP: ffff88807e0038f0 R08: 0000000000000001 R09: ffffffff8ffbf9ff
    R10: 0000000000000001 R11: 0000000000000001 R12: 0000000000004c1e
    R13: ffffed100fc0071e R14: 0000000000000001 R15: ffff8880b9c3aa80
    FS:  00005555562bf300(0000) GS:ffff8880b9c00000(0000) knlGS:0000000000000000
    CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 00007ffdbfef12b8 CR3: 00000000723c2000 CR4: 00000000003506f0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
    Call Trace:
     <TASK>
     pv_queued_spin_lock_slowpath arch/x86/include/asm/paravirt.h:591 [inline]
     queued_spin_lock_slowpath arch/x86/include/asm/qspinlock.h:51 [inline]
     queued_spin_lock include/asm-generic/qspinlock.h:85 [inline]
     do_raw_spin_lock+0x200/0x2b0 kernel/locking/spinlock_debug.c:115
     spin_lock_bh include/linux/spinlock.h:354 [inline]
     sch_tree_lock include/net/sch_generic.h:610 [inline]
     sch_tree_lock include/net/sch_generic.h:605 [inline]
     prio_tune+0x3b9/0xb50 net/sched/sch_prio.c:211
     prio_init+0x5c/0x80 net/sched/sch_prio.c:244
     qdisc_create.constprop.0+0x44a/0x10f0 net/sched/sch_api.c:1253
     tc_modify_qdisc+0x4c5/0x1980 net/sched/sch_api.c:1660
     rtnetlink_rcv_msg+0x413/0xb80 net/core/rtnetlink.c:5594
     netlink_rcv_skb+0x153/0x420 net/netlink/af_netlink.c:2494
     netlink_unicast_kernel net/netlink/af_netlink.c:1317 [inline]
     netlink_unicast+0x539/0x7e0 net/netlink/af_netlink.c:1343
     netlink_sendmsg+0x904/0xe00 net/netlink/af_netlink.c:1919
     sock_sendmsg_nosec net/socket.c:705 [inline]
     sock_sendmsg+0xcf/0x120 net/socket.c:725
     ____sys_sendmsg+0x6e8/0x810 net/socket.c:2413
     ___sys_sendmsg+0xf3/0x170 net/socket.c:2467
     __sys_sendmsg+0xe5/0x1b0 net/socket.c:2496
     do_syscall_x64 arch/x86/entry/common.c:50 [inline]
     do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
     entry_SYSCALL_64_after_hwframe+0x44/0xae
    RIP: 0033:0x7f7ee98aae99
    Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 41 15 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 c0 ff ff ff f7 d8 64 89 01 48
    RSP: 002b:00007ffdbfef12d8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
    RAX: ffffffffffffffda RBX: 00007ffdbfef1300 RCX: 00007f7ee98aae99
    RDX: 0000000000000000 RSI: 0000000020000000 RDI: 0000000000000003
    RBP: 0000000000000000 R08: 000000000000000d R09: 000000000000000d
    R10: 000000000000000d R11: 0000000000000246 R12: 00007ffdbfef12f0
    R13: 00000000000f4240 R14: 000000000004ca47 R15: 00007ffdbfef12e4
     </TASK>
    INFO: NMI handler (nmi_cpu_backtrace_handler) took too long to run: 2.293 msecs
    NMI backtrace for cpu 1
    CPU: 1 PID: 3260 Comm: kworker/1:3 Not tainted 5.17.0-rc3-syzkaller-00149-gbf8e59fd315f #0
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
    Workqueue: mld mld_ifc_work
    Call Trace:
     <IRQ>
     __dump_stack lib/dump_stack.c:88 [inline]
     dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
     nmi_cpu_backtrace.cold+0x47/0x144 lib/nmi_backtrace.c:111
     nmi_trigger_cpumask_backtrace+0x1b3/0x230 lib/nmi_backtrace.c:62
     trigger_single_cpu_backtrace include/linux/nmi.h:164 [inline]
     rcu_dump_cpu_stacks+0x25e/0x3f0 kernel/rcu/tree_stall.h:343
     print_cpu_stall kernel/rcu/tree_stall.h:604 [inline]
     check_cpu_stall kernel/rcu/tree_stall.h:688 [inline]
     rcu_pending kernel/rcu/tree.c:3919 [inline]
     rcu_sched_clock_irq.cold+0x5c/0x759 kernel/rcu/tree.c:2617
     update_process_times+0x16d/0x200 kernel/time/timer.c:1785
     tick_sched_handle+0x9b/0x180 kernel/time/tick-sched.c:226
     tick_sched_timer+0x1b0/0x2d0 kernel/time/tick-sched.c:1428
     __run_hrtimer kernel/time/hrtimer.c:1685 [inline]
     __hrtimer_run_queues+0x1c0/0xe50 kernel/time/hrtimer.c:1749
     hrtimer_interrupt+0x31c/0x790 kernel/time/hrtimer.c:1811
     local_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1086 [inline]
     __sysvec_apic_timer_interrupt+0x146/0x530 arch/x86/kernel/apic/apic.c:1103
     sysvec_apic_timer_interrupt+0x8e/0xc0 arch/x86/kernel/apic/apic.c:1097
     </IRQ>
     <TASK>
     asm_sysvec_apic_timer_interrupt+0x12/0x20 arch/x86/include/asm/idtentry.h:638
    RIP: 0010:__sanitizer_cov_trace_const_cmp4+0xc/0x70 kernel/kcov.c:286
    Code: 00 00 00 48 89 7c 30 e8 48 89 4c 30 f0 4c 89 54 d8 20 48 89 10 5b c3 0f 1f 80 00 00 00 00 41 89 f8 bf 03 00 00 00 4c 8b 14 24 <89> f1 65 48 8b 34 25 00 70 02 00 e8 14 f9 ff ff 84 c0 74 4b 48 8b
    RSP: 0018:ffffc90002c5eea8 EFLAGS: 00000246
    RAX: 0000000000000007 RBX: ffff88801c625800 RCX: 0000000000000000
    RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000003
    RBP: ffff8880137d3100 R08: 0000000000000000 R09: 0000000000000000
    R10: ffffffff874fcd88 R11: 0000000000000000 R12: ffff88801d692dc0
    R13: ffff8880137d3104 R14: 0000000000000000 R15: ffff88801d692de8
     tcf_police_act+0x358/0x11d0 net/sched/act_police.c:256
     tcf_action_exec net/sched/act_api.c:1049 [inline]
     tcf_action_exec+0x1a6/0x530 net/sched/act_api.c:1026
     tcf_exts_exec include/net/pkt_cls.h:326 [inline]
     route4_classify+0xef0/0x1400 net/sched/cls_route.c:179
     __tcf_classify net/sched/cls_api.c:1549 [inline]
     tcf_classify+0x3e8/0x9d0 net/sched/cls_api.c:1615
     prio_classify net/sched/sch_prio.c:42 [inline]
     prio_enqueue+0x3a7/0x790 net/sched/sch_prio.c:75
     dev_qdisc_enqueue+0x40/0x300 net/core/dev.c:3668
     __dev_xmit_skb net/core/dev.c:3756 [inline]
     __dev_queue_xmit+0x1f61/0x3660 net/core/dev.c:4081
     neigh_hh_output include/net/neighbour.h:533 [inline]
     neigh_output include/net/neighbour.h:547 [inline]
     ip_finish_output2+0x14dc/0x2170 net/ipv4/ip_output.c:228
     __ip_finish_output net/ipv4/ip_output.c:306 [inline]
     __ip_finish_output+0x396/0x650 net/ipv4/ip_output.c:288
     ip_finish_output+0x32/0x200 net/ipv4/ip_output.c:316
     NF_HOOK_COND include/linux/netfilter.h:296 [inline]
     ip_output+0x196/0x310 net/ipv4/ip_output.c:430
     dst_output include/net/dst.h:451 [inline]
     ip_local_out+0xaf/0x1a0 net/ipv4/ip_output.c:126
     iptunnel_xmit+0x628/0xa50 net/ipv4/ip_tunnel_core.c:82
     geneve_xmit_skb drivers/net/geneve.c:966 [inline]
     geneve_xmit+0x10c8/0x3530 drivers/net/geneve.c:1077
     __netdev_start_xmit include/linux/netdevice.h:4683 [inline]
     netdev_start_xmit include/linux/netdevice.h:4697 [inline]
     xmit_one net/core/dev.c:3473 [inline]
     dev_hard_start_xmit+0x1eb/0x920 net/core/dev.c:3489
     __dev_queue_xmit+0x2985/0x3660 net/core/dev.c:4116
     neigh_hh_output include/net/neighbour.h:533 [inline]
     neigh_output include/net/neighbour.h:547 [inline]
     ip6_finish_output2+0xf7a/0x14f0 net/ipv6/ip6_output.c:126
     __ip6_finish_output net/ipv6/ip6_output.c:191 [inline]
     __ip6_finish_output+0x61e/0xe90 net/ipv6/ip6_output.c:170
     ip6_finish_output+0x32/0x200 net/ipv6/ip6_output.c:201
     NF_HOOK_COND include/linux/netfilter.h:296 [inline]
     ip6_output+0x1e4/0x530 net/ipv6/ip6_output.c:224
     dst_output include/net/dst.h:451 [inline]
     NF_HOOK include/linux/netfilter.h:307 [inline]
     NF_HOOK include/linux/netfilter.h:301 [inline]
     mld_sendpack+0x9a3/0xe40 net/ipv6/mcast.c:1826
     mld_send_cr net/ipv6/mcast.c:2127 [inline]
     mld_ifc_work+0x71c/0xdc0 net/ipv6/mcast.c:2659
     process_one_work+0x9ac/0x1650 kernel/workqueue.c:2307
     worker_thread+0x657/0x1110 kernel/workqueue.c:2454
     kthread+0x2e9/0x3a0 kernel/kthread.c:377
     ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295
     </TASK>
    ----------------
    Code disassembly (best guess):
       0:   48 89 eb                mov    %rbp,%rbx
       3:   c6 45 01 01             movb   $0x1,0x1(%rbp)
       7:   41 bc 00 80 00 00       mov    $0x8000,%r12d
       d:   48 c1 e9 03             shr    $0x3,%rcx
      11:   83 e3 07                and    $0x7,%ebx
      14:   41 be 01 00 00 00       mov    $0x1,%r14d
      1a:   48 b8 00 00 00 00 00    movabs $0xdffffc0000000000,%rax
      21:   fc ff df
      24:   4c 8d 2c 01             lea    (%rcx,%rax,1),%r13
      28:   eb 0c                   jmp    0x36
    * 2a:   f3 90                   pause <-- trapping instruction
      2c:   41 83 ec 01             sub    $0x1,%r12d
      30:   0f 84 72 04 00 00       je     0x4a8
      36:   41 0f b6 45 00          movzbl 0x0(%r13),%eax
      3b:   38 d8                   cmp    %bl,%al
      3d:   7f 08                   jg     0x47
      3f:   84                      .byte 0x84

    Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
    Cc: Cong Wang <xiyou.wangcong@gmail.com>
    Cc: Jiri Pirko <jiri@resnulli.us>
    Reported-by: syzbot <syzkaller@googlegroups.com>
    Link: https://lore.kernel.org/r/20220215235305.3272331-1-eric.dumazet@gmail.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Ivan Vecera <ivecera@redhat.com>
2022-06-06 16:31:55 +02:00
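Note on 1867b6a896 above: the effect of bounding TC_ACT_REPEAT can be shown with a small userspace model. This is a sketch in Python, not the kernel code. The limit of 32 comes from the commit message; run_action() is a hypothetical name, and returning TC_ACT_OK once the budget is exhausted is an assumption of this sketch.

```python
TC_ACT_OK = 0
TC_ACT_REPEAT = 6
REPEAT_LIMIT = 32  # limit stated in the commit message


def run_action(act, packet, limit=REPEAT_LIMIT):
    """Call `act` until it stops asking for a repeat or the budget runs out.

    Returns (verdict, calls). Without the budget, an action that always
    answers TC_ACT_REPEAT would keep the CPU busy forever.
    """
    calls = 0
    while True:
        verdict = act(packet)
        calls += 1
        if verdict != TC_ACT_REPEAT:
            return verdict, calls
        if calls >= limit:
            # Budget exhausted: stop the loop (assumed verdict: TC_ACT_OK).
            return TC_ACT_OK, calls
```

An action that always returns TC_ACT_REPEAT is then invoked at most 32 times in this model, instead of spinning as in the stall report quoted in the commit.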
Ivan Vecera e148a66965 flow_offload: fix suspicious RCU usage when offloading tc action
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2090410

commit 963178a06352a059c688eb36f1f8c2f03212b60b
Author: Baowen Zheng <baowen.zheng@corigine.com>
Date:   Wed Dec 22 12:25:46 2021 +0800

    flow_offload: fix suspicious RCU usage when offloading tc action

    Fix suspicious rcu_dereference_protected() usage when offloading tc action.

    We should hold tcfa_lock to offload tc action in action initiation.

    Without these changes, the following warning will be observed:

    WARNING: suspicious RCU usage
    5.16.0-rc5-net-next-01504-g7d1f236dcffa-dirty #50 Tainted: G          I
    -----------------------------
    include/net/tc_act/tc_tunnel_key.h:33 suspicious rcu_dereference_protected() usage!
    1 lock held by tc/12108:
    CPU: 4 PID: 12108 Comm: tc Tainted: G
    Hardware name: Dell Inc. PowerEdge R740/07WCGN, BIOS 1.6.11 11/20/2018
    Call Trace:
    <TASK>
    dump_stack_lvl+0x49/0x5e
    dump_stack+0x10/0x12
    lockdep_rcu_suspicious+0xed/0xf8
    tcf_tunnel_key_offload_act_setup+0x1de/0x300 [act_tunnel_key]
    tcf_action_offload_add_ex+0xc0/0x1f0
    tcf_action_init+0x26a/0x2f0
    tcf_action_add+0xa9/0x1f0
    tc_ctl_action+0xfb/0x170
    rtnetlink_rcv_msg+0x169/0x510
    ? sched_clock+0x9/0x10
    ? rtnl_newlink+0x70/0x70
    netlink_rcv_skb+0x55/0x100
    rtnetlink_rcv+0x15/0x20
    netlink_unicast+0x1a8/0x270
    netlink_sendmsg+0x245/0x490
    sock_sendmsg+0x65/0x70
    ____sys_sendmsg+0x219/0x260
    ? __import_iovec+0x2c/0x150
    ___sys_sendmsg+0xb7/0x100
    ? __lock_acquire+0x3d5/0x1f40
    ? __this_cpu_preempt_check+0x13/0x20
    ? lock_is_held_type+0xe4/0x140
    ? sched_clock+0x9/0x10
    ? ktime_get_coarse_real_ts64+0xbe/0xd0
    ? __this_cpu_preempt_check+0x13/0x20
    ? lockdep_hardirqs_on+0x7e/0x100
    ? ktime_get_coarse_real_ts64+0xbe/0xd0
    ? trace_hardirqs_on+0x2a/0xf0
    __sys_sendmsg+0x5a/0xa0
    ? syscall_trace_enter.constprop.0+0x1dd/0x220
    __x64_sys_sendmsg+0x1f/0x30
    do_syscall_64+0x3b/0x90
    entry_SYSCALL_64_after_hwframe+0x44/0xae
    RIP: 0033:0x7f4db7bb7a60

    Fixes: 8cbfe939abe9 ("flow_offload: allow user to offload tc action to net device")
    Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
    Signed-off-by: Baowen Zheng <baowen.zheng@corigine.com>
    Signed-off-by: Louis Peens <louis.peens@corigine.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Ivan Vecera <ivecera@redhat.com>
2022-06-06 16:31:27 +02:00
Ivan Vecera 5dc16cd298 flow_offload: validate flags of filter and actions
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2090410

commit c86e0209dc7725c91583e3c0c78c3da6a28daeb4
Author: Baowen Zheng <baowen.zheng@corigine.com>
Date:   Fri Dec 17 19:16:28 2021 +0100

    flow_offload: validate flags of filter and actions

    Add a process to validate the flags of a filter and its actions when
    adding a tc filter.

    We need to prevent adding a filter whose flags conflict with those of
    its actions.

    Signed-off-by: Baowen Zheng <baowen.zheng@corigine.com>
    Signed-off-by: Louis Peens <louis.peens@corigine.com>
    Signed-off-by: Simon Horman <simon.horman@corigine.com>
    Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Ivan Vecera <ivecera@redhat.com>
2022-06-06 16:31:26 +02:00
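Note on 5dc16cd298 above: the validation can be pictured with the following userspace model. This is a sketch in Python under stated assumptions, not the kernel implementation. The bit values of SKIP_HW and SKIP_SW, the function names, and the reading of "conflict" as "the skip_sw/skip_hw bits of filter and action differ" are all assumptions made for illustration.

```python
SKIP_HW = 1 << 0  # hypothetical bit values, for illustration only
SKIP_SW = 1 << 1
OFFLOAD_MASK = SKIP_HW | SKIP_SW


def offload_flags_match(filter_flags, action_flags):
    """True when filter and action agree on both skip_hw and skip_sw."""
    return (filter_flags & OFFLOAD_MASK) == (action_flags & OFFLOAD_MASK)


def validate_filter(filter_flags, action_flags_list):
    """Reject a filter if any of its actions carries conflicting flags."""
    for pos, action_flags in enumerate(action_flags_list):
        if not offload_flags_match(filter_flags, action_flags):
            raise ValueError(
                f"action {pos}: offload flags differ from the filter's flags")
    return True
```

In this model a skip_sw filter combined with a skip_hw action is rejected, while a filter and action that both carry skip_sw are accepted.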
Ivan Vecera c3f41698ec flow_offload: add reoffload process to update hw_count
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2090410

commit 13926d19a11e303f12571df61b7bb64f17cb4561
Author: Baowen Zheng <baowen.zheng@corigine.com>
Date:   Fri Dec 17 19:16:27 2021 +0100

    flow_offload: add reoffload process to update hw_count

    Add reoffload process to update hw_count when driver
    is inserted or removed.

    During the reoffload process, an action is deleted if it has the
    skip_sw flag and is not offloaded to any hardware.

    When reoffloading actions, we still offload the actions
    that are added independent of filters.

    Signed-off-by: Baowen Zheng <baowen.zheng@corigine.com>
    Signed-off-by: Louis Peens <louis.peens@corigine.com>
    Signed-off-by: Simon Horman <simon.horman@corigine.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Ivan Vecera <ivecera@redhat.com>
2022-06-06 16:31:26 +02:00
Ivan Vecera dbb0a6fb39 net: sched: save full flags for tc action
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2090410

commit e8cb5bcf6ed6d42227c453a3a3170105462f69df
Author: Baowen Zheng <baowen.zheng@corigine.com>
Date:   Fri Dec 17 19:16:26 2021 +0100

    net: sched: save full flags for tc action

    Save the full action flags, and return only the user flags when
    reporting flags to user space.

    Saving the full action flags makes it possible to tell whether the
    action was created independently of a classifier.

    This change is made mainly for a later patch that reoffloads tc actions.

    Signed-off-by: Baowen Zheng <baowen.zheng@corigine.com>
    Signed-off-by: Simon Horman <simon.horman@corigine.com>
    Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Ivan Vecera <ivecera@redhat.com>
2022-06-06 16:31:25 +02:00
Ivan Vecera a0e5ff8ebe flow_offload: add process to update action stats from hardware
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2090410

commit c7a66f8d8a946edafb38150480145ab9801e4e52
Author: Baowen Zheng <baowen.zheng@corigine.com>
Date:   Fri Dec 17 19:16:25 2021 +0100

    flow_offload: add process to update action stats from hardware

    When collecting stats for actions, update them using both
    hardware and software counters.

    The stats update process should not run in a preempt_disable context.

    Signed-off-by: Baowen Zheng <baowen.zheng@corigine.com>
    Signed-off-by: Louis Peens <louis.peens@corigine.com>
    Signed-off-by: Simon Horman <simon.horman@corigine.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Ivan Vecera <ivecera@redhat.com>
2022-06-06 16:31:19 +02:00
Ivan Vecera 21e3db17fb flow_offload: add skip_hw and skip_sw to control if offload the action
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2090410

commit 7adc576512110ef32b0424a727ee1d04359fc205
Author: Baowen Zheng <baowen.zheng@corigine.com>
Date:   Fri Dec 17 19:16:23 2021 +0100

    flow_offload: add skip_hw and skip_sw to control if offload the action

    Add skip_hw and skip_sw so that the user can control whether the action
    is offloaded to hardware.

    Also add in_hw_count to indicate to the user whether the action is
    offloaded to any hardware.

    Signed-off-by: Baowen Zheng <baowen.zheng@corigine.com>
    Signed-off-by: Simon Horman <simon.horman@corigine.com>
    Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Ivan Vecera <ivecera@redhat.com>
2022-06-06 16:31:19 +02:00
Ivan Vecera 8bace59d89 flow_offload: allow user to offload tc action to net device
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2090410

commit 8cbfe939abe905280279e84a297b1cb34e0d0ec9
Author: Baowen Zheng <baowen.zheng@corigine.com>
Date:   Fri Dec 17 19:16:22 2021 +0100

    flow_offload: allow user to offload tc action to net device

    Use flow_indr_dev_register/flow_indr_dev_setup_offload to
    offload tc action.

    We need to call tc_cleanup_flow_action to clean up tc action entry since
    in tc_setup_action, some actions may hold dev refcnt, especially the mirror
    action.

    Signed-off-by: Baowen Zheng <baowen.zheng@corigine.com>
    Signed-off-by: Louis Peens <louis.peens@corigine.com>
    Signed-off-by: Simon Horman <simon.horman@corigine.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Ivan Vecera <ivecera@redhat.com>
2022-06-06 16:31:19 +02:00
Ivan Vecera 43c09223c2 net: sched: Remove Qdisc::running sequence counter
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2090410

commit 29cbcd85828372333aa87542c51f2b2b0fd4380c
Author: Ahmed S. Darwish <a.darwish@linutronix.de>
Date:   Sat Oct 16 10:49:10 2021 +0200

    net: sched: Remove Qdisc::running sequence counter

    The Qdisc::running sequence counter has two uses:

      1. Reliably reading qdisc's tc statistics while the qdisc is running
         (a seqcount read/retry loop at gnet_stats_add_basic()).

      2. As a flag, indicating whether the qdisc in question is running
         (without any retry loops).

    For the first usage, the Qdisc::running sequence counter write section,
    qdisc_run_begin() => qdisc_run_end(), covers a much wider area than what
    is actually needed: the raw qdisc's bstats update. A u64_stats sync
    point was thus introduced (in previous commits) inside the bstats
    structure itself. A local u64_stats write section is then started and
    stopped for the bstats updates.

    Use that u64_stats sync point mechanism for the bstats read/retry loop
    at gnet_stats_add_basic().

    For the second qdisc->running usage, a __QDISC_STATE_RUNNING bit flag,
    accessed with atomic bitops, is sufficient. Using a bit flag instead of
    a sequence counter at qdisc_run_begin/end() and qdisc_is_running() leads
    to the SMP barriers implicitly added through raw_read_seqcount() and
    write_seqcount_begin/end() getting removed. All call sites have been
    surveyed though, and no required ordering was identified.

    Now that the qdisc->running sequence counter is no longer used, remove
    it.

    Note, using u64_stats implies no sequence counter protection for 64-bit
    architectures. This can lead to the qdisc tc statistics "packets" vs.
    "bytes" values getting out of sync on rare occasions. The individual
    values will still be valid.
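
The second usage described above (a plain "is it running" flag) can be sketched in user-space C11. This is only an illustration under stated assumptions: the `ex_` names and the bit position are hypothetical, and C11 atomics default to sequentially consistent ordering, which is not the same as the ordering of the kernel's atomic bitops.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical bit position; stands in for __QDISC_STATE_RUNNING. */
#define EX_STATE_RUNNING (1ul << 0)

struct ex_qdisc {
    atomic_ulong state;
};

/* Try to become the runner: returns true only if the bit was clear. */
static bool ex_run_begin(struct ex_qdisc *q)
{
    unsigned long old = atomic_fetch_or(&q->state, EX_STATE_RUNNING);

    return !(old & EX_STATE_RUNNING);
}

/* Clear the running bit; ending a run that never began is a bug. */
static void ex_run_end(struct ex_qdisc *q)
{
    unsigned long old = atomic_fetch_and(&q->state, ~EX_STATE_RUNNING);

    assert(old & EX_STATE_RUNNING);
}

/* Pure flag test: no read/retry loop is involved. */
static bool ex_is_running(struct ex_qdisc *q)
{
    return (atomic_load(&q->state) & EX_STATE_RUNNING) != 0;
}
```

A second caller of `ex_run_begin()` gets `false` until the first caller invokes `ex_run_end()`, which is the only property the flag usage needs.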

    Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
    Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Ivan Vecera <ivecera@redhat.com>
2022-06-06 16:30:52 +02:00
Ivan Vecera 07089a02ec net: sched: Merge Qdisc::bstats and Qdisc::cpu_bstats data types
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2090410

commit 50dc9a8572aa4d7cdc56670228fcde40289ed289
Author: Ahmed S. Darwish <a.darwish@linutronix.de>
Date:   Sat Oct 16 10:49:09 2021 +0200

    net: sched: Merge Qdisc::bstats and Qdisc::cpu_bstats data types

    The only factor differentiating per-CPU bstats data type (struct
    gnet_stats_basic_cpu) from the packed non-per-CPU one (struct
    gnet_stats_basic_packed) was a u64_stats sync point inside the former.
    The two data types are now equivalent: earlier commits added a u64_stats
    sync point to the latter.

    Combine both data types into "struct gnet_stats_basic_sync". This
    eliminates redundancy and simplifies the bstats read/write APIs.

    Use u64_stats_t for bstats "packets" and "bytes" data types. On 64-bit
    architectures, u64_stats sync points do not use sequence counter
    protection.

    Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
    Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Ivan Vecera <ivecera@redhat.com>
2022-06-06 16:30:51 +02:00
Ivan Vecera 8996766220 net: sched: Protect Qdisc::bstats with u64_stats
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2090410

commit 67c9e6270f3013e4d86ec57c4e7f27459f2a0652
Author: Ahmed S. Darwish <a.darwish@linutronix.de>
Date:   Sat Oct 16 10:49:07 2021 +0200

    net: sched: Protect Qdisc::bstats with u64_stats

    The not-per-CPU variant of qdisc tc (traffic control) statistics,
    Qdisc::gnet_stats_basic_packed bstats, is protected with Qdisc::running
    sequence counter.

    This sequence counter is used for reliably protecting bstats reads from
    parallel writes. Meanwhile, the seqcount's write section covers a much
    wider area than bstats update: qdisc_run_begin() => qdisc_run_end().

    That read/write section asymmetry can lead to needless retries of the
    read section. To prepare for removing the Qdisc::running sequence
    counter altogether, introduce a u64_stats sync point inside bstats
    instead.

    Modify _bstats_update() to start/end the bstats u64_stats write
    section.

    For bisectability and finer commit granularity, the bstats read
    section is still protected with a Qdisc::running read/retry loop and
    qdisc_run_begin/end() still starts/ends that seqcount write section.
    Once all call sites are modified to use _bstats_update(), the
    Qdisc::running seqcount will be removed and bstats read/retry loop will
    be modified to utilize the internal u64_stats sync point.

    Note, using u64_stats implies no sequence counter protection for 64-bit
    architectures. This can lead to the statistics "packets" vs. "bytes"
    values getting out of sync on rare occasions. The individual values will
    still be valid.

    [bigeasy: Minor commit message edits, init all gnet_stats_basic_packed.]

    Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
    Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Ivan Vecera <ivecera@redhat.com>
2022-06-06 16:30:50 +02:00
Ivan Vecera f5c788ff73 net_sched: refactor TC action init API
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2090410

commit 695176bfe5dec2051f950bdac0ae0b21e29e6de3
Author: Cong Wang <cong.wang@bytedance.com>
Date:   Thu Jul 29 16:12:14 2021 -0700

    net_sched: refactor TC action init API

    The TC action ->init() API has 10 parameters, which makes it hard
    to read. Some of them are plain booleans and can be replaced
    by flags. The same applies to the internal APIs tcf_action_init()
    and tcf_exts_validate().

    This patch converts them to flags and folds them into
    the upper 16 bits of "flags", whose lower 16 bits are still
    reserved for user-space. More specifically, the following
    kernel flags are introduced:

    TCA_ACT_FLAGS_POLICE replaces 'name' in a few contexts, to
    distinguish whether it is compatible with policer.

    TCA_ACT_FLAGS_BIND replaces 'bind', to indicate whether
    this action is bound to a filter.

    TCA_ACT_FLAGS_REPLACE replaces 'ovr' in most contexts, and
    means we are replacing an existing action.

    TCA_ACT_FLAGS_NO_RTNL replaces 'rtnl_held' but has the
    opposite meaning, because we still hold RTNL in most
    cases.

    The only user-space flag TCA_ACT_FLAGS_NO_PERCPU_STATS is
    untouched and still stored as before.
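
The flag layout described above can be sketched as follows. This is a stand-alone illustration, not the kernel header: all `EX_` names are hypothetical stand-ins for the TCA_ACT_FLAGS_* macros, and the exact bit positions within the upper 16 bits are an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define EX_USER_BITS 16
#define EX_USER_MASK 0xffffu

/* Lower 16 bits: user-space flags (only one exists in this sketch). */
#define EX_FLAG_NO_PERCPU_STATS (1u << 0)

/* Upper 16 bits: kernel-internal flags; bit order is assumed. */
#define EX_FLAG_POLICE  (1u << (EX_USER_BITS + 0))
#define EX_FLAG_BIND    (1u << (EX_USER_BITS + 1))
#define EX_FLAG_REPLACE (1u << (EX_USER_BITS + 2))
#define EX_FLAG_NO_RTNL (1u << (EX_USER_BITS + 3))

/* Fold the former boolean parameters into a single flags word. */
static uint32_t ex_make_flags(uint32_t user_flags, bool police, bool bind,
                              bool ovr, bool rtnl_held)
{
    uint32_t flags = user_flags;

    /* User-space must not be able to set kernel-internal bits. */
    assert((user_flags & ~EX_USER_MASK) == 0);

    if (police)
        flags |= EX_FLAG_POLICE;
    if (bind)
        flags |= EX_FLAG_BIND;
    if (ovr)
        flags |= EX_FLAG_REPLACE;
    if (!rtnl_held) /* inverted meaning: the flag marks the rare case */
        flags |= EX_FLAG_NO_RTNL;
    return flags;
}

/* Helpers that read back individual pieces of the flags word. */
static bool ex_act_bind(uint32_t flags)
{
    return (flags & EX_FLAG_BIND) != 0;
}

static uint32_t ex_user_flags(uint32_t flags)
{
    return flags & EX_USER_MASK;
}
```

Because the user-space half is masked separately, the value stored for the action stays the same as before the refactor, while callers pass one word instead of several booleans.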

    I have tested this patch with tdc and I do not see any
    failure related to this patch.

    Tested-by: Vlad Buslov <vladbu@nvidia.com>
    Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
    Cc: Jiri Pirko <jiri@resnulli.us>
    Signed-off-by: Cong Wang <cong.wang@bytedance.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Ivan Vecera <ivecera@redhat.com>
2022-06-06 16:29:44 +02:00
Ivan Vecera 10aacaa742 net/sched: Remove unnecessary if statement
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2090410

commit f79a3bcb1a50d919147b9f22855d355ed8e03031
Author: Yajun Deng <yajun.deng@linux.dev>
Date:   Thu Jul 15 20:24:24 2021 +0800

    net/sched: Remove unnecessary if statement

    The 'if (err' statement is already dealt with in rtnetlink_send()
    and rtnl_unicast(), so remove the unnecessary if statement.

    v2: use the raw name rtnetlink_send().

    Signed-off-by: Yajun Deng <yajun.deng@linux.dev>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Ivan Vecera <ivecera@redhat.com>
2022-06-06 16:29:43 +02:00
Yang Yingliang 55d96f72e8 net: sched: fix error return code in tcf_del_walker()
When nla_put_u32() fails, 'ret' could be 0; tcf_del_walker() should
return an error code in that case.

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-17 11:36:18 -07:00
Vlad Buslov b3650bf76a net: sched: fix err handler in tcf_action_init()
With recent changes that separated action module load from action
initialization, the tcf_action_init() error handling code was modified
to manually release the loaded modules if loading/initialization of any
further action in the same batch failed. Consider the case where all
modules were loaded successfully and some of the actions were initialized
before one of them failed in its init handler. In this case, for all
previous actions the module will be released twice by the error handler:
the first time by the loop that manually calls module_put() for all ops,
and the second time by the action destroy code that puts the module after
destroying the action.

Reproduction:

$ sudo tc actions add action simple sdata \"2\" index 2
$ sudo tc actions add action simple sdata \"1\" index 1 \
                      action simple sdata \"2\" index 2
RTNETLINK answers: File exists
We have an error talking to the kernel
$ sudo tc actions ls action simple
total acts 1

        action order 0: Simple <"2">
         index 2 ref 1 bind 0
$ sudo tc actions flush action simple
$ sudo tc actions ls action simple
$ sudo tc actions add action simple sdata \"2\" index 2
Error: Failed to load TC action module.
We have an error talking to the kernel
$ lsmod | grep simple
act_simple             20480  -1

Fix the issue by modifying module reference counting handling in action
initialization code:

- Get module reference in tcf_idr_create() and put it in tcf_idr_release()
instead of taking over the reference held by the caller.

- Modify users of tcf_action_init_1() to always release the module
reference which they obtain before calling init function instead of
assuming that created action takes over the reference.

- Finally, modify tcf_action_init_1() to not release the module reference
when overwriting existing action as this is no longer necessary since both
upper and lower layers obtain and manage their own module references
independently.
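
The ownership rule in the list above can be sketched with a user-space model. This is a simplified illustration, not the kernel code: the `ex_` names are hypothetical, a plain integer stands in for the module reference count, and module load and action init are collapsed into one loop.

```c
#include <assert.h>
#include <stdlib.h>

struct ex_module { int refcnt; };
struct ex_action { struct ex_module *owner; };

static void ex_module_get(struct ex_module *m) { m->refcnt++; }

static void ex_module_put(struct ex_module *m)
{
    assert(m->refcnt > 0); /* a double put would trip here */
    m->refcnt--;
}

/* Lower layer: the action takes its own module reference... */
static struct ex_action *ex_action_create(struct ex_module *m)
{
    struct ex_action *a = malloc(sizeof(*a));

    if (!a)
        return NULL;
    ex_module_get(m);
    a->owner = m;
    return a;
}

/* ...and drops exactly that reference when it is released. */
static void ex_action_release(struct ex_action *a)
{
    ex_module_put(a->owner);
    free(a);
}

/*
 * Upper layer: initialize n actions; the one at index fail_at fails
 * (pass -1 for no failure). Returns 0 on success, -1 on error.
 */
static int ex_init_batch(struct ex_module *m, int n, int fail_at,
                         struct ex_action **out)
{
    int i, created = 0;

    for (i = 0; i < n; i++) {
        ex_module_get(m); /* caller's own reference */
        out[i] = (i == fail_at) ? NULL : ex_action_create(m);
        ex_module_put(m); /* always dropped by the caller */
        if (!out[i])
            break;
        created++;
    }
    if (created == n)
        return 0;

    /* Roll back: each action puts only the reference it took. */
    while (created--) {
        ex_action_release(out[created]);
        out[created] = NULL;
    }
    return -1;
}
```

With each layer getting and putting only its own reference, a failed batch leaves the count where it started instead of dropping below zero, which is what the `-1` use count in the `lsmod` output above reflects.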

Fixes: d349f99768 ("net_sched: fix RTNL deadlock again caused by request_module()")
Suggested-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-08 13:47:33 -07:00
Vlad Buslov 87c750e8c3 net: sched: fix action overwrite reference counting
Action init code increments the reference counter when it changes an action.
This is the desired behavior for the cls API, which needs to obtain an action
reference for every classifier that points to the action. However, the act
API just needs to change the action and releases the reference before
returning. This sequence breaks when the requested action doesn't exist: the
act API init code then creates a new action with the specified index, but the
action is still released before returning and is deleted (unless it was
referenced concurrently by the cls API).

Reproduction:

$ sudo tc actions ls action gact
$ sudo tc actions change action gact drop index 1
$ sudo tc actions ls action gact

Extend tcf_action_init() to accept an 'init_res' array and initialize it with
the action->ops->init() result. In tcf_action_add(), remove pointers to
created actions from the actions array before passing it to
tcf_action_put_many().
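
A minimal user-space sketch of that filtering step, assuming a per-slot result code that distinguishes "created" from "already existed". The `ex_` names and enum values are hypothetical and a plain integer models the action reference count.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical result codes recorded per action by the init step. */
enum { EX_RES_EXISTING = 0, EX_RES_CREATED = 1 };

struct ex_action { int refcnt; };

/* Drop one reference on every non-NULL entry. */
static void ex_put_many(struct ex_action *acts[], int n)
{
    for (int i = 0; i < n; i++) {
        if (!acts[i])
            continue;
        assert(acts[i]->refcnt > 0);
        acts[i]->refcnt--;
    }
}

/*
 * Finish an "add/change" request: newly created actions keep their
 * initial reference, so they are removed from the array before the
 * bulk put; pre-existing ones lose the extra reference init took.
 */
static void ex_add_finish(struct ex_action *acts[], const int init_res[],
                          int n)
{
    for (int i = 0; i < n; i++)
        if (init_res[i] == EX_RES_CREATED)
            acts[i] = NULL;
    ex_put_many(acts, n);
}
```

In this model a freshly created action ends with a reference count of 1 and survives, while an existing action that was bumped to 2 by the init step returns to 1.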

Fixes: cae422f379 ("net: sched: use reference counting action init")
Reported-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-08 13:47:33 -07:00
Vlad Buslov 4ba86128ba Revert "net: sched: bump refcount for new action in ACT replace mode"
This reverts commit 6855e8213e.

The following commit in the series fixes the issue without introducing a
regression in the error rollback of tcf_action_destroy().

Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-08 13:47:33 -07:00