Commit Graph

22 Commits

Author SHA1 Message Date
Adrian Moreno e6c995286e net: psample: fix flag being set in wrong skb
JIRA: https://issues.redhat.com/browse/RHEL-31876
Upstream-Status: net-next.git

commit 8341eee81c794db0d8dd503c2b0ea2f55eba7334
Author: Adrian Moreno <amorenoz@redhat.com>
Date:   Wed Jul 10 19:10:04 2024 +0200

    net: psample: fix flag being set in wrong skb

    A typo causes the PSAMPLE_ATTR_SAMPLE_RATE netlink flag to be added to the
    wrong sk_buff.

    Fix the error and make the input sk_buff pointer "const" so that it
    doesn't happen again.

Acked-by: Eelco Chaudron <echaudro@redhat.com>
Fixes: 7b1b2b60c63f ("net: psample: allow using rate as probability")
Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Antoine Tenart <atenart@kernel.org>
Link: https://patch.msgid.link/20240710171004.2164034-1-amorenoz@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
2024-07-12 10:44:49 +02:00
Adrian Moreno 6cddd5d51d net: psample: allow using rate as probability
JIRA: https://issues.redhat.com/browse/RHEL-31876
Upstream-Status: net-next.git

commit 7b1b2b60c63f070e0dfbe072ccaae13168b38d01
Author: Adrian Moreno <amorenoz@redhat.com>
Date:   Thu Jul 4 10:56:55 2024 +0200

    net: psample: allow using rate as probability

    Although not explicitly documented in the psample module itself, the
    definition of PSAMPLE_ATTR_SAMPLE_RATE seems inherited from act_sample.

    Quoting tc-sample(8):
    "RATE of 100 will lead to an average of one sampled packet out of every
    100 observed."

    With these semantics, the rates that we can express with an unsigned
    32-bit number are very unevenly distributed and concentrated towards
    "sampling few packets".
    For example, we can express a probability of 2.32E-8% but we
    cannot express anything between 100% and 50%.

    For sampling applications that are capable of sampling a decent
    amount of packets, these sampling rate semantics are not very useful.

    Add a new flag to the uAPI that indicates that the sampling rate is
    expressed as a scaled probability, that is:
    - 0 is 0% probability, no packets get sampled.
    - U32_MAX is 100% probability, all packets get sampled.

Reviewed-by: Aaron Conole <aconole@redhat.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
Link: https://patch.msgid.link/20240704085710.353845-5-amorenoz@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
2024-07-09 16:31:43 +02:00
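The two rate semantics described in the commit above can be compared with a small user-space sketch. The helper names are hypothetical and are not part of the psample uAPI; the sketch only assumes what the commit message states: a classic RATE of N means one packet in N, while a scaled rate maps 0 to 0% and U32_MAX to 100%.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers for illustration; not part of the psample uAPI. */

/* Classic act_sample semantics: a RATE of N samples one packet in N. */
static double classic_rate_to_probability(uint32_t rate)
{
	assert(rate != 0);	/* a rate of 0 has no 1-in-N meaning */
	return 1.0 / (double)rate;
}

/* Scaled semantics: 0 is 0% and UINT32_MAX is 100%. */
static double scaled_rate_to_probability(uint32_t rate)
{
	return (double)rate / (double)UINT32_MAX;
}

/* Encode a probability in [0, 1] as a scaled 32-bit rate (rounded). */
static uint32_t probability_to_scaled_rate(double p)
{
	assert(p >= 0.0 && p <= 1.0);
	return (uint32_t)(p * (double)UINT32_MAX + 0.5);
}
```

With the classic encoding the two largest probabilities are 1/1 and 1/2, which is why nothing between 100% and 50% can be expressed; the scaled encoding spreads the 32-bit range evenly over [0, 1].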
Adrian Moreno 745c698843 net: psample: skip packet copy if no listeners
JIRA: https://issues.redhat.com/browse/RHEL-31876
Upstream-Status: net-next.git

commit c35d86a23029f1186e3c7a65df7c38b762fb0434
Author: Adrian Moreno <amorenoz@redhat.com>
Date:   Thu Jul 4 10:56:54 2024 +0200

    net: psample: skip packet copy if no listeners

    If nobody is listening on the multicast group, generating the sample,
    which involves copying packet data, seems completely unnecessary.

    Return fast in this case.

Reviewed-by: Aaron Conole <aconole@redhat.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
Link: https://patch.msgid.link/20240704085710.353845-4-amorenoz@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
2024-07-09 16:31:42 +02:00
Adrian Moreno a8f64879ba net: psample: add user cookie
JIRA: https://issues.redhat.com/browse/RHEL-31876
Upstream-Status: net-next.git

commit 093b0f366567aa3fed85c316f832607069202b23
Author: Adrian Moreno <amorenoz@redhat.com>
Date:   Thu Jul 4 10:56:52 2024 +0200

    net: psample: add user cookie

    Add a user cookie to the sample metadata so that sample emitters can
    provide more contextual information to samples.

    If present, send the user cookie in a new attribute:
    PSAMPLE_ATTR_USER_COOKIE.

Reviewed-by: Michal Kubiak <michal.kubiak@intel.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
Link: https://patch.msgid.link/20240704085710.353845-2-amorenoz@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
2024-07-09 16:30:45 +02:00
Ivan Vecera a4a12f7632 ip_tunnel: convert __be16 tunnel flags to bitmaps
JIRA: https://issues.redhat.com/browse/RHEL-40130

Conflicts:
- hunk for non-existing net/ipv4/fou_bpf.c skipped
- conflict in ip_gre.c resolved in the same way as upstream merge
  commit cf1ca1f66d30 ("Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net") did
- simple context conflict ip_tunnel.c due to missing commit
  c4794d22251b9 ("ipv4: tunnels: use DEV_STATS_INC()")
- simple context conflict in ip6_gre.c and ip6_tunnel.c due to missing
  commit 2fad1ba354d4a ("ipv6: tunnels: use DEV_STATS_INC()")
- simple conflict in nft_tunnel.c due to missing ffb3d9a30cc67 ("netfilter:
  nf_tables: use correct integer types")

commit 5832c4a77d6931cebf9ba737129ae8f14b66ee1d
Author: Alexander Lobakin <aleksander.lobakin@intel.com>
Date:   Wed Mar 27 16:23:53 2024 +0100

    ip_tunnel: convert __be16 tunnel flags to bitmaps

    Historically, tunnel flags like TUNNEL_CSUM or TUNNEL_ERSPAN_OPT
    have been defined as __be16. Now all of those 16 bits are occupied
    and there's no more free space for new flags.
    It can't be simply switched to a bigger container with no
    adjustments to the values, since it's an explicit Endian storage,
    and on LE systems (__be16)0x0001 equals
    (__be64)0x0001000000000000.
    We could probably define new 64-bit flags depending on the
    Endianness, i.e. (__be64)0x0001 on BE and (__be64)0x00010000... on
    LE, but that would introduce an Endianness dependency and spawn a
    ton of Sparse warnings. To mitigate them, all of those places which
    were adjusted with this change would be touched anyway, so why not
    define stuff properly if there's no choice.

    Define IP_TUNNEL_*_BIT counterparts as a bit number instead of the
    value already coded and a fistful of <16 <-> bitmap> converters and
    helpers. The two flags which have a different bit position are
    SIT_ISATAP_BIT and VTI_ISVTI_BIT, as they were defined not as
    __cpu_to_be16(), but as (__force __be16), i.e. had different
    positions on LE and BE. Now they both have strongly defined places.
    Change all __be16 fields which were used to store those flags, to
    IP_TUNNEL_DECLARE_FLAGS() -> DECLARE_BITMAP(__IP_TUNNEL_FLAG_NUM) ->
    unsigned long[1] for now, and replace all TUNNEL_* occurrences with
    their bitmap counterparts. Use the converters in the places which talk
    to the userspace, hardware (NFP) or other hosts (GRE header). The rest
    must explicitly use the new flags only. This must be done at once,
    otherwise there will be too many conversions throughout the code in
    the intermediate commits.
    Finally, disable the old __be16 flags for use in the kernel code
    (except for the two 'irregular' flags mentioned above), to prevent
    any accidental (mis)use of them. For the userspace, nothing is
    changed, only additions were made.

    Most noticeable bloat-o-meter difference (.text):

    vmlinux:        307/-1 (306)
    gre.ko:         62/0 (62)
    ip_gre.ko:      941/-217 (724)  [*]
    ip_tunnel.ko:   390/-900 (-510) [**]
    ip_vti.ko:      138/0 (138)
    ip6_gre.ko:     534/-18 (516)   [*]
    ip6_tunnel.ko:  118/-10 (108)

    [*] gre_flags_to_tnl_flags() grew, but still is inlined
    [**] ip_tunnel_find() got uninlined, hence such decrease

    The average code size increase in non-extreme case is 100-200 bytes
    per module, mostly due to sizeof(long) > sizeof(__be16), as
    %__IP_TUNNEL_FLAG_NUM is less than %BITS_PER_LONG and the compilers
    are able to expand the majority of bitmap_*() calls here into direct
    operations on scalars.

    Reviewed-by: Simon Horman <horms@kernel.org>
    Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Ivan Vecera <ivecera@redhat.com>
2024-06-12 14:49:18 +02:00
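The idea of defining flags as bit numbers stored in a bitmap, with a converter for the places that still need a 16-bit value, might look roughly like the user-space sketch below. All `demo_*` and `DEMO_*` names are hypothetical stand-ins, not the kernel's IP_TUNNEL_* definitions, and the converter produces a plain host-order mask rather than a __be16.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag bit numbers, loosely modelled on IP_TUNNEL_*_BIT. */
enum demo_tunnel_bit {
	DEMO_TUNNEL_CSUM_BIT,
	DEMO_TUNNEL_KEY_BIT,
	DEMO_TUNNEL_SEQ_BIT,
	DEMO_TUNNEL_FLAG_NUM,	/* number of defined flags */
};

#define DEMO_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)
#define DEMO_BITS_TO_LONGS(n) \
	(((n) + DEMO_BITS_PER_LONG - 1) / DEMO_BITS_PER_LONG)

/* The flags live in an array of longs, so adding flags never runs out of room. */
struct demo_tunnel_flags {
	unsigned long bits[DEMO_BITS_TO_LONGS(DEMO_TUNNEL_FLAG_NUM)];
};

static void demo_flag_set(struct demo_tunnel_flags *f, unsigned int bit)
{
	assert(bit < DEMO_TUNNEL_FLAG_NUM);
	f->bits[bit / DEMO_BITS_PER_LONG] |= 1UL << (bit % DEMO_BITS_PER_LONG);
}

static bool demo_flag_test(const struct demo_tunnel_flags *f, unsigned int bit)
{
	assert(bit < DEMO_TUNNEL_FLAG_NUM);
	return f->bits[bit / DEMO_BITS_PER_LONG] & (1UL << (bit % DEMO_BITS_PER_LONG));
}

/* Convert to a host-order 16-bit mask, e.g. at a boundary with older users. */
static uint16_t demo_flags_to_u16(const struct demo_tunnel_flags *f)
{
	uint16_t out = 0;

	for (unsigned int bit = 0; bit < DEMO_TUNNEL_FLAG_NUM && bit < 16; bit++)
		if (demo_flag_test(f, bit))
			out |= (uint16_t)(1u << bit);
	return out;
}
```

Because a flag is identified by its bit number rather than by an endian-specific value, its position is the same on LE and BE hosts, and only the converters need to know about the legacy layout.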
Petr Oros 91f18bfb6c genetlink: Use internal flags for multicast groups
JIRA: https://issues.redhat.com/browse/RHEL-30145

Upstream commit(s):
commit cd4d7263d58ab98fd4dee876776e4da6c328faa3
Author: Ido Schimmel <idosch@nvidia.com>
Date:   Wed Dec 20 17:43:58 2023 +0200

    genetlink: Use internal flags for multicast groups

    As explained in commit e03781879a0d ("drop_monitor: Require
    'CAP_SYS_ADMIN' when joining "events" group"), the "flags" field in the
    multicast group structure reuses uAPI flags despite the field not being
    exposed to user space. This makes it impossible to extend its use
    without adding new uAPI flags, which is inappropriate for internal
    kernel checks.

    Solve this by adding internal flags (i.e., "GENL_MCAST_*") and convert
    the existing users to use them instead of the uAPI flags.

    Tested using the reproducers in commit 44ec98ea5ea9 ("psample: Require
    'CAP_NET_ADMIN' when joining "packets" group") and commit e03781879a0d
    ("drop_monitor: Require 'CAP_SYS_ADMIN' when joining "events" group").

    No functional changes intended.

    Signed-off-by: Ido Schimmel <idosch@nvidia.com>
    Reviewed-by: Mat Martineau <martineau@kernel.org>
    Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Petr Oros <poros@redhat.com>
2024-04-26 17:16:10 +02:00
Davide Caratti 85e274855f psample: Require 'CAP_NET_ADMIN' when joining "packets" group
JIRA: https://issues.redhat.com/browse/RHEL-21582
Upstream Status: net.git commit 44ec98ea5ea9cfecd31a5c4cc124703cb5442832

commit 44ec98ea5ea9cfecd31a5c4cc124703cb5442832
Author: Ido Schimmel <idosch@nvidia.com>
Date:   Wed Dec 6 23:31:01 2023 +0200

    psample: Require 'CAP_NET_ADMIN' when joining "packets" group

    The "psample" generic netlink family notifies sampled packets over the
    "packets" multicast group. This is problematic since by default generic
    netlink allows non-root users to listen to these notifications.

    Fix by marking the group with the 'GENL_UNS_ADMIN_PERM' flag. This will
    prevent non-root users or root without the 'CAP_NET_ADMIN' capability
    (in the user namespace owning the network namespace) from joining the
    group.

    Tested using [1].

    Before:

     # capsh -- -c ./psample_repo
     # capsh --drop=cap_net_admin -- -c ./psample_repo

    After:

     # capsh -- -c ./psample_repo
     # capsh --drop=cap_net_admin -- -c ./psample_repo
     Failed to join "packets" multicast group

    [1]
     $ cat psample.c
     #include <stdio.h>
     #include <netlink/genl/ctrl.h>
     #include <netlink/genl/genl.h>
     #include <netlink/socket.h>

     int join_grp(struct nl_sock *sk, const char *grp_name)
     {
            int grp, err;

            grp = genl_ctrl_resolve_grp(sk, "psample", grp_name);
            if (grp < 0) {
                    fprintf(stderr, "Failed to resolve \"%s\" multicast group\n",
                            grp_name);
                    return grp;
            }

            err = nl_socket_add_memberships(sk, grp, NFNLGRP_NONE);
            if (err) {
                    fprintf(stderr, "Failed to join \"%s\" multicast group\n",
                            grp_name);
                    return err;
            }

            return 0;
     }

     int main(int argc, char **argv)
     {
            struct nl_sock *sk;
            int err;

            sk = nl_socket_alloc();
            if (!sk) {
                    fprintf(stderr, "Failed to allocate socket\n");
                    return -1;
            }

            err = genl_connect(sk);
            if (err) {
                    fprintf(stderr, "Failed to connect socket\n");
                    return err;
            }

            err = join_grp(sk, "config");
            if (err)
                    return err;

            err = join_grp(sk, "packets");
            if (err)
                    return err;

            return 0;
     }
     $ gcc -I/usr/include/libnl3 -lnl-3 -lnl-genl-3 -o psample_repo psample.c

    Fixes: 6ae0a62861 ("net: Introduce psample, a new genetlink channel for packet sampling")
    Reported-by: "The UK's National Cyber Security Centre (NCSC)" <security@ncsc.gov.uk>
    Signed-off-by: Ido Schimmel <idosch@nvidia.com>
    Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
    Reviewed-by: Jiri Pirko <jiri@nvidia.com>
    Link: https://lore.kernel.org/r/20231206213102.1824398-2-idosch@nvidia.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Davide Caratti <dcaratti@redhat.com>
2024-01-15 15:27:11 +01:00
Ivan Vecera 6fb59586eb genetlink: start to validate reserved header bytes
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2175249

Conflicts:
* kernel/taskstats.c
  context conflict due to missing edc73c7261ca ("kernel: make taskstats
  available from all net namespaces")
* fs/ksmbd/transport_ipc.c
* net/ipv6/ioam6.c
  hunks skipped as the files are not present in RHEL kernel

commit 9c5d03d362519f36cd551aec596388f895c93d2d
Author: Jakub Kicinski <kuba@kernel.org>
Date:   Wed Aug 24 17:18:30 2022 -0700

    genetlink: start to validate reserved header bytes

    We had historically not checked that genlmsghdr.reserved
    is 0 on input which prevents us from using those precious
    bytes in the future.

    One use case would be to extend the cmd field, which is
    currently just 8 bits wide and 256 is not a lot of commands
    for some core families.

    To make sure that new families do the right thing by default
    put the onus of opting out of validation on existing families.

    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Acked-by: Paul Moore <paul@paul-moore.com> (NetLabel)
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Ivan Vecera <ivecera@redhat.com>
2023-03-06 15:42:45 +01:00
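A check of the kind described in the commit above could be sketched as follows. The struct is a local mirror of the generic netlink header layout (8-bit cmd, 8-bit version, 16-bit reserved), and both the function name and the opt-out parameter are hypothetical; this is not the kernel's actual implementation.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Local mirror of the generic netlink header layout, for illustration only. */
struct demo_genlmsghdr {
	uint8_t  cmd;
	uint8_t  version;
	uint16_t reserved;
};

/*
 * Hypothetical check: reject input whose reserved bytes are not zero,
 * unless the (existing) family has explicitly opted out of validation.
 */
static int demo_validate_header(const struct demo_genlmsghdr *hdr,
				int family_opted_out)
{
	assert(hdr);
	if (!family_opted_out && hdr->reserved != 0)
		return -EINVAL;
	return 0;
}
```

Making validation the default and the opt-out explicit means a new family gets the strict behaviour without having to ask for it.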
Ido Schimmel 07e1a5809b psample: Add additional metadata attributes
Extend psample to report the following attributes when available:

* Output traffic class as a 16-bit value
* Output traffic class occupancy in bytes as a 64-bit value
* End-to-end latency of the packet in nanoseconds resolution
* Software timestamp in nanoseconds resolution (always available)
* Packet's protocol. Needed for packet dissection in user space (always
  available)

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-14 15:00:43 -07:00
Ido Schimmel a03e99d39f psample: Encapsulate packet metadata in a struct
Currently, callers of psample_sample_packet() pass three metadata
attributes: Ingress port, egress port and truncated size. Subsequent
patches are going to add more attributes (e.g., egress queue occupancy),
which also need an indication whether they are valid or not.

Encapsulate packet metadata in a struct in order to keep the number of
arguments reasonable.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-14 15:00:43 -07:00
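As a rough sketch of the approach in the commit above, the metadata can be grouped in one struct, with a validity bit next to any attribute for which zero is a legitimate value. The struct and function below are hypothetical and simplified; they do not reproduce the kernel's actual metadata struct.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified metadata struct for illustration. */
struct demo_sample_metadata {
	uint32_t trunc_size;
	int in_ifindex;			/* 0 means "not applicable" */
	int out_ifindex;		/* 0 means "not applicable" */
	uint64_t out_tc_occ;		/* egress queue occupancy in bytes */
	unsigned int out_tc_occ_valid:1; /* set when out_tc_occ is meaningful */
};

/* Hypothetical consumer: count the optional attributes that would be sent. */
static unsigned int demo_count_attrs(const struct demo_sample_metadata *md)
{
	unsigned int n = 0;

	assert(md);
	if (md->in_ifindex)
		n++;
	if (md->out_ifindex)
		n++;
	if (md->out_tc_occ_valid)	/* an occupancy of 0 is still reported */
		n++;
	return n;
}
```

Callers then pass a single pointer instead of a growing argument list, and new attributes only require new struct members.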
Chris Mi a93dcaada2 net: psample: Fix netlink skb length with tunnel info
Currently, the psample netlink skb is allocated with a size that does
not account for the nested 'PSAMPLE_ATTR_TUNNEL' attribute and the
padding required for the 64-bit attribute 'PSAMPLE_TUNNEL_KEY_ATTR_ID'.
This can result in failure to add attributes to the netlink skb due
to insufficient tail room. The following error message is printed to
the kernel log: "Could not create psample log message".

Fix this by adjusting the allocation size to take into account the
nested attribute and the padding.

Fixes: d8bed686ab ("net: psample: Add tunnel support")
CC: Yotam Gigi <yotam.gi@gmail.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Chris Mi <cmi@nvidia.com>
Link: https://lore.kernel.org/r/20210225075145.184314-1-cmi@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-25 09:49:46 -08:00
Jakub Kicinski 66a9b9287d genetlink: move to smaller ops wherever possible
Bulk of the genetlink users can use smaller ops, move them.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-10-02 19:11:11 -07:00
Randy Dunlap 07a7f30819 net: psample: fix build error when CONFIG_INET is not enabled
Fix psample build error when CONFIG_INET is not set/enabled by
bracketing the tunnel code in #ifdef CONFIG_INET / #endif.

../net/psample/psample.c: In function ‘__psample_ip_tun_to_nlattr’:
../net/psample/psample.c:216:25: error: implicit declaration of function ‘ip_tunnel_info_opts’; did you mean ‘ip_tunnel_info_opts_set’? [-Werror=implicit-function-declaration]

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Yotam Gigi <yotam.gi@gmail.com>
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-23 16:36:05 -07:00
Chris Mi d8bed686ab net: psample: Add tunnel support
Currently, psample can only send the packet bits after decapsulation.
The tunnel information is lost. Add the tunnel support.

If the sampled packet has no tunnel info, the behavior is the same as
before. If it has, add a nested metadata field named PSAMPLE_ATTR_TUNNEL
and include the tunnel subfields if applicable.

Increase the metadata length for sampled packet with the tunnel info.
If new subfields of tunnel info should be included, update the metadata
length accordingly.

Signed-off-by: Chris Mi <chrism@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-21 17:04:07 -07:00
Nikolay Aleksandrov 7eb9d7675c net: psample: fix skb_over_panic
We need to calculate the skb size correctly otherwise we risk triggering
skb_over_panic[1]. The issue is that data_len is added to the skb in a
nl attribute, but we don't account for its header size (nlattr 4 bytes)
and alignment. We account for it when calculating the total size in
the > PSAMPLE_MAX_PACKET_SIZE comparison correctly, but not when
allocating after that. The fix is simple - use nla_total_size() for
data_len when allocating.

To reproduce:
 $ tc qdisc add dev eth1 clsact
 $ tc filter add dev eth1 egress matchall action sample rate 1 group 1 trunc 129
 $ mausezahn eth1 -b bcast -a rand -c 1 -p 129
 < skb_over_panic BUG(), tail is 4 bytes past skb->end >

[1] Trace:
 [   50.459526][ T3480] skbuff: skb_over_panic: text:(____ptrval____) len:196 put:136 head:(____ptrval____) data:(____ptrval____) tail:0xc4 end:0xc0 dev:<NULL>
 [   50.474339][ T3480] ------------[ cut here ]------------
 [   50.481132][ T3480] kernel BUG at net/core/skbuff.c:108!
 [   50.486059][ T3480] invalid opcode: 0000 [#1] PREEMPT SMP
 [   50.489463][ T3480] CPU: 3 PID: 3480 Comm: mausezahn Not tainted 5.4.0-rc7 #108
 [   50.492844][ T3480] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-2.fc30 04/01/2014
 [   50.496551][ T3480] RIP: 0010:skb_panic+0x79/0x7b
 [   50.498261][ T3480] Code: bc 00 00 00 41 57 4c 89 e6 48 c7 c7 90 29 9a 83 4c 8b 8b c0 00 00 00 50 8b 83 b8 00 00 00 50 ff b3 c8 00 00 00 e8 ae ef c0 fe <0f> 0b e8 2f df c8 fe 48 8b 55 08 44 89 f6 4c 89 e7 48 c7 c1 a0 22
 [   50.504111][ T3480] RSP: 0018:ffffc90000447a10 EFLAGS: 00010282
 [   50.505835][ T3480] RAX: 0000000000000087 RBX: ffff888039317d00 RCX: 0000000000000000
 [   50.507900][ T3480] RDX: 0000000000000000 RSI: ffffffff812716e1 RDI: 00000000ffffffff
 [   50.509820][ T3480] RBP: ffffc90000447a60 R08: 0000000000000001 R09: 0000000000000000
 [   50.511735][ T3480] R10: ffffffff81d4f940 R11: 0000000000000000 R12: ffffffff834a22b0
 [   50.513494][ T3480] R13: ffffffff82c10433 R14: 0000000000000088 R15: ffffffff838a8084
 [   50.515222][ T3480] FS:  00007f3536462700(0000) GS:ffff88803eac0000(0000) knlGS:0000000000000000
 [   50.517135][ T3480] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 [   50.518583][ T3480] CR2: 0000000000442008 CR3: 000000003b222000 CR4: 00000000000006e0
 [   50.520723][ T3480] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
 [   50.522709][ T3480] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
 [   50.524450][ T3480] Call Trace:
 [   50.525214][ T3480]  skb_put.cold+0x1b/0x1b
 [   50.526171][ T3480]  psample_sample_packet+0x1d3/0x340
 [   50.527307][ T3480]  tcf_sample_act+0x178/0x250
 [   50.528339][ T3480]  tcf_action_exec+0xb1/0x190
 [   50.529354][ T3480]  mall_classify+0x67/0x90
 [   50.530332][ T3480]  tcf_classify+0x72/0x160
 [   50.531286][ T3480]  __dev_queue_xmit+0x3db/0xd50
 [   50.532327][ T3480]  dev_queue_xmit+0x18/0x20
 [   50.533299][ T3480]  packet_sendmsg+0xee7/0x2090
 [   50.534331][ T3480]  sock_sendmsg+0x54/0x70
 [   50.535271][ T3480]  __sys_sendto+0x148/0x1f0
 [   50.536252][ T3480]  ? tomoyo_file_ioctl+0x23/0x30
 [   50.537334][ T3480]  ? ksys_ioctl+0x5e/0xb0
 [   50.540068][ T3480]  __x64_sys_sendto+0x2a/0x30
 [   50.542810][ T3480]  do_syscall_64+0x73/0x1f0
 [   50.545383][ T3480]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
 [   50.548477][ T3480] RIP: 0033:0x7f35357d6fb3
 [   50.551020][ T3480] Code: 48 8b 0d 18 90 20 00 f7 d8 64 89 01 48 83 c8 ff c3 66 0f 1f 44 00 00 83 3d f9 d3 20 00 00 75 13 49 89 ca b8 2c 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 34 c3 48 83 ec 08 e8 eb f6 ff ff 48 89 04 24
 [   50.558547][ T3480] RSP: 002b:00007ffe0c7212c8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
 [   50.561870][ T3480] RAX: ffffffffffffffda RBX: 0000000001dac010 RCX: 00007f35357d6fb3
 [   50.565142][ T3480] RDX: 0000000000000082 RSI: 0000000001dac2a2 RDI: 0000000000000003
 [   50.568469][ T3480] RBP: 00007ffe0c7212f0 R08: 00007ffe0c7212d0 R09: 0000000000000014
 [   50.571731][ T3480] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000082
 [   50.574961][ T3480] R13: 0000000001dac2a2 R14: 0000000000000001 R15: 0000000000000003
 [   50.578170][ T3480] Modules linked in: sch_ingress virtio_net
 [   50.580976][ T3480] ---[ end trace 61a515626a595af6 ]---

CC: Yotam Gigi <yotamg@mellanox.com>
CC: Jiri Pirko <jiri@mellanox.com>
CC: Jamal Hadi Salim <jhs@mojatatu.com>
CC: Simon Horman <simon.horman@netronome.com>
CC: Roopa Prabhu <roopa@cumulusnetworks.com>
Fixes: 6ae0a62861 ("net: Introduce psample, a new genetlink channel for packet sampling")
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-26 14:40:13 -08:00
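The size arithmetic behind the fix above can be worked through in user space. The macros below are local copies of the netlink attribute sizing rules (a 4-byte attribute header and 4-byte alignment) under `DEMO_` names, written out here as an illustration rather than taken from kernel headers.

```c
#include <assert.h>

/* Local copies of the netlink attribute sizing rules, for illustration. */
#define DEMO_NLA_ALIGNTO 4
#define DEMO_NLA_ALIGN(len) \
	(((len) + DEMO_NLA_ALIGNTO - 1) & ~(DEMO_NLA_ALIGNTO - 1))
#define DEMO_NLA_HDRLEN 4

/* Space an attribute with the given payload occupies: header + payload + padding. */
static int demo_nla_total_size(int payload)
{
	assert(payload >= 0);
	return DEMO_NLA_ALIGN(DEMO_NLA_HDRLEN + payload);
}
```

For the 129-byte payload used in the reproducer, the attribute occupies 136 bytes (4 bytes of header, 129 of data, 3 of padding), which matches the `put:136` value in the trace. Allocating only for the raw data length leaves the buffer short by the header and padding.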
Vlad Buslov 4a5da47d5c net: sched: take reference to psample group in flow_action infra
With the recent patch set that removed the rtnl lock dependency from the cls
hardware offload API, the rtnl lock is only taken when reading action data and
can be released after action-specific data is parsed into an intermediate
representation. However, the sample action's psample group is passed by
pointer without first obtaining a reference to it, which makes it possible to
concurrently overwrite the action and deallocate the object pointed to by the
psample_group pointer after the rtnl lock is released but before the driver
has finished using the pointer.

To prevent this race condition, obtain a reference to the psample group while
it is used by the flow_action infra. Extend the psample API with the function
psample_group_take(), which increments the psample group reference counter.
Extend struct tc_action_ops with new get_psample_group() API. Implement the
API for action sample using psample_group_take() and already existing
psample_group_put() as a destructor. Use it in tc_setup_flow_action() to
take reference to psample group pointed to by entry->sample.psample_group
and release it in tc_cleanup_flow_action().

Disable bh when taking psample_groups_lock. The lock is now taken while
holding action tcf_lock that is used by data path and requires bh to be
disabled, so doing the same for psample_groups_lock is necessary to
preserve SOFTIRQ-irq-safety.

Fixes: 918190f50e ("net: sched: flower: don't take rtnl lock for cls hw offloads API")
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-16 09:18:03 +02:00
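The take/put pattern that the commit above relies on can be sketched in user space as follows. The `demo_group_*` names are hypothetical, the counter is a plain integer with no locking, and the put function returns whether it freed the object purely so the behaviour can be observed; none of this mirrors the kernel's exact signatures.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical reference-counted group object (single-threaded sketch). */
struct demo_group {
	unsigned int group_num;
	unsigned int refcount;
};

/* Create a group holding one reference for the creator. */
static struct demo_group *demo_group_create(unsigned int group_num)
{
	struct demo_group *g = calloc(1, sizeof(*g));

	if (!g)
		return NULL;
	g->group_num = group_num;
	g->refcount = 1;
	return g;
}

/* Take an extra reference before handing the pointer to another user. */
static void demo_group_take(struct demo_group *g)
{
	assert(g && g->refcount > 0);
	g->refcount++;
}

/* Drop a reference; free the object when the last one goes. Returns 1 if freed. */
static int demo_group_put(struct demo_group *g)
{
	assert(g && g->refcount > 0);
	if (--g->refcount == 0) {
		free(g);
		return 1;
	}
	return 0;
}
```

As long as the second user holds its own reference, an overwrite that drops the original reference cannot free the object out from under it.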
Vlad Buslov dbf47a2a09 net: sched: act_sample: fix psample group handling on overwrite
Action sample doesn't properly handle the psample_group pointer in the
overwrite case. The following issues need to be fixed:

- In the tcf_sample_init() function, RCU_INIT_POINTER() is used to set
  s->psample_group, even though we are neither setting the pointer to NULL
  nor preventing concurrent readers from accessing the pointer in some way.
  Use rcu_swap_protected() instead to safely reset the pointer.

- The old value of s->psample_group is not released or deallocated in any
  way, which results in a resource leak. Use psample_group_put() on the
  non-NULL value obtained with rcu_swap_protected().

- The function psample_group_put(), which releases the reference to the
  struct psample_group pointed to by the rcu-pointer s->psample_group,
  doesn't respect the rcu grace period when deallocating it. Extend struct
  psample_group with an rcu head and use kfree_rcu when freeing it.

Fixes: 5c5670fae4 ("net/sched: Introduce sample tc action")
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-28 15:53:51 -07:00
Thomas Gleixner d2912cb15b treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 500
Based on 2 normalized pattern(s):

  this program is free software you can redistribute it and or modify
  it under the terms of the gnu general public license version 2 as
  published by the free software foundation

  this program is free software you can redistribute it and or modify
  it under the terms of the gnu general public license version 2 as
  published by the free software foundation #

extracted by the scancode license scanner the SPDX license identifier

  GPL-2.0-only

has been chosen to replace the boilerplate/reference in 4122 file(s).

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Enrico Weigelt <info@metux.net>
Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: Allison Randal <allison@lohutok.net>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190604081206.933168790@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-06-19 17:09:55 +02:00
Johannes Berg ef6243acb4 genetlink: optionally validate strictly/dumps
Add options to strictly validate messages and dump messages. Sometimes
validating dump messages non-strictly may be required, so add an option
for that as well.

Since none of this can really be applied to existing commands,
set the options everywhere using the following spatch:

    @@
    identifier ops;
    expression X;
    @@
    struct genl_ops ops[] = {
    ...,
     {
            .cmd = X,
    +       .validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP,
            ...
     },
    ...
    };

For new commands one should just not copy the .validate 'opt-out'
flags and thus get strict validation.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-27 17:07:22 -04:00
Yotam Gigi f1fd20c361 MAINTAINERS: Update Yotam's E-mail
For the time being I will be reachable at my private e-mail address. Update
both the MAINTAINERS file and the individual modules' MODULE_AUTHOR
directives with the new address.

Signed-off-by: Yotam Gigi <yotam.gi@gmail.com>
Signed-off-by: Yuval Mintz <yuvalm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-01 12:19:03 +09:00
Johannes Berg 4df864c1d9 networking: make skb_put & friends return void pointers
It seems like a historic accident that these return unsigned char *,
and in many places that means casts are required, more often than not.

Make these functions (skb_put, __skb_put and pskb_put) return void *
and remove all the casts across the tree, adding a (u8 *) cast only
where the unsigned char pointer was used directly, all done with the
following spatch:

    @@
    expression SKB, LEN;
    typedef u8;
    identifier fn = { skb_put, __skb_put };
    @@
    - *(fn(SKB, LEN))
    + *(u8 *)fn(SKB, LEN)

    @@
    expression E, SKB, LEN;
    identifier fn = { skb_put, __skb_put };
    type T;
    @@
    - E = ((T *)(fn(SKB, LEN)))
    + E = fn(SKB, LEN)

which actually doesn't cover pskb_put since there are only three
users overall.

A handful of stragglers were converted manually, notably a macro in
drivers/isdn/i4l/isdn_bsdcomp.c and, oddly enough, one of the many
instances in net/bluetooth/hci_sock.c. In the former file, I also
had to fix one whitespace problem spatch introduced.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-16 11:48:39 -04:00
Yotam Gigi 6ae0a62861 net: Introduce psample, a new genetlink channel for packet sampling
Add a general way for kernel modules to sample packets, without being tied
to any specific subsystem. This netlink channel can be used by tc,
iptables, etc. and allows packet sampling in the kernel to be standardized.

For every sampled packet, the psample module adds the following metadata
fields:

PSAMPLE_ATTR_IIFINDEX - the packet's input ifindex, if applicable

PSAMPLE_ATTR_OIFINDEX - the packet's output ifindex, if applicable

PSAMPLE_ATTR_ORIGSIZE - the packet's original size, in case it has been
   truncated during sampling

PSAMPLE_ATTR_SAMPLE_GROUP - the packet's sample group, which is set by the
   user who initiated the sampling. This field allows the user to
   differentiate between several samplers working simultaneously and
   filter the packets relevant to them

PSAMPLE_ATTR_GROUP_SEQ - sequence counter of last sent packet. The
   sequence is kept for each group

PSAMPLE_ATTR_SAMPLE_RATE - the sampling rate used for sampling the packets

PSAMPLE_ATTR_DATA - the actual packet bits

The sampled packets are sent to the PSAMPLE_NL_MCGRP_SAMPLE multicast
group. In addition, add the GET_GROUPS netlink command which allows the
user to see the current sample groups, their refcount and sequence number.
This command currently supports only netlink dump mode.

Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-24 13:44:28 -05:00