Commit Graph

450 Commits

Author SHA1 Message Date
Ian Kent 92d69b838d fs: port xattr to mnt_idmap
JIRA: https://issues.redhat.com/browse/RHEL-33888
Status: Linus

Conflicts: The cifs source has been moved in CentOS Stream so manually
	apply rejected hunk to fs/smb/client/xattr.c.
	Dropped hunks for ntfs3 because the source is not present in
	the CentOS Stream source tree.
	CentOS Stream commit 98ba731fc7 ("ovl: Move xattr support
	to new xattrs.c file") moved ovl_own_xattr_set(), manually apply
	changes.
	CentOS Stream commit 67e2fcb2f3 ("evm: don't copy up
	'security.evm' xattr") is present causing hunk #1 against
	include/linux/evm.h to be rejected, manually apply.
	Upstream commit 5d1ef2ce13a90 ("ima: Introduce
	ima_get_current_hash_algo()") is not present in CentOS Stream
	which causes fuzz 1 for hunk #1 against include/linux/ima.h.
	There's a reject of hunk #1 for include/linux/lsm_hooks.h but
	I can't see any reason for it, manually applied the hunk.
	CentOS Stream does not have upstream commit ce5bb5a86e5eb
	("ima: Return int in the functions to measure a buffer") which
	results in a reject of hunk #2 against security/integrity/ima/ima.h
	and hunks #8 and #11 against security/integrity/ima/ima_main.c, so
	manually apply hunks. There also appears to be a whitespace
	mismatch causing hunk #7 to report fuzz 2 on application.
	CentOS Stream does not have upstream commit c7423dbdbc9ec
	("ima: Handle -ESTALE returned by ima_filter_rule_match()")
	which results in a reject of hunk #3 against
	security/integrity/ima/ima_policy.c, so manually apply hunk.

commit 39f60c1ccee72caa0104145b5dbf5d37cce1ea39
Author: Christian Brauner <brauner@kernel.org>
Date:   Fri Jan 13 12:49:23 2023 +0100

    fs: port xattr to mnt_idmap

    Convert to struct mnt_idmap.

    Last cycle we merged the necessary infrastructure in
    256c8aed2b42 ("fs: introduce dedicated idmap type for mounts").
    This is just the conversion to struct mnt_idmap.

    Currently we still pass around the plain namespace that was attached to a
    mount. This is in general pretty convenient but it makes it easy to
    conflate namespaces that are relevant on the filesystem with namespaces
    that are relevant on the mount level. Especially for non-vfs developers
    without detailed knowledge in this area this can be a potential source for
    bugs.

    Once the conversion to struct mnt_idmap is done all helpers down to the
    really low-level helpers will take a struct mnt_idmap argument instead of
    two namespace arguments. This way it becomes impossible to conflate the two
    eliminating the possibility of any bugs. All of the vfs and all filesystems
    only operate on struct mnt_idmap.

    Acked-by: Dave Chinner <dchinner@redhat.com>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>

Signed-off-by: Ian Kent <ikent@redhat.com>
2024-10-16 10:45:21 +08:00
Ricardo Robaina 5c2d3e02a1 audit,io_uring: io_uring openat triggers audit reference count underflow
JIRA: https://issues.redhat.com/browse/RHEL-35421

This patch is a backport of the following upstream commit:
commit 03adc61edad49e1bbecfb53f7ea5d78f398fe368
Author: Dan Clash <daclash@linux.microsoft.com>
Date:   Thu Oct 12 14:55:18 2023 -0700

    audit,io_uring: io_uring openat triggers audit reference count underflow

    An io_uring openat operation can update an audit reference count
    from multiple threads resulting in the call trace below.

    A call to io_uring_submit() with a single openat op with a flag of
    IOSQE_ASYNC results in the following reference count updates.

    The first part of the system call performs two increments that do not race.

    do_syscall_64()
      __do_sys_io_uring_enter()
        io_submit_sqes()
          io_openat_prep()
            __io_openat_prep()
              getname()
                getname_flags()       /* update 1 (increment) */
                  __audit_getname()   /* update 2 (increment) */

    The openat op is queued to an io_uring worker thread which starts the
    opportunity for a race.  The system call exit performs one decrement.

    do_syscall_64()
      syscall_exit_to_user_mode()
        syscall_exit_to_user_mode_prepare()
          __audit_syscall_exit()
            audit_reset_context()
               putname()              /* update 3 (decrement) */

    The io_uring worker thread performs one increment and two decrements.
    These updates can race with the system call decrement.

    io_wqe_worker()
      io_worker_handle_work()
        io_wq_submit_work()
          io_issue_sqe()
            io_openat()
              io_openat2()
                do_filp_open()
                  path_openat()
                    __audit_inode()   /* update 4 (increment) */
                putname()             /* update 5 (decrement) */
            __audit_uring_exit()
              audit_reset_context()
                putname()             /* update 6 (decrement) */

    The fix is to change the refcnt member of struct audit_names
    from int to atomic_t.

    kernel BUG at fs/namei.c:262!
    Call Trace:
    ...
     ? putname+0x68/0x70
     audit_reset_context.part.0.constprop.0+0xe1/0x300
     __audit_uring_exit+0xda/0x1c0
     io_issue_sqe+0x1f3/0x450
     ? lock_timer_base+0x3b/0xd0
     io_wq_submit_work+0x8d/0x2b0
     ? __try_to_del_timer_sync+0x67/0xa0
     io_worker_handle_work+0x17c/0x2b0
     io_wqe_worker+0x10a/0x350

    Cc: stable@vger.kernel.org
    Link: https://lore.kernel.org/lkml/MW2PR2101MB1033FFF044A258F84AEAA584F1C9A@MW2PR2101MB1033.namprd21.prod.outlook.com/
    Fixes: 5bd2182d58e9 ("audit,io_uring,io-wq: add some basic audit support to io_uring")
    Signed-off-by: Dan Clash <daclash@linux.microsoft.com>
    Link: https://lore.kernel.org/r/20231012215518.GA4048@linuxonhyperv3.guj3yctzbm1etfxqx2vob5hsef.xx.internal.cloudapp.net
    Reviewed-by: Jens Axboe <axboe@kernel.dk>
    Signed-off-by: Christian Brauner <brauner@kernel.org>

Signed-off-by: Ricardo Robaina <rrobaina@redhat.com>
2024-07-04 14:53:02 -03:00
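
Note on 5c2d3e02a1: the int-to-atomic_t change described above can be illustrated with a small userspace sketch. It uses C11 atomics in place of the kernel's atomic_t, and struct name_ref, name_get() and name_put() are hypothetical names invented for this note, not kernel symbols. The point is only that an atomic fetch-and-subtract lets exactly one of the racing callers observe the final reference going away.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a refcounted name object; not the kernel struct. */
struct name_ref {
	atomic_int refcnt;	/* a plain int here would allow lost updates */
};

/* Take a reference, as the increments in the call chains above would. */
static void name_get(struct name_ref *n)
{
	atomic_fetch_add(&n->refcnt, 1);
}

/*
 * Drop a reference. Returns true only for the caller that released the
 * last one, so a single thread ends up responsible for freeing.
 */
static bool name_put(struct name_ref *n)
{
	int old = atomic_fetch_sub(&n->refcnt, 1);

	assert(old > 0);	/* stands in for the kernel's underflow check */
	return old == 1;
}
```

With a plain int, two threads running the decrement at once can both read the same old value, which is the kind of underflow the call trace reports.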
Ricardo Robaina 8d34a7ab00 netfilter: nf_tables: Audit log rule reset
JIRA: https://issues.redhat.com/browse/RHEL-9127

This patch is a backport of the following upstream commit:
commit ea078ae9108e25fc881c84369f7c03931d22e555
Author: Phil Sutter <phil@nwl.cc>
Date:   Tue Aug 29 19:51:58 2023 +0200

    netfilter: nf_tables: Audit log rule reset

    Resetting rules' stateful data happens outside of the transaction logic,
    so 'get' and 'dump' handlers have to emit audit log entries themselves.

    Fixes: 8daa8fde3fc3f ("netfilter: nf_tables: Introduce NFT_MSG_GETRULE_RESET")
    Signed-off-by: Phil Sutter <phil@nwl.cc>
    Reviewed-by: Richard Guy Briggs <rgb@redhat.com>
    Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

Signed-off-by: Ricardo Robaina <rrobaina@redhat.com>
2023-12-08 13:39:15 -03:00
Ricardo Robaina 4052bb3cd0 netfilter: nf_tables: Audit log setelem reset
JIRA: https://issues.redhat.com/browse/RHEL-9127
Conflicts: Minor context diff due to unrelated upstream commit 00c320f9b755
           ("netfilter: nf_tables: make validation state per table")

This patch is a backport of the following upstream commit:
commit 7e9be1124dbe7888907e82cab20164578e3f9ab7
Author: Phil Sutter <phil@nwl.cc>
Date:   Tue Aug 29 19:51:57 2023 +0200

    netfilter: nf_tables: Audit log setelem reset

    Since set element reset is not integrated into nf_tables' transaction
    logic, an explicit log call is needed, similar to NFT_MSG_GETOBJ_RESET
    handling.

    For the sake of simplicity, catchall element reset will always generate
    a dedicated log entry. This relieves nf_tables_dump_set() from having to
    adjust the logged element count depending on whether a catchall element
    was found or not.

    Fixes: 079cd633219d7 ("netfilter: nf_tables: Introduce NFT_MSG_GETSETELEM_RESET")
    Signed-off-by: Phil Sutter <phil@nwl.cc>
    Reviewed-by: Richard Guy Briggs <rgb@redhat.com>
    Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

Signed-off-by: Ricardo Robaina <rrobaina@redhat.com>
2023-12-08 13:37:24 -03:00
Ricardo Robaina 46ae962536 audit: cleanup function braces and assignment-in-if-condition
JIRA: https://issues.redhat.com/browse/RHEL-9127

This patch is a backport of the following upstream commit:
commit 22cde1012f6a6509656f976cbe3aa5f4c5d0f1a3
Author: Atul Kumar Pant <atulpant.linux@gmail.com>
Date:   Wed Aug 16 02:16:44 2023 +0530

    audit: cleanup function braces and assignment-in-if-condition

    The patch fixes the following checkpatch.pl issues:
    ERROR: open brace '{' following function definitions go on the next line
    ERROR: do not use assignment in if condition

    Signed-off-by: Atul Kumar Pant <atulpant.linux@gmail.com>
    [PM: subject line tweaks]
    Signed-off-by: Paul Moore <paul@paul-moore.com>

Signed-off-by: Ricardo Robaina <rrobaina@redhat.com>
2023-12-08 13:36:18 -03:00
Ricardo Robaina 28a5078e5d audit: add space before parenthesis and around '=', "==", and '<'
JIRA: https://issues.redhat.com/browse/RHEL-9127

This patch is a backport of the following upstream commit:
commit 62acadda115a94bffd1f6b36acbb67e3f04811be
Author: Atul Kumar Pant <atulpant.linux@gmail.com>
Date:   Wed Aug 16 02:15:53 2023 +0530

    audit: add space before parenthesis and around '=', "==", and '<'

    Fixes the following checkpatch.pl issues:
    ERROR: space required before the open parenthesis '('
    ERROR: spaces required around that '='
    ERROR: spaces required around that '<'
    ERROR: spaces required around that '=='

    Signed-off-by: Atul Kumar Pant <atulpant.linux@gmail.com>
    [PM: subject line tweaks]
    Signed-off-by: Paul Moore <paul@paul-moore.com>

Signed-off-by: Ricardo Robaina <rrobaina@redhat.com>
2023-12-08 13:36:14 -03:00
Ricardo Robaina 798e8fd182 audit: fix possible soft lockup in __audit_inode_child()
JIRA: https://issues.redhat.com/browse/RHEL-9127

This patch is a backport of the following upstream commit:
commit b59bc6e37237e37eadf50cd5de369e913f524463
Author: Gaosheng Cui <cuigaosheng1@huawei.com>
Date:   Tue Aug 8 20:14:35 2023 +0800

    audit: fix possible soft lockup in __audit_inode_child()

    Tracefs or debugfs may cause hundreds to thousands of PATH records;
    too many PATH records may cause a soft lockup.

    For example:
      1. CONFIG_KASAN=y && CONFIG_PREEMPTION=n
      2. auditctl -a exit,always -S open -k key
      3. sysctl -w kernel.watchdog_thresh=5
      4. mkdir /sys/kernel/debug/tracing/instances/test

    There may be a soft lockup as follows:
      watchdog: BUG: soft lockup - CPU#45 stuck for 7s! [mkdir:15498]
      Kernel panic - not syncing: softlockup: hung tasks
      Call trace:
       dump_backtrace+0x0/0x30c
       show_stack+0x20/0x30
       dump_stack+0x11c/0x174
       panic+0x27c/0x494
       watchdog_timer_fn+0x2bc/0x390
       __run_hrtimer+0x148/0x4fc
       __hrtimer_run_queues+0x154/0x210
       hrtimer_interrupt+0x2c4/0x760
       arch_timer_handler_phys+0x48/0x60
       handle_percpu_devid_irq+0xe0/0x340
       __handle_domain_irq+0xbc/0x130
       gic_handle_irq+0x78/0x460
       el1_irq+0xb8/0x140
       __audit_inode_child+0x240/0x7bc
       tracefs_create_file+0x1b8/0x2a0
       trace_create_file+0x18/0x50
       event_create_dir+0x204/0x30c
       __trace_add_new_event+0xac/0x100
       event_trace_add_tracer+0xa0/0x130
       trace_array_create_dir+0x60/0x140
       trace_array_create+0x1e0/0x370
       instance_mkdir+0x90/0xd0
       tracefs_syscall_mkdir+0x68/0xa0
       vfs_mkdir+0x21c/0x34c
       do_mkdirat+0x1b4/0x1d4
       __arm64_sys_mkdirat+0x4c/0x60
       el0_svc_common.constprop.0+0xa8/0x240
       do_el0_svc+0x8c/0xc0
       el0_svc+0x20/0x30
       el0_sync_handler+0xb0/0xb4
       el0_sync+0x160/0x180

    Therefore, we add cond_resched() to __audit_inode_child() to fix it.

    Fixes: 5195d8e217 ("audit: dynamically allocate audit_names when not enough space is in the names array")
    Signed-off-by: Gaosheng Cui <cuigaosheng1@huawei.com>
    Signed-off-by: Paul Moore <paul@paul-moore.com>

Signed-off-by: Ricardo Robaina <rrobaina@redhat.com>
2023-12-08 13:36:08 -03:00
Jan Stancek 278f2334e8 Merge: fanotify: Allow user space to pass back additional audit info
MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/2138

The Fanotify API can be used for access control by requesting permission
event notification. The user space tooling that uses it may have a
complicated policy that inherently contains additional context for the
decision. If this information were available in the audit trail, policy
writers could close the loop on debugging policy. Also, if this additional
information were available, it would enable the creation of tools that
can suggest changes to the policy similar to how audit2allow can help
refine labeled security.

This patchset defines a new flag (FAN_INFO) and new extensions that
define additional information which are appended after the response
structure returned from user space on a permission event.  The appended
information is organized with headers containing a type and size that
can be delegated to interested subsystems.  One new information type is
defined to audit the triggering rule number.

A newer kernel will work with an older userspace and an older kernel
will behave as expected and reject a newer userspace, leaving it up to
the newer userspace to test appropriately and adapt as necessary.  This
is done by providing a fully-formed FAN_INFO extension but setting the
fd to FAN_NOFD.  On a capable kernel, it will succeed but issue no audit
record, whereas on an older kernel it will fail.

The audit function was updated to log the additional information in the
AUDIT_FANOTIFY record. The following are examples of the new record
format:
  type=FANOTIFY msg=audit(1600385147.372:590): resp=2 fan_type=1 fan_info=3137 subj_trust=3 obj_trust=5
  type=FANOTIFY msg=audit(1659730979.839:284): resp=1 fan_type=0 fan_info=0 subj_trust=2 obj_trust=2

Upstream Status: 'fsnotify_for_v6.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2008229
Link: https://lore.kernel.org/all/cover.1675373475.git.rgb@redhat.com
Signed-off-by: Richard Guy Briggs <rgb@redhat.com>

aa96f591d5fe (Richard Guy Briggs)
   fanotify,audit: Allow audit to use the full permission event response

94fae45ae359 (Richard Guy Briggs)
   fanotify: define struct members to hold response decision context

3611930a76b1 (Richard Guy Briggs)
   fanotify: Ensure consistent variable type for response

 fs/notify/fanotify/fanotify.c      |  8 +++-
 fs/notify/fanotify/fanotify.h      |  6 ++-
 fs/notify/fanotify/fanotify_user.c | 88 ++++++++++++++++++++++++++++----------
 include/linux/audit.h              |  9 ++--
 include/linux/fanotify.h           |  5 +++
 include/uapi/linux/fanotify.h      | 30 ++++++++++++-
 kernel/auditsc.c                   | 18 ++++++--
 7 files changed, 131 insertions(+), 33 deletions(-)

Approved-by: Ondrej Mosnáček <omosnacek@gmail.com>
Approved-by: Ricardo Robaina <rrobaina@redhat.com>

Signed-off-by: Jan Stancek <jstancek@redhat.com>
2023-06-22 13:25:14 +02:00
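
Note on 278f2334e8: the layout described above (a response structure followed by records that each start with a type and size header) can be sketched in userspace as follows. The sk_ structures and SK_ constants are local mirrors written for this note. Their field names and numeric values are assumptions and should be checked against include/uapi/linux/fanotify.h before being relied on.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed values; local names so nothing here claims to be the uapi header. */
#define SK_FAN_ALLOW		0x01
#define SK_FAN_INFO		0x20
#define SK_INFO_AUDIT_RULE	1

struct sk_response {
	int32_t  fd;
	uint32_t response;
};

struct sk_info_header {
	uint8_t  type;	/* which subsystem the record is for */
	uint8_t  pad;
	uint16_t len;	/* size of the whole record, header included */
};

struct sk_info_audit_rule {
	struct sk_info_header hdr;
	uint32_t rule_number;
	uint32_t subj_trust;
	uint32_t obj_trust;
};

/*
 * Pack a response followed by one audit-rule record into buf.
 * Returns the number of bytes to write, or 0 if buf is too small.
 */
static size_t sk_build_response(unsigned char *buf, size_t cap, int32_t fd,
				uint32_t rule, uint32_t subj, uint32_t obj)
{
	struct sk_response resp = {
		.fd = fd,
		.response = SK_FAN_ALLOW | SK_FAN_INFO,
	};
	struct sk_info_audit_rule info = {
		.hdr = {
			.type = SK_INFO_AUDIT_RULE,
			.pad = 0,
			.len = sizeof(struct sk_info_audit_rule),
		},
		.rule_number = rule,
		.subj_trust = subj,
		.obj_trust = obj,
	};

	if (cap < sizeof(resp) + sizeof(info))
		return 0;
	memcpy(buf, &resp, sizeof(resp));
	memcpy(buf + sizeof(resp), &info, sizeof(info));
	return sizeof(resp) + sizeof(info);
}
```

A real listener would write the returned number of bytes to the fanotify file descriptor. The capability probe described above would pass FAN_NOFD as the fd argument.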
Jeff Moyer 3ba2380b9a audit, io_uring, io-wq: Fix memory leak in io_sq_thread() and io_wqe_worker()
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2068237

commit f482aa98652795846cc55da98ebe331eb74f3d0b
Author: Peilin Ye <peilin.ye@bytedance.com>
Date:   Wed Aug 3 15:23:43 2022 -0700

    audit, io_uring, io-wq: Fix memory leak in io_sq_thread() and io_wqe_worker()
    
    Currently @audit_context is allocated twice for io_uring workers:
    
      1. copy_process() calls audit_alloc();
      2. io_sq_thread() or io_wqe_worker() calls audit_alloc_kernel() (which
         is effectively audit_alloc()) and overwrites @audit_context,
         causing:
    
      BUG: memory leak
      unreferenced object 0xffff888144547400 (size 1024):
    <...>
        hex dump (first 32 bytes):
          00 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00  ................
          00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
        backtrace:
          [<ffffffff8135cfc3>] audit_alloc+0x133/0x210
          [<ffffffff81239e63>] copy_process+0xcd3/0x2340
          [<ffffffff8123b5f3>] create_io_thread+0x63/0x90
          [<ffffffff81686604>] create_io_worker+0xb4/0x230
          [<ffffffff81686f68>] io_wqe_enqueue+0x248/0x3b0
          [<ffffffff8167663a>] io_queue_iowq+0xba/0x200
          [<ffffffff816768b3>] io_queue_async+0x113/0x180
          [<ffffffff816840df>] io_req_task_submit+0x18f/0x1a0
          [<ffffffff816841cd>] io_apoll_task_func+0xdd/0x120
          [<ffffffff8167d49f>] tctx_task_work+0x11f/0x570
          [<ffffffff81272c4e>] task_work_run+0x7e/0xc0
          [<ffffffff8125a688>] get_signal+0xc18/0xf10
          [<ffffffff8111645b>] arch_do_signal_or_restart+0x2b/0x730
          [<ffffffff812ea44e>] exit_to_user_mode_prepare+0x5e/0x180
          [<ffffffff844ae1b2>] syscall_exit_to_user_mode+0x12/0x20
          [<ffffffff844a7e80>] do_syscall_64+0x40/0x80
    
    Then,
    
      3. io_sq_thread() or io_wqe_worker() frees @audit_context using
         audit_free();
      4. do_exit() eventually calls audit_free() again, which is okay
         because audit_free() does a NULL check.
    
    As suggested by Paul Moore, fix it by deleting audit_alloc_kernel() and
    redundant audit_free() calls.
    
    Fixes: 5bd2182d58e9 ("audit,io_uring,io-wq: add some basic audit support to io_uring")
    Suggested-by: Paul Moore <paul@paul-moore.com>
    Cc: stable@vger.kernel.org
    Signed-off-by: Peilin Ye <peilin.ye@bytedance.com>
    Acked-by: Paul Moore <paul@paul-moore.com>
    Link: https://lore.kernel.org/r/20220803222343.31673-1-yepeilin.cs@gmail.com
    Signed-off-by: Jens Axboe <axboe@kernel.dk>

Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
2023-04-29 08:37:02 -04:00
Richard Guy Briggs 76b9fe1fd4 fanotify,audit: Allow audit to use the full permission event response
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2008229
Upstream Status: 'fsnotify_for_v6.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs

commit 032bffd494e3924cc8b854b696ef9b5b7396b883
Author: Richard Guy Briggs <rgb@redhat.com>
Date:   Fri Feb 3 16:35:16 2023 -0500

    fanotify,audit: Allow audit to use the full permission event response

    This patch passes the full response so that the audit function can use all
    of it. The audit function was updated to log the additional information in
    the AUDIT_FANOTIFY record.

    Currently the only type of fanotify info that is defined is an audit
    rule number, but convert it to hex encoding to future-proof the field.
    Hex encoding suggested by Paul Moore <paul@paul-moore.com>.

    The {subj,obj}_trust values are {0,1,2}, corresponding to no, yes, unknown.

    Sample records:
      type=FANOTIFY msg=audit(1600385147.372:590): resp=2 fan_type=1 fan_info=3137 subj_trust=3 obj_trust=5
      type=FANOTIFY msg=audit(1659730979.839:284): resp=1 fan_type=0 fan_info=0 subj_trust=2 obj_trust=2

    Suggested-by: Steve Grubb <sgrubb@redhat.com>
    Link: https://lore.kernel.org/r/3075502.aeNJFYEL58@x2
    Tested-by: Steve Grubb <sgrubb@redhat.com>
    Acked-by: Steve Grubb <sgrubb@redhat.com>
    Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
    Signed-off-by: Jan Kara <jack@suse.cz>
    Message-Id: <bcb6d552e517b8751ece153e516d8b073459069c.1675373475.git.rgb@redhat.com>

Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
2023-03-07 00:21:16 -05:00
Richard Guy Briggs c1a7b4ebb9 fanotify: Ensure consistent variable type for response
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2008229
Upstream Status: 'fsnotify_for_v6.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs

commit 2e0a547164b1384a87fd3500a01297222b0971b0
Author: Richard Guy Briggs <rgb@redhat.com>
Date:   Fri Feb 3 16:35:14 2023 -0500

    fanotify: Ensure consistent variable type for response

    The user space API for the response variable is __u32. This patch makes
    sure that the whole path through the kernel uses u32 so that there is
    no sign extension or truncation of the user space response.

    Suggested-by: Steve Grubb <sgrubb@redhat.com>
    Link: https://lore.kernel.org/r/12617626.uLZWGnKmhe@x2
    Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
    Acked-by: Paul Moore <paul@paul-moore.com>
    Tested-by: Steve Grubb <sgrubb@redhat.com>
    Acked-by: Steve Grubb <sgrubb@redhat.com>
    Signed-off-by: Jan Kara <jack@suse.cz>
    Message-Id: <3778cb0b3501bc4e686ba7770b20eb9ab0506cf4.1675373475.git.rgb@redhat.com>

Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
2023-03-07 00:21:16 -05:00
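
Note on c1a7b4ebb9: the sign-extension hazard that the commit message mentions can be reproduced in a few lines of userspace C. The helper names are hypothetical. The conversion from uint32_t to int32_t for values with the top bit set is implementation-defined before C23, so the result described in the comments assumes a typical two's-complement compiler such as GCC or Clang.

```c
#include <assert.h>
#include <stdint.h>

/* Path that stores the response in a signed 32-bit variable on the way. */
static uint64_t widen_via_int(uint32_t response)
{
	int32_t as_signed = (int32_t)response;	/* goes negative if bit 31 is set */

	return (uint64_t)(int64_t)as_signed;	/* widening copies the sign bit */
}

/* Path that keeps the response unsigned the whole way. */
static uint64_t widen_via_u32(uint32_t response)
{
	return (uint64_t)response;		/* widening fills with zeros */
}
```

For a response of 0x80000001 the unsigned path keeps the value intact, while the signed path yields 0xFFFFFFFF80000001 on such compilers. Small values are unaffected, which is why a mismatch like this can go unnoticed.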
Richard Guy Briggs 9bde453463 audit: unify audit_filter_{uring(), inode_name(), syscall()}
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2123857
Upstream Status: next-20221020

commit 50979953c0c41e929e5f955800da68e1bb24c7ab
Author: Ankur Arora <ankur.a.arora@oracle.com>
Date:   Thu Oct 6 17:49:43 2022 -0700

    audit: unify audit_filter_{uring(), inode_name(), syscall()}

    audit_filter_uring(), audit_filter_inode_name() are substantially
    similar to audit_filter_syscall(). Move the core logic to
    __audit_filter_op() which can be parametrized for all three.

    On a Skylakex system, getpid() latency (all results aggregated
    across 12 boot cycles):

             Min     Mean    Median   Max      pstdev
             (ns)    (ns)    (ns)     (ns)

     -    196.63   207.86  206.60  230.98      (+- 3.92%)
     +    183.73   196.95  192.31  232.49      (+- 6.04%)

    Performance counter stats for 'bin/getpid' (3 runs) go from:
        cycles               805.58  (  +-  4.11% )
        instructions        1654.11  (  +-   .05% )
        IPC                    2.06  (  +-  3.39% )
        branches             430.02  (  +-   .05% )
        branch-misses          1.55  (  +-  7.09% )
        L1-dcache-loads      440.01  (  +-   .09% )
        L1-dcache-load-misses  9.05  (  +- 74.03% )
    to:
        cycles               765.37  (  +-  6.66% )
        instructions        1677.07  (  +-  0.04% )
        IPC                    2.20  (  +-  5.90% )
        branches             431.10  (  +-  0.04% )
        branch-misses          1.60  (  +- 11.25% )
        L1-dcache-loads      521.04  (  +-  0.05% )
        L1-dcache-load-misses  6.92  (  +- 77.60% )

    (Both aggregated over 12 boot cycles.)

    The increased L1-dcache-loads are due to some intermediate values now
    coming from the stack.

    The improvement in cycles is due to a slightly denser loop (the list
    parameter in the list_for_each_entry_rcu() exit check now comes from
    a register rather than a constant as before.)

    Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com>
    Signed-off-by: Paul Moore <paul@paul-moore.com>

Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
2022-11-04 16:11:36 -04:00
Richard Guy Briggs 3465f638f9 audit: cache ctx->major in audit_filter_syscall()
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2123857
Upstream Status: next-20221020

commit 069545997510833281f45f83e097017b9fef19b7
Author: Ankur Arora <ankur.a.arora@oracle.com>
Date:   Tue Sep 27 15:59:42 2022 -0700

    audit: cache ctx->major in audit_filter_syscall()

    ctx->major contains the current syscall number. This is, of course, a
    constant for the duration of the syscall. Unfortunately, GCC's alias
    analysis cannot prove that it is not modified via a pointer in the
    audit_filter_syscall() loop, and so always loads it from memory.

    In and of itself the load isn't very expensive (ops dependent on the
    ctx->major load are only used to determine the direction of control flow
    and have short dependence chains and, in any case the related branches
    get predicted perfectly in the fastpath) but still cache ctx->major
    in a local for two reasons:

    * ctx->major is in the first cacheline of struct audit_context and has
      similar alignment as audit_entry::list audit_entry. For cases
      with a lot of audit rules, doing this reduces one source of contention
      from a potentially busy cache-set.

    * audit_in_mask() (called in the hot loop in audit_filter_syscall())
      does cast manipulation and error checking on ctx->major:

         audit_in_mask(const struct audit_krule *rule, unsigned long val):
                 if (val > 0xffffffff)
                         return false;

                 word = AUDIT_WORD(val);
                 if (word >= AUDIT_BITMASK_SIZE)
                         return false;

                 bit = AUDIT_BIT(val);

                 return rule->mask[word] & bit;

      The clauses related to the rule need to be evaluated in the loop, but
      the rest is unnecessarily re-evaluated for every loop iteration.
      (Note, however, that most of these are cheap ALU ops and the branches
       are perfectly predicted. However, see discussion on cycles
       improvement below for more on why it is still worth hoisting.)

    On a Skylakex system change in getpid() latency (aggregated over
    12 boot cycles):

                 Min     Mean  Median     Max       pstdev
                (ns)     (ns)    (ns)    (ns)

     -        201.30   216.14  216.22  228.46      (+- 1.45%)
     +        196.63   207.86  206.60  230.98      (+- 3.92%)

    Performance counter stats for 'bin/getpid' (3 runs) go from:
        cycles               836.89  (  +-   .80% )
        instructions        2000.19  (  +-   .03% )
        IPC                    2.39  (  +-   .83% )
        branches             430.14  (  +-   .03% )
        branch-misses          1.48  (  +-  3.37% )
        L1-dcache-loads      471.11  (  +-   .05% )
        L1-dcache-load-misses  7.62  (  +- 46.98% )

     to:
        cycles               805.58  (  +-  4.11% )
        instructions        1654.11  (  +-   .05% )
        IPC                    2.06  (  +-  3.39% )
        branches             430.02  (  +-   .05% )
        branch-misses          1.55  (  +-  7.09% )
        L1-dcache-loads      440.01  (  +-   .09% )
        L1-dcache-load-misses  9.05  (  +- 74.03% )

    (Both aggregated over 12 boot cycles.)

    instructions: we reduce around 8 instructions/iteration because some of
    the computation is now hoisted out of the loop (branch count does not
    change because GCC, for reasons unclear, only hoists the computations
    while keeping the basic-blocks.)

    cycles: improve by about 5% (in aggregate and looking at individual run
    numbers.) This is likely because we now waste fewer pipeline resources
    on unnecessary instructions which allows the control flow to
    speculatively execute further ahead shortening the execution of the loop
    a little. The final gating factor on the performance of this loop
    remains the long dependence chain due to the linked-list load.

    Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com>
    Signed-off-by: Paul Moore <paul@paul-moore.com>

Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
2022-11-04 16:11:14 -04:00
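
Note on 3465f638f9: the effect described above, where work that depends only on the syscall number moves out of the rule loop, can be sketched in userspace. The hoisting is written by hand here for illustration; per the commit message the patch only caches ctx->major in a local and the compiler does the rest. The sk_ names are hypothetical, and the macros are modelled on the audit_in_mask() listing quoted in the message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SK_BITMASK_SIZE	64
#define SK_WORD(nr)	((uint32_t)((nr) / 32))
#define SK_BIT(nr)	(1U << ((nr) - SK_WORD(nr) * 32))

struct sk_rule {
	uint32_t mask[SK_BITMASK_SIZE];
};

/* Per-iteration form: range checks and word/bit math repeat for every rule. */
static bool sk_in_mask(const struct sk_rule *rule, unsigned long val)
{
	uint32_t word;

	if (val > 0xffffffff)
		return false;
	word = SK_WORD(val);
	if (word >= SK_BITMASK_SIZE)
		return false;
	return rule->mask[word] & SK_BIT(val);
}

static size_t sk_count_naive(const struct sk_rule *rules, size_t n,
			     unsigned long major)
{
	size_t hits = 0;

	for (size_t i = 0; i < n; i++)
		if (sk_in_mask(&rules[i], major))
			hits++;
	return hits;
}

/* Hoisted form: everything that depends only on major is computed once. */
static size_t sk_count_hoisted(const struct sk_rule *rules, size_t n,
			       unsigned long major)
{
	size_t hits = 0;
	uint32_t word, bit;

	if (major > 0xffffffff)
		return 0;
	word = SK_WORD(major);
	if (word >= SK_BITMASK_SIZE)
		return 0;
	bit = SK_BIT(major);

	for (size_t i = 0; i < n; i++)
		if (rules[i].mask[word] & bit)
			hits++;
	return hits;
}
```

Both functions return the same count for any input. Only the per-rule mask load and test remain inside the second loop.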
Richard Guy Briggs ed0e1b54ec audit: free audit_proctitle only on task exit
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2123857
Upstream Status: v6.1-rc1

commit c3f3ea8af44d0c5fba79fe8b198087342d0c7e04
Author: Richard Guy Briggs <rgb@redhat.com>
Date:   Thu Aug 25 15:32:39 2022 -0400

    audit: free audit_proctitle only on task exit

    Since audit_proctitle is generated at syscall exit time, its value is
    used immediately and cached for the next syscall.  Since this is the
    case, then only clear it at task exit time.  Otherwise, there is no
    point in caching the value OR bearing the overhead of regenerating it.

    Fixes: 12c5e81d3fd0 ("audit: prepare audit_context for use in calling contexts beyond syscalls")
    Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
    Signed-off-by: Paul Moore <paul@paul-moore.com>

Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
2022-11-04 16:04:23 -04:00
Richard Guy Briggs d507f48194 audit: explicitly check audit_context->context enum value
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2123857
Upstream Status: v6.1-rc1

commit 3ed66951f952ed8f1a5d03e171722bf2631e8d58
Author: Richard Guy Briggs <rgb@redhat.com>
Date:   Thu Aug 25 15:32:38 2022 -0400

    audit: explicitly check audit_context->context enum value

    Be explicit in checking the struct audit_context "context" member enum
    value rather than assuming the order of context enum values.

    Fixes: 12c5e81d3fd0 ("audit: prepare audit_context for use in calling contexts beyond syscalls")
    Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
    Signed-off-by: Paul Moore <paul@paul-moore.com>

Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
2022-11-04 16:03:16 -04:00
Richard Guy Briggs ba408e14ca audit: audit_context pid unused, context enum comment fix
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2123857
Upstream Status: v6.1-rc1

commit e84d9f5214cb854fcd584aa78b5634794604d306
Author: Richard Guy Briggs <rgb@redhat.com>
Date:   Thu Aug 25 15:32:37 2022 -0400

    audit: audit_context pid unused, context enum comment fix

    The pid member of struct audit_context is never used.  Remove it.

    The audit_reset_context() comment about unconditionally resetting
    "ctx->state" should read "ctx->context".

    Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
    Signed-off-by: Paul Moore <paul@paul-moore.com>

Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
2022-11-04 16:02:02 -04:00
Richard Guy Briggs 7db73785fa audit: fix repeated words in comments
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2123857
Upstream Status: v6.1-rc1

commit 0351dc57b95b8b56f2a467122c13b6b16e0dc53f
Author: Jilin Yuan <yuanjilin@cdjrlc.com>
Date:   Sun Aug 14 17:39:41 2022 +0800

    audit: fix repeated words in comments

    Delete the redundant word 'doesn't'.

    Signed-off-by: Jilin Yuan <yuanjilin@cdjrlc.com>
    [PM: subject line tweak]
    Signed-off-by: Paul Moore <paul@paul-moore.com>

Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
2022-11-04 16:01:22 -04:00
Richard Guy Briggs 18404800e3 audit: move audit_return_fixup before the filters
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2123857
Upstream Status: v6.0-rc3

commit d4fefa4801a1c2f9c0c7a48fbb0fdf384e89a4ab
Author: Richard Guy Briggs <rgb@redhat.com>
Date:   Thu Aug 25 15:32:40 2022 -0400

    audit: move audit_return_fixup before the filters

    The success and return_code are needed by the filters.  Move
    audit_return_fixup() before the filters.  This was causing syscall
    auditing events to be missed.

    Link: https://github.com/linux-audit/audit-kernel/issues/138
    Cc: stable@vger.kernel.org
    Fixes: 12c5e81d3fd0 ("audit: prepare audit_context for use in calling contexts beyond syscalls")
    Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
    [PM: manual merge required]
    Signed-off-by: Paul Moore <paul@paul-moore.com>

Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
2022-11-04 15:58:12 -04:00
Richard Guy Briggs f72d76ac14 audit: free module name
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2100261
Upstream Status: v5.19-rc3

commit ef79c396c664be99d0c5660dc75fe863c1e20315
Author: Christian Göttsche <cgzones@googlemail.com>
Date:   Wed Jun 15 17:44:31 2022 +0200

    audit: free module name

    Reset the type of the record last as the helper `audit_free_module()`
    depends on it.

        unreferenced object 0xffff888153b707f0 (size 16):
          comm "modprobe", pid 1319, jiffies 4295110033 (age 1083.016s)
          hex dump (first 16 bytes):
            62 69 6e 66 6d 74 5f 6d 69 73 63 00 6b 6b 6b a5  binfmt_misc.kkk.
          backtrace:
            [<ffffffffa07dbf9b>] kstrdup+0x2b/0x50
            [<ffffffffa04b0a9d>] __audit_log_kern_module+0x4d/0xf0
            [<ffffffffa03b6664>] load_module+0x9d4/0x2e10
            [<ffffffffa03b8f44>] __do_sys_finit_module+0x114/0x1b0
            [<ffffffffa1f47124>] do_syscall_64+0x34/0x80
            [<ffffffffa200007e>] entry_SYSCALL_64_after_hwframe+0x46/0xb0
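
A minimal userspace sketch of why the order matters. The `struct ctx`, the `CTX_*` record types and the helper names are hypothetical and are not the kernel's audit code; the only assumption taken from the description is that the free helper checks the record type before it frees the name.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical record types; the real audit code uses AUDIT_* constants. */
enum rec_type { CTX_NONE = 0, CTX_KERN_MODULE = 1 };

struct ctx {
	enum rec_type type;
	char *module_name;   /* owned by the context when type == CTX_KERN_MODULE */
};

/* Frees the module name, but only if the record type says there is one. */
static void ctx_free_module(struct ctx *c)
{
	if (c->type == CTX_KERN_MODULE) {
		free(c->module_name);
		c->module_name = NULL;
	}
}

/* Buggy order: the type is cleared first, so the helper frees nothing. */
static void ctx_reset_buggy(struct ctx *c)
{
	c->type = CTX_NONE;
	ctx_free_module(c);
}

/* Fixed order: free while the type is still valid, reset the type last. */
static void ctx_reset_fixed(struct ctx *c)
{
	ctx_free_module(c);
	c->type = CTX_NONE;
}

/* Returns 1 if the name is still allocated after the reset, 0 if not. */
static int leaks_after_reset(void (*reset)(struct ctx *))
{
	struct ctx c = { CTX_KERN_MODULE, NULL };
	int leaked;

	c.module_name = malloc(12);
	if (!c.module_name)
		return -1;
	strcpy(c.module_name, "binfmt_misc");   /* 11 characters plus NUL */
	reset(&c);
	leaked = (c.module_name != NULL);
	free(c.module_name);   /* clean up so the sketch itself does not leak */
	return leaked;
}
```

Calling `leaks_after_reset()` with each reset function shows the difference: only the buggy order leaves the string allocated.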

    Cc: stable@vger.kernel.org
    Fixes: 12c5e81d3fd0 ("audit: prepare audit_context for use in calling contexts beyond syscalls")
    Signed-off-by: Christian Göttsche <cgzones@googlemail.com>
    Signed-off-by: Paul Moore <paul@paul-moore.com>

Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
2022-06-23 16:37:06 -04:00
Richard Guy Briggs d3716e7bae audit,io_uring,io-wq: call __audit_uring_exit for dummy contexts
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2100261
Upstream Status: v5.18

commit 69e9cd66ae1392437234a63a3a1d60b6655f92ef
Author: Julian Orth <ju.orth@gmail.com>
Date:   Tue May 17 12:32:53 2022 +0200

    audit,io_uring,io-wq: call __audit_uring_exit for dummy contexts

    Not calling the function for dummy contexts will cause the context to
    not be reset. During the next syscall, this will cause an error in
    __audit_syscall_entry:

            WARN_ON(context->context != AUDIT_CTX_UNUSED);
            WARN_ON(context->name_count);
            if (context->context != AUDIT_CTX_UNUSED || context->name_count) {
                    audit_panic("unrecoverable error in audit_syscall_entry()");
                    return;
            }

    These problematic dummy contexts are created via the following call
    chain:

           exit_to_user_mode_prepare
        -> arch_do_signal_or_restart
        -> get_signal
        -> task_work_run
        -> tctx_task_work
        -> io_req_task_submit
        -> io_issue_sqe
        -> audit_uring_entry
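
The failure mode can be modelled in a few lines of userspace C. This is a simplified sketch, not the kernel implementation: the state names echo the AUDIT_CTX_* enum in the check quoted above, while `struct uring_ctx` and the `model_*` functions are hypothetical.

```c
#include <stdbool.h>

/* Context states modelled on the enum in the quoted check. */
enum ctx_state { CTX_UNUSED = 0, CTX_SYSCALL, CTX_URING };

struct uring_ctx {
	enum ctx_state context;
	bool dummy;    /* assumption: true when nothing needs to be recorded */
};

static void model_uring_entry(struct uring_ctx *c)
{
	c->context = CTX_URING;
}

/* Buggy exit: dummy contexts return early and stay marked as in use. */
static void model_uring_exit_buggy(struct uring_ctx *c)
{
	if (c->dummy)
		return;
	c->context = CTX_UNUSED;
}

/* Fixed exit: the context is reset whether or not it is a dummy. */
static void model_uring_exit_fixed(struct uring_ctx *c)
{
	c->context = CTX_UNUSED;
}

/* Mirrors the quoted check: syscall entry is only valid on an unused context. */
static bool model_syscall_entry_ok(const struct uring_ctx *c)
{
	return c->context == CTX_UNUSED;
}
```

With the buggy exit, a dummy context that passed through `model_uring_entry()` fails the entry check on the next syscall; with the fixed exit it passes.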

    Cc: stable@vger.kernel.org
    Fixes: 5bd2182d58e9 ("audit,io_uring,io-wq: add some basic audit support to io_uring")
    Signed-off-by: Julian Orth <ju.orth@gmail.com>
    [PM: subject line tweaks]
    Signed-off-by: Paul Moore <paul@paul-moore.com>

Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
2022-06-23 16:32:40 -04:00
Ondrej Mosnacek 89c0c153f1 lsm: security_task_getsecid_subj() -> security_current_getsecid_subj()
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2083580

commit 6326948f940dc3f77066d5cdc44ba6afe67830c0
Author: Paul Moore <paul@paul-moore.com>
Date:   Wed Sep 29 11:01:21 2021 -0400

    lsm: security_task_getsecid_subj() -> security_current_getsecid_subj()

    The security_task_getsecid_subj() LSM hook invites misuse by allowing
    callers to specify a task even though the hook is only safe when the
    current task is referenced.  Fix this by removing the task_struct
    argument to the hook, requiring LSM implementations to use the
    current task.  While we are changing the hook declaration we also
    rename the function to security_current_getsecid_subj() in an effort
    to reinforce that the hook captures the subjective credentials of the
    current task and not an arbitrary task on the system.

    Reviewed-by: Serge Hallyn <serge@hallyn.com>
    Reviewed-by: Casey Schaufler <casey@schaufler-ca.com>
    Signed-off-by: Paul Moore <paul@paul-moore.com>

Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
2022-05-10 14:21:26 +02:00
Richard Guy Briggs 6b527e0b58 audit: log AUDIT_TIME_* records only from rules
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2035124
Upstream Status: v5.18-rc1

commit 272ceeaea355214b301530e262a0df8600bfca95
Author: Richard Guy Briggs <rgb@redhat.com>
Date:   Tue Feb 22 11:44:51 2022 -0500

    audit: log AUDIT_TIME_* records only from rules

    AUDIT_TIME_* events are generated when there are syscall rules present
    that are not related to time keeping.  This will produce noisy log
    entries that could flood the logs and hide events we really care about.

    Rather than immediately produce the AUDIT_TIME_* records, store the data
    in the context and log it at syscall exit time respecting the filter
    rules.

    Note: This eats the audit_buffer, unlike any others in show_special().

    Please see https://bugzilla.redhat.com/show_bug.cgi?id=1991919

    Fixes: 7e8eda734d ("ntp: Audit NTP parameters adjustment")
    Fixes: 2d87a0674b ("timekeeping: Audit clock adjustments")
    Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
    [PM: fixed style/whitespace issues]
    Signed-off-by: Paul Moore <paul@paul-moore.com>

Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
2022-03-25 20:03:44 -04:00
Richard Guy Briggs 830115dda4 audit: don't deref the syscall args when checking the openat2 open_how::flags
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2035124
Upstream Status: v5.17-rc4

commit 7a82f89de92aac5a244d3735b2bd162c1147620c
Author: Paul Moore <paul@paul-moore.com>
Date:   Wed Feb 9 14:49:38 2022 -0500

    audit: don't deref the syscall args when checking the openat2 open_how::flags

    As reported by Jeff, dereferencing the openat2 syscall argument in
    audit_match_perm() to obtain the open_how::flags can result in an
    oops/page-fault.  This patch fixes this by using the open_how struct
    that we store in the audit_context with audit_openat2_how().

    Independent of this patch, Richard Guy Briggs posted a similar patch
    to the audit mailing list roughly 40 minutes after this patch was
    posted.

    Cc: stable@vger.kernel.org
    Fixes: 1c30e3af8a79 ("audit: add support for the openat2 syscall")
    Reported-by: Jeff Mahoney <jeffm@suse.com>
    Signed-off-by: Paul Moore <paul@paul-moore.com>

Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
2022-03-25 20:02:57 -04:00
Richard Guy Briggs 8387d7375b audit: return early if the filter rule has a lower priority
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2035124
Upstream Status: v5.16-rc1

commit d9516f346e8b8e9c7dd37976a06a5bde1a871d6f
Author: Gaosheng Cui <cuigaosheng1@huawei.com>
Date:   Sat Oct 16 15:23:51 2021 +0800

    audit: return early if the filter rule has a lower priority

    It is not necessary for audit_filter_rules() functions to check
    audit fields of the rule with a lower priority, and if we did,
    there might be some unintended effects, such as the ctx->ppid
    may be changed unexpectedly, so return early if the rule has
    a lower priority.
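
A hedged sketch of the early return, using hypothetical types (`struct prio_rule`, `struct prio_ctx`) rather than the kernel's structures. It assumes, as the description suggests, that evaluating a rule's fields can have side effects on the context, here modelled as lazily caching the parent pid.

```c
#include <stdbool.h>

struct prio_rule {
	unsigned long prio;
	int want_ppid;        /* the rule matches when ppid equals this value */
};

struct prio_ctx {
	unsigned long prio;   /* priority of the best rule matched so far */
	int ppid;             /* lazily cached parent pid */
	bool ppid_cached;
};

/* Field check with a side effect: it fills in ctx->ppid on first use. */
static bool prio_match_ppid(const struct prio_rule *r, struct prio_ctx *c,
			    int real_ppid)
{
	if (!c->ppid_cached) {
		c->ppid = real_ppid;
		c->ppid_cached = true;
	}
	return c->ppid == r->want_ppid;
}

/* Returns true if the rule matched; lower-priority rules are skipped up front. */
static bool prio_filter_rule(const struct prio_rule *r, struct prio_ctx *c,
			     int real_ppid)
{
	if (r->prio <= c->prio)
		return false;      /* early return: no field is evaluated */
	if (!prio_match_ppid(r, c, real_ppid))
		return false;
	c->prio = r->prio;         /* remember the priority of the matching rule */
	return true;
}
```

A rule whose priority does not exceed the context's current priority leaves the context untouched, which is the property the patch is after.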

    Signed-off-by: Gaosheng Cui <cuigaosheng1@huawei.com>
    [PM: slight tweak to the subject line]
    Signed-off-by: Paul Moore <paul@paul-moore.com>

Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
2022-03-25 20:02:51 -04:00
Richard Guy Briggs 865ba9ed53 audit: add OPENAT2 record to list "how" info
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2035124
Upstream Status: v5.16-rc1
Conflicts:
  - include/uapi/linux/audit.h AUDIT_URINGOP/AUDIT_OPENAT2, went in via selinux tree
    commit 5bd2182d58e9 ("audit,io_uring,io-wq: add some basic audit support to io_uring")

commit 571e5c0efcb29c5dac8cf2949d3eed84ec43056c
Author: Richard Guy Briggs <rgb@redhat.com>
Date:   Wed May 19 16:00:22 2021 -0400

    audit: add OPENAT2 record to list "how" info

    Since the openat2(2) syscall uses a struct open_how pointer to communicate
    its parameters they are not usefully recorded by the audit SYSCALL record's
    four existing arguments.

    Add a new audit record type OPENAT2 that reports the parameters in its
    third argument, struct open_how with fields oflag, mode and resolve.

    The new record in the context of an event would look like:
    time->Wed Mar 17 16:28:53 2021
    type=PROCTITLE msg=audit(1616012933.531:184): proctitle=
      73797363616C6C735F66696C652F6F70656E617432002F746D702F61756469742D
      7465737473756974652D737641440066696C652D6F70656E617432
    type=PATH msg=audit(1616012933.531:184): item=1 name="file-openat2"
      inode=29 dev=00:1f mode=0100600 ouid=0 ogid=0 rdev=00:00
      obj=unconfined_u:object_r:user_tmp_t:s0 nametype=CREATE
      cap_fp=0 cap_fi=0 cap_fe=0 cap_fver=0 cap_frootid=0
    type=PATH msg=audit(1616012933.531:184):
      item=0 name="/root/rgb/git/audit-testsuite/tests"
      inode=25 dev=00:1f mode=040700 ouid=0 ogid=0 rdev=00:00
      obj=unconfined_u:object_r:user_tmp_t:s0 nametype=PARENT
      cap_fp=0 cap_fi=0 cap_fe=0 cap_fver=0 cap_frootid=0
    type=CWD msg=audit(1616012933.531:184):
      cwd="/root/rgb/git/audit-testsuite/tests"
    type=OPENAT2 msg=audit(1616012933.531:184):
      oflag=0100302 mode=0600 resolve=0xa
    type=SYSCALL msg=audit(1616012933.531:184): arch=c000003e syscall=437
      success=yes exit=4 a0=3 a1=7ffe315f1c53 a2=7ffe315f1550 a3=18
      items=2 ppid=528 pid=540 auid=0 uid=0 gid=0 euid=0 suid=0
      fsuid=0 egid=0 sgid=0 fsgid=0 tty=ttyS0 ses=1 comm="openat2"
      exe="/root/rgb/git/audit-testsuite/tests/syscalls_file/openat2"
      subj=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023
      key="testsuite-1616012933-bjAUcEPO"
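
To read the OPENAT2 record above, the oflag and resolve values can be split into named bits. The sketch below is illustrative only: the `sk_*` names are hypothetical, the open flag values are the asm-generic ones used on x86-64 (other architectures may differ), and the RESOLVE_* values follow include/uapi/linux/openat2.h. O_RDWR is treated as a single bit, which is sufficient for this example but not a general decoding of the access mode.

```c
#include <stddef.h>
#include <string.h>

struct flag_name {
	unsigned long long bit;
	const char *name;
};

#define SK_COUNT(a) (sizeof(a) / sizeof((a)[0]))

/* Open flags, asm-generic values (assumed x86-64). */
static const struct flag_name sk_oflags[] = {
	{ 02,      "O_RDWR" },
	{ 0100,    "O_CREAT" },
	{ 0200,    "O_EXCL" },
	{ 0100000, "O_LARGEFILE" },
};

/* openat2() resolve flags. */
static const struct flag_name sk_resolve[] = {
	{ 0x01, "RESOLVE_NO_XDEV" },
	{ 0x02, "RESOLVE_NO_MAGICLINKS" },
	{ 0x04, "RESOLVE_NO_SYMLINKS" },
	{ 0x08, "RESOLVE_BENEATH" },
	{ 0x10, "RESOLVE_IN_ROOT" },
};

/*
 * Writes "NAME|NAME|..." for every known bit set in value into out
 * (outlen must be at least 1) and returns the bits that were not recognised.
 */
static unsigned long long sk_decode(unsigned long long value,
				    const struct flag_name *tab, size_t n,
				    char *out, size_t outlen)
{
	size_t i;

	out[0] = '\0';
	for (i = 0; i < n; i++) {
		if ((value & tab[i].bit) != tab[i].bit)
			continue;
		if (out[0] != '\0')
			strncat(out, "|", outlen - strlen(out) - 1);
		strncat(out, tab[i].name, outlen - strlen(out) - 1);
		value &= ~tab[i].bit;
	}
	return value;
}
```

Under these assumptions, `sk_decode(0100302, sk_oflags, SK_COUNT(sk_oflags), buf, sizeof(buf))` yields `O_RDWR|O_CREAT|O_EXCL|O_LARGEFILE`, and decoding `0xa` against `sk_resolve` yields `RESOLVE_NO_MAGICLINKS|RESOLVE_BENEATH`.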

    Link: https://lore.kernel.org/r/d23fbb89186754487850367224b060e26f9b7181.1621363275.git.rgb@redhat.com
    Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
    Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
    [PM: tweak subject, wrap example, move AUDIT_OPENAT2 to 1337]
    Signed-off-by: Paul Moore <paul@paul-moore.com>

Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
2022-03-25 20:02:50 -04:00
Richard Guy Briggs e0afbfa912 audit: add support for the openat2 syscall
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2035124
Upstream Status: v5.16-rc1

commit 1c30e3af8a79260cdba833a719209b01e6b92300
Author: Richard Guy Briggs <rgb@redhat.com>
Date:   Wed May 19 16:00:21 2021 -0400

    audit: add support for the openat2 syscall

    The openat2(2) syscall was added in kernel v5.6 with commit
    fddb5d430a ("open: introduce openat2(2) syscall").

    Add the openat2(2) syscall to the audit syscall classifier.

    Link: https://github.com/linux-audit/audit-kernel/issues/67
    Link: https://lore.kernel.org/r/f5f1a4d8699613f8c02ce762807228c841c2e26f.1621363275.git.rgb@redhat.com
    Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
    Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
    [PM: merge fuzz due to previous header rename, commit line wraps]
    Signed-off-by: Paul Moore <paul@paul-moore.com>

Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
2022-03-25 20:02:50 -04:00
Richard Guy Briggs acd4a06c1c audit: replace magic audit syscall class numbers with macros
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2035124
Upstream Status: v5.16-rc1

commit 42f355ef59a2f98fa4affb4265d3ba3e2d86baf1
Author: Richard Guy Briggs <rgb@redhat.com>
Date:   Wed May 19 16:00:20 2021 -0400

    audit: replace magic audit syscall class numbers with macros

    Replace audit syscall class magic numbers with macros.

    This required putting the macros into new header file
    include/linux/audit_arch.h since the syscall macros were
    included for both 64 bit and 32 bit in any compat code, causing
    redefinition warnings.

    Link: https://lore.kernel.org/r/2300b1083a32aade7ae7efb95826e8f3f260b1df.1621363275.git.rgb@redhat.com
    Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
    Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
    [PM: renamed header to audit_arch.h after consulting with Richard]
    Signed-off-by: Paul Moore <paul@paul-moore.com>

Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
2022-03-25 20:02:50 -04:00
Richard Guy Briggs ab313a2996 audit: Convert to SPDX identifier
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2035124
Upstream Status: v5.16-rc1

commit d680c6b49c5edb532e3e5a134d9f48f000a691e1
Author: Cai Huoqing <caihuoqing@baidu.com>
Date:   Tue Sep 14 11:33:38 2021 +0800

    audit: Convert to SPDX identifier

    Use SPDX-License-Identifier instead of a verbose license text.

    Signed-off-by: Cai Huoqing <caihuoqing@baidu.com>
    Signed-off-by: Paul Moore <paul@paul-moore.com>

Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
2022-03-25 20:02:49 -04:00
Richard Guy Briggs f5f2b28bc8 audit: add filtering for io_uring records
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2035124
Upstream Status: v5.16-rc1

commit 67daf270cebcf7aab4b3292b36f9adf357b23ddc
Author: Paul Moore <paul@paul-moore.com>
Date:   Sun Apr 18 21:54:47 2021 -0400

    audit: add filtering for io_uring records

    This patch adds basic audit io_uring filtering, using as much of the
    existing audit filtering infrastructure as possible.  In order to do
    this we reuse the audit filter rule's syscall mask for the io_uring
    operation and we create a new filter for io_uring operations as
    AUDIT_FILTER_URING_EXIT/audit_filter_list[7].
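
The mask reuse can be pictured as a plain bitmap lookup keyed by operation number. This is a sketch under assumptions: the `SK_*` macros mirror the word/bit arithmetic of the audit rule mask in the uapi header, but `struct sk_rule` and both functions are hypothetical and stand in for the kernel's rule handling.

```c
#include <stdbool.h>
#include <stdint.h>

#define SK_BITMASK_SIZE 64                 /* 64 words of 32 bits: 2048 numbers */
#define SK_WORD(nr) ((uint32_t)((nr) / 32))
#define SK_BIT(nr)  (1U << ((nr) - SK_WORD(nr) * 32))

struct sk_rule {
	uint32_t mask[SK_BITMASK_SIZE];
};

/* Marks a syscall or io_uring operation number as covered by the rule. */
static bool sk_rule_add_op(struct sk_rule *r, unsigned long nr)
{
	if (nr >= SK_BITMASK_SIZE * 32UL)
		return false;
	r->mask[SK_WORD(nr)] |= SK_BIT(nr);
	return true;
}

/* Tests whether the rule covers the given number. */
static bool sk_in_mask(const struct sk_rule *r, unsigned long nr)
{
	if (nr >= SK_BITMASK_SIZE * 32UL)
		return false;
	return (r->mask[SK_WORD(nr)] & SK_BIT(nr)) != 0;
}
```

Because the lookup only depends on a number, the same mask layout can serve syscall numbers on one filter list and io_uring operation numbers on another, which is the reuse the patch describes.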

    Thanks to Richard Guy Briggs for his review, feedback, and work on
    the corresponding audit userspace changes.

    Acked-by: Richard Guy Briggs <rgb@redhat.com>
    Signed-off-by: Paul Moore <paul@paul-moore.com>

Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
2022-03-25 20:02:49 -04:00
Richard Guy Briggs 17d165fb2b audit,io_uring,io-wq: add some basic audit support to io_uring
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2035124
Upstream Status: v5.16-rc1

commit 5bd2182d58e9d9c6279b7a8a2f9b41add0e7f9cb
Author: Paul Moore <paul@paul-moore.com>
Date:   Tue Feb 16 19:46:48 2021 -0500

    audit,io_uring,io-wq: add some basic audit support to io_uring

    This patch adds basic auditing to io_uring operations, regardless of
    their context.  This is accomplished by allocating audit_context
    structures for the io-wq worker and io_uring SQPOLL kernel threads
    as well as explicitly auditing the io_uring operations in
    io_issue_sqe().  Individual io_uring operations can bypass auditing
    through the "audit_skip" field in the struct io_op_def definition for
    the operation; although great care must be taken so that security
    relevant io_uring operations do not bypass auditing; please contact
    the audit mailing list (see the MAINTAINERS file) with any questions.

    The io_uring operations are audited using a new AUDIT_URINGOP record,
    an example is shown below:

      type=UNKNOWN[1336] msg=audit(1631800225.981:37289):
        uring_op=19 success=yes exit=0 items=0 ppid=15454 pid=15681
        uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0
        subj=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023
        key=(null)

    Thanks to Richard Guy Briggs for review and feedback.

    Signed-off-by: Paul Moore <paul@paul-moore.com>

Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
2022-03-25 20:02:49 -04:00
Richard Guy Briggs 2f1e334a5d audit: prepare audit_context for use in calling contexts beyond syscalls
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2035124
Upstream Status: v5.16-rc1

commit 12c5e81d3fd0a690c49dfe1c3a99bf80a24075c7
Author: Paul Moore <paul@paul-moore.com>
Date:   Tue Feb 16 19:46:48 2021 -0500

    audit: prepare audit_context for use in calling contexts beyond syscalls

    This patch cleans up some of our audit_context handling by
    abstracting out the reset and return code fixup handling to dedicated
    functions.  Not only does this help make things easier to read and
    inspect, it allows for easier reuse by future patches.  We also
    convert the simple audit_context->in_syscall flag into an enum which
    can be used by future patches to indicate a calling context other
    than the syscall context.

    Thanks to Richard Guy Briggs for review and feedback.

    Acked-by: Richard Guy Briggs <rgb@redhat.com>
    Signed-off-by: Paul Moore <paul@paul-moore.com>

Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
2022-03-25 20:02:48 -04:00
Richard Guy Briggs 69e32a03a0 audit: fix possible null-pointer dereference in audit_filter_rules
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2035124
Upstream Status: v5.15-rc7

commit 6e3ee990c90494561921c756481d0e2125d8b895
Author: Gaosheng Cui <cuigaosheng1@huawei.com>
Date:   Sat Oct 16 15:23:50 2021 +0800

    audit: fix possible null-pointer dereference in audit_filter_rules

    Fix possible null-pointer dereference in audit_filter_rules.

    audit_filter_rules() error: we previously assumed 'ctx' could be null

    Cc: stable@vger.kernel.org
    Fixes: bf361231c2 ("audit: add saddr_fam filter field")
    Reported-by: kernel test robot <lkp@intel.com>
    Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
    Signed-off-by: Gaosheng Cui <cuigaosheng1@huawei.com>
    Signed-off-by: Paul Moore <paul@paul-moore.com>

Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
2022-03-25 20:02:48 -04:00
Zhen Lei 6ddb568008 audit: remove trailing spaces and tabs
Run the following command to find and remove the trailing spaces and tabs:

sed -r -i 's/[ \t]+$//' <audit_files>

The files to be checked are as follows:
kernel/audit*
include/linux/audit.h
include/uapi/linux/audit.h

Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
Acked-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2021-06-10 20:59:05 -04:00
Sergey Nazarov 619ed58ac4 audit: Rename enum audit_state constants to avoid AUDIT_DISABLED redefinition
AUDIT_DISABLED is defined in kernel/audit.h as an element of enum audit_state
and redefined in kernel/audit.c. This produces a warning when the kernel is
built with syscall auditing disabled and breaks the build if -Werror is used.
enum audit_state is used in the syscall audit code only. This patch changes
the prefix of the enum audit_state constants from AUDIT to AUDIT_STATE to
avoid the AUDIT_DISABLED redefinition.

Signed-off-by: Sergey Nazarov <s-nazarov@yandex.ru>
Acked-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2021-06-08 22:05:24 -04:00
Roni Nevalainen 254c8b96c4 audit: add blank line after variable declarations
Fix the following checkpatch warning in auditsc.c:

WARNING: Missing a blank line after declarations

Signed-off-by: Roni Nevalainen <kitten@kittenz.dev>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2021-05-10 18:41:13 -04:00
Linus Torvalds e359bce39d audit/stable-5.13 PR 20210426
-----BEGIN PGP SIGNATURE-----
 
 iQJIBAABCAAyFiEES0KozwfymdVUl37v6iDy2pc3iXMFAmCHM4YUHHBhdWxAcGF1
 bC1tb29yZS5jb20ACgkQ6iDy2pc3iXOncA/+OnDdkYFD2e/6PsHURsQ9XK3Yk0kc
 1PY7lJnT4Eb4cUeMe2DP9LpTkA0ldhCxxbz8HYJNn7TUADqeCGhkShBLs/Fxz6k0
 F63RLupJFU0NKhBOYOyccqwmkzc19Ortcj27mYrIgYGK+tSPuRHzJ25PGmjnvh1W
 U7Or0sb1aOegxqFkTXi9IP2wY2Dv+YWfWkSdZNi/W5z4bedCQr9fJgGyUvsDCJyY
 YBIRa/VOLoU9AGkS/XN+uM06lckImC5gqZAqRtJEAk4vj7MsxcWp/eNkENiyaPeH
 +vSUrsv1bj0Bv85CMY8SWGY/GDaiDKjEf+3fVMHF5B/Ft3CgCheykbGPyjRqt3eT
 iIkv0PR5f2MclV5WB5n3gxwE42rPV+FOE8Mh8vRiDdkub/T8r0/cK0FJYPuwYWyA
 r/wdNKQpQUky+laMQWXKpi4tDx6JSWZPBPLG0I8Za/m1CV964VVok68VIMSmBcFj
 sbzYD6e3z9VTnuuxvLiS5HqFTtKkN5VG2al3HmBvZFtkF60xeNs4zbgHV4dg7adK
 clcBE3X4j0RHmYwLs4WWdOzWMPgx99BFJxVgZw3YGXv4oXguLUDFAswTIrc5FNtf
 YYs0/zsPn6CLt15Q7m/3Ec1T0fDf0A+DW3V3KSRNvLaMB41+E1XIWPYpUbfrr13v
 zGfT3CIdu9IR36Q=
 =SzqM
 -----END PGP SIGNATURE-----

Merge tag 'audit-pr-20210426' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit

Pull audit updates from Paul Moore:
 "Another small pull request for audit, most of the patches are
  documentation updates with only two real code changes: one to fix a
  compiler warning for a dummy function/macro, and one to cleanup some
  code since we removed the AUDIT_FILTER_ENTRY ages ago (v4.17)"

* tag 'audit-pr-20210426' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit:
  audit: drop /proc/PID/loginuid documentation Format field
  audit: avoid -Wempty-body warning
  audit: document /proc/PID/sessionid
  audit: document /proc/PID/loginuid
  MAINTAINERS: update audit files
  audit: further cleanup of AUDIT_FILTER_ENTRY deprecation
2021-04-27 13:50:58 -07:00
Paul Moore 4ebd7651bf lsm: separate security_task_getsecid() into subjective and objective variants
Of the three LSMs that implement the security_task_getsecid() LSM
hook, all three LSMs provide the task's objective security
credentials.  This turns out to be unfortunate as most of the hook's
callers seem to expect the task's subjective credentials, although
a small handful of callers do correctly expect the objective
credentials.

This patch is the first step towards fixing the problem: it splits
the existing security_task_getsecid() hook into two variants, one
for the subjective creds, one for the objective creds.

  void security_task_getsecid_subj(struct task_struct *p,
				   u32 *secid);
  void security_task_getsecid_obj(struct task_struct *p,
				  u32 *secid);

While this patch does fix all of the callers to use the correct
variant, in order to keep this patch focused on the callers and to
ease review, the LSMs continue to use the same implementation for
both hooks.  The net effect is that this patch should not change
the behavior of the kernel in any way, it will be up to the latter
LSM specific patches in this series to change the hook
implementations and return the correct credentials.

Acked-by: Mimi Zohar <zohar@linux.ibm.com> (IMA)
Acked-by: Casey Schaufler <casey@schaufler-ca.com>
Reviewed-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2021-03-22 15:23:32 -04:00
Richard Guy Briggs 5504a69a42 audit: further cleanup of AUDIT_FILTER_ENTRY deprecation
Remove the list parameter from the function call since the exit filter
list is the only remaining list used by this function.

This cleans up commit 5260ecc2e0
("audit: deprecate the AUDIT_FILTER_ENTRY filter")

Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2021-03-12 16:30:23 -05:00
Linus Torvalds 7d6beb71da idmapped-mounts-v5.12
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCYCegywAKCRCRxhvAZXjc
 ouJ6AQDlf+7jCQlQdeKKoN9QDFfMzG1ooemat36EpRRTONaGuAD8D9A4sUsG4+5f
 4IU5Lj9oY4DEmF8HenbWK2ZHsesL2Qg=
 =yPaw
 -----END PGP SIGNATURE-----

Merge tag 'idmapped-mounts-v5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux

Pull idmapped mounts from Christian Brauner:
 "This introduces idmapped mounts which has been in the making for some
  time. Simply put, different mounts can expose the same file or
  directory with different ownership. This initial implementation comes
  with ports for fat, ext4 and with Christoph's port for xfs with more
  filesystems being actively worked on by independent people and
  maintainers.

  Idmapping mounts handle a wide range of long standing use-cases. Here
  are just a few:

   - Idmapped mounts make it possible to easily share files between
     multiple users or multiple machines especially in complex
     scenarios. For example, idmapped mounts will be used in the
     implementation of portable home directories in
     systemd-homed.service(8) where they allow users to move their home
     directory to an external storage device and use it on multiple
     computers where they are assigned different uids and gids. This
     effectively makes it possible to assign random uids and gids at
     login time.

   - It is possible to share files from the host with unprivileged
     containers without having to change ownership permanently through
     chown(2).

   - It is possible to idmap a container's rootfs without having to
     mangle every file. For example, Chromebooks use it to share the
     user's Download folder with their unprivileged containers in their
     Linux subsystem.

   - It is possible to share files between containers with
     non-overlapping idmappings.

   - Filesystems that lack a proper concept of ownership such as fat can
     use idmapped mounts to implement discretionary access (DAC)
     permission checking.

   - They allow users to efficiently change ownership on a per-mount
     basis without having to (recursively) chown(2) all files. In
     contrast to chown(2), changing ownership of large sets of files is
     instantaneous with idmapped mounts. This is especially useful when
     ownership of a whole root filesystem of a virtual machine or
     container is changed. With idmapped mounts a single
     mount_setattr syscall will be sufficient to change the ownership of
     all files.

   - Idmapped mounts always take the current ownership into account as
     idmappings specify what a given uid or gid is supposed to be mapped
     to. This contrasts with the chown(2) syscall which cannot by itself
     take the current ownership of the files it changes into account. It
     simply changes the ownership to the specified uid and gid. This is
     especially problematic when recursively chown(2)ing a large set of
     files which is common with the aforementioned portable home
     directory and container and vm scenario.

   - Idmapped mounts allow to change ownership locally, restricting it
     to specific mounts, and temporarily as the ownership changes only
     apply as long as the mount exists.

  Several userspace projects have either already put up patches and
  pull-requests for this feature or will do so should you decide to pull
  this:

   - systemd: In a wide variety of scenarios but especially right away
     in their implementation of portable home directories.

         https://systemd.io/HOME_DIRECTORY/

   - container runtimes: containerd, runC, LXD: To share data between
     host and unprivileged containers, unprivileged and privileged
     containers, etc. The pull request for idmapped mounts support in
     containerd, the default Kubernetes runtime is already up for quite
     a while now: https://github.com/containerd/containerd/pull/4734

   - The virtio-fs developers and several users have expressed interest
     in using this feature with virtual machines once virtio-fs is
     ported.

   - ChromeOS: Sharing host-directories with unprivileged containers.

  I've tightly synced with all those projects and all of those listed
  here have also expressed their need/desire for this feature on the
  mailing list. For more info on how people use this there's a bunch of
  talks about this too. Here's just two recent ones:

      https://www.cncf.io/wp-content/uploads/2020/12/Rootless-Containers-in-Gitpod.pdf
      https://fosdem.org/2021/schedule/event/containers_idmap/

  This comes with an extensive xfstests suite covering both ext4 and
  xfs:

      https://git.kernel.org/brauner/xfstests-dev/h/idmapped_mounts

  It covers truncation, creation, opening, xattrs, vfscaps, setid
  execution, setgid inheritance and more both with idmapped and
  non-idmapped mounts. It already helped to discover an unrelated xfs
  setgid inheritance bug which has since been fixed in mainline. It will
  be sent for inclusion with the xfstests project should you decide to
  merge this.

  In order to support per-mount idmappings vfsmounts are marked with
  user namespaces. The idmapping of the user namespace will be used to
  map the ids of vfs objects when they are accessed through that mount.
  By default all vfsmounts are marked with the initial user namespace.
  The initial user namespace is used to indicate that a mount is not
  idmapped. All operations behave as before and this is verified in the
  testsuite.

  Based on prior discussions we want to attach the whole user namespace
  and not just a dedicated idmapping struct. This allows us to reuse all
  the helpers that already exist for dealing with idmappings instead of
  introducing a whole new range of helpers. In addition, if we decide in
  the future that we are confident enough to enable unprivileged users
  to setup idmapped mounts the permission checking can take into account
  whether the caller is privileged in the user namespace the mount is
  currently marked with.

  The user namespace the mount will be marked with can be specified by
  passing a file descriptor referring to the user namespace as an
  argument to the new mount_setattr() syscall together with the new
  MOUNT_ATTR_IDMAP flag. The system call follows the openat2() pattern
  of extensibility.

  The following conditions must be met in order to create an idmapped
  mount:

   - The caller must currently have the CAP_SYS_ADMIN capability in the
     user namespace the underlying filesystem has been mounted in.

   - The underlying filesystem must support idmapped mounts.

   - The mount must not already be idmapped. This also implies that the
     idmapping of a mount cannot be altered once it has been idmapped.

   - The mount must be a detached/anonymous mount, i.e. it must have
     been created by calling open_tree() with the OPEN_TREE_CLONE flag
     and it must not already have been visible in the filesystem.

  The last two points guarantee easier semantics for userspace and the
  kernel and make the implementation significantly simpler.

  By default vfsmounts are marked with the initial user namespace and no
  behavioral or performance changes are observed.

  The manpage with a detailed description can be found here:

      1d7b902e28

  In order to support idmapped mounts, filesystems need to be changed
  and mark themselves with the FS_ALLOW_IDMAP flag in fs_flags. The
  patches to convert individual filesystem are not very large or
  complicated overall as can be seen from the included fat, ext4, and
  xfs ports. Patches for other filesystems are actively worked on and
  will be sent out separately. The xfstestsuite can be used to verify
  that a port has been done correctly.

  The mount_setattr() syscall is motivated independent of the idmapped
  mounts patches and it's been around since July 2019. One of the most
  valuable features of the new mount api is the ability to perform
  mounts based on file descriptors only.

  Together with the lookup restrictions available in the openat2()
  RESOLVE_* flag namespace which we added in v5.6 this is the first time
  we are close to hardened and race-free (e.g. symlinks) mounting and
  path resolution.

  While userspace has started porting to the new mount api to mount
  proper filesystems and create new bind-mounts it is currently not
  possible to change mount options of an already existing bind mount in
  the new mount api since the mount_setattr() syscall is missing.

  With the addition of the mount_setattr() syscall we remove this last
  restriction and userspace can now fully port to the new mount api,
  covering every use-case the old mount api could. We also add the
  crucial ability to recursively change mount options for a whole mount
  tree, both removing and adding mount options at the same time. This
  syscall has been requested multiple times by various people and
  projects.

  There is a simple tool available at

      https://github.com/brauner/mount-idmapped

  that allows to create idmapped mounts so people can play with this
  patch series. I'll add support for the regular mount binary should you
  decide to pull this in the following weeks:

  Here's an example to a simple idmapped mount of another user's home
  directory:

	u1001@f2-vm:/$ sudo ./mount --idmap both:1000:1001:1 /home/ubuntu/ /mnt

	u1001@f2-vm:/$ ls -al /home/ubuntu/
	total 28
	drwxr-xr-x 2 ubuntu ubuntu 4096 Oct 28 22:07 .
	drwxr-xr-x 4 root   root   4096 Oct 28 04:00 ..
	-rw------- 1 ubuntu ubuntu 3154 Oct 28 22:12 .bash_history
	-rw-r--r-- 1 ubuntu ubuntu  220 Feb 25  2020 .bash_logout
	-rw-r--r-- 1 ubuntu ubuntu 3771 Feb 25  2020 .bashrc
	-rw-r--r-- 1 ubuntu ubuntu  807 Feb 25  2020 .profile
	-rw-r--r-- 1 ubuntu ubuntu    0 Oct 16 16:11 .sudo_as_admin_successful
	-rw------- 1 ubuntu ubuntu 1144 Oct 28 00:43 .viminfo

	u1001@f2-vm:/$ ls -al /mnt/
	total 28
	drwxr-xr-x  2 u1001 u1001 4096 Oct 28 22:07 .
	drwxr-xr-x 29 root  root  4096 Oct 28 22:01 ..
	-rw-------  1 u1001 u1001 3154 Oct 28 22:12 .bash_history
	-rw-r--r--  1 u1001 u1001  220 Feb 25  2020 .bash_logout
	-rw-r--r--  1 u1001 u1001 3771 Feb 25  2020 .bashrc
	-rw-r--r--  1 u1001 u1001  807 Feb 25  2020 .profile
	-rw-r--r--  1 u1001 u1001    0 Oct 16 16:11 .sudo_as_admin_successful
	-rw-------  1 u1001 u1001 1144 Oct 28 00:43 .viminfo

	u1001@f2-vm:/$ touch /mnt/my-file

	u1001@f2-vm:/$ setfacl -m u:1001:rwx /mnt/my-file

	u1001@f2-vm:/$ sudo setcap -n 1001 cap_net_raw+ep /mnt/my-file

	u1001@f2-vm:/$ ls -al /mnt/my-file
	-rw-rwxr--+ 1 u1001 u1001 0 Oct 28 22:14 /mnt/my-file

	u1001@f2-vm:/$ ls -al /home/ubuntu/my-file
	-rw-rwxr--+ 1 ubuntu ubuntu 0 Oct 28 22:14 /home/ubuntu/my-file

	u1001@f2-vm:/$ getfacl /mnt/my-file
	getfacl: Removing leading '/' from absolute path names
	# file: mnt/my-file
	# owner: u1001
	# group: u1001
	user::rw-
	user:u1001:rwx
	group::rw-
	mask::rwx
	other::r--

	u1001@f2-vm:/$ getfacl /home/ubuntu/my-file
	getfacl: Removing leading '/' from absolute path names
	# file: home/ubuntu/my-file
	# owner: ubuntu
	# group: ubuntu
	user::rw-
	user:ubuntu:rwx
	group::rw-
	mask::rwx
	other::r--"

* tag 'idmapped-mounts-v5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux: (41 commits)
  xfs: remove the possibly unused mp variable in xfs_file_compat_ioctl
  xfs: support idmapped mounts
  ext4: support idmapped mounts
  fat: handle idmapped mounts
  tests: add mount_setattr() selftests
  fs: introduce MOUNT_ATTR_IDMAP
  fs: add mount_setattr()
  fs: add attr_flags_to_mnt_flags helper
  fs: split out functions to hold writers
  namespace: only take read lock in do_reconfigure_mnt()
  mount: make {lock,unlock}_mount_hash() static
  namespace: take lock_mount_hash() directly when changing flags
  nfs: do not export idmapped mounts
  overlayfs: do not mount on top of idmapped mounts
  ecryptfs: do not mount on top of idmapped mounts
  ima: handle idmapped mounts
  apparmor: handle idmapped mounts
  fs: make helpers idmap mount aware
  exec: handle idmapped mounts
  would_dump: handle idmapped mounts
  ...
2021-02-23 13:39:45 -08:00
Yang Yang 127c8c5f05 audit: Make audit_filter_syscall() return void
No invoker uses the return value of audit_filter_syscall().
So make it return void, and amend the comment of
audit_filter_syscall().

Signed-off-by: Yang Yang <yang.yang29@zte.com.cn>
Reviewed-by: Richard Guy Briggs <rgb@redhat.com>
[PM: removed the changelog from the description]
Signed-off-by: Paul Moore <paul@paul-moore.com>
2021-01-27 21:55:14 -05:00
Christian Brauner 71bc356f93
commoncap: handle idmapped mounts
When interacting with user namespace and non-user namespace aware
filesystem capabilities the vfs will perform various security checks to
determine whether or not the filesystem capabilities can be used by the
caller, whether they need to be removed and so on. The main
infrastructure for this resides in the capability codepaths but they are
called through the LSM security infrastructure even though they are not
technically an LSM or optional. This extends the existing security hooks
security_inode_removexattr(), security_inode_killpriv(),
security_inode_getsecurity() to pass down the mount's user namespace and
makes them aware of idmapped mounts.

In order to actually get filesystem capabilities from disk the
capability infrastructure exposes the get_vfs_caps_from_disk() helper.
For user namespace aware filesystem capabilities a root uid is stored
alongside the capabilities.

In order to determine whether the caller can make use of the filesystem
capability or whether it needs to be ignored it is translated according
to the superblock's user namespace. If it can be translated to uid 0
according to that id mapping the caller can use the filesystem
capabilities stored on disk. If we are accessing the inode that holds
the filesystem capabilities through an idmapped mount we map the root
uid according to the mount's user namespace. Afterwards the checks are
identical to non-idmapped mounts: reading filesystem caps from disk
enforces that the root uid associated with the filesystem capability
must have a mapping in the superblock's user namespace and that the
caller is either in the same user namespace or is a descendant of the
superblock's user namespace. For filesystems that are mountable inside
user namespace the caller can just mount the filesystem and won't
usually need to idmap it. If they do want to idmap it they can create an
idmapped mount and mark it with a user namespace they created and which
is thus a descendant of s_user_ns. For filesystems that are not
mountable inside user namespaces the descendant rule is trivially true
because the s_user_ns will be the initial user namespace.

If the initial user namespace is passed nothing changes so non-idmapped
mounts will see identical behavior as before.
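The translation described above can be modelled in a few lines. This is a simplified sketch and not the kernel code: `UserNS`, `rootid_usable()` and the extent tuples are hypothetical names made up for illustration, and the model assumes the check passes when the remapped root uid translates to 0 in the caller's user namespace or one of its ancestors.

```python
from typing import List, Optional, Tuple

# One mapping extent: (first id inside the namespace, first kernel id, count).
Extent = Tuple[int, int, int]


class UserNS:
    """Toy user namespace: a list of id-mapping extents plus a parent."""

    def __init__(self, extents: List[Extent], parent: Optional["UserNS"] = None):
        self.extents = extents
        self.parent = parent

    def to_kernel(self, uid: int) -> Optional[int]:
        """Map an id inside this namespace to a kernel id (None if unmapped)."""
        for inside, outside, count in self.extents:
            if inside <= uid < inside + count:
                return outside + (uid - inside)
        return None

    def from_kernel(self, kuid: int) -> Optional[int]:
        """Map a kernel id to an id inside this namespace (None if unmapped)."""
        for inside, outside, count in self.extents:
            if outside <= kuid < outside + count:
                return inside + (kuid - outside)
        return None


# The initial namespace maps every id to itself.
INIT_NS = UserNS([(0, 0, 2**32 - 1)])


def rootid_usable(disk_rootid: int, sb_ns: UserNS, mnt_ns: UserNS,
                  caller_ns: UserNS) -> bool:
    """Return True if the caller may use caps stored with disk_rootid."""
    # The root uid must have a mapping in the superblock's namespace.
    kroot = sb_ns.to_kernel(disk_rootid)
    if kroot is None:
        return False
    # Remap through the mount's idmapping; INIT_NS leaves the id unchanged.
    kroot = mnt_ns.from_kernel(kroot)
    if kroot is None:
        return False
    # Usable if it is uid 0 in the caller's namespace or an ancestor.
    ns: Optional[UserNS] = caller_ns
    while ns is not None:
        if ns.from_kernel(kroot) == 0:
            return True
        ns = ns.parent
    return False


# On-disk uid 1000 appears as 1001 through the idmapped mount.
idmapped = UserNS([(1001, 1000, 1)], parent=INIT_NS)
# A container whose root (uid 0) is kernel uid 1001.
container = UserNS([(0, 1001, 1)], parent=INIT_NS)
```

With these namespaces, `rootid_usable(1000, INIT_NS, idmapped, container)` succeeds because the mount remaps 1000 to 1001, which is uid 0 in the container, while passing `INIT_NS` as the mount's namespace reproduces the non-idmapped behaviour and the same call fails.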

Link: https://lore.kernel.org/r/20210121131959.646623-11-christian.brauner@ubuntu.com
Cc: Christoph Hellwig <hch@lst.de>
Cc: David Howells <dhowells@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: linux-fsdevel@vger.kernel.org
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: James Morris <jamorris@linux.microsoft.com>
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2021-01-24 14:27:17 +01:00
Linus Torvalds 3d5de2ddc6 audit/stable-5.11 PR 20201214
-----BEGIN PGP SIGNATURE-----
 
 iQJIBAABCAAyFiEES0KozwfymdVUl37v6iDy2pc3iXMFAl/YBw4UHHBhdWxAcGF1
 bC1tb29yZS5jb20ACgkQ6iDy2pc3iXNndg/+JyEYzO+B0y0+0iTeUmBgLB1Hsbvt
 2RlQe8sZo3nBLr96hty4jwRUNdudUSKwKxXjIEr9DplNTpMd3/DzIMb92b00vVIi
 kBMDawsgtrAmWBE99Jo8YtL2vKbr5e5XlCjD1iH4UdfPvHemusMzGSMfzSetAgLU
 JTe0vzgdE46Y4peELTOGeCosO3WC2j4QU6B1QW4rFQEUr9AlN3c2Q40JEPUCKPCU
 3cLRWPQTmr9yiKis1i5HD7mHKqseSgvlxnl1SCboWSEJVbdfg+ceK4ugI7gXbweL
 EXxBDFJxuQEk5ENPu6MUZDgbcy7ROXMpE1TyFx8+SHxQJSmNiylddg/dZMbUk9Cs
 dLNkWMQbol827XdhcbXun5KVRGzh4sTwDL9QnxCfPtxpjGuYdQmXUTFnePgLVBH3
 Ial4mTGOOd37m6a7peAPtnjgR4W1jugoZQMSp//bOKTQvaZlDnWnoPGhgJENDELs
 Ys+tpsam+CjvoPzGfMRF/DQhk4QZtMhlFyd5H+6EeBh8K6WJepXTg+fMpBgXAKat
 Cy1YS5O0vKE+y2J0SKds/Gd7skTREN2QiYdVWo7LX8Vp8hWI9ClZiJHBO3QOQGI3
 2hJBPTzZ4qex6F2kSX6O17MFd/eOBLhTf+V+X5JjlE/YPQyYXxGvlSbCW0tVVyzW
 xFgeevnwl1aOlPU=
 =J+S/
 -----END PGP SIGNATURE-----

Merge tag 'audit-pr-20201214' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit

Pull audit updates from Paul Moore:
 "A small set of audit patches for v5.11 with four patches in total and
  only one of any real significance.

  Richard's patch to trigger accompanying records causes the kernel to
  emit additional related records when an audit event occurs; helping
  provide some much needed context to events in the audit log. It is
  also worth mentioning that this is a revised patch based on an earlier
  attempt that had to be reverted in the v5.8 time frame.

  Everything passes our test suite, and with no problems reported please
  merge this for v5.11"

* tag 'audit-pr-20201214' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit:
  audit: replace atomic_add_return()
  audit: fix macros warnings
  audit: trigger accompanying records when no rules present
  audit: fix a kernel-doc markup
2020-12-16 10:54:03 -08:00
Alex Shi ba59eae723 audit: fix macros warnings
Some unused macros cause gcc warnings:
kernel/audit.c:68:0: warning: macro "AUDIT_UNINITIALIZED" is not used
[-Wunused-macros]
kernel/auditsc.c:104:0: warning: macro "AUDIT_AUX_IPCPERM" is not used
[-Wunused-macros]
kernel/auditsc.c:82:0: warning: macro "AUDITSC_INVALID" is not used
[-Wunused-macros]

AUDIT_UNINITIALIZED and AUDITSC_INVALID are still meaningful and should
be incorporated.

Just remove AUDIT_AUX_IPCPERM.

Thanks for the comments from Richard Guy Briggs and Paul Moore.

Signed-off-by: Alex Shi <alex.shi@linux.alibaba.com>
Cc: Paul Moore <paul@paul-moore.com>
Cc: Richard Guy Briggs <rgb@redhat.com>
Cc: Eric Paris <eparis@redhat.com>
Cc: linux-audit@redhat.com
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Paul Moore <paul@paul-moore.com>
2020-11-24 23:28:02 -05:00
Gabriel Krisman Bertazi 785dc4eb7f audit: Migrate to use SYSCALL_WORK flag
On architectures using the generic syscall entry code the architecture
independent syscall work is moved to flags in thread_info::syscall_work.
This removes architecture dependencies and frees up TIF bits.

Define SYSCALL_WORK_SYSCALL_AUDIT, use it in the generic entry code and
convert the code which uses the TIF specific helper functions to use the
new *_syscall_work() helpers which either resolve to the new mode for users
of the generic entry code or to the TIF based functions for the other
architectures.

Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Andy Lutomirski <luto@kernel.org>
Link: https://lore.kernel.org/r/20201116174206.2639648-9-krisman@collabora.com
2020-11-16 21:53:16 +01:00
Richard Guy Briggs 6d915476e6 audit: trigger accompanying records when no rules present
When there are no audit rules registered, mandatory records (config,
etc.) are missing their accompanying records (syscall, proctitle, etc.).

This is because the audit context dummy bit is set on syscall entry when
no rules are present, which signals that no other records are to be printed.
Clear the dummy bit if any record is generated, open coding this in
audit_log_start().

The proctitle context and dummy checks are pointless since the
proctitle record will not be printed if no syscall records are printed.

The fds array is reset to -1 after the first syscall to indicate it
isn't valid any more, but was never set to -1 when the context was
allocated to indicate it wasn't yet valid.

Check ctx->pwd in audit_log_name().

The audit_inode* functions can be called without going through
getname_flags() or getname_kernel() that sets audit_names and cwd, so
set the cwd in audit_alloc_name() if it has not already been done so due to
audit_names being valid and purge all other audit_getcwd() calls.

Revert the LSM dump_common_audit_data() LSM_AUDIT_DATA_* cases from the
ghak96 patch since they are no longer necessary due to cwd coverage in
audit_alloc_name().

Thanks to bauen1 <j2468h@googlemail.com> for reporting LSM situations in
which context->cwd is not valid, inadvertently fixed by the ghak96 patch.

Please see upstream github issue
https://github.com/linux-audit/audit-kernel/issues/120
This is also related to upstream github issue
https://github.com/linux-audit/audit-kernel/issues/96

Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2020-10-27 21:02:57 -04:00
Linus Torvalds fd76a74d94 audit/stable-5.9 PR 20200803
-----BEGIN PGP SIGNATURE-----
 
 iQJIBAABCAAyFiEES0KozwfymdVUl37v6iDy2pc3iXMFAl8okpIUHHBhdWxAcGF1
 bC1tb29yZS5jb20ACgkQ6iDy2pc3iXNqOQ/8D+m9Ykcby3csEKsp8YtsaukEu62U
 lRVaxzRNO9wwB24aFwDFuJnIkmsSi/s/O4nBsy2mw+Apn+uDCvHQ9tBU07vlNn2f
 lu27YaTya7YGlqoe315xijd8tyoX99k8cpQeixvAVr9/jdR09yka7SJ8O7X9mjV7
 +SUVDiKCplPKpiwCCRS9cqD7F64T6y35XKzbrzYqdP0UOF2XelZo/Evt5rDRvWUf
 5qDN2tP+iM/Fvu5lCfczFwAeivfAdxjQ11n783hx8Ms2qyiaKQCzbEwjqAslmkbs
 1k/+ED0NjzXX1ne0JZaz/bk0wsMnmOoa8o+NDcyd7Za/cj5prUZi7kBy+xry4YV8
 qKJ40Lk0flCWgUpm6bkYVOByIYHk0gmfBNvjilqf25NR/eOC/9e9ir8PywvYUW/7
 kvVK37+N/a3LnFj80sZpIeqqnNU8z9PV1i7//5/kDuKvz94Bq83TJDO6pPKvqDtC
 njQfCFoHwdEeF8OalK793lIiYaoODqvbkWKChKMqziODJ4ZP8AW06gXpEbEWn7G3
 TTnJx7hqzR9t90vBQJeO3Fromfn+9TDlZVdX+EGO8gIqUiLGr0r7LPPep4VkDbNw
 LxMYKeC2cgRp8Z+XXPDxfXSDL2psTwg6CXcDrXcYnUyBo/yerpBvbJkeaR0h+UR0
 j6cvMX+T39X2JXM=
 =Xs3M
 -----END PGP SIGNATURE-----

Merge tag 'audit-pr-20200803' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit

Pull audit updates from Paul Moore:
 "Aside from some smaller bug fixes, here are the highlights:

   - add a new backlog wait metric to the audit status message, this is
     intended to help admins determine how long processes have been
     waiting for the audit backlog queue to clear

   - generate audit records for nftables configuration changes

   - generate CWD audit records for the relevant LSM audit records"

* tag 'audit-pr-20200803' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit:
  audit: report audit wait metric in audit status reply
  audit: purge audit_log_string from the intra-kernel audit API
  audit: issue CWD record to accompany LSM_AUDIT_DATA_* records
  audit: use the proper gfp flags in the audit_log_nfcfg() calls
  audit: remove unused !CONFIG_AUDITSYSCALL __audit_inode* stubs
  audit: add gfp parameter to audit_log_nfcfg
  audit: log nftables configuration change events
  audit: Use struct_size() helper in alloc_chunk
2020-08-04 14:20:26 -07:00
Paul Moore 8ac68dc455 revert: 1320a4052e ("audit: trigger accompanying records when no rules present")
Unfortunately the commit listed in the subject line above failed
to ensure that the task's audit_context was properly initialized/set
before enabling the "accompanying records".  Depending on the
situation, the resulting audit_context could have invalid values in
some of its fields which could cause a kernel panic/oops when the
task/syscall exits and the audit records are generated.

We will revisit the original patch, with the necessary fixes, in a
future kernel but right now we just want to fix the kernel panic
with the least amount of added risk.

Cc: stable@vger.kernel.org
Fixes: 1320a4052e ("audit: trigger accompanying records when no rules present")
Reported-by: j2468h@googlemail.com
Signed-off-by: Paul Moore <paul@paul-moore.com>
2020-07-29 10:00:36 -04:00
Richard Guy Briggs d7481b24b8 audit: issue CWD record to accompany LSM_AUDIT_DATA_* records
The LSM_AUDIT_DATA_* records for PATH, FILE, IOCTL_OP, DENTRY and INODE
are incomplete without the task context of the AUDIT Current Working
Directory record.  Add it.

This record addition can't use audit_dummy_context to determine whether
or not to store the record information since the LSM_AUDIT_DATA_*
records are initiated by various LSMs independent of any audit rules.
context->in_syscall is used to determine if it was called in user
context like audit_getname.

Please see the upstream issue
https://github.com/linux-audit/audit-kernel/issues/96

Adapted from Vladis Dronov's v2 patch.

Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2020-07-08 19:02:11 -04:00
Richard Guy Briggs 142240398e audit: add gfp parameter to audit_log_nfcfg
Fixed an inconsistent use of GFP flags in nft_obj_notify() that used
GFP_KERNEL when a GFP flag was passed in to that function.  Given this
allocated memory was then used in audit_log_nfcfg() it led to an audit
of all other GFP allocations in net/netfilter/nf_tables_api.c and a
modification of audit_log_nfcfg() to accept a GFP parameter.

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2020-06-29 19:14:47 -04:00
Richard Guy Briggs 8e6cf365e1 audit: log nftables configuration change events
iptables, ip6tables, arptables and ebtables table registration,
replacement and unregistration configuration events are logged for the
native (legacy) iptables setsockopt api, but not for the
nftables netlink api which is used by the nft-variant of iptables in
addition to nftables itself.

Add calls to log the configuration actions in the nftables netlink api.

This uses the same NETFILTER_CFG record format but overloads the table
field.

  type=NETFILTER_CFG msg=audit(2020-05-28 17:46:41.878:162) : table=?:0;?:0 family=unspecified entries=2 op=nft_register_gen pid=396 subj=system_u:system_r:firewalld_t:s0 comm=firewalld
  ...
  type=NETFILTER_CFG msg=audit(2020-05-28 17:46:41.878:162) : table=firewalld:1;?:0 family=inet entries=0 op=nft_register_table pid=396 subj=system_u:system_r:firewalld_t:s0 comm=firewalld
  ...
  type=NETFILTER_CFG msg=audit(2020-05-28 17:46:41.911:163) : table=firewalld:1;filter_FORWARD:85 family=inet entries=8 op=nft_register_chain pid=396 subj=system_u:system_r:firewalld_t:s0 comm=firewalld
  ...
  type=NETFILTER_CFG msg=audit(2020-05-28 17:46:41.911:163) : table=firewalld:1;filter_FORWARD:85 family=inet entries=101 op=nft_register_rule pid=396 subj=system_u:system_r:firewalld_t:s0 comm=firewalld
  ...
  type=NETFILTER_CFG msg=audit(2020-05-28 17:46:41.911:163) : table=firewalld:1;__set0:87 family=inet entries=87 op=nft_register_setelem pid=396 subj=system_u:system_r:firewalld_t:s0 comm=firewalld
  ...
  type=NETFILTER_CFG msg=audit(2020-05-28 17:46:41.911:163) : table=firewalld:1;__set0:87 family=inet entries=0 op=nft_register_set pid=396 subj=system_u:system_r:firewalld_t:s0 comm=firewalld
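For anyone consuming these records, a small parser shows how the overloaded table field can be split apart. This is a sketch based only on the sample records above: reading the field as `<table>:<handle>;<chain or set>:<handle>` is an inference from those samples, and `parse_netfilter_cfg()` is a hypothetical helper, not part of any audit tooling.

```python
from typing import Dict, Tuple


def parse_netfilter_cfg(record: str) -> Dict[str, object]:
    """Split one NETFILTER_CFG record into its key=value fields."""
    # Everything after " : " is a space-separated list of key=value pairs.
    _, _, body = record.partition(" : ")
    fields = dict(tok.split("=", 1) for tok in body.split())

    def name_and_handle(part: str) -> Tuple[str, int]:
        # Assumed layout "name:handle"; "?" stands for "no name".
        name, _, handle = part.rpartition(":")
        return name, int(handle)

    # The table field carries two "name:handle" pairs separated by ";".
    table_part, _, object_part = fields["table"].partition(";")
    return {
        "table": name_and_handle(table_part),
        "object": name_and_handle(object_part),
        "family": fields["family"],
        "entries": int(fields["entries"]),
        "op": fields["op"],
    }


sample = (
    "type=NETFILTER_CFG msg=audit(2020-05-28 17:46:41.911:163) : "
    "table=firewalld:1;filter_FORWARD:85 family=inet entries=8 "
    "op=nft_register_chain pid=396 "
    "subj=system_u:system_r:firewalld_t:s0 comm=firewalld"
)
parsed = parse_netfilter_cfg(sample)
```

Splitting each pair on the last colon keeps the parser working even if a table or chain name were itself to contain a colon.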

For further information please see issue
https://github.com/linux-audit/audit-kernel/issues/124

Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2020-06-23 20:25:16 -04:00