Commit Graph

92 Commits

Author SHA1 Message Date
Baoquan He 82325d2bf9 kexec_file: fix elfcorehdr digest exclusion when CONFIG_CRASH_HOTPLUG=y
JIRA: https://issues.redhat.com/browse/RHEL-58641

Upstream Status: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

commit 6dacd79d28842ff01f18b4900d897741aac5999e
Author: Petr Tesarik <ptesarik@suse.com>
Date:   Mon Aug 5 17:07:50 2024 +0200

    kexec_file: fix elfcorehdr digest exclusion when CONFIG_CRASH_HOTPLUG=y

    Fix the condition to exclude the elfcorehdr segment from the SHA digest
    calculation.

    The j iterator is an index into the output sha_regions[] array, not into
    the input image->segment[] array.  Once it reaches
    image->elfcorehdr_index, all subsequent segments are excluded.  Besides,
    if the purgatory segment precedes the elfcorehdr segment, the elfcorehdr
    may be wrongly included in the calculation.

    Link: https://lkml.kernel.org/r/20240805150750.170739-1-petr.tesarik@suse.com
    Fixes: f7cc804a9fd4 ("kexec: exclude elfcorehdr from the segment digest")
    Signed-off-by: Petr Tesarik <ptesarik@suse.com>
    Acked-by: Baoquan He <bhe@redhat.com>
    Cc: Eric Biederman <ebiederm@xmission.com>
    Cc: Hari Bathini <hbathini@linux.ibm.com>
    Cc: Sourabh Jain <sourabhjain@linux.ibm.com>
    Cc: Eric DeVolder <eric_devolder@yahoo.com>
    Cc: <stable@vger.kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Baoquan He <bhe@redhat.com>
2024-12-23 09:35:36 +08:00
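To make the index mix-up described in the entry above concrete, here is a minimal Python model of the digest loop. It is a sketch, not the kernel code: the segment names, the `digest_regions` helper, and the `fixed` switch are hypothetical and exist only for illustration, and the `regions` list stands in for `sha_regions[]`.

```python
def digest_regions(segments, purgatory, elfcorehdr_index, fixed=True):
    """Return the names of the segments that would be fed to the digest."""
    regions = []  # stands in for sha_regions[]; len(regions) plays the role of j
    for i, name in enumerate(segments):
        j = len(regions)
        # The buggy variant compares the output index j against
        # elfcorehdr_index; the fix compares the input index i.
        probe = i if fixed else j
        if probe == elfcorehdr_index:
            continue  # exclude the elfcorehdr segment
        if name == purgatory:
            continue  # purgatory is modified later, so it is never hashed
        regions.append(name)
    return regions


# Case 1: elfcorehdr at index 1, purgatory last.
case1 = ["kernel", "elfcorehdr", "initrd", "purgatory"]
# Case 2: purgatory precedes the elfcorehdr (index 2).
case2 = ["kernel", "purgatory", "elfcorehdr", "initrd"]
```

In this model, the buggy variant hashes only `kernel` in case 1 (every segment after the elfcorehdr is dropped), and in case 2 it hashes the elfcorehdr itself while dropping `initrd`. The fixed variant hashes `kernel` and `initrd` in both cases.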
Baoquan He 930e56cdd6 crash: add a new kexec flag for hotplug support
JIRA: https://issues.redhat.com/browse/RHEL-58641

Upstream Status: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

commit 79365026f86948b52c3cb7bf099dded92c559b4c
Author: Sourabh Jain <sourabhjain@linux.ibm.com>
Date:   Tue Mar 26 11:24:09 2024 +0530

    crash: add a new kexec flag for hotplug support

    Commit a72bbec70da2 ("crash: hotplug support for kexec_load()")
    introduced a new kexec flag, `KEXEC_UPDATE_ELFCOREHDR`. Kexec tool uses
    this flag to indicate to the kernel that it is safe to modify the
    elfcorehdr of the kdump image loaded using the kexec_load system call.

    However, it is possible that architectures may need to update kexec
    segments other than elfcorehdr. For example, FDT (Flattened Device Tree)
    on PowerPC. Introducing a new kexec flag for every new kexec segment
    may not be a good solution. Hence, a generic kexec flag bit,
    `KEXEC_CRASH_HOTPLUG_SUPPORT`, is introduced to share the CPU/Memory
    hotplug support intent between the kexec tool and the kernel for the
    kexec_load system call.

    Now we have two kexec flags that enable crash hotplug support for the
    kexec_load system call. The first is KEXEC_UPDATE_ELFCOREHDR (only used
    on x86), and the second is KEXEC_CRASH_HOTPLUG_SUPPORT (for all
    architectures).

    To simplify the process of finding and reporting the crash hotplug
    support the following changes are introduced.

    1. Define arch specific function to process the kexec flags and
       determine crash hotplug support

    2. Rename the @update_elfcorehdr member of struct kimage to
       @hotplug_support and populate it for both kexec_load and
       kexec_file_load syscalls, because architecture can update more than
       one kexec segment

    3. Let generic function crash_check_hotplug_support report hotplug
       support for loaded kdump image based on value of @hotplug_support

    To bring the x86 crash hotplug support in line with the above points,
    the following changes have been made:

    - Introduce the arch_crash_hotplug_support function to process kexec
      flags and determine crash hotplug support

    - Remove the arch_crash_hotplug_[cpu|memory]_support functions

    Signed-off-by: Sourabh Jain <sourabhjain@linux.ibm.com>
    Acked-by: Baoquan He <bhe@redhat.com>
    Acked-by: Hari Bathini <hbathini@linux.ibm.com>
    Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    Link: https://msgid.link/20240326055413.186534-3-sourabhjain@linux.ibm.com

Signed-off-by: Baoquan He <bhe@redhat.com>
2024-12-23 09:35:36 +08:00
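The flag handling described in the entry above can be sketched as a small Python model. This is an illustration under stated assumptions, not the x86 implementation: the numeric flag values, the `Kimage` stand-in, the `load_crash_image` helper, and the rule that a file-mode (kexec_file_load) image always reports support are all assumptions made for the sketch.

```python
from dataclasses import dataclass

# Assumed flag values; check the kernel's uapi kexec header before relying on them.
KEXEC_UPDATE_ELFCOREHDR = 0x00000004
KEXEC_CRASH_HOTPLUG_SUPPORT = 0x00000008


@dataclass
class Kimage:
    """Hypothetical stand-in for the relevant fields of struct kimage."""
    file_mode: bool = False        # True when loaded via kexec_file_load
    hotplug_support: bool = False  # the renamed @update_elfcorehdr member


def arch_crash_hotplug_support(image, kexec_flags):
    """Arch hook: decide hotplug support from the kexec flags."""
    if image.file_mode:
        return True  # assumption: the kernel built the segments itself
    wanted = KEXEC_UPDATE_ELFCOREHDR | KEXEC_CRASH_HOTPLUG_SUPPORT
    return bool(kexec_flags & wanted)


def load_crash_image(image, kexec_flags):
    """Populate hotplug_support once, at load time, for both syscalls."""
    image.hotplug_support = arch_crash_hotplug_support(image, kexec_flags)
    return image


def crash_check_hotplug_support(image):
    """Generic reporter: only looks at the cached hotplug_support value."""
    return 1 if image is not None and image.hotplug_support else 0
```

The point of the design is that the generic reporting function no longer needs per-arch CPU and memory helpers; it reads a single value that the architecture computed when the image was loaded.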
Baoquan He 68c40c9012 arm64, crash: wrap crash dumping code into crash related ifdefs
JIRA: https://issues.redhat.com/browse/RHEL-58641

Upstream Status: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

commit 40254101d87870b2e5ac3ddc28af40aa04c48486
Author: Baoquan He <bhe@redhat.com>
Date:   Wed Jan 24 13:12:47 2024 +0800

    arm64, crash: wrap crash dumping code into crash related ifdefs

    Now that the crash code under the kernel/ folder has been split out from
    the kexec code, crash dumping can be separated from kexec reboot in the
    config items on arm64 with some adjustments.

    Here, wrap up the crash dumping code with CONFIG_CRASH_DUMP ifdeffery.

    [bhe@redhat.com: fix building error in generic codes]
      Link: https://lkml.kernel.org/r/20240129135033.157195-2-bhe@redhat.com
    Link: https://lkml.kernel.org/r/20240124051254.67105-8-bhe@redhat.com
    Signed-off-by: Baoquan He <bhe@redhat.com>
    Cc: Al Viro <viro@zeniv.linux.org.uk>
    Cc: Eric W. Biederman <ebiederm@xmission.com>
    Cc: Hari Bathini <hbathini@linux.ibm.com>
    Cc: Pingfan Liu <piliu@redhat.com>
    Cc: Klara Modin <klarasmodin@gmail.com>
    Cc: Michael Kelley <mhklinux@outlook.com>
    Cc: Nathan Chancellor <nathan@kernel.org>
    Cc: Stephen Rothwell <sfr@canb.auug.org.au>
    Cc: Yang Li <yang.lee@linux.alibaba.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Baoquan He <bhe@redhat.com>
2024-12-23 09:35:35 +08:00
Baoquan He 6ddb054bd6 crash: split crash dumping code out from kexec_core.c
JIRA: https://issues.redhat.com/browse/RHEL-58641

Upstream Status: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

Conflicts: There's a conflict in the last hunk of include/linux/kexec.h
           because of the fuzz caused by earlier backported commits
           related to commit f4af41bf177a ("kexec: fix the unexpected
           kexec_dprintk() macro").

commit 02aff8480533817a29e820729360866441d7403d
Author: Baoquan He <bhe@redhat.com>
Date:   Wed Jan 24 13:12:44 2024 +0800

    crash: split crash dumping code out from kexec_core.c

    Currently, KEXEC_CORE selects CRASH_CORE automatically because the crash
    code needs to be built in to avoid compile errors when building the kexec
    code, even though the crash dumping functionality is not enabled. E.g.:
    --------------------
    CONFIG_CRASH_CORE=y
    CONFIG_KEXEC_CORE=y
    CONFIG_KEXEC=y
    CONFIG_KEXEC_FILE=y
    ---------------------

    After splitting out the crashkernel reservation code and the vmcoreinfo
    exporting code, there's only crash related code left in
    kernel/crash_core.c. Now move the crash related code from kexec_core.c to
    crash_core.c and only build it in when CONFIG_CRASH_DUMP=y.

    Also wrap up the crash code inside CONFIG_CRASH_DUMP ifdeffery scope,
    or replace inappropriate CONFIG_KEXEC_CORE ifdefs with CONFIG_CRASH_DUMP
    ifdefs in generic kernel files.

    With these changes, the crash_core code is separated from the kexec code
    and can be disabled entirely if only the kexec reboot feature is wanted.

    Link: https://lkml.kernel.org/r/20240124051254.67105-5-bhe@redhat.com
    Signed-off-by: Baoquan He <bhe@redhat.com>
    Cc: Al Viro <viro@zeniv.linux.org.uk>
    Cc: Eric W. Biederman <ebiederm@xmission.com>
    Cc: Hari Bathini <hbathini@linux.ibm.com>
    Cc: Pingfan Liu <piliu@redhat.com>
    Cc: Klara Modin <klarasmodin@gmail.com>
    Cc: Michael Kelley <mhklinux@outlook.com>
    Cc: Nathan Chancellor <nathan@kernel.org>
    Cc: Stephen Rothwell <sfr@canb.auug.org.au>
    Cc: Yang Li <yang.lee@linux.alibaba.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Baoquan He <bhe@redhat.com>
2024-12-23 09:35:35 +08:00
Baoquan He 94daa84e5a kexec_file: fix incorrect temp_start value in locate_mem_hole_top_down()
JIRA: https://issues.redhat.com/browse/RHEL-58641

Upstream Status: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

commit 18d565ea95fe553f442c5bbc5050415bab3c3fa4
Author: Yuntao Wang <ytcoode@gmail.com>
Date:   Sun Dec 17 11:35:27 2023 +0800

    kexec_file: fix incorrect temp_start value in locate_mem_hole_top_down()

    temp_end represents the address of the last available byte.  Therefore,
    the starting address of the memory segment with temp_end as its last
    available byte and a size of `kbuf->memsz`, that is, the value of
    temp_start, should be `temp_end - kbuf->memsz + 1` instead of `temp_end -
    kbuf->memsz`.

    Additionally, use the ALIGN_DOWN macro instead of open-coding it directly
    in locate_mem_hole_top_down() to improve code readability.

    Link: https://lkml.kernel.org/r/20231217033528.303333-3-ytcoode@gmail.com
    Signed-off-by: Yuntao Wang <ytcoode@gmail.com>
    Acked-by: Baoquan He <bhe@redhat.com>
    Cc: Borislav Petkov (AMD) <bp@alien8.de>
    Cc: Dave Hansen <dave.hansen@linux.intel.com>
    Cc: "Eric W. Biederman" <ebiederm@xmission.com>
    Cc: "H. Peter Anvin" <hpa@zytor.com>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: Jonathan Corbet <corbet@lwn.net>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Baoquan He <bhe@redhat.com>
2024-12-23 09:35:33 +08:00
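The off-by-one in the entry above is easy to check with a worked example. The following Python sketch models only the start-address arithmetic, assuming an inclusive `end` address and a power-of-two alignment; the function names and the `fixed` switch are hypothetical, and the real function also handles `buf_min`, `buf_max`, and destination-range collisions, which are omitted here.

```python
def align_down(value, align):
    """Round value down to a multiple of align (align must be a power of two)."""
    return value & ~(align - 1)


def locate_top_down(start, end, memsz, align, fixed=True):
    """Return the start of a memsz-byte hole at the top of [start, end].

    end is the address of the last available byte (inclusive).
    Returns None when no suitably aligned hole fits.
    """
    temp_end = end
    # Fixed: temp_end - memsz + 1.  Buggy: temp_end - memsz (one byte too low).
    temp_start = temp_end - memsz + (1 if fixed else 0)
    temp_start = align_down(temp_start, align)
    if temp_start < start:
        return None
    return temp_start
```

For a region of exactly one page, `[0x1000, 0x1fff]`, and a page-sized, page-aligned buffer, the fixed arithmetic yields `0x1000`. The buggy arithmetic computes `0xfff`, which aligns down to `0` and falls outside the region, so a hole that fits exactly is rejected.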
Baoquan He a21d007c99 kexec_file: load kernel at top of system RAM if required
JIRA: https://issues.redhat.com/browse/RHEL-58641

Upstream Status: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

commit b3ba234171cd0d58df0a13c262210ff8b5fd2830
Author: Baoquan He <bhe@redhat.com>
Date:   Tue Nov 14 17:16:58 2023 +0800

    kexec_file: load kernel at top of system RAM if required

    Patch series "kexec_file: Load kernel at top of system RAM if required".

    Justification:
    ==============

    The kexec_load interface has always searched top down when loading
    kernel/initrd/purgatory etc. to prepare for kexec reboot.  The benefits
    are that this avoids consuming and fragmenting limited low memory, which
    satisfies DMA buffer allocation and requests for big chunks of contiguous
    memory during system init; and it avoids disturbing BIOS/FW reserved or
    occupied areas, or areas occupied by corner case handling, workarounds,
    or quirks during system init.  Note that the top-down searching and
    loading of the kexec-ed kernel is done in user space utility code.

    For kexec_file loading, even if kexec_buf.top_down is 'true', it's simply
    ignored.  It calls walk_system_ram_res() directly to go through all
    resources of System RAM bottom up, to find an available memory region,
    then calls locate_mem_hole_callback() to allocate memory in that found
    memory region from the top down.  This is unexpected and inconsistent
    with kexec_load.

    Implementation
    ===============

    In patch 1, introduce a new function walk_system_ram_res_rev(), which is
    a variant of walk_system_ram_res().  It walks through a list of all the
    resources of System RAM in reversed order, i.e., from higher to lower.

    In patch 2, check if kexec_buf.top_down is 'true' in
    kexec_walk_resources(); if yes, call walk_system_ram_res_rev() to find a
    memory region of system RAM from the top down to load kernel/initrd etc.

    Background information:
    =======================

    I tried this in the past in a different way; please see the link below.
    In that post, I tried to adjust the struct sibling linking code, replacing
    the singly linked list with list_head so that walk_system_ram_res_rev()
    could be implemented in a much easier way.  In the end, that attempt
    failed.
    https://lore.kernel.org/all/20180718024944.577-4-bhe@redhat.com/

    This time, I picked up the patch from AKASHI Takahiro's old post and made
    some changes to use it as the current patch 1:
    https://lists.infradead.org/pipermail/linux-arm-kernel/2017-September/531456.html

    This patch (of 2):

    The kexec_load interface has always searched top down when loading
    kernel/initrd/purgatory etc. to prepare for kexec reboot.  The benefits
    are that this avoids consuming and fragmenting limited low memory, which
    satisfies DMA buffer allocation and requests for big chunks of contiguous
    memory during system init; and it avoids disturbing BIOS/FW reserved or
    occupied areas, or areas occupied by corner case handling, workarounds,
    or quirks during system init.  Note that the top-down searching and
    loading of the kexec-ed kernel is done in user space utility code.

    For kexec_file loading, even if kexec_buf.top_down is 'true', it's simply
    ignored.  It calls walk_system_ram_res() directly to go through all
    resources of System RAM bottom up, to find an available memory region,
    then calls locate_mem_hole_callback() to allocate memory in that found
    memory region from the top down.  This is unexpected and inconsistent
    with kexec_load.

    Here, check if kexec_buf.top_down is 'true' in kexec_walk_resources(); if
    yes, call the newly added walk_system_ram_res_rev() to find a memory
    region of system RAM from the top down to load kernel/initrd etc.

    Link: https://lkml.kernel.org/r/20231114091658.228030-1-bhe@redhat.com
    Link: https://lkml.kernel.org/r/20231114091658.228030-3-bhe@redhat.com
    Signed-off-by: Baoquan He <bhe@redhat.com>
    Cc: AKASHI Takahiro <takahiro.akashi@linaro.org>
    Cc: Baoquan He <bhe@redhat.com>
    Cc: Eric Biederman <ebiederm@xmission.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Baoquan He <bhe@redhat.com>
2024-12-23 09:35:33 +08:00
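The difference between walking System RAM bottom up and top down, as described in the entry above, can be sketched in a few lines of Python. This is a toy model: the resource list, the `walk_system_ram` and `find_region` helpers, and the callback convention (a non-zero return stops the walk) are assumptions chosen to mirror the description, not the kernel's resource tree code.

```python
def walk_system_ram(resources, func, top_down=False):
    """Call func(start, end) on each region; stop at the first non-zero return.

    resources is a list of (start, end) tuples sorted by ascending address,
    with end inclusive. top_down=True models walk_system_ram_res_rev().
    """
    order = reversed(resources) if top_down else iter(resources)
    for start, end in order:
        ret = func(start, end)
        if ret:
            return ret
    return 0


def find_region(resources, memsz, top_down):
    """Return the first region, in walk order, that can hold memsz bytes."""
    found = []

    def callback(start, end):
        if end - start + 1 >= memsz:
            found.append((start, end))
            return 1  # stop the walk
        return 0

    walk_system_ram(resources, callback, top_down)
    return found[0] if found else None


# Hypothetical System RAM layout: low memory, a region below 4G, one above.
RAM = [(0x1000, 0x9ffff), (0x100000, 0x7fffffff), (0x100000000, 0x27fffffff)]
```

With this layout, a bottom-up walk places a 16 KiB buffer in low memory, while a top-down walk places it in the highest region, which is the behaviour the series wants when `kexec_buf.top_down` is set.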
Baoquan He c0a56c0010 kexec: exclude elfcorehdr from the segment digest
JIRA: https://issues.redhat.com/browse/RHEL-58641

Upstream Status: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

commit f7cc804a9fd404e77a0c8b329443eae99f35ab67
Author: Eric DeVolder <eric.devolder@oracle.com>
Date:   Mon Aug 14 17:44:41 2023 -0400

    kexec: exclude elfcorehdr from the segment digest

    When a crash kernel is loaded via the kexec_file_load() syscall, the
    kernel places the various segments (i.e. crash kernel, crash initrd,
    boot_params, elfcorehdr, purgatory, etc) in memory.  For those
    architectures that utilize purgatory, a hash digest of the segments is
    calculated for integrity checking.  The digest is embedded into the
    purgatory image prior to placing in memory.

    Updates to the elfcorehdr in response to CPU and memory changes would
    cause the purgatory integrity check to fail at crash time, so no vmcore
    would be created.  Therefore, the elfcorehdr segment is explicitly excluded
    from the purgatory digest, enabling updates to the elfcorehdr while also
    avoiding the need to recompute the hash digest and reload purgatory.

    Link: https://lkml.kernel.org/r/20230814214446.6659-4-eric.devolder@oracle.com
    Signed-off-by: Eric DeVolder <eric.devolder@oracle.com>
    Suggested-by: Baoquan He <bhe@redhat.com>
    Reviewed-by: Sourabh Jain <sourabhjain@linux.ibm.com>
    Acked-by: Hari Bathini <hbathini@linux.ibm.com>
    Acked-by: Baoquan He <bhe@redhat.com>
    Cc: Akhil Raj <lf32.dev@gmail.com>
    Cc: Bjorn Helgaas <bhelgaas@google.com>
    Cc: Borislav Petkov (AMD) <bp@alien8.de>
    Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
    Cc: Dave Hansen <dave.hansen@linux.intel.com>
    Cc: Dave Young <dyoung@redhat.com>
    Cc: David Hildenbrand <david@redhat.com>
    Cc: Eric W. Biederman <ebiederm@xmission.com>
    Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    Cc: "H. Peter Anvin" <hpa@zytor.com>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: Jonathan Corbet <corbet@lwn.net>
    Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
    Cc: Mimi Zohar <zohar@linux.ibm.com>
    Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
    Cc: Oscar Salvador <osalvador@suse.de>
    Cc: "Rafael J. Wysocki" <rafael@kernel.org>
    Cc: Sean Christopherson <seanjc@google.com>
    Cc: Takashi Iwai <tiwai@suse.de>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: Thomas Weißschuh <linux@weissschuh.net>
    Cc: Valentin Schneider <vschneid@redhat.com>
    Cc: Vivek Goyal <vgoyal@redhat.com>
    Cc: Vlastimil Babka <vbabka@suse.cz>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Baoquan He <bhe@redhat.com>
2024-12-23 09:35:32 +08:00
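Why excluding the elfcorehdr keeps the purgatory check valid can be shown with a short Python sketch. It assumes a SHA-256 digest over the concatenated segment contents; the segment names, byte strings, and the `segment_digest` helper are hypothetical.

```python
import hashlib


def segment_digest(segments, skip=()):
    """SHA-256 over the data of every segment whose name is not in skip."""
    h = hashlib.sha256()
    for name, data in segments:
        if name in skip:
            continue
        h.update(data)
    return h.hexdigest()


before = [
    ("kernel", b"crash kernel image"),
    ("initrd", b"crash initrd"),
    ("elfcorehdr", b"cpus=4 mem=8G"),
]
# A CPU hot-add rewrites only the elfcorehdr segment.
after = [
    ("kernel", b"crash kernel image"),
    ("initrd", b"crash initrd"),
    ("elfcorehdr", b"cpus=5 mem=8G"),
]
```

When `elfcorehdr` is skipped, the digest computed at load time still matches after the header is rewritten. When it is included, the two digests differ and an integrity check against the stored value would fail.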
Baoquan He 536f874a42 crash: move a few code bits to setup support of crash hotplug
JIRA: https://issues.redhat.com/browse/RHEL-58641

Upstream Status: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

commit 6f991cc363a3269866476b8ff10a112768d3d45c
Author: Eric DeVolder <eric.devolder@oracle.com>
Date:   Mon Aug 14 17:44:39 2023 -0400

    crash: move a few code bits to setup support of crash hotplug

    Patch series "crash: Kernel handling of CPU and memory hot un/plug", v28.

    Once the kdump service is loaded, if changes to CPUs or memory occur,
    either by hot un/plug or off/onlining, the crash elfcorehdr must also be
    updated.

    The elfcorehdr describes to kdump the CPUs and memory in the system, and
    any inaccuracies can result in a vmcore with missing CPU context or memory
    regions.

    The current solution utilizes udev to initiate an unload-then-reload of
    the kdump image (e.g. kernel, initrd, boot_params, purgatory and
    elfcorehdr) by the userspace kexec utility.  In the original post I
    outlined the significant performance problems related to offloading this
    activity to userspace.

    This patchset introduces a generic crash handler that registers with the
    CPU and memory notifiers.  Upon CPU or memory changes, from either hot
    un/plug or off/onlining, this generic handler is invoked and performs
    important housekeeping, for example obtaining the appropriate lock, and
    then invokes an architecture specific handler to do the appropriate
    elfcorehdr update.

    See the descriptions in the patches 'crash: change
    crash_prepare_elf64_headers() to for_each_possible_cpu()' and 'x86/crash:
    optimize CPU changes', which enable further optimizations to the
    performance of elfcorehdr updates on CPU plug/unplug/online/offline.

    In the case of x86_64, the arch specific handler generates a new
    elfcorehdr and overwrites the old one in memory; thus no involvement
    from userspace is needed.

    To realize the benefits/test this patchset, one must make a couple
    of minor changes to userspace:

     - Prevent udev from updating kdump crash kernel on hot un/plug changes.
       Add the following as the first lines to the RHEL udev rule file
       /usr/lib/udev/rules.d/98-kexec.rules:

       # The kernel updates the crash elfcorehdr for CPU and memory changes
       SUBSYSTEM=="cpu", ATTRS{crash_hotplug}=="1", GOTO="kdump_reload_end"
       SUBSYSTEM=="memory", ATTRS{crash_hotplug}=="1", GOTO="kdump_reload_end"

       With this changeset applied, the two rules evaluate to false for
       CPU and memory change events and thus skip the userspace
       unload-then-reload of kdump.

     - Change to kexec_file_load for loading the kdump kernel:
       E.g. on RHEL: in /usr/bin/kdumpctl, change to:
        standard_kexec_args="-p -d -s"
       which adds -s to select the kexec_file_load() syscall.

    This kernel patchset also supports kexec_load() with a modified kexec
    userspace utility.  A working changeset to the kexec userspace utility is
    posted to the kexec-tools mailing list here:

     http://lists.infradead.org/pipermail/kexec/2023-May/027049.html

    To use the kexec-tools patch, apply, build and install kexec-tools, then
    change the kdumpctl's standard_kexec_args to replace the -s with
    --hotplug.  The removal of -s reverts to the kexec_load syscall and the
    addition of --hotplug invokes the changes put forth in the kexec-tools
    patch.

    This patch (of 8):

    The crash hotplug support leans on the work for the kexec_file_load()
    syscall.  To also support the kexec_load() syscall, a few bits of code
    need to be moved outside of CONFIG_KEXEC_FILE.  As such, these bits are
    moved out of kexec_file.c and into a common location, crash_core.c.

    In addition, struct crash_mem and crash_notes were moved to new locales so
    that PROC_KCORE, which sets CRASH_CORE alone, builds correctly.

    No functionality change intended.

    Link: https://lkml.kernel.org/r/20230814214446.6659-1-eric.devolder@oracle.com
    Link: https://lkml.kernel.org/r/20230814214446.6659-2-eric.devolder@oracle.com
    Signed-off-by: Eric DeVolder <eric.devolder@oracle.com>
    Reviewed-by: Sourabh Jain <sourabhjain@linux.ibm.com>
    Acked-by: Hari Bathini <hbathini@linux.ibm.com>
    Acked-by: Baoquan He <bhe@redhat.com>
    Cc: Akhil Raj <lf32.dev@gmail.com>
    Cc: Bjorn Helgaas <bhelgaas@google.com>
    Cc: Borislav Petkov (AMD) <bp@alien8.de>
    Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
    Cc: Dave Hansen <dave.hansen@linux.intel.com>
    Cc: Dave Young <dyoung@redhat.com>
    Cc: David Hildenbrand <david@redhat.com>
    Cc: Eric W. Biederman <ebiederm@xmission.com>
    Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    Cc: "H. Peter Anvin" <hpa@zytor.com>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: Jonathan Corbet <corbet@lwn.net>
    Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
    Cc: Mimi Zohar <zohar@linux.ibm.com>
    Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
    Cc: Oscar Salvador <osalvador@suse.de>
    Cc: "Rafael J. Wysocki" <rafael@kernel.org>
    Cc: Sean Christopherson <seanjc@google.com>
    Cc: Takashi Iwai <tiwai@suse.de>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: Thomas Weißschuh <linux@weissschuh.net>
    Cc: Valentin Schneider <vschneid@redhat.com>
    Cc: Vivek Goyal <vgoyal@redhat.com>
    Cc: Vlastimil Babka <vbabka@suse.cz>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Baoquan He <bhe@redhat.com>
2024-12-23 09:35:32 +08:00
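The flow described in the cover letter above (a generic handler takes a lock, then calls an architecture specific handler that regenerates the elfcorehdr) can be sketched as follows. This is a loose Python model under assumptions: the `CrashImage` class, the dictionary used as an "elfcorehdr", and the choice to skip the update when the lock is busy are all illustrative, and the function names only echo the roles named in the description.

```python
import threading


def build_elfcorehdr(cpus, mem_ranges):
    """Toy 'elfcorehdr': just records which CPUs and memory ranges exist."""
    return {"cpus": sorted(cpus), "mem": sorted(mem_ranges)}


class CrashImage:
    """Hypothetical stand-in for a loaded kdump image."""

    def __init__(self, cpus, mem_ranges):
        self.lock = threading.Lock()
        self.elfcorehdr = build_elfcorehdr(cpus, mem_ranges)


def arch_crash_handle_hotplug_event(image, cpus, mem_ranges):
    """Arch part: generate a new elfcorehdr and overwrite the old one."""
    image.elfcorehdr = build_elfcorehdr(cpus, mem_ranges)


def crash_handle_hotplug_event(image, cpus, mem_ranges):
    """Generic part: housekeeping (locking), then call the arch handler.

    Returns False without touching the image if the lock is already held.
    """
    if not image.lock.acquire(blocking=False):
        return False
    try:
        arch_crash_handle_hotplug_event(image, cpus, mem_ranges)
    finally:
        image.lock.release()
    return True
```

Keeping the locking in the generic layer means each architecture only has to supply the update itself, and no userspace unload-then-reload is involved.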
Baoquan He 661d1099d2 kexec_lock: Replace kexec_mutex() by kexec_lock() in two comments
JIRA: https://issues.redhat.com/browse/RHEL-58641

Upstream Status: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

commit 55e2b69649be38f1788b38755070875b96111d2f
Author: Wenyu Liu <liuwenyu7@huawei.com>
Date:   Mon Aug 7 10:52:06 2023 +0800

    kexec_lock: Replace kexec_mutex() by kexec_lock() in two comments

    kexec_mutex was replaced by an atomic variable
    in 05c6257433b (panic, kexec: make __crash_kexec() NMI safe).

    But there are still two comments that reference kexec_mutex;
    replace them with kexec_lock.

    Signed-off-by: Wenyu Liu <liuwenyu7@huawei.com>
    Acked-by: Baoquan He <bhe@redhat.com>
    Acked-by: Paul Menzel <pmenzel@molgen.mpg.de>
    Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>

Signed-off-by: Baoquan He <bhe@redhat.com>
2024-12-23 09:35:32 +08:00
Baoquan He e0333b614e kexec: rename ARCH_HAS_KEXEC_PURGATORY
JIRA: https://issues.redhat.com/browse/RHEL-58641

Upstream Status: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

Conflict: Changes related to RISC-V are discarded, and the hunk in
          arch/s390/Kbuild needs to be manually edited because of fuzz.

commit e6265fe7775ec51241850abc854c9652d4709996
Author: Eric DeVolder <eric.devolder@oracle.com>
Date:   Wed Jul 12 12:15:45 2023 -0400

    kexec: rename ARCH_HAS_KEXEC_PURGATORY

    The Kconfig refactor to consolidate KEXEC and CRASH options utilized
    option names of the form ARCH_SUPPORTS_<option>. Thus rename the
    ARCH_HAS_KEXEC_PURGATORY to ARCH_SUPPORTS_KEXEC_PURGATORY to follow
    the same.

    Link: https://lkml.kernel.org/r/20230712161545.87870-15-eric.devolder@oracle.com
    Signed-off-by: Eric DeVolder <eric.devolder@oracle.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Baoquan He <bhe@redhat.com>
2024-12-23 09:35:32 +08:00
Baoquan He 304ad7cfb8 kexec: support purgatories with .text.hot sections
JIRA: https://issues.redhat.com/browse/RHEL-32199

Upstream Status: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

commit 8652d44f466ad5772e7d1756e9457046189b0dfc
Author: Ricardo Ribalda <ribalda@chromium.org>
Date:   Fri May 19 16:47:36 2023 +0200

    kexec: support purgatories with .text.hot sections

    Patch series "kexec: Fix kexec_file_load for llvm16 with PGO", v7.

    When upreving llvm I realised that kexec stopped working on my test
    platform.

    The reason seems to be that due to PGO there are multiple .text sections
    in the purgatory, and kexec does not support that.

    This patch (of 4):

    Clang16 links the purgatory text in two sections when PGO is in use:

      [ 1] .text             PROGBITS         0000000000000000  00000040
           00000000000011a1  0000000000000000  AX       0     0     16
      [ 2] .rela.text        RELA             0000000000000000  00003498
           0000000000000648  0000000000000018   I      24     1     8
      ...
      [17] .text.hot.        PROGBITS         0000000000000000  00003220
           000000000000020b  0000000000000000  AX       0     0     1
      [18] .rela.text.hot.   RELA             0000000000000000  00004428
           0000000000000078  0000000000000018   I      24    17     8

    Both of them have a range [sh_addr ... sh_addr+sh_size] that covers the
    address pointed to by `e_entry`.

    As a result, image->start is calculated twice, once for .text and once
    more for .text.hot. The second calculation leaves image->start at an
    arbitrary location.

    Because of this, the system crashes immediately after:

    kexec_core: Starting new kernel

    Link: https://lkml.kernel.org/r/20230321-kexec_clang16-v7-0-b05c520b7296@chromium.org
    Link: https://lkml.kernel.org/r/20230321-kexec_clang16-v7-1-b05c520b7296@chromium.org
    Fixes: 930457057a ("kernel/kexec_file.c: split up __kexec_load_puragory")
    Signed-off-by: Ricardo Ribalda <ribalda@chromium.org>
    Reviewed-by: Ross Zwisler <zwisler@google.com>
    Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
    Reviewed-by: Philipp Rudo <prudo@redhat.com>
    Cc: Albert Ou <aou@eecs.berkeley.edu>
    Cc: Baoquan He <bhe@redhat.com>
    Cc: Borislav Petkov (AMD) <bp@alien8.de>
    Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
    Cc: Dave Hansen <dave.hansen@linux.intel.com>
    Cc: Dave Young <dyoung@redhat.com>
    Cc: Eric W. Biederman <ebiederm@xmission.com>
    Cc: "H. Peter Anvin" <hpa@zytor.com>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: Michael Ellerman <mpe@ellerman.id.au>
    Cc: Nathan Chancellor <nathan@kernel.org>
    Cc: Nicholas Piggin <npiggin@gmail.com>
    Cc: Nick Desaulniers <ndesaulniers@google.com>
    Cc: Palmer Dabbelt <palmer@dabbelt.com>
    Cc: Palmer Dabbelt <palmer@rivosinc.com>
    Cc: Paul Walmsley <paul.walmsley@sifive.com>
    Cc: Simon Horman <horms@kernel.org>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: Tom Rix <trix@redhat.com>
    Cc: <stable@vger.kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Baoquan He <bhe@redhat.com>
2024-05-15 10:32:32 +08:00
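The double calculation of `image->start` described in the entry above can be reproduced with a small Python model. It is a sketch under assumptions: sections are laid out back to back without alignment padding, the load address `mem` and the entry point value are made up, and the guard used for the fixed variant (relocate only while the start still equals `e_entry`) is one plausible way to ensure a single relocation, not a quotation of the actual patch.

```python
def relocate_entry(e_entry, sections, mem, fixed=True):
    """Compute the relocated entry point for a purgatory image.

    sections is a list of (name, sh_addr, sh_size) for executable sections,
    in file order. mem is the load address of the purgatory buffer.
    """
    start = e_entry
    offset = 0  # running offset of the current section inside the buffer
    for _name, sh_addr, sh_size in sections:
        covers_entry = sh_addr <= e_entry < sh_addr + sh_size
        # Buggy: relocate for every section whose range covers e_entry.
        # Fixed: relocate only if start has not been relocated yet.
        if covers_entry and (not fixed or start == e_entry):
            start = start - sh_addr + mem + offset
        offset += sh_size
    return start


# Section sizes taken from the listing in the commit message; both sections
# have sh_addr 0 in the relocatable object, so their ranges overlap.
SECTIONS = [(".text", 0x0, 0x11a1), (".text.hot.", 0x0, 0x20b)]
MEM = 0x100000    # hypothetical load address
E_ENTRY = 0x10    # hypothetical entry point inside both ranges
```

In this model the fixed variant returns `0x100010`, the entry point inside `.text`. The buggy variant relocates a second time for `.text.hot.` and ends up at `0x2011b1`, an address that has nothing to do with the entry point.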
Baoquan He 3e7d7d8788 kexec: avoid calculating array size twice
JIRA: https://issues.redhat.com/browse/RHEL-32199

Upstream Status: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

commit 4df3504e2f17cb35d2c12b07e716ae37971d0d52
Author: Simon Horman <horms@kernel.org>
Date:   Thu May 25 16:26:25 2023 +0200

    kexec: avoid calculating array size twice

    Avoid calculating array size twice in kexec_purgatory_setup_sechdrs().
    Once using array_size(), and once open-coded.

    Flagged by Coccinelle:

      .../kexec_file.c:881:8-25: WARNING: array_size is already used (line 877) to compute the same size

    No functional change intended.
    Compile tested only.

    Link: https://lkml.kernel.org/r/20230525-kexec-array_size-v1-1-8b4bf4f7500a@kernel.org
    Signed-off-by: Simon Horman <horms@kernel.org>
    Acked-by: Baoquan He <bhe@redhat.com>
    Cc: Eric W. Biederman <ebiederm@xmission.com>
    Cc: Zhen Lei <thunder.leizhen@huawei.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Baoquan He <bhe@redhat.com>
2024-05-15 10:32:32 +08:00
Baoquan He 51d035ece6 kexec: introduce sysctl parameters kexec_load_limit_*
JIRA: https://issues.redhat.com/browse/RHEL-32199

Upstream Status: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

commit a42aaad2e47b23d63037bfc0130e33fc0f74cd71
Author: Ricardo Ribalda <ribalda@chromium.org>
Date:   Wed Jan 4 15:38:48 2023 +0100

    kexec: introduce sysctl parameters kexec_load_limit_*

    kexec allows replacing the current kernel with a different one.  This is
    usually a source of concerns for sysadmins that want to harden a system.

    Linux already provides a way to disable loading a new kexec kernel via
    kexec_load_disabled, but that control is very coarse: it is all or nothing
    and makes no distinction between a panic kexec and a normal kexec.

    This patch introduces new sysctl parameters, with finer tuning to specify
    how many times a kexec kernel can be loaded.  The sysadmin can set
    different limits for kexec panic and kexec reboot kernels.  The value can
    be modified at runtime via sysctl, but only with a stricter value.

    With these new parameters in place, a system with loadpin and verity
    enabled, using the following kernel parameters:
    sysctl.kexec_load_limit_reboot=0 sysctl.kexec_load_limit_panic=1 can have a
    good guarantee that if initrd tries to load a panic kernel, a malicious
    user will have little chance of replacing that kernel with a different one,
    even if they can trigger timeouts on the disk where the panic kernel
    lives.

    Link: https://lkml.kernel.org/r/20221114-disable-kexec-reset-v6-3-6a8531a09b9a@chromium.org
    Signed-off-by: Ricardo Ribalda <ribalda@chromium.org>
    Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
    Acked-by: Baoquan He <bhe@redhat.com>
    Cc: Bagas Sanjaya <bagasdotme@gmail.com>
    Cc: "Eric W. Biederman" <ebiederm@xmission.com>
    Cc: Guilherme G. Piccoli <gpiccoli@igalia.com> # Steam Deck
    Cc: Joel Fernandes (Google) <joel@joelfernandes.org>
    Cc: Jonathan Corbet <corbet@lwn.net>
    Cc: Philipp Rudo <prudo@redhat.com>
    Cc: Ross Zwisler <zwisler@kernel.org>
    Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Baoquan He <bhe@redhat.com>
2024-05-15 10:32:32 +08:00
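The "can only be made stricter" behaviour of the load limits in the entry above can be sketched in Python. This is a model built on assumptions: `-1` is taken to mean "unlimited" as the initial value, a write is accepted only if it is non-negative and strictly lower than the current limit, and the `KexecLoadLimit` class and the arguments of `kexec_load_permitted` are hypothetical simplifications.

```python
class KexecLoadLimit:
    """Counter of remaining permitted loads; -1 means unlimited (assumed)."""

    def __init__(self):
        self.limit = -1

    def write(self, val):
        """Model of a sysctl write: only a stricter value is accepted."""
        if val < 0:
            raise ValueError("limit must be non-negative")
        if self.limit != -1 and val >= self.limit:
            raise ValueError("limit can only be lowered")
        self.limit = val

    def permit_load(self):
        """Consume one load if any remain."""
        if self.limit == 0:
            return False
        if self.limit > 0:
            self.limit -= 1
        return True


def kexec_load_permitted(has_cap_sys_boot, load_disabled, limit):
    """Shared check for both syscalls: privilege, global switch, then limit."""
    if not has_cap_sys_boot or load_disabled:
        return False
    return limit.permit_load()
```

With the panic limit written to 1 and the reboot limit written to 0, one panic kernel can be loaded, no reboot kernel can be loaded, and neither limit can be raised again at runtime.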
Baoquan He bf22fc9dac kexec: factor out kexec_load_permitted
JIRA: https://issues.redhat.com/browse/RHEL-32199

Upstream Status: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

commit 7e99f8b69c11c104933b9bc8fda226ebfb8aaaa5
Author: Ricardo Ribalda <ribalda@chromium.org>
Date:   Wed Jan 4 15:38:47 2023 +0100

    kexec: factor out kexec_load_permitted

    Both syscalls (kexec and kexec_file) do the same check, let's factor it
    out.

    Link: https://lkml.kernel.org/r/20221114-disable-kexec-reset-v6-2-6a8531a09b9a@chromium.org
    Signed-off-by: Ricardo Ribalda <ribalda@chromium.org>
    Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
    Acked-by: Baoquan He <bhe@redhat.com>
    Cc: Bagas Sanjaya <bagasdotme@gmail.com>
    Cc: "Eric W. Biederman" <ebiederm@xmission.com>
    Cc: Guilherme G. Piccoli <gpiccoli@igalia.com>
    Cc: Joel Fernandes (Google) <joel@joelfernandes.org>
    Cc: Jonathan Corbet <corbet@lwn.net>
    Cc: Philipp Rudo <prudo@redhat.com>
    Cc: Ross Zwisler <zwisler@kernel.org>
    Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Baoquan He <bhe@redhat.com>
2024-05-15 10:32:32 +08:00
Baoquan He 1d34d79f7a kexec: replace crash_mem_range with range
JIRA: https://issues.redhat.com/browse/RHEL-32199

Upstream Status: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

commit cade589fdf697dd4981056c09f83924db8e4e4ed
Author: Li Chen <lchen@ambarella.com>
Date:   Thu Sep 29 12:29:35 2022 +0800

    kexec: replace crash_mem_range with range

    We already have struct range, so just use it.

    Link: https://lkml.kernel.org/r/20220929042936.22012-4-bhe@redhat.com
    Signed-off-by: Li Chen <lchen@ambarella.com>
    Signed-off-by: Baoquan He <bhe@redhat.com>
    Acked-by: Baoquan He <bhe@redhat.com>
    Cc: Michael Ellerman <mpe@ellerman.id.au>
    Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    Cc: Paul Mackerras <paulus@samba.org>
    Cc: Chen Lifu <chenlifu@huawei.com>
    Cc: "Eric W . Biederman" <ebiederm@xmission.com>
    Cc: Jianglei Nie <niejianglei2021@163.com>
    Cc: Petr Mladek <pmladek@suse.com>
    Cc: Russell King <linux@armlinux.org.uk>
    Cc: ye xingchen <ye.xingchen@zte.com.cn>
    Cc: Zeal Robot <zealci@zte.com.cn>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Baoquan He <bhe@redhat.com>
2024-05-15 10:32:31 +08:00
Baoquan He f44db8de6a kexec_file: print out debugging message if required
JIRA: https://issues.redhat.com/browse/RHEL-477

Upstream Status: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

Conflict: There's a conflict in kernel/crash_core.c because memory/cpu
          hotplug support on crash hasn't been back ported to rhel9. So
          skip that hunk for now.

commit a85ee18c7900f001f42082d2fabce4eaf57e655f
Author: Baoquan He <bhe@redhat.com>
Date:   Wed Dec 13 13:57:42 2023 +0800

    kexec_file: print out debugging message if required

    Then when specifying '-d' for the kexec_file_load interface, loaded locations
    of kernel/initrd/cmdline etc. can be printed out to help debug.

    Here replace pr_debug() with the newly added kexec_dprintk() in kexec_file
    loading related code.

    And also print out type/start/head of kimage and flags to help debug.

    Link: https://lkml.kernel.org/r/20231213055747.61826-3-bhe@redhat.com
    Signed-off-by: Baoquan He <bhe@redhat.com>
    Cc: Conor Dooley <conor@kernel.org>
    Cc: Joe Perches <joe@perches.com>
    Cc: Nathan Chancellor <nathan@kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Baoquan He <bhe@redhat.com>
2024-04-28 21:54:52 +08:00
Baoquan He ba5f45be9a kexec_file: add kexec_file flag to control debug printing
JIRA: https://issues.redhat.com/browse/RHEL-477

Upstream Status: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

Conflict: There are conflicts in
          2nd hunk of include/linux/kexec.h
          kernel/kexec_core.c
          because memory/cpu hotplug support on crash is not back
          ported to rhel9 yet.

commit cbc2fe9d9cb226347365753f50d81bc48cc3c52e
Author: Baoquan He <bhe@redhat.com>
Date:   Wed Dec 13 13:57:41 2023 +0800

    kexec_file: add kexec_file flag to control debug printing

    Patch series "kexec_file: print out debugging message if required", v4.

    Currently, specifying '-d' on the kexec command prints a lot of debugging
    information about kexec/kdump loading with the kexec_load interface.

    However, kexec_file_load prints nothing even though '-d' is specified.
    This makes it very inconvenient to debug or analyze kexec/kdump loading
    when something goes wrong with kexec/kdump itself or a developer wants to
    check the kexec/kdump loading.

    In this patchset, a kexec_file flag, KEXEC_FILE_DEBUG, is added and checked
    in code.  If it's passed in, debugging messages of kexec_file code will be
    printed out and can be seen from console and dmesg.  Otherwise, the
    debugging messages are printed as before, when pr_debug() is taken.

    Note:
    =====
    1) The code in the kexec-tools utility also needs to be changed to support
    passing KEXEC_FILE_DEBUG to the kernel when 'kexec -s -d' is specified.
    The patch link is here:
    =========
    [PATCH] kexec_file: add kexec_file flag to support debug printing
    http://lists.infradead.org/pipermail/kexec/2023-November/028505.html

    2) s390 also has kexec_file code, but I am not sure what debugging
    information is necessary there, so that is left to the s390 developers.

    Test:
    =====
    Testing was done in v1 on x86_64 and arm64. For v4, tested on x86_64
    again. And on x86_64, the printed messages look like below:
    --------------------------------------------------------------
    kexec measurement buffer for the loaded kernel at 0x207fffe000.
    Loaded purgatory at 0x207fff9000
    Loaded boot_param, command line and misc at 0x207fff3000 bufsz=0x1180 memsz=0x1180
    Loaded 64bit kernel at 0x207c000000 bufsz=0xc88200 memsz=0x3c4a000
    Loaded initrd at 0x2079e79000 bufsz=0x2186280 memsz=0x2186280
    Final command line is: root=/dev/mapper/fedora_intel--knightslanding--lb--02-root ro
    rd.lvm.lv=fedora_intel-knightslanding-lb-02/root console=ttyS0,115200N81 crashkernel=256M
    E820 memmap:
    0000000000000000-000000000009a3ff (1)
    000000000009a400-000000000009ffff (2)
    00000000000e0000-00000000000fffff (2)
    0000000000100000-000000006ff83fff (1)
    000000006ff84000-000000007ac50fff (2)
    ......
    000000207fff6150-000000207fff615f (128)
    000000207fff6160-000000207fff714f (1)
    000000207fff7150-000000207fff715f (128)
    000000207fff7160-000000207fff814f (1)
    000000207fff8150-000000207fff815f (128)
    000000207fff8160-000000207fffffff (1)
    nr_segments = 5
    segment[0]: buf=0x000000004e5ece74 bufsz=0x211 mem=0x207fffe000 memsz=0x1000
    segment[1]: buf=0x000000009e871498 bufsz=0x4000 mem=0x207fff9000 memsz=0x5000
    segment[2]: buf=0x00000000d879f1fe bufsz=0x1180 mem=0x207fff3000 memsz=0x2000
    segment[3]: buf=0x000000001101cd86 bufsz=0xc88200 mem=0x207c000000 memsz=0x3c4a000
    segment[4]: buf=0x00000000c6e38ac7 bufsz=0x2186280 mem=0x2079e79000 memsz=0x2187000
    kexec_file_load: type:0, start:0x207fff91a0 head:0x109e004002 flags:0x8
    ---------------------------------------------------------------------------

    This patch (of 7):

    When specifying 'kexec -c -d', the kexec_load interface prints loading
    information, e.g. the regions where kernel/initrd/purgatory/cmdline are
    put, the memmap passed to the 2nd kernel as system RAM ranges, and
    all contents of struct kexec_segment.  These are very
    helpful for analyzing or locating what's happening when kexec/kdump
    itself fails.  The debug printing for the kexec_load interface is done in
    the user space utility kexec-tools.

    With the kexec_file_load interface, however, 'kexec -s -d' prints nothing,
    because kexec_file code is mostly implemented in kernel space and the
    debug printing functionality is missing.  This is inconvenient when
    debugging kexec/kdump loading and jumping with the kexec_file_load interface.

    Now add KEXEC_FILE_DEBUG to kexec_file flag to control the debugging
    message printing.  And add global variable kexec_file_dbg_print and macro
    kexec_dprintk() to facilitate the printing.

    This is a preparation; later kexec_dprintk() will be used to replace the
    existing pr_debug().  Once 'kexec -s -d' is specified, it will print out
    kexec/kdump loading information.  If '-d' is not specified, it falls back
    to pr_debug().
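
    The flag-controlled printing described above can be modelled in user
    space roughly as follows.  This is only a sketch, not the kernel code:
    the sketch_* names, the flag value and the "[info]"/"[debug]" prefixes
    are assumptions made for illustration, standing in for
    KEXEC_FILE_DEBUG, kexec_file_dbg_print and kexec_dprintk().

```c
#include <assert.h>
#include <stdarg.h>
#include <stdbool.h>
#include <stdio.h>

/* Assumed value for illustration; the real flag is defined by the kernel. */
#define SKETCH_KEXEC_FILE_DEBUG 0x8UL

/* Stand-in for the global the commit message calls kexec_file_dbg_print. */
static bool sketch_dbg_print;

/* Record whether the caller passed the debug flag. */
static void sketch_set_debug(unsigned long flags)
{
	sketch_dbg_print = (flags & SKETCH_KEXEC_FILE_DEBUG) != 0;
}

/* Level a message would be logged at: visible "info" or filtered "debug". */
static const char *sketch_level(void)
{
	return sketch_dbg_print ? "info" : "debug";
}

/* Format a message with its level prefix into buf; returns the length. */
static int sketch_dprintk(char *buf, size_t len, const char *fmt, ...)
{
	va_list ap;
	int n, m;

	assert(buf != NULL && len > 0);
	n = snprintf(buf, len, "[%s] ", sketch_level());
	if (n < 0 || (size_t)n >= len)
		return n;
	va_start(ap, fmt);
	m = vsnprintf(buf + n, len - (size_t)n, fmt, ap);
	va_end(ap);
	return m < 0 ? m : n + m;
}
```

    With the flag set, the message is tagged at the level that is always
    shown; without it, the message keeps the debug level, which mirrors the
    fallback to pr_debug() described in the text.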

    Link: https://lkml.kernel.org/r/20231213055747.61826-1-bhe@redhat.com
    Link: https://lkml.kernel.org/r/20231213055747.61826-2-bhe@redhat.com
    Signed-off-by: Baoquan He <bhe@redhat.com>
    Cc: Conor Dooley <conor@kernel.org>
    Cc: Joe Perches <joe@perches.com>
    Cc: Nathan Chancellor <nathan@kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Baoquan He <bhe@redhat.com>
2024-04-28 21:52:02 +08:00
Baoquan He a223ce5f56 kexec: remove unnecessary arch_kexec_kernel_image_load()
JIRA: https://issues.redhat.com/browse/RHEL-517
Upstream Status: linux.git

This is back ported from upstream, no conflict.

commit fb15abdca64503511bb32cb6ff70da306f24fa06
Author: Bjorn Helgaas <bhelgaas@google.com>
Date:   Tue Mar 7 16:44:16 2023 -0600

    kexec: remove unnecessary arch_kexec_kernel_image_load()

    arch_kexec_kernel_image_load() only calls kexec_image_load_default(), and
    there are no arch-specific implementations.

    Remove the unnecessary arch_kexec_kernel_image_load() and make
    kexec_image_load_default() static.

    No functional change intended.

    Link: https://lkml.kernel.org/r/20230307224416.907040-3-helgaas@kernel.org
    Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
    Reviewed-by: Simon Horman <horms@kernel.org>
    Acked-by: Baoquan He <bhe@redhat.com>
    Cc: Borislav Petkov (AMD) <bp@alien8.de>
    Cc: Dave Hansen <dave.hansen@linux.intel.com>
    Cc: Eric Biederman <ebiederm@xmission.com>
    Cc: "H. Peter Anvin" <hpa@zytor.com>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Baoquan He <bhe@redhat.com>
2023-05-24 16:00:21 +08:00
Valentin Schneider 2e6264f252 panic, kexec: make __crash_kexec() NMI safe
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2166717
Upstream-status: https://github.com/torvalds/linux.git

commit 05c6257433b7212f07a7e53479a8ab038fc1666a
Author: Valentin Schneider <vschneid@redhat.com>
Date:   Thu Jun 30 23:32:58 2022 +0100

    panic, kexec: make __crash_kexec() NMI safe

    Attempting to get a crash dump out of a debug PREEMPT_RT kernel via an NMI
    panic() doesn't work.  The cause of that lies in the PREEMPT_RT definition
    of mutex_trylock():

	    if (IS_ENABLED(CONFIG_DEBUG_RT_MUTEXES) && WARN_ON_ONCE(!in_task()))
		    return 0;

    This prevents an nmi_panic() from executing the main body of
    __crash_kexec() which does the actual kexec into the kdump kernel.  The
    warning and return are explained by:

      6ce47fd961 ("rtmutex: Warn if trylock is called from hard/softirq context")
      [...]
      The reasons for this are:

	  1) There is a potential deadlock in the slowpath

	  2) Another cpu which blocks on the rtmutex will boost the task
	     which allegedly locked the rtmutex, but that cannot work
	     because the hard/softirq context borrows the task context.

    Furthermore, grabbing the lock isn't NMI safe, so do away with kexec_mutex
    and replace it with an atomic variable.  This is somewhat overzealous as
    *some* callsites could keep using a mutex (e.g.  the sysfs-facing ones
    like crash_shrink_memory()), but this has the benefit of involving a
    single unified lock and preventing any future NMI-related surprises.
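
    The idea of replacing a mutex with an atomic variable can be sketched in
    portable C11 as below.  This is a user-space illustration under stated
    assumptions, not the kernel implementation: the sketch_* names are
    hypothetical, and <stdatomic.h> stands in for the kernel's own atomic
    primitives.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the atomic variable that replaces kexec_mutex. */
static atomic_int sketch_kexec_lock;

/*
 * Try to take the lock with a single compare-and-exchange.  It never
 * sleeps or spins, and it does not depend on the caller having a task
 * context, which is the property an NMI path needs.
 */
static bool sketch_kexec_trylock(void)
{
	int expected = 0;

	return atomic_compare_exchange_strong_explicit(&sketch_kexec_lock,
						       &expected, 1,
						       memory_order_acquire,
						       memory_order_relaxed);
}

/* Release the lock; it must currently be held. */
static void sketch_kexec_unlock(void)
{
	int prev = atomic_exchange_explicit(&sketch_kexec_lock, 0,
					    memory_order_release);

	assert(prev == 1);
	(void)prev;
}
```

    A second trylock while the lock is held simply fails, so every caller
    has to handle failure instead of blocking.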

    Tested by triggering NMI panics via:

      $ echo 1 > /proc/sys/kernel/panic_on_unrecovered_nmi
      $ echo 1 > /proc/sys/kernel/unknown_nmi_panic
      $ echo 1 > /proc/sys/kernel/panic

      $ ipmitool power diag

    Link: https://lkml.kernel.org/r/20220630223258.4144112-3-vschneid@redhat.com
    Fixes: 6ce47fd961 ("rtmutex: Warn if trylock is called from hard/softirq context")
    Signed-off-by: Valentin Schneider <vschneid@redhat.com>
    Cc: Arnd Bergmann <arnd@arndb.de>
    Cc: Baoquan He <bhe@redhat.com>
    Cc: "Eric W . Biederman" <ebiederm@xmission.com>
    Cc: Juri Lelli <jlelli@redhat.com>
    Cc: Luis Claudio R. Goncalves <lgoncalv@redhat.com>
    Cc: Miaohe Lin <linmiaohe@huawei.com>
    Cc: Petr Mladek <pmladek@suse.com>
    Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Valentin Schneider <vschneid@redhat.com>
2023-02-06 09:36:01 +00:00
Baoquan He 6f800036f7 kexec_file: Fix kexec_file.c build error for riscv platform
Bugzilla: https://bugzilla.redhat.com/2119002
Upstream Status: Linus's tree
Conflict:
  There's a conflict in include/linux/kexec.h because commit
  3e35142ef99f ("kexec_file: drop weak attribute from
  arch_kexec_apply_relocations[_add]") has been back ported earlier.

commit 4853f68d158ac59b05985a6af5b7da7ccdbc14c8
Author: Liao Chang <liaochang1@huawei.com>
Date:   Fri Apr 8 18:09:09 2022 +0800

    kexec_file: Fix kexec_file.c build error for riscv platform

    When CONFIG_KEXEC_FILE is set for the riscv platform, the compilation of
    kernel/kexec_file.c generates a build error:

    kernel/kexec_file.c: In function 'crash_prepare_elf64_headers':
    ./arch/riscv/include/asm/page.h:110:71: error: request for member 'virt_addr' in something not a structure or union
      110 |  ((x) >= PAGE_OFFSET && (!IS_ENABLED(CONFIG_64BIT) || (x) < kernel_map.virt_addr))
          |                                                                       ^
    ./arch/riscv/include/asm/page.h:131:2: note: in expansion of macro 'is_linear_mapping'
      131 |  is_linear_mapping(_x) ?       \
          |  ^~~~~~~~~~~~~~~~~
    ./arch/riscv/include/asm/page.h:140:31: note: in expansion of macro '__va_to_pa_nodebug'
      140 | #define __phys_addr_symbol(x) __va_to_pa_nodebug(x)
          |                               ^~~~~~~~~~~~~~~~~~
    ./arch/riscv/include/asm/page.h:143:24: note: in expansion of macro '__phys_addr_symbol'
      143 | #define __pa_symbol(x) __phys_addr_symbol(RELOC_HIDE((unsigned long)(x), 0))
          |                        ^~~~~~~~~~~~~~~~~~
    kernel/kexec_file.c:1327:36: note: in expansion of macro '__pa_symbol'
     1327 |   phdr->p_offset = phdr->p_paddr = __pa_symbol(_text);

    This occurs because the "kernel_map" referenced in macro
    is_linear_mapping() is supposed to be the struct kernel_mapping instance
    defined in arch/riscv/mm/init.c, but the 2nd argument of
    crash_prepare_elf64_headers() has the same symbol name, so in the expansion
    of macro is_linear_mapping() in function crash_prepare_elf64_headers(),
    "kernel_map" actually is the local variable.
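
    The name capture described above can be reproduced in a few lines of
    standalone C.  The sketch below is an illustration only: the struct,
    macro and function names are hypothetical, and the shadowing parameter
    is deliberately a struct so that the example compiles and shows the
    wrong object being read.  In the reported failure the parameter was not
    a structure, so the member access failed to compile instead.

```c
#include <assert.h>

struct sketch_mapping {
	unsigned long virt_addr;
};

/* File-scope object that the macro is meant to reference. */
static struct sketch_mapping kernel_map = { .virt_addr = 0x1000 };

/* A macro resolves names where it is expanded, not where it is defined. */
#define SKETCH_BELOW_KERNEL(x) ((x) < kernel_map.virt_addr)

/* No local named kernel_map here, so the global is used. */
static int sketch_uses_global(unsigned long x)
{
	return SKETCH_BELOW_KERNEL(x);
}

/* The parameter shadows the global inside the macro expansion. */
static int sketch_shadowed(struct sketch_mapping kernel_map, unsigned long x)
{
	return SKETCH_BELOW_KERNEL(x);
}

/* Same input, different answers, purely because of the parameter name. */
static void sketch_demo(void)
{
	struct sketch_mapping local = { .virt_addr = 0x10 };

	assert(sketch_uses_global(0x100) == 1);
	assert(sketch_shadowed(local, 0x100) == 0);
}
```

    Renaming the parameter so that it no longer collides with the global is
    enough to make the macro see the intended object again.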

    Signed-off-by: Liao Chang <liaochang1@huawei.com>
    Link: https://lore.kernel.org/r/20220408100914.150110-2-lizhengyu3@huawei.com
    Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Signed-off-by: Baoquan He <bhe@redhat.com>
2022-11-16 03:47:04 -05:00
Baoquan He 528ffb0d20 kexec_file: increase maximum file size to 4G
Bugzilla: https://bugzilla.redhat.com/2119002
Upstream Status: Linus's tree
Conflict: None

commit f4da7afe07523ff8930c4466b09a15db18508cd4
Author: Pasha Tatashin <pasha.tatashin@soleen.com>
Date:   Fri May 27 02:55:35 2022 +0000

    kexec_file: increase maximum file size to 4G

    In some cases the initrd can be large.  For example, it could be a netboot
    image loaded by u-root, that is kexec'ing into it.

    The maximum size of initrd is arbitrarily set to 2G.  Also, the limit is not
    very obvious because it is hidden behind a generic INT_MAX macro.

    Theoretically, we could make it LONG_MAX, but it is safer to keep it sane,
    and just increase it to 4G.

    Increase the size to 4G, and make it obvious by having a new macro that
    specifies the maximum file size supported by kexec_file_load() syscall:
    KEXEC_FILE_SIZE_MAX.
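
    The effect of naming the limit can be sketched as follows.  The macro
    definitions here are assumptions for illustration (the old cap is
    modelled as a 32-bit INT_MAX, the new one as a plain 4G constant), and
    sketch_size_ok() is a hypothetical helper, not a kernel function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Old implicit cap: INT_MAX, modelled as the 32-bit maximum (just under 2G). */
#define SKETCH_OLD_LIMIT ((int64_t)INT32_MAX)

/* New explicit cap, assumed here to be exactly 4G. */
#define SKETCH_KEXEC_FILE_SIZE_MAX ((int64_t)4 << 30)

/* Accept a file only if it is non-empty and within the given limit. */
static bool sketch_size_ok(int64_t size, int64_t limit)
{
	assert(limit > 0);
	return size > 0 && size <= limit;
}
```

    A 3G initrd is rejected under the old cap and accepted under the new
    one, while anything above 4G is still refused.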

    Link: https://lkml.kernel.org/r/20220527025535.3953665-3-pasha.tatashin@soleen.com
    Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
    Cc: Sasha Levin <sashal@kernel.org>
    Cc: "Eric W. Biederman" <ebiederm@xmission.com>
    Cc: Greg Thelen <gthelen@google.com>
    Cc: Al Viro <viro@zeniv.linux.org.uk>
    Cc: Baoquan He <bhe@redhat.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Baoquan He <bhe@redhat.com>
2022-11-16 03:47:02 -05:00
Baoquan He a0145a6024 ima: force signature verification when CONFIG_KEXEC_SIG is configured
Bugzilla: https://bugzilla.redhat.com/2119002
Upstream Status: Linus's tree
Conflict: None

commit af16df54b89dee72df253abc5e7b5e8a6d16c11c
Author: Coiby Xu <coxu@redhat.com>
Date:   Wed Jul 13 15:21:11 2022 +0800

    ima: force signature verification when CONFIG_KEXEC_SIG is configured

    Currently, an unsigned kernel could be kexec'ed when IMA arch specific
    policy is configured unless lockdown is enabled. Enforce kernel
    signature verification check in the kexec_file_load syscall when IMA
    arch specific policy is configured.

    Fixes: 99d5cadfde ("kexec_file: split KEXEC_VERIFY_SIG into KEXEC_SIG and KEXEC_SIG_FORCE")
    Reported-and-suggested-by: Mimi Zohar <zohar@linux.ibm.com>
    Signed-off-by: Coiby Xu <coxu@redhat.com>
    Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>
Signed-off-by: Baoquan He <bhe@redhat.com>
2022-11-16 03:47:02 -05:00
Frantisek Hrbata e9e9bc8da2 Merge: mm changes through v5.18 for 9.2
Merge conflicts:
-----------------
Conflicts with !1142(merged) "io_uring: update to v5.15"

fs/io-wq.c
        - static bool io_wqe_create_worker(struct io_wqe *wqe, struct io_wqe_acct *acct)
          !1142 already contains backport of 3146cba99aa2 ("io-wq: make worker creation resilient against signals")
          along with other commits which are not present in !1370. Resolved in favor of HEAD(!1142)
        - static int io_wqe_worker(void *data)
          !1370 does not contain 767a65e9f317 ("io-wq: fix potential race of acct->nr_workers")
          Resolved in favor of HEAD(!1142)
        - static void io_init_new_worker(struct io_wqe *wqe, struct io_worker *worker,
          HEAD(!1142) does not contain e32cf5dfbe22 ("kthread: Generalize pf_io_worker so it can point to struct kthread")
          Resolved in favor of !1370
        - static void create_worker_cont(struct callback_head *cb)
          !1370 does not contain 66e70be72288 ("io-wq: fix memory leak in create_io_worker()")
          Resolved in favor of HEAD(!1142)
        - static void io_workqueue_create(struct work_struct *work)
          !1370 does not contain 66e70be72288 ("io-wq: fix memory leak in create_io_worker()")
          Resolved in favor of HEAD(!1142)
        - static bool create_io_worker(struct io_wq *wq, struct io_wqe *wqe, int index)
          !1370 does not contain 66e70be72288 ("io-wq: fix memory leak in create_io_worker()")
          Resolved in favor of HEAD(!1142)
        - static bool io_wq_work_match_item(struct io_wq_work *work, void *data)
          !1370 does not contain 713b9825a4c4 ("io-wq: fix cancellation on create-worker failure")
          Resolved in favor of HEAD(!1142)
        - static void io_wqe_enqueue(struct io_wqe *wqe, struct io_wq_work *work)
          !1370 is missing 713b9825a4c4 ("io-wq: fix cancellation on create-worker failure")
          removed wrongly merged run_cancel label
          Resolved in favor of HEAD(!1142)
        - static bool io_task_work_match(struct callback_head *cb, void *data)
          !1370 is missing 3b33e3f4a6c0 ("io-wq: fix silly logic error in io_task_work_match()")
          Resolved in favor of HEAD(!1142)
        - static void io_wq_exit_workers(struct io_wq *wq)
          !1370 is missing 3b33e3f4a6c0 ("io-wq: fix silly logic error in io_task_work_match()")
          Resolved in favor of HEAD(!1142)
        - int io_wq_max_workers(struct io_wq *wq, int *new_count)
          !1370 is missing 3b33e3f4a6c0 ("io-wq: fix silly logic error in io_task_work_match()")
fs/io_uring.c
        - static int io_register_iowq_max_workers(struct io_ring_ctx *ctx,
          !1370 is missing a bunch of commits after 2e480058ddc2 ("io-wq: provide a way to limit max number of workers")
          Resolved in favor of HEAD(!1142)
include/uapi/linux/io_uring.h
        - !1370 is missing dd47c104533d ("io-wq: provide IO_WQ_* constants for IORING_REGISTER_IOWQ_MAX_WORKERS arg items")
          just a comment conflict
          Resolved in favor of HEAD(!1142)
kernel/exit.c
        - void __noreturn do_exit(long code)
        - !1370 contains a bunch of commits after f552a27afe67 ("io_uring: remove files pointer in cancellation functions")
          Resolved in favor of !1370

Conflicts with !1357(merged) "NFS refresh for RHEL-9.2"

fs/nfs/callback.c
        - nfs4_callback_svc(void *vrqstp)
          !1370 is missing f49169c97fce ("NFSD: Remove svc_serv_ops::svo_module") where the module_put_and_kthread_exit() was removed
          Resolved in favor of HEAD(!1357)
fs/nfs/file.c
          !1357 is missing 187c82cb0380 ("fs: Convert trivial uses of __set_page_dirty_nobuffers to filemap_dirty_folio")
          Resolved in favor of HEAD(!1370)
fs/nfsd/nfssvc.c
        - nfsd(void *vrqstp)
          !1370 is missing f49169c97fce ("NFSD: Remove svc_serv_ops::svo_module")
          Resolved in favor of HEAD(!1357)
-----------------

MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/1370

Bugzilla: https://bugzilla.redhat.com/2120352

Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2099722

Patches 1-9 are changes to selftests
Patches 10-31 are reverts of RHEL-only patches to address COR CVE
Patches 32-320 are the machine dependent mm changes ported by Rafael
Patch 321 reverts the backport of 6692c98c7df5. See below.
Patches 322-981 are the machine independent mm changes
Patches 982-1016 are David Hildenbrand's upstream changes to address the COR CVE

RHEL commit b23c298982 fork: Stop protecting back_fork_cleanup_cgroup_lock with CONFIG_NUMA
which is a backport of upstream 6692c98c7df5 and is reverted early in this series. 6692c98c7df5
is a fix for upstream 40966e316f86 which was not in RHEL until this series. 6692c98c7df5 is re-added
after 40966e316f86.

Omitted-fix: 310d1344e3c5 ("Revert "powerpc: Remove unused FW_FEATURE_NATIVE references"")
        to be fixed under https://bugzilla.redhat.com/show_bug.cgi?id=2131716

Omitted-fix: 465d0eb0dc31 ("Docs/admin-guide/mm/damon/usage: fix the example code snip")
        to be fixed under https://bugzilla.redhat.com/show_bug.cgi?id=2131716

Omitted-fix: 317314527d17 ("mm/hugetlb: correct demote page offset logic")
        to be fixed under https://bugzilla.redhat.com/show_bug.cgi?id=2131716

Omitted-fix: 37dcc673d065 ("frontswap: don't call ->init if no ops are registered")
        to be fixed under https://bugzilla.redhat.com/show_bug.cgi?id=2131716

Omitted-fix: 30c19366636f ("mm: fix BUG splat with kvmalloc + GFP_ATOMIC")
        to be fixed under https://bugzilla.redhat.com/show_bug.cgi?id=2131716

Omitted-fix: fa84693b3c89 io_uring: ensure IORING_REGISTER_IOWQ_MAX_WORKERS works with SQPOLL
	fixed under https://bugzilla.redhat.com/show_bug.cgi?id=2107656

Omitted-fix: 009ad9f0c6ee io_uring: drop ctx->uring_lock before acquiring sqd->lock
	fixed under https://bugzilla.redhat.com/show_bug.cgi?id=2107656

Omitted-fix: bc369921d670 io-wq: max_worker fixes
	fixed under https://bugzilla.redhat.com/show_bug.cgi?id=2107743

Omitted-fix: e139a1ec92f8 io_uring: apply max_workers limit to all future users
	fixed under https://bugzilla.redhat.com/show_bug.cgi?id=2107743

Omitted-fix: 71c9ce27bb57 io-wq: fix max-workers not correctly set on multi-node system
	fixed under https://bugzilla.redhat.com/show_bug.cgi?id=2107743

Omitted-fix: 41d3a6bd1d37 io_uring: pin SQPOLL data before unlocking ring lock
	fixed under https://bugzilla.redhat.com/show_bug.cgi?id=2107656

Omitted-fix: bad119b9a000 io_uring: honour zeroes as io-wq worker limits
	fixed under https://bugzilla.redhat.com/show_bug.cgi?id=2107743

Omitted-fix: 08bdbd39b584 io-wq: ensure that hash wait lock is IRQ disabling
	fixed under https://bugzilla.redhat.com/show_bug.cgi?id=2107656

Omitted-fix: 713b9825a4c4 io-wq: fix cancellation on create-worker failure
	fixed under https://bugzilla.redhat.com/show_bug.cgi?id=2107656

Omitted-fix: 3b33e3f4a6c0 io-wq: fix silly logic error in io_task_work_match()
	fixed under https://bugzilla.redhat.com/show_bug.cgi?id=2107656

Omitted-fix: 71e1cef2d794 io-wq: Remove duplicate code in io_workqueue_create()
	fixed under https://bugzilla.redhat.com/show_bug.cgi?id=210774

Omitted-fix: a226abcd5d42 io-wq: don't retry task_work creation failure on fatal conditions
	fixed under https://bugzilla.redhat.com/show_bug.cgi?id=2107743

Omitted-fix: fa84693b3c89 io_uring: ensure IORING_REGISTER_IOWQ_MAX_WORKERS works with SQPOLL
        fixed under https://bugzilla.redhat.com/show_bug.cgi?id=2107656

Omitted-fix: dd47c104533d io-wq: provide IO_WQ_* constants for IORING_REGISTER_IOWQ_MAX_WORKERS arg items
        fixed under https://bugzilla.redhat.com/show_bug.cgi?id=2107656

Omitted-fix: 4f0712ccec09 hexagon: Fix function name in die()
	unsupported arch

Omitted-fix: 751971af2e36 csky: Fix function name in csky_alignment() and die()
	unsupported arch

Omitted-fix: dcbc65aac283 ptrace: Remove duplicated include in ptrace.c
        unsupported arch

Omitted-fix: eb48d4219879 drm/i915: Fix oops due to missing stack depot
	fixed in RHEL commit 105d2d4832 Merge DRM changes from upstream v5.16..v5.17

Omitted-fix: 751a9d69b197 drm/i915: Fix oops due to missing stack depot
	fixed in RHEL commit 99fc716fc4 Merge DRM changes from upstream v5.17..v5.18

Omitted-fix: b95dc06af3e6 drm/amdgpu: disable runpm if we are the primary adapter
        reverted later

Omitted-fix: 5a90c24ad028 Revert "drm/amdgpu: disable runpm if we are the primary adapter"
        revert of above omitted fix

Omitted-fix: 724bbe49c5e4 fs/ntfs3: provide block_invalidate_folio to fix memory leak
	unsupported fs

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>

Approved-by: John W. Linville <linville@redhat.com>
Approved-by: Jiri Benc <jbenc@redhat.com>
Approved-by: Jarod Wilson <jarod@redhat.com>
Approved-by: Prarit Bhargava <prarit@redhat.com>
Approved-by: Lyude Paul <lyude@redhat.com>
Approved-by: Donald Dutile <ddutile@redhat.com>
Approved-by: Rafael Aquini <aquini@redhat.com>
Approved-by: Phil Auld <pauld@redhat.com>
Approved-by: Waiman Long <longman@redhat.com>

Signed-off-by: Frantisek Hrbata <fhrbata@redhat.com>
2022-10-23 19:49:41 +02:00
Chris von Recklinghausen 880e8c868a memblock: add MEMBLOCK_DRIVER_MANAGED to mimic IORESOURCE_SYSRAM_DRIVER_MANAGED
Bugzilla: https://bugzilla.redhat.com/2120352

commit f7892d8e288d4b090176f26d9bf7943dbbb639a6
Author: David Hildenbrand <david@redhat.com>
Date:   Fri Nov 5 13:44:53 2021 -0700

    memblock: add MEMBLOCK_DRIVER_MANAGED to mimic IORESOURCE_SYSRAM_DRIVER_MANAGED

    Let's add a flag that corresponds to IORESOURCE_SYSRAM_DRIVER_MANAGED,
    indicating that we're dealing with a memory region that is never
    indicated in the firmware-provided memory map, but always detected and
    added by a driver.

    Similar to MEMBLOCK_HOTPLUG, most infrastructure has to treat such
    memory regions like ordinary MEMBLOCK_NONE memory regions -- for
    example, when selecting memory regions to add to the vmcore for dumping
    in the crashkernel via for_each_mem_range().

    However, especially kexec_file is not supposed to select such memblocks
    via for_each_free_mem_range() / for_each_free_mem_range_reverse() to
    place kexec images, similar to how we handle
    IORESOURCE_SYSRAM_DRIVER_MANAGED without CONFIG_ARCH_KEEP_MEMBLOCK.

    We'll make sure that memory hotplug code sets the flag where applicable
    (IORESOURCE_SYSRAM_DRIVER_MANAGED) next.  This prepares architectures
    that need CONFIG_ARCH_KEEP_MEMBLOCK, such as arm64, for virtio-mem
    support.

    Note that kexec *must not* indicate this memory to the second kernel and
    *must not* place kexec-images on this memory.  Let's add a comment to
    kexec_walk_memblock(), documenting how we handle MEMBLOCK_DRIVER_MANAGED
    now just like using IORESOURCE_SYSRAM_DRIVER_MANAGED in
    locate_mem_hole_callback() for kexec_walk_resources().

    Also note that MEMBLOCK_HOTPLUG cannot be reused due to different
    semantics:
            MEMBLOCK_HOTPLUG: memory is indicated as "System RAM" in the
            firmware-provided memory map and added to the system early during
            boot; kexec *has to* indicate this memory to the second kernel and
            can place kexec-images on this memory. After memory hotunplug,
            kexec has to be re-armed. We mostly ignore this flag when
            "movable_node" is not set on the kernel command line, because
            then we're told to not care about hotunpluggability of such
            memory regions.

            MEMBLOCK_DRIVER_MANAGED: memory is not indicated as "System RAM" in
            the firmware-provided memory map; this memory is always detected
            and added to the system by a driver; memory might not actually be
            physically hotunpluggable. kexec *must not* indicate this memory to
            the second kernel and *must not* place kexec-images on this memory.
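
    The two sets of semantics above can be summarised as a small policy
    table.  The sketch below is only a model of the rules stated in this
    message: the flag values and the sketch_* helper names are assumptions
    for illustration and do not correspond to kernel definitions.

```c
#include <assert.h>
#include <stdbool.h>

/* Flag values are assumptions for this sketch, not the kernel's. */
#define SKETCH_NONE           0x0u
#define SKETCH_HOTPLUG        0x1u
#define SKETCH_DRIVER_MANAGED 0x2u
#define SKETCH_KNOWN_FLAGS    (SKETCH_HOTPLUG | SKETCH_DRIVER_MANAGED)

/* May a kexec image be placed in a region carrying these flags? */
static bool sketch_may_place_image(unsigned int flags)
{
	assert((flags & ~SKETCH_KNOWN_FLAGS) == 0);
	return !(flags & SKETCH_DRIVER_MANAGED);
}

/* Is the region indicated to the second kernel? */
static bool sketch_indicate_to_second_kernel(unsigned int flags)
{
	assert((flags & ~SKETCH_KNOWN_FLAGS) == 0);
	return !(flags & SKETCH_DRIVER_MANAGED);
}
```

    Hotplug regions behave like ordinary memory for both questions, whereas
    driver-managed regions are excluded from both, which is why the existing
    hotplug flag could not simply be reused.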

    Link: https://lkml.kernel.org/r/20211004093605.5830-5-david@redhat.com
    Signed-off-by: David Hildenbrand <david@redhat.com>
    Reviewed-by: Mike Rapoport <rppt@linux.ibm.com>
    Cc: "Aneesh Kumar K . V" <aneesh.kumar@linux.ibm.com>
    Cc: Arnd Bergmann <arnd@arndb.de>
    Cc: Christian Borntraeger <borntraeger@de.ibm.com>
    Cc: Eric Biederman <ebiederm@xmission.com>
    Cc: Geert Uytterhoeven <geert@linux-m68k.org>
    Cc: Heiko Carstens <hca@linux.ibm.com>
    Cc: Huacai Chen <chenhuacai@kernel.org>
    Cc: Jianyong Wu <Jianyong.Wu@arm.com>
    Cc: Jiaxun Yang <jiaxun.yang@flygoat.com>
    Cc: Michal Hocko <mhocko@suse.com>
    Cc: Oscar Salvador <osalvador@suse.de>
    Cc: Shahab Vahedi <shahab@synopsys.com>
    Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
    Cc: Vasily Gorbik <gor@linux.ibm.com>
    Cc: Vineet Gupta <vgupta@kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2022-10-12 07:27:30 -04:00
Coiby Xu 3cabfd5ac1 kexec, KEYS: make the code in bzImage64_verify_sig generic
Bugzilla: https://bugzilla.redhat.com/2004384

Upstream Status: https://github.com/torvalds/linux.git

commit c903dae8941deb55043ee46ded29e84e97cd84bb
Author: Coiby Xu <coxu@redhat.com>
Date:   Thu Jul 14 21:40:25 2022 +0800

    kexec, KEYS: make the code in bzImage64_verify_sig generic

    commit 278311e417 ("kexec, KEYS: Make use of platform keyring for
    signature verify") adds platform keyring support on x86 kexec but not
    arm64.

    The code in bzImage64_verify_sig uses the keys on the
    .builtin_trusted_keys, .machine (if configured and enabled),
    .secondary_trusted_keys (also if configured), and .platform keyrings
    to verify the signed kernel image as a PE file.

    Cc: kexec@lists.infradead.org
    Cc: keyrings@vger.kernel.org
    Cc: linux-security-module@vger.kernel.org
    Reviewed-by: Michal Suchanek <msuchanek@suse.de>
    Signed-off-by: Coiby Xu <coxu@redhat.com>
    Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>

Signed-off-by: Coiby Xu <coxu@redhat.com>
2022-09-19 17:23:58 +08:00
Coiby Xu d84de2a203 kexec: clean up arch_kexec_kernel_verify_sig
Bugzilla: https://bugzilla.redhat.com/2004384

Upstream Status: https://github.com/torvalds/linux.git

commit 689a71493bd2f31c024f8c0395f85a1fd4b2138e
Author: Coiby Xu <coxu@redhat.com>
Date:   Thu Jul 14 21:40:24 2022 +0800

    kexec: clean up arch_kexec_kernel_verify_sig

    Before commit 105e10e2cf1c ("kexec_file: drop weak attribute from
    functions"), there was already no arch-specific implementation
    of arch_kexec_kernel_verify_sig. With weak attribute dropped by that
    commit, arch_kexec_kernel_verify_sig is completely useless. So clean it
    up.

    Note later patches are dependent on this patch so it should be backported
    to the stable tree as well.

    Cc: stable@vger.kernel.org
    Suggested-by: Eric W. Biederman <ebiederm@xmission.com>
    Reviewed-by: Michal Suchanek <msuchanek@suse.de>
    Acked-by: Baoquan He <bhe@redhat.com>
    Signed-off-by: Coiby Xu <coxu@redhat.com>
    [zohar@linux.ibm.com: reworded patch description "Note"]
    Link: https://lore.kernel.org/linux-integrity/20220714134027.394370-1-coxu@redhat.com/
    Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>

Signed-off-by: Coiby Xu <coxu@redhat.com>
2022-09-19 17:23:55 +08:00
Coiby Xu 7b2928e7d4 kexec_file: drop weak attribute from functions
Bugzilla: https://bugzilla.redhat.com/2004384

Upstream Status: https://github.com/torvalds/linux.git

commit 65d9a9a60fd71be964effb2e94747a6acb6e7015
Author: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Date:   Fri Jul 1 13:04:04 2022 +0530

    kexec_file: drop weak attribute from functions

    As requested
    (http://lkml.kernel.org/r/87ee0q7b92.fsf@email.froward.int.ebiederm.org),
    this series converts weak functions in kexec to use the #ifdef approach.

    Quoting the 3e35142ef99fe ("kexec_file: drop weak attribute from
    arch_kexec_apply_relocations[_add]") changelog:

    : Since commit d1bcae833b32f1 ("ELF: Don't generate unused section symbols")
    : [1], binutils (v2.36+) started dropping section symbols that it thought
    : were unused.  This isn't an issue in general, but with kexec_file.c, gcc
    : is placing kexec_arch_apply_relocations[_add] into a separate
    : .text.unlikely section and the section symbol ".text.unlikely" is being
    : dropped.  Due to this, recordmcount is unable to find a non-weak symbol in
    : .text.unlikely to generate a relocation record against.

    This patch (of 2):

    Drop __weak attribute from functions in kexec_file.c:
    - arch_kexec_kernel_image_probe()
    - arch_kimage_file_post_load_cleanup()
    - arch_kexec_kernel_image_load()
    - arch_kexec_locate_mem_hole()
    - arch_kexec_kernel_verify_sig()

    arch_kexec_kernel_image_load() calls into kexec_image_load_default(), so
    drop the static attribute for the latter.

    arch_kexec_kernel_verify_sig() is not overridden by any architecture, so
    drop the __weak attribute.

    Link: https://lkml.kernel.org/r/cover.1656659357.git.naveen.n.rao@linux.vnet.ibm.com
    Link: https://lkml.kernel.org/r/2cd7ca1fe4d6bb6ca38e3283c717878388ed6788.1656659357.git.naveen.n.rao@linux.vnet.ibm.com
    Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
    Suggested-by: Eric Biederman <ebiederm@xmission.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>

Signed-off-by: Coiby Xu <coxu@redhat.com>
2022-09-19 17:23:50 +08:00
Coiby Xu 2f6c008a8a kexec_file: drop weak attribute from arch_kexec_apply_relocations[_add]
Bugzilla: https://bugzilla.redhat.com/2004384

Upstream Status: https://github.com/torvalds/linux.git

commit 3e35142ef99fe6b4fe5d834ad43ee13cca10a2dc
Author: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Date:   Thu May 19 14:42:37 2022 +0530

    kexec_file: drop weak attribute from arch_kexec_apply_relocations[_add]

    Since commit d1bcae833b32f1 ("ELF: Don't generate unused section
    symbols") [1], binutils (v2.36+) started dropping section symbols that
    it thought were unused.  This isn't an issue in general, but with
    kexec_file.c, gcc is placing kexec_arch_apply_relocations[_add] into a
    separate .text.unlikely section and the section symbol ".text.unlikely"
    is being dropped. Due to this, recordmcount is unable to find a non-weak
    symbol in .text.unlikely to generate a relocation record against.

    Address this by dropping the weak attribute from these functions.
    Instead, follow the existing pattern of having architectures #define the
    name of the function they want to override in their headers.

    [1] https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=d1bcae833b32f1

    [akpm@linux-foundation.org: arch/s390/include/asm/kexec.h needs linux/module.h]
    Link: https://lkml.kernel.org/r/20220519091237.676736-1-naveen.n.rao@linux.vnet.ibm.com
    Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
    Cc: "Eric W. Biederman" <ebiederm@xmission.com>
    Cc: <stable@vger.kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Coiby Xu <coxu@redhat.com>
2022-09-19 17:23:47 +08:00
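The override pattern that commit 2f6c008a8a switches to (architectures #define the name of the function they override, instead of relying on a weak symbol) can be sketched in a single translation unit. This is an illustrative sketch only, not kernel code: `arch_hook_sketch` is a hypothetical name, and in practice the two halves would live in an architecture header and in generic code respectively.

```c
/*
 * "Architecture header" half: an architecture that wants to override the
 * hook provides its own definition and defines a macro of the same name.
 * (arch_hook_sketch is a hypothetical name used only for illustration.)
 */
static inline int arch_hook_sketch(int value)
{
	return value * 2;	/* arch-specific behaviour */
}
#define arch_hook_sketch arch_hook_sketch

/*
 * "Generic code" half: the default is compiled only when no architecture
 * has claimed the name, so no weak symbol is needed.
 */
#ifndef arch_hook_sketch
static inline int arch_hook_sketch(int value)
{
	return value;		/* generic fallback */
}
#endif
```

Because the choice is made by the preprocessor, only one definition ever reaches the object file, which sidesteps the section-symbol problem described in the commit message.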
Jia-Ju Bai 31d82c2c78 kernel: kexec_file: fix error return code of kexec_calculate_store_digests()
When vzalloc() returns NULL to sha_regions, no error return code of
kexec_calculate_store_digests() is assigned.  To fix this bug, ret is
assigned with -ENOMEM in this case.

Link: https://lkml.kernel.org/r/20210309083904.24321-1-baijiaju1990@gmail.com
Fixes: a43cac0d9d ("kexec: split kexec_file syscall code to kexec_file.c")
Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com>
Reported-by: TOTE Robot <oslab@tsinghua.edu.cn>
Acked-by: Baoquan He <bhe@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-05-07 00:26:32 -07:00
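The class of bug fixed by commit 31d82c2c78 can be sketched in user space as follows. This is a simplified model under stated assumptions, not the kernel function: `store_digests_sketch`, `alloc_fn_sketch` and `failing_alloc_sketch` are hypothetical names, and `malloc()`-style allocation stands in for `vzalloc()`.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical allocator hook so the failure path can be exercised. */
typedef void *(*alloc_fn_sketch)(size_t size);

static void *failing_alloc_sketch(size_t size)
{
	(void)size;
	return NULL;		/* simulate an out-of-memory condition */
}

static int store_digests_sketch(alloc_fn_sketch alloc, size_t nr_regions)
{
	int ret = 0;
	void *regions;

	regions = alloc(nr_regions * 16);
	if (!regions) {
		ret = -ENOMEM;	/* without this, this sketch would return 0 */
		goto out;
	}

	/* ... digest calculation would go here ... */

	free(regions);
out:
	return ret;
}
```

Calling `store_digests_sketch(failing_alloc_sketch, 4)` exercises the error path; passing `malloc` exercises the success path.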
Lakshmi Ramasubramanian f31e3386a4 ima: Free IMA measurement buffer after kexec syscall
IMA allocates kernel virtual memory to carry forward the measurement
list, from the current kernel to the next kernel on kexec system call,
in ima_add_kexec_buffer() function.  This buffer is not freed before
the kexec system call completes, resulting in a memory leak.

Add ima_buffer field in "struct kimage" to store the virtual address
of the buffer allocated for the IMA measurement list.
Free the memory allocated for the IMA measurement list in
kimage_file_post_load_cleanup() function.

Signed-off-by: Lakshmi Ramasubramanian <nramas@linux.microsoft.com>
Suggested-by: Tyler Hicks <tyhicks@linux.microsoft.com>
Reviewed-by: Thiago Jung Bauermann <bauerman@linux.ibm.com>
Reviewed-by: Tyler Hicks <tyhicks@linux.microsoft.com>
Fixes: 7b8589cc29 ("ima: on soft reboot, save the measurement list")
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>
2021-02-10 15:49:38 -05:00
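The approach described in commit f31e3386a4 (remember the buffer address in the image structure, then free it in the post-load cleanup) can be sketched in user space. This is a minimal sketch under assumptions: `kimage_sketch`, `add_ima_buffer_sketch` and `post_load_cleanup_sketch` are hypothetical names, and `malloc()`/`free()` stand in for the kernel's allocation functions.

```c
#include <stdlib.h>

struct kimage_sketch {
	void *ima_buffer;	/* remembered so cleanup can free it */
};

/* Allocate the buffer and record its address in the image. */
static int add_ima_buffer_sketch(struct kimage_sketch *image, size_t size)
{
	void *buf = malloc(size);

	if (!buf)
		return -1;

	image->ima_buffer = buf;
	return 0;
}

/* Release the recorded buffer; clearing the field makes a second call harmless. */
static void post_load_cleanup_sketch(struct kimage_sketch *image)
{
	free(image->ima_buffer);
	image->ima_buffer = NULL;
}
```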
Eric Biggers a24d22b225 crypto: sha - split sha.h into sha1.h and sha2.h
Currently <crypto/sha.h> contains declarations for both SHA-1 and SHA-2,
and <crypto/sha3.h> contains declarations for SHA-3.

This organization is inconsistent, but more importantly SHA-1 is no
longer considered to be cryptographically secure.  So to the extent
possible, SHA-1 shouldn't be grouped together with any of the other SHA
versions, and usage of it should be phased out.

Therefore, split <crypto/sha.h> into two headers <crypto/sha1.h> and
<crypto/sha2.h>, and make everyone explicitly specify whether they want
the declarations for SHA-1, SHA-2, or both.

This avoids making the SHA-1 declarations visible to files that don't
want anything to do with SHA-1.  It also prepares for potentially moving
sha1.h into a new insecure/ or dangerous/ directory.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2020-11-20 14:45:33 +11:00
David Hildenbrand 7cf603d17d kernel/resource: move and rename IORESOURCE_MEM_DRIVER_MANAGED
IORESOURCE_MEM_DRIVER_MANAGED currently uses an unused PnP bit, which is
always set to 0 by hardware.  This is far from beautiful (and confusing),
and the bit only applies to SYSRAM.  So let's move it out of the
bus-specific (PnP) defined bits.

We'll add another SYSRAM specific bit soon.  If we ever need more bits for
other purposes, we can steal some from "desc", or reshuffle/regroup what
we have.

Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Kees Cook <keescook@chromium.org>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Wei Yang <richardw.yang@linux.intel.com>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Anton Blanchard <anton@ozlabs.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Julien Grall <julien@xen.org>
Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Cc: Len Brown <lenb@kernel.org>
Cc: Leonardo Bras <leobras.c@gmail.com>
Cc: Libor Pechacek <lpechacek@suse.cz>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Nathan Lynch <nathanl@linux.ibm.com>
Cc: "Oliver O'Halloran" <oohall@gmail.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Pingfan Liu <kernelfans@gmail.com>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Roger Pau Monné <roger.pau@citrix.com>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Stephen Hemminger <sthemmin@microsoft.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Cc: Wei Liu <wei.liu@kernel.org>
Link: https://lkml.kernel.org/r/20200911103459.10306-3-david@redhat.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-16 11:11:18 -07:00
Kees Cook 0fa8e08464 fs/kernel_file_read: Add "offset" arg for partial reads
To perform partial reads, callers of kernel_read_file*() must have a
non-NULL file_size argument and a preallocated buffer. The new "offset"
argument can then be used to seek to specific locations in the file to
fill the buffer to, at most, "buf_size" per call.

Where possible, the LSM hooks can report whether a full file has been
read or not so that the contents can be reasoned about.

Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20201002173828.2099543-14-keescook@chromium.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-10-05 13:37:04 +02:00
Kees Cook 885352881f fs/kernel_read_file: Add file_size output argument
In preparation for adding partial read support, add an optional output
argument to kernel_read_file*() that reports the file size so callers
can reason more easily about their reading progress.

Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Mimi Zohar <zohar@linux.ibm.com>
Reviewed-by: Luis Chamberlain <mcgrof@kernel.org>
Reviewed-by: James Morris <jamorris@linux.microsoft.com>
Acked-by: Scott Branden <scott.branden@broadcom.com>
Link: https://lore.kernel.org/r/20201002173828.2099543-8-keescook@chromium.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-10-05 13:37:03 +02:00
Kees Cook f7a4f689bc fs/kernel_read_file: Remove redundant size argument
In preparation for refactoring kernel_read_file*(), remove the redundant
"size" argument which is not needed: it can be included in the return
code, with callers adjusted. (VFS reads already cannot be larger than
INT_MAX.)

Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Mimi Zohar <zohar@linux.ibm.com>
Reviewed-by: Luis Chamberlain <mcgrof@kernel.org>
Reviewed-by: James Morris <jamorris@linux.microsoft.com>
Acked-by: Scott Branden <scott.branden@broadcom.com>
Link: https://lore.kernel.org/r/20201002173828.2099543-6-keescook@chromium.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-10-05 13:34:18 +02:00
Scott Branden b89999d004 fs/kernel_read_file: Split into separate include file
Move kernel_read_file* out of linux/fs.h to its own linux/kernel_read_file.h
include file. linux/fs.h gets pulled in just about everywhere
and doesn't really need functions not related to the general fs interface.

Suggested-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Scott Branden <scott.branden@broadcom.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Mimi Zohar <zohar@linux.ibm.com>
Reviewed-by: Luis Chamberlain <mcgrof@kernel.org>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: James Morris <jamorris@linux.microsoft.com>
Link: https://lore.kernel.org/r/20200706232309.12010-2-scott.branden@broadcom.com
Link: https://lore.kernel.org/r/20201002173828.2099543-4-keescook@chromium.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-10-05 13:34:18 +02:00
Linus Torvalds 50f6c7dbd9 Misc fixes and small updates all around the place:
Merge tag 'x86-urgent-2020-08-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fixes from Ingo Molnar:
 "Misc fixes and small updates all around the place:

   - Fix mitigation state sysfs output

   - Fix an FPU xstate/xsave code assumption bug triggered by
     Architectural LBR support

   - Fix Lightning Mountain SoC TSC frequency enumeration bug

   - Fix kexec debug output

   - Fix kexec memory range assumption bug

   - Fix a boundary condition in the crash kernel code

   - Optimize purgatory.ro generation a bit

   - Enable ACRN guests to use X2APIC mode

   - Reduce a __text_poke() IRQs-off critical section for the benefit of
     PREEMPT_RT"

* tag 'x86-urgent-2020-08-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/alternatives: Acquire pte lock with interrupts enabled
  x86/bugs/multihit: Fix mitigation reporting when VMX is not in use
  x86/fpu/xstate: Fix an xstate size check warning with architectural LBRs
  x86/purgatory: Don't generate debug info for purgatory.ro
  x86/tsr: Fix tsc frequency enumeration bug on Lightning Mountain SoC
  kexec_file: Correctly output debugging information for the PT_LOAD ELF header
  kexec: Improve & fix crash_exclude_mem_range() to handle overlapping ranges
  x86/crash: Correct the address boundary of function parameters
  x86/acrn: Remove redundant chars from ACRN signature
  x86/acrn: Allow ACRN guest to use X2APIC mode
2020-08-15 10:38:03 -07:00
Linus Torvalds 25d8d4eeca powerpc updates for 5.9
Merge tag 'powerpc-5.9-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc updates from Michael Ellerman:

 - Add support for (optionally) using queued spinlocks & rwlocks.

 - Support for a new faster system call ABI using the scv instruction on
   Power9 or later.

 - Drop support for the PROT_SAO mmap/mprotect flag as it will be
   unsupported on Power10 and future processors, leaving us with no way
   to implement the functionality it requests. This risks breaking
   userspace, though we believe it is unused in practice.

 - A bug fix for, and then the removal of, our custom stack expansion
   checking. We now allow stack expansion up to the rlimit, like other
   architectures.

 - Remove the remnants of our (previously disabled) topology update
   code, which tried to react to NUMA layout changes on virtualised
   systems, but was prone to crashes and other problems.

 - Add PMU support for Power10 CPUs.

 - A change to our signal trampoline so that we don't unbalance the link
   stack (branch return predictor) in the signal delivery path.

 - Lots of other cleanups, refactorings, smaller features and so on as
   usual.

Thanks to: Abhishek Goel, Alastair D'Silva, Alexander A. Klimov, Alexey
Kardashevskiy, Alistair Popple, Andrew Donnellan, Aneesh Kumar K.V, Anju
T Sudhakar, Anton Blanchard, Arnd Bergmann, Athira Rajeev, Balamuruhan
S, Bharata B Rao, Bill Wendling, Bin Meng, Cédric Le Goater, Chris
Packham, Christophe Leroy, Christoph Hellwig, Daniel Axtens, Dan
Williams, David Lamparter, Desnes A. Nunes do Rosario, Erhard F., Finn
Thain, Frederic Barrat, Ganesh Goudar, Gautham R. Shenoy, Geoff Levand,
Greg Kurz, Gustavo A. R. Silva, Hari Bathini, Harish, Imre Kaloz, Joel
Stanley, Joe Perches, John Crispin, Jordan Niethe, Kajol Jain, Kamalesh
Babulal, Kees Cook, Laurent Dufour, Leonardo Bras, Li RongQing, Madhavan
Srinivasan, Mahesh Salgaonkar, Mark Cave-Ayland, Michal Suchanek, Milton
Miller, Mimi Zohar, Murilo Opsfelder Araujo, Nathan Chancellor, Nathan
Lynch, Naveen N. Rao, Nayna Jain, Nicholas Piggin, Oliver O'Halloran,
Palmer Dabbelt, Pedro Miraglia Franco de Carvalho, Philippe Bergheaud,
Pingfan Liu, Pratik Rajesh Sampat, Qian Cai, Qinglang Miao, Randy
Dunlap, Ravi Bangoria, Sachin Sant, Sam Bobroff, Sandipan Das, Santosh
Sivaraj, Satheesh Rajendran, Shirisha Ganta, Sourabh Jain, Srikar
Dronamraju, Stan Johnson, Stephen Rothwell, Thadeu Lima de Souza
Cascardo, Thiago Jung Bauermann, Tom Lane, Vaibhav Jain, Vladis Dronov,
Wei Yongjun, Wen Xiong, YueHaibing.

* tag 'powerpc-5.9-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (337 commits)
  selftests/powerpc: Fix pkey syscall redefinitions
  powerpc: Fix circular dependency between percpu.h and mmu.h
  powerpc/powernv/sriov: Fix use of uninitialised variable
  selftests/powerpc: Skip vmx/vsx/tar/etc tests on older CPUs
  powerpc/40x: Fix assembler warning about r0
  powerpc/papr_scm: Add support for fetching nvdimm 'fuel-gauge' metric
  powerpc/papr_scm: Fetch nvdimm performance stats from PHYP
  cpuidle: pseries: Fixup exit latency for CEDE(0)
  cpuidle: pseries: Add function to parse extended CEDE records
  cpuidle: pseries: Set the latency-hint before entering CEDE
  selftests/powerpc: Fix online CPU selection
  powerpc/perf: Consolidate perf_callchain_user_[64|32]()
  powerpc/pseries/hotplug-cpu: Remove double free in error path
  powerpc/pseries/mobility: Add pr_debug() for device tree changes
  powerpc/pseries/mobility: Set pr_fmt()
  powerpc/cacheinfo: Warn if cache object chain becomes unordered
  powerpc/cacheinfo: Improve diagnostics about malformed cache lists
  powerpc/cacheinfo: Use name@unit instead of full DT path in debug messages
  powerpc/cacheinfo: Set pr_fmt()
  powerpc: fix function annotations to avoid section mismatch warnings with gcc-10
  ...
2020-08-07 10:33:50 -07:00
Lianbo Jiang 475f63ae63 kexec_file: Correctly output debugging information for the PT_LOAD ELF header
Currently, when we enable the debugging switch to debug kexec_file,
we always get the following incorrect results:

  kexec_file: Crash PT_LOAD elf header. phdr=00000000c988639b vaddr=0x0, paddr=0x0, sz=0x0 e_phnum=51 p_offset=0x0
  kexec_file: Crash PT_LOAD elf header. phdr=000000003cca69a0 vaddr=0x0, paddr=0x0, sz=0x0 e_phnum=52 p_offset=0x0
  kexec_file: Crash PT_LOAD elf header. phdr=00000000c584cb9f vaddr=0x0, paddr=0x0, sz=0x0 e_phnum=53 p_offset=0x0
  kexec_file: Crash PT_LOAD elf header. phdr=00000000cf85d57f vaddr=0x0, paddr=0x0, sz=0x0 e_phnum=54 p_offset=0x0
  kexec_file: Crash PT_LOAD elf header. phdr=00000000a4a8f847 vaddr=0x0, paddr=0x0, sz=0x0 e_phnum=55 p_offset=0x0
  kexec_file: Crash PT_LOAD elf header. phdr=00000000272ec49f vaddr=0x0, paddr=0x0, sz=0x0 e_phnum=56 p_offset=0x0
  kexec_file: Crash PT_LOAD elf header. phdr=00000000ea0b65de vaddr=0x0, paddr=0x0, sz=0x0 e_phnum=57 p_offset=0x0
  kexec_file: Crash PT_LOAD elf header. phdr=000000001f5e490c vaddr=0x0, paddr=0x0, sz=0x0 e_phnum=58 p_offset=0x0
  kexec_file: Crash PT_LOAD elf header. phdr=00000000dfe4109e vaddr=0x0, paddr=0x0, sz=0x0 e_phnum=59 p_offset=0x0
  kexec_file: Crash PT_LOAD elf header. phdr=00000000480ed2b6 vaddr=0x0, paddr=0x0, sz=0x0 e_phnum=60 p_offset=0x0
  kexec_file: Crash PT_LOAD elf header. phdr=0000000080b65151 vaddr=0x0, paddr=0x0, sz=0x0 e_phnum=61 p_offset=0x0
  kexec_file: Crash PT_LOAD elf header. phdr=0000000024e31c5e vaddr=0x0, paddr=0x0, sz=0x0 e_phnum=62 p_offset=0x0
  kexec_file: Crash PT_LOAD elf header. phdr=00000000332e0385 vaddr=0x0, paddr=0x0, sz=0x0 e_phnum=63 p_offset=0x0
  kexec_file: Crash PT_LOAD elf header. phdr=000000002754d5da vaddr=0x0, paddr=0x0, sz=0x0 e_phnum=64 p_offset=0x0
  kexec_file: Crash PT_LOAD elf header. phdr=00000000783320dd vaddr=0x0, paddr=0x0, sz=0x0 e_phnum=65 p_offset=0x0
  kexec_file: Crash PT_LOAD elf header. phdr=0000000076fe5b64 vaddr=0x0, paddr=0x0, sz=0x0 e_phnum=66 p_offset=0x0

The reason is that kernel always prints the values of the next PT_LOAD
instead of the current PT_LOAD. Change it to ensure that we can get the
correct debugging information.

[ mingo: Amended changelog, capitalized "ELF". ]

Signed-off-by: Lianbo Jiang <lijiang@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Dave Young <dyoung@redhat.com>
Link: https://lore.kernel.org/r/20200804044933.1973-4-lijiang@redhat.com
2020-08-07 01:32:00 +02:00
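The ordering issue fixed by commit 475f63ae63 can be illustrated with a small user-space sketch. The assumption here is that the debug message was emitted after the header pointer had already been advanced; `phdr_sketch` and `fill_headers_sketch` are hypothetical names, and the struct is a stand-in for a real ELF program header.

```c
#include <stddef.h>
#include <stdio.h>

struct phdr_sketch {
	unsigned long long paddr;
	unsigned long long size;
};

/* Fill count headers and record in logged_paddr[] what each message reported. */
static void fill_headers_sketch(struct phdr_sketch *phdr,
				const unsigned long long *starts,
				const unsigned long long *sizes,
				size_t count,
				unsigned long long *logged_paddr)
{
	for (size_t i = 0; i < count; i++) {
		phdr->paddr = starts[i];
		phdr->size = sizes[i];

		/* Log first: phdr still points at the header just filled. */
		printf("PT_LOAD header %zu: paddr=0x%llx sz=0x%llx\n",
		       i, phdr->paddr, phdr->size);
		logged_paddr[i] = phdr->paddr;

		/*
		 * Advance last; logging after this point would read the
		 * next, not-yet-filled entry.
		 */
		phdr++;
	}
}
```

Logging after the increment would report the fields of an entry that has not been written yet, which is consistent with the all-zero values in the output quoted in the commit message.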
Lianbo Jiang a2e9a95d21 kexec: Improve & fix crash_exclude_mem_range() to handle overlapping ranges
The crash_exclude_mem_range() function can only handle one memory region at a time.

It fails when the passed-in area covers several memory regions: it excludes
only the first region and then returns, leaving the later regions
unhandled.

E.g. on a NEC system with two usable RAM regions inside the low 1M:

  ...
  BIOS-e820: [mem 0x0000000000000000-0x000000000003efff] usable
  BIOS-e820: [mem 0x000000000003f000-0x000000000003ffff] reserved
  BIOS-e820: [mem 0x0000000000040000-0x000000000009ffff] usable

It will only exclude the memory region [0, 0x3efff], the memory region
[0x40000, 0x9ffff] will still be added into /proc/vmcore, which may cause
the following failure when dumping vmcore:

 ioremap on RAM at 0x0000000000040000 - 0x0000000000040fff
 WARNING: CPU: 0 PID: 665 at arch/x86/mm/ioremap.c:186 __ioremap_caller+0x2c7/0x2e0
 ...
 RIP: 0010:__ioremap_caller+0x2c7/0x2e0
 ...
 cp: error reading '/proc/vmcore': Cannot allocate memory
 kdump: saving vmcore failed

In order to fix this bug, let's extend the crash_exclude_mem_range()
to handle the overlapping ranges.

[ mingo: Amended the changelog. ]

Signed-off-by: Lianbo Jiang <lijiang@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Dave Young <dyoung@redhat.com>
Link: https://lore.kernel.org/r/20200804044933.1973-3-lijiang@redhat.com
2020-08-07 01:32:00 +02:00
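The behaviour that commit a2e9a95d21 aims for, excluding an area from every region it overlaps rather than only the first, can be sketched as below. This is a simplified user-space model, not the kernel implementation: `range_sketch` and `exclude_range_sketch` are hypothetical names, the input is assumed to be sorted and non-overlapping with inclusive end addresses, and the result is written to a separate output array instead of being edited in place.

```c
#include <stddef.h>

struct range_sketch {
	unsigned long long start;
	unsigned long long end;		/* inclusive */
};

/*
 * Remove [mstart, mend] from in[0..count) and write the surviving pieces
 * to out[], which must have room for count + 1 entries (one input range
 * may be split in two). Returns the number of entries written.
 */
static size_t exclude_range_sketch(const struct range_sketch *in, size_t count,
				   unsigned long long mstart,
				   unsigned long long mend,
				   struct range_sketch *out)
{
	size_t kept = 0;

	for (size_t i = 0; i < count; i++) {
		unsigned long long start = in[i].start;
		unsigned long long end = in[i].end;

		if (mend < start || mstart > end) {
			out[kept++] = in[i];	/* no overlap: keep unchanged */
			continue;
		}
		if (start < mstart)		/* piece left of the excluded area */
			out[kept++] = (struct range_sketch){ start, mstart - 1 };
		if (end > mend)			/* piece right of the excluded area */
			out[kept++] = (struct range_sketch){ mend + 1, end };
	}
	return kept;
}
```

With the two usable regions from the e820 example above and an excluded area assumed to be the low 1 MiB (0x0 to 0xfffff), both regions are dropped, rather than only the first.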
Linus Torvalds 4cec929370 integrity-v5.9
Merge tag 'integrity-v5.9' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity

Pull integrity updates from Mimi Zohar:
 "The nicest change is the IMA policy rule checking. The other changes
  include allowing the kexec boot cmdline line measure policy rules to
  be defined in terms of the inode associated with the kexec kernel
  image, making the IMA_APPRAISE_BOOTPARAM, which governs the IMA
  appraise mode (log, fix, enforce), a runtime decision based on the
  secure boot mode of the system, and including errno in the audit log"

* tag 'integrity-v5.9' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity:
  integrity: remove redundant initialization of variable ret
  ima: move APPRAISE_BOOTPARAM dependency on ARCH_POLICY to runtime
  ima: AppArmor satisfies the audit rule requirements
  ima: Rename internal filter rule functions
  ima: Support additional conditionals in the KEXEC_CMDLINE hook function
  ima: Use the common function to detect LSM conditionals in a rule
  ima: Move comprehensive rule validation checks out of the token parser
  ima: Use correct type for the args_p member of ima_rule_entry.lsm elements
  ima: Shallow copy the args_p member of ima_rule_entry.lsm elements
  ima: Fail rule parsing when appraise_flag=blacklist is unsupportable
  ima: Fail rule parsing when the KEY_CHECK hook is combined with an invalid cond
  ima: Fail rule parsing when the KEXEC_CMDLINE hook is combined with an invalid cond
  ima: Fail rule parsing when buffer hook functions have an invalid action
  ima: Free the entire rule if it fails to parse
  ima: Free the entire rule when deleting a list of rules
  ima: Have the LSM free its audit rule
  IMA: Add audit log for failure conditions
  integrity: Add errno field in audit message
2020-08-06 11:35:57 -07:00
Hari Bathini f891f19736 kexec_file: Allow archs to handle special regions while locating memory hole
Some architectures may have special memory regions, within the given
memory range, which can't be used for the buffer in a kexec segment.
Implement weak arch_kexec_locate_mem_hole() definition which arch code
may override, to take care of special regions, while trying to locate
a memory hole.

Also, add the missing declarations for arch overridable functions and
drop the __weak descriptors in the declarations to keep non-weak
definitions from becoming weak.

Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
Tested-by: Pingfan Liu <piliu@redhat.com>
Reviewed-by: Thiago Jung Bauermann <bauerman@linux.ibm.com>
Acked-by: Dave Young <dyoung@redhat.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/159602273603.575379.17665852963340380839.stgit@hbathini
2020-07-29 23:47:53 +10:00
Tyler Hicks 4834177e63 ima: Support additional conditionals in the KEXEC_CMDLINE hook function
Take the properties of the kexec kernel's inode and the current task
ownership into consideration when matching a KEXEC_CMDLINE operation to
the rules in the IMA policy. This allows for some uniformity when
writing IMA policy rules for KEXEC_KERNEL_CHECK, KEXEC_INITRAMFS_CHECK,
and KEXEC_CMDLINE operations.

Prior to this patch, it was not possible to write a set of rules like
this:

 dont_measure func=KEXEC_KERNEL_CHECK obj_type=foo_t
 dont_measure func=KEXEC_INITRAMFS_CHECK obj_type=foo_t
 dont_measure func=KEXEC_CMDLINE obj_type=foo_t
 measure func=KEXEC_KERNEL_CHECK
 measure func=KEXEC_INITRAMFS_CHECK
 measure func=KEXEC_CMDLINE

The inode information associated with the kernel being loaded by a
kexec_file_load(2) syscall can now be included in the decision to
measure or not.

Additionally, the uid, euid, and subj_* conditionals can also now be
used in KEXEC_CMDLINE rules. There was no technical reason as to why
those conditionals weren't being considered previously other than
ima_match_rules() didn't have a valid inode to use so it immediately
bailed out for KEXEC_CMDLINE operations rather than going through the
full list of conditional comparisons.

Signed-off-by: Tyler Hicks <tyhicks@linux.microsoft.com>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: kexec@lists.infradead.org
Reviewed-by: Lakshmi Ramasubramanian <nramas@linux.microsoft.com>
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>
2020-07-20 13:28:16 -04:00
Lianbo Jiang fd7af71be5 kexec: do not verify the signature without the lockdown or mandatory signature
Signature verification is an important security feature that protects
the system from being attacked with a kernel of unknown origin.  Kexec
rebooting is a way to replace the running kernel, hence it needs to be
secured carefully.

In the current code that handles signature verification of the kexec
kernel, the logic is very convoluted.  It mixes signature verification,
IMA signature appraisal and kexec lockdown.

If KEXEC_SIG_FORCE is not set and the kexec kernel image lacks any one
of a signature, supported crypto, or a key, this is not treated as an
error unless kexec lockdown is in effect.  IMA is considered another
kind of signature appraisal method.

If the kexec kernel image has a signature, crypto and key, it has to go
through signature verification and pass.  Otherwise it is seen as a
verification failure, and won't be loaded.

So a kexec kernel image with an unqualified signature is treated as even
worse than one without any signature at all, which sounds very
unreasonable.  E.g. if people load an unsigned kernel, or a kernel
signed with an expired key, which one is more dangerous?

So, here, let's simplify the logic to improve code readability.  If
KEXEC_SIG_FORCE is enabled or kexec lockdown is enabled, signature
verification is mandated.  Otherwise, we lower the bar for any kernel
image.
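
The simplified policy described above can be sketched as a small
stand-alone C model.  This is only an illustration, not the kernel's
code: `struct image_model`, `validate_signature_model()` and its fields
are hypothetical names standing in for the real verification result, the
KEXEC_SIG_FORCE setting and the kexec lockdown state.  The -EPERM return
for the locked-down case is assumed here for consistency with other
lockdown checks.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the state the real check would consult. */
struct image_model {
	int verify_result;  /* 0 on success, negative errno on any failure */
	bool sig_force;     /* models KEXEC_SIG_FORCE being enabled */
	bool locked_down;   /* models kexec lockdown being in effect */
};

/*
 * Returns 0 if the image may be loaded, or a negative errno if not.
 * A failed verification only blocks the load when verification is
 * mandated; the kind of failure (unsigned, bad key, ...) is not
 * distinguished.
 */
int validate_signature_model(const struct image_model *img)
{
	assert(img != NULL);

	if (img->verify_result == 0)
		return 0;			/* verified: always allowed */
	if (img->sig_force)
		return img->verify_result;	/* mandated by configuration */
	if (img->locked_down)
		return -EPERM;			/* mandated by lockdown */
	return 0;				/* not mandated: allow anyway */
}
```

In this model an unsigned image and an image with a rejected signature
are handled identically, which is the point of the simplification.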

Link: http://lkml.kernel.org/r/20200602045952.27487-1-lijiang@redhat.com
Signed-off-by: Lianbo Jiang <lijiang@redhat.com>
Reviewed-by: Jiri Bohac <jbohac@suse.cz>
Acked-by: Dave Young <dyoung@redhat.com>
Acked-by: Baoquan He <bhe@redhat.com>
Cc: James Morris <jmorris@namei.org>
Cc: Matthew Garrett <mjg59@google.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-26 00:27:36 -07:00
David Hildenbrand 3fe4f4991a kexec_file: don't place kexec images on IORESOURCE_MEM_DRIVER_MANAGED
Memory flagged with IORESOURCE_MEM_DRIVER_MANAGED is special - it won't be
part of the initial memmap of the kexec kernel and not all memory might be
accessible.  Don't place any kexec images onto it.

Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Cc: Wei Yang <richard.weiyang@gmail.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Link: http://lkml.kernel.org/r/20200508084217.9160-4-david@redhat.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-04 19:06:23 -07:00
Pavel Tatashin de68e4daea kexec: add machine_kexec_post_load()
It is the same as machine_kexec_prepare(), but is called after segments are
loaded. This way, processing can be done with the relocation segments
already loaded. One such example is arm64: it needs the segments loaded in
order to create a page table, but it cannot do that at kexec time,
because allocations are no longer possible by then.
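
The ordering this hook introduces can be sketched with a small
stand-alone C model.  All names below (`load_image_model()`,
`struct load_trace`, the `*_model()` helpers) are hypothetical and only
record the order of the steps; they are not the kernel's functions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of the load sequence; names are not the kernel's. */
enum step { STEP_PREPARE, STEP_LOAD_SEGMENTS, STEP_POST_LOAD };

struct load_trace {
	enum step steps[3];
	size_t count;
};

/* Append one step to the trace. */
static void record(struct load_trace *t, enum step s)
{
	assert(t->count < 3);
	t->steps[t->count++] = s;
}

/* Models the arch hook that runs before segments are loaded. */
int prepare_model(struct load_trace *t)
{
	record(t, STEP_PREPARE);
	return 0;
}

/* Models copying the segments into place. */
int load_segments_model(struct load_trace *t)
{
	record(t, STEP_LOAD_SEGMENTS);
	return 0;
}

/* Models the new hook: segments exist and allocation is still possible. */
int post_load_model(struct load_trace *t)
{
	record(t, STEP_POST_LOAD);
	return 0;
}

/* Runs the steps in order and stops at the first failure. */
int load_image_model(struct load_trace *t)
{
	int ret;

	ret = prepare_model(t);
	if (ret)
		return ret;
	ret = load_segments_model(t);
	if (ret)
		return ret;
	return post_load_model(t);
}
```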

Signed-off-by: Pavel Tatashin <pasha.tatashin@soleen.com>
Acked-by: Dave Young <dyoung@redhat.com>
Signed-off-by: Will Deacon <will@kernel.org>
2020-01-08 16:32:55 +00:00
Helge Deller f973cce0e4 kexec: Fix pointer-to-int-cast warnings
Fix two pointer-to-int-cast warnings when compiling for the 32-bit parisc
platform:

kernel/kexec_file.c: In function ‘crash_prepare_elf64_headers’:
kernel/kexec_file.c:1307:19: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
  phdr->p_vaddr = (Elf64_Addr)_text;
                  ^
kernel/kexec_file.c:1324:19: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
  phdr->p_vaddr = (unsigned long long) __va(mstart);
                  ^
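
The message above does not say how the casts were changed.  As a
minimal sketch of one common way to avoid this class of warning, the
pointer can first be converted to a pointer-sized integer and only then
widened to the 64-bit ELF address type.  The names `elf64_addr_model`
and `ptr_to_elf64_addr()` are hypothetical; kernel code would typically
use `unsigned long` where this sketch uses `uintptr_t`.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for Elf64_Addr. */
typedef uint64_t elf64_addr_model;

/* The widening below must never truncate a pointer-sized value. */
static_assert(sizeof(uintptr_t) <= sizeof(elf64_addr_model),
	      "pointer-sized integers must fit in the ELF address type");

elf64_addr_model ptr_to_elf64_addr(const void *p)
{
	/*
	 * Pointer -> uintptr_t is a same-size conversion, so the compiler
	 * has nothing to warn about on 32-bit targets; the second cast is
	 * a plain integer-to-integer widening.
	 */
	return (elf64_addr_model)(uintptr_t)p;
}
```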

Signed-off-by: Helge Deller <deller@gmx.de>
2019-11-01 21:42:58 +01:00
Linus Torvalds aefcf2f4b5 Merge branch 'next-lockdown' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security
Pull kernel lockdown mode from James Morris:
 "This is the latest iteration of the kernel lockdown patchset, from
  Matthew Garrett, David Howells and others.

  From the original description:

    This patchset introduces an optional kernel lockdown feature,
    intended to strengthen the boundary between UID 0 and the kernel.
    When enabled, various pieces of kernel functionality are restricted.
    Applications that rely on low-level access to either hardware or the
    kernel may cease working as a result - therefore this should not be
    enabled without appropriate evaluation beforehand.

    The majority of mainstream distributions have been carrying variants
    of this patchset for many years now, so there's value in providing an
    upstream implementation that doesn't meet every distribution
    requirement, but gets us much closer to not requiring external patches.

  There are two major changes since this was last proposed for mainline:

   - Separating lockdown from EFI secure boot. Background discussion is
     covered here: https://lwn.net/Articles/751061/

   - Implementation as an LSM, with a default stackable lockdown LSM
     module. This allows the lockdown feature to be policy-driven,
     rather than encoding an implicit policy within the mechanism.

  The new locked_down LSM hook is provided to allow LSMs to make a
  policy decision around whether kernel functionality that would allow
  tampering with or examining the runtime state of the kernel should be
  permitted.

  The included lockdown LSM provides an implementation with a simple
  policy intended for general purpose use. This policy provides a coarse
  level of granularity, controllable via the kernel command line:

    lockdown={integrity|confidentiality}

  Enable the kernel lockdown feature. If set to integrity, kernel features
  that allow userland to modify the running kernel are disabled. If set to
  confidentiality, kernel features that allow userland to extract
  confidential information from the kernel are also disabled.

  This may also be controlled via /sys/kernel/security/lockdown and
  overridden by kernel configuration.

  New or existing LSMs may implement finer-grained controls of the
  lockdown features. Refer to the lockdown_reason documentation in
  include/linux/security.h for details.

  The lockdown feature has had significant design feedback and review
  across many subsystems. This code has been in linux-next for some
  weeks, with a few fixes applied along the way.

  Stephen Rothwell noted that commit 9d1f8be5cf ("bpf: Restrict bpf
  when kernel lockdown is in confidentiality mode") is missing a
  Signed-off-by from its author. Matthew responded that he is providing
  this under category (c) of the DCO"

* 'next-lockdown' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (31 commits)
  kexec: Fix file verification on S390
  security: constify some arrays in lockdown LSM
  lockdown: Print current->comm in restriction messages
  efi: Restrict efivar_ssdt_load when the kernel is locked down
  tracefs: Restrict tracefs when the kernel is locked down
  debugfs: Restrict debugfs when the kernel is locked down
  kexec: Allow kexec_file() with appropriate IMA policy when locked down
  lockdown: Lock down perf when in confidentiality mode
  bpf: Restrict bpf when kernel lockdown is in confidentiality mode
  lockdown: Lock down tracing and perf kprobes when in confidentiality mode
  lockdown: Lock down /proc/kcore
  x86/mmiotrace: Lock down the testmmiotrace module
  lockdown: Lock down module params that specify hardware parameters (eg. ioport)
  lockdown: Lock down TIOCSSERIAL
  lockdown: Prohibit PCMCIA CIS storage when the kernel is locked down
  acpi: Disable ACPI table override if the kernel is locked down
  acpi: Ignore acpi_rsdp kernel param when the kernel has been locked down
  ACPI: Limit access to custom_method when the kernel is locked down
  x86/msr: Restrict MSR access when the kernel is locked down
  x86: Lock down IO port access when the kernel is locked down
  ...
2019-09-28 08:14:15 -07:00
Matthew Garrett 29d3c1c8df kexec: Allow kexec_file() with appropriate IMA policy when locked down
Systems in lockdown mode should block the kexec of untrusted kernels.
For x86 and ARM we can ensure that a kernel is trustworthy by validating
a PE signature, but this isn't possible on other architectures. On those
platforms we can use IMA digital signatures instead. Add a function to
determine whether IMA has verified or will verify signatures for a given
event type, and if so permit kexec_file() even if the kernel is otherwise
locked down.
This is restricted to cases where CONFIG_INTEGRITY_TRUSTED_KEYRING is set
in order to prevent an attacker from loading additional keys at runtime.

Signed-off-by: Matthew Garrett <mjg59@google.com>
Acked-by: Mimi Zohar <zohar@linux.ibm.com>
Cc: Dmitry Kasatkin <dmitry.kasatkin@gmail.com>
Cc: linux-integrity@vger.kernel.org
Signed-off-by: James Morris <jmorris@namei.org>
2019-08-19 21:54:16 -07:00
Jiri Bohac 155bdd30af kexec_file: Restrict at runtime if the kernel is locked down
When KEXEC_SIG is not enabled, the kernel should not load images through
the kexec_file system call if the kernel is locked down.

[Modified by David Howells to fit with modifications to the previous patch
 and to return -EPERM if the kernel is locked down for consistency with
 other lockdowns. Modified by Matthew Garrett to remove the IMA
 integration, which will be replaced by integrating with the IMA
 architecture policy patches.]

Signed-off-by: Jiri Bohac <jbohac@suse.cz>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Matthew Garrett <mjg59@google.com>
cc: kexec@lists.infradead.org
Signed-off-by: James Morris <jmorris@namei.org>
2019-08-19 21:54:15 -07:00