Commit Graph

78 Commits

Author SHA1 Message Date
Baoquan He 6ddb054bd6 crash: split crash dumping code out from kexec_core.c
JIRA: https://issues.redhat.com/browse/RHEL-58641

Upstream Status: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

Conflicts: There's a conflict in the last hunk of include/linux/kexec.h
           because of fuzz caused by earlier backported commits
           related to commit f4af41bf177a ("kexec: fix the unexpected
           kexec_dprintk() macro").

commit 02aff8480533817a29e820729360866441d7403d
Author: Baoquan He <bhe@redhat.com>
Date:   Wed Jan 24 13:12:44 2024 +0800

    crash: split crash dumping code out from kexec_core.c

    Currently, KEXEC_CORE selects CRASH_CORE automatically because the crash
    code needs to be built in to avoid compile errors when building the kexec
    code, even if the crash dumping functionality is not enabled. E.g.
    --------------------
    CONFIG_CRASH_CORE=y
    CONFIG_KEXEC_CORE=y
    CONFIG_KEXEC=y
    CONFIG_KEXEC_FILE=y
    ---------------------

    After splitting out the crashkernel reservation code and the vmcoreinfo
    exporting code, only crash-related code is left in kernel/crash_core.c.
    Now move the crash-related code from kexec_core.c to crash_core.c and only
    build it in when CONFIG_CRASH_DUMP=y.

    Also wrap the crash code inside CONFIG_CRASH_DUMP ifdeffery scope, or
    replace inappropriate CONFIG_KEXEC_CORE ifdefs with CONFIG_CRASH_DUMP
    ifdefs in generic kernel files.

    With these changes, the crash_core code is separated from the kexec code
    and can be disabled entirely if only the kexec reboot feature is wanted.

    Link: https://lkml.kernel.org/r/20240124051254.67105-5-bhe@redhat.com
    Signed-off-by: Baoquan He <bhe@redhat.com>
    Cc: Al Viro <viro@zeniv.linux.org.uk>
    Cc: Eric W. Biederman <ebiederm@xmission.com>
    Cc: Hari Bathini <hbathini@linux.ibm.com>
    Cc: Pingfan Liu <piliu@redhat.com>
    Cc: Klara Modin <klarasmodin@gmail.com>
    Cc: Michael Kelley <mhklinux@outlook.com>
    Cc: Nathan Chancellor <nathan@kernel.org>
    Cc: Stephen Rothwell <sfr@canb.auug.org.au>
    Cc: Yang Li <yang.lee@linux.alibaba.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Baoquan He <bhe@redhat.com>
2024-12-23 09:35:35 +08:00
Baoquan He 0feadbaa1f kexec_core: fix the assignment to kimage->control_page
JIRA: https://issues.redhat.com/browse/RHEL-58641

Upstream Status: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

commit 2861b37732627d7d115d77585ce4853f25cf332d
Author: Yuntao Wang <ytcoode@gmail.com>
Date:   Thu Dec 21 12:23:08 2023 +0800

    kexec_core: fix the assignment to kimage->control_page

    image->control_page represents the starting address for allocating the
    next control page, while hole_end represents the address of the last valid
    byte of the currently allocated control page.

    This bug actually does not affect the correctness of allocating control
    pages, because image->control_page is currently only used in
    kimage_alloc_crash_control_pages(), and this function, when allocating
    control pages, will first align image->control_page up to the nearest
    `(1 << order) << PAGE_SHIFT` boundary, then use this value as the
    starting address of the next control page.  This ensures that the newly
    allocated control page will use the correct starting address and not
    overlap with previously allocated control pages.

    Although it does not affect the correctness of the final result, it is
    better for us to set image->control_page to the correct value, in case
    it might be used elsewhere in the future, potentially causing errors.

    Therefore, after successfully allocating a control page,
    image->control_page should be updated to `hole_end + 1`, rather than
    hole_end.

    Link: https://lkml.kernel.org/r/20231221042308.11076-1-ytcoode@gmail.com
    Signed-off-by: Yuntao Wang <ytcoode@gmail.com>
    Cc: Baoquan He <bhe@redhat.com>
    Cc: "Eric W. Biederman" <ebiederm@xmission.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Baoquan He <bhe@redhat.com>
2024-12-23 09:35:34 +08:00
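
Note on the commit above: a minimal user-space sketch of the alignment
argument, not the kernel code itself. It assumes 4 KiB pages, and the
helper names align_up() and next_hole_start() are hypothetical. It shows
why storing hole_end instead of hole_end + 1 was harmless in practice: both
values round up to the same next boundary.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12 /* assumption: 4 KiB pages */

/* Round x up to the next multiple of a; a must be a power of two. */
static uint64_t align_up(uint64_t x, uint64_t a)
{
    assert(a != 0 && (a & (a - 1)) == 0);
    return (x + a - 1) & ~(a - 1);
}

/*
 * Hypothetical helper: where the next control-page allocation of the
 * given order would start, given the stored control_page value.
 */
static uint64_t next_hole_start(uint64_t control_page, unsigned int order)
{
    uint64_t size = ((uint64_t)1 << order) << PAGE_SHIFT;

    return align_up(control_page, size);
}
```

For an order-1 allocation that occupied 0x0000..0x1fff, hole_end is 0x1fff.
Passing either 0x1fff (the old value) or 0x2000 (hole_end + 1) to
next_hole_start() gives 0x2000.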
Baoquan He d6481621cb kexec: modify the meaning of the end parameter in kimage_is_destination_range()
JIRA: https://issues.redhat.com/browse/RHEL-58641

Upstream Status: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

commit 816d334afa85c836080b41bb6238aea845615ad9
Author: Yuntao Wang <ytcoode@gmail.com>
Date:   Sun Dec 17 11:35:26 2023 +0800

    kexec: modify the meaning of the end parameter in kimage_is_destination_range()

    The end parameter received by kimage_is_destination_range() should be the
    last valid byte address of the target memory segment plus 1.  However, in
    the locate_mem_hole_bottom_up() and locate_mem_hole_top_down() functions,
    the corresponding value passed to kimage_is_destination_range() is the
    last valid byte address of the target memory segment, which is 1 less.

    There are two ways to fix this bug.  We can either correct the logic of
    the locate_mem_hole_bottom_up() and locate_mem_hole_top_down() functions,
    or we can fix kimage_is_destination_range() by making the end parameter
    represent the last valid byte address of the target memory segment.  Here,
    we choose the second approach.

    Due to the modification to kimage_is_destination_range(), we also need to
    adjust its callers, such as kimage_alloc_normal_control_pages() and
    kimage_alloc_page().

    Link: https://lkml.kernel.org/r/20231217033528.303333-2-ytcoode@gmail.com
    Signed-off-by: Yuntao Wang <ytcoode@gmail.com>
    Acked-by: Baoquan He <bhe@redhat.com>
    Cc: Borislav Petkov (AMD) <bp@alien8.de>
    Cc: Dave Hansen <dave.hansen@linux.intel.com>
    Cc: "Eric W. Biederman" <ebiederm@xmission.com>
    Cc: "H. Peter Anvin" <hpa@zytor.com>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: Jonathan Corbet <corbet@lwn.net>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Baoquan He <bhe@redhat.com>
2024-12-23 09:35:33 +08:00
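
Note on the commit above: a small sketch of an overlap test in which 'end'
is the last valid byte (inclusive), which is the convention the commit
message chooses. The struct and function names are hypothetical and the
code is an illustration, not a copy of kimage_is_destination_range().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for one kexec segment: base address and size. */
struct seg {
    uint64_t mem;
    uint64_t memsz;
};

/*
 * Return true if [start, end] overlaps the segment, where both 'start'
 * and 'end' are inclusive byte addresses.
 */
static bool overlaps_inclusive(const struct seg *s, uint64_t start,
                               uint64_t end)
{
    uint64_t mstart = s->mem;
    uint64_t mend = mstart + s->memsz - 1; /* last byte of the segment */

    assert(start <= end && s->memsz > 0);
    return end >= mstart && start <= mend;
}
```

With a segment covering 0x1000..0x1fff, the range 0x0..0xfff does not
overlap, while 0x0..0x1000 does. A caller that still passed "last byte + 1"
as 'end' would wrongly report an overlap for the first range.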
Baoquan He 6741a3b34d kexec: use ALIGN macro instead of open-coding it
JIRA: https://issues.redhat.com/browse/RHEL-58641

Upstream Status: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

commit db6b6fb70193f0defe4d5785e940156c06e9abbe
Author: Yuntao Wang <ytcoode@gmail.com>
Date:   Tue Dec 12 22:27:06 2023 +0800

    kexec: use ALIGN macro instead of open-coding it

    Use ALIGN macro instead of open-coding it to improve code readability.

    Link: https://lkml.kernel.org/r/20231212142706.25149-1-ytcoode@gmail.com
    Signed-off-by: Yuntao Wang <ytcoode@gmail.com>
    Acked-by: Baoquan He <bhe@redhat.com>
    Cc: "Eric W. Biederman" <ebiederm@xmission.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Baoquan He <bhe@redhat.com>
2024-12-23 09:35:33 +08:00
Baoquan He dbb24f4383 kexec: use atomic_try_cmpxchg in crash_kexec
JIRA: https://issues.redhat.com/browse/RHEL-58641

Upstream Status: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

commit 0311d8272406b2ec47f485bef887723cc352a489
Author: Uros Bizjak <ubizjak@gmail.com>
Date:   Tue Nov 14 17:12:01 2023 +0100

    kexec: use atomic_try_cmpxchg in crash_kexec

    Use atomic_try_cmpxchg instead of cmpxchg (*ptr, old, new) == old in
    crash_kexec().  x86 CMPXCHG instruction returns success in ZF flag,
    so this change saves a compare after cmpxchg.

    No functional change intended.

    Link: https://lkml.kernel.org/r/20231114161228.108516-1-ubizjak@gmail.com
    Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
    Acked-by: Baoquan He <bhe@redhat.com>
    Cc: Eric Biederman <ebiederm@xmission.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Baoquan He <bhe@redhat.com>
2024-12-23 09:35:33 +08:00
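
Note on the commit above: a user-space analogy using C11 <stdatomic.h>,
not the kernel's atomic_try_cmpxchg() itself. The NO_CPU sentinel and the
try_claim() helper are hypothetical. The point is the shape of the API: the
"try" form returns success as a boolean, so the caller does not compare the
returned old value against the expected one.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define NO_CPU (-1) /* hypothetical sentinel: nobody owns the slot yet */

/*
 * Try to record this_cpu as the owner. Succeeds only for the first
 * caller; later callers see the slot already taken and get false.
 */
static bool try_claim(atomic_int *owner, int this_cpu)
{
    int expected = NO_CPU;

    /* Returns true if *owner was NO_CPU and is now this_cpu. */
    return atomic_compare_exchange_strong(owner, &expected, this_cpu);
}
```

On failure, C11 also writes the value actually found into 'expected', which
is the same convention the kernel's try_cmpxchg family follows.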
Baoquan He ba669292fa crash_core: move crashk_*res definition into crash_core.c
JIRA: https://issues.redhat.com/browse/RHEL-58641

Upstream Status: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

Conflict: There's a conflict in kernel/kexec_core.c because of fuzz caused
          by the previously backported commit cbc2fe9d9cb2 ("kexec_file: add
          kexec_file flag to control debug printing").

commit b631b95dded5e7f007a3a79cbaf82ef50c1e2cf7
Author: Baoquan He <bhe@redhat.com>
Date:   Thu Sep 14 11:31:38 2023 +0800

    crash_core: move crashk_*res definition into crash_core.c

    Both crashk_res and crashk_low_res are used to mark the reserved
    crashkernel regions in the iomem_resource tree.  Later, the generic
    crashkernel reservation will be added to crash_core.c.  So move the
    crashk_res and crashk_low_res definitions into crash_core.c to avoid
    compile errors if CONFIG_CRASH_CORE is enabled while CONFIG_KEXEC_CORE
    is unset.

    Meanwhile, include <asm/crash_core.h> in <linux/crash_core.h> if generic
    reservation is needed.  In that case, <asm/crash_core.h> needs to be added
    by the ARCH.  In asm/crash_core.h, the ARCH can provide its own macro
    definitions to override macros in <linux/crash_core.h> if needed.  Wrap
    the include in CONFIG_ARCH_HAS_GENERIC_CRASHKERNEL_RESERVATION ifdeffery
    scope to avoid compile errors in other ARCHes which don't use the generic
    reservation yet.

    Link: https://lkml.kernel.org/r/20230914033142.676708-6-bhe@redhat.com
    Signed-off-by: Baoquan He <bhe@redhat.com>
    Reviewed-by: Zhen Lei <thunder.leizhen@huawei.com>
    Cc: Catalin Marinas <catalin.marinas@arm.com>
    Cc: Chen Jiahao <chenjiahao16@huawei.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Baoquan He <bhe@redhat.com>
2024-12-23 09:35:33 +08:00
Baoquan He e5524c12ce crash: add generic infrastructure for crash hotplug support
JIRA: https://issues.redhat.com/browse/RHEL-58641

Upstream Status: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

Conflict: There's a conflict in include/linux/kexec.h when defining
          arch_crash_handle_hotplug_event because the two commits below
          have been backported:
          commit 013a5d02a3 ("crash: memory and CPU hotplug sysfs attributes")
          commit ba5f45be9a ("kexec_file: add kexec_file flag to control debug printing")

commit 24726275612140af6b1c0afc7c6611ad66233207
Author: Eric DeVolder <eric.devolder@oracle.com>
Date:   Mon Aug 14 17:44:40 2023 -0400

    crash: add generic infrastructure for crash hotplug support

    To support crash hotplug, a mechanism is needed to update the crash
    elfcorehdr upon CPU or memory changes (e.g. hot un/plug or off/onlining).
    The crash elfcorehdr describes the CPUs and memory to be written into the
    vmcore.

    To track CPU changes, callbacks are registered with the cpuhp mechanism
    via cpuhp_setup_state_nocalls(CPUHP_BP_PREPARE_DYN).  The crash hotplug
    elfcorehdr update has no explicit ordering requirement (relative to other
    cpuhp states), so meets the criteria for utilizing CPUHP_BP_PREPARE_DYN.
    CPUHP_BP_PREPARE_DYN is a dynamic state and avoids the need to introduce a
    new state for crash hotplug.  Also, CPUHP_BP_PREPARE_DYN is the last state
    in the PREPARE group, just prior to the STARTING group, which is very
    close to the CPU starting up in a plug/online situation, or stopping in an
    unplug/offline situation.  This minimizes the window of time during an
    actual plug/online or unplug/offline situation in which the elfcorehdr
    would be inaccurate.  Note that for a CPU being unplugged or offlined, the
    CPU will still be present in the list of CPUs generated by
    crash_prepare_elf64_headers().  However, there is no need to explicitly
    omit the CPU, see justification in 'crash: change
    crash_prepare_elf64_headers() to for_each_possible_cpu()'.

    To track memory changes, a notifier is registered to capture the memblock
    MEM_ONLINE and MEM_OFFLINE events via register_memory_notifier().

    The CPU callbacks and memory notifiers invoke crash_handle_hotplug_event()
    which performs needed tasks and then dispatches the event to the
    architecture specific arch_crash_handle_hotplug_event() to update the
    elfcorehdr with the current state of CPUs and memory.  During the process,
    the kexec_lock is held.

    Link: https://lkml.kernel.org/r/20230814214446.6659-3-eric.devolder@oracle.com
    Signed-off-by: Eric DeVolder <eric.devolder@oracle.com>
    Reviewed-by: Sourabh Jain <sourabhjain@linux.ibm.com>
    Acked-by: Hari Bathini <hbathini@linux.ibm.com>
    Acked-by: Baoquan He <bhe@redhat.com>
    Cc: Akhil Raj <lf32.dev@gmail.com>
    Cc: Bjorn Helgaas <bhelgaas@google.com>
    Cc: Borislav Petkov (AMD) <bp@alien8.de>
    Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
    Cc: Dave Hansen <dave.hansen@linux.intel.com>
    Cc: Dave Young <dyoung@redhat.com>
    Cc: David Hildenbrand <david@redhat.com>
    Cc: Eric W. Biederman <ebiederm@xmission.com>
    Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    Cc: "H. Peter Anvin" <hpa@zytor.com>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: Jonathan Corbet <corbet@lwn.net>
    Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
    Cc: Mimi Zohar <zohar@linux.ibm.com>
    Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
    Cc: Oscar Salvador <osalvador@suse.de>
    Cc: "Rafael J. Wysocki" <rafael@kernel.org>
    Cc: Sean Christopherson <seanjc@google.com>
    Cc: Takashi Iwai <tiwai@suse.de>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: Thomas Weißschuh <linux@weissschuh.net>
    Cc: Valentin Schneider <vschneid@redhat.com>
    Cc: Vivek Goyal <vgoyal@redhat.com>
    Cc: Vlastimil Babka <vbabka@suse.cz>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Baoquan He <bhe@redhat.com>
2024-12-23 09:35:32 +08:00
Baoquan He 536f874a42 crash: move a few code bits to setup support of crash hotplug
JIRA: https://issues.redhat.com/browse/RHEL-58641

Upstream Status: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

commit 6f991cc363a3269866476b8ff10a112768d3d45c
Author: Eric DeVolder <eric.devolder@oracle.com>
Date:   Mon Aug 14 17:44:39 2023 -0400

    crash: move a few code bits to setup support of crash hotplug

    Patch series "crash: Kernel handling of CPU and memory hot un/plug", v28.

    Once the kdump service is loaded, if changes to CPUs or memory occur,
    either by hot un/plug or off/onlining, the crash elfcorehdr must also be
    updated.

    The elfcorehdr describes to kdump the CPUs and memory in the system, and
    any inaccuracies can result in a vmcore with missing CPU context or memory
    regions.

    The current solution utilizes udev to initiate an unload-then-reload of
    the kdump image (eg.  kernel, initrd, boot_params, purgatory and
    elfcorehdr) by the userspace kexec utility.  In the original post I
    outlined the significant performance problems related to offloading this
    activity to userspace.

    This patchset introduces a generic crash handler that registers with the
    CPU and memory notifiers.  Upon CPU or memory changes, from either hot
    un/plug or off/onlining, this generic handler is invoked and performs
    important housekeeping, for example obtaining the appropriate lock, and
    then invokes an architecture specific handler to do the appropriate
    elfcorehdr update.

    Note the description in patch 'crash: change crash_prepare_elf64_headers()
    to for_each_possible_cpu()' and 'x86/crash: optimize CPU changes' that
    enables further optimizations related to CPU plug/unplug/online/offline
    performance of elfcorehdr updates.

    In the case of x86_64, the arch specific handler generates a new
    elfcorehdr, and overwrites the old one in memory; thus no involvement with
    userspace is needed.

    To realize the benefits/test this patchset, one must make a couple
    of minor changes to userspace:

     - Prevent udev from updating kdump crash kernel on hot un/plug changes.
       Add the following as the first lines to the RHEL udev rule file
       /usr/lib/udev/rules.d/98-kexec.rules:

       # The kernel updates the crash elfcorehdr for CPU and memory changes
       SUBSYSTEM=="cpu", ATTRS{crash_hotplug}=="1", GOTO="kdump_reload_end"
       SUBSYSTEM=="memory", ATTRS{crash_hotplug}=="1", GOTO="kdump_reload_end"

       With this changeset applied, the two rules evaluate to false for
       CPU and memory change events and thus skip the userspace
       unload-then-reload of kdump.

     - Change to the kexec_file_load() syscall for loading the kdump kernel.
       E.g. on RHEL, in /usr/bin/kdumpctl, change to:
        standard_kexec_args="-p -d -s"
       which adds -s to select the kexec_file_load() syscall.

    This kernel patchset also supports kexec_load() with a modified kexec
    userspace utility.  A working changeset to the kexec userspace utility is
    posted to the kexec-tools mailing list here:

     http://lists.infradead.org/pipermail/kexec/2023-May/027049.html

    To use the kexec-tools patch, apply, build and install kexec-tools, then
    change the kdumpctl's standard_kexec_args to replace the -s with
    --hotplug.  The removal of -s reverts to the kexec_load syscall and the
    addition of --hotplug invokes the changes put forth in the kexec-tools
    patch.

    This patch (of 8):

    The crash hotplug support leans on the work for the kexec_file_load()
    syscall.  To also support the kexec_load() syscall, a few bits of code
    need to be moved outside of CONFIG_KEXEC_FILE.  As such, these bits are
    moved out of kexec_file.c and into a common location, crash_core.c.

    In addition, struct crash_mem and crash_notes were moved to new locales so
    that PROC_KCORE, which sets CRASH_CORE alone, builds correctly.

    No functionality change intended.

    Link: https://lkml.kernel.org/r/20230814214446.6659-1-eric.devolder@oracle.com
    Link: https://lkml.kernel.org/r/20230814214446.6659-2-eric.devolder@oracle.com
    Signed-off-by: Eric DeVolder <eric.devolder@oracle.com>
    Reviewed-by: Sourabh Jain <sourabhjain@linux.ibm.com>
    Acked-by: Hari Bathini <hbathini@linux.ibm.com>
    Acked-by: Baoquan He <bhe@redhat.com>
    Cc: Akhil Raj <lf32.dev@gmail.com>
    Cc: Bjorn Helgaas <bhelgaas@google.com>
    Cc: Borislav Petkov (AMD) <bp@alien8.de>
    Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
    Cc: Dave Hansen <dave.hansen@linux.intel.com>
    Cc: Dave Young <dyoung@redhat.com>
    Cc: David Hildenbrand <david@redhat.com>
    Cc: Eric W. Biederman <ebiederm@xmission.com>
    Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    Cc: "H. Peter Anvin" <hpa@zytor.com>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: Jonathan Corbet <corbet@lwn.net>
    Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
    Cc: Mimi Zohar <zohar@linux.ibm.com>
    Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
    Cc: Oscar Salvador <osalvador@suse.de>
    Cc: "Rafael J. Wysocki" <rafael@kernel.org>
    Cc: Sean Christopherson <seanjc@google.com>
    Cc: Takashi Iwai <tiwai@suse.de>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: Thomas Weißschuh <linux@weissschuh.net>
    Cc: Valentin Schneider <vschneid@redhat.com>
    Cc: Vivek Goyal <vgoyal@redhat.com>
    Cc: Vlastimil Babka <vbabka@suse.cz>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Baoquan He <bhe@redhat.com>
2024-12-23 09:35:32 +08:00
Baoquan He fa13fdacf6 kexec: enable kexec_crash_size to support two crash kernel regions
JIRA: https://issues.redhat.com/browse/RHEL-32199

Upstream Status: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

commit 16c6006af4d4e70ecef93977a5314409d931020b
Author: Zhen Lei <thunder.leizhen@huawei.com>
Date:   Sat May 27 20:34:39 2023 +0800

    kexec: enable kexec_crash_size to support two crash kernel regions

    The crashk_low_res should be considered by /sys/kernel/kexec_crash_size
    to support shrinking two crash kernel regions, if both exist.

    While doing so, crashk_low_res will only be shrunk when the entire
    crashk_res is empty; and if crashk_res is empty and crashk_low_res
    is not, change crashk_low_res to be crashk_res.

    [bhe@redhat.com: redo changelog]
    Link: https://lkml.kernel.org/r/20230527123439.772-7-thunder.leizhen@huawei.com
    Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
    Acked-by: Baoquan He <bhe@redhat.com>
    Cc: Cong Wang <amwang@redhat.com>
    Cc: Eric W. Biederman <ebiederm@xmission.com>
    Cc: Michael Holzheu <holzheu@linux.vnet.ibm.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Baoquan He <bhe@redhat.com>
2024-05-15 10:32:32 +08:00
Baoquan He 961ff4962f kexec: add helper __crash_shrink_memory()
JIRA: https://issues.redhat.com/browse/RHEL-32199

Upstream Status: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

commit 5b7bfb32cbaad6ba997ab4295ee50bc9921252f2
Author: Zhen Lei <thunder.leizhen@huawei.com>
Date:   Sat May 27 20:34:38 2023 +0800

    kexec: add helper __crash_shrink_memory()

    No functional change, in preparation for the next patch so that it is
    easier to review.

    [akpm@linux-foundation.org: make  __crash_shrink_memory() static]
      Link: https://lore.kernel.org/oe-kbuild-all/202305280717.Pw06aLkz-lkp@intel.com/
    Link: https://lkml.kernel.org/r/20230527123439.772-6-thunder.leizhen@huawei.com
    Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
    Acked-by: Baoquan He <bhe@redhat.com>
    Cc: Cong Wang <amwang@redhat.com>
    Cc: Eric W. Biederman <ebiederm@xmission.com>
    Cc: Michael Holzheu <holzheu@linux.vnet.ibm.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Baoquan He <bhe@redhat.com>
2024-05-15 10:32:32 +08:00
Baoquan He 5b3c4cccd8 kexec: improve the readability of crash_shrink_memory()
JIRA: https://issues.redhat.com/browse/RHEL-32199

Upstream Status: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

commit 8a7db7790a3ff8413bedef886cf135ba301e88d7
Author: Zhen Lei <thunder.leizhen@huawei.com>
Date:   Sat May 27 20:34:37 2023 +0800

    kexec: improve the readability of crash_shrink_memory()

    The major adjustments are:
    1. end = start + new_size.
       The 'end' here is not an accurate name, because it is not the new end
       of crashk_res but the start of ram_res, which is one byte past it. So
       eliminate it and replace it with ram_res->start.
    2. Use 'ram_res->start' and 'ram_res->end' as arguments to
       crash_free_reserved_phys_range() to indicate that the memory covered by
       'ram_res' is released from the crashk region. And keep it close to
       insert_resource().
    3. Replace 'if (start == end)' with 'if (!new_size)', which clearly
       indicates that all crashk memory will be shrunk.

    No functional change.

    Link: https://lkml.kernel.org/r/20230527123439.772-5-thunder.leizhen@huawei.com
    Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
    Acked-by: Baoquan He <bhe@redhat.com>
    Cc: Cong Wang <amwang@redhat.com>
    Cc: Eric W. Biederman <ebiederm@xmission.com>
    Cc: Michael Holzheu <holzheu@linux.vnet.ibm.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Baoquan He <bhe@redhat.com>
2024-05-15 10:32:32 +08:00
Baoquan He cf5f5a1359 kexec: clear crashk_res if all its memory has been released
JIRA: https://issues.redhat.com/browse/RHEL-32199

Upstream Status: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

commit f7f567b95b12eda7a6a273b8cb82d152491bc0da
Author: Zhen Lei <thunder.leizhen@huawei.com>
Date:   Sat May 27 20:34:36 2023 +0800

    kexec: clear crashk_res if all its memory has been released

    If the resource of crashk_res has been released, it is better to clear
    crashk_res.start and crashk_res.end.  Because 'end = start - 1' is not
    reasonable, and in some places the test is based on crashk_res.end, not
    resource_size(&crashk_res).

    Link: https://lkml.kernel.org/r/20230527123439.772-4-thunder.leizhen@huawei.com
    Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
    Acked-by: Baoquan He <bhe@redhat.com>
    Cc: Cong Wang <amwang@redhat.com>
    Cc: Eric W. Biederman <ebiederm@xmission.com>
    Cc: Michael Holzheu <holzheu@linux.vnet.ibm.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Baoquan He <bhe@redhat.com>
2024-05-15 10:32:32 +08:00
Baoquan He 516fa2159d kexec: delete a useless check in crash_shrink_memory()
JIRA: https://issues.redhat.com/browse/RHEL-32199

Upstream Status: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

commit 6f22a744f4ee7a22be4704cf93bbe22decc7e79e
Author: Zhen Lei <thunder.leizhen@huawei.com>
Date:   Sat May 27 20:34:35 2023 +0800

    kexec: delete a useless check in crash_shrink_memory()

    The check '(crashk_res.parent != NULL)' was added by commit e05bd3367b
    ("kexec: fix Oops in crash_shrink_memory()"), but it's stale now:
    if 'crashk_res' is not reserved, it will be zero in size and will be
    caught by the preceding 'if (new_size >= old_size)' check.

    Before:
            if (new_size >= end - start + 1)

    Now:
            old_size = (end == 0) ? 0 : end - start + 1;
            if (new_size >= old_size)

    Link: https://lkml.kernel.org/r/20230527123439.772-3-thunder.leizhen@huawei.com
    Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
    Cc: Baoquan He <bhe@redhat.com>
    Cc: Cong Wang <amwang@redhat.com>
    Cc: Eric W. Biederman <ebiederm@xmission.com>
    Cc: Michael Holzheu <holzheu@linux.vnet.ibm.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Baoquan He <bhe@redhat.com>
2024-05-15 10:32:32 +08:00
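
Note on the commit above: a tiny sketch of the size computation quoted in
the message, showing why an unreserved region (start == end == 0) is now
rejected by the size comparison alone. The function names are hypothetical
and the values are illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* Size of the inclusive range [start, end]; 0 if nothing is reserved. */
static uint64_t crash_old_size(uint64_t start, uint64_t end)
{
    assert(end == 0 || end >= start);
    return (end == 0) ? 0 : end - start + 1;
}

/* Non-zero if a shrink request to new_size has nothing to do. */
static int shrink_is_noop(uint64_t start, uint64_t end, uint64_t new_size)
{
    return new_size >= crash_old_size(start, end);
}
```

Without the 'end == 0' special case, an unreserved region would compute a
size of 1 (0 - 0 + 1), so a request of new_size == 0 would pass the check.
That is the case the separate parent check used to cover.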
Baoquan He 8b40ac3d86 kexec: fix a memory leak in crash_shrink_memory()
JIRA: https://issues.redhat.com/browse/RHEL-32199

Upstream Status: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

commit 1cba6c4309f03de570202c46f03df3f73a0d4c82
Author: Zhen Lei <thunder.leizhen@huawei.com>
Date:   Sat May 27 20:34:34 2023 +0800

    kexec: fix a memory leak in crash_shrink_memory()

    Patch series "kexec: enable kexec_crash_size to support two crash kernel
    regions".

    When crashkernel=X fails to reserve a region under 4G, it falls back to
    reserving a region above 4G, and a region of the default size is also
    reserved under 4G.  Unfortunately, /sys/kernel/kexec_crash_size only
    supports one crash kernel region now, so the user cannot see the reserved
    low memory by reading /sys/kernel/kexec_crash_size.  Also, low memory
    cannot be freed by writing to this file.

    For example:
    resource_size(crashk_res) = 512M
    resource_size(crashk_low_res) = 256M

    The result of 'cat /sys/kernel/kexec_crash_size' is 512M, but it should be
    768M.  When we execute 'echo 0 > /sys/kernel/kexec_crash_size', the size
    of crashk_res becomes 0 and resource_size(crashk_low_res) is still 256 MB,
    which is incorrect.

    Since crashk_res manages the memory at high addresses and crashk_low_res
    manages the memory at low addresses, crashk_low_res is shrunk only when
    all of crashk_res has been shrunk.  And because crashk_res is always used
    when there is only one crash kernel region, if all of crashk_res is
    shrunk and crashk_low_res still exists, swap them.

    This patch (of 6):

    If the value of parameter 'new_size' is in the half-open interval
    (crashk_res.end - KEXEC_CRASH_MEM_ALIGN + 1, crashk_res.end], the
    calculation result of ram_res is:

            ram_res->start = crashk_res.end + 1
            ram_res->end   = crashk_res.end

    The operation of insert_resource() fails, and ram_res is not added to
    iomem_resource.  As a result, the memory of the control block ram_res is
    leaked.

    In fact, on all architectures, the start address and size of crashk_res
    are already aligned by KEXEC_CRASH_MEM_ALIGN.  Therefore, we do not need
    to round up crashk_res.start again.  Instead, we should round up
    'new_size' in advance.

    Link: https://lkml.kernel.org/r/20230527123439.772-1-thunder.leizhen@huawei.com
    Link: https://lkml.kernel.org/r/20230527123439.772-2-thunder.leizhen@huawei.com
    Fixes: 6480e5a092 ("kdump: add missing RAM resource in crash_shrink_memory()")
    Fixes: 06a7f71124 ("kexec: premit reduction of the reserved memory size")
    Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
    Acked-by: Baoquan He <bhe@redhat.com>
    Cc: Cong Wang <amwang@redhat.com>
    Cc: Eric W. Biederman <ebiederm@xmission.com>
    Cc: Michael Holzheu <holzheu@linux.vnet.ibm.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Baoquan He <bhe@redhat.com>
2024-05-15 10:32:32 +08:00
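
Note on the commit above: a worked sketch of the arithmetic behind the
leak, under the assumption that KEXEC_CRASH_MEM_ALIGN is 4 KiB. The helper
names are hypothetical, and the code models only the address computation,
not the resource handling.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define KEXEC_CRASH_MEM_ALIGN UINT64_C(0x1000) /* assumption: 4 KiB */

static uint64_t round_up_align(uint64_t x)
{
    return (x + KEXEC_CRASH_MEM_ALIGN - 1) & ~(KEXEC_CRASH_MEM_ALIGN - 1);
}

/*
 * Old scheme: round up (start + new_size). The result becomes
 * ram_res->start, while ram_res->end stays at crashk_res.end.
 */
static uint64_t release_start_old(uint64_t crashk_start, uint64_t new_size)
{
    return round_up_align(crashk_start + new_size);
}

/*
 * Fixed scheme: round new_size up first, then compare sizes. Returns
 * true only if there is really something to release.
 */
static bool needs_shrink_fixed(uint64_t old_size, uint64_t new_size)
{
    return round_up_align(new_size) < old_size;
}
```

Take crashk_res = 0x100000..0x1fffff (1 MiB) and new_size = 0xff001. The old
scheme yields ram_res->start = 0x200000, which is crashk_res.end + 1, so the
range is inverted and insert_resource() fails. With the fix, new_size rounds
up to 0x100000, which is not smaller than the old size, so the request is
rejected before ram_res is allocated.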
Baoquan He 51d035ece6 kexec: introduce sysctl parameters kexec_load_limit_*
JIRA: https://issues.redhat.com/browse/RHEL-32199

Upstream Status: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

commit a42aaad2e47b23d63037bfc0130e33fc0f74cd71
Author: Ricardo Ribalda <ribalda@chromium.org>
Date:   Wed Jan 4 15:38:48 2023 +0100

    kexec: introduce sysctl parameters kexec_load_limit_*

    kexec allows replacing the current kernel with a different one.  This is
    usually a source of concern for sysadmins who want to harden a system.

    Linux already provides a way to disable loading a new kexec kernel via
    kexec_load_disabled, but that control is very coarse: it is all or nothing
    and does not distinguish between a panic kexec and a normal kexec.

    This patch introduces new sysctl parameters, with finer tuning to specify
    how many times a kexec kernel can be loaded.  The sysadmin can set
    different limits for kexec panic and kexec reboot kernels.  The value can
    be modified at runtime via sysctl, but only with a stricter value.

    With these new parameters in place, a system with loadpin and verity
    enabled, using the following kernel parameters:
    sysctl.kexec_load_limit_reboot=0 sysctl.kexec_load_limit_panic=1 can have a
    good guarantee that if initrd tries to load a panic kernel, a malicious
    user will have little chance to replace that kernel with a different one,
    even if they can trigger timeouts on the disk where the panic kernel
    lives.

    Link: https://lkml.kernel.org/r/20221114-disable-kexec-reset-v6-3-6a8531a09b9a@chromium.org
    Signed-off-by: Ricardo Ribalda <ribalda@chromium.org>
    Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
    Acked-by: Baoquan He <bhe@redhat.com>
    Cc: Bagas Sanjaya <bagasdotme@gmail.com>
    Cc: "Eric W. Biederman" <ebiederm@xmission.com>
    Cc: Guilherme G. Piccoli <gpiccoli@igalia.com> # Steam Deck
    Cc: Joel Fernandes (Google) <joel@joelfernandes.org>
    Cc: Jonathan Corbet <corbet@lwn.net>
    Cc: Philipp Rudo <prudo@redhat.com>
    Cc: Ross Zwisler <zwisler@kernel.org>
    Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Baoquan He <bhe@redhat.com>
2024-05-15 10:32:32 +08:00
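
Note on the commit above: a simplified sketch of a "load budget" that can
only be tightened, as the message describes. The struct, the function names
and the convention that -1 means "unlimited" are assumptions for
illustration; locking and the sysctl plumbing are omitted, and the exact
comparison used by the kernel may differ.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical budget: -1 means unlimited, otherwise loads remaining. */
struct load_limit {
    int limit;
};

/* Consume one load from the budget; false once the budget is used up. */
static bool load_permitted(struct load_limit *l)
{
    if (l->limit == 0)
        return false;
    if (l->limit > 0)
        l->limit--;
    return true;
}

/* Accept a new limit only if it does not loosen the current one. */
static bool set_limit(struct load_limit *l, int new_limit)
{
    if (new_limit < 0)
        return false; /* cannot go back to unlimited */
    if (l->limit >= 0 && new_limit > l->limit)
        return false; /* would be less strict than now */
    l->limit = new_limit;
    return true;
}
```

Starting from an unlimited budget, setting the limit to 1 allows exactly
one load. After that load the budget is 0, further loads are refused, and
an attempt to raise the limit again is rejected.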
Baoquan He bf22fc9dac kexec: factor out kexec_load_permitted
JIRA: https://issues.redhat.com/browse/RHEL-32199

Upstream Status: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

commit 7e99f8b69c11c104933b9bc8fda226ebfb8aaaa5
Author: Ricardo Ribalda <ribalda@chromium.org>
Date:   Wed Jan 4 15:38:47 2023 +0100

    kexec: factor out kexec_load_permitted

    Both syscalls (kexec and kexec_file) do the same check, let's factor it
    out.

    Link: https://lkml.kernel.org/r/20221114-disable-kexec-reset-v6-2-6a8531a09b9a@chromium.org
    Signed-off-by: Ricardo Ribalda <ribalda@chromium.org>
    Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
    Acked-by: Baoquan He <bhe@redhat.com>
    Cc: Bagas Sanjaya <bagasdotme@gmail.com>
    Cc: "Eric W. Biederman" <ebiederm@xmission.com>
    Cc: Guilherme G. Piccoli <gpiccoli@igalia.com>
    Cc: Joel Fernandes (Google) <joel@joelfernandes.org>
    Cc: Jonathan Corbet <corbet@lwn.net>
    Cc: Philipp Rudo <prudo@redhat.com>
    Cc: Ross Zwisler <zwisler@kernel.org>
    Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Baoquan He <bhe@redhat.com>
2024-05-15 10:32:32 +08:00
Baoquan He 808f232b0f kexec: remove the unneeded result variable
JIRA: https://issues.redhat.com/browse/RHEL-32199

Upstream Status: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

commit 32d0c98e428a3ec483d5d342983db467d7324f2b
Author: ye xingchen <ye.xingchen@zte.com.cn>
Date:   Thu Sep 29 12:29:34 2022 +0800

    kexec: remove the unneeded result variable

    Return the value of kimage_add_entry() directly instead of storing it in
    another redundant variable.

    Link: https://lkml.kernel.org/r/20220929042936.22012-3-bhe@redhat.com
    Signed-off-by: ye xingchen <ye.xingchen@zte.com.cn>
    Signed-off-by: Baoquan He <bhe@redhat.com>
    Reported-by: Zeal Robot <zealci@zte.com.cn>
    Acked-by: Baoquan He <bhe@redhat.com>
    Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    Cc: Chen Lifu <chenlifu@huawei.com>
    Cc: "Eric W . Biederman" <ebiederm@xmission.com>
    Cc: Jianglei Nie <niejianglei2021@163.com>
    Cc: Li Chen <lchen@ambarella.com>
    Cc: Michael Ellerman <mpe@ellerman.id.au>
    Cc: Paul Mackerras <paulus@samba.org>
    Cc: Petr Mladek <pmladek@suse.com>
    Cc: Russell King <linux@armlinux.org.uk>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Baoquan He <bhe@redhat.com>
2024-05-15 10:32:31 +08:00
Baoquan He b174873545 kexec: replace kmap() with kmap_local_page()
JIRA: https://issues.redhat.com/browse/RHEL-32199

Upstream Status: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

commit 948084f0f6959f602f89f679522b706a72da0285
Author: Fabio M. De Francesco <fmdefrancesco@gmail.com>
Date:   Sun Aug 21 20:25:19 2022 +0200

    kexec: replace kmap() with kmap_local_page()

    kmap() is being deprecated in favor of kmap_local_page().

    There are two main problems with kmap(): (1) It comes with an overhead as
    mapping space is restricted and protected by a global lock for
    synchronization and (2) it also requires global TLB invalidation when the
    kmap's pool wraps and it might block when the mapping space is fully
    utilized until a slot becomes available.

    With kmap_local_page() the mappings are per thread, CPU local, can take
    page faults, and can be called from any context (including interrupts).
    It is faster than kmap() in kernels with HIGHMEM enabled.  Furthermore,
    the tasks can be preempted and, when they are scheduled to run again, the
    kernel virtual addresses are restored and are still valid.

    Since its use in kexec_core.c is safe everywhere, it should be preferred.

    Therefore, replace kmap() with kmap_local_page() in kexec_core.c.

    Tested on a QEMU/KVM x86_32 VM, 6GB RAM, booting a kernel with
    HIGHMEM64GB enabled.

    Link: https://lkml.kernel.org/r/20220821182519.9483-1-fmdefrancesco@gmail.com
    Signed-off-by: Fabio M. De Francesco <fmdefrancesco@gmail.com>
    Suggested-by: Ira Weiny <ira.weiny@intel.com>
    Reviewed-by: Ira Weiny <ira.weiny@intel.com>
    Acked-by: Baoquan He <bhe@redhat.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Baoquan He <bhe@redhat.com>
2024-05-15 10:32:31 +08:00
Baoquan He ba5f45be9a kexec_file: add kexec_file flag to control debug printing
JIRA: https://issues.redhat.com/browse/RHEL-477

Upstream Status: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

Conflict: There are conflicts in
          2nd hunk of include/linux/kexec.h
          kernel/kexec_core.c
          because memory/cpu hotplug support on crash is not back
          ported to rhel9 yet.

commit cbc2fe9d9cb226347365753f50d81bc48cc3c52e
Author: Baoquan He <bhe@redhat.com>
Date:   Wed Dec 13 13:57:41 2023 +0800

    kexec_file: add kexec_file flag to control debug printing

    Patch series "kexec_file: print out debugging message if required", v4.

    Currently, specifying '-d' on the kexec command will print a lot of
    debugging information about kexec/kdump loading with the kexec_load
    interface.

    However, kexec_file_load prints nothing even though '-d' is specified.
    It's very inconvenient to debug or analyze the kexec/kdump loading when
    something goes wrong with kexec/kdump itself or a developer wants to
    check the kexec/kdump loading.

    In this patchset, a kexec_file flag KEXEC_FILE_DEBUG is added and checked
    in code.  If it's passed in, debugging messages of kexec_file code will
    be printed out and can be seen from console and dmesg.  Otherwise, the
    debugging messages are printed as before, when pr_debug() is taken.

    Note:
    ****
    =====
    1) The code in the kexec-tools utility also needs to be changed to
    support passing KEXEC_FILE_DEBUG to kernel when 'kexec -s -d' is
    specified.
    The patch link is here:
    =========
    [PATCH] kexec_file: add kexec_file flag to support debug printing
    http://lists.infradead.org/pipermail/kexec/2023-November/028505.html

    2) s390 also has kexec_file code, but I am not sure what debugging
    information is necessary there. So leave it to the s390 developers.

    Test:
    ****
    ====
    Testing was done in v1 on x86_64 and arm64. For v4, tested on x86_64
    again. And on x86_64, the printed messages look like below:
    --------------------------------------------------------------
    kexec measurement buffer for the loaded kernel at 0x207fffe000.
    Loaded purgatory at 0x207fff9000
    Loaded boot_param, command line and misc at 0x207fff3000 bufsz=0x1180 memsz=0x1180
    Loaded 64bit kernel at 0x207c000000 bufsz=0xc88200 memsz=0x3c4a000
    Loaded initrd at 0x2079e79000 bufsz=0x2186280 memsz=0x2186280
    Final command line is: root=/dev/mapper/fedora_intel--knightslanding--lb--02-root ro
    rd.lvm.lv=fedora_intel-knightslanding-lb-02/root console=ttyS0,115200N81 crashkernel=256M
    E820 memmap:
    0000000000000000-000000000009a3ff (1)
    000000000009a400-000000000009ffff (2)
    00000000000e0000-00000000000fffff (2)
    0000000000100000-000000006ff83fff (1)
    000000006ff84000-000000007ac50fff (2)
    ......
    000000207fff6150-000000207fff615f (128)
    000000207fff6160-000000207fff714f (1)
    000000207fff7150-000000207fff715f (128)
    000000207fff7160-000000207fff814f (1)
    000000207fff8150-000000207fff815f (128)
    000000207fff8160-000000207fffffff (1)
    nr_segments = 5
    segment[0]: buf=0x000000004e5ece74 bufsz=0x211 mem=0x207fffe000 memsz=0x1000
    segment[1]: buf=0x000000009e871498 bufsz=0x4000 mem=0x207fff9000 memsz=0x5000
    segment[2]: buf=0x00000000d879f1fe bufsz=0x1180 mem=0x207fff3000 memsz=0x2000
    segment[3]: buf=0x000000001101cd86 bufsz=0xc88200 mem=0x207c000000 memsz=0x3c4a000
    segment[4]: buf=0x00000000c6e38ac7 bufsz=0x2186280 mem=0x2079e79000 memsz=0x2187000
    kexec_file_load: type:0, start:0x207fff91a0 head:0x109e004002 flags:0x8
    ---------------------------------------------------------------------------

    This patch (of 7):

    When specifying 'kexec -c -d', the kexec_load interface will print
    loading information, e.g. the regions where kernel/initrd/purgatory/cmdline
    are put, the memmap passed to the 2nd kernel taken as system RAM ranges,
    and all contents of struct kexec_segment, etc.  These are very helpful
    for analyzing or locating what's happening when kexec/kdump itself
    failed.  The debugging printing for the kexec_load interface is done in
    the user space utility kexec-tools.

    Whereas, with the kexec_file_load interface, 'kexec -s -d' prints
    nothing, because kexec_file code is mostly implemented in kernel space
    and the debugging printing functionality is missing.  It's not convenient
    when debugging kexec/kdump loading and jumping with the kexec_file_load
    interface.

    Now add KEXEC_FILE_DEBUG to kexec_file flag to control the debugging
    message printing.  And add global variable kexec_file_dbg_print and macro
    kexec_dprintk() to facilitate the printing.

    This is a preparation; later kexec_dprintk() will be used to replace the
    existing pr_debug().  Once 'kexec -s -d' is specified, it will print out
    kexec/kdump loading information.  If '-d' is not specified, it falls back
    to pr_debug().

    Link: https://lkml.kernel.org/r/20231213055747.61826-1-bhe@redhat.com
    Link: https://lkml.kernel.org/r/20231213055747.61826-2-bhe@redhat.com
    Signed-off-by: Baoquan He <bhe@redhat.com>
    Cc: Conor Dooley <conor@kernel.org>
    Cc: Joe Perches <joe@perches.com>
    Cc: Nathan Chancellor <nathan@kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Baoquan He <bhe@redhat.com>
2024-04-28 21:52:02 +08:00
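The flag-controlled printing described in the entry above can be illustrated with a small user-space sketch. It is not the kernel macro: kexec_dprintk() is modelled here as a variadic function that writes to stderr, kexec_file_set_debug() is a hypothetical helper, and the value 0x8 for KEXEC_FILE_DEBUG is an assumption inferred from the "flags:0x8" line in the sample output. The sketch stays silent when the flag is absent, whereas the kernel falls back to pr_debug().

```c
#include <stdarg.h>
#include <stdbool.h>
#include <stdio.h>

/* Assumed value, inferred from "flags:0x8" in the sample output. */
#define KEXEC_FILE_DEBUG 0x8UL

/* Global switch, named after the variable the commit message mentions. */
static bool kexec_file_dbg_print;

/* Hypothetical helper: latch the debug request from the syscall flags. */
static void kexec_file_set_debug(unsigned long flags)
{
	kexec_file_dbg_print = (flags & KEXEC_FILE_DEBUG) != 0;
}

/*
 * User-space stand-in for kexec_dprintk(): print only when debugging was
 * requested. Returns the number of characters written, or 0 if silent.
 */
static int kexec_dprintk(const char *fmt, ...)
{
	va_list ap;
	int n;

	if (!kexec_file_dbg_print)
		return 0;

	va_start(ap, fmt);
	n = vfprintf(stderr, fmt, ap);
	va_end(ap);
	return n;
}
```

Calling kexec_file_set_debug(KEXEC_FILE_DEBUG) before kexec_dprintk("nr_segments = %d\n", 5) prints the line; without the flag nothing is emitted.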
Baoquan He 819bdd2a33 kexec: do syscore_shutdown() in kernel_kexec
JIRA: https://issues.redhat.com/browse/RHEL-19654
Upstream status: Linus
Conflict: None

commit 7bb943806ff61e83ae4cceef8906b7fe52453e8a
Author: James Gowans <jgowans@amazon.com>
Date:   Wed Dec 13 08:40:04 2023 +0200

    kexec: do syscore_shutdown() in kernel_kexec

    syscore_shutdown() runs driver and module callbacks to get the system into
    a state where it can be correctly shut down.  In commit 6f389a8f1d ("PM
    / reboot: call syscore_shutdown() after disable_nonboot_cpus()")
    syscore_shutdown() was removed from kernel_restart_prepare() and hence got
    (incorrectly?) removed from the kexec flow.  This was innocuous until
    commit 6735150b6997 ("KVM: Use syscore_ops instead of reboot_notifier to
    hook restart/shutdown") changed the way that KVM registered its shutdown
    callbacks, switching from reboot notifiers to syscore_ops.shutdown.  As
    syscore_shutdown() is missing from kexec, KVM's shutdown hook is not run
    and virtualisation is left enabled on the boot CPU which results in triple
    faults when switching to the new kernel on Intel x86 VT-x with VMXE
    enabled.

    Fix this by adding syscore_shutdown() to the kexec sequence.  In terms of
    where to add it, it is being added after migrating the kexec task to the
    boot CPU, but before APs are shut down.  It is not totally clear if this
    is the best place: in commit 6f389a8f1d ("PM / reboot: call
    syscore_shutdown() after disable_nonboot_cpus()") it is stated that
    "syscore_ops operations should be carried with one CPU on-line and
    interrupts disabled." APs are only offlined later in machine_shutdown(),
    so this syscore_shutdown() is being run while APs are still online.  This
    seems to be the correct place as it matches where syscore_shutdown() is
    run in the reboot and halt flows - they also run it before APs are shut
    down.  The assumption is that the commit message in commit 6f389a8f1d
    ("PM / reboot: call syscore_shutdown() after disable_nonboot_cpus()") is
    no longer valid.

    KVM has been discussed here as it is what broke loudly by not having
    syscore_shutdown() in kexec, but this change impacts more than just KVM;
    all drivers/modules which register a syscore_ops.shutdown callback will
    now be invoked in the kexec flow.  Looking at some of them like x86 MCE it
    is probably more correct to also shut these down during kexec.
    Maintainers of all drivers which use syscore_ops.shutdown are added on CC
    for visibility.  They are:

    arch/powerpc/platforms/cell/spu_base.c  .shutdown = spu_shutdown,
    arch/x86/kernel/cpu/mce/core.c          .shutdown = mce_syscore_shutdown,
    arch/x86/kernel/i8259.c                 .shutdown = i8259A_shutdown,
    drivers/irqchip/irq-i8259.c             .shutdown = i8259A_shutdown,
    drivers/irqchip/irq-sun6i-r.c           .shutdown = sun6i_r_intc_shutdown,
    drivers/leds/trigger/ledtrig-cpu.c      .shutdown = ledtrig_cpu_syscore_shutdown,
    drivers/power/reset/sc27xx-poweroff.c   .shutdown = sc27xx_poweroff_shutdown,
    kernel/irq/generic-chip.c               .shutdown = irq_gc_shutdown,
    virt/kvm/kvm_main.c                     .shutdown = kvm_shutdown,

    This has been tested by doing a kexec on x86_64 and aarch64.

    Link: https://lkml.kernel.org/r/20231213064004.2419447-1-jgowans@amazon.com
    Fixes: 6735150b6997 ("KVM: Use syscore_ops instead of reboot_notifier to hook restart/shutdown")
    Signed-off-by: James Gowans <jgowans@amazon.com>
    Cc: Baoquan He <bhe@redhat.com>
    Cc: Eric Biederman <ebiederm@xmission.com>
    Cc: Paolo Bonzini <pbonzini@redhat.com>
    Cc: Sean Christopherson <seanjc@google.com>
    Cc: Marc Zyngier <maz@kernel.org>
    Cc: Arnd Bergmann <arnd@arndb.de>
    Cc: Tony Luck <tony.luck@intel.com>
    Cc: Borislav Petkov <bp@alien8.de>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: Chen-Yu Tsai <wens@csie.org>
    Cc: Jernej Skrabec <jernej.skrabec@gmail.com>
    Cc: Samuel Holland <samuel@sholland.org>
    Cc: Pavel Machek <pavel@ucw.cz>
    Cc: Sebastian Reichel <sre@kernel.org>
    Cc: Orson Zhai <orsonzhai@gmail.com>
    Cc: Alexander Graf <graf@amazon.de>
    Cc: Jan H. Schoenherr <jschoenh@amazon.de>
    Cc: <stable@vger.kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Baoquan He <bhe@redhat.com>
2024-01-23 06:06:00 -05:00
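The ordering argued for in the entry above (shutdown hooks run after the task has migrated to the boot CPU but before machine_shutdown() offlines the APs) can be traced with a toy model. Everything here is hypothetical user-space scaffolding: the struct, the two sample hooks and the trace buffer are illustrative only, and running the hooks in reverse registration order is an assumption about syscore_shutdown() rather than something the commit message states.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for struct syscore_ops: only the shutdown hook. */
struct syscore_ops_model {
	void (*shutdown)(void);
};

static char trace[160];	/* ordered record of the steps that ran */

static void record(const char *step)
{
	strncat(trace, step, sizeof(trace) - strlen(trace) - 1);
	strncat(trace, ";", sizeof(trace) - strlen(trace) - 1);
}

static void mce_shutdown_model(void) { record("mce"); }
static void kvm_shutdown_model(void) { record("kvm"); }

/* Registration order: mce first, kvm second. */
static const struct syscore_ops_model registered[] = {
	{ mce_shutdown_model },
	{ kvm_shutdown_model },
};

/* Run every shutdown hook, newest registration first (assumed order). */
static void syscore_shutdown_model(void)
{
	size_t i = sizeof(registered) / sizeof(registered[0]);

	while (i-- > 0)
		if (registered[i].shutdown)
			registered[i].shutdown();
}

/* Model of the kexec sequence after the fix. */
static void kernel_kexec_model(void)
{
	trace[0] = '\0';
	record("migrate_to_reboot_cpu");
	syscore_shutdown_model();	/* the step the commit adds */
	record("machine_shutdown");	/* APs are offlined here */
	record("machine_kexec");
}
```

After kernel_kexec_model() runs, the trace shows the KVM hook executing before machine_shutdown, which is the property the fix relies on to leave virtualisation disabled before the jump.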
Viktor Malik 23c9904275 bpf: Add __bpf_kfunc tag to all kfuncs
Bugzilla: https://bugzilla.redhat.com/2178930

commit 400031e05adfcef9e80eca80bdfc3f4b63658be4
Author: David Vernet <void@manifault.com>
Date:   Wed Feb 1 11:30:15 2023 -0600

    bpf: Add __bpf_kfunc tag to all kfuncs

    Now that we have the __bpf_kfunc tag, we should add it to all
    existing kfuncs to ensure that they'll never be elided in LTO builds.

    Signed-off-by: David Vernet <void@manifault.com>
    Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    Acked-by: Stanislav Fomichev <sdf@google.com>
    Link: https://lore.kernel.org/bpf/20230201173016.342758-4-void@manifault.com

Signed-off-by: Viktor Malik <vmalik@redhat.com>
2023-06-13 22:45:20 +02:00
Valentin Schneider 2e6264f252 panic, kexec: make __crash_kexec() NMI safe
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2166717
Upstream-status: https://github.com/torvalds/linux.git

commit 05c6257433b7212f07a7e53479a8ab038fc1666a
Author: Valentin Schneider <vschneid@redhat.com>
Date:   Thu Jun 30 23:32:58 2022 +0100

    panic, kexec: make __crash_kexec() NMI safe

    Attempting to get a crash dump out of a debug PREEMPT_RT kernel via an NMI
    panic() doesn't work.  The cause of that lies in the PREEMPT_RT definition
    of mutex_trylock():

	    if (IS_ENABLED(CONFIG_DEBUG_RT_MUTEXES) && WARN_ON_ONCE(!in_task()))
		    return 0;

    This prevents an nmi_panic() from executing the main body of
    __crash_kexec() which does the actual kexec into the kdump kernel.  The
    warning and return are explained by:

      6ce47fd961 ("rtmutex: Warn if trylock is called from hard/softirq context")
      [...]
      The reasons for this are:

	  1) There is a potential deadlock in the slowpath

	  2) Another cpu which blocks on the rtmutex will boost the task
	     which allegedly locked the rtmutex, but that cannot work
	     because the hard/softirq context borrows the task context.

    Furthermore, grabbing the lock isn't NMI safe, so do away with kexec_mutex
    and replace it with an atomic variable.  This is somewhat overzealous as
    *some* callsites could keep using a mutex (e.g.  the sysfs-facing ones
    like crash_shrink_memory()), but this has the benefit of involving a
    single unified lock and preventing any future NMI-related surprises.

    Tested by triggering NMI panics via:

      $ echo 1 > /proc/sys/kernel/panic_on_unrecovered_nmi
      $ echo 1 > /proc/sys/kernel/unknown_nmi_panic
      $ echo 1 > /proc/sys/kernel/panic

      $ ipmitool power diag

    Link: https://lkml.kernel.org/r/20220630223258.4144112-3-vschneid@redhat.com
    Fixes: 6ce47fd961 ("rtmutex: Warn if trylock is called from hard/softirq context")
    Signed-off-by: Valentin Schneider <vschneid@redhat.com>
    Cc: Arnd Bergmann <arnd@arndb.de>
    Cc: Baoquan He <bhe@redhat.com>
    Cc: "Eric W . Biederman" <ebiederm@xmission.com>
    Cc: Juri Lelli <jlelli@redhat.com>
    Cc: Luis Claudio R. Goncalves <lgoncalv@redhat.com>
    Cc: Miaohe Lin <linmiaohe@huawei.com>
    Cc: Petr Mladek <pmladek@suse.com>
    Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Valentin Schneider <vschneid@redhat.com>
2023-02-06 09:36:01 +00:00
Valentin Schneider 779ee17b76 kexec: turn all kexec_mutex acquisitions into trylocks
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2166717
Upstream-status: https://github.com/torvalds/linux.git

commit 7bb5da0d490b2d836c5218f5186ee588d2145310
Author: Valentin Schneider <vschneid@redhat.com>
Date:   Thu Jun 30 23:32:57 2022 +0100

    kexec: turn all kexec_mutex acquisitions into trylocks

    Patch series "kexec, panic: Making crash_kexec() NMI safe", v4.

    This patch (of 2):

    Most acquisitions of kexec_mutex are done via mutex_trylock() - those were
    a direct "translation" from:

      8c5a1cf0ad ("kexec: use a mutex for locking rather than xchg()")

    there have however been two additions since then that use mutex_lock():
    crash_get_memory_size() and crash_shrink_memory().

    A later commit will replace said mutex with an atomic variable, and
    locking operations will become atomic_cmpxchg().  Rather than having those
    mutex_lock() become while (atomic_cmpxchg(&lock, 0, 1)), turn them into
    trylocks that can return -EBUSY on acquisition failure.

    This does halve the printable size of the crash kernel, but that's still
    neighbouring 2G for 32bit kernels which should be ample enough.

    Link: https://lkml.kernel.org/r/20220630223258.4144112-1-vschneid@redhat.com
    Link: https://lkml.kernel.org/r/20220630223258.4144112-2-vschneid@redhat.com
    Signed-off-by: Valentin Schneider <vschneid@redhat.com>
    Cc: Arnd Bergmann <arnd@arndb.de>
    Cc: "Eric W . Biederman" <ebiederm@xmission.com>
    Cc: Juri Lelli <jlelli@redhat.com>
    Cc: Luis Claudio R. Goncalves <lgoncalv@redhat.com>
    Cc: Miaohe Lin <linmiaohe@huawei.com>
    Cc: Petr Mladek <pmladek@suse.com>
    Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: Baoquan He <bhe@redhat.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Valentin Schneider <vschneid@redhat.com>
2023-02-06 09:36:01 +00:00
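The two entries above replace kexec_mutex with an atomic variable and make former mutex_lock() callers return -EBUSY when the lock is held. A minimal user-space sketch of that pattern with C11 atomics follows; the names carrying a _model suffix are hypothetical, and the kernel itself uses its own atomic_cmpxchg() primitives rather than <stdatomic.h>.

```c
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

/* 0 = free, 1 = held. Static storage starts out zeroed, i.e. unlocked. */
static atomic_int kexec_lock_model;

/* Try to take the lock without ever sleeping or spinning. */
static bool kexec_trylock_model(void)
{
	int expected = 0;

	return atomic_compare_exchange_strong_explicit(&kexec_lock_model,
						       &expected, 1,
						       memory_order_acquire,
						       memory_order_relaxed);
}

static void kexec_unlock_model(void)
{
	atomic_store_explicit(&kexec_lock_model, 0, memory_order_release);
}

/* A caller that used to block on the mutex now reports -EBUSY instead. */
static int crash_shrink_memory_model(void)
{
	if (!kexec_trylock_model())
		return -EBUSY;

	/* ... the critical section would go here ... */

	kexec_unlock_model();
	return 0;
}
```

Because the trylock never blocks, the same primitive is usable from a context such as an NMI panic, at the cost of callers having to handle -EBUSY.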
Baoquan He 42a08bdc95 kexec: remove redundant assignments
Bugzilla: https://bugzilla.redhat.com/2119002
Upstream Status: Linus's tree
Conflict: None

commit 16b0b7adabfb5564a77fa35917afe08decd55b29
Author: Michal Orzel <michalorzel.eng@gmail.com>
Date:   Fri Apr 29 14:38:03 2022 -0700

    kexec: remove redundant assignments

    Get rid of redundant assignments which end up in values not being read
    either because they are overwritten or the function ends.

    Reported by clang-tidy [deadcode.DeadStores]

    Link: https://lkml.kernel.org/r/20220326180948.192154-1-michalorzel.eng@gmail.com
    Signed-off-by: Michal Orzel <michalorzel.eng@gmail.com>
    Acked-by: Baoquan He <bhe@redhat.com>
    Cc: Eric Biederman <ebiederm@xmission.com>
    Cc: Nathan Chancellor <nathan@kernel.org>
    Cc: Nick Desaulniers <ndesaulniers@google.com>
    Cc: Michal Orzel <michalorzel.eng@gmail.com>

    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Baoquan He <bhe@redhat.com>
2022-11-16 03:47:02 -05:00
Baoquan He 2cb1e7c8dd kernel/kexec_core: move kexec_core sysctls into its own file
Bugzilla: https://bugzilla.redhat.com/2119002
Upstream Status: Linus's tree
Conflict: there's conflict in 1st hunk of kernel/sysctl.c, edit it
manually.

commit a467257ffe4bdb13eacddec0137013f6a1140b81
Author: yingelin <yingelin@huawei.com>
Date:   Sun Apr 24 10:57:40 2022 +0800

    kernel/kexec_core: move kexec_core sysctls into its own file

    This moves the kernel/kexec_core.c respective sysctls to their own file.

    kernel/sysctl.c has grown into an insane mess. We move sysctls to places
    where features actually belong to improve readability and reduce
    merge conflicts. At the same time, the proc-sysctl maintainers can easily
    care about the core logic rather than the sysctl knobs added for some feature.

    We already moved all filesystem sysctls out. This patch is part of the effort
    to move kexec related sysctls out.

    Signed-off-by: yingelin <yingelin@huawei.com>
    Acked-by: Baoquan He <bhe@redhat.com>
    Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
Signed-off-by: Baoquan He <bhe@redhat.com>
2022-11-16 03:47:02 -05:00
Baoquan He 81cbb55694 ELF: Remove elf_core_copy_kernel_regs()
Bugzilla: https://bugzilla.redhat.com/2119002
Upstream Status: Linus's tree
Conflict: None

commit 9554e908fb5d02e48a681d1eca180225bf109e83
Author: Brian Gerst <brgerst@gmail.com>
Date:   Fri Mar 25 11:39:51 2022 -0400

    ELF: Remove elf_core_copy_kernel_regs()

    x86-32 was the last architecture that implemented separate user and
    kernel registers.

    Signed-off-by: Brian Gerst <brgerst@gmail.com>
    Signed-off-by: Borislav Petkov <bp@suse.de>
    Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
    Acked-by: Andy Lutomirski <luto@kernel.org>
    Acked-by: Michael Ellerman <mpe@ellerman.id.au>
    Link: https://lore.kernel.org/r/20220325153953.162643-3-brgerst@gmail.com
Signed-off-by: Baoquan He <bhe@redhat.com>

Signed-off-by: Baoquan He <bhe@redhat.com>
2022-11-16 03:47:02 -05:00
Baoquan He a37dcb1370 kexec: drop weak attribute from functions
Bugzilla: https://bugzilla.redhat.com/2119002
Upstream Status: Linus's tree
There's conflict in arch/powerpc/include/asm/kexec.h because below
upstream commit is not pulled back yet:
commit 76222808fc25 ("powerpc: Move C prototypes out of asm-prototypes.h")

commit 0738eceb6201691534df07e0928d0a6168a35787
Author: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Date:   Fri Jul 1 13:04:05 2022 +0530

    kexec: drop weak attribute from functions

    Drop __weak attribute from functions in kexec_core.c:
    - machine_kexec_post_load()
    - arch_kexec_protect_crashkres()
    - arch_kexec_unprotect_crashkres()
    - crash_free_reserved_phys_range()

    Link: https://lkml.kernel.org/r/c0f6219e03cb399d166d518ab505095218a902dd.1656659357.git.naveen.n.rao@linux.vnet.ibm.com
    Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
    Suggested-by: Eric Biederman <ebiederm@xmission.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>
Signed-off-by: Baoquan He <bhe@redhat.com>

Signed-off-by: Baoquan He <bhe@redhat.com>
2022-11-16 03:47:02 -05:00
Chris von Recklinghausen c2f14a5cab exit: Move oops specific logic from do_exit into make_task_dead
Conflicts: kernel/exit.c - We already have
	b1f866b013e6 ("block: remove blk_needs_flush_plug")
	so the 'WARN_ON(blk_needs_flush_plug(tsk));' line is gone.

Bugzilla: https://bugzilla.redhat.com/2120352

commit 05ea0424f0e21c0ef9b47c89826e7c22ae137975
Author: Eric W. Biederman <ebiederm@xmission.com>
Date:   Mon Nov 22 09:33:00 2021 -0600

    exit: Move oops specific logic from do_exit into make_task_dead

    The beginning of do_exit has become cluttered and difficult to read as
    it is filled with checks to handle things that can only happen when
    the kernel is operating improperly.

    Now that we have a dedicated function for cleaning up a task when the
    kernel is operating improperly move the checks there.

    Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2022-10-12 07:27:33 -04:00
Prarit Bhargava 78720a8d2e printk: remove safe buffers
Bugzilla: http://bugzilla.redhat.com/2023082

commit 93d102f094be9beab28e5afb656c188b16a3793b
Author: John Ogness <john.ogness@linutronix.de>
Date:   Thu Jul 15 21:39:56 2021 +0206

    printk: remove safe buffers

    With @logbuf_lock removed, the high level printk functions for
    storing messages are lockless. Messages can be stored from any
    context, so there is no need for the NMI and safe buffers anymore.
    Remove the NMI and safe buffers.

    Although the safe buffers are removed, the NMI and safe context
    tracking is still in place. In these contexts, store the message
    immediately but still use irq_work to defer the console printing.

    Since printk recursion tracking is in place, safe context tracking
    for most of printk is not needed. Remove it. Only safe context
    tracking relating to the console and console_owner locks is left
    in place. This is because the console and console_owner locks are
    needed for the actual printing.

    Signed-off-by: John Ogness <john.ogness@linutronix.de>
    Reviewed-by: Petr Mladek <pmladek@suse.com>
    Signed-off-by: Petr Mladek <pmladek@suse.com>
    Link: https://lore.kernel.org/r/20210715193359.25946-4-john.ogness@linutronix.de

Signed-off-by: Prarit Bhargava <prarit@redhat.com>
2022-01-11 09:45:30 -05:00
Andy Shevchenko f39650de68 kernel.h: split out panic and oops helpers
kernel.h has been used as a dump for all kinds of stuff for a long time.
Here is an attempt to start cleaning it up by splitting out panic and
oops helpers.

There are several purposes of doing this:
- dropping dependency in bug.h
- dropping a loop by moving out panic_notifier.h
- unload kernel.h from something which has its own domain

At the same time convert users tree-wide to use new headers, although for
the time being include new header back to kernel.h to avoid twisted
indirected includes for existing users.

[akpm@linux-foundation.org: thread_info.h needs limits.h]
[andriy.shevchenko@linux.intel.com: ia64 fix]
  Link: https://lkml.kernel.org/r/20210520130557.55277-1-andriy.shevchenko@linux.intel.com

Link: https://lkml.kernel.org/r/20210511074137.33666-1-andriy.shevchenko@linux.intel.com
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Co-developed-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Mike Rapoport <rppt@linux.ibm.com>
Acked-by: Corey Minyard <cminyard@mvista.com>
Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Kees Cook <keescook@chromium.org>
Acked-by: Wei Liu <wei.liu@kernel.org>
Acked-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Sebastian Reichel <sre@kernel.org>
Acked-by: Luis Chamberlain <mcgrof@kernel.org>
Acked-by: Stephen Boyd <sboyd@kernel.org>
Acked-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Acked-by: Helge Deller <deller@gmx.de> # parisc
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-07-01 11:06:04 -07:00
Pavel Tatashin b2075dbb15 kexec: dump kmessage before machine_kexec
kmsg_dump(KMSG_DUMP_SHUTDOWN) is called before machine_restart(),
machine_halt(), and machine_power_off().  The only one that is missing
is machine_kexec().

The dmesg output that it contains can be used to study the shutdown
performance of both kernel and systemd during kexec reboot.

Here is an example of dmesg data collected after kexec:

  root@dplat-cp22:~# cat /sys/fs/pstore/dmesg-ramoops-0 | tail
  ...
  [   70.914592] psci: CPU3 killed (polled 0 ms)
  [   70.915705] CPU4: shutdown
  [   70.916643] psci: CPU4 killed (polled 4 ms)
  [   70.917715] CPU5: shutdown
  [   70.918725] psci: CPU5 killed (polled 0 ms)
  [   70.919704] CPU6: shutdown
  [   70.920726] psci: CPU6 killed (polled 4 ms)
  [   70.921642] CPU7: shutdown
  [   70.922650] psci: CPU7 killed (polled 0 ms)

Link: https://lkml.kernel.org/r/20210319192326.146000-2-pasha.tatashin@soleen.com
Signed-off-by: Pavel Tatashin <pasha.tatashin@soleen.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Reviewed-by: Bhupesh Sharma <bhsharma@redhat.com>
Acked-by: Baoquan He <bhe@redhat.com>
Reviewed-by: Tyler Hicks <tyhicks@linux.microsoft.com>
Cc: James Morris <jmorris@namei.org>
Cc: Sasha Levin <sashal@kernel.org>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Anton Vorontsov <anton@enomsg.org>
Cc: Colin Cross <ccross@android.com>
Cc: Tony Luck <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-05-07 00:26:32 -07:00
Joe LeVeque a119b4e518 kexec: Add kexec reboot string
The purpose is to notify the kernel module for fast reboot.

Upstream a patch from the SONiC network operating system [1].

[1]: https://github.com/Azure/sonic-linux-kernel/pull/46

Link: https://lkml.kernel.org/r/20210304124626.13927-1-pmenzel@molgen.mpg.de
Signed-off-by: Joe LeVeque <jolevequ@microsoft.com>
Signed-off-by: Paul Menzel <pmenzel@molgen.mpg.de>
Acked-by: Baoquan He <bhe@redhat.com>
Cc: Guohan Lu <lguohan@gmail.com>
Cc: Joe LeVeque <jolevequ@microsoft.com>
Cc: Paul Menzel <pmenzel@molgen.mpg.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-05-07 00:26:32 -07:00
Linus Torvalds 591fd30eee Merge branch 'work.elf-compat' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull ELF compat updates from Al Viro:
 "Sanitizing ELF compat support, especially for triarch architectures:

   - X32 handling cleaned up

   - MIPS64 uses compat_binfmt_elf.c both for O32 and N32 now

   - Kconfig side of things regularized

  Eventually I hope to have compat_binfmt_elf.c killed, with both native
  and compat built from fs/binfmt_elf.c, with -DELF_BITS={64,32} passed
  by kbuild, but that's a separate story - not included here"

* 'work.elf-compat' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  get rid of COMPAT_ELF_EXEC_PAGESIZE
  compat_binfmt_elf: don't bother with undef of ELF_ARCH
  Kconfig: regularize selection of CONFIG_BINFMT_ELF
  mips compat: switch to compat_binfmt_elf.c
  mips: don't bother with ELF_CORE_EFLAGS
  mips compat: don't bother with ELF_ET_DYN_BASE
  mips: KVM_GUEST makes no sense for 64bit builds...
  mips: kill unused definitions in binfmt_elf[on]32.c
  mips binfmt_elf*32.c: use elfcore-compat.h
  x32: make X32, !IA32_EMULATION setups able to execute x32 binaries
  [amd64] clean PRSTATUS_SIZE/SET_PR_FPVALID up properly
  elf_prstatus: collect the common part (everything before pr_reg) into a struct
  binfmt_elf: partially sanitize PRSTATUS_SIZE and SET_PR_FPVALID
2021-02-21 09:29:23 -08:00
Baoquan He 56c91a1843 kernel: kexec: remove the lock operation of system_transition_mutex
Function kernel_kexec() is called with lock system_transition_mutex
held in the reboot system call. While inside kernel_kexec(), it will
acquire system_transition_mutex again. This will lead to a deadlock.

The deadlock should be easily triggered; it hasn't caused any
failure report just because the feature 'kexec jump' is almost not
used by anyone as far as I know. An inquiry can be made about who
is using 'kexec jump' and where it's used. Before that, let's simply
remove the lock operation inside CONFIG_KEXEC_JUMP ifdeffery scope.

Fixes: 55f2503c3b ("PM / reboot: Eliminate race between reboot and suspend")
Signed-off-by: Baoquan He <bhe@redhat.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Pingfan Liu <kernelfans@gmail.com>
Cc: 4.19+ <stable@vger.kernel.org> # 4.19+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-01-25 18:40:37 +01:00
Al Viro f2485a2dc9 elf_prstatus: collect the common part (everything before pr_reg) into a struct
Preparations to doing i386 compat elf_prstatus sanely - rather than duplicating
the beginning of compat_elf_prstatus, take these fields into a separate
structure (compat_elf_prstatus_common), so that it could be reused.  Due to
the incestuous relationship between binfmt_elf.c and compat_binfmt_elf.c we
need the same shape change done to native struct elf_prstatus, gathering the
fields prior to pr_reg into a new structure (struct elf_prstatus_common).

Fortunately, offset of pr_reg is always a multiple of 16 with no padding
right before it, so it's possible to turn all the stuff prior to it into
a single member without disturbing the layout.
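
The layout argument can be checked with a small sketch. The structures
below are hypothetical and much smaller than the real elf_prstatus;
they only show that gathering the leading fields into one member keeps
every later offset where it was, as long as no padding is introduced
before the register array.

```c
#include <stddef.h>

/* Hypothetical flat layout: leading fields followed by a register array. */
struct flat_status {
    int sig;
    int code;
    long pid;
    long ppid;
    unsigned long reg[4];
    int fpvalid;
};

/* The leading fields gathered into one reusable struct. */
struct status_common {
    int sig;
    int code;
    long pid;
    long ppid;
};

/* Same fields as flat_status, with the prefix wrapped in one member. */
struct nested_status {
    struct status_common common;
    unsigned long reg[4];
    int fpvalid;
};

/* Returns 1 when wrapping the leading fields did not move anything. */
static int layout_unchanged(void)
{
    return offsetof(struct flat_status, reg) ==
               offsetof(struct nested_status, reg) &&
           offsetof(struct flat_status, fpvalid) ==
               offsetof(struct nested_status, fpvalid) &&
           sizeof(struct flat_status) == sizeof(struct nested_status);
}
```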

[build fix from Geert Uytterhoeven folded in]

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2021-01-06 08:38:29 -05:00
Eric Biggers a24d22b225 crypto: sha - split sha.h into sha1.h and sha2.h
Currently <crypto/sha.h> contains declarations for both SHA-1 and SHA-2,
and <crypto/sha3.h> contains declarations for SHA-3.

This organization is inconsistent, but more importantly SHA-1 is no
longer considered to be cryptographically secure.  So to the extent
possible, SHA-1 shouldn't be grouped together with any of the other SHA
versions, and usage of it should be phased out.

Therefore, split <crypto/sha.h> into two headers <crypto/sha1.h> and
<crypto/sha2.h>, and make everyone explicitly specify whether they want
the declarations for SHA-1, SHA-2, or both.

This avoids making the SHA-1 declarations visible to files that don't
want anything to do with SHA-1.  It also prepares for potentially moving
sha1.h into a new insecure/ or dangerous/ directory.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2020-11-20 14:45:33 +11:00
Randy Dunlap 7b7b8a2c95 kernel/: fix repeated words in comments
Fix multiple occurrences of duplicated words in kernel/.

Fix one typo/spello on the same line as a duplicate word.  Change one
instance of "the the" to "that the".  Otherwise just drop one of the
repeated words.

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Link: https://lkml.kernel.org/r/98202fa6-8919-ef63-9efe-c0fad5ca7af1@infradead.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-16 11:11:19 -07:00
Julien Thierry 00089c048e objtool: Rename frame.h -> objtool.h
Header frame.h is getting more code annotations to help objtool analyze
object files.

Rename the file to objtool.h.

[ jpoimboe: add objtool.h to MAINTAINERS ]

Signed-off-by: Julien Thierry <jthierry@redhat.com>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
2020-09-10 10:43:13 -05:00
Pavel Tatashin de68e4daea kexec: add machine_kexec_post_load()
It is the same as machine_kexec_prepare(), but is called after segments are
loaded. This way, processing work can be done with already loaded relocation
segments. One such example is arm64: it has to have segments loaded in
order to create a page table, but it cannot do it during kexec time,
because at that time allocations won't be possible anymore.

Signed-off-by: Pavel Tatashin <pasha.tatashin@soleen.com>
Acked-by: Dave Young <dyoung@redhat.com>
Signed-off-by: Will Deacon <will@kernel.org>
2020-01-08 16:32:55 +00:00
Pavel Tatashin d42cc530b1 kexec: quiet down kexec reboot
Here is a regular kexec command sequence and output:
=====
$ kexec --reuse-cmdline -i --load Image
$ kexec -e
[  161.342002] kexec_core: Starting new kernel

Welcome to Buildroot
buildroot login:
=====

Even when "quiet" kernel parameter is specified, "kexec_core: Starting
new kernel" is printed.

This message has KERN_EMERG level, but there is no emergency, it is a
normal kexec operation, so quiet it down to appropriate KERN_NOTICE.

Machines that have slow console baud rate benefit from less output.

Signed-off-by: Pavel Tatashin <pasha.tatashin@soleen.com>
Reviewed-by: Simon Horman <horms@verge.net.au>
Acked-by: Dave Young <dyoung@redhat.com>
Signed-off-by: Will Deacon <will@kernel.org>
2020-01-08 16:32:55 +00:00
Tetsuo Handa 7c3a6aedcd kexec: bail out upon SIGKILL when allocating memory.
syzbot found that a thread can stall for minutes inside kexec_load() after
that thread was killed by SIGKILL [1].  It turned out that the reproducer
was trying to allocate 2408MB of memory using kimage_alloc_page() from
kimage_load_normal_segment().  Let's check for SIGKILL before doing memory
allocation.
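
The idea of checking before each allocation can be sketched in
userspace as follows. This is an illustration only: fatal_pending,
load_pages() and the kill_at parameter are hypothetical names, and the
flag stands in for whatever the kernel uses to detect a pending
SIGKILL.

```c
#include <signal.h>
#include <stdlib.h>

#define PAGE_BYTES 4096

/* Hypothetical stand-in for "a fatal signal is pending for this task". */
static volatile sig_atomic_t fatal_pending;

/*
 * Allocates npages pages one at a time and checks for a pending fatal
 * signal before each allocation. kill_at simulates the signal arriving
 * just before page index kill_at is allocated (pass -1 for "never").
 * Returns the number of pages allocated, or -1 if the load was aborted.
 */
static long load_pages(long npages, long kill_at)
{
    void **pages = calloc((size_t)npages, sizeof(*pages));
    long i, done = 0, ret;

    if (!pages)
        return -1;
    fatal_pending = 0;
    for (i = 0; i < npages; i++) {
        if (i == kill_at)
            fatal_pending = 1;
        if (fatal_pending)
            break; /* bail out instead of allocating further */
        pages[i] = malloc(PAGE_BYTES);
        if (!pages[i])
            break;
        done++;
    }
    ret = (done == npages) ? done : -1;
    for (i = 0; i < done; i++)
        free(pages[i]);
    free(pages);
    return ret;
}
```

Without the check inside the loop, the remaining allocations would all
be attempted even though the caller is already being killed.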

[1] https://syzkaller.appspot.com/bug?id=a0e3436829698d5824231251fad9d8e998f94f5e

Link: http://lkml.kernel.org/r/993c9185-d324-2640-d061-bed2dd18b1f7@I-love.SAKURA.ne.jp
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Reported-by: syzbot <syzbot+8ab2d0f39fb79fe6ca40@syzkaller.appspotmail.com>
Cc: Eric Biederman <ebiederm@xmission.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-09-25 17:51:40 -07:00
Thomas Gleixner 40b0b3f8fb treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 230
Based on 2 normalized pattern(s):

  this source code is licensed under the gnu general public license
  version 2 see the file copying for more details

  this source code is licensed under general public license version 2
  see

extracted by the scancode license scanner the SPDX license identifier

  GPL-2.0-only

has been chosen to replace the boilerplate/reference in 52 file(s).

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Enrico Weigelt <info@metux.net>
Reviewed-by: Allison Randal <allison@lohutok.net>
Reviewed-by: Alexios Zavras <alexios.zavras@intel.com>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190602204653.449021192@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-06-19 17:09:06 +02:00
Nicholas Piggin 2f1a6fbbef power/suspend: Add function to disable secondaries for suspend
This adds a function to disable secondary CPUs for suspend that are
not necessarily non-zero / non-boot CPUs. Platforms will be able to
use this to suspend using non-zero CPUs.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rafael J . Wysocki <rafael.j.wysocki@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linuxppc-dev@lists.ozlabs.org
Link: https://lkml.kernel.org/r/20190411033448.20842-3-npiggin@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-05-03 19:42:41 +02:00
Arun KS ca79b0c211 mm: convert totalram_pages and totalhigh_pages variables to atomic
totalram_pages and totalhigh_pages are made static inline functions.

The main motivation was that managed_page_count_lock handling was
complicating things.  It was discussed at length here,
https://lore.kernel.org/patchwork/patch/995739/#1181785 So it seems
better to remove the lock and convert the variables to atomic, with
prevention of potential store-to-read tearing as a bonus.

[akpm@linux-foundation.org: coding style fixes]
Link: http://lkml.kernel.org/r/1542090790-21750-4-git-send-email-arunks@codeaurora.org
Signed-off-by: Arun KS <arunks@codeaurora.org>
Suggested-by: Michal Hocko <mhocko@suse.com>
Suggested-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Reviewed-by: Pavel Tatashin <pasha.tatashin@soleen.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: David Hildenbrand <david@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-12-28 12:11:47 -08:00
Arun KS 3d6357de8a mm: reference totalram_pages and managed_pages once per function
Patch series "mm: convert totalram_pages, totalhigh_pages and managed
pages to atomic", v5.

This series converts totalram_pages, totalhigh_pages and
zone->managed_pages to atomic variables.

totalram_pages, zone->managed_pages and totalhigh_pages updates are
protected by managed_page_count_lock, but readers never care about it.
Convert these variables to atomic to avoid readers potentially seeing a
store tear.

The main motivation was that managed_page_count_lock handling was
complicating things.  It was discussed at length here,
https://lore.kernel.org/patchwork/patch/995739/#1181785 It seems better
to remove the lock and convert the variables to atomic.  With the change,
preventing potential store-to-read tearing comes as a bonus.

This patch (of 4):

This is in preparation to a later patch which converts totalram_pages and
zone->managed_pages to atomic variables.  Please note that re-reading the
value might lead to a different value and as such it could lead to
unexpected behavior.  There are no known bugs as a result of the current
code but it is better to prevent them in principle.
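
A minimal sketch of the "read once per function" pattern, using C11
atomics in userspace. The names total_pages and the two helper
functions are hypothetical and do not come from the patch; the
arithmetic is arbitrary and only there to use the value twice.

```c
#include <stdatomic.h>

/* Hypothetical stand-in for an atomically updated page counter. */
static atomic_long total_pages;

/*
 * Fragile shape: the counter is loaded twice, so a concurrent update
 * between the loads can make the two terms disagree.
 */
static long half_minus_reserve_reread(long reserve)
{
    return atomic_load(&total_pages) / 2 -
           (atomic_load(&total_pages) > reserve ? reserve : 0);
}

/* Preferred shape: take one snapshot and derive everything from it. */
static long half_minus_reserve_once(long reserve)
{
    long total = atomic_load(&total_pages);

    return total / 2 - (total > reserve ? reserve : 0);
}
```

With a single thread both functions return the same result; they can
only differ when another thread updates the counter between the two
loads in the first one.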

Link: http://lkml.kernel.org/r/1542090790-21750-2-git-send-email-arunks@codeaurora.org
Signed-off-by: Arun KS <arunks@codeaurora.org>
Reviewed-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Reviewed-by: David Hildenbrand <david@redhat.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Pavel Tatashin <pasha.tatashin@soleen.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-12-28 12:11:47 -08:00
Lianbo Jiang 9cf38d5559 kexec: Allocate decrypted control pages for kdump if SME is enabled
When SME is enabled in the first kernel, it needs to allocate decrypted
pages for kdump because when the kdump kernel boots, these pages need to
be accessed decrypted in the initial boot stage, before SME is enabled.

 [ bp: clean up text. ]

Signed-off-by: Lianbo Jiang <lijiang@redhat.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
Cc: kexec@lists.infradead.org
Cc: tglx@linutronix.de
Cc: mingo@redhat.com
Cc: hpa@zytor.com
Cc: akpm@linux-foundation.org
Cc: dan.j.williams@intel.com
Cc: bhelgaas@google.com
Cc: baiyaowei@cmss.chinamobile.com
Cc: tiwai@suse.de
Cc: brijesh.singh@amd.com
Cc: dyoung@redhat.com
Cc: bhe@redhat.com
Cc: jroedel@suse.de
Link: https://lkml.kernel.org/r/20180930031033.22110-3-lijiang@redhat.com
2018-10-06 12:01:51 +02:00
Jarrett Farnitano a8311f647e kexec: yield to scheduler when loading kimage segments
Without yielding while loading kimage segments, a large initrd will
block all other work on the CPU performing the load until it is
completed.  For example loading an initrd of 200MB on a low power single
core system will lock up the system for a few seconds.

To increase system responsiveness to other tasks at that time, call
cond_resched() in both the crash kernel and normal kernel segment
loading loops.
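
The loop shape can be sketched in userspace like this. It is only an
analogue: copy_segment_yielding() and CHUNK_BYTES are hypothetical, and
sched_yield() always gives up the CPU, whereas cond_resched() in the
kernel only reschedules when that is needed.

```c
#include <sched.h>
#include <stddef.h>
#include <string.h>

#define CHUNK_BYTES 4096

/*
 * Copies len bytes in page-sized chunks and yields the CPU after each
 * chunk, so other runnable tasks are not starved during a long copy.
 * Returns the number of chunks copied.
 */
static size_t copy_segment_yielding(unsigned char *dst,
                                    const unsigned char *src, size_t len)
{
    size_t off = 0, chunks = 0;

    while (off < len) {
        size_t n = len - off < CHUNK_BYTES ? len - off : CHUNK_BYTES;

        memcpy(dst + off, src + off, n);
        off += n;
        chunks++;
        sched_yield(); /* userspace analogue of a reschedule point */
    }
    return chunks;
}
```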

I did run into a practical problem.  Hardware watchdogs on embedded
systems can have short timers on the order of seconds.  If the system is
locked up for a few seconds with only a single core available, the
watchdog may not be pet in a timely fashion.  If this happens, the
hardware watchdog will fire and reset the system.

This really only becomes a problem when you are working with a single
core, a decently sized initrd, and have a constrained hardware watchdog.

Link: http://lkml.kernel.org/r/1528738546-3328-1-git-send-email-jmf@amazon.com
Signed-off-by: Jarrett Farnitano <jmf@amazon.com>
Reviewed-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-06-15 07:55:24 +09:00
Tom Lendacky bba4ed011a x86/mm, kexec: Allow kexec to be used with SME
Provide support so that kexec can be used to boot a kernel when SME is
enabled.

Support is needed to allocate pages for kexec without encryption.  This
is needed in order to be able to reboot in the kernel in the same manner
as originally booted.

Additionally, when shutting down all of the CPUs we need to be sure to
flush the caches and then halt. This is needed when booting from a state
where SME was not active into a state where SME is active (or vice-versa).
Without these steps, it is possible for cache lines to exist for the same
physical location but tagged both with and without the encryption bit. This
can cause random memory corruption when caches are flushed depending on
which cacheline is written last.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Borislav Petkov <bp@suse.de>
Cc: <kexec@lists.infradead.org>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brijesh Singh <brijesh.singh@amd.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Larry Woodman <lwoodman@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Toshimitsu Kani <toshi.kani@hpe.com>
Cc: kasan-dev@googlegroups.com
Cc: kvm@vger.kernel.org
Cc: linux-arch@vger.kernel.org
Cc: linux-doc@vger.kernel.org
Cc: linux-efi@vger.kernel.org
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/b95ff075db3e7cd545313f2fb609a49619a09625.1500319216.git.thomas.lendacky@amd.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-07-18 11:38:04 +02:00
Xunlei Pang 1229384f5b kdump: protect vmcoreinfo data under the crash memory
Currently vmcoreinfo data is updated at boot time in subsys_initcall(),
so it risks being modified by buggy code while the system is
running.

As a result, the dumped vmcore may contain wrong vmcoreinfo.  Later on,
when using a utility such as "crash" or "makedumpfile" to parse this
vmcore, we will probably get "Segmentation fault" or other unexpected errors.

E.g.  1) wrong code overwrites vmcoreinfo_data; 2) further crashes the
system; 3) trigger kdump, then we obviously will fail to recognize the
crash context correctly due to the corrupted vmcoreinfo.

Now except for vmcoreinfo, all the crash data is well
protected (including the cpu note, which is fully updated in the crash
path, thus its correctness is guaranteed).  Given that vmcoreinfo data
is a large chunk prepared for kdump, we had better protect it as well.

To solve this, we relocate and copy vmcoreinfo_data to the crash memory
when kdump is loading via kexec syscalls.  Because the whole crash
memory will be protected by existing arch_kexec_protect_crashkres()
mechanism, we naturally protect vmcoreinfo_data from write (even read)
access under kernel direct mapping after kdump is loaded.

Since kdump is usually loaded at the very early stage after boot, we can
trust the correctness of the vmcoreinfo data copied.

On the other hand, we still need to operate on the vmcoreinfo safe copy
when a crash happens, to generate vmcoreinfo_note again.  We rely on
vmap() to map out a new kernel virtual address and use this new one
instead in the following crash_save_vmcoreinfo().

BTW, we do not touch vmcoreinfo_note, because it will be fully updated
using the protected vmcoreinfo_data after crash which is surely correct
just like the cpu crash note.

Link: http://lkml.kernel.org/r/1493281021-20737-3-git-send-email-xlpang@redhat.com
Signed-off-by: Xunlei Pang <xlpang@redhat.com>
Tested-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Dave Young <dyoung@redhat.com>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Hari Bathini <hbathini@linux.vnet.ibm.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-12 16:26:00 -07:00
Josh Poimboeuf c207aee480 objtool, x86: Add several functions and files to the objtool whitelist
In preparation for an objtool rewrite which will have broader checks,
whitelist functions and files which cause problems because they do
unusual things with the stack.

These whitelists serve as a TODO list for which functions and files
don't yet have undwarf unwinder coverage.  Eventually most of the
whitelists can be removed in favor of manual CFI hint annotations or
objtool improvements.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Jiri Slaby <jslaby@suse.cz>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: live-patching@vger.kernel.org
Link: http://lkml.kernel.org/r/7f934a5d707a574bda33ea282e9478e627fb1829.1498659915.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-06-30 10:19:19 +02:00