Commit Graph

144 Commits

Author SHA1 Message Date
David Arcari dd1b0f7fae hwmon: (coretemp) Extend the bitmask to read temperature to 0xff
JIRA: https://issues.redhat.com/browse/RHEL-66569

commit f0c344c000d09e38a0240b4a6ccbcd553b18e762
Author: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
Date:   Thu Apr 25 10:13:11 2024 -0700

    hwmon: (coretemp) Extend the bitmask to read temperature to 0xff

    The Intel Software Development manual defines the temperature digital
    readout as the bits [22:16] of the IA32_[PACKAGE]_THERM_STATUS registers.
    Bit 23 is specified as reserved.

    In recent processors, however, the temperature digital readout uses bits
    [23:16]. In those processors, using the bitmask 0x7f would lead to
    incorrect readings if the temperature deviates from TjMax by more than
    127 degrees Celsius.

    Although not guaranteed, bit 23 is likely to be 0 in processors from a few
    generations ago. The temperature reading would still be correct in those
    processors when using a 0xff bitmask.

    Model-specific provisions can be made for older processors in which bit 23
    is not 0 should the need arise.

    Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
    Link: https://lore.kernel.org/r/20240425171311.19519-4-ricardo.neri-calderon@linux.intel.com
    Signed-off-by: Guenter Roeck <linux@roeck-us.net>

Signed-off-by: David Arcari <darcari@redhat.com>
2024-12-20 09:27:40 -05:00
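The effect of widening the mask in the bitmask commit above can be shown with a small userspace sketch. It assumes, as coretemp does, that the digital readout is the distance below TjMax in degrees Celsius. The helper names and sample register values are hypothetical, and no MSR is actually read.

```c
#include <assert.h>
#include <stdint.h>

/* Extract the digital readout field, which starts at bit 16, using the
 * given mask: 0x7f covers bits [22:16], 0xff covers bits [23:16]. */
static unsigned int therm_readout(uint32_t therm_status, uint32_t mask)
{
	return (therm_status >> 16) & mask;
}

/* Temperature in millidegrees Celsius, computed as TjMax minus readout. */
static int temp_millicelsius(int tjmax_c, uint32_t therm_status, uint32_t mask)
{
	return (tjmax_c - (int)therm_readout(therm_status, mask)) * 1000;
}
```

Take TjMax = 100 and a readout of 130 (0x82). The 0x7f mask silently drops bit 23 and reports 98000 instead of -30000. For readouts below 128, bit 23 is 0 and both masks agree.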
David Arcari 6efe3c0555 x86/cpu/topology: Rename topology_max_die_per_package() [partial]
JIRA: https://issues.redhat.com/browse/RHEL-20130
Conflicts: limited to affected code under drivers/thermal

commit bd745d1c41e7fa56242889eb5dc6df2d7dd5df32
Author: Thomas Gleixner <tglx@linutronix.de>
Date:   Tue Feb 13 22:06:13 2024 +0100

    x86/cpu/topology: Rename topology_max_die_per_package()

    The plural of die is dies.

    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Tested-by: Michael Kelley <mhklinux@outlook.com>
    Tested-by: Sohil Mehta <sohil.mehta@intel.com>
    Link: https://lore.kernel.org/r/20240213210253.065874205@linutronix.de

Signed-off-by: David Arcari <darcari@redhat.com>
2024-10-25 14:16:36 -04:00
David Arcari d8990628b4 hwmon: (coretemp) Enlarge per package core count limit
JIRA: https://issues.redhat.com/browse/RHEL-22705

commit 34cf8c657cf0365791cdc658ddbca9cc907726ce
Author: Zhang Rui <rui.zhang@intel.com>
Date:   Fri Feb 2 17:21:36 2024 +0800

    hwmon: (coretemp) Enlarge per package core count limit

    Currently, the coretemp driver supports only 128 cores per package,
    so core temperature information is lost on systems that have more
    than 128 cores per package:
     [   58.685033] coretemp coretemp.0: Adding Core 128 failed
     [   58.692009] coretemp coretemp.0: Adding Core 129 failed
     ...

    Raise the limit to 512, because some platforms have more than 256
    cores per package.

    Signed-off-by: Zhang Rui <rui.zhang@intel.com>
    Link: https://lore.kernel.org/r/20240202092144.71180-4-rui.zhang@intel.com
    Signed-off-by: Guenter Roeck <linux@roeck-us.net>

Signed-off-by: David Arcari <darcari@redhat.com>
2024-04-25 15:24:42 -04:00
David Arcari f07080a8cb hwmon: (coretemp) Fix bogus core_id to attr name mapping
JIRA: https://issues.redhat.com/browse/RHEL-22705

commit fdaf0c8629d4524a168cb9e4ad4231875749b28c
Author: Zhang Rui <rui.zhang@intel.com>
Date:   Fri Feb 2 17:21:35 2024 +0800

    hwmon: (coretemp) Fix bogus core_id to attr name mapping

    Before commit 7108b80a542b ("hwmon/coretemp: Handle large core ID
    value"), there was a fixed mapping between
    1. cpu_core_id
    2. the index in the pdata->core_data[] array
    3. the sysfs attr name, aka "tempX_"
    The latter two always equaled cpu_core_id + 2.

    After that commit, the pdata->core_data[] index is obtained from an IDA,
    so that sparse core IDs can be handled and more cores within a package
    can be supported.

    However, the commit erroneously maps the sysfs attr name to
    pdata->core_data[] index instead of cpu_core_id + 2.

    As a result, the code no longer matches its comments, and it introduces
    user-visible changes in hwmon sysfs on systems with sparse core IDs.

    For example, before commit 7108b80a542b ("hwmon/coretemp: Handle large
    core ID value"):
    /sys/class/hwmon/hwmon2/temp2_label:Core 0
    /sys/class/hwmon/hwmon2/temp3_label:Core 1
    /sys/class/hwmon/hwmon2/temp4_label:Core 2
    /sys/class/hwmon/hwmon2/temp5_label:Core 3
    /sys/class/hwmon/hwmon2/temp6_label:Core 4
    /sys/class/hwmon/hwmon3/temp10_label:Core 8
    /sys/class/hwmon/hwmon3/temp11_label:Core 9
    and after it:
    /sys/class/hwmon/hwmon2/temp2_label:Core 0
    /sys/class/hwmon/hwmon2/temp3_label:Core 1
    /sys/class/hwmon/hwmon2/temp4_label:Core 2
    /sys/class/hwmon/hwmon2/temp5_label:Core 3
    /sys/class/hwmon/hwmon2/temp6_label:Core 4
    /sys/class/hwmon/hwmon2/temp7_label:Core 8
    /sys/class/hwmon/hwmon2/temp8_label:Core 9

    Restore the previous behavior, and rework the code, comments and variable
    names to avoid future confusion.

    Fixes: 7108b80a542b ("hwmon/coretemp: Handle large core ID value")
    Signed-off-by: Zhang Rui <rui.zhang@intel.com>
    Link: https://lore.kernel.org/r/20240202092144.71180-3-rui.zhang@intel.com
    Signed-off-by: Guenter Roeck <linux@roeck-us.net>

Signed-off-by: David Arcari <darcari@redhat.com>
2024-04-25 15:24:42 -04:00
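The difference between the buggy and the restored mapping in the commit above can be sketched in userspace. A simple lowest-free-slot allocator stands in for the kernel IDA. The array size, the macro names and the helpers are all hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define SKETCH_MAX_CORES 8       /* hypothetical array size */
#define SKETCH_ATTR_OFFSET 2     /* "tempX_" number = core ID + 2 */

/* Hypothetical stand-in for an IDA: hands out the lowest free index. */
static int index_in_use[SKETCH_MAX_CORES];

static int alloc_core_index(void)
{
	for (int i = 0; i < SKETCH_MAX_CORES; i++) {
		if (!index_in_use[i]) {
			index_in_use[i] = 1;
			return i;
		}
	}
	return -1; /* all slots taken */
}

/* Buggy mapping: sysfs number derived from the dense array index. */
static int buggy_attr_no(int array_index)
{
	return array_index + SKETCH_ATTR_OFFSET;
}

/* Restored mapping: sysfs number derived from cpu_core_id. */
static int attr_no(int cpu_core_id)
{
	return cpu_core_id + SKETCH_ATTR_OFFSET;
}

/* Build the label attribute name, e.g. "temp10_label" for core 8. */
static void label_attr_name(char *buf, size_t len, int cpu_core_id)
{
	snprintf(buf, len, "temp%d_label", attr_no(cpu_core_id));
}
```

The commit's example uses core IDs 0-4, 8 and 9. Here core 8 receives dense index 5. The buggy mapping therefore names it temp7, while the restored mapping keeps temp10.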
David Arcari b3fc214423 hwmon: (coretemp) Fix out-of-bounds memory access
JIRA: https://issues.redhat.com/browse/RHEL-22705
JIRA: https://issues.redhat.com/browse/RHEL-31307
CVE: CVE-2024-26664

commit 4e440abc894585a34c2904a32cd54af1742311b3
Author: Zhang Rui <rui.zhang@intel.com>
Date:   Fri Feb 2 17:21:34 2024 +0800

    hwmon: (coretemp) Fix out-of-bounds memory access

    Fix a bug where pdata->cpu_map[] is written before the out-of-bounds
    check. The problem might be triggered on systems with more than 128
    cores per package.

    Fixes: 7108b80a542b ("hwmon/coretemp: Handle large core ID value")
    Signed-off-by: Zhang Rui <rui.zhang@intel.com>
    Cc: <stable@vger.kernel.org>
    Link: https://lore.kernel.org/r/20240202092144.71180-2-rui.zhang@intel.com
    Signed-off-by: Guenter Roeck <linux@roeck-us.net>

Signed-off-by: David Arcari <darcari@redhat.com>
2024-04-25 15:23:26 -04:00
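The ordering fix in the commit above can be illustrated with a simplified sketch. The structure, the helper name and the use of -ERANGE as the error code are assumptions for illustration, not the driver's actual code.

```c
#include <assert.h>
#include <errno.h>

#define NUM_REAL_CORES 128   /* per-package limit at the time of the fix */

/* Hypothetical, trimmed-down platform data. */
struct pdata_sketch {
	int cpu_map[NUM_REAL_CORES];
};

/*
 * Fixed ordering: validate the index first, and only then write
 * cpu_map[]. The bug was performing the write before this check.
 */
static int set_cpu_map(struct pdata_sketch *pd, int index, int cpu_core_id)
{
	if (index < 0 || index >= NUM_REAL_CORES)
		return -ERANGE;
	pd->cpu_map[index] = cpu_core_id;
	return 0;
}
```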
David Arcari c08a5a37c8 hwmon: (coretemp) Fix potentially truncated sysfs attribute name
JIRA: https://issues.redhat.com/browse/RHEL-19759

commit bbfff736d30e5283ad09e748caff979d75ddef7f
Author: Zhang Rui <rui.zhang@intel.com>
Date:   Wed Oct 25 20:23:16 2023 +0800

    hwmon: (coretemp) Fix potentially truncated sysfs attribute name

    When built with W=1 and "-Werror=format-truncation", the following error
    is observed in the coretemp driver:

       drivers/hwmon/coretemp.c: In function 'create_core_data':
    >> drivers/hwmon/coretemp.c:393:34: error: '%s' directive output may be truncated writing likely 5 or more bytes into a region of size between 3 and 13 [-Werror=format-truncation=]
         393 |                          "temp%d_%s", attr_no, suffixes[i]);
             |                                  ^~
       drivers/hwmon/coretemp.c:393:26: note: assuming directive output of 5 bytes
         393 |                          "temp%d_%s", attr_no, suffixes[i]);
             |                          ^~~~~~~~~~~
       drivers/hwmon/coretemp.c:392:17: note: 'snprintf' output 7 or more bytes (assuming 22) into a destination of size 19
         392 |                 snprintf(tdata->attr_name[i], CORETEMP_NAME_LENGTH,
             |                 ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
         393 |                          "temp%d_%s", attr_no, suffixes[i]);
             |                          ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
       cc1: all warnings being treated as errors

    Given that
    1. '%d' could take 10 characters,
    2. '%s' could take 10 characters ("crit_alarm"),
    3. "temp", "_" and the NUL terminator take 6 characters,
    the name needs up to 26 bytes; fix the problem by increasing
    CORETEMP_NAME_LENGTH to 28.

    Signed-off-by: Zhang Rui <rui.zhang@intel.com>
    Fixes: 7108b80a542b ("hwmon/coretemp: Handle large core ID value")
    Reported-by: kernel test robot <lkp@intel.com>
    Closes: https://lore.kernel.org/oe-kbuild-all/202310200443.iD3tUbbK-lkp@intel.com/
    Link: https://lore.kernel.org/r/20231025122316.836400-1-rui.zhang@intel.com
    Signed-off-by: Guenter Roeck <linux@roeck-us.net>

Signed-off-by: David Arcari <darcari@redhat.com>
2023-12-20 07:38:10 -05:00
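The buffer-size arithmetic in the commit above can be checked in userspace by asking snprintf() for the length it needs. The helper name is hypothetical. INT_MAX stands in for the 10-digit worst case of '%d', and "crit_alarm" is the longest suffix named in the commit message.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>

#define CORETEMP_NAME_LENGTH 28   /* value chosen by the fix */

/*
 * Return the length snprintf() would produce for "temp%d_%s", excluding
 * the terminating NUL; a result >= the buffer size means truncation.
 */
static int attr_name_len(int attr_no, const char *suffix)
{
	char buf[CORETEMP_NAME_LENGTH];

	return snprintf(buf, sizeof(buf), "temp%d_%s", attr_no, suffix);
}
```

"temp2147483647_crit_alarm" is 25 characters, or 26 bytes with the NUL. That fits in 28 bytes but not in the former 19-byte buffer.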
David Arcari 68a7d05de4 hwmon: (coretemp) Delete an obsolete comment
JIRA: https://issues.redhat.com/browse/RHEL-19759

commit a2930f6dc90f07b2d956cab5f98b594b16918132
Author: Zhang Rui <rui.zhang@intel.com>
Date:   Thu Mar 30 18:33:46 2023 +0800

    hwmon: (coretemp) Delete an obsolete comment

    The refinement of the tjmax value retrieved from MSR_IA32_TEMPERATURE_TARGET
    has changed several times.

    Now, the raw value from the MSR is used without refinement, so remove the
    obsolete comment.

    Signed-off-by: Zhang Rui <rui.zhang@intel.com>
    Link: https://lore.kernel.org/r/20230330103346.6044-2-rui.zhang@intel.com
    Signed-off-by: Guenter Roeck <linux@roeck-us.net>

Signed-off-by: David Arcari <darcari@redhat.com>
2023-12-20 07:38:10 -05:00
David Arcari 8235808be8 hwmon: (coretemp) Delete tjmax debug message
JIRA: https://issues.redhat.com/browse/RHEL-19759

commit 6c2b659913ad9c70c30050efc3e287fd0869012a
Author: Zhang Rui <rui.zhang@intel.com>
Date:   Thu Mar 30 18:33:45 2023 +0800

    hwmon: (coretemp) Delete tjmax debug message

    After commit c0c67f8761ce ("hwmon: (coretemp) Add support for dynamic
    tjmax"), the tjmax value is retrieved from the MSR every time the
    temperature is read.
    This means that, with debug messages enabled, the tjmax debug message is
    printed for every single temperature read on any CPU, which spams the
    syslog.

    Ideally, since tjmax is unique per package, the debug message would be
    shown once when tjmax changes for a package. But this would require new
    per-package data in the coretemp driver, which is overkill.

    To keep the code simple, delete the tjmax debug message.

    Signed-off-by: Zhang Rui <rui.zhang@intel.com>
    Link: https://lore.kernel.org/r/20230330103346.6044-1-rui.zhang@intel.com
    Signed-off-by: Guenter Roeck <linux@roeck-us.net>

Signed-off-by: David Arcari <darcari@redhat.com>
2023-12-20 07:38:10 -05:00
David Arcari 85324748aa hwmon: (coretemp) Simplify platform device handling
JIRA: https://issues.redhat.com/browse/RHEL-19759

commit 6d03bbff456befeccdd4d663177c4d6c75d0c4ff
Author: Robin Murphy <robin.murphy@arm.com>
Date:   Tue Jan 3 12:46:20 2023 +0100

    hwmon: (coretemp) Simplify platform device handling

    Coretemp's platform driver is unconventional. All the real work is done
    globally by the initcall and CPU hotplug notifiers, while the "driver"
    effectively just wraps an allocation and the registration of the hwmon
    interface in a long-winded round-trip through the driver core. The whole
    logic of dynamically creating and destroying platform devices to bring
    the interfaces up and down is error-prone, since it assumes
    platform_device_add() will synchronously bind the driver and set drvdata
    before it returns, and thus results in a NULL dereference if
    drivers_autoprobe is turned off for the platform bus. Furthermore, the
    unusual approach of doing that from within a CPU hotplug notifier, which
    the code already notes deadlocks suspend, also causes lockdep issues for
    other drivers or subsystems that may legitimately want to register a CPU
    hotplug notifier from a platform bus notifier.

    All of these issues can be solved by ripping this unusual behaviour out
    completely, simply tying the platform devices to the lifetime of the
    module itself, and directly managing the hwmon interfaces from the
    hotplug notifiers. There is a slight user-visible change in that
    /sys/bus/platform/drivers/coretemp will no longer appear, and
    /sys/devices/platform/coretemp.n will remain present if package n is
    hotplugged off, but hwmon users should really only be looking for the
    presence of the hwmon interfaces, whose behaviour remains unchanged.

    Link: https://lore.kernel.org/lkml/20220922101036.87457-1-janusz.krzysztofik@linux.intel.com/
    Link: https://gitlab.freedesktop.org/drm/intel/issues/6641
    Signed-off-by: Robin Murphy <robin.murphy@arm.com>
    Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
    Link: https://lore.kernel.org/r/20230103114620.15319-1-janusz.krzysztofik@linux.intel.com
    Signed-off-by: Guenter Roeck <linux@roeck-us.net>

Signed-off-by: David Arcari <darcari@redhat.com>
2023-12-20 07:38:10 -05:00
David Arcari 94c3632f66 hwmon: (coretemp) Add support for dynamic ttarget
JIRA: https://issues.redhat.com/browse/RHEL-19759

commit fae30e3c203e0f854d0420b50e54e31a75b6a8a4
Author: Zhang Rui <rui.zhang@intel.com>
Date:   Sun Nov 13 23:31:45 2022 +0800

    hwmon: (coretemp) Add support for dynamic ttarget

    The tjmax value retrieved from MSR_IA32_TEMPERATURE_TARGET can change at
    runtime when the Intel SST-PP (Intel Speed Select Technology -
    Performance Profile) level is changed. As a result, the ttarget value
    also becomes dynamic.

    Improve the code to always retrieve the current ttarget value.

    Signed-off-by: Zhang Rui <rui.zhang@intel.com>
    Link: https://lore.kernel.org/r/20221113153145.32696-4-rui.zhang@intel.com
    Signed-off-by: Guenter Roeck <linux@roeck-us.net>

Signed-off-by: David Arcari <darcari@redhat.com>
2023-12-20 07:38:09 -05:00
David Arcari c50343c632 hwmon: (coretemp) Add support for dynamic tjmax
JIRA: https://issues.redhat.com/browse/RHEL-19759

commit c0c67f8761cec1fe36c21d85b1a5400ea7ac30cd
Author: Zhang Rui <rui.zhang@intel.com>
Date:   Sun Nov 13 23:31:44 2022 +0800

    hwmon: (coretemp) Add support for dynamic tjmax

    The tjmax value retrieved from MSR_IA32_TEMPERATURE_TARGET can change at
    runtime when the Intel SST-PP (Intel Speed Select Technology -
    Performance Profile) level is changed.

    Improve the code to always use the current tjmax when it can be retrieved
    from MSR_IA32_TEMPERATURE_TARGET.

    When tjmax cannot be retrieved from MSR_IA32_TEMPERATURE_TARGET, still
    follow the previous logic and always use a static tjmax value.

    Signed-off-by: Zhang Rui <rui.zhang@intel.com>
    Link: https://lore.kernel.org/r/20221113153145.32696-3-rui.zhang@intel.com
    Signed-off-by: Guenter Roeck <linux@roeck-us.net>

Signed-off-by: David Arcari <darcari@redhat.com>
2023-12-20 07:38:09 -05:00
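The "read every time, fall back to a static value" logic of the dynamic tjmax commit above can be sketched in userspace. A hypothetical test double replaces the MSR access. The [23:16] field position and the treatment of a zero field as "not reported" are assumptions of the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical test double for reading MSR_IA32_TEMPERATURE_TARGET. */
static bool fake_msr_ok;
static uint32_t fake_msr_eax;

static bool read_temp_target(uint32_t *eax)
{
	if (!fake_msr_ok)
		return false;
	*eax = fake_msr_eax;
	return true;
}

/*
 * Re-read TjMax on every call so that an SST-PP level change is picked
 * up. Assumes TjMax is in bits [23:16]; if the MSR cannot be read or
 * the field is zero, fall back to the static value.
 */
static int current_tjmax(int static_tjmax)
{
	uint32_t eax;

	if (read_temp_target(&eax)) {
		int val = (eax >> 16) & 0xff;

		if (val)
			return val;
	}
	return static_tjmax;
}
```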
David Arcari e233745612 hwmon: (coretemp) rearrange tjmax handing code
JIRA: https://issues.redhat.com/browse/RHEL-19759

commit 2bc0e6d07ee50497043112d677fdd34327cf025c
Author: Zhang Rui <rui.zhang@intel.com>
Date:   Sun Nov 13 23:31:43 2022 +0800

    hwmon: (coretemp) rearrange tjmax handing code

    Rearrange the tjmax handling code so that it can be used directly in
    the sysfs attribute callbacks without forward declarations.

    No functional change in this patch.

    Signed-off-by: Zhang Rui <rui.zhang@intel.com>
    Link: https://lore.kernel.org/r/20221113153145.32696-2-rui.zhang@intel.com
    Signed-off-by: Guenter Roeck <linux@roeck-us.net>

Signed-off-by: David Arcari <darcari@redhat.com>
2023-12-20 07:38:09 -05:00
David Arcari 1084ed0ba2 hwmon: (coretemp) Remove obsolete temp_data->valid
JIRA: https://issues.redhat.com/browse/RHEL-19759

commit 5c0e64dde80ffe78d930db4e38e6218598aecd85
Author: Zhang Rui <rui.zhang@intel.com>
Date:   Tue Nov 8 15:50:49 2022 +0800

    hwmon: (coretemp) Remove obsolete temp_data->valid

    Checking for the valid bit of IA32_THERM_STATUS was removed in commit
    bf6ea084eb ("hwmon: (coretemp) Do not return -EAGAIN for low
    temperatures"); since then, temp_data->valid is set once the temperature
    has been read and is never cleared.

    Remove the obsolete temp_data->valid field.

    Signed-off-by: Zhang Rui <rui.zhang@intel.com>
    Link: https://lore.kernel.org/r/20221108075051.5139-2-rui.zhang@intel.com
    Signed-off-by: Guenter Roeck <linux@roeck-us.net>

Signed-off-by: David Arcari <darcari@redhat.com>
2023-12-20 07:38:09 -05:00
David Arcari ab5a19d95a hwmon: (coretemp) fix pci device refcount leak in nv1a_ram_new()
JIRA: https://issues.redhat.com/browse/RHEL-19759

commit 7dec14537c5906b8bf40fd6fd6d9c3850f8df11d
Author: Yang Yingliang <yangyingliang@huawei.com>
Date:   Fri Nov 18 17:33:03 2022 +0800

    hwmon: (coretemp) fix pci device refcount leak in nv1a_ram_new()

    As the comment of pci_get_domain_bus_and_slot() says, it returns a PCI
    device with its reference count incremented, and the caller must
    decrement the reference count by calling pci_dev_put() when done with
    it. So call pci_dev_put() after use to avoid a refcount leak.

    Fixes: 14513ee696 ("hwmon: (coretemp) Use PCI host bridge ID to identify CPU if necessary")
    Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
    Link: https://lore.kernel.org/r/20221118093303.214163-1-yangyingliang@huawei.com
    Signed-off-by: Guenter Roeck <linux@roeck-us.net>

Signed-off-by: David Arcari <darcari@redhat.com>
2023-12-20 07:38:09 -05:00
David Arcari 66f7f0a388 hwmon: cleanup non-bool "valid" data fields
JIRA: https://issues.redhat.com/browse/RHEL-19759
Conflicts: some drift in tmp421.c

commit 952a11ca32a6046ab86bf885a7805c935f71d5c8
Author: Paul Fertser <fercerpav@gmail.com>
Date:   Fri Sep 24 22:52:02 2021 +0300

    hwmon: cleanup non-bool "valid" data fields

    We have bool so use it consistently in all the drivers.

    The following Coccinelle script was used:

    @@
    identifier T;
    type t = { char, int };
    @@
    struct T {
    ...
    -       t valid;
    +       bool valid;
    ...
    }

    @@
    identifier v;
    @@
    (
    - v->valid = 0
    + v->valid = false
    |
    - v->valid = 1
    + v->valid = true
    )

    followed by sed to fixup the comments:
    sed '/bool valid;/{s/!=0/true/;s/zero/false/}'

    A few whitespace issues were fixed manually. All modified drivers were
    compile-tested.

    Signed-off-by: Paul Fertser <fercerpav@gmail.com>
    Link: https://lore.kernel.org/r/20210924195202.27917-1-fercerpav@gmail.com
    [groeck: Fixed up 'u8 valid' to 'bool valid' in atxp1.c]
    Signed-off-by: Guenter Roeck <linux@roeck-us.net>

Signed-off-by: David Arcari <darcari@redhat.com>
2023-12-20 07:38:09 -05:00
Marcelo Tosatti 7d648a0e75 hwmon: (coretemp) avoid RDMSR interrupts to isolated CPUs
commit 0f8b916bc5b5d74cacef2b616b04db10633b8105
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2182083
Testing: Tested by QE

The coretemp driver uses rdmsr_on_cpu calls to read
MSR_IA32_PACKAGE_THERM_STATUS/MSR_IA32_THERM_STATUS registers,
which contain information about current core temperature.

For certain low-latency applications, the interruption caused by RDMSR
exceeds the application's latency requirements.

So do not create core files in sysfs for CPUs that have isolation and
nohz_full enabled.

Temperature information from the housekeeping cores should be
sufficient to infer die temperature.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Link: https://lore.kernel.org/r/Y5zT6B1mY9/pnwJV@tpad
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2023-03-27 11:22:34 -03:00
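The policy in the isolated-CPU commit above can be sketched in simplified form: sensors are created only for housekeeping CPUs, so no RDMSR is ever directed at an isolated one. The bitmask, the helper names and the one-core-per-CPU simplification are assumptions of this sketch. It does not use the kernel's isolation API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical set of isolated (nohz_full) CPUs; covers CPUs 0-63 only. */
static uint64_t isolated_cpus;

static bool is_housekeeping(unsigned int cpu)
{
	if (cpu >= 64)
		return true;          /* outside what this sketch tracks */
	return !(isolated_cpus & (UINT64_C(1) << cpu));
}

/*
 * Count the sensors that would be created as CPUs come online,
 * skipping isolated CPUs. Simplification: each CPU is its own core.
 */
static int sensors_created(unsigned int ncpus)
{
	int n = 0;

	for (unsigned int cpu = 0; cpu < ncpus; cpu++)
		if (is_housekeeping(cpu))
			n++;
	return n;
}
```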
David Arcari a0fea88e2b hwmon/coretemp: Handle large core ID value
Bugzilla: https://bugzilla.redhat.com/2159554

commit 7108b80a542b9d65e44b36d64a700a83658c0b73
Author: Zhang Rui <rui.zhang@intel.com>
Date:   Fri Oct 14 17:01:45 2022 +0800

    hwmon/coretemp: Handle large core ID value

    The coretemp driver supports up to a hard-coded limit of 128 cores.

    Today, the driver cannot support a core with an ID above that limit.
    Yet, the encoding of core IDs is arbitrary (BIOS APIC-ID), so they
    may be sparse and they may be large.

    Update the driver to map arbitrary core ID numbers into appropriate
    array indexes so that 128 cores can be supported, no matter the encoding
    of core IDs.

    Signed-off-by: Zhang Rui <rui.zhang@intel.com>
    Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
    Acked-by: Len Brown <len.brown@intel.com>
    Acked-by: Guenter Roeck <linux@roeck-us.net>
    Cc: stable@vger.kernel.org
    Link: https://lkml.kernel.org/r/20221014090147.1836-3-rui.zhang@intel.com

Signed-off-by: David Arcari <darcari@redhat.com>
2023-01-10 14:21:03 -05:00
Phil Auld f1215e1da1 hwmon: (coretemp) Check for null before removing sysfs attrs
Bugzilla: https://bugzilla.redhat.com/2101449

commit a89ff5f5cc64b9fe7a992cf56988fd36f56ca82a
Author: Phil Auld <pauld@redhat.com>
Date:   Thu Nov 17 11:23:13 2022 -0500

    hwmon: (coretemp) Check for null before removing sysfs attrs

    If coretemp_add_core() fails, pdata->core_data[indx] is already NULL
    and its memory has been kfree()d. Don't pass it to sysfs_remove_group(),
    as that will crash in sysfs_remove_group().

    [Shortened for readability]
    [91854.020159] sysfs: cannot create duplicate filename '/devices/platform/coretemp.0/hwmon/hwmon2/temp20_label'
    <cpu offline>
    [91855.126115] BUG: kernel NULL pointer dereference, address: 0000000000000188
    [91855.165103] #PF: supervisor read access in kernel mode
    [91855.194506] #PF: error_code(0x0000) - not-present page
    [91855.224445] PGD 0 P4D 0
    [91855.238508] Oops: 0000 [#1] PREEMPT SMP PTI
    ...
    [91855.342716] RIP: 0010:sysfs_remove_group+0xc/0x80
    ...
    [91855.796571] Call Trace:
    [91855.810524]  coretemp_cpu_offline+0x12b/0x1dd [coretemp]
    [91855.841738]  ? coretemp_cpu_online+0x180/0x180 [coretemp]
    [91855.871107]  cpuhp_invoke_callback+0x105/0x4b0
    [91855.893432]  cpuhp_thread_fun+0x8e/0x150
    ...

    Fix this by checking for NULL first.

    Signed-off-by: Phil Auld <pauld@redhat.com>
    Cc: linux-hwmon@vger.kernel.org
    Cc: Fenghua Yu <fenghua.yu@intel.com>
    Cc: Jean Delvare <jdelvare@suse.com>
    Cc: Guenter Roeck <linux@roeck-us.net>
    Link: https://lore.kernel.org/r/20221117162313.3164803-1-pauld@redhat.com
    Fixes: 199e0de7f5 ("hwmon: (coretemp) Merge pkgtemp with coretemp")
    Signed-off-by: Guenter Roeck <linux@roeck-us.net>

Signed-off-by: Phil Auld <pauld@redhat.com>
2022-12-11 17:04:37 -05:00
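The NULL check added by the commit above can be sketched as a defensive pattern in userspace. The structure, the counter and remove_group_sketch() are hypothetical stand-ins. Like sysfs_remove_group() in the crash trace, remove_group_sketch() dereferences its argument, so it must never be handed NULL.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct core_data_sketch {        /* hypothetical per-core record */
	int attr_group;
};

static int groups_removed;       /* counts successful removals */

/* Stand-in for sysfs_remove_group(): it dereferences its argument. */
static void remove_group_sketch(struct core_data_sketch *cd)
{
	(void)cd->attr_group;
	groups_removed++;
}

/* Offline path: the slot may be NULL if adding the core failed. */
static void offline_core(struct core_data_sketch **slots, int indx)
{
	struct core_data_sketch *cd = slots[indx];

	if (!cd)
		return;              /* nothing registered, nothing to remove */
	remove_group_sketch(cd);
	free(cd);
	slots[indx] = NULL;
}
```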
Thomas Gleixner 5cfc7ac7c1 hwmon: Convert to new X86 CPU match macros
The new macro set has a consistent namespace and uses C99 initializers
instead of the grufty C89 ones.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Link: https://lkml.kernel.org/r/20200320131509.859324598@linutronix.de
2020-03-24 21:33:36 +01:00
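The difference between the C89 and C99 initializer styles mentioned in the commit above can be illustrated with a hypothetical, simplified match record. This is not the kernel's struct x86_cpu_id or its macros.

```c
#include <assert.h>

/* Hypothetical, simplified match record (not the kernel's struct). */
struct cpu_match_sketch {
	int vendor;
	int family;
	int model;
	unsigned long driver_data;
};

/* C89 style: values are matched to fields purely by position. */
static const struct cpu_match_sketch old_style = { 0, 6, 0x55, 0 };

/* C99 style: each value names its field; omitted fields become zero. */
static const struct cpu_match_sketch new_style = {
	.vendor = 0,
	.family = 6,
	.model  = 0x55,
};
```

With designated initializers, reordering or inserting fields in the struct cannot silently shift values into the wrong members.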
Wenwen Wang e027a2dea5 hwmon (coretemp) Fix a memory leak bug
In coretemp_init(), 'zone_devices' is allocated through kcalloc().
However, it is not deallocated in the following execution if
platform_driver_register() fails, leading to a memory leak. To fix this
issue, introduce the 'outzone' label to free 'zone_devices' before
returning the error.

Signed-off-by: Wenwen Wang <wenwen@cs.uga.edu>
Link: https://lore.kernel.org/r/1566248402-6538-1-git-send-email-wenwen@cs.uga.edu
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2019-08-31 08:04:57 -07:00
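The goto-based error unwinding used by the memory leak fix above can be sketched in userspace. register_driver_sketch() is a hypothetical stand-in for platform_driver_register(). The allocation counter exists only so that the missing free is observable.

```c
#include <assert.h>
#include <stdlib.h>

static void **zone_devices;      /* hypothetical, mirrors the name above */
static int live_allocations;     /* counts outstanding allocations */

/* Hypothetical stand-in for platform_driver_register(). */
static int register_driver_sketch(int fail)
{
	return fail ? -1 : 0;
}

static int init_sketch(size_t max_zones, int fail_register)
{
	int err;

	zone_devices = calloc(max_zones, sizeof(*zone_devices));
	if (!zone_devices)
		return -1;
	live_allocations++;

	err = register_driver_sketch(fail_register);
	if (err)
		goto outzone;

	return 0;

outzone:
	/* The cleanup the original error path was missing. */
	free(zone_devices);
	zone_devices = NULL;
	live_allocations--;
	return err;
}
```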
Linus Torvalds 222a21d295 Merge branch 'x86-topology-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 topology updates from Ingo Molnar:
 "Implement multi-die topology support on Intel CPUs and expose the die
  topology to user-space tooling, by Len Brown, Kan Liang and Zhang Rui.

  These changes should have no effect on the kernel's existing
  understanding of topologies, i.e. there should be no behavioral impact
  on cache, NUMA, scheduler, perf and other topologies and overall
  system performance"

* 'x86-topology-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/x86/intel/rapl: Cosmetic rename internal variables in response to multi-die/pkg support
  perf/x86/intel/uncore: Cosmetic renames in response to multi-die/pkg support
  hwmon/coretemp: Cosmetic: Rename internal variables to zones from packages
  thermal/x86_pkg_temp_thermal: Cosmetic: Rename internal variables to zones from packages
  perf/x86/intel/cstate: Support multi-die/package
  perf/x86/intel/rapl: Support multi-die/package
  perf/x86/intel/uncore: Support multi-die/package
  topology: Create core_cpus and die_cpus sysfs attributes
  topology: Create package_cpus sysfs attribute
  hwmon/coretemp: Support multi-die/package
  powercap/intel_rapl: Update RAPL domain name and debug messages
  thermal/x86_pkg_temp_thermal: Support multi-die/package
  powercap/intel_rapl: Support multi-die/package
  powercap/intel_rapl: Simplify rapl_find_package()
  x86/topology: Define topology_logical_die_id()
  x86/topology: Define topology_die_id()
  cpu/topology: Export die_id
  x86/topology: Create topology_max_die_per_package()
  x86/topology: Add CPUID.1F multi-die/package support
2019-07-08 18:28:44 -07:00
Thomas Gleixner 935912c538 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 164
Based on 1 normalized pattern(s):

  this program is free software you can redistribute it and or modify
  it under the terms of the gnu general public license as published by
  the free software foundation version 2 of the license this program
  is distributed in the hope that it will be useful but without any
  warranty without even the implied warranty of merchantability or
  fitness for a particular purpose see the gnu general public license
  for more details you should have received a copy of the gnu general
  public license along with this program if not write to the free
  software foundation inc 51 franklin street fifth floor boston ma
  02110 1301 usa

extracted by the scancode license scanner the SPDX license identifier

  GPL-2.0-only

has been chosen to replace the boilerplate/reference in 12 file(s).

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Richard Fontana <rfontana@redhat.com>
Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: Allison Randal <allison@lohutok.net>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190527070033.745497013@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-30 11:26:38 -07:00
Len Brown 835896a59b hwmon/coretemp: Cosmetic: Rename internal variables to zones from packages
Syntax update only -- no logical or functional change.

In response to the new multi-die/package changes, update variable names to
use the more generic thermal "zone" terminology instead of "package", as
the zones can refer to either packages or dies.

Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Zhang Rui <rui.zhang@intel.com>
Link: https://lkml.kernel.org/r/facecfd3525d55c2051f63a7ec709aeb03cc1dc1.1557769318.git.len.brown@intel.com
2019-05-23 10:08:36 +02:00
Zhang Rui cfcd82e632 hwmon/coretemp: Support multi-die/package
Package temperature sensors are actually implemented in hardware per-die.

Update coretemp to be "die-aware", so it can expose multiple sensors per
package, instead of just one. No change to single-die/package systems.

Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Guenter Roeck <linux@roeck-us.net>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: linux-pm@vger.kernel.org
Cc: linux-hwmon@vger.kernel.org
Link: https://lkml.kernel.org/r/ec2868f35113a01ff72d9041e0b97fc6a1c7df84.1557769318.git.len.brown@intel.com
2019-05-23 10:08:33 +02:00
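One way to picture the per-die zones introduced by the multi-die commit above is as a dense index built from a logical package number and a die number within the package. The structure and the formula are assumptions made for illustration. The sketch only shows why a single-die system still ends up with one zone per package.

```c
#include <assert.h>

struct topo_sketch {             /* hypothetical topology summary */
	int max_packages;
	int dies_per_package;
};

/* Number of thermal zones to allocate: one per die. */
static int max_zones(const struct topo_sketch *t)
{
	return t->max_packages * t->dies_per_package;
}

/* Dense zone index, assuming logical package and die numbering from 0. */
static int zone_index(const struct topo_sketch *t, int logical_pkg, int die)
{
	return logical_pkg * t->dies_per_package + die;
}
```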
Guenter Roeck 0cd709d0dd hwmon: (coretemp) Replace S_<PERMS> with octal values
Replace S_<PERMS> with octal values.

The conversion was done automatically with coccinelle. The semantic patches
and the scripts used to generate this commit log are available at
https://github.com/groeck/coccinelle-patches/hwmon/.

This patch does not introduce functional changes. It was verified by
compiling the old and new files and comparing text and data sizes.

Cc: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2018-12-16 15:13:41 -08:00
Kees Cook 6396bb2215 treewide: kzalloc() -> kcalloc()
The kzalloc() function has a 2-factor argument form, kcalloc(). This
patch replaces cases of:

        kzalloc(a * b, gfp)

with:

        kcalloc(a, b, gfp)

as well as handling cases of:

        kzalloc(a * b * c, gfp)

with:

        kzalloc(array3_size(a, b, c), gfp)

as it's slightly less ugly than:

        kcalloc(array_size(a, b), c, gfp)

This does, however, attempt to ignore constant size factors like:

        kzalloc(4 * 1024, gfp)

though any constants defined via macros get caught up in the conversion.

Any factors with a sizeof() of "unsigned char", "char", and "u8" were
dropped, since they're redundant.
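The motivation for kcalloc() and array3_size() is that the size multiplication is overflow-checked and saturates to SIZE_MAX. The following is a userspace sketch of that saturating behaviour using the GCC/Clang builtin __builtin_mul_overflow(). The helper names are hypothetical, and the kernel's own helpers are not reproduced here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* a * b, or SIZE_MAX if the product overflows size_t. */
static size_t array_size_sat(size_t a, size_t b)
{
	size_t r;

	if (__builtin_mul_overflow(a, b, &r))
		return SIZE_MAX;
	return r;
}

/* a * b * c, or SIZE_MAX if either multiplication overflows. */
static size_t array3_size_sat(size_t a, size_t b, size_t c)
{
	size_t r;

	if (__builtin_mul_overflow(a, b, &r) || __builtin_mul_overflow(r, c, &r))
		return SIZE_MAX;
	return r;
}
```

Passing a saturated SIZE_MAX to an allocator makes the allocation fail instead of silently returning a buffer that is too small.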

The Coccinelle script used for this was:

// Fix redundant parens around sizeof().
@@
type TYPE;
expression THING, E;
@@

(
  kzalloc(
-	(sizeof(TYPE)) * E
+	sizeof(TYPE) * E
  , ...)
|
  kzalloc(
-	(sizeof(THING)) * E
+	sizeof(THING) * E
  , ...)
)

// Drop single-byte sizes and redundant parens.
@@
expression COUNT;
typedef u8;
typedef __u8;
@@

(
  kzalloc(
-	sizeof(u8) * (COUNT)
+	COUNT
  , ...)
|
  kzalloc(
-	sizeof(__u8) * (COUNT)
+	COUNT
  , ...)
|
  kzalloc(
-	sizeof(char) * (COUNT)
+	COUNT
  , ...)
|
  kzalloc(
-	sizeof(unsigned char) * (COUNT)
+	COUNT
  , ...)
|
  kzalloc(
-	sizeof(u8) * COUNT
+	COUNT
  , ...)
|
  kzalloc(
-	sizeof(__u8) * COUNT
+	COUNT
  , ...)
|
  kzalloc(
-	sizeof(char) * COUNT
+	COUNT
  , ...)
|
  kzalloc(
-	sizeof(unsigned char) * COUNT
+	COUNT
  , ...)
)

// 2-factor product with sizeof(type/expression) and identifier or constant.
@@
type TYPE;
expression THING;
identifier COUNT_ID;
constant COUNT_CONST;
@@

(
- kzalloc
+ kcalloc
  (
-	sizeof(TYPE) * (COUNT_ID)
+	COUNT_ID, sizeof(TYPE)
  , ...)
|
- kzalloc
+ kcalloc
  (
-	sizeof(TYPE) * COUNT_ID
+	COUNT_ID, sizeof(TYPE)
  , ...)
|
- kzalloc
+ kcalloc
  (
-	sizeof(TYPE) * (COUNT_CONST)
+	COUNT_CONST, sizeof(TYPE)
  , ...)
|
- kzalloc
+ kcalloc
  (
-	sizeof(TYPE) * COUNT_CONST
+	COUNT_CONST, sizeof(TYPE)
  , ...)
|
- kzalloc
+ kcalloc
  (
-	sizeof(THING) * (COUNT_ID)
+	COUNT_ID, sizeof(THING)
  , ...)
|
- kzalloc
+ kcalloc
  (
-	sizeof(THING) * COUNT_ID
+	COUNT_ID, sizeof(THING)
  , ...)
|
- kzalloc
+ kcalloc
  (
-	sizeof(THING) * (COUNT_CONST)
+	COUNT_CONST, sizeof(THING)
  , ...)
|
- kzalloc
+ kcalloc
  (
-	sizeof(THING) * COUNT_CONST
+	COUNT_CONST, sizeof(THING)
  , ...)
)

// 2-factor product, only identifiers.
@@
identifier SIZE, COUNT;
@@

- kzalloc
+ kcalloc
  (
-	SIZE * COUNT
+	COUNT, SIZE
  , ...)

// 3-factor product with 1 sizeof(type) or sizeof(expression), with
// redundant parens removed.
@@
expression THING;
identifier STRIDE, COUNT;
type TYPE;
@@

(
  kzalloc(
-	sizeof(TYPE) * (COUNT) * (STRIDE)
+	array3_size(COUNT, STRIDE, sizeof(TYPE))
  , ...)
|
  kzalloc(
-	sizeof(TYPE) * (COUNT) * STRIDE
+	array3_size(COUNT, STRIDE, sizeof(TYPE))
  , ...)
|
  kzalloc(
-	sizeof(TYPE) * COUNT * (STRIDE)
+	array3_size(COUNT, STRIDE, sizeof(TYPE))
  , ...)
|
  kzalloc(
-	sizeof(TYPE) * COUNT * STRIDE
+	array3_size(COUNT, STRIDE, sizeof(TYPE))
  , ...)
|
  kzalloc(
-	sizeof(THING) * (COUNT) * (STRIDE)
+	array3_size(COUNT, STRIDE, sizeof(THING))
  , ...)
|
  kzalloc(
-	sizeof(THING) * (COUNT) * STRIDE
+	array3_size(COUNT, STRIDE, sizeof(THING))
  , ...)
|
  kzalloc(
-	sizeof(THING) * COUNT * (STRIDE)
+	array3_size(COUNT, STRIDE, sizeof(THING))
  , ...)
|
  kzalloc(
-	sizeof(THING) * COUNT * STRIDE
+	array3_size(COUNT, STRIDE, sizeof(THING))
  , ...)
)

// 3-factor product with 2 sizeof(variable), with redundant parens removed.
@@
expression THING1, THING2;
identifier COUNT;
type TYPE1, TYPE2;
@@

(
  kzalloc(
-	sizeof(TYPE1) * sizeof(TYPE2) * COUNT
+	array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
  , ...)
|
  kzalloc(
-	sizeof(TYPE1) * sizeof(TYPE2) * (COUNT)
+	array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
  , ...)
|
  kzalloc(
-	sizeof(THING1) * sizeof(THING2) * COUNT
+	array3_size(COUNT, sizeof(THING1), sizeof(THING2))
  , ...)
|
  kzalloc(
-	sizeof(THING1) * sizeof(THING2) * (COUNT)
+	array3_size(COUNT, sizeof(THING1), sizeof(THING2))
  , ...)
|
  kzalloc(
-	sizeof(TYPE1) * sizeof(THING2) * COUNT
+	array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
  , ...)
|
  kzalloc(
-	sizeof(TYPE1) * sizeof(THING2) * (COUNT)
+	array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
  , ...)
)

// 3-factor product, only identifiers, with redundant parens removed.
@@
identifier STRIDE, SIZE, COUNT;
@@

(
  kzalloc(
-	(COUNT) * STRIDE * SIZE
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  kzalloc(
-	COUNT * (STRIDE) * SIZE
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  kzalloc(
-	COUNT * STRIDE * (SIZE)
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  kzalloc(
-	(COUNT) * (STRIDE) * SIZE
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  kzalloc(
-	COUNT * (STRIDE) * (SIZE)
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  kzalloc(
-	(COUNT) * STRIDE * (SIZE)
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  kzalloc(
-	(COUNT) * (STRIDE) * (SIZE)
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  kzalloc(
-	COUNT * STRIDE * SIZE
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
)

// Any remaining multi-factor products, first at least 3-factor products,
// when they're not all constants...
@@
expression E1, E2, E3;
constant C1, C2, C3;
@@

(
  kzalloc(C1 * C2 * C3, ...)
|
  kzalloc(
-	(E1) * E2 * E3
+	array3_size(E1, E2, E3)
  , ...)
|
  kzalloc(
-	(E1) * (E2) * E3
+	array3_size(E1, E2, E3)
  , ...)
|
  kzalloc(
-	(E1) * (E2) * (E3)
+	array3_size(E1, E2, E3)
  , ...)
|
  kzalloc(
-	E1 * E2 * E3
+	array3_size(E1, E2, E3)
  , ...)
)

// And then all remaining 2 factors products when they're not all constants,
// keeping sizeof() as the second factor argument.
@@
expression THING, E1, E2;
type TYPE;
constant C1, C2, C3;
@@

(
  kzalloc(sizeof(THING) * C2, ...)
|
  kzalloc(sizeof(TYPE) * C2, ...)
|
  kzalloc(C1 * C2 * C3, ...)
|
  kzalloc(C1 * C2, ...)
|
- kzalloc
+ kcalloc
  (
-	sizeof(TYPE) * (E2)
+	E2, sizeof(TYPE)
  , ...)
|
- kzalloc
+ kcalloc
  (
-	sizeof(TYPE) * E2
+	E2, sizeof(TYPE)
  , ...)
|
- kzalloc
+ kcalloc
  (
-	sizeof(THING) * (E2)
+	E2, sizeof(THING)
  , ...)
|
- kzalloc
+ kcalloc
  (
-	sizeof(THING) * E2
+	E2, sizeof(THING)
  , ...)
|
- kzalloc
+ kcalloc
  (
-	(E1) * E2
+	E1, E2
  , ...)
|
- kzalloc
+ kcalloc
  (
-	(E1) * (E2)
+	E1, E2
  , ...)
|
- kzalloc
+ kcalloc
  (
-	E1 * E2
+	E1, E2
  , ...)
)
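The conversion matters because kcalloc(n, size) checks the n * size multiplication for overflow before allocating. kzalloc(n * size) instead receives a value that may already have wrapped. Below is a minimal user-space sketch of that idea. It assumes that array_size()/array3_size() saturate to SIZE_MAX on overflow, which is how the kernel's <linux/overflow.h> helpers are documented to behave. The function names are hypothetical stand-ins, not the kernel API, and calloc() stands in for the kernel allocator.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Store a * b in *out; return 1 (leaving *out untouched) if it would wrap. */
static int mul_overflows(size_t a, size_t b, size_t *out)
{
	if (a != 0 && b > SIZE_MAX / a)
		return 1;
	*out = a * b;
	return 0;
}

/* Hypothetical stand-in for array_size(): a * b, or SIZE_MAX on overflow. */
static size_t sat_array_size(size_t a, size_t b)
{
	size_t bytes;

	if (mul_overflows(a, b, &bytes))
		return SIZE_MAX;
	return bytes;
}

/* Hypothetical stand-in for array3_size(): saturates if either step wraps. */
static size_t sat_array3_size(size_t a, size_t b, size_t c)
{
	size_t bytes;

	if (mul_overflows(a, b, &bytes) || mul_overflows(bytes, c, &bytes))
		return SIZE_MAX;
	return bytes;
}

/* Hypothetical kcalloc()-like wrapper: fail instead of under-allocating. */
static void *zalloc_array(size_t n, size_t size)
{
	size_t bytes;

	if (mul_overflows(n, size, &bytes))
		return NULL;
	return calloc(1, bytes);
}
```

With a naive n * size, (SIZE_MAX / 2 + 1) * 2 wraps to 0, so the allocator would be asked for far less memory than intended. The checked versions return SIZE_MAX or NULL instead.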

Signed-off-by: Kees Cook <keescook@chromium.org>
2018-06-12 16:19:22 -07:00
Linus Torvalds d4667ca142 Merge branch 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 PTI and Spectre related fixes and updates from Ingo Molnar:
 "Here's the latest set of Spectre and PTI related fixes and updates:

  Spectre:
   - Add entry code register clearing to reduce the Spectre attack
     surface
   - Update the Spectre microcode blacklist
   - Inline the KVM Spectre helpers to get close to v4.14 performance
     again.
   - Fix indirect_branch_prediction_barrier()
   - Fix/improve Spectre related kernel messages
   - Fix array_index_nospec_mask() asm constraint
   - KVM: fix two MSR handling bugs

  PTI:
   - Fix a paranoid entry PTI CR3 handling bug
   - Fix comments

  objtool:
   - Fix paranoid_entry() frame pointer warning
   - Annotate WARN()-related UD2 as reachable
   - Various fixes
   - Add Peter Zijlstra as objtool co-maintainer

  Misc:
   - Various x86 entry code self-test fixes
   - Improve/simplify entry code stack frame generation and handling
     after recent heavy-handed PTI and Spectre changes. (There are two
     more WIP improvements expected here.)
   - Type fix for cache entries

  There are also some low-risk non-fix changes I've included in this
  branch to reduce backporting conflicts:

   - rename a confusing x86_cpu field name
   - de-obfuscate the naming of single-TLB flushing primitives"

* 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (41 commits)
  x86/entry/64: Fix CR3 restore in paranoid_exit()
  x86/cpu: Change type of x86_cache_size variable to unsigned int
  x86/spectre: Fix an error message
  x86/cpu: Rename cpu_data.x86_mask to cpu_data.x86_stepping
  selftests/x86/mpx: Fix incorrect bounds with old _sigfault
  x86/mm: Rename flush_tlb_single() and flush_tlb_one() to __flush_tlb_one_[user|kernel]()
  x86/speculation: Add <asm/msr-index.h> dependency
  nospec: Move array_index_nospec() parameter checking into separate macro
  x86/speculation: Fix up array_index_nospec_mask() asm constraint
  x86/debug: Use UD2 for WARN()
  x86/debug, objtool: Annotate WARN()-related UD2 as reachable
  objtool: Fix segfault in ignore_unreachable_insn()
  selftests/x86: Disable tests requiring 32-bit support on pure 64-bit systems
  selftests/x86: Do not rely on "int $0x80" in single_step_syscall.c
  selftests/x86: Do not rely on "int $0x80" in test_mremap_vdso.c
  selftests/x86: Fix build bug caused by the 5lvl test which has been moved to the VM directory
  selftests/x86/pkeys: Remove unused functions
  selftests/x86: Clean up and document sscanf() usage
  selftests/x86: Fix vDSO selftest segfault for vsyscall=none
  x86/entry/64: Remove the unused 'icebp' macro
  ...
2018-02-14 17:02:15 -08:00
Jia Zhang b399151cb4 x86/cpu: Rename cpu_data.x86_mask to cpu_data.x86_stepping
x86_mask is a confusing name which is hard to associate with the
processor's stepping.

Additionally, correct an indent issue in lib/cpu.c.
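The stepping held by the renamed field is the low nibble of the processor signature that CPUID leaf 1 returns in EAX. Below is a user-space sketch of decoding that signature. The helper names are hypothetical, and the decoding is modeled on the family/model helpers in lib/cpu.c (treat the exact correspondence as an assumption). The sample signature values are illustrative.

```c
#include <stdint.h>

/* CPUID leaf 1 returns the processor signature in EAX:
 *   [3:0] stepping, [7:4] model, [11:8] family,
 *   [19:16] extended model, [27:20] extended family. */

static unsigned int sig_family(uint32_t sig)
{
	unsigned int fam = (sig >> 8) & 0xf;

	/* The extended family is only added when the base family is 0xf. */
	if (fam == 0xf)
		fam += (sig >> 20) & 0xff;
	return fam;
}

static unsigned int sig_model(uint32_t sig)
{
	unsigned int model = (sig >> 4) & 0xf;

	/* The extended model supplies the high nibble for family 6 and later. */
	if (sig_family(sig) >= 0x6)
		model += ((sig >> 16) & 0xf) << 4;
	return model;
}

/* What cpu_data.x86_stepping (formerly x86_mask) holds. */
static unsigned int sig_stepping(uint32_t sig)
{
	return sig & 0xf;
}
```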

Signed-off-by: Jia Zhang <qianyue.zj@alibaba-inc.com>
[ Updated it to more recent kernels. ]
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: bp@alien8.de
Cc: tony.luck@intel.com
Link: http://lkml.kernel.org/r/1514771530-70829-1-git-send-email-qianyue.zj@alibaba-inc.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-02-15 01:15:52 +01:00
Sinan Kaya b9ccff233e hwmon: (coretemp) deprecate pci_get_bus_and_slot()
pci_get_bus_and_slot() is restrictive in that it assumes the PCI device is
present in domain 0. This prevents device drivers from being reused with
other domain numbers.

Use pci_get_domain_bus_and_slot() with a domain number of 0 where we can't
extract the domain number. Elsewhere, use the actual domain number from
the device.
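pci_get_domain_bus_and_slot() takes a domain, a bus number, and a devfn byte that packs the slot (device) and function numbers together. Below is a user-space sketch of that encoding. It mirrors the kernel's PCI_DEVFN()/PCI_SLOT()/PCI_FUNC() macros and adds a hypothetical helper that prints the full domain:bus:slot.func address. The example numbers are arbitrary.

```c
#include <stdio.h>

/* Same packing as the kernel's PCI_DEVFN(), PCI_SLOT() and PCI_FUNC():
 * 5 bits of slot and 3 bits of function in one devfn byte. */
#define PCI_DEVFN(slot, func)	((((slot) & 0x1f) << 3) | ((func) & 0x07))
#define PCI_SLOT(devfn)		(((devfn) >> 3) & 0x1f)
#define PCI_FUNC(devfn)		((devfn) & 0x07)

/* Hypothetical helper: format "dddd:bb:ss.f", with the domain included. */
static int format_pci_addr(char *buf, size_t len, unsigned int domain,
			   unsigned int bus, unsigned int devfn)
{
	return snprintf(buf, len, "%04x:%02x:%02x.%x", domain, bus,
			PCI_SLOT(devfn), PCI_FUNC(devfn));
}
```

For example, domain 0x10, bus 0x3a, slot 0x1f, function 3 formats as 0010:3a:1f.3. A domain-0-only lookup could never reach that device.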

Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2018-01-02 15:05:34 -08:00
Thomas Gleixner 90b4f30b6d hwmon: (coretemp) Handle frozen hotplug state correctly
The recent conversion to the hotplug state machine missed that the original
hotplug notifiers did not execute in the frozen state, which is used on
suspend and resume.

This does not matter on single socket machines, but on multi socket systems
this breaks when the device for a non-boot socket is removed when the last
CPU of that socket is brought offline. The device removal locks up the
machine hard without any debug output.

Prevent executing the hotplug callbacks when cpuhp_tasks_frozen is true.

Thanks to Tommi for providing debug information patiently while I failed to
spot the obvious.

Fixes: e00ca5df37 ("hwmon: (coretemp) Convert to hotplug state machine")
Reported-by: Tommi Rantala <tt.rantala@gmail.com>
Tested-by: Tommi Rantala <tt.rantala@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2017-05-14 07:49:32 -07:00
Thomas Gleixner 7126684605 hwmon: (coretemp) Simplify package management
Keeping track of the per package platform devices requires an extra object,
which is held in a linked list.

The maximum number of packages is known at init() time. So the extra object
and linked list management can be replaced by an array of platform device
pointers in which the per-package device pointers can be stored. Lookup
becomes a simple array lookup instead of a list walk.

The mutex protecting the list can be removed as well because the array is
only accessed from cpu hotplug callbacks which are already serialized.
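Below is a user-space sketch of the array-based bookkeeping. It assumes the maximum package count is known at init time. struct pkg_dev and all function names are hypothetical. There is deliberately no lock, on the assumption stated above that the only callers are already-serialized hotplug callbacks.

```c
#include <stdlib.h>

/* Hypothetical stand-in for a per-package platform device. */
struct pkg_dev {
	int pkg_id;
};

/* One slot per possible package, sized once at init time. */
static struct pkg_dev **pkg_devices;
static unsigned int max_packages;

static int pkg_table_init(unsigned int npackages)
{
	pkg_devices = calloc(npackages, sizeof(*pkg_devices));
	if (!pkg_devices)
		return -1;
	max_packages = npackages;
	return 0;
}

/* Lookup is a bounds-checked array index instead of a list walk. */
static struct pkg_dev *pkg_lookup(unsigned int pkg_id)
{
	return pkg_id < max_packages ? pkg_devices[pkg_id] : NULL;
}

static int pkg_add(unsigned int pkg_id, struct pkg_dev *dev)
{
	if (pkg_id >= max_packages)
		return -1;
	pkg_devices[pkg_id] = dev;
	return 0;
}

static void pkg_remove(unsigned int pkg_id)
{
	if (pkg_id < max_packages)
		pkg_devices[pkg_id] = NULL;
}
```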

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2016-12-09 21:54:13 -08:00
Thomas Gleixner 2195c31b12 hwmon: (coretemp) Use proper error codes in cpu online callback
The cpu online callback returns success unconditionally even when the
device has no support, microcode mismatches or device allocation fails.
Only if CPU_HOTPLUG is disabled does the init function check whether the
device list is empty and remove the driver.

This does not make sense. If CPU HOTPLUG is enabled then there is no point
in keeping the driver around when it failed to initialize on the already
online cpus. The chance that not yet online CPUs will provide a functional
interface later is very close to zero.

Add proper error return codes, so the setup of the cpu hotplug states fails
when the device cannot be initialized and remove all the magic cruft.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2016-12-09 21:54:11 -08:00
Thomas Gleixner e00ca5df37 hwmon: (coretemp) Convert to hotplug state machine
Install the callbacks via the state machine. Setup and teardown are handled
by the hotplug core.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: linux-hwmon@vger.kernel.org
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Jean Delvare <jdelvare@suse.com>
Cc: rt@linuxtronix.de
Cc: Guenter Roeck <linux@roeck-us.net>
Link: http://lkml.kernel.org/r/20161117183541.8588-5-bigeasy@linutronix.de
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2016-12-09 21:54:10 -08:00
Thomas Gleixner 4b138cf73f hwmon: (coretemp) Avoid redundant lookups
No point in looking up the same thing over and over.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2016-12-09 21:54:09 -08:00
Thomas Gleixner e1b370b640 hwmon: (coretemp) Simplify sibling management
The coretemp driver provides a sysfs interface per physical core. If
hyperthreading is enabled and one of the siblings goes offline the sysfs
interface is removed and then immediately created again for the
sibling. The only difference between them is the target cpu for the
rdmsr_on_cpu() in the sysfs show functions.

It's way simpler to keep a cpumask of cpus which are active in a package
and only remove the interface when the last sibling goes offline. Otherwise
just move the target cpu for the sysfs show functions to the still online
sibling.
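Below is a user-space sketch of the cpumask approach. A plain 64-bit word stands in for the kernel's struct cpumask, so CPU ids are assumed to be below 64. The structure and function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical per-core state. */
struct core_state {
	uint64_t online_siblings;	/* siblings currently online */
	int target_cpu;			/* CPU the sysfs reads go to, or -1 */
	int has_interface;		/* is the sysfs interface present? */
};

/* Index of the lowest set bit; the caller guarantees mask != 0. */
static int lowest_cpu(uint64_t mask)
{
	int cpu = 0;

	while (!(mask & 1)) {
		mask >>= 1;
		cpu++;
	}
	return cpu;
}

static void core_cpu_online(struct core_state *c, int cpu)
{
	/* Only the first sibling to come online creates the interface. */
	if (!c->has_interface) {
		c->has_interface = 1;
		c->target_cpu = cpu;
	}
	c->online_siblings |= UINT64_C(1) << cpu;
}

static void core_cpu_offline(struct core_state *c, int cpu)
{
	c->online_siblings &= ~(UINT64_C(1) << cpu);

	/* Last sibling gone: tear the interface down. */
	if (!c->online_siblings) {
		c->has_interface = 0;
		c->target_cpu = -1;
		return;
	}
	/* Otherwise just retarget reads to a sibling that is still online. */
	if (c->target_cpu == cpu)
		c->target_cpu = lowest_cpu(c->online_siblings);
}
```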

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2016-12-09 21:54:07 -08:00
Thomas Gleixner 723f573433 hwmon: (coretemp) Fixup target cpu for package when cpu is offlined
When a CPU is offlined nothing checks whether it is the target CPU for the
package temperature sysfs interface.

As a consequence all future readouts of the package temperature return
crap:

90000

which is Tjmax of that package.

Check whether the outgoing CPU is the target for the package and assign it
to some other still online CPU in the package. Protect the change against
the rdmsr_on_cpu() in show_crit_alarm().

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2016-12-09 21:54:06 -08:00
Lukasz Odzioba cc904f9cf2 hwmon: (coretemp) Increase limit of maximum core ID from 32 to 128.
The new limit was selected arbitrarily as a power of two greater than the
required minimum for the Xeon Phi processor (72 for Knights Landing).

Currently the driver is not able to handle cores with a core ID greater
than 32. Such an attempt ends up with the following error in dmesg:
coretemp coretemp.0: Adding Core XXX failed

Signed-off-by: Lukasz Odzioba <lukasz.odzioba@intel.com>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2015-10-14 07:57:14 -07:00
Bartosz Golaszewski 19a34eea4f coretemp: Replace cpu_sibling_mask() with topology_sibling_cpumask()
The former duplicates the functionality of the latter but is
neither documented nor arch-independent.

Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com>
Acked-by: Guenter Roeck <linux@roeck-us.net>
Cc: Benoit Cousson <bcousson@baylibre.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Jean Delvare <jdelvare@suse.de>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Drokin <oleg.drokin@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Link: http://lkml.kernel.org/r/1432645896-12588-4-git-send-email-bgolaszewski@baylibre.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-27 15:22:15 +02:00
Rasmus Villemoes 1055b5f904 hwmon: (coretemp) Allow format checking
By extracting the only part that differs we can allow static checking
of the format string, and possibly save a little .rodata.
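Below is a user-space sketch of the pattern, assuming hwmon-style attribute names such as temp2_input. The suffix table and helper are hypothetical, not the driver's actual code. The alternative is to pass an entry from a table of format strings such as "temp%d_input" as the format argument, and the compiler cannot check that.

```c
#include <stdio.h>

/* Hypothetical list of per-core attribute suffixes. */
static const char *const attr_suffixes[] = {
	"label", "crit_alarm", "input", "crit", "max",
};

/* Only the suffix varies, so the format stays a string literal that the
 * compiler's format checking (e.g. -Wformat) can verify. */
static int make_attr_name(char *buf, size_t len, int attr_no, int idx)
{
	return snprintf(buf, len, "temp%d_%s", attr_no, attr_suffixes[idx]);
}
```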

Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
[Guenter Roeck: continuation line alignment]
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2015-03-09 09:59:35 -07:00
Wolfram Sang 2a1ed07718 hwmon: drop owner assignment from platform_drivers
A platform_driver does not need to set an owner, it will be populated by the
driver core.

Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
2014-10-20 16:20:36 +02:00
Guenter Roeck c0940e95f7 Revert "hwmon: (coretemp) Refine TjMax detection"
This reverts commit 9fb6c9c73b.

Tjmax on some Intel CPUs is below 85 degrees C. One known example is
L5630 with Tjmax of 71 degrees C. There are other Xeon processors with
Tjmax of 70 or 80 degrees C. Also, the Intel IA32 System Programming
document states that the temperature target is in bits 23:16 of MSR 0x1a2
(MSR_TEMPERATURE_TARGET), which is 8 bits, not 7.

So even if turbostat uses similar checks to validate Tjmax, there is no
evidence that the checks are actually required. On the contrary, the
checks are known to cause problems and therefore need to be removed.
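Below is a sketch contrasting the two ways of extracting TjMax from the low 32 bits (EAX) of MSR_TEMPERATURE_TARGET. The register values in the usage below are made up. One matches the L5630's 71 degrees C, and the other shows what the 7-bit mask drops. The helper names are hypothetical.

```c
#include <stdint.h>

/* TjMax is bits 23:16 of MSR_TEMPERATURE_TARGET (0x1a2): 8 bits wide. */
static unsigned int tjmax_8bit(uint32_t eax)
{
	return (eax >> 16) & 0xff;
}

/* The reverted code kept only 7 of those bits... */
static unsigned int tjmax_7bit(uint32_t eax)
{
	return (eax >> 16) & 0x7f;
}

/* ...and rejected any TjMax below 85 degrees C. */
static int tjmax_passes_old_check(unsigned int tjmax)
{
	return tjmax >= 85;
}
```

For a value of 0x00470000, both masks yield 71, but the old check rejects it. For 0x00800000, the 7-bit mask silently yields 0 instead of 128.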

This fixes https://bugzilla.kernel.org/show_bug.cgi?id=75071.

Fixes: 9fb6c9c hwmon: (coretemp) Refine TjMax detection
Reviewed-by: Jean Delvare <jdelvare@suse.de>
Cc: stable@vger.kernel.org # 3.14+
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2014-05-01 04:07:52 -07:00
Linus Torvalds 467a9e1633 CPU hotplug notifiers registration fixes for 3.15-rc1
Merge tag 'cpu-hotplug-3.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull CPU hotplug notifiers registration fixes from Rafael Wysocki:
 "The purpose of this single series of commits from Srivatsa S Bhat
  (with a small piece from Gautham R Shenoy) touching multiple
  subsystems that use CPU hotplug notifiers is to provide a way to
  register them that will not lead to deadlocks with CPU online/offline
  operations as described in the changelog of commit 93ae4f978c ("CPU
  hotplug: Provide lockless versions of callback registration
  functions").

  The first three commits in the series introduce the API and document
  it and the rest simply goes through the users of CPU hotplug notifiers
  and converts them to using the new method"

* tag 'cpu-hotplug-3.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (52 commits)
  net/iucv/iucv.c: Fix CPU hotplug callback registration
  net/core/flow.c: Fix CPU hotplug callback registration
  mm, zswap: Fix CPU hotplug callback registration
  mm, vmstat: Fix CPU hotplug callback registration
  profile: Fix CPU hotplug callback registration
  trace, ring-buffer: Fix CPU hotplug callback registration
  xen, balloon: Fix CPU hotplug callback registration
  hwmon, via-cputemp: Fix CPU hotplug callback registration
  hwmon, coretemp: Fix CPU hotplug callback registration
  thermal, x86-pkg-temp: Fix CPU hotplug callback registration
  octeon, watchdog: Fix CPU hotplug callback registration
  oprofile, nmi-timer: Fix CPU hotplug callback registration
  intel-idle: Fix CPU hotplug callback registration
  clocksource, dummy-timer: Fix CPU hotplug callback registration
  drivers/base/topology.c: Fix CPU hotplug callback registration
  acpi-cpufreq: Fix CPU hotplug callback registration
  zsmalloc: Fix CPU hotplug callback registration
  scsi, fcoe: Fix CPU hotplug callback registration
  scsi, bnx2fc: Fix CPU hotplug callback registration
  scsi, bnx2i: Fix CPU hotplug callback registration
  ...
2014-04-07 14:55:46 -07:00
Srivatsa S. Bhat 3289705fe2 hwmon, coretemp: Fix CPU hotplug callback registration
Subsystems that want to register CPU hotplug callbacks, as well as perform
initialization for the CPUs that are already online, often do it as shown
below:

	get_online_cpus();

	for_each_online_cpu(cpu)
		init_cpu(cpu);

	register_cpu_notifier(&foobar_cpu_notifier);

	put_online_cpus();

This is wrong, since it is prone to ABBA deadlocks involving the
cpu_add_remove_lock and the cpu_hotplug.lock (when running concurrently
with CPU hotplug operations).

Instead, the correct and race-free way of performing the callback
registration is:

	cpu_notifier_register_begin();

	for_each_online_cpu(cpu)
		init_cpu(cpu);

	/* Note the use of the double underscored version of the API */
	__register_cpu_notifier(&foobar_cpu_notifier);

	cpu_notifier_register_done();

Fix the hwmon coretemp code by using this latter form of callback
registration.

Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Jean Delvare <jdelvare@suse.de>
Cc: Ingo Molnar <mingo@kernel.org>
Acked-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-03-20 13:43:47 +01:00
Guenter Roeck d72d19c26c hwmon: (coretemp) Convert to use devm_hwmon_device_register_with_groups
Simplify code, reduce code size, and attach sysfs attributes to hwmon device.

For this driver, the only attribute created is the name attribute.
Other attributes are still created and removed dynamically as cores
are added or removed.

Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Reviewed-by: Jean Delvare <jdelvare@suse.de>
Tested-by: Jean Delvare <jdelvare@suse.de>
2014-03-03 08:01:05 -08:00
Guenter Roeck c503a811e4 hwmon: (coretemp) Allocate platform data with devm_kzalloc
This simplifies error handling.

Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Reviewed-by: Jean Delvare <jdelvare@suse.de>
Tested-by: Jean Delvare <jdelvare@suse.de>
2014-03-03 08:01:05 -08:00
Guenter Roeck 1075305de4 hwmon: (coretemp) Use sysfs_create_group to create sysfs attributes
Instead of creating each attribute individually, use sysfs_create_group
to create all attributes for one core with a single call.

Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Reviewed-by: Jean Delvare <jdelvare@suse.de>
Tested-by: Jean Delvare <jdelvare@suse.de>
2014-03-03 08:01:05 -08:00
Guenter Roeck bf6ea084eb hwmon: (coretemp) Do not return -EAGAIN for low temperatures
Some Intel CPUs do not set the 'valid' bit in IA32_THERM_STATUS if the
temperature is too low to be measured. This condition will not change until
the CPU is hot enough for its temperature to be measured. Returning an error
in such conditions is not very useful. Drop checking the valid bit and just
return the reported temperature instead.
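Below is a sketch of the conversion with the valid-bit check dropped. It assumes the 7-bit digital readout in bits 22:16 of IA32_THERM_STATUS that the driver used at the time (later widened to bits 23:16). It also assumes millidegree output, as hwmon expects. The helper name and sample values are hypothetical.

```c
#include <stdint.h>

#define THERM_STATUS_VALID	(UINT32_C(1) << 31)	/* "reading valid" */

/* The digital readout counts degrees C below TjMax. The valid bit is
 * deliberately not consulted: a low-temperature reading is returned
 * as-is instead of producing an error. */
static int therm_status_to_millicelsius(uint32_t eax, int tjmax)
{
	int readout = (eax >> 16) & 0x7f;

	return (tjmax - readout) * 1000;
}
```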

Reviewed-by: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2014-01-14 21:36:52 -08:00
Guenter Roeck 9fb6c9c73b hwmon: (coretemp) Refine TjMax detection
Intel's turbostat code uses only 7 bits from MSR_IA32_TEMPERATURE_TARGET to
read TjMax, and also only accepts it if the reported temperature is at least
85 degrees C. Play safe and do the same.

Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2014-01-14 21:36:31 -08:00
Guenter Roeck 347c16cfde hwmon: (coretemp) Add PCI device ID for CE41x0 CPUs
Since we now have to use PCI IDs to detect CPU types anyway, use this mechanism
to detect CE41x0 CPUs. The advantage is that it requires only a single entry and
covers all variants of CE41x0, including those unknown to us.

Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2014-01-14 21:36:31 -08:00
Guenter Roeck 14513ee696 hwmon: (coretemp) Use PCI host bridge ID to identify CPU if necessary
Atom S12x0 CPUs are identified by the CPU host bridge ID. Add an override
table based on PCI IDs as well as code to detect it.

PCI access functions can now be called with PCI disabled, so unlike previous
attempts to use PCI IDs, the code no longer depends on it. If PCI is disabled,
the CPU will not be identified correctly. Since it is unlikely that anything
will work in this case, this is an acceptable limitation.

Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2014-01-14 21:36:30 -08:00