Centos-kernel-stream-9

Commit Graph

Author	SHA1	Message	Date
Rafael Aquini	fa01616b1d	percpu: scoped objcg protection JIRA: https://issues.redhat.com/browse/RHEL-27745 This patch is a backport of the following upstream commit: commit c63b835d0eafc956c43b8c6605708240ac52b8cd Author: Roman Gushchin <roman.gushchin@linux.dev> Date: Thu Oct 19 15:53:45 2023 -0700 percpu: scoped objcg protection Similar to slab and kmem, switch to a scope-based protection of the objcg pointer to avoid. Link: https://lkml.kernel.org/r/20231019225346.1822282-6-roman.gushchin@linux.dev Signed-off-by: Roman Gushchin (Cruise) <roman.gushchin@linux.dev> Tested-by: Naresh Kamboju <naresh.kamboju@linaro.org> Acked-by: Shakeel Butt <shakeelb@google.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Cc: David Rientjes <rientjes@google.com> Cc: Dennis Zhou <dennis@kernel.org> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Muchun Song <muchun.song@linux.dev> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Rafael Aquini <raquini@redhat.com>	2024-12-09 12:22:58 -05:00
Rafael Aquini	c2a3a026db	mm/percpu.c: print error message too if atomic alloc failed JIRA: https://issues.redhat.com/browse/RHEL-27743 This patch is a backport of the following upstream commit: commit f7d77dfc91f747f64cb00884fd6d7940c3b49fca Author: Baoquan He <bhe@redhat.com> Date: Fri Jul 28 11:02:55 2023 +0800 mm/percpu.c: print error message too if atomic alloc failed The variable 'err' is assigned an error message if atomic alloc failed, but it has no chance to be printed if is_atomic is true. Change this to print the error message too if atomic alloc failed, while avoiding the call to dump_stack() in that case. Signed-off-by: Baoquan He <bhe@redhat.com> Signed-off-by: Dennis Zhou <dennis@kernel.org> Signed-off-by: Rafael Aquini <raquini@redhat.com>	2024-10-01 11:20:08 -04:00
Rafael Aquini	781159954d	mm/percpu.c: optimize the code in pcpu_setup_first_chunk() a little bit JIRA: https://issues.redhat.com/browse/RHEL-27743 This patch is a backport of the following upstream commit: commit 7ee1e758bebe13d96217bcfd5230892ed44760e7 Author: Baoquan He <bhe@redhat.com> Date: Sat Jul 22 09:14:37 2023 +0800 mm/percpu.c: optimize the code in pcpu_setup_first_chunk() a little bit This removes the need for the local variable 'chunk', and optimizes the code calling pcpu_alloc_first_chunk() to initialize the reserved chunk and the dynamic chunk to make it simpler. Signed-off-by: Baoquan He <bhe@redhat.com> [Dennis: reworded first chunk init comment] Signed-off-by: Dennis Zhou <dennis@kernel.org> Signed-off-by: Rafael Aquini <raquini@redhat.com>	2024-10-01 11:20:07 -04:00
Rafael Aquini	09bb07696d	mm/percpu.c: remove redundant check JIRA: https://issues.redhat.com/browse/RHEL-27743 This patch is a backport of the following upstream commit: commit 5b672085e70c2ea40f4c9d6a23848079bf0ff700 Author: Baoquan He <bhe@redhat.com> Date: Fri Jul 21 21:17:58 2023 +0800 mm/percpu.c: remove redundant check The conditional check '(ai->dyn_size < PERCPU_DYNAMIC_EARLY_SIZE)' already covers the check '(!ai->dyn_size)'. Signed-off-by: Baoquan He <bhe@redhat.com> Signed-off-by: Dennis Zhou <dennis@kernel.org> Signed-off-by: Rafael Aquini <raquini@redhat.com>	2024-10-01 11:20:07 -04:00
Rafael Aquini	16893a5375	mm/percpu: Remove some local variables in pcpu_populate_pte JIRA: https://issues.redhat.com/browse/RHEL-27743 This patch is a backport of the following upstream commit: commit 41fd59b7f9bdde2a473450680411c2016017b992 Author: Bibo Mao <maobibo@loongson.cn> Date: Wed Jul 12 11:16:20 2023 +0800 mm/percpu: Remove some local variables in pcpu_populate_pte The function pcpu_populate_pte already defines variables that can be reused later, so remove the duplicated local variables. Signed-off-by: Bibo Mao <maobibo@loongson.cn> Signed-off-by: Dennis Zhou <dennis@kernel.org> Signed-off-by: Rafael Aquini <raquini@redhat.com>	2024-10-01 11:20:06 -04:00
Aristeu Rozanski	1c1f6235c1	mm: memcontrol: rename memcg_kmem_enabled() JIRA: https://issues.redhat.com/browse/RHEL-27740 Tested: by me commit f7a449f779608efe1941a0e0c4bd7b5f57000be7 Author: Roman Gushchin <roman.gushchin@linux.dev> Date: Mon Feb 13 11:29:22 2023 -0800 mm: memcontrol: rename memcg_kmem_enabled() Currently there are two kmem-related helper functions with confusing semantics: memcg_kmem_enabled() and mem_cgroup_kmem_disabled(). The problem is that the obvious expectation memcg_kmem_enabled() == !mem_cgroup_kmem_disabled() can be false. mem_cgroup_kmem_disabled() is similar to mem_cgroup_disabled(): it returns true only if CONFIG_MEMCG_KMEM is not set or the kmem accounting is disabled using a boot time kernel option "cgroup.memory=nokmem". It never changes the value dynamically. memcg_kmem_enabled() is different: it always returns false until the first non-root memory cgroup comes online (assuming the kernel memory accounting is enabled). Its goal is to improve the performance on systems without the cgroupfs mounted/memory controller enabled or on systems with only the root memory cgroup. To make things more obvious and avoid potential bugs, let's rename memcg_kmem_enabled() to memcg_kmem_online(). Link: https://lkml.kernel.org/r/20230213192922.1146370-1-roman.gushchin@linux.dev Signed-off-by: Roman Gushchin <roman.gushchin@linux.dev> Acked-by: Muchun Song <songmuchun@bytedance.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Shakeel Butt <shakeelb@google.com> Cc: Dennis Zhou <dennis@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Aristeu Rozanski <arozansk@redhat.com>	2024-04-29 14:33:23 -04:00
Lucas Zampieri	6f794c0e0b	Merge: MM update to v6.2 MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/3738 JIRA: https://issues.redhat.com/browse/RHEL-27739 Depends: !3662 Dropped Patches and the reason they were dropped: Needs to be evaluated by the FS team: 138060ba92b3 ("fs: pass dentry to set acl method") 3b4c7bc01727 ("xattr: use rbtree for simple_xattrs") Needs to be evaluated by the NVME team: 4003f107fa2e ("mm: introduce FOLL_PCI_P2PDMA to gate getting PCI P2PDMA pages") Needs to be evaluated by the ZRAM team: 7c2af309abd2 ("zram: add size class equals check into recompression") Signed-off-by: Audra Mitchell <audra@redhat.com> Approved-by: Rafael Aquini <aquini@redhat.com> Approved-by: Chris von Recklinghausen <crecklin@redhat.com> Approved-by: Jocelyn Falempe <jfalempe@redhat.com> Approved-by: David Arcari <darcari@redhat.com> Approved-by: Steve Best <sbest@redhat.com> Approved-by: David Airlie <airlied@redhat.com> Merged-by: Lucas Zampieri <lzampier@redhat.com>	2024-04-17 10:14:56 -03:00
Audra Mitchell	a23585f50b	mm/percpu.c: remove the lcm code since block size is fixed at page size JIRA: https://issues.redhat.com/browse/RHEL-27739 This patch is a backport of the following upstream commit: commit 3289e0533e70aafa9fb6d128fd4452db1b8befe8 Author: Baoquan He <bhe@redhat.com> Date: Mon Oct 24 16:14:33 2022 +0800 mm/percpu.c: remove the lcm code since block size is fixed at page size Since commit `b239f7daf5` ("percpu: set PCPU_BITMAP_BLOCK_SIZE to PAGE_SIZE"), the PCPU_BITMAP_BLOCK_SIZE has been set to page size fixedly. So the lcm code in pcpu_alloc_first_chunk() doesn't make sense any more, clean it up. Signed-off-by: Baoquan He <bhe@redhat.com> Signed-off-by: Dennis Zhou <dennis@kernel.org> Signed-off-by: Audra Mitchell <audra@redhat.com>	2024-04-09 09:42:50 -04:00
Audra Mitchell	a4b0f4aadc	mm/percpu: replace the goto with break JIRA: https://issues.redhat.com/browse/RHEL-27739 This patch is a backport of the following upstream commit: commit 83d261fc9e5fb03e8c32e365ca4ee53952611a2b Author: Baoquan He <bhe@redhat.com> Date: Mon Oct 24 16:14:32 2022 +0800 mm/percpu: replace the goto with break In function pcpu_reclaim_populated(), the line of goto jumping is unnecessary since the label 'end_chunk' is near the end of the for loop, use break instead. Signed-off-by: Baoquan He <bhe@redhat.com> Signed-off-by: Dennis Zhou <dennis@kernel.org> Signed-off-by: Audra Mitchell <audra@redhat.com>	2024-04-09 09:42:50 -04:00
Audra Mitchell	443bfa2d5b	mm/percpu: add comment to state the empty populated pages accounting JIRA: https://issues.redhat.com/browse/RHEL-27739 This patch is a backport of the following upstream commit: commit 73046f8d31701c379f6db899cb09ba70a3285143 Author: Baoquan He <bhe@redhat.com> Date: Tue Oct 25 11:45:16 2022 +0800 mm/percpu: add comment to state the empty populated pages accounting When allocating an area from a chunk, pcpu_block_update_hint_alloc() is called to update chunk metadata, including chunk's and global nr_empty_pop_pages. However, if the allocation is not atomic, some blocks may not be populated with pages yet, while we still subtract the number here. The number of pages will be added back with pcpu_chunk_populated() when populating pages. Adding code comment to make that more understandable. Signed-off-by: Baoquan He <bhe@redhat.com> Signed-off-by: Dennis Zhou <dennis@kernel.org> Signed-off-by: Audra Mitchell <audra@redhat.com>	2024-04-09 09:42:50 -04:00
Audra Mitchell	a432c1d810	mm/percpu: Update the code comment when creating new chunk JIRA: https://issues.redhat.com/browse/RHEL-27739 This patch is a backport of the following upstream commit: commit e04cb6976340d5ebf2b28ad91bf6a13a285aa566 Author: Baoquan He <bhe@redhat.com> Date: Mon Oct 24 16:14:30 2022 +0800 mm/percpu: Update the code comment when creating new chunk The code taking the pcpu_alloc_mutex lock has been moved to the beginning of pcpu_alloc() for non-atomic allocations, so the code comment above the pcpu_create_chunk() callsite needs to be updated. Signed-off-by: Baoquan He <bhe@redhat.com> Signed-off-by: Dennis Zhou <dennis@kernel.org> Signed-off-by: Audra Mitchell <audra@redhat.com>	2024-04-09 09:42:49 -04:00
Audra Mitchell	91e0cae202	mm/percpu: use list_first_entry_or_null in pcpu_reclaim_populated() JIRA: https://issues.redhat.com/browse/RHEL-27739 This patch is a backport of the following upstream commit: commit c1f6688d35d47ca11200789b000b3b20f5ecdbd9 Author: Baoquan He <bhe@redhat.com> Date: Tue Oct 25 11:11:45 2022 +0800 mm/percpu: use list_first_entry_or_null in pcpu_reclaim_populated() To replace list_empty()/list_first_entry() pair to simplify code. Signed-off-by: Baoquan He <bhe@redhat.com> Acked-by: Dennis Zhou <dennis@kernel.org> Signed-off-by: Dennis Zhou <dennis@kernel.org> Signed-off-by: Audra Mitchell <audra@redhat.com>	2024-04-09 09:42:49 -04:00
Audra Mitchell	10f60902d9	mm/percpu: remove unused pcpu_map_extend_chunks JIRA: https://issues.redhat.com/browse/RHEL-27739 This patch is a backport of the following upstream commit: commit 5a7d596a05dddd09c44ae462f881491cf87ed120 Author: Baoquan He <bhe@redhat.com> Date: Mon Oct 24 16:14:28 2022 +0800 mm/percpu: remove unused pcpu_map_extend_chunks Since commit `40064aeca3` ("percpu: replace area map allocator with bitmap"), it is unneeded. Signed-off-by: Baoquan He <bhe@redhat.com> Signed-off-by: Dennis Zhou <dennis@kernel.org> Signed-off-by: Audra Mitchell <audra@redhat.com>	2024-04-09 09:42:49 -04:00
Artem Savkov	4ab5be6999	mm/percpu.c: introduce pcpu_alloc_size() JIRA: https://issues.redhat.com/browse/RHEL-23643 commit b460bc8302f222d346f0c15bba980eb8c36d6278 Author: Hou Tao <houtao1@huawei.com> Date: Fri Oct 20 21:31:57 2023 +0800 mm/percpu.c: introduce pcpu_alloc_size() Introduce pcpu_alloc_size() to get the size of the dynamic per-cpu area. It will be used by bpf memory allocator in the following patches. BPF memory allocator maintains per-cpu area caches for multiple area sizes and its free API only has the to-be-freed per-cpu pointer, so it needs the size of dynamic per-cpu area to select the corresponding cache when bpf program frees the dynamic per-cpu pointer. Acked-by: Dennis Zhou <dennis@kernel.org> Signed-off-by: Hou Tao <houtao1@huawei.com> Link: https://lore.kernel.org/r/20231020133202.4043247-3-houtao@huaweicloud.com Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Artem Savkov <asavkov@redhat.com>	2024-03-27 10:27:55 +01:00
Artem Savkov	513a6387a1	mm/percpu.c: don't acquire pcpu_lock for pcpu_chunk_addr_search() JIRA: https://issues.redhat.com/browse/RHEL-23643 commit 394e6869f0185e89cb815db29bf819474df858ae Author: Hou Tao <houtao1@huawei.com> Date: Fri Oct 20 21:31:56 2023 +0800 mm/percpu.c: don't acquire pcpu_lock for pcpu_chunk_addr_search() There is no need to acquire pcpu_lock for pcpu_chunk_addr_search(): 1) both pcpu_first_chunk & pcpu_reserved_chunk must have been initialized before the invocation of free_percpu(). 2) The dynamically-created chunk must be valid before the per-cpu pointers allocated from it are freed. So acquire pcpu_lock() after the invocation of pcpu_chunk_addr_search(). Acked-by: Dennis Zhou <dennis@kernel.org> Signed-off-by: Hou Tao <houtao1@huawei.com> Link: https://lore.kernel.org/r/20231020133202.4043247-2-houtao@huaweicloud.com Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Artem Savkov <asavkov@redhat.com>	2024-03-27 10:27:55 +01:00
Chris von Recklinghausen	c766891728	percpu: improve percpu_alloc_percpu event trace Bugzilla: https://bugzilla.redhat.com/2160210 commit f67bed134a053663852a1a3ab1b3223bfc2104a2 Author: Vasily Averin <vvs@openvz.org> Date: Thu May 12 20:23:07 2022 -0700 percpu: improve percpu_alloc_percpu event trace Add call_site, bytes_alloc and gfp_flags fields to the output of the percpu_alloc_percpu ftrace event: mkdir-4393 [001] 169.334788: percpu_alloc_percpu: call_site=mem_cgroup_css_alloc+0xa6 reserved=0 is_atomic=0 size=2408 align=8 base_addr=0xffffc7117fc00000 off=402176 ptr=0x3dc867a62300 bytes_alloc=14448 gfp_flags=GFP_KERNEL_ACCOUNT This is required to track memcg-accounted percpu allocations. Link: https://lkml.kernel.org/r/a07be858-c8a3-7851-9086-e3262cbcf707@openvz.org Signed-off-by: Vasily Averin <vvs@openvz.org> Acked-by: Roman Gushchin <roman.gushchin@linux.dev> Cc: Shakeel Butt <shakeelb@google.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Michal Hocko <mhocko@suse.com> Cc: Dennis Zhou <dennis@kernel.org> Cc: Tejun Heo <tj@kernel.org> Cc: Christoph Lameter <cl@linux.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>	2023-03-24 11:19:08 -04:00
Waiman Long	470c25f269	mm: percpu: use kmemleak_ignore_phys() instead of kmemleak_free() Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2151065 commit a317ebccaa3609917a2c021af870cf3fa607ab0c Author: Patrick Wang <patrick.wang.shcn@gmail.com> Date: Tue, 5 Jul 2022 19:31:58 +0800 mm: percpu: use kmemleak_ignore_phys() instead of kmemleak_free() Kmemleak recently added an rbtree to store the objects allocated with physical address. Those objects can't be freed with kmemleak_free(). According to the comments, percpu allocations are tracked by kmemleak separately. Kmemleak_free() was used to avoid the unnecessary tracking. If kmemleak_free() fails, those objects would be scanned by kmemleak, which is unnecessary but shouldn't lead to other effects. Use kmemleak_ignore_phys() instead of kmemleak_free() for those objects. Link: https://lkml.kernel.org/r/20220705113158.127600-1-patrick.wang.shcn@gmail.com Fixes: 0c24e061196c ("mm: kmemleak: add rbtree and store physical address for objects allocated with PA") Signed-off-by: Patrick Wang <patrick.wang.shcn@gmail.com> Cc: Dennis Zhou <dennis@kernel.org> Cc: Tejun Heo <tj@kernel.org> Cc: Christoph Lameter <cl@linux.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Waiman Long <longman@redhat.com>	2023-02-07 14:19:38 -05:00
Chris von Recklinghausen	3654bc9b37	mm: percpu: add generic pcpu_populate_pte() function Bugzilla: https://bugzilla.redhat.com/2120352 commit 20c035764626c56c4f6514936b9ee4be0f4cd962 Author: Kefeng Wang <wangkefeng.wang@huawei.com> Date: Wed Jan 19 18:07:53 2022 -0800 mm: percpu: add generic pcpu_populate_pte() function With NEED_PER_CPU_PAGE_FIRST_CHUNK enabled, we need a function to populate pte, this patch adds a generic pcpu populate pte function, pcpu_populate_pte(), which is marked __weak and used on most architectures, but it is overridden on x86, which has its own implementation. Link: https://lkml.kernel.org/r/20211216112359.103822-5-wangkefeng.wang@huawei.com Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "Rafael J. Wysocki" <rafael@kernel.org> Cc: Dennis Zhou <dennis@kernel.org> Cc: Tejun Heo <tj@kernel.org> Cc: Christoph Lameter <cl@linux.com> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>	2022-10-12 07:27:42 -04:00
Chris von Recklinghausen	18cad680eb	mm: percpu: add generic pcpu_fc_alloc/free funciton Bugzilla: https://bugzilla.redhat.com/2120352 commit 23f917169ef157aa7a6bf80d8c4aad6f1282852c Author: Kefeng Wang <wangkefeng.wang@huawei.com> Date: Wed Jan 19 18:07:49 2022 -0800 mm: percpu: add generic pcpu_fc_alloc/free funciton With the previous patch, we could add a generic pcpu first chunk allocate and free function to clean up the duplicated definitions on each architecture. Link: https://lkml.kernel.org/r/20211216112359.103822-4-wangkefeng.wang@huawei.com Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Dennis Zhou <dennis@kernel.org> Cc: Tejun Heo <tj@kernel.org> Cc: Christoph Lameter <cl@linux.com> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: "Rafael J. Wysocki" <rafael@kernel.org> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>	2022-10-12 07:27:42 -04:00
Chris von Recklinghausen	983533c507	mm: percpu: add pcpu_fc_cpu_to_node_fn_t typedef Bugzilla: https://bugzilla.redhat.com/2120352 commit 1ca3fb3abd2b615c4b61728de545760a6e2c2d8b Author: Kefeng Wang <wangkefeng.wang@huawei.com> Date: Wed Jan 19 18:07:45 2022 -0800 mm: percpu: add pcpu_fc_cpu_to_node_fn_t typedef Add pcpu_fc_cpu_to_node_fn_t and pass it into pcpu_fc_alloc_fn_t, pcpu first chunk allocation will call it to alloc memblock on the corresponding node by it, this is prepare for the next patch. Link: https://lkml.kernel.org/r/20211216112359.103822-3-wangkefeng.wang@huawei.com Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "Rafael J. Wysocki" <rafael@kernel.org> Cc: Dennis Zhou <dennis@kernel.org> Cc: Tejun Heo <tj@kernel.org> Cc: Christoph Lameter <cl@linux.com> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>	2022-10-12 07:27:42 -04:00
Chris von Recklinghausen	adf3d212d2	bitmap: unify find_bit operations Bugzilla: https://bugzilla.redhat.com/2120352 commit ec288a2cf7ca40a939316b6df206ab845bb112d1 Author: Yury Norov <yury.norov@gmail.com> Date: Sat Aug 14 14:17:11 2021 -0700 bitmap: unify find_bit operations bitmap_for_each_{set,clear}_region() are similar to for_each_bit() macros in include/linux/find.h, but their interface and implementation are different. This patch adds for_each_bitrange() macros and drops the unused bitmap_*_region() API for the sake of unification. Signed-off-by: Yury Norov <yury.norov@gmail.com> Tested-by: Wolfram Sang <wsa+renesas@sang-engineering.com> Acked-by: Dennis Zhou <dennis@kernel.org> Acked-by: Ulf Hansson <ulf.hansson@linaro.org> # For MMC Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>	2022-10-12 07:27:41 -04:00
Chris von Recklinghausen	b027f02790	mm/percpu: micro-optimize pcpu_is_populated() Bugzilla: https://bugzilla.redhat.com/2120352 commit 801a57365fc836d7ec866e2069d0b21d79925c1e Author: Yury Norov <yury.norov@gmail.com> Date: Sat Aug 14 14:17:10 2021 -0700 mm/percpu: micro-optimize pcpu_is_populated() bitmap_next_clear_region() calls find_next_zero_bit() and find_next_bit() sequentially to find a range of clear bits. In case of pcpu_is_populated() there's a chance to return earlier if bitmap has all bits set. Signed-off-by: Yury Norov <yury.norov@gmail.com> Tested-by: Wolfram Sang <wsa+renesas@sang-engineering.com> Acked-by: Dennis Zhou <dennis@kernel.org> Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>	2022-10-12 07:27:41 -04:00
Chris von Recklinghausen	b36217e840	mm: memcg/percpu: account extra objcg space to memory cgroups Bugzilla: https://bugzilla.redhat.com/2120352 commit 8c57c07741bf28e7d867f1200aa80120b8ca663e Author: Qi Zheng <zhengqi.arch@bytedance.com> Date: Fri Jan 14 14:09:12 2022 -0800 mm: memcg/percpu: account extra objcg space to memory cgroups Similar to slab memory allocator, for each accounted percpu object there is an extra space which is used to store obj_cgroup membership. Charge it too. [akpm@linux-foundation.org: fix layout] Link: https://lkml.kernel.org/r/20211126040606.97836-1-zhengqi.arch@bytedance.com Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com> Acked-by: Dennis Zhou <dennis@kernel.org> Cc: Tejun Heo <tj@kernel.org> Cc: Christoph Lameter <cl@linux.com> Cc: Muchun Song <songmuchun@bytedance.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>	2022-10-12 07:27:41 -04:00
Al Stone	d1fd9d18f4	memblock: use memblock_free for freeing virtual pointers Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2071840 Tested: This is one of a series of patch sets to enable Arm SystemReady IR support in the kernel for NXP i.MX8 platforms. At this stage, this has been tested by ensuring we can survive the CI/CD loop -- i.e., that we have not broken anything else, and a simple boot test. When sufficient drivers have been brought in for i.MX8M, we will be able to run further tests. Conflicts: init/main.c This patch is being applied out of order, but is a simple function name replacement, so applied manually. commit 4421cca0a3e4833b3bf0f20de98eb580ab8c7290 Author: Mike Rapoport <rppt@kernel.org> Date: Fri Nov 5 13:43:22 2021 -0700 memblock: use memblock_free for freeing virtual pointers Rename memblock_free_ptr() to memblock_free() and use memblock_free() when freeing a virtual pointer so that memblock_free() will be a counterpart of memblock_alloc() The callers are updated with the below semantic patch and manual addition of (void *) casting to pointers that are represented by unsigned long variables. @@ identifier vaddr; expression size; @@ ( - memblock_phys_free(__pa(vaddr), size); + memblock_free(vaddr, size); \| - memblock_free_ptr(vaddr, size); + memblock_free(vaddr, size); ) [sfr@canb.auug.org.au: fixup] Link: https://lkml.kernel.org/r/20211018192940.3d1d532f@canb.auug.org.au Link: https://lkml.kernel.org/r/20210930185031.18648-7-rppt@kernel.org Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Juergen Gross <jgross@suse.com> Cc: Shahab Vahedi <Shahab.Vahedi@synopsys.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> (cherry picked from commit 4421cca0a3e4833b3bf0f20de98eb580ab8c7290) Signed-off-by: Al Stone <ahs3@redhat.com>	2022-07-01 17:07:00 -06:00
Al Stone	14289d8c8f	memblock: rename memblock_free to memblock_phys_free Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2071840 Tested: This is one of a series of patch sets to enable Arm SystemReady IR support in the kernel for NXP i.MX8 platforms. At this stage, this has been tested by ensuring we can survive the CI/CD loop -- i.e., that we have not broken anything else, and a simple boot test. When sufficient drivers have been brought in for i.MX8M, we will be able to run further tests. Conflicts: arch/s390/kernel/setup.c arch/s390/kernel/smp.c These have been modified in ways that no longer strictly match the upstream code, throwing off the auto-merge; this is a simple function name replacement, however, so easily done manually instead. commit 3ecc68349bbab6bff1d12cbc7951ca6019b2faf6 Author: Mike Rapoport <rppt@kernel.org> Date: Fri Nov 5 13:43:19 2021 -0700 memblock: rename memblock_free to memblock_phys_free Since memblock_free() operates on a physical range, make its name reflect it and rename it to memblock_phys_free(), so it will be a logical counterpart to memblock_phys_alloc(). The callers are updated with the below semantic patch: @@ expression addr; expression size; @@ - memblock_free(addr, size); + memblock_phys_free(addr, size); Link: https://lkml.kernel.org/r/20210930185031.18648-6-rppt@kernel.org Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Juergen Gross <jgross@suse.com> Cc: Shahab Vahedi <Shahab.Vahedi@synopsys.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> (cherry picked from commit 3ecc68349bbab6bff1d12cbc7951ca6019b2faf6) Signed-off-by: Al Stone <ahs3@redhat.com>	2022-07-01 17:06:59 -06:00
Al Stone	3b2e45e437	memblock: drop memblock_free_early_nid() and memblock_free_early() Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2071840 Tested: This is one of a series of patch sets to enable Arm SystemReady IR support in the kernel for NXP i.MX8 platforms. At this stage, this has been tested by ensuring we can survive the CI/CD loop -- i.e., that we have not broken anything else, and a simple boot test. When sufficient drivers have been brought in for i.MX8M, we will be able to run further tests. commit fa27717110ae51b9b9013ced0b5143888257bb79 Author: Mike Rapoport <rppt@kernel.org> Date: Fri Nov 5 13:43:13 2021 -0700 memblock: drop memblock_free_early_nid() and memblock_free_early() memblock_free_early_nid() is unused and memblock_free_early() is an alias for memblock_free(). Replace calls to memblock_free_early() with calls to memblock_free() and remove memblock_free_early() and memblock_free_early_nid(). Link: https://lkml.kernel.org/r/20210930185031.18648-4-rppt@kernel.org Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Juergen Gross <jgross@suse.com> Cc: Shahab Vahedi <Shahab.Vahedi@synopsys.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> (cherry picked from commit fa27717110ae51b9b9013ced0b5143888257bb79) Signed-off-by: Al Stone <ahs3@redhat.com>	2022-07-01 17:06:59 -06:00
Rafael Aquini	3e417b25d5	percpu: remove export of pcpu_base_addr Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2023396 This patch is a backport of the following upstream commit: commit 3843c50a782c397422765cf0839a95e75e523229 Author: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Date: Tue Sep 7 19:57:27 2021 -0700 percpu: remove export of pcpu_base_addr This is not needed by any modules, so remove the export. Link: https://lkml.kernel.org/r/20210722185814.504541-1-gregkh@linuxfoundation.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Cc: Dennis Zhou <dennis@kernel.org> Cc: Tejun Heo <tj@kernel.org> Cc: Christoph Lameter <cl@linux.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Rafael Aquini <aquini@redhat.com>	2021-11-29 11:43:30 -05:00
Rafael Aquini	55ccb8623f	mm/percpu,c: remove obsolete comments of pcpu_chunk_populated() Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2023396 This patch is a backport of the following upstream commit: commit 319814504992f51ed17af60edb1a237ada1892e8 Author: Jing Xiangfeng <jingxiangfeng@huawei.com> Date: Thu Sep 2 15:01:00 2021 -0700 mm/percpu,c: remove obsolete comments of pcpu_chunk_populated() Commit `b239f7daf5` ("percpu: set PCPU_BITMAP_BLOCK_SIZE to PAGE_SIZE") removed the parameter 'for_alloc', so remove this comment. Link: https://lkml.kernel.org/r/1630576043-21367-1-git-send-email-jingxiangfeng@huawei.com Signed-off-by: Jing Xiangfeng <jingxiangfeng@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Rafael Aquini <aquini@redhat.com>	2021-11-29 11:42:32 -05:00
Dennis Zhou	93274f1dd6	percpu: flush tlb in pcpu_reclaim_populated() Prior to "percpu: implement partial chunk depopulation", pcpu_depopulate_chunk() was called only on the destruction path. This meant the virtual address range was on its way back to vmalloc which will handle flushing the tlbs for us. However, with pcpu_reclaim_populated(), we are now calling pcpu_depopulate_chunk() during the active lifecycle of a chunk. Therefore, we need to flush the tlb as well otherwise we can end up accessing the wrong page through an invalid tlb mapping as reported in [1]. [1] https://lore.kernel.org/lkml/20210702191140.GA3166599@roeck-us.net/ Fixes: `f183324133` ("percpu: implement partial chunk depopulation") Reported-and-tested-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Dennis Zhou <dennis@kernel.org>	2021-07-04 18:30:17 +00:00
Linus Torvalds	e267992f9e	Merge branch 'for-5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/dennis/percpu Pull percpu updates from Dennis Zhou: - percpu chunk depopulation - depopulate backing pages for chunks with empty pages when we exceed a global threshold without those pages. This lets us reclaim a portion of memory that would previously be lost until the full chunk would be freed (possibly never). - memcg accounting cleanup - previously separate chunks were managed for normal allocations and __GFP_ACCOUNT allocations. These are now consolidated which cleans up the code quite a bit. - a few misc clean ups for clang warnings * 'for-5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/dennis/percpu: percpu: optimize locking in pcpu_balance_workfn() percpu: initialize best_upa variable percpu: rework memcg accounting mm, memcg: introduce mem_cgroup_kmem_disabled() mm, memcg: mark cgroup_memory_nosocket, nokmem and noswap as __ro_after_init percpu: make symbol 'pcpu_free_slot' static percpu: implement partial chunk depopulation percpu: use pcpu_free_slot instead of pcpu_nr_slots - 1 percpu: factor out pcpu_check_block_hint() percpu: split __pcpu_balance_workfn() percpu: fix a comment about the chunks ordering	2021-07-01 17:17:24 -07:00
Roman Gushchin	e4d777003a	percpu: optimize locking in pcpu_balance_workfn() pcpu_balance_workfn() unconditionally calls pcpu_balance_free(), pcpu_reclaim_populated(), pcpu_balance_populated() and pcpu_balance_free() again. Each call to pcpu_balance_free() and pcpu_reclaim_populated() will cause at least one acquisition of the pcpu_lock. So even if the balancing was scheduled because of a failed atomic allocation, pcpu_lock will be acquired at least 4 times. This obviously increases the contention on the pcpu_lock. To optimize the scheme let's grab the pcpu_lock on the upper level (in pcpu_balance_workfn()) and keep it generally locked for the whole duration of the scheduled work, but release conditionally to perform any slow operations like chunk (de)population and creation of new chunks. Signed-off-by: Roman Gushchin <guro@fb.com> Signed-off-by: Dennis Zhou <dennis@kernel.org>	2021-06-17 23:05:24 +00:00
Dennis Zhou	4829c791b2	percpu: initialize best_upa variable Tom reported this finding from clang 10's static analysis [1]. Due to the way the code is written, it will always see a successful loop iteration. Instead of setting an initial value, check that it was set instead with BUG_ON() because 0 units per allocation is bogus. [1] https://lore.kernel.org/lkml/20210515180817.1751084-1-trix@redhat.com/ Reported-by: Tom Rix <trix@redhat.com> Signed-off-by: Dennis Zhou <dennis@kernel.org>	2021-06-14 14:42:05 +00:00
Roman Gushchin	faf65dde84	percpu: rework memcg accounting The current implementation of the memcg accounting of the percpu memory is based on the idea of having two separate sets of chunks for accounted and non-accounted memory. This approach has an advantage of not wasting any extra memory for memcg data for non-accounted chunks, however it complicates the code and leads to a higher chunks number due to a lower chunk utilization. Instead of having two chunk types it's possible to declare all* chunks memcg-aware unless the kernel memory accounting is disabled globally by a boot option. The size of objcg_array is usually small in comparison to chunks themselves (it obviously depends on the number of CPUs), so even if some chunk will have no accounted allocations, the memory waste isn't significant and will likely be compensated by a higher chunk utilization. Also, with time more and more percpu allocations will likely become accounted. * The first chunk is initialized before the memory cgroup subsystem, so we don't know for sure whether we need to allocate obj_cgroups. Because it's small, let's make it free for use. Then we don't need to allocate obj_cgroups for it. Signed-off-by: Roman Gushchin <guro@fb.com> Signed-off-by: Dennis Zhou <dennis@kernel.org>	2021-06-05 20:43:15 +00:00
Wei Yongjun	8d55ba5df3	percpu: make symbol 'pcpu_free_slot' static The sparse tool complains as follows: mm/percpu.c:138:5: warning: symbol 'pcpu_free_slot' was not declared. Should it be static? This symbol is not used outside of percpu.c, so marks it static. Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Signed-off-by: Dennis Zhou <dennis@kernel.org>	2021-05-14 20:57:54 +00:00
Ingo Molnar	f0953a1bba	mm: fix typos in comments Fix ~94 single-word typos in locking code comments, plus a few very obvious grammar mistakes. Link: https://lkml.kernel.org/r/20210322212624.GA1963421@gmail.com Link: https://lore.kernel.org/r/20210322205203.GB1959563@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Randy Dunlap <rdunlap@infradead.org> Cc: Bhaskar Chowdhury <unixbhaskar@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2021-05-07 00:26:35 -07:00
Roman Gushchin	f183324133	percpu: implement partial chunk depopulation From Roman ("percpu: partial chunk depopulation"): In our [Facebook] production experience the percpu memory allocator is sometimes struggling with returning the memory to the system. A typical example is a creation of several thousands memory cgroups (each has several chunks of the percpu data used for vmstats, vmevents, ref counters etc). Deletion and complete releasing of these cgroups doesn't always lead to a shrinkage of the percpu memory, so that sometimes there are several GB's of memory wasted. The underlying problem is the fragmentation: to release an underlying chunk all percpu allocations should be released first. The percpu allocator tends to top up chunks to improve the utilization. It means new small-ish allocations (e.g. percpu ref counters) are placed onto almost filled old-ish chunks, effectively pinning them in memory. This patchset solves this problem by implementing a partial depopulation of percpu chunks: chunks with many empty pages are being asynchronously depopulated and the pages are returned to the system. To illustrate the problem the following script can be used: -- cd /sys/fs/cgroup mkdir percpu_test echo "+memory" > percpu_test/cgroup.subtree_control cat /proc/meminfo \| grep Percpu for i in `seq 1 1000`; do mkdir percpu_test/cg_"${i}" for j in `seq 1 10`; do mkdir percpu_test/cg_"${i}"_"${j}" done done cat /proc/meminfo \| grep Percpu for i in `seq 1 1000`; do for j in `seq 1 10`; do rmdir percpu_test/cg_"${i}"_"${j}" done done sleep 10 cat /proc/meminfo \| grep Percpu for i in `seq 1 1000`; do rmdir percpu_test/cg_"${i}" done rmdir percpu_test -- It creates 11000 memory cgroups and removes every 10 out of 11. It prints the initial size of the percpu memory, the size after creating all cgroups and the size after deleting most of them. 
Results: vanilla: ./percpu_test.sh Percpu: 7488 kB Percpu: 481152 kB Percpu: 481152 kB with this patchset applied: ./percpu_test.sh Percpu: 7488 kB Percpu: 481408 kB Percpu: 135552 kB The total size of the percpu memory was reduced by more than 3.5 times. This patch: This patch implements partial depopulation of percpu chunks. As of now, a chunk can be depopulated only as a part of the final destruction, if there are no more outstanding allocations. However, to minimize memory waste it might be useful to depopulate a partially filled chunk, if a small number of outstanding allocations prevents the chunk from being fully reclaimed. This patch implements the following depopulation process: it scans over the chunk pages, looks for a range of empty and populated pages and performs the depopulation. To avoid races with new allocations, the chunk is previously isolated. After the depopulation the chunk is sidelined to a special list or freed. New allocations prefer using active chunks to sidelined chunks. If a sidelined chunk is used, it is reintegrated to the active lists. The depopulation is scheduled on the free path if the chunk is all of the following: 1) has more than 1/4 of total pages free and populated 2) the system has enough free percpu pages aside of this chunk 3) isn't the reserved chunk 4) isn't the first chunk If it's already depopulated but got free populated pages, it's a good target too. The chunk is moved to a special slot, pcpu_to_depopulate_slot, chunk->isolated is set, and the balance work item is scheduled. On isolation, these pages are removed from the pcpu_nr_empty_pop_pages. It is constantly replaced to the to_depopulate_slot when it meets these qualifications. pcpu_reclaim_populated() iterates over the to_depopulate_slot until it becomes empty. The depopulation is performed in the reverse direction to keep populated pages close to the beginning. Depopulated chunks are sidelined to preferentially avoid them for new allocations. 
When no active chunk can suffice a new allocation, sidelined chunks are first checked before creating a new chunk. Signed-off-by: Roman Gushchin <guro@fb.com> Co-developed-by: Dennis Zhou <dennis@kernel.org> Signed-off-by: Dennis Zhou <dennis@kernel.org> Tested-by: Pratik Sampat <psampat@linux.ibm.com> Signed-off-by: Dennis Zhou <dennis@kernel.org>	2021-04-21 18:17:40 +00:00
Dennis Zhou	1c29a3ceaf	percpu: use pcpu_free_slot instead of pcpu_nr_slots - 1 This prepares for adding a to_depopulate list and sidelined list after the free slot in the set of lists in pcpu_slot. Signed-off-by: Dennis Zhou <dennis@kernel.org> Acked-by: Roman Gushchin <guro@fb.com> Signed-off-by: Dennis Zhou <dennis@kernel.org>	2021-04-21 18:17:40 +00:00
Roman Gushchin	8ea2e1e35d	percpu: factor out pcpu_check_block_hint() Factor out the pcpu_check_block_hint() helper, which will be useful in the future. The new function checks if the allocation can likely fit within the contig hint. Signed-off-by: Roman Gushchin <guro@fb.com> Signed-off-by: Dennis Zhou <dennis@kernel.org> Signed-off-by: Dennis Zhou <dennis@kernel.org>	2021-04-21 18:17:35 +00:00
Roman Gushchin	67c2669d69	percpu: split __pcpu_balance_workfn() __pcpu_balance_workfn() became fairly big and hard to follow, but in fact it consists of two fully independent parts, responsible for the destruction of excessive free chunks and population of the necessary amount of free pages. In order to simplify the code and prepare for adding new functionality, split it in two functions: 1) pcpu_balance_free, 2) pcpu_balance_populated. Move the taking/releasing of the pcpu_alloc_mutex to an upper level to keep the current synchronization in place. Signed-off-by: Roman Gushchin <guro@fb.com> Reviewed-by: Dennis Zhou <dennis@kernel.org> Signed-off-by: Dennis Zhou <dennis@kernel.org>	2021-04-16 20:57:59 +00:00
Roman Gushchin	ac9380f6b8	percpu: fix a comment about the chunks ordering Since the commit `3e54097beb` ("percpu: manage chunks based on contig_bits instead of free_bytes") chunks are sorted based on the size of the biggest continuous free area instead of the total number of free bytes. Update the corresponding comment to reflect this. Signed-off-by: Roman Gushchin <guro@fb.com> Signed-off-by: Dennis Zhou <dennis@kernel.org>	2021-04-16 20:57:49 +00:00
Roman Gushchin	0760fa3d8f	percpu: make pcpu_nr_empty_pop_pages per chunk type nr_empty_pop_pages is used to guarantee that there are some free populated pages to satisfy atomic allocations. Accounted and non-accounted allocations are using separate sets of chunks, so both need to have a surplus of empty pages. This commit makes pcpu_nr_empty_pop_pages and the corresponding logic per chunk type. [Dennis] This issue came up as I was reviewing [1] and realized I missed this. Simultaneously, it was reported btrfs was seeing failed atomic allocations in fsstress tests [2] and [3]. [1] https://lore.kernel.org/linux-mm/20210324190626.564297-1-guro@fb.com/ [2] https://lore.kernel.org/linux-mm/20210401185158.3275.409509F4@e16-tech.com/ [3] https://lore.kernel.org/linux-mm/CAL3q7H5RNBjCi708GH7jnczAOe0BLnacT9C+OBgA-Dx9jhB6SQ@mail.gmail.com/ Fixes: `3c7be18ac9` ("mm: memcg/percpu: account percpu memory to memory cgroups") Cc: stable@vger.kernel.org # 5.9+ Signed-off-by: Roman Gushchin <guro@fb.com> Tested-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Dennis Zhou <dennis@kernel.org>	2021-04-09 13:58:38 +00:00
Dennis Zhou	258e0815e2	percpu: fix clang modpost section mismatch pcpu_build_alloc_info() is an __init function that makes a call to cpumask_clear_cpu(). With CONFIG_GCOV_PROFILE_ALL enabled, the inline heuristics are modified and such cpumask_clear_cpu() which is marked inline doesn't get inlined. Because it works on mask in __initdata, modpost throws a section mismatch error. Arnd sent a patch with the flatten attribute as an alternative [2]. I've added it to compiler_attributes.h. modpost complaint: WARNING: modpost: vmlinux.o(.text+0x735425): Section mismatch in reference from the function cpumask_clear_cpu() to the variable .init.data:pcpu_build_alloc_info.mask The function cpumask_clear_cpu() references the variable __initdata pcpu_build_alloc_info.mask. This is often because cpumask_clear_cpu lacks a __initdata annotation or the annotation of pcpu_build_alloc_info.mask is wrong. clang output: mm/percpu.c:2724:5: remark: cpumask_clear_cpu not inlined into pcpu_build_alloc_info because too costly to inline (cost=725, threshold=325) [-Rpass-missed=inline] [1] https://lore.kernel.org/linux-mm/202012220454.9F6Bkz9q-lkp@intel.com/ [2] https://lore.kernel.org/lkml/CAK8P3a2ZWfNeXKSm8K_SUhhwkor17jFo3xApLXjzfPqX0eUDUA@mail.gmail.com/ Reported-by: kernel test robot <lkp@intel.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Dennis Zhou <dennis@kernel.org>	2021-02-14 18:15:15 +00:00
Wonhyuk Yang	d7d29ac76f	percpu: reduce the number of cpu distance comparisons To build group_map[] and group_cnt[], we find out which group CPUs belong to by comparing the distance of the cpu. However, this includes cases where comparisons are not required. This patch uses a bitmap to record CPUs that are not classified in the group. CPUs that we know which group they belong to should be cleared from the bitmap. As a result, we can reduce the number of unnecessary comparisons. Signed-off-by: Wonhyuk Yang <vvghjk1234@gmail.com> Signed-off-by: Dennis Zhou <dennis@kernel.org> [Dennis: added cpumask_clear() call and #include cpumask.h.]	2021-02-14 17:34:05 +00:00
Dennis Zhou	61cf93d3e1	percpu: convert flexible array initializers to use struct_size() Use the safer macro as sparked by the long discussion in [1]. [1] https://lore.kernel.org/lkml/20200917204514.GA2880159@google.com/ Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org> Signed-off-by: Dennis Zhou <dennis@kernel.org>	2020-10-30 23:02:28 +00:00
Roman Gushchin	279c3393e2	mm: kmem: move memcg_kmem_bypass() calls to get_mem/obj_cgroup_from_current() Patch series "mm: kmem: kernel memory accounting in an interrupt context". This patchset implements memcg-based memory accounting of allocations made from an interrupt context. Historically, such allocations were passed unaccounted mostly because charging the memory cgroup of the current process wasn't an option. Also performance reasons were likely a reason too. The remote charging API allows to temporarily overwrite the currently active memory cgroup, so that all memory allocations are accounted towards some specified memory cgroup instead of the memory cgroup of the current process. This patchset extends the remote charging API so that it can be used from an interrupt context. Then it removes the fence that prevented the accounting of allocations made from an interrupt context. It also contains a couple of optimizations/code refactorings. This patchset doesn't directly enable accounting for any specific allocations, but prepares the code base for it. The bpf memory accounting will likely be the first user of it: a typical example is a bpf program parsing an incoming network packet, which allocates an entry in hashmap map to store some information. This patch (of 4): Currently memcg_kmem_bypass() is called before obtaining the current memory/obj cgroup using get_mem/obj_cgroup_from_current(). Moving memcg_kmem_bypass() into get_mem/obj_cgroup_from_current() reduces the number of call sites and allows further code simplifications. 
Signed-off-by: Roman Gushchin <guro@fb.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Shakeel Butt <shakeelb@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Link: http://lkml.kernel.org/r/20200827225843.1270629-1-guro@fb.com Link: http://lkml.kernel.org/r/20200827225843.1270629-2-guro@fb.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2020-10-18 09:27:09 -07:00
Sunghyun Jin	b3b33d3c43	percpu: fix first chunk size calculation for populated bitmap Variable populated, which is a member of struct pcpu_chunk, is used as a unit of size of unsigned long. However, size of populated is miscounted. So, I fix this minor part. Fixes: `8ab16c43ea` ("percpu: change the number of pages marked in the first_chunk pop bitmap") Cc: <stable@vger.kernel.org> # 4.14+ Signed-off-by: Sunghyun Jin <mcsmonk@gmail.com> Signed-off-by: Dennis Zhou <dennis@kernel.org>	2020-09-17 17:34:39 +00:00
Roman Gushchin	772616b031	mm: memcg/percpu: per-memcg percpu memory statistics Percpu memory can represent a noticeable chunk of the total memory consumption, especially on big machines with many CPUs. Let's track percpu memory usage for each memcg and display it in memory.stat. A percpu allocation is usually scattered over multiple pages (and nodes), and can be significantly smaller than a page. So let's add a byte-sized counter on the memcg level: MEMCG_PERCPU_B. Byte-sized vmstat infra created for slabs can be perfectly reused for percpu case. [guro@fb.com: v3] Link: http://lkml.kernel.org/r/20200623184515.4132564-4-guro@fb.com Signed-off-by: Roman Gushchin <guro@fb.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Shakeel Butt <shakeelb@google.com> Acked-by: Dennis Zhou <dennis@kernel.org> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Christoph Lameter <cl@linux.com> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Michal Hocko <mhocko@kernel.org> Cc: Pekka Enberg <penberg@kernel.org> Cc: Tejun Heo <tj@kernel.org> Cc: Tobin C. Harding <tobin@kernel.org> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Waiman Long <longman@redhat.com> Cc: Bixuan Cui <cuibixuan@huawei.com> Cc: Michal Koutný <mkoutny@suse.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Link: http://lkml.kernel.org/r/20200608230819.832349-4-guro@fb.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2020-08-12 10:57:55 -07:00
Roman Gushchin	3c7be18ac9	mm: memcg/percpu: account percpu memory to memory cgroups Percpu memory is becoming more and more widely used by various subsystems, and the total amount of memory controlled by the percpu allocator can make a good part of the total memory. As an example, bpf maps can consume a lot of percpu memory, and they are created by a user. Also, some cgroup internals (e.g. memory controller statistics) can be quite large. On a machine with many CPUs and big number of cgroups they can consume hundreds of megabytes. So the lack of memcg accounting is creating a breach in the memory isolation. Similar to the slab memory, percpu memory should be accounted by default. To implement the percpu accounting it's possible to take the slab memory accounting as a model to follow. Let's introduce two types of percpu chunks: root and memcg. What makes memcg chunks different is an additional space allocated to store memcg membership information. If __GFP_ACCOUNT is passed on allocation, a memcg chunk should be used. If it's possible to charge the corresponding size to the target memory cgroup, allocation is performed, and the memcg ownership data is recorded. System-wide allocations are performed using root chunks, so there is no additional memory overhead. To implement a fast reparenting of percpu memory on memcg removal, we don't store mem_cgroup pointers directly: instead we use obj_cgroup API, introduced for slab accounting. 
[akpm@linux-foundation.org: fix CONFIG_MEMCG_KMEM=n build errors and warning] [akpm@linux-foundation.org: move unreachable code, per Roman] [cuibixuan@huawei.com: mm/percpu: fix 'defined but not used' warning] Link: http://lkml.kernel.org/r/6d41b939-a741-b521-a7a2-e7296ec16219@huawei.com Signed-off-by: Roman Gushchin <guro@fb.com> Signed-off-by: Bixuan Cui <cuibixuan@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Shakeel Butt <shakeelb@google.com> Acked-by: Dennis Zhou <dennis@kernel.org> Cc: Christoph Lameter <cl@linux.com> Cc: David Rientjes <rientjes@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Michal Hocko <mhocko@kernel.org> Cc: Pekka Enberg <penberg@kernel.org> Cc: Tejun Heo <tj@kernel.org> Cc: Tobin C. Harding <tobin@kernel.org> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Waiman Long <longman@redhat.com> Cc: Bixuan Cui <cuibixuan@huawei.com> Cc: Michal Koutný <mkoutny@suse.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Link: http://lkml.kernel.org/r/20200623184515.4132564-3-guro@fb.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2020-08-12 10:57:55 -07:00
Roman Gushchin	5b32af91b5	percpu: return number of released bytes from pcpu_free_area() Patch series "mm: memcg accounting of percpu memory", v3. This patchset adds percpu memory accounting to memory cgroups. It's based on the rework of the slab controller and reuses concepts and features introduced for the per-object slab accounting. Percpu memory is becoming more and more widely used by various subsystems, and the total amount of memory controlled by the percpu allocator can make a good part of the total memory. As an example, bpf maps can consume a lot of percpu memory, and they are created by a user. Also, some cgroup internals (e.g. memory controller statistics) can be quite large. On a machine with many CPUs and big number of cgroups they can consume hundreds of megabytes. So the lack of memcg accounting is creating a breach in the memory isolation. Similar to the slab memory, percpu memory should be accounted by default. Percpu allocations by their nature are scattered over multiple pages, so they can't be tracked on the per-page basis. So the per-object tracking introduced by the new slab controller is reused. The patchset implements charging of percpu allocations, adds memcg-level statistics, enables accounting for percpu allocations made by memory cgroup internals and provides some basic tests. To implement the accounting of percpu memory without a significant memory and performance overhead the following approach is used: all accounted allocations are placed into a separate percpu chunk (or chunks). These chunks are similar to default chunks, except that they do have an attached vector of pointers to obj_cgroup objects, which is big enough to save a pointer for each allocated object. 
On the allocation, if the allocation has to be accounted (__GFP_ACCOUNT is passed, the allocating process belongs to a non-root memory cgroup, etc), the memory cgroup is getting charged and if the maximum limit is not exceeded the allocation is performed using a memcg-aware chunk. Otherwise -ENOMEM is returned or the allocation is forced over the limit, depending on gfp (as any other kernel memory allocation). The memory cgroup information is saved in the obj_cgroup vector at the corresponding offset. On the release time the memcg information is restored from the vector and the cgroup is getting uncharged. Unaccounted allocations (at this point the absolute majority of all percpu allocations) are performed in the old way, so no additional overhead is expected. To avoid pinning dying memory cgroups by outstanding allocations, obj_cgroup API is used instead of directly saving memory cgroup pointers. obj_cgroup is basically a pointer to a memory cgroup with a standalone reference counter. The trick is that it can be atomically swapped to point at the parent cgroup, so that the original memory cgroup can be released prior to all objects, which has been charged to it. Because all charges and statistics are fully recursive, it's perfectly correct to uncharge the parent cgroup instead. This scheme is used in the slab memory accounting, and percpu memory can just follow the scheme. This patch (of 5): To implement accounting of percpu memory we need the information about the size of freed object. Return it from pcpu_free_area(). 
Signed-off-by: Roman Gushchin <guro@fb.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Shakeel Butt <shakeelb@google.com> Acked-by: Dennis Zhou <dennis@kernel.org> Cc: Tejun Heo <tj@kernel.org> Cc: Christoph Lameter <cl@linux.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Pekka Enberg <penberg@kernel.org> Cc: Tobin C. Harding <tobin@kernel.org> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Waiman Long <longman@redhat.com> Cc: Bixuan Cui <cuibixuan@huawei.com> Cc: Michal Koutný <mkoutny@suse.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Link: http://lkml.kernel.org/r/20200623184515.4132564-1-guro@fb.com Link: http://lkml.kernel.org/r/20200608230819.832349-1-guro@fb.com Link: http://lkml.kernel.org/r/20200608230819.832349-2-guro@fb.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2020-08-12 10:57:55 -07:00
Kees Cook	3f649ab728	treewide: Remove uninitialized_var() usage Using uninitialized_var() is dangerous as it papers over real bugs[1] (or can in the future), and suppresses unrelated compiler warnings (e.g. "unused variable"). If the compiler thinks it is uninitialized, either simply initialize the variable or make compiler changes. In preparation for removing[2] the[3] macro[4], remove all remaining needless uses with the following script: git grep '\buninitialized_var\b' \| cut -d: -f1 \| sort -u \| \ xargs perl -pi -e \ 's/\buninitialized_var\(([^\)]+)\)/\1/g; s:\s*/\* (GCC be quiet\|to make compiler happy) \*/$::g;' drivers/video/fbdev/riva/riva_hw.c was manually tweaked to avoid pathological white-space. No outstanding warnings were found building allmodconfig with GCC 9.3.0 for x86_64, i386, arm64, arm, powerpc, powerpc64le, s390x, mips, sparc64, alpha, and m68k. [1] https://lore.kernel.org/lkml/20200603174714.192027-1-glider@google.com/ [2] https://lore.kernel.org/lkml/CA+55aFw+Vbj0i=1TGqCR5vQkCzWJ0QxK6CernOU6eedsudAixw@mail.gmail.com/ [3] https://lore.kernel.org/lkml/CA+55aFwgbgqhbp1fkxvRKEpzyR5J8n1vKT1VZdz9knmPuXhOeg@mail.gmail.com/ [4] https://lore.kernel.org/lkml/CA+55aFz2500WfbKXAx8s67wrm9=yVJu65TpLgN_ybYNv0VEOKA@mail.gmail.com/ Reviewed-by: Leon Romanovsky <leonro@mellanox.com> # drivers/infiniband and mlx4/mlx5 Acked-by: Jason Gunthorpe <jgg@mellanox.com> # IB Acked-by: Kalle Valo <kvalo@codeaurora.org> # wireless drivers Reviewed-by: Chao Yu <yuchao0@huawei.com> # erofs Signed-off-by: Kees Cook <keescook@chromium.org>	2020-07-16 12:35:15 -07:00
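
Note on commit f183324133 ("percpu: implement partial chunk depopulation"): its message lists four conditions under which a chunk is scheduled for depopulation on the free path, plus the special case of an already depopulated chunk that regained free populated pages. The following is only a rough model of that rule as the message words it, not the kernel code. The `Chunk` class, its field names, `should_depopulate()` and the `EMPTY_POP_PAGES_RESERVE` threshold are all hypothetical names chosen for illustration, and the model assumes the global counter still includes the chunk's own empty populated pages.

```python
from dataclasses import dataclass

# Hypothetical reserve of empty populated pages the system should keep;
# the commit message does not state the real threshold.
EMPTY_POP_PAGES_RESERVE = 4


@dataclass
class Chunk:
    nr_pages: int             # total pages in the chunk
    nr_empty_pop_pages: int   # pages that are populated but hold no allocations
    is_reserved: bool = False
    is_first: bool = False
    isolated: bool = False    # chunk was already picked for depopulation


def should_depopulate(chunk: Chunk, global_empty_pop_pages: int) -> bool:
    """Model of the free-path decision described in the commit message."""
    # Conditions 3 and 4: never the reserved chunk, never the first chunk.
    if chunk.is_reserved or chunk.is_first:
        return False
    # An already depopulated chunk that got free populated pages again
    # is "a good target too".
    if chunk.isolated and chunk.nr_empty_pop_pages > 0:
        return True
    # Condition 1: more than 1/4 of the chunk's pages are free and populated.
    enough_in_chunk = 4 * chunk.nr_empty_pop_pages > chunk.nr_pages
    # Condition 2: enough free percpu pages remain aside of this chunk.
    remaining = global_empty_pop_pages - chunk.nr_empty_pop_pages
    enough_elsewhere = remaining > EMPTY_POP_PAGES_RESERVE
    return enough_in_chunk and enough_elsewhere
```

For instance, under these assumptions `should_depopulate(Chunk(nr_pages=16, nr_empty_pop_pages=8), 20)` is true, while the same chunk with only 10 empty populated pages system-wide is left alone because too few would remain for atomic allocations.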


290 Commits