Centos-kernel-stream-9

Commit Graph

Author	SHA1	Message	Date
Michael Petlan	03ff35bf05	perf record: Fix cpu mask bit setting for mixed mmaps Bugzilla: https://bugzilla.redhat.com/2123229 upstream ======== commit ca76d7d2812b46124291f99c9b50aaf63a936f23 Author: Adrian Hunter <adrian.hunter@intel.com> Date: Thu Sep 15 15:26:11 2022 +0300 description =========== With mixed per-thread and (system-wide) per-cpu maps, the "any cpu" value -1 must be skipped when setting CPU mask bits. Prior to commit cbd7bfc7fd99acdd ("tools/perf: Fix out of bound access to cpu mask array") the invalid setting went unnoticed, but since then it causes perf record to fail with an error. Example: Before: $ perf record -e intel_pt// --per-thread uname Failed to initialize parallel data streaming masks After: $ perf record -e intel_pt// --per-thread uname Linux [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.068 MB perf.data ] Fixes: ae4f8ae16a078964 ("libperf evlist: Allow mixing per-thread and per-cpu mmaps") Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20220915122612.81738-2-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Michael Petlan <mpetlan@redhat.com>	2022-11-14 20:26:06 +01:00
Michael Petlan	946d524ed7	perf record: Fix synthesis failure warnings Bugzilla: https://bugzilla.redhat.com/2123229 upstream ======== commit faf59ec8c3c3708c64ff76b50e6f757c6b4a1054 Author: Adrian Hunter <adrian.hunter@intel.com> Date: Wed Sep 7 19:24:58 2022 +0300 description =========== Some calls to synthesis functions set err < 0 but only warn about the failure and continue. However they do not set err back to zero, relying on subsequent code to do that. That changed with the introduction of option --synth. When --synth=no subsequent functions that set err back to zero are not called. Fix by setting err = 0 in those cases. Example: Before: $ perf record --no-bpf-event --synth=all -o /tmp/huh uname Couldn't synthesize bpf events. Linux [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.014 MB /tmp/huh (7 samples) ] $ perf record --no-bpf-event --synth=no -o /tmp/huh uname Couldn't synthesize bpf events. After: $ perf record --no-bpf-event --synth=no -o /tmp/huh uname Couldn't synthesize bpf events. Linux [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.014 MB /tmp/huh (7 samples) ] Fixes: 41b740b6e8a994e5 ("perf record: Add --synth option") Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20220907162458.72817-1-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Michael Petlan <mpetlan@redhat.com>	2022-11-14 20:26:05 +01:00
Michael Petlan	f0e330e6b6	tools/perf: Fix out of bound access to cpu mask array Bugzilla: https://bugzilla.redhat.com/2123229 upstream ======== commit cbd7bfc7fd99acdde58ec2b0bce990158fba1654 Author: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Date: Mon Sep 5 19:49:29 2022 +0530 description =========== The cpu mask init code in the "record__mmap_cpu_mask_init" function accesses the "bits" array that is part of "struct mmap_cpu_mask". The size of this array is the value from cpu__max_cpu().cpu. This array is used to contain the cpumask value for each cpu. While setting the bit for each cpu, it calls the "set_bit" function, which accesses an index in the "bits" array. If we provide a command line option to -C which is greater than the number of CPUs present in the system, set_bit could access an array member outside the array bounds. This is because currently there is no boundary check for the CPU. This will result in a seg fault: <<>> ./perf record -C 12341234 ls Perf can support 2048 CPUs. Consider raising MAX_NR_CPUS Segmentation fault (core dumped) <<>> Debugging with gdb points to the function flow below: <<>> set_bit record__mmap_cpu_mask_init record__init_thread_default_masks record__init_thread_masks cmd_record <<>> Fix this by adding a boundary check for the array. After the patch: <<>> ./perf record -C 12341234 ls Perf can support 2048 CPUs. Consider raising MAX_NR_CPUS Failed to initialize parallel data streaming masks <<>> With this fix, if -C is given a non-existing CPU, perf record will fail with: <<>> ./perf record -C 50 ls Failed to initialize parallel data streaming masks <<>> Reported-by: Nageswara R Sastry <rnsastry@linux.ibm.com> Signed-off-by: Athira Jajeev <atrajeev@linux.vnet.ibm.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Tested-by: Nageswara R Sastry <rnsastry@linux.ibm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: linuxppc-dev@lists.ozlabs.org Link: https://lore.kernel.org/r/20220905141929.7171-2-atrajeev@linux.vnet.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Michael Petlan <mpetlan@redhat.com>	2022-11-14 20:26:05 +01:00
Michael Petlan	563c7868cf	perf record: Improve error message of -p not_existing_pid Bugzilla: https://bugzilla.redhat.com/2123229 upstream ======== commit 1bf7d836e57ba4943e33e163e730cd77ab837572 Author: Martin Liška <mliska@suse.cz> Date: Fri Aug 12 13:40:49 2022 +0200 description =========== When one uses -p $not_existing_pid, the output of --help is printed: $ perf record -p 123456789 2>&1 | head -n3 Usage: perf record [<options>] [<command>] or: perf record [<options>] -- <command> [<options>] Let's change it to something similar to what perf top -p $not_existing_pid prints: $ ./perf top -p 123456789 --stdio Error: Couldn't create thread/CPU maps: No such process Newly suggested error message: $ ./perf record -p 123456789 Couldn't create thread/CPU maps: No such process Signed-off-by: Martin Liška <mliska@suse.cz> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Link: http://lore.kernel.org/lkml/8e00eda1-4de0-2c44-ce67-d4df48ac1f7c@suse.cz Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Michael Petlan <mpetlan@redhat.com>	2022-11-14 20:26:00 +01:00
Michael Petlan	d7493e1c75	perf record: Add finished init event Bugzilla: https://bugzilla.redhat.com/2123229 upstream ======== commit 3812d2987733c5a00e103be4e23d63ec9342043a Author: Adrian Hunter <adrian.hunter@intel.com> Date: Fri Jun 10 14:33:15 2022 +0300 description =========== In preparation for recording sideband events in a virtual machine guest so that they can be injected into a host perf.data file. This is needed to enable injecting events after the initial synthesized user events (that have an all zero id sample) but before regular events. Committer notes: Add entry about PERF_RECORD_FINISHED_INIT to tools/perf/Documentation/perf.data-file-format.txt. Committer testing: Before: # perf report -D | grep FINISHED 0 0x5910 [0x8]: PERF_RECORD_FINISHED_ROUND FINISHED_ROUND events: 1 ( 0.5%) # After: # perf record -- sleep 1 [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.020 MB perf.data (7 samples) ] # perf report -D | grep FINISHED 0 0x5068 [0x8]: PERF_RECORD_FINISHED_INIT: unhandled! 0 0x5390 [0x8]: PERF_RECORD_FINISHED_ROUND FINISHED_ROUND events: 1 ( 0.5%) FINISHED_INIT events: 1 ( 0.5%) # Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Acked-by: Ian Rogers <irogers@google.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20220610113316.6682-5-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Michael Petlan <mpetlan@redhat.com>	2022-11-14 20:25:42 +01:00
Michael Petlan	105c09c919	perf record: Add new option to sample identifier Bugzilla: https://bugzilla.redhat.com/2123229 upstream ======== commit 61110883a02090cb5fd1f890978e238cc99f0164 Author: Adrian Hunter <adrian.hunter@intel.com> Date: Wed Jun 15 08:25:11 2022 +0300 description =========== In preparation for recording sideband events in a virtual machine guest so that they can be injected into a host perf.data file. Add an option to always include sample type PERF_SAMPLE_IDENTIFIER. Committer testing: # perf record sleep 1 [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.020 MB perf.data (7 samples) ] # perf evlist -v cycles: size: 128, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|PERIOD, read_format: ID, disabled: 1, inherit: 1, mmap: 1, comm: 1, freq: 1, enable_on_exec: 1, task: 1, sample_id_all: 1, mmap2: 1, comm_exec: 1, ksymbol: 1, bpf_event: 1 # # # perf record --sample-identifier sleep 1 [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.022 MB perf.data (7 samples) ] # perf evlist -v cycles: size: 128, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|PERIOD|IDENTIFIER, read_format: ID, disabled: 1, inherit: 1, mmap: 1, comm: 1, freq: 1, enable_on_exec: 1, task: 1, sample_id_all: 1, mmap2: 1, comm_exec: 1, ksymbol: 1, bpf_event: 1 # Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Acked-by: Ian Rogers <irogers@google.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20220615052511.4441-1-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Michael Petlan <mpetlan@redhat.com>	2022-11-14 20:25:42 +01:00
Michael Petlan	bfde98a192	perf record: Always record id index Bugzilla: https://bugzilla.redhat.com/2123229 upstream ======== commit 6b080312fc821658a479e74bdb0c3f7d9ac5838f Author: Adrian Hunter <adrian.hunter@intel.com> Date: Fri Jun 10 14:33:13 2022 +0300 description =========== In preparation for recording sideband events in a virtual machine guest so that they can be injected into a host perf.data file. Adjust the logic so that if there are IDs then the id index is recorded. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Acked-by: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20220610113316.6682-3-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Michael Petlan <mpetlan@redhat.com>	2022-11-14 20:25:42 +01:00
Michael Petlan	7d029575f4	perf record: Always get text_poke events with --kcore option Bugzilla: https://bugzilla.redhat.com/2123229 upstream ======== commit f42c0ce573df79d1b8bd169008c994dcdd43585a Author: Adrian Hunter <adrian.hunter@intel.com> Date: Fri Jun 10 14:33:12 2022 +0300 description =========== kcore provides a copy of the running kernel including any modified code. A trace that benefits from that also benefits from text_poke events, so enable them. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Acked-by: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20220610113316.6682-2-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Michael Petlan <mpetlan@redhat.com>	2022-11-14 20:25:42 +01:00
Michael Petlan	b5a8dd0e08	perf record: Add cgroup support for off-cpu profiling Bugzilla: https://bugzilla.redhat.com/2123231 upstream ======== commit 685439a7a037d8677e3d1acf0302624002ee6a6d Author: Namhyung Kim <namhyung@kernel.org> Date: Wed May 18 15:47:24 2022 -0700 description =========== This covers two different use cases. The first one is cgroup filtering given by -G/--cgroup option which controls the off-cpu profiling for tasks in the given cgroups only. The other use case is cgroup sampling which is enabled by --all-cgroups option and it adds PERF_SAMPLE_CGROUP to the sample_type to set the cgroup id of the task in the sample data. Example output. $ sudo perf record -a --off-cpu --all-cgroups sleep 1 $ sudo perf report --stdio -s comm,cgroup --call-graph=no ... # Samples: 144 of event 'offcpu-time' # Event count (approx.): 48452045427 # # Children Self Command Cgroup # ........ ........ ............... .......................................... # 61.57% 5.60% Chrome_ChildIOT /user.slice/user-657345.slice/user@657345.service/app.slice/... 29.51% 7.38% Web Content /user.slice/user-657345.slice/user@657345.service/app.slice/... 17.48% 1.59% Chrome_IOThread /user.slice/user-657345.slice/user@657345.service/app.slice/... 16.48% 4.12% pipewire-pulse /user.slice/user-657345.slice/user@657345.service/session.slice/... 14.48% 2.07% perf /user.slice/user-657345.slice/user@657345.service/app.slice/... 14.30% 7.15% CompositorTileW /user.slice/user-657345.slice/user@657345.service/app.slice/... 13.33% 6.67% Timer /user.slice/user-657345.slice/user@657345.service/app.slice/... ... Signed-off-by: Michael Petlan <mpetlan@redhat.com>	2022-09-21 07:23:10 +02:00
Michael Petlan	c52b642036	perf record: Implement basic filtering for off-cpu Bugzilla: https://bugzilla.redhat.com/2123231 upstream ======== commit 10742d0c0771d9fb0329d03bb7c7620c8738f065 Author: Namhyung Kim <namhyung@kernel.org> Date: Wed May 18 15:47:22 2022 -0700 description =========== It should honor cpu and task filtering with -a, -C or -p, -t options. Committer testing: # perf record --off-cpu --cpu 1 perf bench sched messaging -l 1000 # Running 'sched/messaging' benchmark: # 20 sender and receiver processes per group # 10 groups == 400 processes run Total time: 1.722 [sec] [ perf record: Woken up 2 times to write data ] [ perf record: Captured and wrote 1.446 MB perf.data (7248 samples) ] # # perf script | head -20 perf 97164 [001] 38287.696761: 1 cycles: ffffffffb6070174 native_write_msr+0x4 (vmlinux) perf 97164 [001] 38287.696764: 1 cycles: ffffffffb6070174 native_write_msr+0x4 (vmlinux) perf 97164 [001] 38287.696765: 9 cycles: ffffffffb6070174 native_write_msr+0x4 (vmlinux) perf 97164 [001] 38287.696767: 212 cycles: ffffffffb6070176 native_write_msr+0x6 (vmlinux) perf 97164 [001] 38287.696768: 5130 cycles: ffffffffb6070176 native_write_msr+0x6 (vmlinux) perf 97164 [001] 38287.696770: 123063 cycles: ffffffffb6e0011e syscall_return_via_sysret+0x38 (vmlinux) perf 97164 [001] 38287.696803: 2292748 cycles: ffffffffb636c82d __fput+0xad (vmlinux) swapper 0 [001] 38287.702852: 1927474 cycles: ffffffffb6761378 mwait_idle_with_hints.constprop.0+0x48 (vmlinux) :97513 97513 [001] 38287.767207: 1172536 cycles: ffffffffb612ff65 newidle_balance+0x5 (vmlinux) swapper 0 [001] 38287.769567: 1073081 cycles: ffffffffb618216d ktime_get_mono_fast_ns+0xd (vmlinux) :97533 97533 [001] 38287.770962: 984460 cycles: ffffffffb65b2900 selinux_socket_sendmsg+0x0 (vmlinux) :97540 97540 [001] 38287.772242: 883462 cycles: ffffffffb6d0bf59 irqentry_exit_to_user_mode+0x9 (vmlinux) swapper 0 [001] 38287.773633: 741963 cycles: ffffffffb6761378 mwait_idle_with_hints.constprop.0+0x48 (vmlinux) :97552 97552 [001] 38287.774539: 606680 cycles: ffffffffb62eda0a page_add_file_rmap+0x7a (vmlinux) :97556 97556 [001] 38287.775333: 502254 cycles: ffffffffb634f964 get_obj_cgroup_from_current+0xc4 (vmlinux) :97561 97561 [001] 38287.776163: 427891 cycles: ffffffffb61b1522 cgroup_rstat_updated+0x22 (vmlinux) swapper 0 [001] 38287.776854: 359030 cycles: ffffffffb612fc5e load_balance+0x9ce (vmlinux) :97567 97567 [001] 38287.777312: 330371 cycles: ffffffffb6a8d8d0 skb_set_owner_w+0x0 (vmlinux) :97566 97566 [001] 38287.777589: 311622 cycles: ffffffffb614a7a8 native_queued_spin_lock_slowpath+0x148 (vmlinux) :97512 97512 [001] 38287.777671: 307851 cycles: ffffffffb62e0f35 find_vma+0x55 (vmlinux) # # perf record --off-cpu --cpu 4 perf bench sched messaging -l 1000 # Running 'sched/messaging' benchmark: # 20 sender and receiver processes per group # 10 groups == 400 processes run Total time: 1.613 [sec] [ perf record: Woken up 2 times to write data ] [ perf record: Captured and wrote 1.415 MB perf.data (6729 samples) ] # perf script | head -20 perf 97650 [004] 38323.728036: 1 cycles: ffffffffb6070174 native_write_msr+0x4 (vmlinux) perf 97650 [004] 38323.728040: 1 cycles: ffffffffb6070174 native_write_msr+0x4 (vmlinux) perf 97650 [004] 38323.728041: 9 cycles: ffffffffb6070174 native_write_msr+0x4 (vmlinux) perf 97650 [004] 38323.728042: 208 cycles: ffffffffb6070176 native_write_msr+0x6 (vmlinux) perf 97650 [004] 38323.728044: 5026 cycles: ffffffffb6070176 native_write_msr+0x6 (vmlinux) perf 97650 [004] 38323.728046: 119970 cycles: ffffffffb6d0bebc syscall_exit_to_user_mode+0x1c (vmlinux) perf 97650 [004] 38323.728078: 2190103 cycles: 54b756 perf_tool__process_synth_event+0x16 (/home/acme/bin/perf) swapper 0 [004] 38323.783357: 1593139 cycles: ffffffffb6761378 mwait_idle_with_hints.constprop.0+0x48 (vmlinux) swapper 0 [004] 38323.785352: 1593139 cycles: ffffffffb6761378 mwait_idle_with_hints.constprop.0+0x48 (vmlinux) swapper 0 [004] 38323.797330: 1418936 cycles: ffffffffb6761378 mwait_idle_with_hints.constprop.0+0x48 (vmlinux) swapper 0 [004] 38323.802350: 1418936 cycles: ffffffffb6761378 mwait_idle_with_hints.constprop.0+0x48 (vmlinux) swapper 0 [004] 38323.806333: 1418936 cycles: ffffffffb6761378 mwait_idle_with_hints.constprop.0+0x48 (vmlinux) :97996 97996 [004] 38323.807145: 1418936 cycles: 7f5db9be6917 [unknown] ([unknown]) :97959 97959 [004] 38323.807730: 1445074 cycles: ffffffffb6329d36 memcg_slab_post_alloc_hook+0x146 (vmlinux) :97959 97959 [004] 38323.808103: 1341584 cycles: ffffffffb62fd90f get_page_from_freelist+0x112f (vmlinux) :97959 97959 [004] 38323.808451: 1227537 cycles: ffffffffb65b2905 selinux_socket_sendmsg+0x5 (vmlinux) :97959 97959 [004] 38323.808768: 1184321 cycles: ffffffffb6d1ba35 _raw_spin_lock_irqsave+0x15 (vmlinux) :97959 97959 [004] 38323.809073: 1153017 cycles: ffffffffb6a8d92d skb_set_owner_w+0x5d (vmlinux) :97959 97959 [004] 38323.809402: 1126875 cycles: ffffffffb6329c64 memcg_slab_post_alloc_hook+0x74 (vmlinux) :97959 97959 [004] 38323.809695: 1073248 cycles: ffffffffb6e0001d entry_SYSCALL_64+0x1d (vmlinux) # Signed-off-by: Michael Petlan <mpetlan@redhat.com>	2022-09-21 07:23:10 +02:00
Michael Petlan	b22b83620d	perf record: Enable off-cpu analysis with BPF Bugzilla: https://bugzilla.redhat.com/2123231 upstream ======== commit edc41a1099c2d08ccfd4ed7d59688501e3749015 Author: Namhyung Kim <namhyung@kernel.org> Date: Wed May 18 15:47:21 2022 -0700 description =========== Add --off-cpu option to enable the off-cpu profiling with BPF. It'd use a bpf_output event and rename it to "offcpu-time". Samples will be synthesized at the end of the record session using data from a BPF map which contains the aggregated off-cpu time at context switches. So it needs root privilege to get the off-cpu profiling. Each sample will have a separate user stacktrace so it will skip kernel threads. The sample ip will be set from the stacktrace and other sample data will be updated accordingly. Currently it only handles some basic sample types. The sample timestamp is set to a dummy value just not to bother with other events during the sorting. So it has a very big initial value and increases it when processing each sample. Good thing is that it can be used together with regular profiling like cpu cycles. If you don't want that, you can use a dummy event to enable off-cpu profiling only. Example output: $ sudo perf record --off-cpu perf bench sched messaging -l 1000 $ sudo perf report --stdio --call-graph=no # Total Lost Samples: 0 # # Samples: 41K of event 'cycles' # Event count (approx.): 42137343851 ... # Samples: 1K of event 'offcpu-time' # Event count (approx.): 587990831640 # # Children Self Command Shared Object Symbol # ........ ........ ............... .................. ......................... # 81.66% 0.00% sched-messaging libc-2.33.so [.] __libc_start_main 81.66% 0.00% sched-messaging perf [.] cmd_bench 81.66% 0.00% sched-messaging perf [.] main 81.66% 0.00% sched-messaging perf [.] run_builtin 81.43% 0.00% sched-messaging perf [.] bench_sched_messaging 40.86% 40.86% sched-messaging libpthread-2.33.so [.] __read 37.66% 37.66% sched-messaging libpthread-2.33.so [.] __write 2.91% 2.91% sched-messaging libc-2.33.so [.] __poll ... As you can see it spent most of off-cpu time in read and write in bench_sched_messaging(). The --call-graph=no was added just to make the output concise here. It uses perf hooks facility to control BPF program during the record session rather than adding new BPF/off-cpu specific calls. Signed-off-by: Michael Petlan <mpetlan@redhat.com>	2022-09-21 07:23:10 +02:00
Michael Petlan	88fa1aca8c	perf tools: Allow all_cpus to be a superset of user_requested_cpus Bugzilla: https://bugzilla.redhat.com/2123231 upstream ======== commit 7be1fedd2a0a5b8f20952a675c611815254b74b6 Author: Adrian Hunter <adrian.hunter@intel.com> Date: Tue May 24 10:54:30 2022 +0300 description =========== To support collection of system-wide events with user requested CPUs, all_cpus must be a superset of user_requested_cpus. In order to support all_cpus to be a superset of user_requested_cpus, all_cpus must be used instead of user_requested_cpus when dealing with CPUs of all events instead of CPUs of requested events. Signed-off-by: Michael Petlan <mpetlan@redhat.com>	2022-09-21 07:23:09 +02:00
Michael Petlan	a90760a767	perf record: Use evlist__add_dummy_on_all_cpus() in record__config_text_poke() Bugzilla: https://bugzilla.redhat.com/2123231 upstream ======== commit 921e3be5a5648f483f80c9ba21ca2942d82d581c Author: Adrian Hunter <adrian.hunter@intel.com> Date: Tue May 24 10:54:27 2022 +0300 description =========== Use evlist__add_dummy_on_all_cpus() in record__config_text_poke() in preparation for allowing system-wide events on all CPUs while the user requested events are on only user requested CPUs. Signed-off-by: Michael Petlan <mpetlan@redhat.com>	2022-09-21 07:23:09 +02:00
Michael Petlan	1f17466f8e	perf cpumap: Switch to using perf_cpu_map API Bugzilla: https://bugzilla.redhat.com/2123231 upstream ======== commit 0255571a16059c8e863a65a4b1611db93bb9b3ae Author: Ian Rogers <irogers@google.com> Date: Mon May 2 21:17:52 2022 -0700 description =========== Switch some raw accesses to the cpu map to using the library API. This can help with reference count checking. Some BPF cases switch from index to CPU for consistency, this shouldn't matter as the CPU map is full. Signed-off-by: Michael Petlan <mpetlan@redhat.com>	2022-09-21 07:23:03 +02:00
Michael Petlan	f63d52f364	perf record: Fix per-thread option Bugzilla: https://bugzilla.redhat.com/2123231 upstream ======== commit 23380e4d53305765789fd1a2cf6bddb07239cd3b Author: Alexey Bayduraev <alexey.bayduraev@gmail.com> Date: Wed Apr 13 18:46:40 2022 -0700 description =========== Per-thread mode doesn't have specific CPUs for events, add checks for this case. Minor fix to a pr_debug by Ian Rogers <irogers@google.com> to avoid an out of bound array access. Fixes: 7954f71689f90cb2 ("perf record: Introduce thread affinity and mmap masks") Signed-off-by: Michael Petlan <mpetlan@redhat.com>	2022-09-21 07:22:57 +02:00
Michael Petlan	1ed24eddc8	perf evlist: Rename cpus to user_requested_cpus Bugzilla: https://bugzilla.redhat.com/2123231 upstream ======== commit 0df6ade7119daa40904b0c18871169e753663e14 Author: Ian Rogers <irogers@google.com> Date: Mon Mar 28 16:26:44 2022 -0700 description =========== evlist contains cpus and all_cpus. all_cpus is the union of the cpu maps of all evsels. For non-task targets, cpus is set to be cpus requested from the command line, defaulting to all online cpus if no cpus are specified. For an uncore event, all_cpus may be just CPU 0 or every online CPU. This causes all_cpus to have fewer values than the cpus variable which is confusing given the 'all' in the name. To try to make the behavior clearer, rename cpus to user_requested_cpus and add comments on the two struct variables. Signed-off-by: Michael Petlan <mpetlan@redhat.com>	2022-09-21 07:22:55 +02:00
Michael Petlan	46ba8a7118	perf data: Adding error message if perf_data__create_dir() fails Bugzilla: https://bugzilla.redhat.com/2123231 upstream ======== commit 65e7c963267f128df155f496a50933cea7dfa5b8 Author: Alexey Bayduraev <alexey.v.bayduraev@linux.intel.com> Date: Tue Feb 22 12:14:17 2022 +0300 description =========== Add proper return codes for all cases of data directory creation failure and add error message output based on these codes. Signed-off-by: Michael Petlan <mpetlan@redhat.com>	2022-09-21 07:22:52 +02:00
Michael Petlan	b6823ef9b1	perf record: Implement compatibility checks Bugzilla: https://bugzilla.redhat.com/2123231 upstream ======== commit b5f2511d4b3976e352b47b79c3c119addb7c2033 Author: Alexey Bayduraev <alexey.v.bayduraev@linux.intel.com> Date: Mon Jan 17 21:34:34 2022 +0300 description =========== Implement compatibility checks for other modes and related command line options: asynchronous (--aio) trace streaming and affinity (--affinity) modes, pipe mode, AUX area tracing --snapshot and --aux-sample options, --switch-output, --switch-output-event, --switch-max-files and --timestamp-filename options. Parallel data streaming is compatible with Zstd compression (--compression-level) and external control commands (--control). CPU mask provided via -C option filters --threads specification masks. Signed-off-by: Michael Petlan <mpetlan@redhat.com>	2022-09-21 07:22:46 +02:00
Michael Petlan	5a53f236d5	perf record: Extend --threads command line option Bugzilla: https://bugzilla.redhat.com/2123231 upstream ======== commit f466e5ed6c356d1dc22dda68f46315a92ec160c6 Author: Alexey Bayduraev <alexey.v.bayduraev@linux.intel.com> Date: Mon Jan 17 21:34:33 2022 +0300 description =========== Extend --threads option in perf record command line interface. The option can have a value in the form of masks that specify CPUs to be monitored with data streaming threads and its layout in system topology. The masks can be filtered using CPU mask provided via -C option. The specification value can be user defined list of masks. Masks separated by colon define CPUs to be monitored by one thread and affinity mask of that thread is separated by slash. For example: <cpus mask 1>/<affinity mask 1>:<cpu mask 2>/<affinity mask 2> specifies parallel threads layout that consists of two threads with corresponding assigned CPUs to be monitored. The specification value can be a string e.g. "cpu", "core" or "package" meaning creation of data streaming thread for every CPU or core or package to monitor distinct CPUs or CPUs grouped by core or package. The option provided with no or empty value defaults to per-cpu parallel threads layout creating data streaming thread for every CPU being monitored. Document --threads option syntax and parallel data streaming modes in Documentation/perf-record.txt. Signed-off-by: Michael Petlan <mpetlan@redhat.com>	2022-09-21 07:22:46 +02:00
Michael Petlan	d12e83820a	perf record: Introduce --threads command line option Bugzilla: https://bugzilla.redhat.com/2123231 upstream ======== commit 06380a849fa89da33d309597890ef26d24095b41 Author: Alexey Bayduraev <alexey.v.bayduraev@linux.intel.com> Date: Mon Jan 17 21:34:32 2022 +0300 description =========== Provide --threads option in perf record command line interface. The option creates a data streaming thread for each CPU in the system. Document --threads option in Documentation/perf-record.txt. Signed-off-by: Michael Petlan <mpetlan@redhat.com>	2022-09-21 07:22:46 +02:00
Michael Petlan	59f5923ba3	perf record: Introduce data transferred and compressed stats Bugzilla: https://bugzilla.redhat.com/2123231 upstream ======== commit 610fbc016531b7a09dcc98febd2a8f4a0cdd3190 Author: Alexey Bayduraev <alexey.v.bayduraev@linux.intel.com> Date: Mon Jan 17 21:34:31 2022 +0300 description =========== Introduce bytes_transferred and bytes_compressed stats so they would capture statistics for the related data buffer transfers. [ Use PRiu64 to print u64 values, fixing the build on 32-bit architectures ] Signed-off-by: Michael Petlan <mpetlan@redhat.com>	2022-09-21 07:22:46 +02:00
Michael Petlan	e7a354e033	perf record: Introduce compressor at mmap buffer object Bugzilla: https://bugzilla.redhat.com/2123231 upstream ======== commit 75f5f1fcb9c0f0f542f44d993de18047b2b7f37f Author: Alexey Bayduraev <alexey.v.bayduraev@linux.intel.com> Date: Mon Jan 17 21:34:30 2022 +0300 description =========== Introduce compressor object into mmap object so it could be used to pack the data stream from the corresponding kernel data buffer. Initialize and make use of the introduced per mmap compressor. Signed-off-by: Michael Petlan <mpetlan@redhat.com>	2022-09-21 07:22:46 +02:00
Michael Petlan	9cef0e1b26	perf record: Introduce bytes written stats Bugzilla: https://bugzilla.redhat.com/2123231 upstream ======== commit ae9c7242b29fa2976c70b5b250f8942cf7289211 Author: Alexey Bayduraev <alexey.v.bayduraev@linux.intel.com> Date: Mon Jan 17 21:34:29 2022 +0300 description =========== Introduce a function to calculate the total amount of data written and use it to support the --max-size option. Signed-off-by: Michael Petlan <mpetlan@redhat.com>	2022-09-21 07:22:46 +02:00
Michael Petlan	bc0a90e384	perf record: Introduce data file at mmap buffer object Bugzilla: https://bugzilla.redhat.com/2123231 upstream ======== commit 56f735fff35e31e54027df36a653b0268bc94f06 Author: Alexey Bayduraev <alexey.v.bayduraev@linux.intel.com> Date: Mon Jan 17 21:34:28 2022 +0300 description =========== Introduce data file objects into mmap object so they could be used to process and store the data stream from the corresponding kernel data buffer. Initialize data files located at mmap buffer objects so trace data can be written into several data files located in the data directory. Signed-off-by: Michael Petlan <mpetlan@redhat.com>	2022-09-21 07:22:46 +02:00
Michael Petlan	eb10d57999	perf record: Start threads in the beginning of trace streaming Bugzilla: https://bugzilla.redhat.com/2123231 upstream ======== commit 3217e9fecf118d5dcabdd68d91e0c6afcb4c3e1b Author: Alexey Bayduraev <alexey.v.bayduraev@linux.intel.com> Date: Mon Jan 17 21:34:27 2022 +0300 description =========== Start thread in detached state because its management is implemented via messaging to avoid any scaling issues. Block signals prior to thread start so only the main tool thread would be notified on external async signals during data collection. Thread affinity mask is used to assign eligible CPUs for the thread to run. Wait and sync on thread start using thread ack pipe. Signed-off-by: Michael Petlan <mpetlan@redhat.com>	2022-09-21 07:22:46 +02:00
Michael Petlan	c2f92debce	perf record: Stop threads in the end of trace streaming Bugzilla: https://bugzilla.redhat.com/2123231 upstream ======== commit 1e5de7d9c6ded0722736eb6e58c72b18937efc06 Author: Alexey Bayduraev <alexey.v.bayduraev@linux.intel.com> Date: Mon Jan 17 21:34:26 2022 +0300 description =========== Signal thread to terminate by closing write fd of msg pipe. Receive THREAD_MSG__READY message as the confirmation of the thread's termination. Stop threads created for parallel trace streaming prior to their stats processing. Signed-off-by: Michael Petlan <mpetlan@redhat.com>	2022-09-21 07:22:45 +02:00
Michael Petlan	3e7fafcf71	perf record: Introduce thread local variable Bugzilla: https://bugzilla.redhat.com/2123231 upstream ======== commit 396b626b95d22664d2f2e5ca332e777ea699a10e Author: Alexey Bayduraev <alexey.v.bayduraev@linux.intel.com> Date: Mon Jan 17 21:34:25 2022 +0300 description =========== Introduce thread local variable and use it for threaded trace streaming. Use thread affinity mask instead of record affinity mask in affinity modes. Use evlist__ctlfd_update() to propagate control commands from thread object to global evlist object to enable evlist__ctlfd_* functionality. Move waking and sample statistic to struct record_thread and introduce record__waking function to calculate the total number of wakes. Signed-off-by: Michael Petlan <mpetlan@redhat.com>	2022-09-21 07:22:45 +02:00
Michael Petlan	538a3bf73d	perf record: Introduce thread specific data array Bugzilla: https://bugzilla.redhat.com/2123231 upstream ======== commit 415ccb58f68a6bebcbb9db373973394a6af3d553 Author: Alexey Bayduraev <alexey.v.bayduraev@linux.intel.com> Date: Mon Jan 17 21:34:23 2022 +0300 description =========== Introduce thread specific data object and array of such objects to store and manage thread local data. Implement functions to allocate, initialize, finalize and release thread specific data. Thread local maps and overwrite_maps arrays keep pointers to mmap buffer objects to serve according to maps thread mask. Thread local pollfd array keeps event fds connected to mmaps buffers according to maps thread mask. Thread control commands are delivered via thread local comm pipes and ctlfd_pos fd. External control commands (--control option) are delivered via evlist ctlfd_pos fd and handled by the main tool thread. Signed-off-by: Michael Petlan <mpetlan@redhat.com>	2022-09-21 07:22:45 +02:00
Michael Petlan	4b283bbb14	perf record: Introduce thread affinity and mmap masks Bugzilla: https://bugzilla.redhat.com/2123231 upstream ======== commit 7954f71689f90cb2ae252d3923354d48071994bf Author: Alexey Bayduraev <alexey.v.bayduraev@linux.intel.com> Date: Mon Jan 17 21:34:21 2022 +0300 description =========== Introduce affinity and mmap thread masks. Thread affinity mask defines CPUs that a thread is allowed to run on. Thread maps mask defines mmap data buffers the thread serves to stream profiling data from. Signed-off-by: Michael Petlan <mpetlan@redhat.com>	2022-09-21 07:22:45 +02:00
Michael Petlan	5f091c527c	perf record: Disable debuginfod by default Bugzilla: https://bugzilla.redhat.com/2069073 upstream ======== commit 9bce13ea88f85344b765abe5d3dabdd0f44dc177 Author: Jiri Olsa <jolsa@redhat.com> Date: Thu Dec 9 21:04:25 2021 +0100 description =========== Fedora 35 sets DEBUGINFOD_URLS by default, which might lead to unexpected stalls in perf record exit path, when we try to cache profiled binaries. # DEBUGINFOD_PROGRESS=1 ./perf record -a ^C[ perf record: Woken up 1 times to write data ] Downloading from https://debuginfod.fedoraproject.org/ 447069 Downloading from https://debuginfod.fedoraproject.org/ 1502175 Downloading ^Z Disabling DEBUGINFOD_URLS by default in perf record and adding debuginfod option and .perfconfig variable support to enable it. Default without debuginfo processing: # perf record -a Using system debuginfod setup: # perf record -a --debuginfod Using custom debuginfod url: # perf record -a --debuginfod='https://evenbetterdebuginfodserver.krava' Adding single perf_debuginfod_setup function and using it also in perf buildid-cache command. Signed-off-by: Michael Petlan <mpetlan@redhat.com>	2022-05-16 11:36:04 +02:00
Michael Petlan	86ce88496c	perf cpumap: Give CPUs their own type Bugzilla: https://bugzilla.redhat.com/2069073 upstream ======== commit 6d18804b963b78dcd53851f11e9080408b3d85c2 Author: Ian Rogers <irogers@google.com> Date: Tue Jan 4 22:13:51 2022 -0800 description =========== A common problem is confusing CPU map indices with the CPU, by wrapping the CPU with a struct then this is avoided. This approach is similar to atomic_t. Committer notes: To make it build with BUILD_BPF_SKEL=1 these files needed the conversions to 'struct perf_cpu' usage: tools/perf/util/bpf_counter.c tools/perf/util/bpf_counter_cgroup.c tools/perf/util/bpf_ftrace.c Also perf_env__get_cpu() was removed back in "perf cpumap: Switch cpu_map__build_map to cpu function". Additionally these needed to be fixed for the ARM builds to complete: tools/perf/arch/arm/util/cs-etm.c tools/perf/arch/arm64/util/pmu.c Signed-off-by: Michael Petlan <mpetlan@redhat.com>	2022-05-16 11:36:01 +02:00
Michael Petlan	49007a9814	perf tools: Record ARM64 LR register automatically Bugzilla: https://bugzilla.redhat.com/2069073 upstream ======== commit 7248e308a57587615431b83689cd57e957815bfc Author: Alexandre Truong <alexandre.truong@arm.com> Date: Fri Dec 17 15:45:15 2021 +0000 description =========== On ARM64, automatically record the link register if the frame pointer mode is on. It will be used to do a dwarf unwind to find the caller of the leaf frame if the frame pointer was omitted. Signed-off-by: Michael Petlan <mpetlan@redhat.com>	2022-05-16 11:35:47 +02:00
Michael Petlan	de451198af	perf tools: Check vmlinux/kallsyms arguments in all tools Bugzilla: https://bugzilla.redhat.com/2069073 upstream ======== commit 7cc72553ac03ec20afe2dec91dce4624ccd379b8 Author: James Clark <james.clark@arm.com> Date: Mon Oct 18 14:48:42 2021 +0100 description =========== Only perf report checked the validity of these arguments so apply the same check to all tools that read them for consistency. Signed-off-by: Michael Petlan <mpetlan@redhat.com>	2022-05-16 11:35:11 +02:00
Michael Petlan	d2b1c92483	perf tools: Add support for PERF_RECORD_AUX_OUTPUT_HW_ID Bugzilla: https://bugzilla.redhat.com/2069073 upstream ======== commit 61750473589b6f8adc35007c8261986043907f13 Author: Adrian Hunter <adrian.hunter@intel.com> Date: Tue Sep 7 19:39:02 2021 +0300 description =========== The PERF_RECORD_AUX_OUTPUT_HW_ID event provides a way to match AUX output data like Intel PT PEBS-via-PT back to the event that it came from, by providing a hardware ID that is present in the AUX output. Signed-off-by: Michael Petlan <mpetlan@redhat.com>	2022-05-16 11:35:02 +02:00
Michael Petlan	bf53798f1c	perf record: Add --synth option Bugzilla: https://bugzilla.redhat.com/2069073 upstream ======== commit 41b740b6e8a994e5830daa5e15785522874f7456 Author: Namhyung Kim <namhyung@kernel.org> Date: Tue Aug 10 21:46:58 2021 -0700 description =========== Add an option to control the synthesizing behavior. --synth <no|all|task|mmap|cgroup> Fine-tune event synthesis: default=all This can be useful when we know it doesn't need some synthesis like in a specific usecase and/or when using pipe: $ perf record -a --all-cgroups --synth cgroup -o- sleep 1 | \ > perf report -i- -s cgroup Committer notes: Added a clarification to the man page entry for --synth that this is about pre-existing threads. Signed-off-by: Michael Petlan <mpetlan@redhat.com>	2022-05-16 11:34:44 +02:00
Michael Petlan	3de72356e1	perf tools: Allow controlling synthesizing PERF_RECORD_ metadata events during record Bugzilla: https://bugzilla.redhat.com/2069073 upstream ======== commit 84111b9c950ec9a8b31166973e79aa77ddcee7e3 Author: Namhyung Kim <namhyung@kernel.org> Date: Tue Aug 10 21:46:57 2021 -0700 description =========== Depending on the use case, it might require some kind of synthesizing and some not. Make it controllable to turn off heavy operations like MMAP for all tasks. Currently all users are converted to enable all the synthesis by default. It'll be updated in the later patch. Signed-off-by: Michael Petlan <mpetlan@redhat.com>	2022-05-16 11:34:44 +02:00
Michael Petlan	f183b25b86	perf record: Fix wrong comm in system-wide mode with delay Bugzilla: https://bugzilla.redhat.com/2069070 upstream ======== commit bb07d62e039b592f8006c9faedab48cd627e20c4 Author: Namhyung Kim <namhyung@kernel.org> Date: Fri Aug 27 16:32:12 2021 -0700 description =========== Stephane found that the name of the forked process in a system-wide mode is wrong when --delay option is used. For example, # perf record -a --delay=1000 noploop 3 The noploop process will run a busy loop for 3 seconds. And on an idle machine it should show up at the top in the perf report. It works well without the --delay option. But if I add the option, it showed 'perf' not 'noploop'. # perf report -s comm -q | head -3 52.94% perf 16.65% swapper 12.04% chrome It turned out that the dummy event didn't work at all and it missed COMM and MMAP events for the noploop process (and others too). We should enable the dummy event immediately in system-wide mode, as the enable-on-exec would work only for task events. With this change, # perf report -s comm -q | head -3 52.75% noploop 17.03% swapper 12.83% chrome Signed-off-by: Michael Petlan <mpetlan@redhat.com>	2022-04-25 12:33:07 +02:00
Michael Petlan	1b539daf08	perf tools: Enable on a list of CPUs for hybrid Bugzilla: https://bugzilla.redhat.com/2069070 upstream ======== commit 1d3351e631fc34d73b530a67263188062fe598ba Author: Jin Yao <yao.jin@linux.intel.com> Date: Fri Jul 23 14:34:33 2021 +0800 description =========== The 'perf record' and 'perf stat' commands have supported the option '-C/--cpus' to count or collect only on the list of CPUs provided. This option needs to be supported for hybrid as well. For hybrid support, it needs to check that the CPUs in the list are available on the hybrid PMU. One example for AlderLake, cpu0-7 is 'cpu_core', cpu8-11 is 'cpu_atom'. Before: # perf stat -e cpu_core/cycles/ -C11 -- sleep 1 Performance counter stats for 'CPU(s) 11': <not supported> cpu_core/cycles/ 1.006179431 seconds time elapsed The 'perf stat' command silently returned "<not supported>" without any helpful information. It should error out pointing out that cpu11 was not 'cpu_core'. After: # perf stat -e cpu_core/cycles/ -C11 -- sleep 1 WARNING: 11 isn't a 'cpu_core', please use a CPU list in the 'cpu_core' range (0-7) failed to use cpu list 11 We also need to support the events without pmu prefix specified. # perf stat -e cycles -C11 -- sleep 1 WARNING: 11 isn't a 'cpu_core', please use a CPU list in the 'cpu_core' range (0-7) Performance counter stats for 'CPU(s) 11': 1,067,373 cpu_atom/cycles/ 1.005544738 seconds time elapsed The perf tool creates two cycles events automatically, cpu_core/cycles/ and cpu_atom/cycles/. It checks that cpu11 is not 'cpu_core', then shows a warning for cpu_core/cycles/ and only counts the cpu_atom/cycles/. If some of the cpus are 'cpu_core' and some are 'cpu_atom', for example, # perf stat -e cycles -C0,11 -- sleep 1 WARNING: use 0 in 'cpu_core' for 'cycles', skip other cpus in list. WARNING: use 11 in 'cpu_atom' for 'cycles', skip other cpus in list. 
Performance counter stats for 'CPU(s) 0,11': 1,914,704 cpu_core/cycles/ 2,036,983 cpu_atom/cycles/ 1.005815641 seconds time elapsed It now automatically selects cpu0 for cpu_core/cycles/, selects cpu11 for cpu_atom/cycles/, and output with some warnings. Some more complex examples, # perf stat -e cycles,instructions -C0,11 -- sleep 1 WARNING: use 0 in 'cpu_core' for 'cycles', skip other cpus in list. WARNING: use 11 in 'cpu_atom' for 'cycles', skip other cpus in list. WARNING: use 0 in 'cpu_core' for 'instructions', skip other cpus in list. WARNING: use 11 in 'cpu_atom' for 'instructions', skip other cpus in list. Performance counter stats for 'CPU(s) 0,11': 2,780,387 cpu_core/cycles/ 1,583,432 cpu_atom/cycles/ 3,957,277 cpu_core/instructions/ 1,167,089 cpu_atom/instructions/ 1.006005124 seconds time elapsed # perf stat -e cycles,cpu_atom/instructions/ -C0,11 -- sleep 1 WARNING: use 0 in 'cpu_core' for 'cycles', skip other cpus in list. WARNING: use 11 in 'cpu_atom' for 'cycles', skip other cpus in list. WARNING: use 11 in 'cpu_atom' for 'cpu_atom/instructions/', skip other cpus in list. Performance counter stats for 'CPU(s) 0,11': 3,290,301 cpu_core/cycles/ 1,953,073 cpu_atom/cycles/ 1,407,869 cpu_atom/instructions/ 1.006260912 seconds time elapsed Signed-off-by: Michael Petlan <mpetlan@redhat.com>	2022-04-25 12:33:06 +02:00
Michael Petlan	f8ed20f6c5	perf inject: Fix output from a file to a pipe Bugzilla: https://bugzilla.redhat.com/2069070 upstream ======== commit c3a057dc3aa9979ce6dc350e05eb2e4c021432cd Author: Namhyung Kim <namhyung@kernel.org> Date: Mon Jul 19 15:31:52 2021 -0700 description =========== When the input is a regular file but the output is a pipe, it should write a pipe header. But just repiping would write a portion of the existing header which is different in 'size' value. So we need to prevent it and write a new pipe header along with other information like event attributes and features. This can handle something like this: # perf record -a -B sleep 1 # perf inject -b -i perf.data | perf report -i - Factor out perf_event__synthesize_for_pipe() to be shared between perf record and inject. Signed-off-by: Michael Petlan <mpetlan@redhat.com>	2022-04-25 12:33:02 +02:00
Michael Petlan	1df7d85569	perf tools: Remove repipe argument from perf_session__new() Bugzilla: https://bugzilla.redhat.com/2069070 upstream ======== commit 2681bd85a4b92788e265934d0d76bd56b5b08d16 Author: Namhyung Kim <namhyung@kernel.org> Date: Mon Jul 19 15:31:49 2021 -0700 description =========== The repipe argument is only used by perf inject and all the others pass 'false'. Let's remove it from the function signature and add __perf_session__new() to be called from perf inject directly. This is in preparation for changing the pipe input/output. [ Fixed up some trivial conflicts as this patchset fell thru the cracks ;-( ] Signed-off-by: Michael Petlan <mpetlan@redhat.com>	2022-04-25 12:33:02 +02:00
Vitaly Kuznetsov	f43071af87	tools: rename bitmap_alloc() to bitmap_zalloc() Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2009338 commit 7fc5b571325f1bcbe1ce384409b2d05546431b04 Author: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Date: Tue Sep 7 19:59:35 2021 -0700 tools: rename bitmap_alloc() to bitmap_zalloc() Rename bitmap_alloc() to bitmap_zalloc() in tools to follow the bitmap API in the kernel. No functional changes intended. Link: https://lkml.kernel.org/r/20210814211713.180533-14-yury.norov@gmail.com Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Yury Norov <yury.norov@gmail.com> Suggested-by: Yury Norov <yury.norov@gmail.com> Acked-by: Yury Norov <yury.norov@gmail.com> Tested-by: Wolfram Sang <wsa+renesas@sang-engineering.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Alexander Lobakin <alobakin@pm.me> Cc: Alexey Klimov <aklimov@redhat.com> Cc: Dennis Zhou <dennis@kernel.org> Cc: Ulf Hansson <ulf.hansson@linaro.org> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>	2021-12-08 10:43:17 +01:00
Kan Liang	b91e5492f9	perf record: Add a dummy event on hybrid systems to collect metadata records Some symbols may not be resolved if a user only monitors one type of PMU. $ sudo perf record -e cpu_atom/branch-instructions/ ./big_small_workload $ sudo perf report --stdio # Overhead Command Shared Object Symbol # ........ ......... ................. ..................... # 28.02% perf-exec [unknown] [.] 0x0000000000401cf6 11.32% perf-exec [unknown] [.] 0x0000000000401d04 10.90% perf-exec [unknown] [.] 0x0000000000401d11 10.61% perf-exec [unknown] [.] 0x0000000000401cfc To parse symbols the metadata records, e.g., PERF_RECORD_COMM, which are generated by the kernel, are required. To decide whether to generate the metadata records, the kernel relies on the event_filter_match() to filter the unrelated events. On a hybrid system, event_filter_match() further checks the CPU mask of the current enabled PMU. If an event is collected on the CPU which doesn't have an enabled PMU, it's treated as an unrelated event. The "big_small_workload" is created in a big core, but runs on a small core. The metadata records are filtered, because the user only monitors the PMU of the small core. The big core PMU is not enabled. For a hybrid system, a dummy event is required to generate the complete side-band events. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lore.kernel.org/lkml/1625760212-18441-1-git-send-email-kan.liang@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2021-07-09 14:04:32 -03:00
Jiri Olsa	3a683120d8	libperf: Move 'nr_groups' from tools/perf to evlist::nr_groups Move evsel::nr_groups to perf_evsel::nr_groups, so we can move the group interface to libperf. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Requested-by: Shunsuke Nakamura <nakamura.shun@fujitsu.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20210706151704.73662-5-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2021-07-09 14:04:32 -03:00
Jiri Olsa	fba7c86601	libperf: Move 'leader' from tools/perf to perf_evsel::leader Move evsel::leader to perf_evsel::leader, so we can move the group interface to libperf. Also add several evsel helpers to ease up the transition: struct evsel *evsel__leader(struct evsel *evsel); - get leader evsel bool evsel__has_leader(struct evsel *evsel, struct evsel *leader); - true if evsel has leader as leader bool evsel__is_leader(struct evsel *evsel); - true if evsel is its own leader void evsel__set_leader(struct evsel *evsel, struct evsel *leader); - set leader for evsel Committer notes: Fix this when building with 'make BUILD_BPF_SKEL=1' tools/perf/util/bpf_counter.c - if (evsel->leader->core.nr_members > 1) { + if (evsel->core.leader->nr_members > 1) { Signed-off-by: Jiri Olsa <jolsa@kernel.org> Requested-by: Shunsuke Nakamura <nakamura.shun@fujitsu.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20210706151704.73662-4-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2021-07-09 14:04:31 -03:00
Arnaldo Carvalho de Melo	ce09673636	Merge remote-tracking branch 'torvalds/master' into perf/core To pick up fixes, since perf/urgent is already upstream. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2021-06-22 13:56:50 -03:00
Namhyung Kim	4f2abe9192	perf record: Move probing cgroup sampling support I found that checking cgroup sampling support using the missing features doesn't work on old kernels. Because it added both attr.cgroup bit and PERF_SAMPLE_CGROUP bit, it needs to check whichever comes first (usually the actual event, not dummy). But it only checks the attr.cgroup bit which is set only in the dummy event so cannot detect failures due to the sample bits. Also we don't ignore the missing feature and retry, it'd be better checking it with the API probing logic. Committer notes: Extracted the minimal part to check using the new cgroup API probe routine, the part that removes the cgroup member can be left for further discussion. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20210527182835.1634339-1-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2021-06-01 10:32:00 -03:00
Adrian Hunter	66286ed3e8	perf record: Set timestamp boundary for AUX area events AUX area data is not processed by 'perf record' and consequently the --timestamp-boundary option may result in no values for "time of first sample" and "time of last sample". However there are non-sample events that can be used instead, namely 'itrace_start' and 'aux'. 'itrace_start' is issued before tracing starts, and 'aux' is issued every time data is ready. Implement tool callbacks for those two for 'perf record', to update the timestamp boundary. Example: $ perf record -e intel_pt//u --timestamp-boundary uname Linux [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.022 MB perf.data ] $ perf script --header-only | grep "time of" # time of first sample : 4574.835541 # time of last sample : 4574.835907 $ perf script --itrace=be -F-ip | head -1 uname 13752 [001] 4574.835589: 1 branches:uH: $ perf script --itrace=be -F-ip | tail -1 uname 13752 [001] 4574.835867: 1 branches:uH: $ Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Acked-by: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lore.kernel.org/lkml/20210503064222.5319-1-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2021-05-12 12:43:11 -03:00
Jin Yao	91c0f5ec81	perf record: Uniquify hybrid event name For perf-record, it would be useful to tell user the pmu which the event belongs to. For example, # perf record -a -- sleep 1 # perf report # To display the perf.data header info, please use --header/--header-only options. # # # Total Lost Samples: 0 # # Samples: 106 of event 'cpu_core/cycles/' # Event count (approx.): 22043448 # # Overhead Command Shared Object Symbol # ........ ............ ....................... ............................ # ... Signed-off-by: Jin Yao <yao.jin@linux.intel.com> Reviewed-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Kan Liang <kan.liang@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20210427070139.25256-18-yao.jin@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2021-04-29 10:30:59 -03:00
Jin Yao	b53a0755d5	perf record: Create two hybrid 'cycles' events by default When evlist is empty, for example no '-e' specified in perf record, one default 'cycles' event is added to evlist. On a hybrid platform, however, it needs to create two default 'cycles' events. One is for cpu_core, the other is for cpu_atom. This patch actually calls evsel__new_cycles() two times to create two 'cycles' events. # ./perf record -vv -a -- sleep 1 ... ------------------------------------------------------------ perf_event_attr: size 120 config 0x400000000 { sample_period, sample_freq } 4000 sample_type IP|TID|TIME|ID|CPU|PERIOD read_format ID disabled 1 inherit 1 freq 1 precise_ip 3 sample_id_all 1 exclude_guest 1 ------------------------------------------------------------ sys_perf_event_open: pid -1 cpu 0 group_fd -1 flags 0x8 = 5 sys_perf_event_open: pid -1 cpu 1 group_fd -1 flags 0x8 = 6 sys_perf_event_open: pid -1 cpu 2 group_fd -1 flags 0x8 = 7 sys_perf_event_open: pid -1 cpu 3 group_fd -1 flags 0x8 = 9 sys_perf_event_open: pid -1 cpu 4 group_fd -1 flags 0x8 = 10 sys_perf_event_open: pid -1 cpu 5 group_fd -1 flags 0x8 = 11 sys_perf_event_open: pid -1 cpu 6 group_fd -1 flags 0x8 = 12 sys_perf_event_open: pid -1 cpu 7 group_fd -1 flags 0x8 = 13 sys_perf_event_open: pid -1 cpu 8 group_fd -1 flags 0x8 = 14 sys_perf_event_open: pid -1 cpu 9 group_fd -1 flags 0x8 = 15 sys_perf_event_open: pid -1 cpu 10 group_fd -1 flags 0x8 = 16 sys_perf_event_open: pid -1 cpu 11 group_fd -1 flags 0x8 = 17 sys_perf_event_open: pid -1 cpu 12 group_fd -1 flags 0x8 = 18 sys_perf_event_open: pid -1 cpu 13 group_fd -1 flags 0x8 = 19 sys_perf_event_open: pid -1 cpu 14 group_fd -1 flags 0x8 = 20 sys_perf_event_open: pid -1 cpu 15 group_fd -1 flags 0x8 = 21 ------------------------------------------------------------ perf_event_attr: size 120 config 0x800000000 { sample_period, sample_freq } 4000 sample_type IP|TID|TIME|ID|CPU|PERIOD read_format ID disabled 1 inherit 1 freq 1 precise_ip 3 
sample_id_all 1 exclude_guest 1 ------------------------------------------------------------ sys_perf_event_open: pid -1 cpu 16 group_fd -1 flags 0x8 = 22 sys_perf_event_open: pid -1 cpu 17 group_fd -1 flags 0x8 = 23 sys_perf_event_open: pid -1 cpu 18 group_fd -1 flags 0x8 = 24 sys_perf_event_open: pid -1 cpu 19 group_fd -1 flags 0x8 = 25 sys_perf_event_open: pid -1 cpu 20 group_fd -1 flags 0x8 = 26 sys_perf_event_open: pid -1 cpu 21 group_fd -1 flags 0x8 = 27 sys_perf_event_open: pid -1 cpu 22 group_fd -1 flags 0x8 = 28 sys_perf_event_open: pid -1 cpu 23 group_fd -1 flags 0x8 = 29 ------------------------------------------------------------ We have to create evlist-hybrid.c, otherwise the perf test python would fail due to the symbol dependency. Signed-off-by: Jin Yao <yao.jin@linux.intel.com> Reviewed-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Kan Liang <kan.liang@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20210427070139.25256-14-yao.jin@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2021-04-29 10:30:59 -03:00
Arnaldo Carvalho de Melo	3535a6967c	perf record: Improve 'Workload failed' message printing events + what was exec'ed Before: # perf record -a cycles,instructions,cache-misses Workload failed: No such file or directory # After: # perf record -a cycles,instructions,cache-misses Failed to collect 'cycles' for the 'cycles,instructions,cache-misses' workload: No such file or directory # Helps disambiguating other error scenarios: # perf record -a -e cycles,instructions,cache-misses bla Failed to collect 'cycles,instructions,cache-misses' for the 'bla' workload: No such file or directory # perf record -a cycles,instructions,cache-misses sleep 1 Failed to collect 'cycles' for the 'cycles,instructions,cache-misses' workload: No such file or directory # When all goes well we're back to the usual: # perf record -a -e cycles,instructions,cache-misses sleep 1 [ perf record: Woken up 3 times to write data ] [ perf record: Captured and wrote 3.151 MB perf.data (21242 samples) ] # Acked-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: http://lore.kernel.org/lkml/20210414131628.2064862-3-acme@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2021-04-15 16:34:05 -03:00
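
Note on 86ce88496c ("perf cpumap: Give CPUs their own type"): the message describes wrapping a CPU number in a struct so that it cannot be silently mixed up with a CPU map index, in the spirit of atomic_t. The following is a minimal sketch of that idea only, not the code from the commit. `struct perf_cpu` is the type name the message mentions; its single `cpu` field, `struct cpu_map`, `cpu_map_cpu()` and `cpu_map_idx()` are hypothetical stand-ins introduced here for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Wrapper type: a CPU number is no longer a bare int.
 * The field layout is an assumption made for this sketch. */
struct perf_cpu {
	int cpu;
};

/* Hypothetical, simplified CPU map: nr entries of wrapped CPU numbers. */
struct cpu_map {
	size_t nr;
	struct perf_cpu map[8];
};

/* Index -> CPU: takes a plain map index, returns the wrapped CPU. */
static inline struct perf_cpu cpu_map_cpu(const struct cpu_map *m, size_t idx)
{
	assert(idx < m->nr);
	return m->map[idx];
}

/* CPU -> index: takes a wrapped CPU, returns its index or -1 if absent. */
static inline int cpu_map_idx(const struct cpu_map *m, struct perf_cpu cpu)
{
	for (size_t i = 0; i < m->nr; i++) {
		if (m->map[i].cpu == cpu.cpu)
			return (int)i;
	}
	return -1;
}
```

With distinct types, passing a map index where a CPU is expected (or the reverse) becomes a compile-time type error instead of a silent logic bug, because an integer does not convert implicitly to `struct perf_cpu`.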

659 Commits