kmemtrace - Kernel Memory Tracer

by Eduard - Gabriel Munteanu
<eduard.munteanu@linux360.ro>

I. Introduction
===============

kmemtrace helps kernel developers figure out two things:
1) how different allocators (SLAB, SLUB etc.) perform
2) how kernel code allocates memory and how much

To do this, we trace every allocation and export information to userspace
through the relay interface. We export things such as the number of requested
bytes, the number of bytes actually allocated (i.e. including internal
fragmentation), whether this is a slab allocation or a plain kmalloc(), and
so on.

The actual analysis is performed by a userspace tool (see section III for
details on where to get it). It logs the data exported by the kernel,
processes it and (at the time of writing) can provide the following
information:
        - the total amount of memory allocated and fragmentation per call-site
        - the amount of memory allocated and fragmentation per allocation
        - total memory allocated and fragmentation in the collected dataset
        - number of cross-CPU allocations and frees (makes sense in NUMA
          environments)

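To make the per-call-site figures concrete, here is a minimal sketch of the
aggregation. It assumes a hypothetical text dump with one event per line,
"<call_site> <bytes_requested> <bytes_allocated>". kmemtrace itself exports
binary data through relay, so neither this format nor the frag_per_site name
belongs to the real tool.

```shell
# Sum requested and allocated bytes per call site and print the difference
# (internal fragmentation) as the last column.
# ASSUMPTION: input lines look like "<call_site> <bytes_req> <bytes_alloc>".
# This is a made-up text format, not what kmemtrace writes.
frag_per_site() {
        awk '{ req[$1] += $2; alloc[$1] += $3 }
             END {
                for (s in req)
                        printf "%s %d %d %d\n", s, req[s], alloc[s], alloc[s] - req[s]
             }' "$@" | sort
}
```

Fed the lines "foo 100 128", "foo 20 32" and "bar 64 64", it reports 40 bytes
of internal fragmentation for foo and none for bar.
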
Moreover, it can potentially find inconsistent and erroneous behavior in
kernel code, such as using slab free functions on kmalloc'ed memory or
allocating less memory than requested (but not truly failed allocations).

kmemtrace also makes provisions for tracing on one arch and analysing the
data on another.

II. Design and goals
====================

kmemtrace was designed to handle rather large amounts of data. Thus, it uses
the relay interface to export whatever is logged to userspace, which then
stores it. Analysis and reporting are done asynchronously, that is, after the
data is collected and stored. By design, it allows one to log and analyse
on different machines and different arches.

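Moving a capture to another machine might look like the sketch below. It
assumes the per-CPU logs are the cpu[0-9]*.out files mentioned in section III
and that they all sit in one directory. The function name and the default
archive name are made up for illustration.

```shell
# Pack the per-CPU log files from a directory into one gzip'ed tarball.
# ASSUMPTION: the logs are named cpu[0-9]*.out and live in "$dir".
# bundle_traces and kmemtrace-logs.tar.gz are hypothetical names.
bundle_traces() {
        dir="${1:-.}"
        out="${2:-kmemtrace-logs.tar.gz}"
        # Run tar inside the log directory so the archive holds bare file names.
        ( cd "$dir" && tar -czf - cpu[0-9]*.out ) > "$out"
}
```

Copy the resulting archive to the analysis machine and unpack it there before
running the userspace tool.
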
At the time of writing, the ABI is not considered stable, though it might not
change much. However, no guarantees are made about compatibility yet. When
deemed stable, the ABI should still allow easy extension while maintaining
backward compatibility. This is described further in Documentation/ABI.

Summary of design goals:
        - allow logging and analysis to be done across different machines
        - be fast and anticipate usage in high-load environments (*)
        - be reasonably extensible
        - make it possible for GNU/Linux distributions to have kmemtrace
          included in their repositories

(*) - this is one of the reasons Pekka Enberg's original userspace data
analysis tool was rewritten from Perl to C (although the rewrite is more
than a simple conversion)


III. Quick usage guide
======================

1) Get a kernel that supports kmemtrace and build it accordingly (i.e. enable
CONFIG_KMEMTRACE).

2) Get the userspace tool and build it:
        $ git clone git://repo.or.cz/kmemtrace-user.git         # current repository
        $ cd kmemtrace-user/
        $ ./autogen.sh
        $ ./configure
        $ make

3) Boot the kmemtrace-enabled kernel if you haven't, preferably in the
'single' runlevel (so that relay buffers don't fill up easily), and run
kmemtrace:
        # Here '$' denotes a root prompt, not a regular user's.
        $ mount -t debugfs none /sys/kernel/debug
        $ mount -t proc none /proc
        $ cd path/to/kmemtrace-user/
        $ ./kmemtraced
        Wait a bit, then stop it with CTRL+C.
        $ cat /sys/kernel/debug/kmemtrace/total_overruns        # Check that we didn't
                                                                # overrun; this should
                                                                # be zero.
        $ (Optionally) [Run kmemtrace_check separately on each cpu[0-9]*.out file to
           check its correctness]
        $ ./kmemtrace-report

Now you should have a nice and short summary of how the allocator performs.

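The overrun check from step 3 is easy to script, for instance to flag a
capture whose relay buffers overran before anyone reads the report. This is
only a sketch: check_overruns is a hypothetical helper, and it assumes
total_overruns holds a single decimal number.

```shell
# Report whether the overrun counter is zero.
# Returns 0 if there were no overruns, 1 if there were, 2 if the file
# cannot be read. check_overruns is a hypothetical name; the default path
# is the one used in step 3.
check_overruns() {
        file="${1:-/sys/kernel/debug/kmemtrace/total_overruns}"
        if [ ! -r "$file" ]; then
                echo "cannot read $file" >&2
                return 2
        fi
        overruns=$(cat "$file")
        if [ "$overruns" -eq 0 ]; then
                echo "no overruns"
        else
                echo "$overruns overruns: trace is incomplete"
                return 1
        fi
}
```

Run it as root right after stopping kmemtraced. Passing a file name as the
first argument overrides the default path.
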
IV. FAQ and known issues
========================

Q: 'cat /sys/kernel/debug/kmemtrace/total_overruns' is non-zero. How do I fix
this? Should I worry?
A: If it's non-zero, kmemtrace's accuracy suffers, depending on how large the
number is. You can fix it by booting with a higher value for the
'kmemtrace.subbufs=N' kernel parameter.
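
After rebooting, it may be worth confirming that the parameter actually
reached the kernel. The sketch below pulls the value out of /proc/cmdline.
The function name is hypothetical, and the 8192 in the example is only an
illustration, not a recommended setting.

```shell
# Print the value of kmemtrace.subbufs from a kernel command line.
# Prints nothing if the parameter was not given.
# subbufs_from_cmdline is a hypothetical helper name.
subbufs_from_cmdline() {
        file="${1:-/proc/cmdline}"
        # Put each parameter on its own line, then keep only the value we want.
        tr ' ' '\n' < "$file" | sed -n 's/^kmemtrace\.subbufs=//p'
}
```

With a command line such as "root=/dev/sda1 single kmemtrace.subbufs=8192",
it prints 8192.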
---

Q: kmemtrace_check reports errors. How do I fix this? Should I worry?
A: This is a bug and should be reported. It can occur for a variety of
reasons:
        - possible bugs in relay code
        - possible misuse of relay by kmemtrace
        - timestamps being collected out of order
Or you may fix it yourself and send us a patch.
---

Q: kmemtrace_report shows many errors. How do I fix this? Should I worry?
A: This is a known issue and I'm working on it. These might be true errors
in kernel code, which may have inconsistent behavior (e.g. allocating memory
with kmem_cache_alloc() and freeing it with kfree()). Pekka Enberg pointed
out this behavior may work with SLAB, but may fail with other allocators.

It may also be due to lack of tracing in some unusual allocator functions.

We don't want bug reports regarding this issue yet.
---

V. See also
===========

Documentation/kernel-parameters.txt
Documentation/ABI/testing/debugfs-kmemtrace