valgrind(1) - a suite of tools for debugging and profiling programs
--tool=<toolname> [default: memcheck]
    Run the Valgrind tool called toolname, e.g. Memcheck, Cachegrind, etc.
-h, --help
    Show help for all options, both for the core and for the selected tool. If the option is repeated it
    is equivalent to giving --help-debug.
--help-debug
    Same as --help, but also lists debugging options which usually are only of use to Valgrind's
    developers.
--version
    Show the version number of the Valgrind core. Tools can have their own version numbers. There is a
    scheme in place to ensure that tools only execute when the core version is one they are known to work
    with. This was done to minimise the chances of strange problems arising from tool-vs-core version
    incompatibilities.
-q, --quiet
    Run silently, and only print error messages. Useful if you are running regression tests or have some
    other automated test machinery.
-v, --verbose
    Be more verbose. Gives extra information on various aspects of your program, such as: the shared
    objects loaded, the suppressions used, the progress of the instrumentation and execution engines, and
    warnings about unusual behaviour. Repeating the option increases the verbosity level.
--trace-children=<yes|no> [default: no]
    When enabled, Valgrind will trace into sub-processes initiated via the exec system call. This is
    necessary for multi-process programs.

    Note that Valgrind does trace into the child of a fork (it would be difficult not to, since fork
    makes an identical copy of a process), so this option is arguably badly named. However, most children
    of fork calls immediately call exec anyway.
--child-silent-after-fork=<yes|no> [default: no]
    When enabled, Valgrind will not show any debugging or logging output for the child process resulting
    from a fork call. This can make the output less confusing (although more misleading) when dealing
    with processes that create children. It is particularly useful in conjunction with --trace-children=yes.
    Use of this option is also strongly recommended if you are requesting XML output (--xml=yes), since
    otherwise the XML from child and parent may become mixed up, which usually makes it useless.
--vgdb=<no|yes|full> [default: yes]
    Valgrind will provide "gdbserver" functionality when --vgdb=yes or --vgdb=full is specified. This
    allows an external GNU GDB debugger to control and debug your program when it runs on Valgrind. See
    ???  for a detailed description.

    If the embedded gdbserver is enabled but no gdb is currently being used, the ???  command line
    utility can send "monitor commands" to Valgrind from a shell. The Valgrind core provides a set of
    ???. A tool can optionally provide tool specific monitor commands, which are documented in the tool
    specific chapter.

    --vgdb=full incurs significant performance overheads.
--vgdb-error=<number> [default: 999999999]
    Use this option when the Valgrind gdbserver is enabled with --vgdb=yes or --vgdb=full. Tools that
    report errors will wait for "number" errors to be reported before freezing the program and waiting
    for you to connect with GDB. It follows that a value of zero will cause the gdbserver to be started
    before your program is executed. This is typically used to insert GDB breakpoints before execution,
    and also works with tools that do not report errors, such as Massif.
--track-fds=<yes|no> [default: no]
    When enabled, Valgrind will print out a list of open file descriptors on exit. Along with each file
    descriptor is printed a stack backtrace of where the file was opened and any details relating to the
    file descriptor such as the file name or socket details.
--time-stamp=<yes|no> [default: no]
    When enabled, each message is preceded with an indication of the elapsed wallclock time since
    startup, expressed as days, hours, minutes, seconds and milliseconds.
--log-fd=<number> [default: 2, stderr]
    Specifies that Valgrind should send all of its messages to the specified file descriptor. The
    default, 2, is the standard error channel (stderr). Note that this may interfere with the client's
    own use of stderr, as Valgrind's output will be interleaved with any output that the client sends to
    stderr.
--log-file=<filename>
    Specifies that Valgrind should send all of its messages to the specified file. If the file name is
    empty, it causes an abort. There are three special format specifiers that can be used in the file
    name.

    %p is replaced with the current process ID. This is very useful for programs that invoke multiple
    processes. WARNING: If you use --trace-children=yes and your program invokes multiple processes OR
    your program forks without calling exec afterwards, and you don't use this specifier (or the %q
    specifier below), the Valgrind output from all those processes will go into one file, possibly
    jumbled up, and possibly incomplete.

    %q{FOO} is replaced with the contents of the environment variable FOO. If the {FOO} part is
    malformed, it causes an abort. This specifier is rarely needed, but very useful in certain
    circumstances (e.g. when running MPI programs). The idea is that you specify a variable which will be
    set differently for each process in the job, for example BPROC_RANK or whatever is applicable in your
    MPI setup. If the named environment variable is not set, it causes an abort. Note that in some
    shells, the { and } characters may need to be escaped with a backslash.

    %% is replaced with %.

    If a % is followed by any other character, it causes an abort.
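
    The expansion rules above can be modelled in a few lines of Python. This is only an illustrative
    sketch, not Valgrind's implementation: the function name expand_log_name and its parameters are
    hypothetical, and where the text says Valgrind aborts, the sketch raises ValueError instead.

```python
import os


def expand_log_name(template, pid=None, env=None):
    """Expand %p, %q{VAR} and %% as the --log-file description states.

    Illustrative model only. Cases documented as "causes an abort" raise
    ValueError here.
    """
    pid = os.getpid() if pid is None else pid
    env = os.environ if env is None else env
    if not template:
        raise ValueError("empty file name")
    out = []
    i = 0
    while i < len(template):
        ch = template[i]
        if ch != "%":
            out.append(ch)
            i += 1
            continue
        nxt = template[i + 1:i + 2]  # empty string if % is the last character
        if nxt == "p":
            out.append(str(pid))
            i += 2
        elif nxt == "%":
            out.append("%")
            i += 2
        elif nxt == "q":
            close = template.find("}", i + 3)
            if template[i + 2:i + 3] != "{" or close == -1:
                raise ValueError("malformed %q{...} specifier")
            name = template[i + 3:close]
            if name not in env:
                raise ValueError("environment variable %s is not set" % name)
            out.append(env[name])
            i = close + 1
        else:
            raise ValueError("unknown specifier after %")
    return "".join(out)
```

    For instance, expand_log_name("mpi.%q{BPROC_RANK}.%p", pid=7, env={"BPROC_RANK": "3"}) yields
    "mpi.3.7", giving each process in the job its own file.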
--log-socket=<ip-address:port-number>
    Specifies that Valgrind should send all of its messages to the specified port at the specified IP
    address. The port may be omitted, in which case port 1500 is used. If a connection cannot be made to
    the specified socket, Valgrind falls back to writing output to the standard error (stderr). This
    option is intended to be used in conjunction with the valgrind-listener program. For further details,
    see the commentary in the manual.
--xml=<yes|no> [default: no]
    When enabled, the important parts of the output (e.g. tool error messages) will be in XML format
    rather than plain text. Furthermore, the XML output will be sent to a different output channel than
    the plain text output. Therefore, you also must use one of --xml-fd, --xml-file or --xml-socket to
    specify where the XML is to be sent.

    Less important messages will still be printed in plain text, but because the XML output and plain
    text output are sent to different output channels (the destination of the plain text output is still
    controlled by --log-fd, --log-file and --log-socket) this should not cause problems.

    This option is aimed at making life easier for tools that consume Valgrind's output as input, such as
    GUI front ends. Currently this option works with Memcheck, Helgrind, DRD and SGcheck. The output
    format is specified in the file docs/internals/xml-output-protocol4.txt in the source tree for
    Valgrind 3.5.0 or later.

    The recommended options for a GUI to pass, when requesting XML output, are: --xml=yes to enable XML
    output, --xml-file to send the XML output to a (presumably GUI-selected) file, --log-file to send the
    plain text output to a second GUI-selected file, --child-silent-after-fork=yes, and -q to restrict
    the plain text output to critical error messages created by Valgrind itself. For example, failure to
    read a specified suppressions file counts as a critical error message. In this way, for a successful
    run the text output file will be empty. But if it isn't empty, then it will contain important
    information which the GUI user should be made aware of.
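
    As a rough sketch of how a front end might assemble the recommended options, the Python helper
    below builds an argument list. The function name gui_valgrind_argv and the file names in the
    usage note are hypothetical; only the Valgrind options themselves come from the text above.

```python
def gui_valgrind_argv(program_argv, xml_path, log_path):
    """Build an argv list using the options recommended above for GUIs."""
    return [
        "valgrind",
        "--xml=yes",                      # enable XML output
        "--xml-file=" + xml_path,         # XML goes to its own file
        "--log-file=" + log_path,         # plain text goes to a second file
        "--child-silent-after-fork=yes",  # keep fork children out of the output
        "-q",                             # only critical plain-text messages
    ] + list(program_argv)
```

    A caller could pass the result to subprocess.run() and afterwards check whether the plain text
    file (say, out.log) is non-empty, in which case its contents should be shown to the user.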
--xml-fd=<number> [default: -1, disabled]
    Specifies that Valgrind should send its XML output to the specified file descriptor. It must be used
    in conjunction with --xml=yes.
--xml-file=<filename>
    Specifies that Valgrind should send its XML output to the specified file. It must be used in
    conjunction with --xml=yes. Any %p or %q sequences appearing in the filename are expanded in exactly
    the same way as they are for --log-file. See the description of --log-file for details.
--xml-socket=<ip-address:port-number>
    Specifies that Valgrind should send its XML output to the specified port at the specified IP address. It
    must be used in conjunction with --xml=yes. The form of the argument is the same as that used by
    --log-socket. See the description of --log-socket for further details.
--xml-user-comment=<string>
    Embeds an extra user comment string at the start of the XML output. Only works when --xml=yes is
    specified; ignored otherwise.
--demangle=<yes|no> [default: yes]
    Enable/disable automatic demangling (decoding) of C++ names. Enabled by default. When enabled,
    Valgrind will attempt to translate encoded C++ names back to something approaching the original. The
    demangler handles symbols mangled by g++ versions 2.X, 3.X and 4.X.

    An important fact about demangling is that function names mentioned in suppressions files should be
    in their mangled form. Valgrind does not demangle function names when searching for applicable
    suppressions, because to do otherwise would make suppression file contents dependent on the state of
    Valgrind's demangling machinery, and also slow down suppression matching.
--num-callers=<number> [default: 12]
    Specifies the maximum number of entries shown in stack traces that identify program locations. Note
    that errors are commoned up using only the top four function locations (the place in the current
    function, and that of its three immediate callers). So this doesn't affect the total number of errors
    reported.

    The maximum value for this is 50. Note that higher settings will make Valgrind run a bit more slowly
    and take a bit more memory, but can be useful when working with programs with deeply-nested call
    chains.
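
    The point that --num-callers changes only how much of each trace is shown, not how errors are
    grouped, can be illustrated with a small Python model. It is a sketch under assumptions: the
    name group_errors is hypothetical, and treating the error kind as part of the grouping key is a
    guess, not something stated above.

```python
from collections import Counter


def group_errors(errors, frames_compared=4):
    """Count errors that share a kind and the same top few stack frames.

    Each error is a (kind, stack) pair, with the stack listed innermost
    frame first. Only the first frames_compared entries take part in the
    comparison, however long the recorded stack is.
    """
    counts = Counter()
    for kind, stack in errors:
        counts[(kind, tuple(stack[:frames_compared]))] += 1
    return counts
```

    Two errors whose stacks differ only in the fifth frame or deeper fall into the same group here,
    whatever depth of trace was recorded for display.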
--error-limit=<yes|no> [default: yes]
    When enabled, Valgrind stops reporting errors after 10,000,000 in total, or 1,000 different ones,
    have been seen. This is to stop the error tracking machinery from becoming a huge performance
    overhead in programs with many errors.
--error-exitcode=<number> [default: 0]
    Specifies an alternative exit code to return if Valgrind reported any errors in the run. When set to
    the default value (zero), the return value from Valgrind will always be the return value of the
    process being simulated. When set to a nonzero value, that value is returned instead, if Valgrind
    detects any errors. This is useful for using Valgrind as part of an automated test suite, since it
    makes it easy to detect test cases for which Valgrind has reported errors, just by inspecting return
    codes.
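
    A test harness might use this option roughly as follows. The sketch is illustrative: the helper
    names and the sentinel value 99 are arbitrary choices, and the classification is only reliable
    if the program under test never returns the sentinel itself.

```python
def valgrind_test_command(test_argv, error_exitcode=99):
    """Wrap a test command so that Valgrind errors produce a known exit code."""
    return ["valgrind", "-q", "--error-exitcode=%d" % error_exitcode] + list(test_argv)


def classify(returncode, error_exitcode=99):
    """Interpret the exit status of a run started with valgrind_test_command."""
    if returncode == error_exitcode:
        return "valgrind-errors"  # Valgrind reported at least one error
    if returncode == 0:
        return "pass"
    return "test-failed"          # the program's own non-zero exit status
```

    The command list can be run with subprocess.run(cmd).returncode and the result passed to
    classify().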
--show-below-main=<yes|no> [default: no]
    By default, stack traces for errors do not show any functions that appear beneath main because most
    of the time it's uninteresting C library stuff and/or gobbledygook. Alternatively, if main is not
    present in the stack trace, stack traces will not show any functions below main-like functions such
    as glibc's __libc_start_main. Furthermore, if main-like functions are present in the trace, they are
    normalised as (below main), in order to make the output more deterministic.

    If this option is enabled, all stack trace entries will be shown and main-like functions will not be
    normalised.
--fullpath-after=<string> [default: don't show source paths]
    By default Valgrind only shows the filenames in stack traces, but not full paths to source files.
    When using Valgrind in large projects where the sources reside in multiple different directories,
    this can be inconvenient.  --fullpath-after provides a flexible solution to this problem. When this
    option is present, the path to each source file is shown, with the following all-important caveat: if
    string is found in the path, then the path up to and including string is omitted, else the path is
    shown unmodified. Note that string is not required to be a prefix of the path.

    For example, consider a file named /home/janedoe/blah/src/foo/bar/xyzzy.c. Specifying
    --fullpath-after=/home/janedoe/blah/src/ will cause Valgrind to show the name as foo/bar/xyzzy.c.

    Because the string is not required to be a prefix, --fullpath-after=src/ will produce the same
    output. This is useful when the path contains arbitrary machine-generated characters. For example,
    the path /my/build/dir/C32A1B47/blah/src/foo/xyzzy can be pruned to foo/xyzzy using
    --fullpath-after=/blah/src/.

    If you simply want to see the full path, just specify an empty string: --fullpath-after=. This isn't
    a special case, merely a logical consequence of the above rules.

    Finally, you can use --fullpath-after multiple times. Any appearance of it causes Valgrind to switch
    to producing full paths and applying the above filtering rule. Each produced path is compared against
    all the --fullpath-after-specified strings, in the order specified. The first string to match causes
    the path to be truncated as described above. If none match, the full path is shown. This facilitates
    chopping off prefixes when the sources are drawn from a number of unrelated directories.
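
    The filtering rule can be summarised by the following Python sketch. It models the behaviour
    described above rather than Valgrind's own code; the name shown_path is hypothetical, and
    cutting at the first occurrence of the string within the path is an assumption.

```python
def shown_path(path, fullpath_after=None):
    """Return the source file name as the text above says it is displayed.

    fullpath_after is the list of --fullpath-after strings in command-line
    order; None or an empty list means the option was not given at all.
    """
    if not fullpath_after:
        return path.rsplit("/", 1)[-1]      # default: file name only
    for s in fullpath_after:
        pos = path.find(s)
        if pos != -1:
            return path[pos + len(s):]      # drop everything up to and including s
    return path                             # no string matched: full path
```

    With this model, shown_path("/my/build/dir/C32A1B47/blah/src/foo/xyzzy", ["/blah/src/"]) gives
    "foo/xyzzy", and passing [""] returns the path unmodified, matching the empty-string case.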
--suppressions=<filename> [default: $PREFIX/lib/valgrind/default.supp]
    Specifies an extra file from which to read descriptions of errors to suppress. You may use up to 100
    extra suppression files.
--gen-suppressions=<yes|no|all> [default: no]
    When set to yes, Valgrind will pause after every error shown and print the line:
--db-attach=<yes|no> [default: no]
    When enabled, Valgrind will pause after every error shown and print the line:
--db-command=<command> [default: gdb -nw %f %p]
    Specify the debugger to use with the --db-attach command. The default debugger is GDB. This option is
    a template that is expanded by Valgrind at runtime.  %f is replaced with the executable's file name
    and %p is replaced by the process ID of the executable.

    This specifies how Valgrind will invoke the debugger. By default it will use whatever GDB is detected
    at build time, which is usually /usr/bin/gdb. Using this command, you can specify some alternative
    command to invoke the debugger you want to use.

    The command string given can include one or more instances of the %p and %f expansions. Each instance of
    %p expands to the PID of the process to be debugged and each instance of %f expands to the path to
    the executable for the process to be debugged.

    Since <command> is likely to contain spaces, you will need to put this entire option in quotes to
    ensure it is correctly handled by the shell.
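
    The template expansion can be pictured with this short Python sketch. It is a model of the
    description above, not Valgrind's code, and the name expand_db_command is hypothetical. The
    template is scanned once so that a % sequence inside the substituted path is not expanded again.

```python
def expand_db_command(template, exe_path, pid):
    """Replace every %f with the executable path and every %p with the PID."""
    out = []
    i = 0
    while i < len(template):
        two = template[i:i + 2]
        if two == "%f":
            out.append(exe_path)
            i += 2
        elif two == "%p":
            out.append(str(pid))
            i += 2
        else:
            out.append(template[i])
            i += 1
    return "".join(out)
```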
--input-fd=<number> [default: 0, stdin]
    When using --db-attach=yes or --gen-suppressions=yes, Valgrind will stop so as to read keyboard input
    from you when each error occurs. By default it reads from the standard input (stdin), which is
    problematic for programs which close stdin. This option allows you to specify an alternative file
    descriptor from which to read input.
--dsymutil=<no|yes> [default: no]
    This option is only relevant when running Valgrind on Mac OS X.
--max-stackframe=<number> [default: 2000000]
    The maximum size of a stack frame. If the stack pointer moves by more than this amount then Valgrind
    will assume that the program is switching to a different stack.

    You may need to use this option if your program has large stack-allocated arrays. Valgrind keeps
    track of your program's stack pointer. If it changes by more than the threshold amount, Valgrind
    assumes your program is switching to a different stack, and Memcheck behaves differently than it
    would for a stack pointer change smaller than the threshold. Usually this heuristic works well.
    However, if your program allocates large structures on the stack, this heuristic will be fooled, and
    Memcheck will subsequently report large numbers of invalid stack accesses. This option allows you to
    change the threshold to a different value.

    You should only consider use of this option if Valgrind's debug output directs you to do so. In that
    case it will tell you the new threshold you should specify.

    In general, allocating large structures on the stack is a bad idea, because you can easily run out of
    stack space, especially on systems with limited memory or which expect to support large numbers of
    threads each with a small stack, and also because the error checking performed by Memcheck is more
    effective for heap-allocated data than for stack-allocated data. If you have to use this option, you
    may wish to consider rewriting your code to allocate on the heap rather than on the stack.
--main-stacksize=<number> [default: use current 'ulimit' value]
    Specifies the size of the main thread's stack.
--alignment=<number> [default: 8 or 16, depending on the platform]
    By default Valgrind's malloc, realloc, etc, return a block whose starting address is 8-byte aligned
    or 16-byte aligned (the value depends on the platform and matches the platform default). This option
    allows you to specify a different alignment. The supplied value must be greater than or equal to the
    default, less than or equal to 4096, and must be a power of two.
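
    The three constraints on the value can be written as a small check. This Python sketch is
    illustrative only; the name is_valid_alignment is hypothetical, and platform_default must be set
    to 8 or 16 to match the platform in question.

```python
def is_valid_alignment(value, platform_default=8):
    """Check a candidate --alignment value against the constraints above."""
    is_power_of_two = value > 0 and (value & (value - 1)) == 0
    return is_power_of_two and platform_default <= value <= 4096
```

    For example, 64 passes on any platform, while 24 (not a power of two) and 8192 (above 4096) are
    rejected.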
--smc-check=<none|stack|all|all-non-file> [default: stack]
    This option controls Valgrind's detection of self-modifying code. If no checking is done, then when a
    program executes some code, overwrites it with new code, and executes the new code, Valgrind
    will continue to execute the translations it made for the old code. This will likely lead to
    incorrect behaviour and/or crashes.

    Valgrind has four levels of self-modifying code detection: no detection, detect self-modifying code
    on the stack (which is used by GCC to implement nested functions), detect self-modifying code
    everywhere, and detect self-modifying code everywhere except in file-backed mappings. Note that the
    default option will catch the vast majority of cases. The main case it will not catch is programs
    such as JIT compilers that dynamically generate code and subsequently overwrite part or all of it.
    Running with all will slow Valgrind down noticeably. Running with none will rarely speed things up,
    since very little code gets put on the stack for most programs. The VALGRIND_DISCARD_TRANSLATIONS
    client request is an alternative to --smc-check=all that requires more programmer effort but allows
    Valgrind to run your program faster, by telling it precisely when translations need to be re-made.

    --smc-check=all-non-file provides a cheaper but more limited version of --smc-check=all. It adds
    checks to any translations that do not originate from file-backed memory mappings. Typical
    applications that generate code, for example JITs in web browsers, generate code into anonymous
    mmaped areas, whereas the "fixed" code of the browser always lives in file-backed mappings.
    --smc-check=all-non-file takes advantage of this observation, limiting the overhead of checking to
    code which is likely to be JIT generated.

    Some architectures (including ppc32, ppc64 and ARM) require programs which create code at runtime to
    flush the instruction cache in between code generation and first use. Valgrind observes and honours
    such instructions. Hence, on ppc32/Linux, ppc64/Linux and ARM/Linux, Valgrind always provides
    complete, transparent support for self-modifying code. It is only on platforms such as x86/Linux,
    AMD64/Linux, x86/Darwin and AMD64/Darwin that you need to use this option.
--read-var-info=<yes|no> [default: no]
    When enabled, Valgrind will read information about variable types and locations from DWARF3 debug
    info. This slows Valgrind down and makes it use more memory, but for the tools that can take
    advantage of it (Memcheck, Helgrind, DRD) it can result in more precise error messages. For example,
    here are some standard errors issued by Memcheck:

        ==15516== Uninitialised byte(s) found during client check request
        ==15516==    at 0x400633: croak (varinfo1.c:28)
        ==15516==    by 0x4006B2: main (varinfo1.c:55)
        ==15516==  Address 0x60103b is 7 bytes inside data symbol "global_i2"
        ==15516==
        ==15516== Uninitialised byte(s) found during client check request
        ==15516==    at 0x400633: croak (varinfo1.c:28)
        ==15516==    by 0x4006BC: main (varinfo1.c:56)
        ==15516==  Address 0x7fefffefc is on thread 1's stack
--vgdb-poll=<number> [default: 5000]
    As part of its main loop, the Valgrind scheduler will poll to check if some activity (such as an
    external command or some input from a gdb) has to be handled by gdbserver. This activity poll will be
    done after having run the given number of basic blocks (or slightly more than the given number of
    basic blocks). This poll is quite cheap so the default value is set relatively low. You might further
    decrease this value if vgdb cannot use the ptrace system call to interrupt Valgrind when all threads
    are (most of the time) blocked in a system call.
--vgdb-shadow-registers=<no|yes> [default: no]
    When activated, gdbserver will expose the Valgrind shadow registers to GDB. With this, the value of
    the Valgrind shadow registers can be examined or changed using GDB. Exposing shadow registers only
    works with GDB version 7.1 or later.
--vgdb-prefix=<prefix> [default: /tmp/vgdb-pipe]
    To communicate with gdb/vgdb, the Valgrind gdbserver creates 3 files (2 named FIFOs and a mmap shared
    memory file). The prefix option controls the directory and prefix for the creation of these files.
--run-libc-freeres=<yes|no> [default: yes]
    This option is only relevant when running Valgrind on Linux.

    The GNU C library (libc.so), which is used by all programs, may allocate memory for its own uses.
    Usually it doesn't bother to free that memory when the program ends—there would be no point, since
    the Linux kernel reclaims all process resources when a process exits anyway, so it would just slow
    things down.

    The glibc authors realised that this behaviour causes leak checkers, such as Valgrind, to falsely
    report leaks in glibc, when a leak check is done at exit. In order to avoid this, they provided a
    routine called __libc_freeres specifically to make glibc release all memory it has allocated.
    Memcheck therefore tries to run __libc_freeres at exit.

    Unfortunately, in some very old versions of glibc, __libc_freeres is sufficiently buggy to cause
    segmentation faults. This was particularly noticeable on Red Hat 7.1. So this option is provided in
    order to inhibit the run of __libc_freeres. If your program seems to run fine on Valgrind, but
    segfaults at exit, you may find that --run-libc-freeres=no fixes that, although at the cost of
    possibly falsely reporting space leaks in libc.so.
--show-emwarns=<yes|no> [default: no]
    When enabled, Valgrind will emit warnings about its CPU emulation in certain cases. These are usually
    not interesting.
--leak-check=<no|summary|yes|full> [default: summary]
    When enabled, search for memory leaks when the client program finishes. If set to summary, it says
    how many leaks occurred. If set to full or yes, it also gives details of each individual leak.
--show-possibly-lost=<yes|no> [default: yes]
    When disabled, the memory leak detector will not show "possibly lost" blocks.
--leak-resolution=<low|med|high> [default: high]
    When doing leak checking, determines how willing Memcheck is to consider different backtraces to be
    the same for the purposes of merging multiple leaks into a single leak report. When set to low, only
    the first two entries need match. When med, four entries have to match. When high, all entries need
    to match.

    For hardcore leak debugging, you probably want to use --leak-resolution=high together with
    --num-callers=40 or some such large number.

    Note that the --leak-resolution setting does not affect Memcheck's ability to find leaks. It only
    changes how the results are presented.
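
    The three resolution levels amount to comparing different numbers of backtrace entries. A
    minimal Python sketch of that comparison follows; the names FRAMES_COMPARED and same_backtrace
    are hypothetical, and backtraces are represented simply as lists of function names.

```python
# Number of leading backtrace entries compared at each level.
# None means every entry must match.
FRAMES_COMPARED = {"low": 2, "med": 4, "high": None}


def same_backtrace(a, b, resolution="high"):
    """Decide whether two allocation backtraces count as the same leak."""
    depth = FRAMES_COMPARED[resolution]
    if depth is None:
        return list(a) == list(b)
    return list(a[:depth]) == list(b[:depth])
```

    Two backtraces that differ only in their third entry are merged at low but reported separately
    at med and high.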
--show-reachable=<yes|no> [default: no]
    When disabled, the memory leak detector only shows "definitely lost" and "possibly lost" blocks. When
    enabled, the leak detector also shows "reachable" and "indirectly lost" blocks. (In other words, it
    shows all blocks, except suppressed ones, so --show-all would be a better name for it.)
--undef-value-errors=<yes|no> [default: yes]
    Controls whether Memcheck reports uses of undefined value errors. Set this to no if you don't want to
    see undefined value errors. It also has the side effect of speeding up Memcheck somewhat.
--track-origins=<yes|no> [default: no]
    Controls whether Memcheck tracks the origin of uninitialised values. By default, it does not, which
    means that although it can tell you that an uninitialised value is being used in a dangerous way, it
    cannot tell you where the uninitialised value came from. This often makes it difficult to track down
    the root problem.
--partial-loads-ok=<yes|no> [default: no]
    Controls how Memcheck handles word-sized, word-aligned loads from addresses for which some bytes are
    addressable and others are not. When yes, such loads do not produce an address error. Instead, loaded
    bytes originating from illegal addresses are marked as uninitialised, and those corresponding to
    legal addresses are handled in the normal way.

    When no, loads from partially invalid addresses are treated the same as loads from completely invalid
    addresses: an illegal-address error is issued, and the resulting bytes are marked as initialised.

    Note that code that behaves in this way is in violation of the ISO C/C++ standards, and should be
    considered broken. If at all possible, such code should be fixed. This option should be used only as
    a last resort.
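
    The two behaviours can be contrasted with a toy Python model of a single word-sized load. It is
    a sketch of the description above and nothing more: the name load_word is hypothetical, and the
    labels "normal", "uninitialised" and "initialised" merely stand for how each loaded byte is
    treated.

```python
def load_word(addressable, partial_loads_ok=False):
    """Model one word-sized, word-aligned load.

    addressable holds one boolean per byte of the word. Returns a pair
    (address_error, marks) where marks describes each loaded byte.
    """
    n_bad = addressable.count(False)
    if n_bad == 0:
        return False, ["normal"] * len(addressable)
    partially_bad = n_bad < len(addressable)
    if partial_loads_ok and partially_bad:
        # No address error; only the bytes from illegal addresses are tainted.
        return False, ["normal" if ok else "uninitialised" for ok in addressable]
    # Same treatment as a completely invalid load.
    return True, ["initialised"] * len(addressable)
```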
--freelist-vol=<number> [default: 20000000]
    When the client program releases memory using free (in C) or delete (C++), that memory is not
    immediately made available for re-allocation. Instead, it is marked inaccessible and placed in a
    queue of freed blocks. The purpose is to defer as long as possible the point at which freed-up memory
    comes back into circulation. This increases the chance that Memcheck will be able to detect invalid
    accesses to blocks for some significant period of time after they have been freed.

    This option specifies the maximum total size, in bytes, of the blocks in the queue. The default value
    is twenty million bytes. Increasing this increases the total amount of memory used by Memcheck but
    may detect invalid uses of freed blocks which would otherwise go undetected.
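
    The effect of the volume limit can be seen in a toy Python model of the queue. This is a
    simplified sketch, not Memcheck's allocator: the class FreedQueue is hypothetical, it releases
    blocks strictly oldest-first, and it ignores any size-based prioritisation.

```python
from collections import deque


class FreedQueue:
    """Toy model of the queue of freed blocks bounded by --freelist-vol."""

    def __init__(self, max_volume=20000000):
        self.max_volume = max_volume
        self.volume = 0
        self.blocks = deque()  # (address, size) pairs, oldest first

    def free(self, address, size):
        """Queue a freed block and return the blocks released for re-use."""
        self.blocks.append((address, size))
        self.volume += size
        released = []
        while self.volume > self.max_volume and self.blocks:
            old = self.blocks.popleft()
            self.volume -= old[1]
            released.append(old)
        return released

    def is_quarantined(self, address):
        """True while an access to address would still be caught as invalid."""
        return any(start <= address < start + size for start, size in self.blocks)
```

    A larger max_volume keeps blocks in the queue longer, which is why raising the limit can expose
    accesses to freed memory that would otherwise go unnoticed.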
--freelist-big-blocks=<number> [default: 1000000]
    When making blocks from the queue of freed blocks available for re-allocation, Memcheck gives
    priority to re-circulating blocks whose size is greater than or equal to --freelist-big-blocks. This
    ensures that freeing big blocks (in particular freeing blocks bigger than --freelist-vol) does not
    immediately lead to a re-circulation of all (or a lot of) the small blocks in the free list. In other
    words, this option increases the likelihood of discovering dangling pointers for the "small" blocks,
    even when big blocks are freed.

    Setting a value of 0 means that all the blocks are re-circulated in a FIFO order.
--workaround-gcc296-bugs=<yes|no> [default: no]
    When enabled, assume that reads and writes some small distance below the stack pointer are due to
    bugs in GCC 2.96, and do not report them. The "small distance" is 256 bytes by default. Note that
    GCC 2.96 is the default compiler on some ancient Linux distributions (RedHat 7.X) and so you may need
    to use this option. Do not use it if you do not have to, as it can cause real errors to be
    overlooked. A better alternative is to use a more recent GCC in which this bug is fixed.

    You may also need to use this option when working with GCC 3.X or 4.X on 32-bit PowerPC Linux. This
    is because GCC generates code which occasionally accesses below the stack pointer, particularly for
    floating-point to/from integer conversions. This is in violation of the 32-bit PowerPC ELF
    specification, which makes no provision for locations below the stack pointer to be accessible.
--malloc-fill=<hexnumber>
    Fills blocks allocated by malloc, new, etc, but not by calloc, with the specified byte. This can be
    useful when trying to shake out obscure memory corruption problems. The allocated area is still
    regarded by Memcheck as undefined -- this option only affects its contents.
--free-fill=<hexnumber>
    Fills blocks freed by free, delete, etc, with the specified byte value. This can be useful when
    trying to shake out obscure memory corruption problems. The freed area is still regarded by Memcheck
    as not valid for access -- this option only affects its contents.
--I1=<size>,<associativity>,<line size>
    Specify the size, associativity and line size of the level 1 instruction cache.
--D1=<size>,<associativity>,<line size>
    Specify the size, associativity and line size of the level 1 data cache.
--LL=<size>,<associativity>,<line size>
    Specify the size, associativity and line size of the last-level cache.
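
    The three-field value shared by --I1, --D1 and --LL can be unpacked as in the Python sketch
    below. It assumes the fields are plain decimal integers with size and line size in bytes; the
    function name parse_cache_option is hypothetical, and the set count is ordinary cache arithmetic
    rather than something Cachegrind is documented here to print.

```python
def parse_cache_option(value):
    """Parse a "<size>,<associativity>,<line size>" option value."""
    parts = value.split(",")
    if len(parts) != 3:
        raise ValueError("expected three comma-separated fields")
    size, assoc, line_size = (int(p) for p in parts)
    return {
        "size": size,
        "associativity": assoc,
        "line_size": line_size,
        # Number of sets implied by the geometry: size / (ways * line size).
        "sets": size // (assoc * line_size),
    }
```

    For example, the value 32768,8,64 describes a 32768-byte, 8-way cache with 64-byte lines, which
    works out to 64 sets.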
--cache-sim=<no|yes> [default: yes]
    Enables or disables collection of cache access and miss counts.
--branch-sim=<no|yes> [default: no]
    Enables or disables collection of branch instruction and misprediction counts. By default this is
    disabled as it slows Cachegrind down by approximately 25%. Note that you cannot specify
    --cache-sim=no and --branch-sim=no together, as that would leave Cachegrind with no information to
    collect.
--cachegrind-out-file=<file>
    Write the profile data to file rather than to the default output file, cachegrind.out.<pid>. The %p
    and %q format specifiers can be used to embed the process ID and/or the contents of an environment
    variable in the name, as is the case for the core option --log-file.
--callgrind-out-file=<file>
    Write the profile data to file rather than to the default output file, callgrind.out.<pid>. The %p
    and %q format specifiers can be used to embed the process ID and/or the contents of an environment
    variable in the name, as is the case for the core option --log-file. When multiple dumps are made,
    the file name is modified further; see below.
--dump-line=<no|yes> [default: yes]
    This specifies that event counting should be performed at source line granularity. This allows source
    annotation for sources which are compiled with debug information (-g).
--dump-instr=<no|yes> [default: no]
    This specifies that event counting should be performed at per-instruction granularity. This allows
    for assembly code annotation. Currently the results can only be displayed by KCachegrind.
--compress-strings=<no|yes> [default: yes]
    This option influences the output format of the profile data. It specifies whether strings (file and
    function names) should be identified by numbers. This shrinks the file, but makes it more difficult
    for humans to read (which is not recommended in any case).
--compress-pos=<no|yes> [default: yes]
    This option influences the output format of the profile data. It specifies whether numerical
    positions are always specified as absolute values or are allowed to be relative to previous numbers.
    This shrinks the file size.
--combine-dumps=<no|yes> [default: no]
    When enabled and multiple profile data parts are to be generated, these parts are appended to the
    same output file. Not recommended.
--dump-every-bb=<count> [default: 0, never]
    Dump profile data every count basic blocks. Whether a dump is needed is only checked when Valgrind's
    internal scheduler is run. Therefore, the minimum useful setting is about 100000. The count is a
    64-bit value to make long dump periods possible.
--dump-before=<function>
    Dump when entering function.
--zero-before=<function>
    Zero all costs when entering function.
--dump-after=<function>
    Dump when leaving function.
--instr-atstart=<yes|no> [default: yes]
    Specify if you want Callgrind to start simulation and profiling from the beginning of the program.
    When set to no, Callgrind will not be able to collect any information, including calls, but it will
    have at most a slowdown of around 4, which is the minimum Valgrind overhead. Instrumentation can be
    interactively enabled via callgrind_control -i on.

    Note that the resulting call graph will most probably not contain main, but will contain all the
    functions executed after instrumentation was enabled. Instrumentation can also be enabled or
    disabled programmatically. See the Callgrind include file callgrind.h for the macro you have to use
    in your source code.

    For cache simulation, results will be less accurate when switching on instrumentation later in the
    program run, as the simulator starts with an empty cache at that moment. Switch on event collection
    later to cope with this error.
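
    As a rough illustration of the workflow described above, the Python sketch below only builds the two
    command lines involved; it does not run them. The helper callgrind_argv and the client program
    ./myprog are hypothetical names chosen for the example.

```python
def callgrind_argv(program, *program_args):
    """Build an argv that starts PROGRAM under Callgrind with instrumentation off."""
    return ["valgrind", "--tool=callgrind", "--instr-atstart=no", program, *program_args]


# Command to switch instrumentation on later, typically from another terminal.
ENABLE_INSTRUMENTATION = ["callgrind_control", "-i", "on"]

# ./myprog is a hypothetical client program.
print(" ".join(callgrind_argv("./myprog", "--input", "data.txt")))
print(" ".join(ENABLE_INSTRUMENTATION))
```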
--collect-atstart=<yes|no> [default: yes]
    Specify whether event collection is enabled at beginning of the profile run.
--toggle-collect=<function>
    Toggle collection on entry/exit of function.
--collect-jumps=<no|yes> [default: no]
    This specifies whether information for (conditional) jumps should be collected. As above,
    callgrind_annotate currently is not able to show you the data. You have to use KCachegrind to get
    jump arrows in the annotated code.
--collect-systime=<no|yes> [default: no]
    This specifies whether information for system call times should be collected.
--collect-bus=<no|yes> [default: no]
    This specifies whether the number of global bus events executed should be collected. The event type
    "Ge" is used for these events.
--cache-sim=<yes|no> [default: no]
    Specify if you want to do full cache simulation. By default, only instruction read accesses will be
    counted ("Ir"). With cache simulation, further event counters are enabled: Cache misses on
    instruction reads ("I1mr"/"ILmr"), data read accesses ("Dr") and related cache misses
    ("D1mr"/"DLmr"), data write accesses ("Dw") and related cache misses ("D1mw"/"DLmw"). For more
    information, see the Cachegrind documentation.
--branch-sim=<yes|no> [default: no]
    Specify if you want to do branch prediction simulation. Further event counters are enabled: Number of
    executed conditional branches and related predictor misses ("Bc"/"Bcm"), executed indirect jumps and
    related misses of the jump address predictor ("Bi"/"Bim").
--free-is-write=no|yes [default: no]
    When enabled (not the default), Helgrind treats freeing of heap memory as if the memory was written
    immediately before the free. This exposes races where memory is referenced by one thread, and freed
    by another, but there is no observable synchronisation event to ensure that the reference happens
    before the free.

    This functionality is new in Valgrind 3.7.0, and is regarded as experimental. It is not enabled by
    default because its interaction with custom memory allocators is not well understood at present. User
    feedback is welcomed.
--track-lockorders=no|yes [default: yes]
    When enabled (the default), Helgrind performs lock order consistency checking. For some buggy
    programs, the large number of lock order errors reported can become annoying, particularly if you're
    only interested in race errors. You may therefore find it helpful to disable lock order checking.
--history-level=none|approx|full [default: full]
    --history-level=full (the default) causes Helgrind to collect enough information about "old"
    accesses that it can produce two stack traces in a race report: both the stack trace for the
    current access, and the trace for the older, conflicting access.

    Collecting such information is expensive in both speed and memory, particularly for programs that do
    many inter-thread synchronisation events (locks, unlocks, etc). Without such information, it is more
    difficult to track down the root causes of races. Nonetheless, you may not need it in situations
    where you just want to check for the presence or absence of races, for example, when doing regression
    testing of a previously race-free program.

    --history-level=none is the opposite extreme. It causes Helgrind not to collect any information
    about previous accesses. This can be dramatically faster than --history-level=full.

    --history-level=approx provides a compromise between these two extremes. It causes Helgrind to show
    a full trace for the later access, and approximate information regarding the earlier access. This
    approximate information consists of two stacks, and the earlier access is guaranteed to have
    occurred somewhere between program points denoted by the two stacks. This is not as useful as
    showing the exact stack for the previous access (as --history-level=full does), but it is better
    than nothing, and it is almost as fast as --history-level=none.
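
    A small Python sketch of choosing a history level per use case, for example the cheaper "none"
    level for regression runs. It only constructs the command line; helgrind_argv and ./myprog are
    hypothetical names used for illustration.

```python
HISTORY_LEVELS = ("none", "approx", "full")


def helgrind_argv(program, history_level="full"):
    """Build a Helgrind command line with the requested --history-level."""
    if history_level not in HISTORY_LEVELS:
        raise ValueError(f"history_level must be one of {HISTORY_LEVELS}")
    return ["valgrind", "--tool=helgrind", f"--history-level={history_level}", program]


# Regression testing: only the presence or absence of races matters.
print(" ".join(helgrind_argv("./myprog", history_level="none")))
# Debugging a reported race: keep both stack traces.
print(" ".join(helgrind_argv("./myprog", history_level="full")))
```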
--conflict-cache-size=N [default: 1000000]
    This flag only has any effect at --history-level=full.

    Information about "old" conflicting accesses is stored in a cache of limited size, with LRU-style
    management. This is necessary because it isn't practical to store a stack trace for every single
    memory access made by the program. Historical information on not recently accessed locations is
    periodically discarded, to free up space in the cache.

    This option controls the size of the cache, in terms of the number of different memory addresses for
    which conflicting access information is stored. If you find that Helgrind is showing race errors with
    only one stack instead of the expected two stacks, try increasing this value.

    The minimum value is 10,000 and the maximum is 30,000,000 (thirty times the default value).
    Increasing the value by 1 increases Helgrind's memory requirement by very roughly 100 bytes, so the
    maximum value will easily eat up three extra gigabytes or so of memory.
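
    The arithmetic behind that estimate can be sketched in Python as follows. The figure of 100 bytes
    per entry is the rough value quoted above, not a measured constant, and conflict_cache_bytes is a
    hypothetical helper.

```python
BYTES_PER_ENTRY = 100  # "very roughly 100 bytes" per cached address, as stated above
MIN_SIZE = 10_000
DEFAULT_SIZE = 1_000_000
MAX_SIZE = 30_000_000


def conflict_cache_bytes(n):
    """Rough extra memory, in bytes, for --conflict-cache-size=n."""
    if not MIN_SIZE <= n <= MAX_SIZE:
        raise ValueError(f"n must be between {MIN_SIZE} and {MAX_SIZE}")
    return n * BYTES_PER_ENTRY


for n in (MIN_SIZE, DEFAULT_SIZE, MAX_SIZE):
    print(f"--conflict-cache-size={n}: about {conflict_cache_bytes(n) / 1e9:.3f} GB")
```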
--check-stack-refs=no|yes [default: yes]
    By default Helgrind checks all data memory accesses made by your program. This flag enables you to
    skip checking for accesses to thread stacks (local variables). This can improve performance, but
    comes at the cost of missing races on stack-allocated data.
--check-stack-var=<yes|no> [default: no]
    Controls whether DRD detects data races on stack variables. Verifying stack variables is disabled by
    default because most programs do not share stack variables over threads.
--exclusive-threshold=<n> [default: off]
    Print an error message if any mutex or writer lock has been held longer than the time specified in
    milliseconds. This option enables the detection of lock contention.
--join-list-vol=<n> [default: 10]
    Data races that occur between a statement at the end of one thread and another thread can be missed
    if memory access information is discarded immediately after a thread has been joined. This option
    specifies for how many joined threads memory access information should be retained.
--first-race-only=<yes|no> [default: no]
    Whether to report only the first data race that has been detected on a memory location or all data
    races that have been detected on a memory location.
--free-is-write=<yes|no> [default: no]
    Whether to report races between accessing memory and freeing memory. Enabling this option may cause
    DRD to run slightly slower. Notes:

        Don't enable this option when using custom memory allocators that use the
        VG_USERREQ__MALLOCLIKE_BLOCK and VG_USERREQ__FREELIKE_BLOCK client requests, because that would
        result in false positives.

        Don't enable this option when using reference-counted objects, because that will result in false
        positives, even when that code has been annotated properly with ANNOTATE_HAPPENS_BEFORE and
        ANNOTATE_HAPPENS_AFTER. See e.g. the output of the following command for an example: valgrind
        --tool=drd --free-is-write=yes drd/tests/annotate_smart_pointer.
--report-signal-unlocked=<yes|no> [default: yes]
    Whether to report calls to pthread_cond_signal and pthread_cond_broadcast where the mutex associated
    with the signal through pthread_cond_wait or pthread_cond_timedwait is not locked at the time the
    signal is sent. Sending a signal without holding a lock on the associated mutex is a common
    programming error which can cause subtle race conditions and unpredictable behavior. There exist
    some uncommon synchronization patterns, however, where it is safe to send a signal without holding a
    lock on the associated mutex.
--segment-merging=<yes|no> [default: yes]
    Controls segment merging. Segment merging is an algorithm to limit memory usage of the data race
    detection algorithm. Disabling segment merging may improve the accuracy of the so-called 'other
    segments' displayed in race reports but can also trigger an out of memory error.
--segment-merging-interval=<n> [default: 10]
    Perform segment merging only after the specified number of new segments have been created. This is
    an advanced configuration option that lets you choose between minimizing DRD's memory usage (a low
    value) and letting DRD run faster (a slightly higher value). The optimal value for this parameter
    depends on the program being analyzed. The default value works well for most programs.
--shared-threshold=<n> [default: off]
    Print an error message if a reader lock has been held longer than the specified time (in
    milliseconds). This option enables the detection of lock contention.
--show-confl-seg=<yes|no> [default: yes]
    Show conflicting segments in race reports. Since this information can help to find the cause of a
    data race, this option is enabled by default. Disabling this option makes the output of DRD more
    compact.
--show-stack-usage=<yes|no> [default: no]
    Print stack usage at thread exit time. When a program creates a large number of threads it becomes
    important to limit the amount of virtual memory allocated for thread stacks. This option makes it
    possible to observe how much stack memory has been used by each thread of the client program.
    Note: the DRD tool itself allocates some temporary data on the client thread stack. The space
    necessary for this temporary data must be allocated by the client program when it allocates stack
    memory, but is not included in stack usage reported by DRD.
--trace-addr=<address> [default: none]
    Trace all load and store activity for the specified address. This option may be specified more than
    once.
--trace-alloc=<yes|no> [default: no]
    Trace all memory allocations and deallocations. May produce a huge amount of output.
--trace-barrier=<yes|no> [default: no]
    Trace all barrier activity.
--trace-cond=<yes|no> [default: no]
    Trace all condition variable activity.
--trace-fork-join=<yes|no> [default: no]
    Trace all thread creation and all thread termination events.
--trace-hb=<yes|no> [default: no]
    Trace execution of the ANNOTATE_HAPPENS_BEFORE(), ANNOTATE_HAPPENS_AFTER() and
    ANNOTATE_HAPPENS_DONE() client requests.
--trace-mutex=<yes|no> [default: no]
    Trace all mutex activity.
--trace-rwlock=<yes|no> [default: no]
    Trace all reader-writer lock activity.
--trace-semaphore=<yes|no> [default: no]
    Trace all semaphore activity.
--heap=<yes|no> [default: yes]
    Specifies whether heap profiling should be done.
--heap-admin=<size> [default: 8]
    If heap profiling is enabled, gives the number of administrative bytes per block to use. This should
    be an estimate of the average, since it may vary. For example, the allocator used by glibc on Linux
    requires somewhere between 4 and 15 bytes per block, depending on various factors. That allocator also
    requires admin space for freed blocks, but Massif cannot account for this.
--stacks=<yes|no> [default: no]
    Specifies whether stack profiling should be done. This option slows Massif down greatly, and so is
    off by default. Note that Massif assumes that the main stack has size zero at start-up. This is not
    true, but doing otherwise accurately is difficult. Furthermore, starting at zero better indicates the
    size of the part of the main stack that a user program actually has control over.
--pages-as-heap=<yes|no> [default: no]
    Tells Massif to profile memory at the page level rather than at the malloc'd block level. See above
    for details.
--depth=<number> [default: 30]
    Maximum depth of the allocation trees recorded for detailed snapshots. Increasing it will make Massif
    run somewhat more slowly, use more memory, and produce bigger output files.
--alloc-fn=<name>
    Functions specified with this option will be treated as though they were a heap allocation function
    such as malloc. This is useful for functions that are wrappers to malloc or new, which can fill up
    the allocation trees with uninteresting information. This option can be specified multiple times on
    the command line, to name multiple functions.

    Note that the named function will only be treated this way if it is the top entry in a stack trace,
    or just below another function treated this way. For example, if you have a function malloc1 that
    wraps malloc, and malloc2 that wraps malloc1, just specifying --alloc-fn=malloc2 will have no effect.
    You need to specify --alloc-fn=malloc1 as well. This is a little inconvenient, but the reason is that
    checking for allocation functions is slow, and it saves a lot of time if Massif can stop looking
    through the stack trace entries as soon as it finds one that doesn't match rather than having to
    continue through all the entries.
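
    The matching rule can be illustrated with a simplified Python model. This is not Massif's code:
    strip_alloc_fns is a hypothetical helper, and stack traces are represented as plain lists of
    function names with the innermost frame first.

```python
def strip_alloc_fns(stack, alloc_fns):
    """Drop top-of-stack frames that are allocation functions.

    Scanning stops at the first frame that does not match, mirroring the
    early exit described above.
    """
    i = 0
    while i < len(stack) and stack[i] in alloc_fns:
        i += 1
    return stack[i:]


# malloc2 wraps malloc1, which wraps malloc (innermost frame first).
stack = ["malloc", "malloc1", "malloc2", "main"]

# Only --alloc-fn=malloc2: scanning stops at malloc1, so malloc2 is never reached.
print(strip_alloc_fns(stack, {"malloc", "malloc2"}))
# --alloc-fn=malloc1 --alloc-fn=malloc2: both wrappers are skipped.
print(strip_alloc_fns(stack, {"malloc", "malloc1", "malloc2"}))
```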

    Note that C++ names are demangled. Note also that overloaded C++ names must be written in full.
    Single quotes may be necessary to prevent the shell from breaking them up. For example:
--ignore-fn=<name>
    Any direct heap allocation (i.e. a call to malloc, new, etc, or a call to a function named by an
    --alloc-fn option) that occurs in a function specified by this option will be ignored. This is mostly
    useful for testing purposes. This option can be specified multiple times on the command line, to name
    multiple functions.

    Any realloc of an ignored block will also be ignored, even if the realloc call does not occur in an
    ignored function. This avoids the possibility of negative heap sizes if ignored blocks are shrunk
    with realloc.

    The rules for writing C++ function names are the same as for --alloc-fn above.
--threshold=<m.n> [default: 1.0]
    The significance threshold for heap allocations, as a percentage of total memory size. Allocation
    tree entries that account for less than this will be aggregated. Note that this should be specified
    in tandem with ms_print's option of the same name.
--peak-inaccuracy=<m.n> [default: 1.0]
    Massif does not necessarily record the actual global memory allocation peak; by default it records a
    peak only when the global memory allocation size exceeds the previous peak by at least 1.0%. This is
    because there can be many local allocation peaks along the way, and doing a detailed snapshot for
    every one would be expensive and wasteful, as all but one of them will be later discarded. This
    inaccuracy can be changed (even to 0.0%) via this option, but Massif will run drastically slower as
    the number approaches zero.
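
    The effect of the threshold can be seen in a simplified Python model. It is a sketch under stated
    assumptions, not Massif's algorithm: recorded_peaks is a hypothetical helper that looks only at a
    sequence of total allocation sizes and ignores everything else Massif tracks.

```python
def recorded_peaks(sizes, peak_inaccuracy_pct=1.0):
    """Return the sizes that this simplified model would record as peaks.

    A size is recorded only if it exceeds the last recorded peak by at
    least peak_inaccuracy_pct percent.
    """
    peaks = []
    last_peak = 0
    for size in sizes:
        threshold = last_peak * (1 + peak_inaccuracy_pct / 100)
        if size > last_peak and size >= threshold:
            peaks.append(size)
            last_peak = size
    return peaks


sizes = [1000, 1005, 1020, 1025, 1500]
print(recorded_peaks(sizes))        # default 1.0%: small increases are skipped
print(recorded_peaks(sizes, 0.0))   # 0.0%: every new maximum is recorded
```

    With the default of 1.0% the true peak may be missed by up to that margin; at 0.0% every new maximum
    triggers a record, which is why run time grows as the value approaches zero.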
--time-unit=<i|ms|B> [default: i]
    The time unit used for the profiling. There are three possibilities: instructions executed (i), which
    is good for most cases; real (wallclock) time (ms, i.e. milliseconds), which is sometimes useful; and
    bytes allocated/deallocated on the heap and/or stack (B), which is useful for very short-run
    programs, and for testing purposes, because it is the most reproducible across different machines.
--detailed-freq=<n> [default: 10]
    Frequency of detailed snapshots. With --detailed-freq=1, every snapshot is detailed.
--max-snapshots=<n> [default: 100]
    The maximum number of snapshots recorded. If set to N, for all programs except very short-running
    ones, the final number of snapshots will be between N/2 and N.
--massif-out-file=<file> [default: massif.out.%p]
    Write the profile data to file rather than to the default output file, massif.out.<pid>. The %p and
    %q format specifiers can be used to embed the process ID and/or the contents of an environment
    variable in the name, as is the case for the core option --log-file.
--bb-out-file=<name> [default: bb.out.%p]
    This option selects the name of the basic block vector file. The %p and %q format specifiers can be
    used to embed the process ID and/or the contents of an environment variable in the name, as is the
    case for the core option --log-file.
--pc-out-file=<name> [default: pc.out.%p]
    This option selects the name of the PC file. This file holds program counter addresses and function
    name info for the various basic blocks. This can be used in conjunction with the basic block vector
    file to fast-forward via function names instead of just instruction counts. The %p and %q format
    specifiers can be used to embed the process ID and/or the contents of an environment variable in the
    name, as is the case for the core option --log-file.
--interval-size=<number> [default: 100000000]
    This option selects the size of the interval to use. The default is 100 million instructions, which
    is a commonly used value. Other sizes can be used; smaller intervals can help programs with
    finer-grained phases. However, a smaller interval size can lead to accuracy issues due to warm-up
    effects: when fast-forwarding, the various architectural features will be uninitialized, and it will
    take some number of instructions before they "warm up" to the state a full simulation would be in
    without the fast-forwarding. Large interval sizes tend to mitigate this.
--instr-count-only [default: no]
    This option tells the tool to only display instruction count totals, and to not generate the actual
    basic block vector file. This is useful for debugging, and for gathering instruction count info
    without generating the large basic block vector files.
--basic-counts=<no|yes> [default: yes]
    When enabled, Lackey prints the following statistics and information about the execution of the
    client program:
--detailed-counts=<no|yes> [default: no]
    When enabled, Lackey prints a table containing counts of loads, stores and ALU operations,
    differentiated by their IR types. The IR types are identified by their IR name ("I1", "I8", ...
    "I128", "F32", "F64", and "V128").
--trace-mem=<no|yes> [default: no]
    When enabled, Lackey prints the size and address of almost every memory access made by the program.
    See the comments at the top of the file lackey/lk_main.c for details about the output format, how it
    works, and inaccuracies in the address trace. Note that this option produces immense amounts of
    output.
--trace-superblocks=<no|yes> [default: no]
    When enabled, Lackey prints out the address of every superblock (a single entry, multiple exit,
    linear chunk of code) executed by the program. This is primarily of interest to Valgrind developers.
    See the comments at the top of the file lackey/lk_main.c for details about the output format. Note
    that this option produces large amounts of output.
--fnname=<name> [default: main]
    Changes the function for which calls are counted when --basic-counts=yes is specified.