Month: January 2011

Skills that are Seen with the DB Machine

I just finished teaching the Oracle Exadata and Database Machine three-day course for the first time in EMEA and enjoyed the challenge of teaching to a mixed audience. The course covers the following topics about the Exadata Database Machine: Exadata architecture and functionality; Smart Storage operations; Flash Cache; Hardware: 1/4, 1/2 and full rack […]

8 gdb tricks you should know

Despite its age, gdb remains an amazingly versatile and flexible tool, and
mastering it can save you huge amounts of time when trying to debug problems in
your code. In this post, I’ll share 8 tips and tricks for using GDB to debug
most efficiently.

I’ll be using the Linux kernel for examples throughout this post, not because
these examples are necessarily realistic, but because it’s a large C codebase
that I know and that anyone can download and take a look at. Don’t worry if you
aren’t familiar with Linux’s source in particular — the details of the examples
won’t matter too much.

  1. break WHERE if COND

    If you’ve ever used gdb, you almost certainly know about the “breakpoint”
    command, which lets you break at some specified point in the debugged program.

    But did you know that you can set conditional breakpoints? If you add
    if CONDITION to a breakpoint command, you can include an expression to be
    evaluated whenever the program reaches that point, and the program will only
    be stopped if the condition is fulfilled. Suppose I was debugging the Linux
    kernel and wanted to stop whenever init got scheduled. I could do:

    (gdb) break context_switch if next == init_task
    

    Note that the condition is evaluated by gdb, not by the debugged program, so
    you still pay the cost of the target stopping and switching to gdb every time
    the breakpoint is hit. As such, conditional breakpoints still slow the target
    down in proportion to how often the breakpoint location is hit, not how often
    the condition is met.

  2. command

    In addition to conditional breakpoints, the command command lets you specify
    commands to be run every time you hit a breakpoint. This can be used for a
    number of things, but one of the most basic is to augment points in a program
    to include debug output, without having to recompile and restart the
    program. I could get a minimal log of every mmap() operation performed on a
    system using:

    (gdb) b do_mmap_pgoff 
    Breakpoint 1 at 0xffffffff8111a441: file mm/mmap.c, line 940.
    (gdb) command 1
    Type commands for when breakpoint 1 is hit, one per line.
    End with a line saying just "end".
    >print addr
    >print len
    >print prot
    >end
    (gdb)
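
    If you want the log without stopping at the prompt on every hit, you can make
    the command list resume execution itself; silent (which must come first) also
    suppresses the usual breakpoint banner:

    (gdb) command 1
    >silent
    >print addr
    >print len
    >print prot
    >continue
    >end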
    
  3. gdb --args

    This one is simple, but a huge timesaver if you didn’t know it. If you just
    want to start a program under gdb, passing some arguments on the command line,
    you can just build your command-line like usual, and then put “gdb --args” in
    front to launch gdb with the target program and the argument list both set:

    [~]$ gdb --args pizzamaker --deep-dish --toppings=pepperoni
    ...
    (gdb) show args
    Argument list to give program being debugged when it is started is
      " --deep-dish --toppings=pepperoni".
    (gdb) b main
    Breakpoint 1 at 0x45467c: file oven.c, line 123.
    (gdb) run
    ...
    

    I find this especially useful if I want to debug a project that has some arcane
    wrapper script that assembles lots of environment variables and possibly
    arguments before launching the actual binary (I’m looking at you,
    libtool). Instead of trying to replicate all that state and then launch gdb,
    simply make a copy of the wrapper, find the final “exec” call or similar, and
    add “gdb --args” in front.
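
    For instance, if the wrapper’s last line looks something like the following
    (an illustrative libtool-style line, not from any particular project), the
    temporary one-line change is:

    # original final line of the wrapper script
    exec "$progdir/$program" "$@"

    # the same line with gdb wedged in front
    exec gdb --args "$progdir/$program" "$@"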

  4. Finding source files

    I run Ubuntu, so I can download debug symbols for most of the packages on my
    system from ddebs.ubuntu.com, and I can get source
    using apt-get source. But how do I tell gdb to put the two together? If the
    debug symbols include relative paths, I can use gdb’s directory command to
    add the source directory to my source path:

    [~/src]$ apt-get source coreutils
    [~/src]$ sudo apt-get install coreutils-dbgsym
    [~/src]$ gdb /bin/ls
    GNU gdb (GDB) 7.1-ubuntu
    (gdb) list main
    1192    ls.c: No such file or directory.
        in ls.c
    (gdb) directory ~/src/coreutils-7.4/src/
    Source directories searched: /home/nelhage/src/coreutils-7.4:$cdir:$cwd
    (gdb) list main
    1192        }
    1193    }
    1194    
    1195    int
    1196    main (int argc, char **argv)
    1197    {
    1198      int i;
    1199      struct pending *thispend;
    1200      int n_files;
    1201
    

    Sometimes, however, debug symbols end up with absolute paths, such as the
    kernel’s. In that case, I can use set substitute-path to tell gdb how to
    translate paths:

    [~/src]$ apt-get source linux-image-2.6.32-25-generic
    [~/src]$ sudo apt-get install linux-image-2.6.32-25-generic-dbgsym
    [~/src]$ gdb /usr/lib/debug/boot/vmlinux-2.6.32-25-generic 
    (gdb) list schedule
    5519    /build/buildd/linux-2.6.32/kernel/sched.c: No such file or directory.
        in /build/buildd/linux-2.6.32/kernel/sched.c
    (gdb) set substitute-path /build/buildd/linux-2.6.32 /home/nelhage/src/linux-2.6.32/
    (gdb) list schedule
    5519    
    5520    static void put_prev_task(struct rq *rq, struct task_struct *p)
    5521    {
    5522        u64 runtime = p->se.sum_exec_runtime - p->se.prev_sum_exec_runtime;
    5523    
    5524        update_avg(&p->se.avg_running, runtime);
    5525    
    5526        if (p->state == TASK_RUNNING) {
    5527            /*
    5528             * In order to avoid avg_overlap growing stale when we are
    
  5. Debugging macros

    One of the standard reasons almost everyone will tell you to prefer inline
    functions over macros is that debuggers tend to be better at dealing with
    inline functions. And in fact, by default, gdb doesn’t know anything at all
    about macros, even when your project was built with debug symbols:

    (gdb) p GFP_ATOMIC
    No symbol "GFP_ATOMIC" in current context.
    (gdb) p task_is_stopped(&init_task)
    No symbol "task_is_stopped" in current context.
    

    However, if you’re willing to tell GCC to generate debug symbols specifically
    optimized for gdb, using -ggdb3, it can preserve this information:

    $ make KCFLAGS=-ggdb3
    ...
    (gdb) break schedule
    (gdb) continue
    (gdb) p/x GFP_ATOMIC
    $1 = 0x20
    (gdb) p task_is_stopped_or_traced(init_task)
    $2 = 0
    

    You can also use the macro and info macro commands to work with macros
    from inside your gdb session:

    (gdb) macro expand task_is_stopped_or_traced(init_task)
    expands to: ((init_task->state & (4 | 8)) != 0)
    (gdb) info macro task_is_stopped_or_traced
    Defined at include/linux/sched.h:218
      included at include/linux/nmi.h:7
      included at kernel/sched.c:31
    #define task_is_stopped_or_traced(task) ((task->state & (__TASK_STOPPED | __TASK_TRACED)) != 0)
    

    Note that gdb actually knows in which contexts macros are and aren’t visible,
    so when you have the program stopped inside some function, you can only access
    macros visible at that point. (You can see that the “included at” lines above
    show exactly through what path the macro is visible.)
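
    Relatedly, you can define ad-hoc macros from inside gdb itself with macro
    define and then use them in later expressions. A small illustrative sketch
    (the macro name here is made up, not from the kernel):

    (gdb) macro define PAGE_OFFSET_OF(addr) ((addr) & 0xfff)
    (gdb) p/x PAGE_OFFSET_OF(0xffffffff81946000)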

  6. gdb variables

    Whenever you print a variable in gdb, it prints this weird $NN = before it
    in the output:

    (gdb) p 5+5
    $1 = 10
    

    This is actually a gdb variable that you can use to refer to that same value
    any time later in your session:

    (gdb) p $1
    $2 = 10
    

    You can also assign your own variables for convenience, using set:

    (gdb) set $foo = 4
    (gdb) p $foo
    $3 = 4
    

    This can be useful to grab a reference to some complex expression or similar
    that you’ll be referencing many times, or, for example, for simplicity in
    writing a conditional breakpoint (see tip 1).
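
    For example (an illustrative sketch, reusing the breakpoint from tip 2), a
    convenience variable in a breakpoint condition is a quick way to skip the
    first N hits of a hot breakpoint:

    (gdb) set $hits = 0
    (gdb) break do_mmap_pgoff if $hits++ >= 100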

  7. Register variables

    In addition to the numeric variables, and any variables you define, gdb exposes
    your machine’s registers as pseudo-variables, including some cross-architecture
    aliases for common ones, like $sp for the stack pointer, or $pc for the
    program counter or instruction pointer.

    These are most useful when debugging assembly code or code without debugging
    symbols. Combined with a knowledge of your machine’s calling convention, for
    example, you can use these to inspect function parameters:

    (gdb) break write if $rdi == 2
    

    will break on all writes to stderr on amd64, where the $rdi register is used
    to pass the first parameter (here, the file descriptor).
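
    Return values work the same way: on amd64 an integer return value comes back
    in %rax, so after letting the current function run to completion with finish,
    you can look at it (or reuse it in another expression) via $rax:

    (gdb) finish
    (gdb) p/d $rax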

  8. The x command

    Most people who’ve used gdb know about the print or p command,
    because of its obvious name, but I’ve been surprised how many don’t
    know about the power of the x command.

    x (for “examine”) is used to output regions of
    memory in various formats. It takes two arguments in a slightly
    unusual syntax:

    x/FMT ADDRESS
    

    ADDRESS, unsurprisingly, is the address to examine; it can be an
    arbitrary expression, like the argument to print.

    FMT controls how the memory should be dumped, and consists of (up
    to) three components:

    • A numeric COUNT of how many elements to dump
    • A single-character FORMAT, indicating how to interpret and display each element
    • A single-character SIZE, indicating the size of each element to display.

    x displays COUNT elements of length SIZE each, starting from
    ADDRESS, formatting them according to the FORMAT.

    There are many valid “format” arguments; help x in gdb will give
    you the full list, so here are my favorites:

    x/x displays elements in hex, x/d displays them as signed
    decimals, x/c displays characters, x/i disassembles memory as
    instructions, and x/s interprets memory as C strings.

    The SIZE argument can be one of: b, h, w, and g, for one-,
    two-, four-, and eight-byte blocks, respectively.
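
    Putting the three components together (an illustrative command): x/16xb $sp
    dumps the sixteen bytes at the top of the stack, one hex byte per element.

    (gdb) x/16xb $sp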

    If you have debug symbols so that GDB knows the types of everything
    you might want to inspect, p is usually a better choice, but if
    not, x is invaluable for taking a look at memory.

    [~]$ grep saved_command /proc/kallsyms
    ffffffff81946000 B saved_command_line
    
    
    (gdb) x/s 0xffffffff81946000
    ffffffff81946000 <>:     "root=/dev/sda1 quiet"
    

    x/i is invaluable as a quick way to disassemble memory:

    (gdb) x/5i schedule
       0xffffffff8154804a <schedule>:   push   %rbp
       0xffffffff8154804b <schedule+1>: mov    $0x11ac0,%rdx
       0xffffffff81548052 <schedule+8>: mov    %gs:0xb588,%rax
       0xffffffff8154805b <schedule+17>:    mov    %rsp,%rbp
       0xffffffff8154805e <schedule+20>:    push   %r15
    

    If I’m stopped at a segfault in unknown code, one of the first
    things I try is something like x/20i $pc-40, to get a look at what
    the code I’m stopped at looks like.

    A quick-and-dirty but surprisingly effective way to debug memory
    leaks is to let the leak grow until it consumes most of a program’s
    memory, and then attach gdb and just x random pieces of
    memory. Since the leaked data is using up most of memory, you’ll
    usually hit it pretty quickly, and can try to interpret what it must
    have come from.

~nelhage


Ksplice is hiring!

Do you love tinkering with, exploring, and debugging Linux systems? Does writing Python clones of your favorite childhood computer games sound like a fun weekend project? Have you ever told a joke whose punch line was a git command?

Join Ksplice and work on technology that most people will tell you is impossible: updating the Linux kernel while it is running.

Help us develop the software and infrastructure to bring rebootless kernel updates to Linux, as well as new operating system kernels and other parts of the software stack. We’re hiring backend, frontend, and kernel engineers. Say hello at jobs@ksplice.com!

Renaming ASM Listener in 11gR2

HP-UX, Oracle 11g 11.2.0.1.0, Grid Infrastructure, Oracle Restart. After renaming the ASM listener in Grid Infrastructure Oracle Restart mode, the dependency of the ora.asm resource is not updated to the new listener resource, even if you use the netca tool. For example, after renaming the listener from the default LISTENER to ASM_LISTENER using netca, the new resource ora.ASM_LISTENER.lsnr is successfully […]

Adaptive Cursor Sharing Test (11.2)

There are 1000000 rows in the SST.CARD table; column CARD.C01 has two distinct values: 0.1 and 0.
The cardinality of 0.1 is 1000 and the cardinality of 0 is 999000. Statistics, including a histogram, are fresh.
The bind variable here will be “:id”. When :id is 0.1 and you use the “c01=:id” predicate, an Index Scan SHOULD be used; when “:id” equals 0, a Table Scan SHOULD be used.

# 1st run, try “c01=0.1”; plan uses Index Scan.
SYS@br//scripts> variable id number
SYS@br//scripts> exec :id:=0.1
SYS@br//scripts> select * from sst.card where c01=:id;

[Images: SubZero_E1(0.1).jpeg, Plan_E1(0.1).jpeg]

Oracle 11gR2 + ASM + spfile (eng)

Oracle 11gR2 has many new features in the Grid Computing area. One of them is the Oracle Local Registry (OLR); this registry is designed to store information and profiles for local resources, i.e. resources that are dedicated to a particular node. It improves the performance of accessing local resource profile information, as well as redundancy and manageability. In a Grid Infrastructure RAC configuration there is […]

Describing Performance Improvements (Beware of Ratios)

Recently, I received into my Spam folder an ad claiming that a product could “…improve performance 1000%.” Claims in that format have bugged me for a long time, at least as far back as the 1990s, when some of the most popular Oracle “tips & techniques” books of the era used that format a lot to state claims.

Beware of claims worded like that.

Whenever I see “…improve performance 1000%,” I have to do extra work to decode what the author has encoded in his tidy numerical package with a percent-sign bow. The two performance improvement formulas that make sense to me are these:

  1. Improvement = (b - a)/b, where b is the response time of the task before repair, and a is the response time of the task after repair. This formula expresses the proportion (or percentage, if you multiply by 100%) of the original response time that you have eliminated. It can’t be bigger than 1 (or 100%) without invoking reverse time travel.
  2. Improvement = b/a, where b and a are defined exactly as above. This formula expresses how many times faster the after response time is than the before one.
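
To make the two formulas concrete (numbers mine): if b = 10s and a = 1s, formula #1 gives (10 - 1)/10 = 0.9, a 90% improvement, while formula #2 gives 10/1 = 10, which expressed as a percentage is 1000%. The same two measurements produce very different-sounding claims.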

Since 1000% is bigger than 100%, it can’t have been calculated using formula #1. I assume, then, that when someone says “…improve performance 1000%,” he means that b/a = 10, which, expressed as a percentage, is 1000%. What I really want to know, though, is what were b and a? Were they 1000 and 1? 1 and .001? 6 and .4? (…In which case, I would have to search for a new formula #3.) Why won’t you tell me?

Any time you see a ‘%’ character, beware: you’re looking at a ratio. The principal benefit of ratios is also their biggest flaw. A ratio conceals its denominator. That, of course, is exactly what ratios are meant to do—it’s called normalization—but it’s not always good to normalize. Here’s an example. Imagine two SQL queries A and B that return the exact same result set. What’s better: query A, with a 90% hit ratio on the database buffer cache? or query B, with a 99% hit ratio?

Query   Cache hit ratio
A       90%
B       99%

As tempting as it might be to choose the query with the higher cache hit ratio, the correct answer is…

There’s not enough information given in the problem to answer. It could be either A or B, depending on information that has not yet been revealed.

Here’s why. Consider the two distinct situations listed below. Each situation matches the problem statement. For situation 1, the answer is: query B is better. But for situation 2, the answer is: query A is better, because it does far less overall work. Without knowing more about the situation than just the ratio, you can’t answer the question.

Situation 1
Query   Cache lookups   Cache hits   Cache hit ratio
A       100             90           90%
B       100             99           99%

Situation 2
Query   Cache lookups   Cache hits   Cache hit ratio
A       10              9            90%
B       100             99           99%

Because a ratio hides its denominator, it’s insufficient for explaining your performance results to people (unless your aim is intentionally to hide information, which I’ll suggest is not a sustainable success strategy). It is still useful to show a normalized measure of your result, and a ratio is good for that. I didn’t say you shouldn’t use them. I just said they’re insufficient. You need something more.

The best way to think clearly about performance improvements is with the ratio as a parenthetical additional interesting bit of information, as in:

  • I improved response time of T from 10s to .1s (99% reduction).
  • I improved throughput of T from 42t/s to 420t/s (10-fold increase).

There are three critical pieces of information you need to include here: the before measurement (b), the after measurement (a), and the name of the task (here, T) that you made faster. I’ve talked about b and a before, but I’ve slipped this T thing in on you all of a sudden, haven’t I!

Even authors who give you b and a have a nasty habit of leaving off the T, which is even worse than leaving off the before and after numbers, because it implies that using their magic has improved the performance of every task on the system by exactly the same proportion (either p% or n-fold), which is almost never true. That is because it’s rare for any two tasks on a given system to have “similar” response time profiles (defining similar in the proportional sense). For example, imagine the following two quite dissimilar profiles:

Task A
Response time   Resource
100%            Total
90%             CPU
10%             Disk I/O

Task B
Response time   Resource
100%            Total
90%             Disk I/O
10%             CPU

No single component upgrade can have equal performance improvement effects upon both these tasks. Making CPU processing 2× faster will speed up task A by 45% (half of its 90% of CPU time is eliminated) and task B by only 5%. Likewise, making Disk I/O processing 10× faster will speed up task A by only 9% and task B by 81% (nine tenths of its 90% of I/O time is eliminated).

For a vendor to claim any noticeable, homogeneous improvement across the board on any computer system containing tasks A and B would be an outright lie.
