What is Stack Depth Analysis?
Calling a function or handling an interrupt causes stack memory to be allocated. If the stack memory region -- the portion of a mote's RAM reserved for the stack -- is not large enough to hold the stack, a stack overflow occurs. Stack overflow leads to corrupted RAM and subsequently to difficult, non-deterministic node failures. These failures cannot be reproduced in TOSSIM, though they can be found in a CPU-level simulator such as Avrora or MSPsim.
We call an application stack safe if every stack memory region is large enough to contain the worst-case size of its stack. A TinyOS application that does not use TOSThreads has a single stack and its size can be computed like this:
- stack region size = RAM size - (data segment size + BSS segment size)
Within the stack memory region, these inequalities hold:
- worst observed stack depth <= true worst case stack depth <= worst case depth estimated by static analysis
The worst observed stack depth is the largest stack size seen during testing. Testing methods include observing the maximum extent of the stack using a simulator, periodically printing the value of the stack pointer from an interrupt handler, or filling the stack with known values and seeing how many get overwritten. All testing-based techniques are unsafe in the sense that they can only provide a lower bound on the system's worst case stack memory usage. The true worst case stack depth of a computer program is uncomputable; the best we can do is to try to narrow the gap between the lower and upper bounds. Stack depth analysis -- the subject of this document -- is a technique for computing a safe upper bound on the worst-case stack memory requirements of an application.
Note: Stack overflow -- as the term is used here -- is a totally different problem from buffer overflow (aka buffer overrun, stack-based buffer overflow). The latter can be prevented using standard safe language techniques.
To be sure, many of the applications from tinyos-2.x/apps are totally safe from stack overflow. On the other hand, here are some concrete situations in which you should care about worst-case stack memory usage:
- If your application is short on RAM (for example, it statically allocates 3.5 KB or more on a MicaZ) and you are seeing spurious node failures, stack overflow should be on your short list of potential root causes.
- If your application could benefit from additional RAM allocation, for example to create deeper packet queues or larger data buffers, stack depth analysis offers a quick way to compute how much extra RAM you may safely allocate.
- If your application uses a heap, stack depth analysis makes it easy to choose a safe upper bound on the heap region.
- If your application uses TOSThreads, stack depth analysis makes it easy to choose thread stack sizes (not yet supported-- but soon).
Every computer program has a memory model. For example, a typical UNIX process has a stack that grows down from high memory and a heap that grows up from low memory. In practice, most UNIX programmers do not need to care about the details of this memory model because PCs have plenty of memory and in any case, the virtual memory system gives each process its own 32- or 64-bit address space.
In the most basic case (no heap, no threads) the memory model for a TinyOS application on a platform with 4 KB of RAM (mica2 or MicaZ) looks like this:
The size of each region is immediately apparent. The only remaining question is: Is the stack memory region large enough to contain the actual stack? Let's look at stack allocation in TinyOS in a bit more detail. Ignoring the stack used by main(), the first main consumer of stack memory is tasks:
At any time at most one task may be using the stack:
Interrupts use stack memory on top of the current task memory consumption:
An atomic IRQ runs with interrupts disabled. Therefore, it is not possible for another IRQ to stack on top of an atomic IRQ. However, non-atomic IRQs can stack up:
When TOSThreads is used, each thread gets its own stack. In the current design, interrupts use the current thread stack:
An alternative design that is being explored is to use a dedicated kernel stack for interrupts. In this case, when an interrupt preempts a thread, a small stub is pushed onto the thread's stack, but the bulk of interrupt allocation is on a centralized kernel stack. This saves RAM.
A Stack Depth Analysis Tool for TinyOS
A stack depth checking tool is available from the TinyOS CVS repository, and will be available in the next release after TinyOS 2.1. You can get this tool either by checking out the entire repository or by using this direct link into the CVS tree. The remainder of this tutorial will assume that tos-ramsize is in your path. You could, for example, update to the CVS HEAD and then run "make install" in tinyos-2.x/tools.
Running tos-ramsize Directly
Given an application for one of the AVR platforms, you can run a command like this:
[regehr@babel BaseStation]$ tos-ramsize micaz ./build/micaz/main.exe
The result should be something like:
BSS segment size is 1708, data segment size is 16 The upper bound on stack size is 538 The upper bound on RAM usage is 2262 There are 1834 unused bytes of RAM
Here the tool is telling us that for the BaseStation application on the MicaZ platform, approximately 1.8 KB of RAM is free and could have been allocated for packet buffers or some other purpose.
Running tos-ramsize from the Build System
Run a command like this:
[regehr@babel MultihopOscilloscope]$ make micaz stack-check
The result should be something like:
mkdir -p build/micaz compiling MultihopOscilloscopeAppC to a micaz binary ncc -o build/micaz/main.exe -Os -fnesc-separator=__ -Wall -Wshadow -Wnesc-all -target=micaz -fnesc-cfile=build/micaz/app.c -board=micasb -DDEFINED_TOS_AM_GROUP=0x22 --param max-inline-insns-single=100000 -I/home/regehr/z/tinyos-2.x/tos/lib/net/ -I/home/regehr/z/tinyos-2.x/tos/lib/net/ctp -I/home/regehr/z/tinyos-2.x/tos/lib/net/4bitle -DIDENT_APPNAME=\"MultihopOscillo\" -DIDENT_USERNAME=\"regehr\" -DIDENT_HOSTNAME=\"babel\" -DIDENT_USERHASH=0xaa57ee96L -DIDENT_TIMESTAMP=0x49e3a884L -DIDENT_UIDHASH=0xb4560dc8L -fnesc-dump=wiring -fnesc-dump='interfaces(!abstract())' -fnesc-dump='referenced(interfacedefs, components)' -fnesc-dumpfile=build/micaz/wiring-check.xml MultihopOscilloscopeAppC.nc -lm BSS segment size is 3421, data segment size is 24 The upper bound on stack size is 578 The upper bound on RAM usage is 4023 There are 73 unused bytes of RAM compiled MultihopOscilloscopeAppC to build/micaz/main.exe 25458 bytes in ROM 3445 bytes in RAM avr-objcopy --output-target=srec build/micaz/main.exe build/micaz/main.srec avr-objcopy --output-target=ihex build/micaz/main.exe build/micaz/main.ihex writing TOS image
Here, tos-ramsize is telling us that only 73 bytes of memory are available.
Getting More Information
By default, tos-ramsize runs at a verbosity level of 1, giving the output shown above. By specifying higher levels, you can get more detailed information. For example:
[regehr@babel BaseStation]$ tos-ramsize -verbosity=2 micaz ./build/micaz/main.exe analyzing elf file './build/micaz/main.exe' for platform 'micaz' there are: 112 labels 6252 instructions per-vector results: vector 0 max depth = 127 (not atomic) vector 1 max depth = 5 (atomic) vector 2 max depth = 5 (atomic) vector 3 max depth = 5 (atomic) vector 4 max depth = 5 (atomic) vector 5 max depth = 5 (atomic) vector 6 max depth = 5 (atomic) vector 7 max depth = 39 (atomic) vector 8 max depth = 5 (atomic) vector 11 max depth = 45 (not atomic) vector 12 max depth = 125 (not atomic) vector 13 max depth = 5 (not atomic) vector 14 max depth = 9 (not atomic) vector 15 max depth = 26 (not atomic) vector 16 max depth = 6 (atomic) vector 17 max depth = 132 (atomic) vector 18 max depth = 26 (atomic) vector 20 max depth = 28 (not atomic) vector 24 max depth = 5 (not atomic) vector 25 max depth = 7 (not atomic) vector 26 max depth = 5 (not atomic) vector 27 max depth = 5 (not atomic) vector 28 max depth = 5 (not atomic) vector 29 max depth = 9 (not atomic) vector 30 max depth = 6 (atomic) vector 32 max depth = 5 (not atomic) BSS segment size is 1708, data segment size is 16 The upper bound on stack size is 538 The upper bound on RAM usage is 2262 There are 1834 unused bytes of RAM
Here tos-ramsize has printed the contribution to worst-case stack memory usage of each interrupt vector, and also has told us whether it believes each vector to be atomic (runs with interrupts disabled) or nonatomic. At level 3 the callgraph and the stack memory usage of each function is printed. Higher levels become increasingly useless except for debugging tos-ramsize.
How Stack Analysis Works
Computing the worst-case stack memory used by a function is generally pretty simple: the pushes and pops along each path are added up. A consequence is that each function and interrupt must have zero net stack effect at its return point; tos-ramsize will crash with an error if this appears not to be the case.
The worst case stack depth of a collection of sequential code is equivalent to the longest path through the callgraph where edge weights are determined by per-function stack usage at callsites, with additional edges added to the graph to account for stack memory usage of leaf functions.
To put the sequential results together to create a global result, tos-ramsize assumes that the worst-case stack memory usage of an application occurs when:
- main() is preempted by a non-atomic interrupt at a point where it is using its maximal amount of stack memory
- the non-atomic interrupt is preempted by another nonatomic interrupt at its point of maximum stack usage, and so on
- finally, an atomic interrupt arrives
Thus, an application's worst-case stack depth is the stack usage of main() plus the sum of the stack memory used by all nonatomic interrupt handlers plus the maximum stack memory used by any atomic interrupt handler. The assumption is that interrupt handlers may nest but that they may not reenter. These assumptions are discussed in more detail below.
One might argue that the worst-case stack depth has a low probability of being reached in practice. In general, when multiple interrupt handlers are involved, this is true. The wrong way to reclaim some RAM is to optimistically assume that the worst case will never happen. The right ways to reclaim some RAM are discussed below.
In some cases, tos-ramsize really is overly conservative. For example:
- Two interrupt handlers may never be runnable at the same time, for example because they service hardware devices that cannot be used concurrently.
- A nonatomic interrupt may only enable interrupts just before returning, at a point where it is using almost no stack memory.
- Some functions and/or interrupt handlers may be dead code. They will contribute to estimated stack depth but not to the actual worst-case stack depth.
In general, dealing with these issues is beyond the scope of tos-ramsize, which is intended to be simple. Of course, you should feel free to provide patches that increase its precision, or parse its output and put the results together in new ways that permit more precise analysis.
The correctness of tos-ramsize's results is predicated on some important assumptions.
No Reentrant Interrupts
A reentrant interrupt is one that may have multiple instances on the stack at the same time. For example, consider the following scenario where your application requests high-frequency timer interrupts. First, the timer device requests an interrupt, causing your handler to run. Second, your handler re-enables interrupts. Third, either your handler runs for longer than expected or else a second interrupt handler preempts the timer handler. Finally, the timer device requests its next interrupt, causing the timer handler to be reentered.
tos-ramsize assumes that this scenario never happens. There are three ways to avoid reentrant interrupts:
- Make all interrupt handlers atomic, meaning that they do not enable interrupts at any point. TinyOS for MSP430 takes this option.
- For every non-atomic interrupt handler in your application, ensure that it returns-from-interrupt before the same interrupt source becomes pending again. In other words, prove that the maximum execution time of the interrupt handler -- including the maximum execution time of all interrupts that may preempt it -- is smaller than the minimum interarrival time of that interrupt. These proofs are usually not easy.
- Use a platform that supports prioritized interrupts in a reasonable fashion. The AVR does not: its prioritization scheme only applies to pending interrupts, not executing interrupts.
Control Flow Integrity
Basically, every function must return control to the instruction that follows the call that invoked it. This is normally the case. Control flow integrity may be violated if the compiler or application is buggy. In the most common scenario, a pointer or array bug in the application permits a return address on the stack to be accidentally or deliberatively overwritten. To avoid this problem you can write correct code or you can use Safe TinyOS.
Interrupts Are Correctly Classified as Atomic/Nonatomic
tos-ramsize attempts to classify interrupts as atomic or non-atomic. As previously mentioned, an interrupt is non-atomic if on at least one execution path it runs at least one instruction with interrupts enabled. tos-ramsize marks an interrupt handler as non-atomic if the sei instruction is executed in any function reachable from that interrupt handler. Stores to SREG, on the other hand, are assumed to be part of nesc_atomic_end() calls-- which only enable interrupts if they were previously enabled.
No Hidden Stack Pointer Updates
tos-ramsize cannot predict worst-case stack depth unless it can correctly interpret all manipulation of the stack pointer. On the AVR architecture, indirect stores can be used to change the SP. The compiler will not generate code that does this, nor should you write code that does.
Common Failure Modes
tos-ramsize tries to die if it detects a condition that prevents successful stack depth analysis. These include the following.
"cannot estimate stack depth due to recursive loop"
Recursion can sneak into a TinyOS application in several ways. First, an application can call a library function such as quicksort(). Second, component programming errors such as signaling events from commands can easily lead to recursive loops. tos-ramsize checks for recursive loops. If the callgraph is cyclic, the shortest cycle is printed and the tool terminates. The callgraph cycle that is printed will be in terms of the compiled code where function inlining has taken place, so a bit of digging will be required in order to find the source-level recursive loop. tos-ramsize must abort when recursion is detected because it cannot predict the maximum number of times that loop will be traversed.
All-pairs shortest path is, for practical purposes, O(n^3), making recursion checking the slowest part of tos-ramsize.
Recommended action: Remove the recursive loop from your application. If the loop is in TinyOS, ask the subsystem maintainer to remove it. If the loop is in the C library, send a bug report to tinyos-help to get an exception added to tos-ramsize for this situation.
"cannot process raw indirect call/jump"
tos-ramsize must know the entire set of possible targets for every call or jump that it encounters in the object code. This is only challenging for indirect calls/jumps. Some forms of indirect control flow, such as indirect jumps that come from compiling switches and indirect calls in certain library functions, are recognized as special cases. However, when an unknown indirect call/jump is found, the tool must terminate.
Recommended action: If the indirect call/jump is your fault (for example, due to use of a function pointer or setjmp/longjmp), stop using that construct. If the indirect call/jump is not your fault, report a bug.
Reducing Stack Memory Usage
As a programmer, there are a few easy things you can do to ensure that stack memory is not wasted:
- Make interrupt handlers atomic whenever possible. This is good programming practice anyway.
- Run as little code as possible in interrupt context. Tasks use stack memory more efficiently than interrupts do. This is also good programming practice.
- Avoid declaring large structs and arrays on the stack. Of course, declaring them as global simply shifts the cost around; algorithmic changes may be required to save memory.
- Avoid excessively long call chains, but note that gcc is often very effective at making these go away through inlining. An important way to break up a long call chain in TinyOS is to post a task to do something, rather than calling the code directly.
- Understand the stack memory requirements of library code called by your application. tos-ramsize can tell you about this if you crank up the verbosity. Or just run "objdump -d" on your elf file and look at the library code directly.
- Avoid creating an excessive number of threads, if using TOSThreads. Unlike tasks, which share stack memory, threads have separate stacks. Furthermore, each thread requires a large enough stack to support a full nesting of interrupts without overflowing onto the next thread's stack.
- Only the AVR platforms are supported. MSP430 is not difficult to support but this is not super high priority since the MSP430 variants used by TinyOS have a relatively generous 10 KB of SRAM.
- TOSThreads is not supported. This support is highly desirable because it will permit each thread to be provided with precisely as much stack memory as it requires. However, the details are still being worked out.
Please send problem reports and other feedback to tinyos-help or tinyos-devel, as appropriate.
Stack analysis for TinyOS:
- Eliminating Stack Overflow by Abstract Interpretation
- Abstractions for Safe Concurrent Programming in Networked Embedded Systems
Stack analysis for MantisOS:
Stack overflow in general: