Saturday, August 16, 2008

Memory Layout of a C Program - Stack Wise



high    |---------------|
        |               |
        | Arguments and |
        |  environment  |
        |   variables   |
        |               |
        |---------------|
        |     Stack     |<--|--
        |(grow downward)|   |
        |               |   |User
        |               |   |Stack
        |               |   |Frame
        |               |   |
        | (grow upward) |   |( Mind the Gap )
        |      Heap     |<--|--
        |---------------|
        |      BSS      |<-- uninitialized static data (block started by symbol)
        |               |      long  sum[1000];
        |---------------|
        |      Data     |<-- initialized static data (int   maxcount = 99)
        |---------------|
        |      Code     |<-- text segment (machine instructions)
low     |---------------|

Stack: Where automatic variables are stored, along with information that is
saved each time a function is called. Each time a function is called, the
address of where to return to and certain information about the caller's
environment, such as some of the machine registers, are saved on the stack. The
newly called function then allocates room on the stack for its automatic and
temporary variables. This is how recursive functions in C can work. Each time a
recursive function calls itself, a new stack frame is used, so one set of
variables doesn't interfere with the variables from another instance of the
function.
Text Segment: The text segment contains the actual code to be executed. It's
usually sharable, so multiple instances of a program can share the text segment
to lower memory requirements. This segment is usually marked read-only so a
program can't modify its own instructions.
Initialized Data Segment: This segment contains global and static variables
that the programmer explicitly initializes, such as int maxcount = 99.
Uninitialized Data Segment: Also named "bss" (block started by symbol), after
an operator used by an old assembler. This segment contains global and static
variables that have no explicit initializer. All variables in this segment are
initialized to 0 or null pointers before the program begins to execute.
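
A minimal sketch of the two data segments, reusing the sum and maxcount declarations from the diagram above. The helper names sum_is_all_zero and current_maxcount are made up for illustration. Which segment a toolchain actually uses for each object is an implementation detail; what C itself guarantees is that static objects without an initializer start out as zero.

```c
#include <stddef.h>

long sum[1000];      /* no initializer: typically placed in BSS, starts as all zeros */
int  maxcount = 99;  /* explicit initializer: typically placed in Data */

/* Return 1 if every element of sum is zero, 0 otherwise. */
int sum_is_all_zero(void)
{
    for (size_t i = 0; i < sizeof sum / sizeof sum[0]; i++) {
        if (sum[i] != 0)
            return 0;
    }
    return 1;
}

/* Return the current value of maxcount (99 unless something has changed it). */
int current_maxcount(void)
{
    return maxcount;
}
```

Called at the start of main(), sum_is_all_zero() should return 1 and current_maxcount() should return 99, even though the program never assigned anything to sum.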
The stack: The stack is a collection of stack frames, which are described
below. When a new frame needs to be added (as a result of a newly called
function), the stack grows downward, toward lower addresses, on most machines.
Every time a function is called, an area of memory is set aside, called a stack frame,
for the new function call. This area of memory holds some crucial information, like:
1. Storage space for all the automatic variables for the newly called function.
2. The return address: the point in the calling function where execution resumes
when the called function returns.
3. The arguments, or parameters, of the called function.
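
The following sketch tries to make the "one frame per call" idea visible. It is only an illustration: record_frames, frames_are_distinct, frame_addr and MAX_DEPTH are hypothetical names, and the direction in which the addresses move is platform dependent, so the code only checks that each active call has its own copy of the local variable.

```c
#include <stdint.h>

#define MAX_DEPTH 4

/* Address of the local variable seen at each recursion depth. */
uintptr_t frame_addr[MAX_DEPTH];

/* Each call gets its own `local`; record where it lives, then recurse. */
int record_frames(int depth)
{
    volatile int local = depth;            /* automatic variable in this frame */
    frame_addr[depth] = (uintptr_t)&local;
    if (depth + 1 < MAX_DEPTH)
        record_frames(depth + 1);
    return local;                          /* still `depth`: inner calls did not touch it */
}

/* Return 1 if every recorded address differs from all the others. */
int frames_are_distinct(void)
{
    for (int i = 0; i < MAX_DEPTH; i++)
        for (int j = i + 1; j < MAX_DEPTH; j++)
            if (frame_addr[i] == frame_addr[j])
                return 0;
    return 1;
}
```

After a call to record_frames(0), frames_are_distinct() should return 1, because all four calls were active at the same time and each needed its own storage for local.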
The heap: Most dynamic memory, whether requested via C's malloc() and friends
or C++'s new, is doled out to the program from the heap. The C library also gets
dynamic memory for its own personal workspace from the heap. As more memory is
requested "on the fly", the heap grows upward.
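
As a small sketch of heap use, the function below requests memory with malloc() and hands it back to the caller. The name make_squares is hypothetical, and the code assumes the caller releases the block with free() when done.

```c
#include <stdlib.h>

/* Allocate an array of n ints on the heap and fill it with squares.
   Returns NULL if n is 0 or the allocation fails; the caller must free(). */
int *make_squares(size_t n)
{
    if (n == 0)
        return NULL;
    int *squares = malloc(n * sizeof *squares);
    if (squares == NULL)
        return NULL;
    for (size_t i = 0; i < n; i++)
        squares[i] = (int)(i * i);
    return squares;
}
```

Unlike an automatic variable, the array outlives the call to make_squares() because it does not live in that function's stack frame.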

Friday, August 15, 2008

BIG / LITTLE ENDIANNESS - Interpreting Data

Now let's do an example with multi-byte data (finally!). Quick review: a "short int" is typically a 2-byte (16-bit) number, which can range from 0 to 65535 (if unsigned). Let's use it in an example:
short *s; // pointer to a short int (2 bytes)
s = 0; // point to location 0; *s is the value
So, s is a pointer to a short, and is now looking at byte location 0 (which has W). Here memory holds the byte W = 0x12 at location 0 and the byte X = 0x34 at location 1. What happens when we read the value at s?
* Big endian machine: I think a short is two bytes, so I'll read them off: location s is address 0 (W, or 0x12) and location s + 1 is address 1 (X, or 0x34). Since the first byte is biggest (I'm big-endian!), the number must be 256 * byte 0 + byte 1, or 256*W + X, or 0x1234. I multiplied the first byte by 256 (2^8) because I needed to shift it over 8 bits.
* Little endian machine: I don't know what Mr. Big Endian is smoking. Yeah, I agree a short is 2 bytes, and I'll read them off just like him: location s is 0x12, and location s + 1 is 0x34. But in my world, the first byte is the littlest! The value of the short is byte 0 + 256 * byte 1, or 256*X + W, or 0x3412.

Keep in mind that both machines start from location s and read memory going upwards. There is no confusion about what location 0 and location 1 mean. There is no confusion that a short is 2 bytes. But do you see the problem? The big-endian machine thinks *s = 0x1234 and the little-endian machine thinks *s = 0x3412. The same exact data gives two different numbers. Probably not a good thing.
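
The two readings can be written out as plain arithmetic on the bytes, which gives the same answer no matter what machine runs it. This is only a sketch: read_big_endian and read_little_endian are hypothetical helper names, and the two bytes 0x12 and 0x34 are the W and X from the example above.

```c
#include <stdint.h>

/* Big-endian reading: the byte at the lower address is the most significant. */
uint16_t read_big_endian(const uint8_t *p)
{
    return (uint16_t)(256u * p[0] + p[1]);
}

/* Little-endian reading: the byte at the lower address is the least significant. */
uint16_t read_little_endian(const uint8_t *p)
{
    return (uint16_t)(p[0] + 256u * p[1]);
}
```

With the bytes { 0x12, 0x34 } in memory, read_big_endian() gives 0x1234 and read_little_endian() gives 0x3412, matching the two machines' answers.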
Test - BIG / LITTLE Endianness of your system ...
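#include <stdio.h>

/* Look at the byte stored at the lowest address of a multi-byte int. */
void FindLittleOrBig(void)
{
    int i = 0x12345678;
    if (*(char *)&i == 0x12)          /* most significant byte comes first */
        printf("Big endian\n");
    else if (*(char *)&i == 0x78)     /* least significant byte comes first */
        printf("Little endian\n");
}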

Another way to test
#include <stdio.h>

int main(void)
{
    /* The union lets us view the bytes of the short one at a time. */
    union {
        short s;
        char c[sizeof(short)];
    } un;

    un.s = 0x0102;
    if (sizeof(short) == 2)
    {
        if (un.c[0] == 1 && un.c[1] == 2)
            printf("big-endian\n");
        else if (un.c[0] == 2 && un.c[1] == 1)
            printf("little-endian\n");
        else
            printf("unknown\n");
    }
    else
    {
        /* sizeof yields a size_t, so print it with %zu rather than %d */
        printf("sizeof(short) = %zu\n", sizeof(short));
    }
    return 0;
}