gcc assembler output of printf arg list

Monday, 28 August 2017

I learned MIPS assembly in a systems-level programming course last semester, and I am now looking into the Intel and AMD (x86_64) architectures.

I was having trouble writing a simple x86_64 program in GAS that calls printf and prints argc and argv[0] through argv[4]. To help me understand how to do it correctly, I used "gcc -S" to look at the assembly that gcc generates for the C source file "test.c":

#include <stdio.h>

int main (int argc, char * argv[]) {
    printf("%d,%s,%s,%s,%s,%s\n", argc, argv[0], argv[1], argv[2], argv[3], argv[4]);
    return 0;
}

The output of "gcc -S -masm=intel test.c" was:

    .file   "test.c"
    .intel_syntax noprefix
    .section    .rodata
.LC0:
    .string "%d,%s,%s,%s,%s,%s\n"
    .text
    .globl  main
    .type   main, @function
main:
.LFB0:
    .cfi_startproc
    push    rbp
    .cfi_def_cfa_offset 16
    .cfi_offset 6, -16
    mov rbp, rsp
    .cfi_def_cfa_register 6
    sub rsp, 32
    mov DWORD PTR [rbp-4], edi
    mov QWORD PTR [rbp-16], rsi
    mov rax, QWORD PTR [rbp-16]
    add rax, 32
    mov rsi, QWORD PTR [rax]
    mov rax, QWORD PTR [rbp-16]
    add rax, 24
    mov r8, QWORD PTR [rax]
    mov rax, QWORD PTR [rbp-16]
    add rax, 16
    mov rdi, QWORD PTR [rax]
    mov rax, QWORD PTR [rbp-16]
    add rax, 8
    mov rcx, QWORD PTR [rax]
    mov rax, QWORD PTR [rbp-16]
    mov rdx, QWORD PTR [rax]
    mov eax, DWORD PTR [rbp-4]
    mov QWORD PTR [rsp], rsi
    mov r9, r8
    mov r8, rdi
    mov esi, eax
    mov edi, OFFSET FLAT:.LC0
    mov eax, 0
    call    printf
    mov eax, 0
    leave
    .cfi_def_cfa 7, 8
    ret
    .cfi_endproc
.LFE0:
    .size   main, .-main
    .ident  "GCC: (GNU) 4.7.2 20121109 (Red Hat 4.7.2-8)"
    .section    .note.GNU-stack,"",@progbits

To be honest, I don't think I completely understand what is going on in lines 18-38. To me, it looks like gcc stores argc and the argv pointer (the address of argv[0]) on the stack at [rbp-4] and [rbp-16], then reloads [rbp-16] into rax as a base address, adds 8, 16, 24, ... so that rax points at argv[1], argv[2], argv[3], ..., and finally loads the pointer stored at that address into the appropriate register to pass to printf.
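To check that reading, here is a minimal C sketch of the same address arithmetic. The helper name arg_at_byte_offset is made up for illustration and is not part of test.c. It assumes a flat address space with 8-byte pointers, as on x86_64, and is only meant to mirror the load/add/load pattern from the listing, not to describe how gcc works internally.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not from test.c): fetch argv[index] the way the
   gcc -S listing appears to, by adding a byte offset to the argv base
   address and then loading the pointer stored at that address. */
char *arg_at_byte_offset(char **argv, size_t index)
{
    assert(argv != NULL);

    /* like "mov rax, QWORD PTR [rbp-16]": start from the argv base address */
    const char *base = (const char *)argv;

    /* like "add rax, 8*index": each slot holds one char *,
       which is sizeof(char *) bytes (8 on x86_64) */
    const char *slot = base + index * sizeof(char *);

    /* like "mov reg, QWORD PTR [rax]": load the char * stored in that slot */
    return *(char *const *)slot;
}
```

If this matches what the compiler does, then calling arg_at_byte_offset(argv, 2) from main should return the same pointer as argv[2], and the "add rax, 16" in the listing is just the index 2 scaled by the pointer size.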

With that interpretation, I was able to understand enough about how the command line arguments are passed to main() to be able to fix my GAS code to this:

.intel_syntax noprefix
.globl  main

.data
fmt:    .asciz  "%d,%s,%s,%s,%s,%s\n"

.text

main:
    push    rbp
    mov     rbp, rsp

    mov     rdx, QWORD PTR [rsi]
    mov     rcx, QWORD PTR [rsi+8]
    mov     r8, QWORD PTR [rsi+16]
    mov     r9, QWORD PTR [rsi+24]
    push    [rsi+32]
    mov     rsi, rdi

    mov     rdi, offset fmt
    xor     rax, rax
    call    printf

return:
    mov     rsp, rbp
    pop     rbp
    xor     rax, rax
    ret

This produces the same output as test.c and as the gcc-generated test.s. So my question is: is there anything wrong with the way I did it? And if not, why would gcc generate such a complicated sequence for something this simple? Maybe it's just the way the compiler handles array accesses?

I suppose my way is technically correct since it produces the same output, but I want to make sure it is an "acceptable" way to do it.
