Is inline assembly language slower than native C++ code?

Question

I tried to compare the performance of inline assembly language and C++ code, so I wrote a function that adds two arrays of size 2000, 100000 times. Here's the code:

    #define TIMES 100000

    void calcuC(int *x,int *y,int length)
    {
        for(int i = 0; i < TIMES; i++)
        {
            for(int j = 0; j < length; j++)
                x[j] += y[j];
        }
    }

    void calcuAsm(int *x,int *y,int lengthOfArray)
    {
        __asm
        {
            mov edi,TIMES
            start:
            mov esi,0
            mov ecx,lengthOfArray
            label:
            mov edx,x
            push edx
            mov eax,DWORD PTR [edx + esi*4]
            mov edx,y
            mov ebx,DWORD PTR [edx + esi*4]
            add eax,ebx
            pop edx
            mov [edx + esi*4],eax
            inc esi
            loop label
            dec edi
            cmp edi,0
            jnz start
        };
    }

Here's `main()`:

    int main() {
        bool errorOccured = false;
        setbuf(stdout,NULL);
        int *xC,*xAsm,*yC,*yAsm;
        xC = new int[2000];
        xAsm = new int[2000];
        yC = new int[2000];
        yAsm = new int[2000];
        for(int i = 0; i < 2000; i++)
        {
            xC[i] = 0;
            xAsm[i] = 0;
            yC[i] = i;
            yAsm[i] = i;
        }
        time_t start = clock();
        calcuC(xC,yC,2000);
        //    calcuAsm(xAsm,yAsm,2000);
        //    for(int i = 0; i < 2000; i++)
        //    {
        //        if(xC[i] != xAsm[i])
        //        {
        //            cout<<"xC["<<i<<"]="<<xC[i]<<" "<<"xAsm["<<i<<"]="<<xAsm[i]<<endl;
        //            errorOccured = true;
        //            break;
        //        }
        //    }
        //    if(errorOccured)
        //        cout<<"Error occurs!"<<endl;
        //    else
        //        cout<<"Works fine!"<<endl;
        time_t end = clock();
        //    cout<<"time = "<<(float)(end - start) / CLOCKS_PER_SEC<<"\n";
        cout<<"time = "<<end - start<<endl;
        return 0;
    }

Then I ran the program five times to get the cycles of the processor, which can be taken as time. Each time I called only one of the functions mentioned above. Here are the results.

Function of assembly version:

    Debug      Release
    ------------------
    732        668
    733        680
    659        672
    667        675
    684        694
    Average:   677

Function of C++ version:

    Debug     Release
    -----------------
    1068      168
     999      166
    1072      231
    1002      166
    1114      183
    Average:  182

The C++ code in release mode is almost 3.7 times faster than the assembly code. Why?

I guess that the assembly code I wrote is not as efficient as the code generated by GCC. It's hard for a common programmer like me to write code faster than its counterpart generated by a compiler. Does that mean I should not trust the performance of assembly language written by my own hands, focus on C++, and forget about assembly language?

Sid M. · Accepted Answer

Hello!

Your question touches on several related subjects, not the least of which is the following:

In Donald Knuth's paper "Structured Programming with go to Statements", he wrote: "Programmers waste enormous amounts of time thinking about, or worrying about, the speed of noncritical parts of their programs, and these attempts at efficiency actually have a strong negative impact when debugging and maintenance are considered. We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%."

The lesson is that you shouldn't be concerned with replacing your C++ code with assembler until you know, because you have done performance analysis, that (a) it's necessary and (b) it will be effective.

Now, with respect to your example, I can make several comments and suggestions.

First, most compilers (including gcc) can emit the assembler they generate for your examination. I recommend that you do so, because it is almost certain to show you just why your hand-written assembler isn't as fast as that generated by the compiler.

Now, looking at the assembler code you wrote, I can find places that are correct (i.e., they work) but aren't optimal, and that result in unnecessary slowness.

    void calcuAsm(int *x, int *y, int lengthOfArray) {
      __asm {
        mov edi,TIMES
      start:
        mov esi,0
        mov ecx,lengthOfArray
      label:
        mov edx,x
        push edx
        mov eax,DWORD PTR [edx + esi*4]
        mov edx,y
        mov ebx,DWORD PTR [edx + esi*4]
        add eax,ebx
        pop edx
        mov [edx + esi*4],eax
        inc esi
        loop label
        dec edi
        cmp edi,0
        jnz start
      };
    }

Here are some suggestions:
In the inner loop, which begins at `label` and ends with `loop label`, you reload the pointers x and y on every iteration. Move these loads outside the loop instead. You'll avoid the push/pop, too.
Rather than loading both values to be added (into registers eax and ebx) and then storing the result, load one of them into a register, then add that register directly to memory. This avoids a load, a store, and an extra register.
The loop instruction, though convenient, is notoriously slower than the equivalent pair (decrement a register, jump if not zero).
You can remove the `cmp edi,0` instruction, because the preceding `dec edi` already sets the zero flag that the following `jnz start` tests.
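
Taken together, the four suggestions above might look roughly like the sketch below. It is an illustration, not the answer author's code, and the inline-assembly path has not been measured here. The names `calcuAsmTuned`, `outer_loop`, `inner_loop`, and `timeMs` are made up for this example. The `__asm` branch assumes 32-bit MSVC, as the syntax in the question suggests; on any other compiler the function falls back to plain C++ with the same loop shape (pointers set up once, count-down loops, add straight into memory), so the file still builds and can be checked against `calcuC`. Like the original `loop`-based version, it assumes `lengthOfArray` is greater than zero.

```cpp
#include <cassert>
#include <chrono>

#define TIMES 100000

// Reference version from the question, kept for comparing results.
void calcuC(int *x, int *y, int length)
{
    for (int i = 0; i < TIMES; i++)
        for (int j = 0; j < length; j++)
            x[j] += y[j];
}

// Hypothetical tuned routine applying the four suggestions.
void calcuAsmTuned(int *x, int *y, int lengthOfArray)
{
    assert(lengthOfArray > 0);  // a zero count would wrap the inner counter
#if defined(_MSC_VER) && defined(_M_IX86)
    __asm
    {
        mov edi, TIMES                     ; outer counter
        mov edx, x                         ; suggestion 1: load both pointers once,
        mov ebx, y                         ;   so no push/pop is needed
    outer_loop:
        xor esi, esi                       ; j = 0
        mov ecx, lengthOfArray
    inner_loop:
        mov eax, DWORD PTR [ebx + esi*4]   ; suggestion 2: load only y[j] ...
        add DWORD PTR [edx + esi*4], eax   ;   ... and add it to x[j] in memory
        inc esi
        dec ecx                            ; suggestion 3: dec/jnz instead of loop
        jnz inner_loop
        dec edi                            ; suggestion 4: dec sets the zero flag,
        jnz outer_loop                     ;   so no cmp edi,0 is needed
    }
#else
    // Portable fallback with the same structure, for non-MSVC builds.
    for (int i = TIMES; i != 0; --i) {
        int *px = x;
        const int *py = y;
        for (int n = lengthOfArray; n != 0; --n)
            *px++ += *py++;
    }
#endif
}

// Hypothetical helper: runs `work` once and returns elapsed milliseconds.
template <typename Work>
double timeMs(Work work)
{
    const auto begin = std::chrono::steady_clock::now();
    work();
    const auto end = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(end - begin).count();
}
```

A reasonable way to use this is to run `calcuC` and `calcuAsmTuned` on identical inputs, confirm the arrays match, and only then compare timings from an optimized build with something like `timeMs([&] { calcuAsmTuned(x, y, 2000); })`. To see what the compiler itself generates for `calcuC`, GCC's `-S` option (for example `g++ -O2 -S`) writes the assembler to a `.s` file, and MSVC's `/FA` option produces an assembly listing.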
