Subject: Re: expensive procedure calls Date: 9 Mar 2002 20:28:35 GMT From: anton@mips.complang.tuwien.ac.at (Anton Ertl) Organization: Institut fuer Computersprachen, Technische Universitaet Wien Newsgroups: comp.os.vms,comp.arch,comp.sys.intel,alt.folklore.computers Followup-To: comp.arch In article <3C7DEE45.C1F8A4BA@hda.hydro.com>, Terje Mathisen writes: >I've done something almost unheard of in this regard: > >I've actually tried multiple different implementation methods, all >written in both C and asm to get full control, and then timed them, just >to find the fastest method. > >Yeah, I know that this is bad 'cause it can stop a nice discussion, but >I wanted to win a competition, so please excuse me. :-) ... >Anyway, the important part is that plain recursion can indeed be very >fast, in my case it was almost certainly due to keeping the code working >set really small, since the data was the same for all versions. Ok, in this spirit I coded up a number of variants of tree-walking, because that was the example discussed elsewhere in this thread, and measured their performance. You can find the source code below, here are some explanatory comments: I used node-counting instead of node-printing, because node-printing is very likely dominated by printing and not by tree-walking. I think I did preserve the order of visits (depth-first left-to-right) in all variants, though. The code does not build a full tree, but a minimal DAG with n nodes for simulating a tree with 2^n-1 nodes. The various variants are: 1) The straightforward node-counting function is not tail-recursive (it performs an add after the second call). 2) It can be converted into a tail-recursive function by rearranging the computation with an additional parameter; gcc (with the options I used) eliminates the tail recursion in this case. 3) Tail recursion eliminated by hand. 4) Complete recursion elimination. This uses a stack remembering the spine of right-child nodes; the stack is fixed-size, which would not work in general; a general version with a growable stack would probably be somewhat slower. 5) This is 4), optimized by checking the right child earlier (analogous optimizations should also speed up all other variants, at the expense of more complexity). I guess the perceptive readers will find some errors in the code, but at least the output in the benchmark case is correct, so I hope the errors don't affect the results. I used the following timing script: for i in 1 2 3 4 5; do gcc -O3 -fomit-frame-pointer -DOPT=$i tree-walk.c; echo "OPT=$i:"; perfex -e 4100c0 a.out 20; done The timings (cycle results) on my Athlon-1200 show quite a bit of variation, but most speed differences are larger than these variations (except the speed differences between 2 and 3); the output of 6 repetitions was massaged into the following table: run 1 run 2 run 3 run 4 run 5 run 6 OPT=1: nodes = 1048575 cycles : 30339154 30361572 30217474 30939957 30182402 30783846 instructions : 34938295 34938295 34938294 34938294 34938294 34938294 OPT=2: nodes = 1048575 cycles : 22274720 21604934 21070969 21106644 22821884 22516939 instructions : 24452545 24452544 24452545 24452545 24452545 24452545 OPT=3: nodes = 1048575 cycles : 20311913 19708046 21274829 19942656 22096462 20135104 instructions : 23403970 23403971 23403971 23403971 23403971 23403970 OPT=4: nodes = 1048575 cycles : 13830580 13087445 12958420 13356717 12746190 12471338 instructions : 14491094 14491095 14491093 14491093 14491095 14491093 OPT=5: nodes = 1048575 cycles : 10992514 11753863 10990741 11310144 11019757 11570805 instructions : 10821077 10821077 10821076 10821076 10821077 10821077 So, it seems like eliminating recursion still pays off. I guess the speed difference between 2 and 3 is caused by the additional parameter in the recursive calls in 2. Followups set to comp.arch. - anton And here's the code: #include #include typedef struct node { struct node *left, *right; } node; int count_nodes(node *n); int main(int argc, char **argv) { unsigned depth; int i; node nodes[40]; if (argc!=2) { fprintf(stderr, "Usage: %s depth\n", argv[0]); exit(1); } depth=atoi(argv[1]); /* build a dag of depth depth */ nodes[0].left = nodes[0].right = NULL; for (i=1; ileft) + countnodes(n->right) + 1; } #elif OPT==2 int countnodes2(node *n, int c) { if (n==NULL) return c; c += countnodes2(n->left,0)+1; return countnodes2(n->right,c); } int countnodes(node *n) { return countnodes2(n,0); } #elif OPT==3 int countnodes(node *n) { int c=0; for (; n!=NULL; n=n->right) c+=countnodes(n->left)+1; return c; } #elif OPT==4 int countnodes(node *n) { node *stack[40]; node **sp=stack; int c = 0; *sp++ = n; while (sp>stack) for (n = *--sp; n!=NULL; n=n->left) { *sp++ = n->right; c++; } return c; } #elif OPT==5 int countnodes(node *n) { node *stack[40]; node **sp=stack; int c = 0; if (n==NULL) return 0; *sp++ = n; while (sp>stack) for (n = *--sp; ;) { if (n->right) *sp++ = n->right; c++; n=n->left; if (n==NULL) break; } return c; } #endif - anton -- M. Anton Ertl Some things have to be seen to be believed anton@mips.complang.tuwien.ac.at Most things have to be believed to be seen http://www.complang.tuwien.ac.at/anton/home.html