Performance Considerations
We run an online profiler that measures several interesting calls.
Note that it runs on a live system on fairly old hardware.
Most performance-relevant issues are encapsulated in the
underlying language implementation.
Page Status
These notes are really outdated; last modification 21st Oct 2003!
TODO: there should be a related link section:
http://www.kegel.com/c10k.html
Very Old Notes
The rest of the document is pretty old.
It's only of historical interest and maybe good for
comparison of different implementation strategies.
Within Askemos, optimizing for human understanding matters more
than prematurely optimizing for speed or memory consumption.
Therefore many things are implemented in a straightforward way,
and there is room for optimization.
All algorithms that operate on variable-size data are (supposed
to be) of the lowest known complexity (currently this means solved in
linear time). This means that optimization effort can be traded for
hardware, especially since the software is coded functionally and is
highly threaded anyway.
An issue exists with the implementation of node-list. TODO: it
should be done in a lazy way (see rdp.scm). I guess this could be
faster if applications don't have to traverse the whole document.
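To illustrate why a lazy node-list could help, here is a minimal sketch in Python rather than Scheme. It is not the approach of rdp.scm; `Node`, `node_list_lazy` and `first_named` are hypothetical names made up for this example. The generator hands out nodes one at a time, so a caller that stops early never touches the rest of the document.

```python
from typing import Iterator, List, Optional


class Node:
    """Hypothetical document node: a name plus child nodes."""

    def __init__(self, name: str, children: Optional[List["Node"]] = None):
        self.name = name
        self.children = children or []


def node_list_lazy(root: Node) -> Iterator[Node]:
    """Yield nodes in document order, one at a time, on demand."""
    stack = [root]
    while stack:
        node = stack.pop()
        yield node
        # Push children reversed so the leftmost child is visited first.
        stack.extend(reversed(node.children))


def first_named(root: Node, name: str) -> Optional[Node]:
    """Return the first node with the given name, or None.

    Traversal stops at the first match; later nodes are never visited.
    """
    return next((n for n in node_list_lazy(root) if n.name == name), None)


doc = Node("html", [Node("head", [Node("title")]),
                    Node("body", [Node("p"), Node("p")])])
match = first_named(doc, "title")
```

An eager node-list would build the complete list before the caller sees the first element; here the cost is proportional to the number of nodes actually consumed.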
There are some data structures whose size has an upper bound (i.e.,
the slots of a place). Some operations currently use a quadratic
algorithm (assoc) on them. This is accepted because of the small size.
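As a rough illustration of where the quadratic cost comes from, the following Python sketch mimics an association-list lookup. The function `copy_slots` and the slot names are hypothetical and not taken from the Askemos source. A single lookup scans the list linearly, so looking up every key of a list against the same list costs on the order of n * n comparisons, which stays cheap as long as n has a small upper bound.

```python
def assoc(key, alist):
    """Return the first (key, value) pair whose key matches, else None.

    This is a linear scan, in the spirit of Scheme's assoc
    (which returns #f instead of None when nothing matches).
    """
    for pair in alist:
        if pair[0] == key:
            return pair
    return None


def copy_slots(names, slots):
    """Look up each name with assoc.

    Worst case: len(names) * len(slots) key comparisons.
    """
    return [assoc(name, slots) for name in names]


# Hypothetical slots of a place; the names are made up for this example.
slots = [("creator", "alice"), ("date", "2003-01-31"), ("type", "text/xml")]
found = copy_slots(["type", "creator"], slots)
```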
Please don't try to optimize for space or speed at the expense of
human understanding. By the coding style rules (to be written down), side
effects must be commented, and no algorithm may ever trade algorithmic
complexity for either size or speed.
To be taken into account
The Google file system
(which shares a few LLD decisions, though not the master)
decouples data and control flow, much to the performance advantage
of the whole system.
While we do that at the storage level (in the FileSystemMirror),
it should also be done for request replication.
Observations
From the kernel-profiler
we found that a distributed transaction
over SSL (WAN? connection) takes approximately 0.15 seconds on average.
As expected, this time is mostly dictated by the
byzantine protocol
and is almost independent of the data size.
Comparing with
http://www.ovmj.org/GNUnet/smtp.php3?xlang=English
it seems that those 50 ms of negotiation time are what we would
expect from a TCP connection.
Considering that it takes two messages per peer
and the test environment has a typical ping time of 30-40 ms,
we can't expect further improvements.
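The argument can be checked with a back-of-the-envelope calculation. The sketch below rests on an assumption that the text does not state: each of the two messages per peer costs one full ping time, and the messages are sent one after the other. The function name `estimated_floor_ms` is made up for this example.

```python
def estimated_floor_ms(negotiation_ms, messages_per_peer, ping_ms):
    """Rough lower bound for one transaction, in milliseconds.

    Assumption (not from the measurements): every message costs one
    full ping time and the messages are strictly sequential.
    """
    return negotiation_ms + messages_per_peer * ping_ms


measured_ms = 150                      # 0.15 s average from the profiler
low = estimated_floor_ms(50, 2, 30)    # ping at the fast end
high = estimated_floor_ms(50, 2, 40)   # ping at the slow end
remaining = (measured_ms - high, measured_ms - low)
print(low, high, remaining)
```

Under that assumption the network alone accounts for 110-130 ms of the measured 150 ms, which leaves only 20-40 ms for everything else.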
Actual Tests
Here are a few rather simple and superficial tests
that have been conducted.
Test setup: a laptop with a Pentium 2 at 300 MHz,
memory loaded; test date: 2003/01/31
The chicken build was compiled without the -unsafe option,
because that option happens to break the executable.
Moreover, there seems to be some unintended constant delay (a bug)
and some thread-switch problem
in the chicken version, which might distort the results here.
The comparison of the static data delivery rate with the apache web
server is somewhat incorrect because Askemos doesn't have the concept
of static content.
| Test | rscheme | chicken | apache |
|---|---|---|---|
| 500x 6716 byte user home page | 40s | 720s | n/a |
| 500x 32250 byte, two places involved, two xslt transformations | 64s | 750s | n/a |
| 500x 345k "static" data, no authorization | 18s | n/a (new bug) | 18s |
The 2nd test involves many more operations (place reads) than the
1st, yet it delivered 252k per second while the 1st gave 84k.
This
indicates that the XML serialization, which used to be the bottleneck,
is of minor impact since some optimizations were introduced.
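The two rates follow directly from the rscheme column of the table. A small Python check, assuming that "k" stands for 1000 bytes (the function name is made up for this example):

```python
def throughput_bytes_per_s(requests, bytes_per_response, seconds):
    """Average delivery rate over a whole test run, in bytes per second."""
    return requests * bytes_per_response / seconds


# rscheme figures from the table above.
test1 = throughput_bytes_per_s(500, 6716, 40)    # user home page
test2 = throughput_bytes_per_s(500, 32250, 64)   # two places, two xslt

print(round(test1 / 1000), round(test2 / 1000))  # prints "84 252"
```

The second test therefore moves roughly three times as many bytes per second as the first, despite doing more work per request.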
A fair comparison seems to be
http://jakarta.apache.org/velocity/anakia.html
(the rendering engine on Turbine),
whose authors published a measurement of 23 pages in 7-8 seconds on a PIII at 500 MHz.
Not fair at all: apache (without cocoon) delivered the 28k static result page from the third test
at a rate of 59 per second.