Once you include the delays in the networking stack to get data on and off the systems, such tiny latencies usually don't really matter.
And there is no way that a Parallella could compete with a tricked-out PC (I got a 4GHz 8-core with a 256GB SSD + 32GB RAM for about $1000 wholesale), let alone server-grade hardware (I've got blades at work that can take 24 32GB DIMMs... and you can get 1U SAN devices filled with TBs of flash if you have enough money).
I was more thinking along the lines of this....
Say you need to create a lookup catalogue for 100M book names, authors and ISBN numbers (maybe 200 bytes per book - about 20GB of data). If you were to try this in a database you would create a table with an index on it. When you look up a name it navigates down through the index blocks, requiring a random read at each level, and then finally reads the data block with the name, author, ISBN and other stuff. Reading one record might take 5 or 6 I/Os.
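To see where the "5 or 6" comes from, here's a back-of-the-envelope sketch. The fan-out of ~100 keys per index block is an assumption, not a measurement from any particular database:

```c
/* Rough B-tree depth estimate: keep adding index levels until the
 * tree spans all the keys. */
#include <stdio.h>

int main(void)
{
    long long books  = 100000000LL;   /* 100M books                     */
    long long fanout = 100;           /* keys per index block (assumed) */

    long long reach  = 1;
    int       levels = 0;
    while (reach < books) {
        reach *= fanout;
        levels++;
    }
    /* one random read per index level, plus one for the data block */
    printf("index levels: %d, I/Os per lookup: %d\n", levels, levels + 1);
    return 0;
}
```

With those numbers it prints 4 index levels, so 5 I/Os per lookup - and a slightly smaller fan-out or a non-clustered index pushes you to 6.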
One other possibility would be to use an on-disk hash table. Hash the book name into a number between 0 and 32GB-1, then issue a single 128K read starting one byte before that offset. If the records are stored as null-terminated strings at, or as soon as possible after, their hash offset, then you just scan sequentially through the block from that address looking for the name. When you see two null bytes in a row you know you are done (not found).
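In rough C it could look something like this. The FNV-1a hash, the tab-separated record layout and the 32GB/128K sizes are all just assumptions for the sketch; wrap-around at the end of the table and error handling are deliberately left out:

```c
#include <stdint.h>
#include <string.h>
#include <unistd.h>

#define TABLE_SIZE (32ULL * 1024 * 1024 * 1024)  /* size of the hash file */
#define READ_SIZE  (128 * 1024)                  /* one big read per probe */

static uint64_t hash(const char *s)
{
    uint64_t h = 14695981039346656037ULL;        /* FNV-1a, as an example */
    while (*s) { h ^= (unsigned char)*s++; h *= 1099511628211ULL; }
    return h;
}

/* Returns 1 if a record whose name field equals `name` exists, else 0. */
static int lookup(int fd, const char *name)
{
    char    buf[READ_SIZE];
    size_t  nlen  = strlen(name);
    off_t   pos   = (off_t)(hash(name) % TABLE_SIZE);
    int     first = 1;

    if (pos > 0)
        pos--;                  /* start 1 byte early: the byte after any
                                   null is the start of a whole record   */
    for (;;) {
        ssize_t n = pread(fd, buf, READ_SIZE, pos);
        if (n <= 0)
            return 0;           /* end of file: not found */

        ssize_t i = 0;
        if (first) {            /* skip the tail of the record we landed in */
            char *z = memchr(buf, '\0', n);
            if (z == NULL) { pos += n; continue; }
            i = (z - buf) + 1;
            first = 0;
        }
        while (i < n) {
            char *end = memchr(buf + i, '\0', n - i);
            if (end == NULL)
                break;          /* record split across reads: re-read it  */
            if (end == buf + i)
                return 0;       /* double null: end of the run, not found */
            if (strncmp(buf + i, name, nlen) == 0 && buf[i + nlen] == '\t')
                return 1;       /* name field matches */
            i = (end - buf) + 1;
        }
        if (i == 0)
            return 0;           /* record bigger than a block; give up */
        pos += i;               /* rare: keep streaming sequentially   */
    }
}
```

The point is the shape of it: one hash, one big read, and a linear scan through memory rather than a chain of dependent random reads.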
All the indexing work is done upfront, and nearly any record can be found by book name with just one I/O (the exceptions being hashes where the read wraps around the end of the table, an unusually common name, or lots of hash collisions, in which case you should move to a hash with a larger range, or a better hash).
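That upfront build is a single sequential pass, something like the sketch below. It reuses hash() and TABLE_SIZE from above, and assumes the records have already been sorted by their hash value; struct rec is hypothetical. Each record lands on its hash offset, or as soon as possible after it when offsets collide, and the untouched gaps stay as zero bytes (file holes), which is what makes the double-null end marker work:

```c
#include <string.h>
#include <unistd.h>

struct rec {
    const char *name;   /* the key we hash on              */
    const char *line;   /* e.g. "name\tauthor\tISBN" record */
};

static void build(int fd, const struct rec *recs, size_t count)
{
    off_t pos = 0;
    for (size_t i = 0; i < count; i++) {
        off_t want = (off_t)(hash(recs[i].name) % TABLE_SIZE);
        if (want > pos)
            pos = want;                        /* leave a zero-filled gap */
        size_t len = strlen(recs[i].line) + 1; /* write the null too      */
        pwrite(fd, recs[i].line, len, pos);
        pos += len;
    }
}
```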
Even if you do fail to find it in the first block, the search then becomes a streaming sequential read, which is about the cheapest kind of I/O there is....