I/O-Efficient MapReduce


Themis is a MapReduce implementation designed for high per-node efficiency. It achieves this by reading each record from disk exactly twice and writing it to disk exactly twice, which is the minimum possible for data sets that cannot fit in main memory.
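
To make the "exactly twice" property concrete, here is a minimal single-machine sketch of a two-pass external sort in Python. It is not Themis's code: the function names (`partition_index`, `two_pass_sort`), the first-character range partitioning, and the newline-delimited record format are all assumptions chosen for illustration, and the sketch assumes each partition fits in memory on its own. Pass one reads every record once and writes it once to a partition file; pass two reads each partition once, sorts it in memory, and writes it once to the output.

```python
import os
import tempfile


def partition_index(record, num_partitions):
    """Hypothetical range partitioner: map a record to a partition by its
    first character, so partition order agrees with sorted order."""
    first = ord(record[0]) if record else 0
    return min(first * num_partitions // 256, num_partitions - 1)


def two_pass_sort(input_path, output_path, num_partitions=4):
    """Sort newline-delimited records, touching each record twice on disk.

    Returns a dict counting record-level reads and writes.
    Assumes every individual partition fits in main memory.
    """
    io_counts = {"reads": 0, "writes": 0}
    with tempfile.TemporaryDirectory() as workdir:
        paths = [os.path.join(workdir, f"part-{i}") for i in range(num_partitions)]

        # Pass 1: read each input record once, write it once to its partition.
        parts = [open(p, "w") for p in paths]
        try:
            with open(input_path) as src:
                for line in src:
                    record = line.rstrip("\n")
                    parts[partition_index(record, num_partitions)].write(record + "\n")
                    io_counts["reads"] += 1
                    io_counts["writes"] += 1
        finally:
            for f in parts:
                f.close()

        # Pass 2: read each partition once, sort in memory, write output once.
        with open(output_path, "w") as dst:
            for p in paths:
                with open(p) as part:
                    records = [line.rstrip("\n") for line in part]
                io_counts["reads"] += len(records)
                records.sort()
                for record in records:
                    dst.write(record + "\n")
                io_counts["writes"] += len(records)
    return io_counts
```

For an input of n records, the returned counters should both equal 2n. Because the partitioner is monotonic in the first character, concatenating the sorted partitions in order yields a globally sorted file without a further merge pass.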

Themis and its predecessor system, TritonSort, hold four large-scale sorting records as of 2012:

  • Indy GraySort: TritonSort
  • Indy 100GB JouleSort: TritonSort
  • Daytona GraySort: Themis
  • Daytona 100GB JouleSort: Themis