Themis

I/O-Efficient MapReduce

About

Themis is a MapReduce implementation designed for high per-node efficiency. It achieves this efficiency by reading each record from disk exactly twice and writing it to disk exactly twice, which is the minimum number of I/O operations possible for data sets that cannot fit in main memory.
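To illustrate the two-read, two-write property, here is a minimal sketch of a two-pass sort over newline-delimited integer keys. It is not Themis's actual code: the function name `two_pass_sort`, the `boundaries` parameter, the `part-N` file layout, and the read/write counters are all hypothetical and chosen only for illustration. The sketch assumes that each range partition fits in memory and that the split points are known in advance.

```python
import bisect
import os


def two_pass_sort(input_path, output_path, boundaries, workdir):
    """Sort newline-delimited integers with two passes over the data.

    boundaries: sorted split points (hypothetical parameter). Partition 0
    holds keys below boundaries[0], partition i holds keys in
    [boundaries[i-1], boundaries[i]), and the last partition holds the rest.
    Returns (reads, writes), counted per record.
    """
    reads = writes = 0
    num_parts = len(boundaries) + 1
    part_paths = [os.path.join(workdir, f"part-{i}") for i in range(num_parts)]

    # Pass 1: read every record once and write it once to its range partition.
    parts = [open(path, "w") for path in part_paths]
    try:
        with open(input_path) as src:
            for line in src:
                key = int(line)
                reads += 1
                idx = bisect.bisect_right(boundaries, key)
                parts[idx].write(f"{key}\n")
                writes += 1
    finally:
        for handle in parts:
            handle.close()

    # Pass 2: read each partition once, sort it in memory, and write it once.
    # Partitions cover disjoint, increasing key ranges, so appending them in
    # order yields a fully sorted output file.
    with open(output_path, "w") as dst:
        for path in part_paths:
            with open(path) as part:
                keys = [int(line) for line in part]
            reads += len(keys)
            keys.sort()
            for key in keys:
                dst.write(f"{key}\n")
            writes += len(keys)

    return reads, writes
```

For an input of n records, the function returns (2n, 2n): each record is read once from the input and once from its partition file, and written once to its partition file and once to the output.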

Themis and its predecessor system, TritonSort, set ten large-scale sorting records. Five of these are current world records as of 2015:

  • 2014 Daytona GraySort - 4.35 TB/min
  • 2014 Indy CloudSort - $451
  • 2014 Daytona CloudSort - $451
  • 2011 Indy 100TB JouleSort - 103 MJoules
  • 2011 Daytona 100TB JouleSort - 132 MJoules

Previous world records:

  • 2011 Indy GraySort - 938 GB/min
  • 2011 Daytona GraySort - 725 GB/min
  • 2011 Indy MinuteSort - 1353 GB
  • 2010 Indy GraySort - 582 GB/min
  • 2010 Indy MinuteSort - 1014 GB