The legacy vehicle history database is a distributed key-value store running on OpenVMS. Compressed, the database is roughly nine terabytes. This system served CARFAX very well from the company's founding in 1984, but we were hitting a ceiling on the number of reports per second that each production cluster could deliver. The hardware cost and technical difficulty of scaling the legacy system were very high, so we decided to start looking at other options.
As an Agile and extreme programming development shop, CARFAX uses Friday afternoons as professional/personal development time. Several developers began using this time to experiment with traditional RDBMS and NoSQL solutions for the vehicle history database. The complexity of the data structures, compounded by the size of the database, did not lend itself to Oracle or MySQL. We also evaluated Cassandra, CouchDB, Riak, and several other NoSQL solutions (I really wanted to use Neo4j to create a social network of cars). In the end we decided that MongoDB was the best fit for where we wanted to sit on the CAP theorem trade-offs and offered the level of commercial support we were looking for.
Cleansing, transforming, and inserting eleven billion complex records into a database is no small task. To avoid inconsistencies with the ongoing data feeds, we decided we needed to load the existing records into MongoDB in under fifteen days. The legacy records came in multiple formats, each requiring its own interpretation. Many of the records had also been written using PL/I data structures that have no direct mapping in Java or C and required bit manipulation and other fun tricks (sketched below). Once we had written the Java and C JNI algorithms to transform the old records into the MongoDB format, we began our loads.
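To give a flavor of the bit-level work involved, here is a minimal Java sketch of unpacking PL/I-style packed bit flags. The field layout shown is hypothetical, invented purely for illustration; the real record formats are proprietary.

    // Hypothetical layout for illustration: one status byte from a legacy
    // record, with PL/I BIT fields packed most-significant-bit first.
    //   bit 7     : salvage-title flag (BIT(1))
    //   bits 6..4 : title brand code   (BIT(3))
    //   bits 3..0 : source type        (BIT(4))
    public final class LegacyStatusByte {

        private final int raw; // unsigned view of the packed byte

        public LegacyStatusByte(byte packed) {
            this.raw = packed & 0xFF; // avoid Java's sign extension on byte
        }

        public boolean isSalvageTitle() {
            return (raw & 0x80) != 0;  // test bit 7
        }

        public int titleBrandCode() {
            return (raw >>> 4) & 0x07; // bits 6..4
        }

        public int sourceType() {
            return raw & 0x0F;         // bits 3..0
        }
    }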
MongoDB does not run on OpenVMS, so we had to push the data through remote mongos processes. Using direct inserts to MongoDB, we opened up the queues. The process was immediately compute-bound: the poor processors on the OpenVMS boxes were running full tilt, and we could barely even get a command prompt to check on the system. We managed to throttle back the queues enough to get a stable session, but the projected extract, transform, and load time was forty-five days. We let the loads continue, but we went back to the drawing board to consider how to get around the computational limits we had hit.
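For context, a direct load through a mongos looks roughly like the sketch below. It uses the current MongoDB Java driver API and made-up host, database, and collection names; it is not our production loader, just the shape of an unordered batch insert.

    import com.mongodb.client.MongoClient;
    import com.mongodb.client.MongoClients;
    import com.mongodb.client.MongoCollection;
    import com.mongodb.client.model.InsertManyOptions;
    import org.bson.Document;

    import java.util.ArrayList;
    import java.util.List;

    public class DirectLoader {
        public static void main(String[] args) {
            // Connect through a mongos router (hostname is illustrative).
            try (MongoClient client =
                     MongoClients.create("mongodb://mongos1.example.com:27017")) {
                MongoCollection<Document> history =
                        client.getDatabase("vhdb").getCollection("vehicleHistory");

                List<Document> batch = new ArrayList<>();
                for (int i = 0; i < 1000; i++) {
                    batch.add(new Document("vin", "VIN" + i)
                            .append("recordType", "TITLE")
                            .append("payload", "transformed legacy record " + i));
                }

                // Unordered inserts let the server continue past individual
                // failures and spread work across shards.
                history.insertMany(batch, new InsertManyOptions().ordered(false));
            }
        }
    }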
The transformation and insertion logic was already pretty clean and stateless, so with a modest amount of work we were able to implement a multi-threaded, multi-JVM RabbitMQ solution. We had some servers that were slated to become app servers but were not yet in use, so we deployed the RabbitMQ brokers and many consumer processes onto them. We released the queues. The initial numbers did not look all that promising, but once the shard ranges on MongoDB settled in we began to see very impressive load numbers, averaging more than eleven thousand writes a second to MongoDB.
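The consumer side of that fan-out looked roughly like the following sketch, written against the current RabbitMQ Java client with illustrative queue, host, and collection names. A bounded prefetch keeps each consumer busy without swamping it; the real transformation logic is elided.

    import com.mongodb.client.MongoClient;
    import com.mongodb.client.MongoClients;
    import com.mongodb.client.MongoCollection;
    import com.rabbitmq.client.Channel;
    import com.rabbitmq.client.Connection;
    import com.rabbitmq.client.ConnectionFactory;
    import com.rabbitmq.client.DeliverCallback;
    import org.bson.Document;

    import java.nio.charset.StandardCharsets;

    public class LoadConsumer {
        public static void main(String[] args) throws Exception {
            MongoClient mongo =
                    MongoClients.create("mongodb://mongos1.example.com:27017");
            MongoCollection<Document> history =
                    mongo.getDatabase("vhdb").getCollection("vehicleHistory");

            ConnectionFactory factory = new ConnectionFactory();
            factory.setHost("rabbit1.example.com"); // illustrative broker host
            Connection conn = factory.newConnection();
            Channel channel = conn.createChannel();

            channel.queueDeclare("legacy-records", true, false, false, null);
            channel.basicQos(250); // bounded prefetch per consumer

            DeliverCallback onDeliver = (consumerTag, delivery) -> {
                // Assume producers publish the transformed record as JSON;
                // the real transform-and-map step is elided here.
                Document doc = Document.parse(
                        new String(delivery.getBody(), StandardCharsets.UTF_8));
                history.insertOne(doc);
                // Ack only after a successful insert so failed messages
                // are redelivered rather than lost.
                channel.basicAck(delivery.getEnvelope().getDeliveryTag(), false);
            };

            channel.basicConsume("legacy-records", false, onDeliver, tag -> { });
        }
    }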
We finished the load in just under ten days. Through judicious compression of certain fields that are not useful for normal queries or aggregation, the MongoDB collection ended up slightly smaller than the original database on OpenVMS.
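One way to picture that field-level compression, as a sketch only: fields that never need to be queried or aggregated can be deflated and stored as BSON binary, while queryable fields stay plain. The field names here are invented for illustration, and this is not necessarily the exact scheme we used.

    import org.bson.Document;
    import org.bson.types.Binary;

    import java.io.ByteArrayOutputStream;
    import java.nio.charset.StandardCharsets;
    import java.util.zip.Deflater;

    public final class FieldCompression {

        // Deflate a bulky, query-irrelevant text field before storing it.
        static Binary compress(String text) {
            byte[] input = text.getBytes(StandardCharsets.UTF_8);
            Deflater deflater = new Deflater();
            deflater.setInput(input);
            deflater.finish();

            ByteArrayOutputStream out = new ByteArrayOutputStream(input.length);
            byte[] buffer = new byte[4096];
            while (!deflater.finished()) {
                out.write(buffer, 0, deflater.deflate(buffer));
            }
            deflater.end();
            return new Binary(out.toByteArray());
        }

        static Document toDocument(String vin, String rawSourceText) {
            return new Document("vin", vin) // queryable, left uncompressed
                    .append("rawSource", compress(rawSourceText)); // bulky, compressed
        }
    }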
Successes of the Migration
Jai Hirsch
Senior Systems Architect
CARFAX
jai.hirsch@gmail.com