We’re getting into more and more discussions with our customers about Big Data: what it is, what it’s good for, and the problems it solves.
Big Data systems fall into six or seven classes of software that evolved because traditional database systems like Oracle, MS SQL, MySQL, and PostgreSQL are difficult (and costly) to scale for volume (trillions of rows), velocity (millions of additions per second), and variety (where sources of data don’t all have exactly the same data types).
Almost all of the systems cluster horizontally (by adding boxes to a pile), but they introduce a very big trade-off:
They almost all relax the data integrity guarantees (ACID: atomicity, consistency, isolation, durability) that business systems rely on their database backends to enforce. At times they also require a different query interface or way of thinking (as with MapReduce).
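To give a feel for the MapReduce way of thinking, here is a minimal single-process sketch in plain Python. It is not Hadoop or any vendor's API; the function names (`map_phase`, `shuffle`, `reduce_phase`, `map_reduce`) and the word-count task are assumptions chosen purely for illustration. In a real cluster, the map and reduce steps would run in parallel on many boxes and the shuffle would move data across the network.

```python
from collections import defaultdict


def map_phase(record):
    """Map step: emit one (word, 1) pair for every word in a line of text."""
    for word in record.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: collapse the list of values for one key into a total."""
    return key, sum(values)


def map_reduce(records):
    """Run map, shuffle, and reduce in sequence over an iterable of records."""
    pairs = (pair for record in records for pair in map_phase(record))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Hypothetical input: two short "documents".
counts = map_reduce(["big data is big", "data is data"])
print(counts)
```

In SQL the same question would be a single `GROUP BY` with `COUNT(*)`. With MapReduce you instead decide what key/value pairs to emit and how to combine them, which is the shift in thinking mentioned above.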
In many cases, we don’t recommend Big Data systems for customers. They’re mostly bleeding-edge technologies and require very generalist sysadmins and data/app users to work well (the few exceptions are some of the key-value stores like memcached, or specific use cases of MapR Hadoop distributions).
I’ll be writing more blog posts on the Big Data systems out there today, the problems they’re designed to solve, ways to use them with applications, “gotchas” to consider, and questions to ask vendors. In the meantime, here’s a video recording of a webinar that covers highlights of the space and walks through MapReduce, key-value stores, document databases, graph databases, and the somewhat murky world of NewSQL, which promises, but doesn’t quite magically provide, Big Data scale while preserving the ACID properties that traditional applications require.
As usual, we’re happy to review your data back-ends and application requirements and talk through what we’ve seen or support that might be a good fit for you. Feel free to contact us to ask questions and explore your requirements.