Last year we oversaw a TSG* 11-billion-document benchmarking exercise. This past week we talked with Alfresco’s arch-rival, Nuxeo, which has now also taken it to 11. On the one hand, we can see massive benchmarking activities as little more than techies showing off. On the other hand, and it will come as a surprise to some, today’s document repositories are sometimes counted in multiple billions. There are many good arguments as to why no organization should amass, manage, and of course pay for the storage of billions of files in the first place. Good information governance would likely cut many of these dark data mountains down to size. But they exist, and they are becoming more common. The questions to ask are: why does anyone want to manage billions of files, and why would they want to migrate them?
The answer to the first question is simply that few want to manage billions of files, but those who do believe they have no choice. It’s in answering the second question, regarding migration, that things start to get interesting. I have talked to multiple firms that store and manage such huge volumes, and the answers are always the same.
1: They want to consolidate multiple (typically legacy on-premises) systems into one location to manage
2: They have a mandate to move everything to the cloud, whether it is a good idea or not
3: To reduce costs, though they seldom do so
4: To unlock the value of these dark data mountains
In truth, nobody has ever answered with number 4; I just put that in there because it would seem to be the best reason to consolidate massive amounts of data in the cloud.
I am not going to go into the details of the benchmark tests themselves in this note, other than to say they involved AWS, Elasticsearch, NoSQL databases, and the practice of sharding. You can hear about the TSG benchmark on a podcast here, and Nuxeo has a white paper that can be downloaded from their site. And for more insight into this topic as a whole, you can access the paper my colleague Apoorv and I wrote for TSG called “A Big Data Approach to ECM.”
So what we know at this point is, first, that multi-billion-file single repositories are a niche luxury item that can’t be justified, much like a Mont Blanc pen (I have two). Still, their owners will continue to justify them. Secondly, at least two companies have proven that they can migrate and manage at this scale and that it is possible to effectively store, index, and search for documents at such high volumes. Perhaps more importantly, thanks to the openness of both benchmarks, we now have reference architectures and guidance on how to tackle such massive projects. All joking aside, there is a growing need for Content Services/ECM at scale, and I do applaud Nuxeo and TSG for undertaking these onerous exercises, breaking new ground, and providing valuable insights and information to the community.
* At the time, TSG was an independent services firm in Chicago looking to benchmark a migration with DynamoDB and AWS. Since then, Alfresco has acquired TSG, and Hyland has in turn acquired Alfresco, so we will wait and see what the new owners do with this capability.