NoSQL Search Roadshow Zurich 2013

Speaker Interview - Kai Wähner

We have asked Kai Wähner, Principal Consultant at Talend, to share some of his thoughts on the conference topic with us.

Kai Wähner works as Principal Consultant at Talend. His main areas of expertise are Java EE, SOA, Cloud Computing, BPM, Big Data, and Enterprise Architecture Management. He is a speaker at international IT conferences, writes articles for professional journals, and shares his experiences with new technologies on his blog.

    1. What are the key challenges that should make a company look at NoSQL solutions as an answer to these challenges?

Data increases a lot year by year, not just in volume, but also in variety and velocity. This structured and unstructured data has to be stored in different data stores as RDBMS cannot solve all problems anymore. Therefore, different concepts and data stores have been established, e.g. column-oriented data stores, graph databases, key-value stores or distributed file systems.

Storing these different kinds and sizes of data is not the only challenge. Afterwards, this data has to be integrated, processed and analyzed to get business value out of it. Without good frameworks and tooling, this takes a lot of effort and is error-prone.

Companies have to choose the right data stores for their different kinds of data, and they have to choose tools to integrate, process and analyze this data easily.

    2. Moving from RDBMS to NoSQL requires looking at the company's data afresh. What major challenges do developers and architects encounter in this migration and which recommendations can you give to them in this process?

First recommendation: Only move data from RDBMS to NoSQL if it is really necessary. Do not look for new concepts and NoSQL databases just because they are cool and modern. Often, if an RDBMS worked in the past, it will also work in the future. However, if you start a new project, a NoSQL database may be the right selection for parts of your project from the beginning.

Second recommendation: Learn and understand the concepts of new databases before starting your migration. Each NoSQL concept has different strengths and weaknesses. Afterwards, think about how to map from the RDBMS world to the NoSQL world. E.g. if you migrate "graph data", be sure to know how you will map your data rows, foreign keys, etc. to nodes, properties and edges of a graph database, before starting the implementation.
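To make the mapping in the second recommendation concrete, here is a minimal sketch in plain Python. It is not taken from the interview: the `person` and `knows` tables, their columns, and the `rows_to_graph` helper are hypothetical, and the result is a generic in-memory structure rather than the API of any particular graph database. It assumes each entity row becomes a node, its non-key columns become properties, and each foreign-key pair in a join table becomes an edge.

```python
# Hypothetical relational rows: a "person" table and a "knows" join table
# whose from_id / to_id columns are foreign keys into person.id.
person_rows = [
    {"id": 1, "name": "Alice", "city": "Zurich"},
    {"id": 2, "name": "Bob", "city": "Bern"},
]
knows_rows = [
    {"from_id": 1, "to_id": 2, "since": 2010},
]


def rows_to_graph(person_rows, knows_rows):
    """Map rows to nodes (keyed by primary key) and a list of edges."""
    nodes = {}
    for row in person_rows:
        # Every non-key column becomes a node property.
        props = {k: v for k, v in row.items() if k != "id"}
        nodes[row["id"]] = {"label": "Person", "properties": props}

    edges = []
    for row in knows_rows:
        # A foreign key must point at an existing node, just as it must
        # point at an existing row in the relational schema.
        for key in ("from_id", "to_id"):
            if row[key] not in nodes:
                raise ValueError(f"dangling foreign key {key}={row[key]}")
        edges.append({
            "source": row["from_id"],
            "target": row["to_id"],
            "type": "KNOWS",
            # Remaining join-table columns become edge properties.
            "properties": {"since": row["since"]},
        })
    return nodes, edges


nodes, edges = rows_to_graph(person_rows, knows_rows)
```

Writing the mapping down in this form before the migration makes decisions visible early, such as whether a join-table column belongs on the edge or on one of the nodes.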

Third recommendation: Use good, easy tooling, which reduces effort a lot. Do not write integration glue code yourself. There are awesome integration frameworks and Enterprise Service Bus alternatives on the market, some of them open source.

    3. NoSQL has been gaining considerable traction over the last 5 years based on a large degree of specialization of the various NoSQL solutions. Which competitive factors and technical developments will be in play for the coming 5 years?

Great functionality, a large community and commercial support were the most important factors in the past, and will be the most important factors in the future. Open source code eases adoption and therefore improves functionality and grows the community.

    4. Among NoSQL experts there is talk about polyglot persistence (a mix of databases to handle various use cases). Which benefits and costs does polyglot persistence bring with it and how does a company with a business to operate integrate polyglot persistence to maximize the benefits?

As said before, only use NoSQL databases when you need them. RDBMS is adequate for 90+ percent of all projects today. However, "big data" is coming and several use cases need other databases due to the volume, variety and / or velocity of their data. The benefits come with the use cases, as you should only use NoSQL when you really need it. The costs increase short-term, because developers, administrators, etc. have to learn new concepts and best practices. Long-term, this will reduce costs and increase business value for companies.

Most companies will not have a choice if they want to be successful in the future. They have to use new concepts and frameworks to integrate, process and analyze big data.

    5. Is NoSQL the future of data storage?

Yes. But it is only a part of the future. RDBMS will be just as important, because for many use cases, it is still the best and easiest alternative. Today, RDBMS is sufficient for 90+ percent of all projects. In 10 years, this number may be reduced to 50 percent or whatever. Nevertheless, RDBMS will not die because it is awesome for many use cases.

Companies have to combine RDBMS and NoSQL databases in their projects. Data from these projects has to be combined and used together, too. Integration, processing and analysis of ALL (combined) data is the real power and creates business value. So, companies should look for tools which support both RDBMS and NoSQL in the same way.

    6. What comes after NoSQL? There has been some talk about new concepts such as NewSQL. What is your take on the next database paradigm?

Let's first talk about NewSQL: NoSQL databases were created because RDBMS could not solve some problems. However, RDBMS vendors are not sleeping. They improve their products, too. Therefore, NewSQL is nothing new, but "just" improved RDBMS with more features. So, companies can use RDBMS (with the same tooling and knowledge as before) to realize use cases which previously could not be solved without a NoSQL database. In the end, NoSQL is not needed everywhere you might think, but in many use cases, RDBMS is simply the wrong concept. So, RDBMS (including NewSQL) and NoSQL both have a great future.

What about the "next database paradigm": RDBMS and NoSQL (i.e. Not Only SQL) already cover everything, so there is no need for anything else. ☺ What's more important is improvements and better stability regarding distributed processing, failover, performance, etc. Vendors and open source communities are working hard on this in their projects.

However, to really answer the question regarding the "next database paradigm", I think that in-memory databases and solid state disks (SSDs) will become much more important in the next years. Memory gets cheaper every year. Vendors already have great products, e.g. SAP HANA is really awesome. Also, open source in-memory frameworks such as Hazelcast are getting used more and more. SSDs improve I/O performance. Today, many databases and frameworks, including the most important framework for processing big data, Apache Hadoop, are not yet ready for SSDs in general. However, companies are working on this issue (where it makes sense), e.g. Intel already offers a Hadoop distribution with SSD support, and a JIRA ticket is already in progress for general support in Apache Hadoop.