Presentation: "Building scalable Big Data pipelines"
We are witnessing a paradigm shift from batch-based to real-time data processing in the Hadoop ecosystem. Despite this progress, processing web-scale data in real time remains a challenge. Many technologies can be used to build such a complete data processing system, but choosing the right tools and incorporating and orchestrating them is complex and daunting.
This talk shows how to apply technologies like Kafka, Storm, Hadoop, and various NoSQL databases to build a scalable and robust system. It also describes the "Lambda Architecture", a set of principles that define how batch and stream processing can work together to solve real-time problems.
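As a rough illustration of the Lambda Architecture idea (not taken from the slides), the sketch below shows a serving layer that answers queries by merging a precomputed batch view with an incremental real-time view. The class and method names are hypothetical; in a real deployment the batch view would be produced by a Hadoop job over the master dataset and the real-time view updated by a Storm topology consuming Kafka.

import java.util.HashMap;
import java.util.Map;

/**
 * Minimal sketch of the Lambda Architecture's serving-layer merge.
 * Batch layer: accurate but stale views recomputed over all data.
 * Speed layer: approximate, incremental views covering recent events.
 * Queries resolve by combining both at read time.
 */
public class LambdaServingSketch {

    // Batch view: e.g. page-view counts computed by a Hadoop job, hours old.
    private final Map<String, Long> batchView = new HashMap<>();

    // Real-time view: counts for events that arrived after the last batch run.
    private final Map<String, Long> realtimeView = new HashMap<>();

    /** Answer a query by merging the batch and real-time views. */
    public long pageViews(String pageId) {
        return batchView.getOrDefault(pageId, 0L)
             + realtimeView.getOrDefault(pageId, 0L);
    }

    /** Speed layer: fold a freshly streamed event into the real-time view. */
    public void onStreamEvent(String pageId) {
        realtimeView.merge(pageId, 1L, Long::sum);
    }

    /** Batch layer: swap in a recomputed view and drop superseded real-time state. */
    public void onBatchRecompute(Map<String, Long> freshBatchView) {
        batchView.clear();
        batchView.putAll(freshBatchView);
        // Simplification: a real system would only discard real-time entries
        // that the new batch run already covers.
        realtimeView.clear();
    }

    public static void main(String[] args) {
        LambdaServingSketch serving = new LambdaServingSketch();
        serving.onBatchRecompute(Map.of("home", 1_000L));
        serving.onStreamEvent("home");
        System.out.println(serving.pageViews("home")); // prints 1001
    }
}

The key design choice this illustrates is that neither layer has to be perfect on its own: the batch layer provides eventual accuracy by recomputing from the immutable master dataset, while the speed layer only has to compensate for the data the last batch run has not yet seen.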
Download slides