Scaling your Data Platform using Scala

Rajesh Muppalla, Indix.

Rajesh Muppalla - rajesh@indix.com

About Me

About Indix

Tech Stack @ Indix

Data Pipeline @ Indix

Scala @ Indix

Data Collection (Crawling)

Crawler - Requirements

Our Options

Our Choice - Akka

Why Akka?

So what's the secret sauce?

Actors

What is an Actor?

With a Diagram

Show me the code

Hello World

Parallelism

Supervision

Clustering

Our Setup

Patterns Used

Lessons Learned

Lessons Learned (Continued)

Data Processing

Requirements

MapReduce

First Attempt - Java

Second Attempt - Pig

Third Attempt - Scalding

Why Scala?

Scalding Model

Where do we use Scalding?

Our Data Pipeline

Problems

Spark

What is Apache Spark?

Where do we use Spark?

Resources

Questions

Thanks

Extras

Fork me on GitHub