Scaling Data Analysis with Spark

Learn to develop big data applications using core Spark APIs.


This 3-day workshop introduces you to Apache Spark and its core APIs, including Spark SQL and GraphX. The course consists of hands-on technical exercises that get both programmers and analysts up to speed in using Spark for data exploration, analysis, and building big data applications. By the end of Day 3, you will have a working 4-node cluster on Amazon Web Services (AWS) that you can start and stop for any data analysis or data preparation task.



  • Course curriculum designed in collaboration with the industry.
  • Level: Basic to Intermediate.
  • Pricing: $1,450
  • Questions? Get in touch. Send us an email or call (214)-997-6100.


  • Python as the programming language.
  • Pandas for Data Wrangling.
  • NumPy for its array data structure and data manipulation functions.
  • matplotlib and seaborn for their graph plotting functions.
  • Jupyter notebook as your development and collaboration environment.
  • Microsoft Excel and Google Forms as data sources for analysis, and Microsoft Excel for insights output.


There are six modules in the Scaling Data Analysis with Spark workshop. Each module introduces one or two core Spark constructs while working through the practical implementation.

Day 1 | Spark Programming basics

Day 1 | Deep dive into Resilient Distributed Datasets

Day 2 | Deep dive into Spark SQL using Morningstar Financial Analysis

Day 2 | Walk-through of end-to-end framework to build and deploy applications [Order / Product Analysis, Flight Analysis]

Day 3 | Performance Tuning of Spark Applications

Day 3 | Deploy Spark Cluster on Amazon Web Services

Day 3 | CLOSING (Wrap-up and successful participation certificates)




  • Do I need any programming experience to attend?
    Yes – you will need previous experience with Python.
  • What will I be doing?
    In three days you will apply data analysis to large datasets on Spark clusters, both locally and on Amazon Web Services, and learn various techniques for tuning Spark applications. You will also become familiar with several best practices for developing end-to-end applications, including integration with other big data resources such as Hive.
  • What age is this for?
    This workshop is suitable for university students and professionals.
  • Do I need a computer?
    It is best if you bring your own laptop.
  • What’s the experience like?
    The workshops are very hands-on and you will learn by doing. After the first 30 minutes of the workshop you will already be coding! Our instructors are always there to help you if you get stuck.


Call us at (214)-997-6100 if you have special circumstances or are looking for dedicated corporate training.