DURATION: 12 weeks (50 hours/week)
WEEK 1 | MODULE ONE
DATA SCIENCE FOUNDATIONS, DATA WRANGLING AND EXPLORATORY DATA ANALYSIS
Students will learn to set up the data science process through:
- Cleaning up datasets using Python and the Pandas library
- Exploratory data analysis to generate hypotheses and intuition
- Communication of results through visualization, stories, and summaries
Week 1 topics:
- Version control – Fork repository, push & pull code
- Pair programming and Test Driven Development
- Data analysis – types of statistics and analytical methods and their relationship
- Where and how to acquire data, methods for evaluating source data, and data transformation and preparation
- Use Python’s Requests package to obtain data from web pages
- Use Python’s Beautiful Soup to parse the content of a web page to find useful data for subsequent analysis
- Python, Pandas, GitHub, UNIX Bash scripts, SQL
- Optional – coverage of contemporary Web scraping and Data wrangling tools.
In the first week, students work in small groups with the Amazon Reviews dataset, applying Exploratory Data Analysis, Data Wrangling, and basic Feature Engineering concepts to answer a few sentiment analysis questions from the product review data for a product category of the students’ choice.
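As an illustration of the kind of wrangling and first-pass exploration this project involves, here is a minimal Pandas sketch. The column names (`product_id`, `rating`, `review_text`), the helper `clean_reviews`, and the rows themselves are made up for the example; they are assumptions, not the schema or contents of the actual Amazon Reviews dataset.

```python
import pandas as pd

# Made-up toy rows standing in for product reviews; not real Amazon data.
raw = pd.DataFrame(
    {
        "product_id": ["A1", "A1", "A1", "B2", "B2", "B2"],
        "rating": ["5", "5", "4", "1", "two", "2"],
        "review_text": [" Great value ", "Great value", "Works well",
                        "Broke fast", "Meh", None],
    }
)


def clean_reviews(df):
    """Return a tidied copy: trimmed text, numeric ratings, no duplicates or gaps."""
    out = df.copy()
    out["review_text"] = out["review_text"].str.strip()
    # Non-numeric ratings such as "two" become NaN and are dropped below.
    out["rating"] = pd.to_numeric(out["rating"], errors="coerce")
    out = out.drop_duplicates().dropna(subset=["rating", "review_text"])
    return out.reset_index(drop=True)


reviews = clean_reviews(raw)
# A first exploratory summary: review count and mean rating per product.
summary = reviews.groupby("product_id")["rating"].agg(["count", "mean"])
print(summary)
```

Coercing bad ratings to missing values and then dropping them is only one possible policy; on real data it is usually worth counting how many rows each cleaning step removes before trusting the summary.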
WEEK 2 | MODULE TWO
STATISTICAL MODELING FOR INFERENCE
Students will learn to draw conclusions based on data. Upon completion of this module, students will be able to describe:
- Approaches to performing inference, and acceptance of results
- Concepts in causal inference and the motivation for experiments
- Statistical tools to help plan experiments: exploratory analysis, power calculations, and the use of simulation
- Statistical methods to estimate causal quantities of interest and construct appropriate confidence intervals
- Scalable methods suitable for “big data”, including working with weighted data and clustered bootstrapping
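To make the clustered bootstrapping item above concrete, the following is a minimal sketch using only the Python standard library. It assumes observations are grouped by user and resamples whole users rather than individual rows; the function name `cluster_bootstrap_ci`, the user IDs, and the values are hypothetical.

```python
import random
import statistics


def cluster_bootstrap_ci(clusters, n_boot=2000, alpha=0.05, seed=0):
    """Percentile CI for the overall mean, resampling whole clusters with replacement."""
    rng = random.Random(seed)
    ids = list(clusters)
    estimates = []
    for _ in range(n_boot):
        # Draw as many clusters as were observed, keeping each cluster's values together.
        drawn = rng.choices(ids, k=len(ids))
        values = [v for cid in drawn for v in clusters[cid]]
        estimates.append(statistics.fmean(values))
    estimates.sort()
    lo = estimates[int((alpha / 2) * n_boot)]
    hi = estimates[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi


# Hypothetical per-user session values (users are the clusters); made-up numbers.
sessions_by_user = {
    "u1": [1.0, 1.2, 0.9],
    "u2": [2.1, 2.4],
    "u3": [0.4, 0.5, 0.6, 0.5],
    "u4": [1.8],
    "u5": [1.1, 1.0],
}
low, high = cluster_bootstrap_ci(sessions_by_user)
print(f"95% CI for mean: ({low:.2f}, {high:.2f})")
```

Resampling at the cluster level keeps correlated observations from the same user together, which is why the resulting interval is typically wider than one from a row-level bootstrap.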
Students will also be able to:
- Design, plan, implement, and analyze online experiments using contemporary tools
- Implement basic “A/B tests”, within-subjects designs, and more sophisticated experiments
- Make and interpret predictions from a Bayesian perspective
- Understand explore-exploit strategies for multi-armed bandits
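For a sense of what analyzing a basic A/B test can look like, here is a minimal sketch of a two-sided, two-proportion z-test using only the Python standard library. The function name `two_proportion_ztest` and the conversion counts are hypothetical, and the z-test is just one common choice; the course may use other methods or tools.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_ztest(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference in conversion rates, using the pooled rate."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    # Standard error of the difference under the null hypothesis of equal rates.
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical counts: 200 of 2000 visitors convert on A, 260 of 2000 on B.
z, p = two_proportion_ztest(200, 2000, 260, 2000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

The normal approximation behind this test is reasonable for large samples like these; with small counts, an exact test or a simulation-based approach would be safer.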
Week 2 topics:
- Contexts in which inference is desirable
- Modeling for Inference vs Modeling for Prediction
- Key statistics concepts – Distributions, Sampling, Confidence Intervals, Hypothesis Testing
- Statistical model selection
- Applied Probability for Statistical Inference
- Understand the cycle: model, apply, predict, set up experiments, and observe
- Python packages – NumPy, SciPy, PyMC
- Optional – coverage of contemporary A/B Testing tools.
Week 2 project: a multi-armed bandit approach to Internet display advertising to maximize sales, or to finding the best treatment out of many possible treatments while minimizing losses.
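One simple explore-exploit strategy for this kind of problem is epsilon-greedy, sketched below with the Python standard library. This is an illustration under stated assumptions, not the method the course prescribes: the function name `epsilon_greedy`, the exploration rate, and the per-ad conversion rates are all hypothetical, and other strategies (for example Thompson sampling or UCB) are equally plausible choices.

```python
import random


def epsilon_greedy(true_rates, n_rounds=10000, epsilon=0.1, seed=0):
    """Simulate an epsilon-greedy bandit over Bernoulli arms; return pulls and rewards per arm."""
    rng = random.Random(seed)
    n_arms = len(true_rates)
    pulls = [0] * n_arms
    rewards = [0] * n_arms
    for _ in range(n_rounds):
        if rng.random() < epsilon or 0 in pulls:
            # Explore: pick an arm uniformly at random (also used until every arm is tried).
            arm = rng.randrange(n_arms)
        else:
            # Exploit: pick the arm with the best observed success rate so far.
            arm = max(range(n_arms), key=lambda a: rewards[a] / pulls[a])
        pulls[arm] += 1
        # Bernoulli reward: 1 with the arm's (hidden) success probability, else 0.
        rewards[arm] += 1 if rng.random() < true_rates[arm] else 0
    return pulls, rewards


# Hypothetical conversion rates for three ads; the algorithm never sees these directly.
ad_rates = [0.05, 0.10, 0.20]
pulls, rewards = epsilon_greedy(ad_rates)
for i, (n, r) in enumerate(zip(pulls, rewards)):
    print(f"ad {i}: shown {n} times, observed rate {r / n:.3f}")
```

The fixed exploration rate keeps sampling every ad indefinitely, which limits losses from an unlucky early estimate at the cost of continuing to show weaker ads a fraction of the time.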