Advanced Data Science Course Training In Hyderabad
INTRODUCTION TO DATA SCIENCE
a. What Is Data Science?
b. The Future for Data Scientists
c. What Is Big Data?
d. What Is Python?
e. What Is R?
f. Business Analytics Versus Data Science
g. Defining Analytics
h. Tools Available to Data Scientists
i. Guide to Data Science Cheat Sheets
j. Packages in Python for Data Science
k. Similarities and Differences between Python and R
l. Why Should R Users Learn More about Python?
m. Why Should Python Users Learn More about R?
n. Using R and Python Together
o. Using SAS with Jupyter
p. Using Python and R for Big Data Analytics
q. What Is Cloud Computing?
r. Using Python and R on the Cloud
s. Commonly Used Linux Commands for Data Scientists
t. Learning Git
u. Data-Driven Decision Making: A Note
v. Strategy Frameworks in Business Management: A Refresher for Non-MBAs and MBAs
w. Who Has to Make Data-Driven Decisions?
PREPARING DATA SCIENCE ENVIRONMENT
a. Introduction
b. Understanding the data science pipeline
c. Installing R on Windows and Linux
d. Installing libraries in R and RStudio
e. Installing Python on Windows and Linux
f. Installing the Python data stack on Linux
g. Installing extra Python packages
R PROGRAMMING FOR DATA SCIENCE
a. R Programming Basic Concepts
b. R Programming Data Structures
c. R Programming Control Flows and Functions
d. R Programming Functions
e. R Programming Matrices
f. R Programming Numerical Computation
g. R Programming Statistical Data
h. R Programming Basic Graphs
i. R Programming Graphics
j. R Programming Object-oriented Programming
k. R Programming Installing and Creating Packages
PYTHON FOR DATA SCIENCE
a. Introduction to Python
b. Installation of Python framework and Packages: Anaconda
c. Introduction to Jupyter and Spyder
d. Variables and Data Types in Python
e. Python Operators and Expressions
f. Lists and Tuples
g. Range, Sets and Dictionaries
h. Control Structures and Functions
i. Classes and Object-oriented programming
j. Errors and Exception Handling
k. Modules and Packages
l. Generating and Manipulating Arrays with NumPy
m. Handling Data with Pandas
n. Plotting Data using Matplotlib and Seaborn
BASIC LINEAR ALGEBRA
Introduction
Vectors
a. Transpose of vectors
b. Mathematical operations on vectors
c. (Inner) product of vectors
d. The length (norm) of a vector
e. The 0–vector and 1–vector
f. Orthogonal (perpendicular) vectors
Matrices
a. Matrices
b. Multiplying a matrix with a number
c. Transpose of matrices
d. Matrix addition and multiplication
e. Some special matrices
f. Inverse of matrices
g. Solving systems of linear equations
h. Trace
i. Determinant
Least squares
STATISTICS AND PROBABILITY
Basic Statistics: Data Description
a. Introduction to Statistics
b. Data and Data Types
c. Quantitative Data: Discrete and Continuous data
d. Qualitative Data (categorical, nominal and ordinal data)
Descriptive Statistics
a. Parameters and Statistics
b. Samples and Population
c. Frequency Distribution
d. Measures of Central Tendency or Location (includes Mean, Median, Quartiles and Mode)
e. Measures of Variability (includes Range, Quartile Deviations, IQR, Variance and Standard Deviation)
f. Measures of Shape (includes skewness and kurtosis)
g. Bivariate Statistics
i. Cross-tabulations and contingency tables
ii. Graphical representation via scatterplots
iii. Quantitative measures of dependence (Covariance, Pearson’s r, Spearman’s rho and Kendall’s Rank Correlation)
Probability
a. Sample Space and Events
b. Counting Principles
c. Probability and Axioms of Probability
d. Properties of Probability
e. Conditional Probability and its Properties
f. Independent Events and Marginal Probability
g. Bayes’ Theorem
Random Variables and Probability Distributions
h. Discrete and Continuous Random Variables
i. Univariate Probability Distributions
a. Discrete Uniform Distribution
b. Bernoulli and Binomial Distribution
c. Poisson Process and Poisson Distribution
d. Geometric Distribution
e. Hypergeometric Distribution
f. Negative Binomial Distribution
g. Uniform Distribution (Continuous)
h. Exponential Distribution
i. Normal Distribution and Standard Normal Distribution
Multivariate Probability Distribution
j. Joint and Marginal Distribution
k. Conditional Probability and Independence
Inferential Statistics
a. Sampling and Sampling Distribution
a. SRSWR and SRSWOR
b. Systematic Sampling
c. Cluster Sampling
d. Parameters of Finite and Infinite Populations
e. Chebyshev Inequalities and Central Limit Theorem
f. Sampling Distribution of the Sample Mean
g. Sampling Distribution for the Sample Proportion
h. Sampling Distribution for the Sample Variance
i. Sampling Distributions Associated with the Normal Distribution
i. Chi-Square Distribution (χ²)
ii. t-Distribution
iii. The F Distribution
b. Estimation Theory
a. Point Estimators and their Properties
b. Point Estimation Techniques
i. Method of Moments
ii. Maximum Likelihood Estimators (MLEs)
c. Interval Estimation and Confidence Intervals
i. Confidence Intervals for Population Means
ii. Confidence Intervals for Population Variances
iii. Confidence Intervals for Population Proportions
c. Testing of Hypotheses
a. Type I and Type II Errors
b. Power Function
c. Uniformly Most Powerful Test
d. p-Value or Critical Level
e. Tests of Significance
f. Hypothesis Tests for Population Means
g. Hypothesis Tests for Population Variances
h. Hypothesis Tests for Population Proportions
d. Nonparametric tests
a. Sign Test
b. Wilcoxon Signed-Rank Test
c. Mann-Whitney U-Test
d. Kruskal-Wallis Test
e. Goodness-of-Fit Tests
f. Categorical Data Analysis
g. Nonparametric Bootstrapping
h. Permutation Tests
e. Experimental Designs
a. ANOVA for One-Way Fixed Effects Model
b. Power and the Non-Central F Distribution
c. Multiple Comparisons of Means
d. Random Effects Models
DATA WRANGLING
• Introduction to Python packages - pandas and NumPy
• Introduction to R packages - dplyr and tidyr
• Importing and exporting Data
- Reading and writing Data in Text and Excel formats
- Importing CSV, JSON and XML Data
- Working with Delimited Formats
• Web Scraping: Acquiring and Storing Data from the Web (using R rvest and Python Beautiful Soup)
o Scraping HTML text and table data
o Reading and analyzing a web page
• Using web APIs
o API Features
o A Simple Data Pull from Twitter’s REST API
• Data Exploration and Visualization
o Data Cleaning and Preparation
- Handling missing data with R Tidyverse and Python pandas
Normalizing and Standardizing the Data
Reshaping Data with R tidyr and Python pandas
Transforming Data with R dplyr and Python pandas
Finding Outliers
o Dimensionality Reduction
Principal Component Analysis (PCA)
Linear Discriminant Analysis (LDA)
o Data Visualization with R ggplot2 and Python ggplot and seaborn
Histogram
Box Plot
Pie Chart
Scatter Plot
Bar Chart
Area chart
Heat Map
Correlogram
MACHINE LEARNING MODELS
MACHINE LEARNING MODELING METHODS
• Choosing and evaluating models
o Mapping problems to machine learning tasks
o Solving classification problems
o Solving scoring problems
o Working without known targets
o Problem-to-method mapping
• Evaluating models
o Evaluating classification models
o Evaluating scoring models
o Evaluating probability models
o Evaluating ranking models
o Evaluating clustering models
• Validating models
o Common model problems
Overfitting and Underfitting
Correctness
The Bias-Variance Trade-off
o Ensuring Model Quality
SUPERVISED LEARNING MODELS
• Building single-variable models
• Building models using many variables
o Using decision trees
o Using nearest neighbor methods (KNN)
o Using Naive Bayes
LINEAR AND LOGISTIC REGRESSION
o Using linear regression
Building a linear regression model
Making predictions
Finding relations and extracting advice
o Using logistic regression
Building a logistic regression model
Making predictions
Finding relations and extracting advice from logistic models
MULTIPLE REGRESSION
• The Model
• Further Assumptions of the Least Squares Model
• Fitting the Model
• Interpreting the Model
• Goodness of Fit
• Digression: The Bootstrap
• Standard Errors of Regression Coefficients
UNSUPERVISED LEARNING MODELS
• Cluster analysis
o Distances
o Preparing the data
o Hierarchical clustering
o The k-means algorithm
• Association rules
o Mining association rules
o The apriori() Algorithm
o The FP-Growth (FP-Tree) Algorithm
EXPLORING ADVANCED METHODS
• Using bagging and random forests to reduce training variance
Using bagging to improve prediction
Using random forests to further improve prediction
• Using generalized additive models (GAMs)
• Using kernel methods to increase data separation
Using SVMs to model complicated decision boundaries
DEEP LEARNING USING TENSORFLOW
• Introduction to deep learning and neural networks
• Perceptrons
• Feed-Forward Neural Networks
• Backpropagation
• Understanding neural networks through TensorFlow
• Convolutional and Recurrent Neural Networks
DATA VISUALIZATION USING MATPLOTLIB AND TABLEAU
• Interactive visualization with Matplotlib
• Tableau dashboard and storyboard
• Data visualization using Tableau
• Tableau integration with R and Python
DATABASES AND SQL
• Create table and insert
• Update
• Delete
• Select
• Group by
• Order by
• Join
• Subqueries
• Indexes
• Query Optimization
• NoSQL
HANDLING BIG DATA WITH SPARK
• Introduction to Big Data and Spark
• MapReduce
• Spark Streaming
• GraphX
• RDDs in Spark
• Spark SQL
• MLlib