Advanced Data Science Course Training In Hyderabad

INTRODUCTION TO DATA SCIENCE

a. What Is Data Science?

b. The Future for Data Scientists

c. What Is Big Data?

d. What Is Python?

e. What Is R?

f. Business Analytics Versus Data Science

g. Defining Analytics

h. Tools Available to Data Scientists

i. Guide to Data Science Cheat Sheets

j. Packages in Python for Data Science

k. Similarities and Differences between Python and R

l. Why Should R Users Learn More about Python?

m. Why Should Python Users Learn More about R?

n. Using R and Python Together

o. Using SAS with Jupyter

p. Using Python and R for Big Data Analytics

q. What Is Cloud Computing?

r. Using Python and R on the Cloud

s. Commonly Used Linux Commands for Data Scientists

t. Learning Git

u. Data-Driven Decision Making: A Note

v. Strategy Frameworks in Business Management: A Refresher for Non-MBAs and MBAs

w. Who Has to Make Data-Driven Decisions?

PREPARING DATA SCIENCE ENVIRONMENT

a. Introduction

b. Understanding the data science pipeline

c. Installing R on Windows and Linux

d. Installing libraries in R and RStudio

e. Installing Python on Windows and Linux

f. Installing the Python data stack on Linux

g. Installing extra Python packages

R PROGRAMMING FOR DATA SCIENCE

a. R Programming Basic Concepts

b. R Programming Data Structures

c. R Programming Control Flows and Functions

d. R Programming Functions

e. R Programming Matrices

f. R Programming Numerical Computation

g. R Programming Statistical Data

h. R Programming Basic Graphs

i. R Programming Graphics

j. R Programming Object-oriented Programming

k. R Programming Installing and Creating Packages

PYTHON FOR DATA SCIENCE

a. Introduction to Python

b. Installation of Python framework and Packages: Anaconda

c. Introduction to Jupyter and Spyder

d. Variables and Data Types in Python

e. Python Operators and Expressions

f. Lists and Tuples

g. Range, Sets and Dictionaries

h. Control Structures and Functions

i. Classes and Object-oriented programming

j. Errors and Exception Handling

k. Modules and Packages

l. Generating and Manipulating Arrays with NumPy

m. Handling Data with Pandas

n. Plotting Data using Matplotlib and Seaborn

BASIC LINEAR ALGEBRA

Introduction

Vectors

a. Transpose of vectors

b. Mathematical operations on vectors

c. (Inner) product of vectors

d. The length (norm) of a vector

e. The 0–vector and 1–vector

f. Orthogonal (perpendicular) vectors

Matrices

a. Matrices

b. Multiplying a matrix with a number

c. Transpose of matrices

d. Matrix addition and multiplication

e. Some special matrices

f. Inverse of matrices

g. Solving systems of linear equations

h. Trace

i. Determinant

Least squares
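As a small illustration of the least squares topic, the sketch below fits a straight line y = a + b*x by minimizing the sum of squared residuals, using the closed-form solution of the normal equations. The function name fit_line and the sample points are assumptions made for this example, not part of the course material.

```python
def fit_line(xs, ys):
    """Return (intercept, slope) of the least squares line through the points."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squares and cross-products around the means
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Illustrative points lying exactly on y = 1 + 2x
intercept, slope = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
print(intercept, slope)
```

In practice a library routine such as NumPy's least squares solver would usually be preferred; the hand-written version only makes the arithmetic visible.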

STATISTICS AND PROBABILITY

Basic Statistics - Data Description

a. Introduction to Statistics

b. Data and Data Types

c. Quantitative Data: Discrete and Continuous data

d. Qualitative Data (categorical, nominal and ordinal data)

Descriptive Statistics

a. Parameters and Statistics

b. Samples and Population

c. Frequency Distribution

d. Measures of Central Tendency or Location (includes Mean, Median, Quartiles and Mode)

e. Measures of Variability (includes Range, Quartile Deviations, IQR, Variance and Standard Deviation)

f. Measures of Shape (includes skewness and kurtosis)

g. Bivariate Statistics

i. Cross-tabulations and contingency tables

ii. Graphical representation via scatterplots

iii. Quantitative measures of dependence (Covariance and Pearson’s r, Spearman’s rho and Kendall’s Rank Correlation)

Probability

a. Sample Space and Events

b. Counting Principles

c. Probability and Axioms of Probability

d. Properties of Probability

e. Conditional Probability and its Properties

f. Independent Event and Marginal Probability

g. Bayes’ Theorem
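The probability topics above lead up to Bayes' theorem, P(H | E) = P(E | H) P(H) / P(E). A minimal sketch of that calculation follows; the function name bayes_posterior and the numbers (a 1% prior, a 95% true positive rate, a 5% false positive rate) are illustrative assumptions, not course data.

```python
def bayes_posterior(prior, likelihood, false_positive_rate):
    """Return P(H | E) given P(H), P(E | H) and P(E | not H)."""
    for p in (prior, likelihood, false_positive_rate):
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")
    # Law of total probability: P(E) = P(E|H)P(H) + P(E|not H)P(not H)
    evidence = likelihood * prior + false_positive_rate * (1.0 - prior)
    if evidence == 0:
        raise ValueError("the evidence has probability zero")
    return likelihood * prior / evidence


# Hypothetical screening test: rare condition, fairly accurate test
posterior = bayes_posterior(prior=0.01, likelihood=0.95, false_positive_rate=0.05)
print(round(posterior, 4))
```

With these assumed inputs the posterior stays well below the test's 95% accuracy figure, because the condition is rare to begin with.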

Random Variables and Probability Distributions

h. Discrete and Continuous Random Variables

i. Univariate Probability Distributions

a. Discrete Uniform Distribution

b. Bernoulli and Binomial Distribution

c. Poisson Process and Poisson Distribution

d. Geometric Distribution

e. Hypergeometric Distribution

f. Negative Binomial Distribution

g. Uniform Distribution (Continuous)

h. Exponential Distribution

i. Normal Distribution and Standard Normal Distribution

Multivariate Probability Distribution

j. Joint and Marginal Distribution

k. Conditional Probability and Independence

Inferential Statistics

a. Sampling and Sampling Distribution

a. SRSWR and SRSWOR

b. Systematic Sampling

c. Cluster Sampling

d. Parameters of Finite and Infinite Populations

e. Chebyshev Inequalities and Central Limit Theorem

f. Sampling Distribution of the Sample Mean

g. Sampling Distribution for the Sample Proportion

h. Sampling Distribution for the Sample Variance

i. Sampling Distributions Associated with the Normal Distribution

i. Chi-Square Distribution (χ²)

ii. t-Distribution

iii. The F Distribution

b. Estimation Theory

a. Point Estimators and their Properties

b. Point Estimations Techniques

i. Method of Moments

ii. Maximum Likelihood Estimators (MLEs)

c. Interval Estimations and Confidence Intervals

i. Confidence Intervals for Population Means

ii. Confidence Intervals for Population Variances

iii. Confidence Intervals for Population Proportion

c. Testing of Hypothesis

a. Type I and Type II Errors

b. Power Function

c. Uniformly Most Powerful Test

d. p-Value or Critical Level

e. Tests of Significance

f. Hypothesis Tests for Population Means

g. Hypothesis Tests for Population Variances

h. Hypothesis Tests for Population Proportions

d. Nonparametric tests

a. Sign Test

b. Wilcoxon Signed-Rank Test

c. Mann-Whitney U-Test

d. Kruskal-Wallis Test

e. Goodness-of-Fit Tests

f. Categorical Data Analysis

g. Nonparametric Bootstrapping

h. Permutation Tests

e. Experimental Designs

a. ANOVA for One-Way Fixed Effects Model

b. Power and the Non-Central F Distribution

c. Multiple Comparisons of Means

d. Random Effects Models
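For the one-way fixed effects ANOVA listed under experimental designs, the F statistic is the ratio of the between-group mean square to the within-group mean square. The sketch below computes it by hand; the function name one_way_anova_f and the three small groups are assumptions chosen only to keep the arithmetic easy to follow.

```python
def one_way_anova_f(groups):
    """Return the F statistic for a one-way fixed effects ANOVA."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    if k < 2 or any(len(g) == 0 for g in groups) or n_total <= k:
        raise ValueError("need at least 2 non-empty groups and more observations than groups")
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]
    # Between-group and within-group sums of squares
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    if ss_within == 0:
        raise ValueError("no within-group variation; F is undefined")
    ms_between = ss_between / (k - 1)      # k - 1 degrees of freedom
    ms_within = ss_within / (n_total - k)  # N - k degrees of freedom
    return ms_between / ms_within


# Illustrative data: three groups of three observations each
f_stat = one_way_anova_f([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
print(f_stat)
```

The statistic would then be compared against an F distribution with (k - 1, N - k) degrees of freedom; that lookup is left out here because the standard library has no F distribution.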

DATA WRANGLING

• Introduction to Python packages - pandas and NumPy

• Introduction to R packages - dplyr and tidyr

• Importing and exporting Data

- Reading and writing Data in Text and Excel formats

- Importing CSV, JSON and XML Data

- Working with Delimited Formats

• Web Scraping: Acquiring and Storing Data from the Web (using R rvest and Python Beautiful Soup)

o Scraping HTML text and table data

o Reading and analyzing a web page

• Using web APIs

o API Features

o A Simple Data Pull from Twitter’s REST API

• Data Exploration and Visualization

o Data Cleaning and Preparation

- Handling missing data with R Tidyverse and Python pandas

- Normalizing and Standardizing the Data

- Reshaping Data with R tidyr and Python pandas

- Transforming Data with R dplyr and Python pandas

- Finding Outliers

o Dimensionality Reduction

- Principal Component Analysis (PCA)

- Linear Discriminant Analysis (LDA)

o Data Visualization with R ggplot2 and Python ggplot and seaborn

- Histogram

- Box Plot

- Pie Chart

- Scatter Plot

- Bar Chart

- Area Chart

- Heat Map

- Correlogram
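The normalizing and standardizing step listed under data cleaning can be sketched with the standard library alone. Min-max normalization rescales values to the range [0, 1], while standardization (z-scores) shifts them to mean 0 and standard deviation 1. The function names and the sample values are assumptions for illustration; in the course setting the same operations would more likely be done with pandas or dplyr.

```python
from statistics import mean, pstdev


def min_max_normalize(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("all values are identical; the range is zero")
    return [(v - lo) / (hi - lo) for v in values]


def standardize(values):
    """Convert values to z-scores using the population standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        raise ValueError("all values are identical; the standard deviation is zero")
    return [(v - mu) / sigma for v in values]


data = [2, 4, 6, 8]  # illustrative values
normalized = min_max_normalize(data)
standardized = standardize(data)
print(normalized)
print(standardized)
```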

MACHINE LEARNING MODELS

MACHINE LEARNING MODELING METHODS

• Choosing and evaluating models

o Mapping problems to machine learning tasks

o Solving classification problems

o Solving scoring problems

o Working without known targets

o Problem-to-method mapping

• Evaluating models

o Evaluating classification models

o Evaluating scoring models

o Evaluating probability models

o Evaluating ranking models

o Evaluating clustering models

• Validating models

o Common model problems

- Overfitting and Underfitting

- Correctness

- The Bias-Variance Trade-off

o Ensuring Model Quality

SUPERVISED LEARNING MODELS

• Building single-variable models

• Building models using many variables

o Using decision trees

o Using nearest neighbor methods (KNN)

o Using Naive Bayes

LINEAR AND LOGISTIC REGRESSION

o Using linear regression

- Building a linear regression model

- Making predictions

- Finding relations and extracting advice

o Using logistic regression

- Building a logistic regression model

- Making predictions

- Finding relations and extracting advice from logistic models

MULTIPLE REGRESSIONS

• The Model

• Further Assumptions of the Least Squares Model

• Fitting the Model

• Interpreting the Model

• Goodness of Fit

• Digression: The Bootstrap

• Standard Errors of Regression Coefficients

UNSUPERVISED LEARNING MODELS

• Cluster analysis

o Distances

o Preparing the data

o Hierarchical clustering

o The k-means algorithm

• Association rules

o Mining association rules

o The apriori() algorithm

o The FP-Growth (FP-tree) algorithm
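Association rule mining rests on two measures: the support of an itemset (the fraction of transactions containing it) and the confidence of a rule A -> B (the support of A and B together divided by the support of A). The sketch below computes both directly. The function names and the four shopping baskets are made up for illustration; it is not an implementation of Apriori or FP-Growth, which add efficient candidate search on top of these measures.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    if not transactions:
        raise ValueError("need at least one transaction")
    items = set(itemset)
    hits = sum(1 for t in transactions if items <= set(t))
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimated P(consequent | antecedent) for the rule antecedent -> consequent."""
    base = support(transactions, antecedent)
    if base == 0:
        raise ValueError("the antecedent never occurs")
    both = support(transactions, set(antecedent) | set(consequent))
    return both / base


# Hypothetical market baskets
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
sup = support(baskets, {"bread", "milk"})
conf = confidence(baskets, {"bread"}, {"milk"})
print(sup, conf)
```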

EXPLORING ADVANCED METHODS

• Using bagging and random forests to reduce training variance

- Using bagging to improve prediction

- Using random forests to further improve prediction

• Using generalized additive models (GAMs)

• Using kernel methods to increase data separation

• Using SVMs to model complicated decision boundaries

DEEP LEARNING USING TENSORFLOW

• Introduction to deep learning and neural networks

• Perceptrons

• Feed-Forward Neural Networks

• Backpropagation

• Understanding neural networks through TensorFlow

• Convolutional and Recurrent Neural Networks

DATA VISUALIZATION USING MATPLOTLIB AND TABLEAU

• Interactive visualization with Matplotlib

• Tableau dashboard and storyboard

• Data visualization using Tableau

• Tableau integration with R and Python

DATABASES AND SQL

• Create table and insert

• Update

• Delete

• Select

• Group by

• Order by

• Join

• Subqueries

• Indexes

• Query Optimization

• NoSQL
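Most of the SQL topics above can be tried out with Python's built-in sqlite3 module and an in-memory database. The sketch below runs create table, insert, update, delete, an index, and a select with join, group by and order by. The customers and orders tables, their columns, and the rows are hypothetical and exist only for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
cur = conn.cursor()

# Create table and insert
cur.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
cur.execute(
    "CREATE TABLE orders ("
    "id INTEGER PRIMARY KEY, "
    "customer_id INTEGER NOT NULL REFERENCES customers(id), "
    "amount REAL NOT NULL)"
)
cur.executemany("INSERT INTO customers (id, name) VALUES (?, ?)",
                [(1, "Asha"), (2, "Ravi")])
cur.executemany("INSERT INTO orders (customer_id, amount) VALUES (?, ?)",
                [(1, 10.0), (1, 15.0), (2, 7.5), (2, 100.0)])

# Update and delete
cur.execute("UPDATE orders SET amount = 20.0 WHERE customer_id = 1 AND amount = 15.0")
cur.execute("DELETE FROM orders WHERE amount > 50")

# Index on the join column
cur.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")

# Select with join, group by and order by
rows = cur.execute(
    "SELECT c.name, COUNT(o.id) AS n_orders, SUM(o.amount) AS total "
    "FROM customers AS c "
    "JOIN orders AS o ON o.customer_id = c.id "
    "GROUP BY c.name "
    "ORDER BY total DESC"
).fetchall()
conn.close()

print(rows)
```

The parameter placeholders (?) are used instead of string formatting so that values are passed to the database safely.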

HANDLING BIG DATA WITH SPARK

• Introduction to Big Data and Spark

• MapReduce

• Spark Streaming

• GraphX

• RDDs in Spark

• Spark SQL

• MLlib
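The MapReduce idea behind these tools can be shown without a cluster. The sketch below is plain Python, not Spark code: it counts words by emitting (word, 1) pairs in a map phase, grouping the pairs by key in a shuffle phase, and summing each group in a reduce phase. The function names and the two input lines are assumptions for illustration.

```python
from collections import defaultdict


def map_phase(lines):
    """Emit a (word, 1) pair for every word in every line."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle_phase(pairs):
    """Group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Combine each key's values into a single count."""
    return {key: sum(values) for key, values in grouped.items()}


# Illustrative input
lines = ["spark makes big data simple", "big data needs big tools"]
counts = reduce_phase(shuffle_phase(map_phase(lines)))
print(counts)
```

In a real cluster the map and reduce phases run in parallel on different machines and the shuffle moves data across the network; here all three run in one process.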