Unit I: Introduction to Data Science and Big Data
Basics of and need for Data Science and Big Data, Applications of Data Science, Data explosion, 5 V’s of Big Data, Relationship between Data Science and Information Science, Business intelligence versus Data Science, Data Science Life Cycle, Data: Data Types, Data Collection. Need for Data wrangling, Methods: Data Cleaning, Data Integration, Data Reduction, Data Transformation, Data Discretization. (Chapter - 1)

Unit II: Statistical Inference
Need for statistics in Data Science and Big Data Analytics, Measures of Central Tendency: Mean, Median, Mode, Mid-range. Measures of Dispersion: Range, Variance, Mean Deviation, Standard Deviation. Bayes theorem, Basics of and need for hypotheses and hypothesis testing, Pearson Correlation, Sample Hypothesis testing, Chi-Square Tests, t-test. (Chapter - 2)

Unit III: Big Data Analytics Life Cycle
Introduction to Big Data, sources of Big Data, Data Analytic Lifecycle: Introduction, Phase 1: Discovery, Phase 2: Data Preparation, Phase 3: Model Planning, Phase 4: Model Building, Phase 5: Communicate Results, Phase 6: Operationalize. (Chapter - 3)

Unit IV: Predictive Big Data Analytics with Python
Introduction, Essential Python Libraries, Basic examples. Data Preprocessing: Removing Duplicates, Transformation of Data using a function or mapping, Replacing values, Handling Missing Data. Analytics Types: Predictive, Descriptive and Prescriptive. Association Rules: Apriori Algorithm, FP-Growth. Regression: Linear Regression, Logistic Regression. Classification: Naïve Bayes, Decision Trees. Introduction to Scikit-learn, Installation, Datasets, Matplotlib, Filling missing values, Regression and Classification using Scikit-learn. (Chapter - 4)

Unit V: Big Data Analytics and Model Evaluation
Clustering Algorithms: K-Means, Hierarchical Clustering, Time-series analysis. Introduction to Text Analysis: Text preprocessing, Bag of words, TF-IDF and topics. Need for and introduction to social network analysis, Introduction to business analysis. Model Evaluation and Selection: Metrics for Evaluating Classifier Performance, Holdout Method and Random Subsampling, Parameter Tuning and Optimization, Result Interpretation, Clustering and Time-series analysis using Scikit-learn, sklearn.metrics, Confusion matrix, AUC-ROC Curves, Elbow plot. (Chapter - 5)

Unit VI: Data Visualization and Hadoop
Introduction to Data Visualization, Challenges of Big Data visualization, Types of data visualization, Data Visualization Techniques, Visualizing Big Data, Tools used in Data Visualization, Hadoop ecosystem, MapReduce, Pig, Hive, Analytical techniques used in Big Data visualization. Data Visualization using Python: Line plot, Scatter plot, Histogram, Density plot, Box plot. (Chapter - 6)