Practical Data Science & Big Data Analytics


This course is a practical, hands-on jump start in Big Data and Data Science, designed as a first step for trainees interested in starting a career in this evolving, fast-growing specialty. The course starts by introducing the practice itself and builds a sound technical understanding of Big Data and Data Science by removing the ambiguity surrounding this new domain. Trainees apply their knowledge and skills in a course project.

The course then introduces trainees to how Big Data and Data Science solutions are implemented in real life by demonstrating the lifecycle and the common tasks carried out during each phase. Data science methods and techniques are contrasted through use cases that explain the relationship between statistical modelling, machine learning and data mining. Data-driven models are then discussed in more depth to link those techniques by example.

Once a solid understanding of Data Science models is in place, the discussion proceeds to implementing those models to process Big Data, both structured and unstructured. Several Big Data technologies are introduced with use cases, patterns and examples. The course concludes by discussing the last steps needed to finalize, operationalize and present your solution, including planning an Agile implementation to roll out to production.


Who Should Attend?

  • Managers and leaders who would like to lead projects in this field
  • Data and business analysts
  • BI and data warehousing specialists
  • Data Engineers and software developers / architects
  • Data Applications Developers
  • Researchers / innovators / start-ups
  • Executives and Business Leaders
  • Regulators and Government Organizations
  • Technology Services Providers
  • Implementation Consultants and ISVs


Course Main Topics

Introduction: Data Science & Big Data Architecture, Practices and Use Cases

  • Big Data: Practical Technical Overview
  • Attributes of Big Data solutions
  • Digital Transformation and Internet of Things (IoT)
  • State of the Practice in Analytics
  • Big Data Analytics in Industry Verticals: Use Cases
  • Architectural Aspects of Big Data Solutions


Project Introduction & Lab Practices

Lifecycle and Implementation in Practice

  • Introduction to Lean Agile Methods
  • Data Analytics Lifecycle
  • Discovery
  • Data Preparation
  • Model Planning
  • Model Building
  • Communicating Results
  • Operationalizing

Project Reflection & Lab Practices

Data Science Methods and Techniques (Statistical Modelling – Unsupervised – Supervised)

  • Introduction to Data Modelling
  • Data Science Tools: R/RStudio – Python – Octave – Other Tools
  • Exploring and Visualizing the Data
  • Statistical Modelling
  • K-Means Clustering
  • Association Rules
  • Linear Regression
  • Logistic Regression
  • Naïve Bayesian Classifier
  • Decision Trees
  • Time Series Analysis
  • Text Analysis


Project Reflection & Lab Practices

Big Data Platforms and Technologies (Core Hadoop – Ecosystem – Advanced)

  • Challenges of Structured and Unstructured Big Data
  • Massively Parallel Processing (MPP) Concepts
  • NoSQL Database Solutions
  • Hadoop HDFS and YARN
  • Data Ingestion using Flume, Sqoop and Spring XD
  • Working with Pig, Hive and HBase
  • Spark, MLlib and Spark Streaming
  • Greenplum and MADlib
  • Deploying Data Science Analytics Models on Big Data Platforms


Project Reflection & Lab Practices

Consulting and Solution Delivery

  • Deeper View into Lean Agile: Scrum Framework
  • Operationalizing Analytics Solutions
  • Creating Final Deliverables
  • Data Visualization and Presentation Techniques


Course Exam

E20-007 Data Science and Big Data Analytics Exam by Dell



4 days