Item number: 117950801

Data Science with Python Masterclass Training

€999,00 excl. tax (€1.208,79 incl. tax)

Award-winning Data Science with Python Masterclass Training, including access to an online mentor via chat or email, final exam assessments, and Practice Labs.

Brand:
Python
Discounts:
  • Buy 2 for €979,02 each and save 2%
  • Buy 3 for €969,03 each and save 3%
  • Buy 4 for €959,04 each and save 4%
  • Buy 5 for €949,05 each and save 5%
  • Buy 10 for €899,10 each and save 10%
  • Buy 25 for €849,15 each and save 15%
  • Buy 50 for €799,20 each and save 20%
Availability:
In stock
Delivery time:
Order before 5 p.m. and start today.
  • Award Winning E-learning
  • Lowest price guarantee
  • Personalized service by our expert team
  • Pay safely online or by invoice
  • Order and start within 24 hours

Data Science with Python Masterclass E-Learning Training

This journey first provides a foundation in data architecture, statistics, and data analysis programming with Python and R, the first step toward moving away from disparate and legacy data sources. You will then learn to wrangle data using Python and R and to integrate that data with Spark and Hadoop. Next, you will learn how to operationalize and scale data while taking compliance and governance into account. Finally, you will learn how to visualize that data to inform smart business decisions.

This learning path, with more than 120 hours of online content, is divided into the following four tracks:

  • Data Science Track 1: Data Analyst
  • Data Science Track 2: Data Wrangler
  • Data Science Track 3: Data Ops
  • Data Science Track 4: Data Scientist

Data Science Track 1: Data Analyst

This track focuses on the data analyst role, covering Python, R, data architecture, statistics, and Spark.
Content:
E-learning courses

Data Architecture Primer

Course: 1 Hour, 4 Minutes

  • Course Overview
  • Data Defined
  • Data Privacy
  • The Data Lifecycle
  • SQL vs. NoSQL
  • Create an Entity Relationship Diagram
  • Implement a SQL Solution
  • Implement a NoSQL Solution
  • Big Data
  • Data Architecture and Governance
  • IT Data System Architecture Types
  • Data Analytics and Reporting
  • Exercise: Implement Data Architecture Best Practices

Data Engineering Fundamentals

Course: 46 Minutes

  • Course Overview
  • Overview of Distributed Systems
  • Batch vs. In-Memory Processing
  • NoSQL Stores
  • Tools for Data Management
  • What is ETL?
  • ETL with Talend Open Studio
  • Data Modeling
  • AI and Machine Learning
  • Data Partitioning
  • Data Engineering
  • Data Reporting
  • Exercise: Create a Data Model

Python for Data Science: Introduction to NumPy for Multi-dimensional Data

Course: 1 Hour

  • Course Overview
  • Introduction to NumPy and the NumPy Ecosystem
  • Array Creation - Part 1
  • Array Creation - Part 2
  • Printing Arrays
  • Basic Array Operations
  • Universal Functions
  • Indexing and Slicing
  • Iterating Over Arrays
  • Reshaping Arrays
  • Exercise: Python NumPy Array Operations

Python for Data Science: Advanced Operations with NumPy Arrays

Course: 1 Hour, 8 Minutes

  • Course Overview
  • Splitting NumPy Arrays
  • Images as Arrays
  • Image Manipulation Using NumPy
  • Views and NumPy Arrays
  • Deep Copies of Arrays
  • Introduction to Index Masks
  • Applying Index Masks
  • Indexing with Boolean Masks
  • Structured Arrays
  • Understanding Array Broadcasting
  • Applying Broadcasting Rules on Array Operations
  • Exercise: NumPy Multi-dimensional Array Operations

Python for Data Science: Introduction to Pandas

Course: 1 Hour, 6 Minutes

  • Course Overview
  • Features of Pandas and the Pandas Ecosystem
  • Introduction to Pandas
  • Work with Pandas
  • Introduction to DataFrames
  • Work with DataFrames
  • Load Data into a DataFrame
  • Add and Delete DataFrame Contents
  • Select Parts of a DataFrame
  • Access Pandas DataFrames
  • Introduction to Multi-Indexing in a DataFrame
  • Reshape DataFrames
  • Reshape DataFrames Using Stack and Melt Operations
  • Exercise: Pandas for Basic Tabular Data Manipulation

Python for Data Science: Manipulating and Analyzing Data in Pandas DataFrames

Course: 45 Minutes

  • Course Overview
  • Iterating Over the Contents of a DataFrame
  • Exporting a DataFrame
  • Sorting
  • Handling Missing Data
  • Grouping with a Multi-Index
  • Merging DataFrames
  • Applying Join Operations on DataFrames
  • Pandas and Relational Databases
  • Exercise: Pandas for Advanced Data Manipulation

R for Data Science: Data Structures

Course: 52 Minutes

  • Course Overview
  • Creating Vectors
  • Manipulating Vectors
  • Sorting Vectors
  • Using Lists
  • Creating Matrices
  • Matrix Operations
  • Creating Factors
  • Creating Data Frames
  • Data Frame Operations
  • Exercise: Creating and Using a Data Frame

R for Data Science: Importing and Exporting Data

Course: 34 Minutes

  • Course Overview
  • Reading from CSV
  • Reading from Excel
  • Reading from HTML
  • Exporting to CSV
  • Exporting to Excel
  • Exporting to HTML
  • Exercise: Reading and Writing Data

R for Data Science: Data Exploration

Course: 41 Minutes

  • Course Overview
  • Creating dplyr Tables
  • Selecting Subsets
  • Filtering Tabular Data
  • Piping Data
  • Mutating Data
  • Summarizing Data
  • Combining Datasets
  • Grouping Data
  • Exercise: Querying Data

R for Data Science: Regression Methods

Course: 37 Minutes

  • Course Overview
  • Linear Data Preparation
  • Creating Linear Models
  • Interpreting Model Output
  • Using Linear Prediction
  • Logistic Data Preparation
  • Using glm
  • Exercise: Creating a Linear Model

R for Data Science: Classification & Clustering

Course: 39 Minutes

  • Course Overview
  • Preparing Data for Classification
  • Using rpart
  • Using ctree
  • Preparing Data for Clustering
  • Using K-Means Clustering
  • Using Hierarchical Clustering
  • Exercise: Creating a Decision Tree

Data Science Statistics: Simple Descriptive Statistics

Course: 1 Hour, 11 Minutes

  • Course Overview
  • Descriptive and Inferential Statistics
  • Population vs. Sample
  • Probability vs. Non-Probability Sampling
  • Mean
  • Median
  • Mode
  • IQR
  • Variance
  • Exercise: Using Descriptive Statistics

Data Science Statistics: Common Approaches to Sampling Data

Course: 47 Minutes

  • Course Overview
  • Terms in Sampling
  • Sampling Bias
  • Simple Random Sampling
  • Systematic Random Sampling
  • Stratified Sampling
  • Non-Probability Sampling
  • Exercise: Efficient and Correct Sampling

Data Science Statistics: Inferential Statistics

Course: 1 Hour, 2 Minutes

  • Course Overview
  • Gaussian Distribution
  • Inferential Statistics and Hypothesis Testing
  • Simplified Example of Hypothesis Testing
  • T-tests
  • Skewness and Kurtosis
  • Correlation and Autocorrelation
  • Introducing Linear Regression
  • Overfitting and Goodness-of-Fit
  • Exercise: Basic Inferential Statistics

Accessing Data with Spark: An Introduction to Spark

Course: 1 Hour, 7 Minutes

  • Course Overview
  • Introduction to Spark and Hadoop
  • Resilient Distributed Datasets (RDDs)
  • RDD Operations
  • Spark DataFrames
  • Spark Architecture
  • Spark Installation
  • Working with RDDs
  • Creating DataFrames from RDDs
  • Contents of a DataFrame
  • The SQLContext
  • The map() Function of an RDD
  • Accessing the Contents of a DataFrame
  • DataFrames in Spark and Pandas
  • Exercise: Working with Spark

Getting Started with Hadoop: Fundamentals & MapReduce

Course: 1 Hour, 4 Minutes

  • Course Overview
  • An Introduction to Big Data
  • Building Systems to Scale with Data
  • A Quick Overview of Hadoop
  • MapReduce Overview
  • The Map Phase of a MapReduce
  • The Shuffle and Reduce Phases
  • Exercise: Fundamentals of Hadoop and MapReduce

Getting Started with Hadoop: Developing a Basic MapReduce Application

Course: 1 Hour, 14 Minutes

  • Course Overview
  • Provisioning a Hadoop Cluster on the Cloud
  • Browsing the Hadoop Web Applications
  • Creating a MapReduce project
  • Coding the Map Phase
  • Coding the Reduce Phase
  • Defining the Driver Program
  • Building the Application
  • Executing the MapReduce Application
  • Exercise: Developing a Basic MapReduce Application

Hadoop HDFS: Introduction

Course: 1 Hour, 15 Minutes

  • Course Overview
  • Scaling Datasets
  • Horizontal Scaling for Big Data
  • Distributed Clusters and Horizontal Scaling
  • Overview of HDFS
  • HDFS Architectures
  • MapReduce for HDFS
  • YARN for HDFS
  • The Mechanism of Resource Allocation in Hadoop
  • Apache Zookeeper for HDFS
  • The Hadoop Ecosystem
  • Exercise: An Introduction to HDFS

Hadoop HDFS: Introduction to the Shell

Course: 53 Minutes

  • Course Overview
  • Creating a Hadoop Cluster on the Google Cloud
  • Exploring Hadoop Clusters
  • The YARN Cluster Manager UI
  • The HDFS NameNode UI
  • Browsing the Packaged Hadoop Tools
  • Configuring HDFS
  • The HDFS Shells
  • Exercise: Introduction to the HDFS Shell

Hadoop HDFS: Working with Files

Course: 48 Minutes

  • Course Overview
  • Basic Directory Commands in HDFS
  • Using the copyFromLocal Command in HDFS
  • Using the put Command in HDFS
  • Using the copyToLocal Command in HDFS
  • Retrieving files from HDFS
  • Append and Delete Operations in HDFS
  • Exercise: Working with Files on HDFS

Hadoop HDFS: File Permissions

Course: 49 Minutes

  • Course Overview
  • The HDFS count and du Commands
  • Viewing and Setting File Permissions in HDFS
  • Applying Permissions Recursively in HDFS
  • An Introduction to Bash Scripting
  • Scripting HDFS Operations
  • Exploring the HDFS NameNode UI
  • Cleanup Operations in HDFS
  • Exercise: File Permissions on HDFS

Data Silos, Lakes, & Streams: Introduction

Course: 1 Hour, 20 Minutes

  • Course Overview
  • Data Silos
  • Data Lakes
  • Characteristics of Data Lakes
  • Data Lake Architecture, Features, and Challenges
  • Data Warehouses
  • Data Warehouses vs. Data Lakes
  • Data Streams
  • Migrating Data to AWS
  • Data Lakes on AWS
  • Working with Data Lakes on AWS
  • Exercise: Data Silos, Lakes, and Streams

Data Silos, Lakes, and Streams: Data Lakes on AWS

Course: 1 Hour, 10 Minutes

  • Course Overview
  • Create a Role for the AWS Glue Service
  • Upload Data to S3
  • Explore the Glue Web Console
  • Manually Create Glue Tables
  • Query the Data Lake Using Amazon Athena
  • Configure and Run Glue Crawlers
  • Access Data in Crawled Tables
  • Crawl Multiple CSV Files in the Same Folder Path
  • Merge Data in Multiple Files in the Same Folder Path
  • Work with Files Having the Exact Same Schema
  • Exercise: Data Lakes on AWS with S3 and Glue

Data Silos, Lakes, & Streams: Sources, Visualizations, & ETL Operations

Course: 1 Hour, 29 Minutes

  • Course Overview
  • Set Up a Redshift Cluster
  • Create Tables and Load Data From S3
  • Establish a JDBC Connection to Redshift
  • Crawl Redshift Using a JDBC Connection
  • Crawl DynamoDB
  • Configure QuickSight to Visualize Data
  • Visualize Data in QuickSight
  • Configure a Job to Perform Extract, Transform, Load
  • Execute an ETL Operation in Glue
  • Perform ETL to Back Up Redshift Data in S3 Buckets
  • Perform ETL to Back Up DynamoDB Data in S3 Buckets
  • Exercise: Multiple Sources, Visualizations, and ETL

Data Analysis Application

Course: 1 Hour, 25 Minutes

  • Course Overview
  • Install and Configure Anaconda Python
  • Install R Using Anaconda
  • Use Jupyter Notebook
  • Import and Export Data in Python
  • Import and Export Data in R
  • Deal with Missing Data in R
  • Transform Data in R
  • Work with NumPy
  • Work with Pandas
  • Mean, Median, and Mode in R
  • Analyze Data with Pandas
  • Plot Data in R
  • Visualize Data in Python
  • Exercise: Perform Data Analysis

Online Mentor

You can reach your mentor via chat or by email.

Final Exam assessment

Estimated duration: 65 minutes

Practice Labs: Analyzing Data with Python (estimated duration: 8 hours)

Practice data analysis tasks in Python: configure VS Code, load data from SQLite into Pandas, group data, and create box plots. Then test your skills by using Python to calculate a frequency distribution, measures of center, and the coefficient of dispersion, and answering the assessment questions. This lab provides access to several tools commonly used in data science, including VS Code, Anaconda, Jupyter Notebook and JupyterHub, Pandas, NumPy, SciPy, the Seaborn library, and the Spyder IDE.
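To give an idea of the lab's assessment tasks, here is a minimal sketch that loads values from SQLite and computes a frequency distribution, measures of center, and a dispersion measure. It uses only the Python standard library (`sqlite3`, `statistics`, `collections`) rather than Pandas, and the `scores` table, its columns, and the sample values are hypothetical. "Coefficient of dispersion" has several definitions; this sketch assumes the coefficient of variation (sample standard deviation divided by the mean).

```python
import sqlite3
import statistics
from collections import Counter


def summarize(values):
    """Return a frequency distribution, measures of center and a dispersion ratio."""
    mean = statistics.mean(values)
    return {
        # Frequency distribution: how often each value occurs, sorted by value
        "frequency": dict(sorted(Counter(values).items())),
        "mean": mean,
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        # Assumed definition: coefficient of variation (sample stdev / mean)
        "dispersion": statistics.stdev(values) / mean,
    }


# Hypothetical in-memory table standing in for the lab's SQLite database
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (student TEXT, score INTEGER)")
conn.executemany(
    "INSERT INTO scores VALUES (?, ?)",
    [("a", 70), ("b", 80), ("c", 80), ("d", 90), ("e", 100)],
)
values = [row[0] for row in conn.execute("SELECT score FROM scores ORDER BY score")]
conn.close()

result = summarize(values)
print(result)
```

In the lab itself, the query result would typically be loaded into a Pandas DataFrame (for example with `pandas.read_sql`) before computing the same figures.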

Data Science Track 2: Data Wrangler

This track focuses on the data wrangler role, exploring data wrangling with Python, MongoDB, and Hadoop.
Content:
E-learning courses

Data Wrangling with Pandas: Working with Series & DataFrames

Course: 1 Hour, 11 Minutes

  • Course Overview
  • Installing Pandas
  • Pandas Series Objects
  • Operations on Series
  • Appending and Sorting Series Values
  • Pandas DataFrames
  • Indexing Operations with DataFrames
  • Missing Data
  • Column Aggregations
  • Statistical Operations

Data Wrangling with Pandas: Visualizations and Time-Series Data

Course: 1 Hour, 29 Minutes

  • Course Overview
  • Pandas and Matplotlib for Visualizations
  • Pie Charts, Box Plots, and Scatter Plots
  • Time-Series Data
  • Deltas and Percentage Change Calculations
  • Time Deltas and Date Ranges
  • Mismatched DataFrames and Missing Data
  • Working with String Data
  • Advanced Operations on Strings
  • Applying Functions on Series
  • Transforming Data With User-Defined Functions
  • Applying Functions on DataFrames
  • Exercise: Plot Charts and Transform Column Values

Data Wrangling with Pandas: Advanced Features

Course: 1 Hour, 12 Minutes

  • Course Overview
  • Grouping and Aggregations
  • MultiIndex DataFrames
  • Grouping and Aggregations with MultiIndex DataFrames
  • General Aggregation Functions
  • Filtering
  • Masking Column Values
  • Working with Duplicates
  • Working with Categorical Data
  • Filtering, Adding, and Removing Categories
  • Reindexing
  • Exercise: Filtering, Duplicates and Categorical Data

Data Wrangler 4: Cleaning Data in R

Course: 1 Hour, 3 Minutes

  • Course Overview
  • Types of Unclean Data
  • Data Quality
  • Downloading JSON Data
  • Excel Sheets
  • Reading Dirty CSVs
  • Querying Relational Databases
  • Joining Tabular Data
  • Spreading Data
  • Summarizing Data
  • Imputing Data
  • Extracting Matches
  • Exercise: Wrangling Data

Data Tools: Technology Landscape & Tools for Data Management

Course: 27 Minutes

  • Course Overview
  • Technology Landscape and Tools
  • Tool Comparison
  • Machine Learning in Data Analytics
  • Machine Learning Tools
  • Machine Learning Implementation
  • Python and R for Data Management
  • Cloud and Machine Learning
  • Exercise: Implement Machine Learning on Scikit-learn

Data Tools: Machine Learning & Deep Learning in the Cloud

Course: 23 Minutes

  • Course Overview
  • Microsoft Machine Learning Toolkit
  • AWS and Machine Learning
  • Spark Machine Learning Capabilities
  • Deep Learning Frameworks
  • Deep Learning Implementation
  • Data Mining and Analytical Tools
  • KNIME Capabilities
  • Exercise: Implement Deep Learning

Trifacta for Data Wrangling: Wrangling Data

Course: 50 Minutes

  • Course Overview
  • Standardizing Data
  • Formatting Dates
  • Filtering Rows
  • Replacing Values
  • Counting Matches
  • Splitting Columns
  • Merging Columns
  • Extracting Data
  • Conditional Aggregation
  • Reshaping Data
  • Joining Data
  • Exercise: Wrangling Data

MongoDB for Data Wrangling: Querying

Course: 1 Hour, 8 Minutes

  • Course Overview
  • Introduction to PyMongo
  • Document Structure
  • CRUD Operations
  • ObjectID and Timestamp
  • Query Operations
  • Projection Queries
  • Comparison Operators
  • Element Query Operators
  • The Regex Operator
  • Using the Size and All Operators
  • Text Search
  • Using mongoimport
  • Using mongoexport
  • Exercise: Performing a Query

MongoDB for Data Wrangling: Aggregation

Course: 51 Minutes

  • Course Overview
  • Aggregation Framework
  • Using Group
  • Using Match
  • Using Project
  • Using Limit and Sort
  • Using Unwind
  • Using Lookup
  • Using Indexes
  • Using Geospatial Indexes
  • Exercise: Performing an Aggregate Query

Getting Started with Hive: Introduction

Course: 56 Minutes

  • Course Overview
  • Hive as a Data Warehouse
  • Overview of Relational Databases
  • OLTP and OLAP
  • Hive and the Hadoop Ecosystem
  • HiveServer and The Metastore
  • Hive on Cloud Computing Platforms
  • Data Types in Hive
  • Data and Tables in Hive
  • Exercise: Introduction to Hive

Getting Started with Hive: Loading and Querying Data

Course: 1 Hour, 20 Minutes

  • Course Overview
  • Setting up a Hadoop Cluster on the Google Cloud
  • Creating a Hive Table
  • Running Simple Queries in Hive
  • Executing Hive Queries from the Shell
  • Joining Tables in Hive
  • Exploring the Hive Warehouse
  • External Tables in Hive
  • Modifying Tables in Hive
  • Temporary Tables in Hive
  • Loading Data into Tables in Hive
  • Populating Multiple Tables in Hive
  • Exercise: Loading and Querying Data in Hive

Getting Started with Hive: Viewing and Querying Complex Data

Course: 1 Hour, 14 Minutes

  • Course Overview
  • The Array Data Type in Hive
  • The Map Data Type in Hive
  • The Struct Type in Hive
  • The explode and posexplode Functions in Hive
  • Lateral Views in Hive
  • Multiple Lateral Views in Hive
  • Set Operations in Hive
  • The IN and EXISTS clauses in Hive
  • Creating and Populating Tables in Hive
  • Views in Hive
  • Exercise: Viewing and Querying Complex Data

Getting Started with Hive: Optimizing Query Executions

Course: 43 Minutes

  • Course Overview
  • Hive Queries as MapReduce Jobs
  • Techniques to Improve Query Performance in Hive
  • Partitioning Tables in Hive
  • Bucketing Tables in Hive
  • Structuring Join Queries in Hive
  • Exercise: Optimizing Query Execution in Hive

Getting Started with Hive: Optimizing Query Executions with Partitioning

Course: 1 Hour, 1 Minute

  • Course Overview
  • Setting up a Hadoop Cluster on the Google Cloud
  • Creating a Partitioned Table in Hive
  • Working with Partitions in Hive
  • Populating Partitions in Hive
  • Partitioning External Tables in Hive
  • Modifying Partitions in Hive
  • Dynamic Partitions in Hive
  • Using Multiple Columns for Partitioning in Hive
  • Exercise: Optimize Executions with Partitioning

Getting Started with Hive: Bucketing & Window Functions

Course: 1 Hour, 4 Minutes

  • Course Overview
  • Apply Bucketing for a Table in Hive
  • Using Bucketing and Partitioning Together in Hive
  • Sorting a Bucket's Contents in Hive
  • Sampling a Table in Hive
  • Joining Multiple Tables in Hive
  • Introducing Window Functions in Hive
  • Window Functions with Partitions in Hive
  • Exercise: Bucketing and Window Functions in Hive

Getting Started with Hadoop: Filtering Data Using MapReduce

Course: 59 Minutes

  • Course Overview
  • Counting the Data Points in Each Category
  • The Reducer and Driver Programs
  • Building and Executing the Application
  • A Simple Filter Using MapReduce
  • Executing and Examining the Output
  • Extracting the Unique Values in a Column
  • Viewing the Distinct Values Extracted
  • Exercise: Filtering Data Using MapReduce

Getting Started with Hadoop: MapReduce Applications With Combiners

Course: 1 Hour, 24 Minutes

  • Course Overview
  • Combiners in MapReduce
  • Revisiting MapReduce
  • Working with Combiners
  • Using Combiners for Calculating Averages
  • Creating a Project to Calculate Averages
  • Coding the Map and Reduce Phases
  • Configure the Application in the Driver
  • Executing the Application and Examining the Output
  • Adding a Combiner to a MapReduce Application
  • Conveying a Pair of Numbers from the Mapper
  • Running the Fixed Application
  • Exercise: Optimizing MapReduce With Combiners

Getting Started with Hadoop: Advanced Operations Using MapReduce

Course: 49 Minutes

  • Course Overview
  • Defining a User-Defined Type for a PriorityQueue
  • Implementing a PriorityQueue in a Mapper
  • Using a PriorityQueue in a Reducer
  • Running and Verifying the Results
  • Building an Inverted Index - Map Phase
  • Building an Inverted Index - Reduce Phase
  • Executing the Application and Viewing the Index
  • Exercise: Advanced Operations Using MapReduce

Accessing Data with Spark: Data Analysis Using the Spark DataFrame API

Course: 1 Hour, 12 Minutes

  • Course Overview
  • Performance Improvements in Spark
  • Broadcast Variables and Accumulators
  • Loading Data into a DataFrame
  • Sampling the Contents of a DataFrame
  • Grouping and Aggregations
  • Visualizing Data in a DataFrame
  • Trimming and Cleaning Data
  • User-Defined Functions and DataFrames
  • Combining Filters, Aggregations, and Sorting
  • Using Broadcast Variables
  • Using Accumulators
  • Exporting DataFrame Contents
  • Custom Accumulators
  • Join Operations
  • Exercise: Data Analysis Using the DataFrame API

Accessing Data with Spark: Data Analysis using Spark SQL

Course: 55 Minutes

  • Course Overview
  • The Spark Catalyst Optimizer
  • Introduction to Spark SQL
  • Preparing Data for Analysis
  • Running SQL Queries
  • Inferred and Explicit Schemas
  • Windowing in Spark
  • Applying Window Functions
  • Exercise: Data Analysis Using Spark SQL

Data Lake: Framework & Design Implementation

Course: 34 Minutes

  • Course Overview
  • Data Lakes and Data Warehouses
  • Data Lake Selection Criteria
  • Data Lake and Data Democratization
  • Data Lake Design Principles
  • AWS Data Lake Architecture
  • Implement AWS Data Store
  • Data Lake for On-Premises and Multi-Cloud
  • Data Processing Frameworks for Data Lake
  • Exercise: Implement AWS Data Store

Data Lake: Architectures & Data Management Principles

Course: 35 Minutes

  • Course Overview
  • Real-Time Big Data Architectures
  • Data Lake Reference Architecture
  • Data Ingestion and File Formats
  • Ingestion Using Sqoop
  • Data Processing Strategies
  • Deriving Value from Data Lakes
  • Data Life Cycle
  • S3 and Glacier
  • Exercise: Ingest Data and Implement Archival Policy

Data Architecture - Deep Dive: Design & Implementation

Course: 36 Minutes

  • Course Overview
  • Data Complexity Management Strategies
  • Data Modeling Process
  • Distributed Data Management
  • Partitioning Methods and Criteria
  • MongoDB Partitioning
  • Hybrid Data Architectures
  • Implement Directed Acyclic Graph
  • CAP Theorem
  • Batch vs. Streaming
  • Read and Write Concerns
  • Exercise: Implement Serverless Architecture

Data Architecture - Deep Dive: Microservices & Serverless Computing

Course: 26 Minutes

  • Course Overview
  • Microservices and Data
  • Serverless and Lambda Architecture
  • Lambda Implementation
  • Cluster Benefits
  • Data Architecture Types
  • Data Discovery Process
  • Data Risk Types
  • Data POC
  • Exercise: Implement Lambda Architecture

Online Mentor

You can reach your mentor via chat or by email.

Final Exam assessment

Estimated duration: 90 minutes
Nova Learning, January 2021

Practice Labs: Data Wrangling with Python (estimated duration: 8 hours)

Perform data wrangling tasks including using a Pandas DataFrame to convert multiple Excel sheets to separate JSON documents, extract a table from an HTML file, use mean substitution and convert dates within a DataFrame. Then, test your skills by answering assessment questions after using a Pandas DataFrame to convert a CSV document to a JSON document, replace missing values with a default value, split a column with a delimiter and combine two columns by concatenating text.
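Several of the assessment steps above (replacing missing values with a default, splitting a column on a delimiter, concatenating two text columns, and converting CSV to JSON) can be sketched with the Python standard library alone (`csv`, `io`, `json`) instead of the Pandas DataFrame the lab uses. The column names, the `;` delimiter, the `unknown` default, and the sample data below are hypothetical.

```python
import csv
import io
import json


def wrangle_csv_to_json(csv_text, default="unknown", delimiter=";"):
    """Clean each CSV row and return all rows as a JSON array string."""
    records = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Replace empty (missing) fields with a default value
        row = {key: (value if value else default) for key, value in row.items()}
        # Split the hypothetical "location" column (e.g. "London;UK") into two columns
        city, _, country = row.pop("location").partition(delimiter)
        row["city"] = city
        row["country"] = country or default
        # Combine two text columns by concatenation
        row["full_name"] = f"{row['first']} {row['last']}"
        records.append(row)
    return json.dumps(records, indent=2)


SAMPLE = "first,last,location,team\nAda,Lovelace,London;UK,\nAlan,Turing,Manchester;UK,AI\n"
print(wrangle_csv_to_json(SAMPLE))
```

With Pandas, the same steps would usually map onto `fillna`, `str.split`, string concatenation of two columns, and `to_json`.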

Data Science Track 3: Data Ops

This track's objective is to prepare the learner for a Data Ops role, with a focus on governance, security, and harnessing data volume and velocity.
Content:
E-learning courses

Deploying Data Tools: Data Science Tools

Course: 48 Minutes

  • Course Overview
  • Data Science Platform
  • Challenges of Deploying Data Science Tools
  • Considerations for Data Science Tools
  • Data Science Workflow
  • Data Science Analytic Tools
  • Data Science Visualization Tools
  • Data Science Database Tools
  • Benefits of Deploying Cloud-Based Tools
  • Challenges of Deploying Cloud-Based Tools
  • What is DevOps?
  • DevOps for Data Science
  • Exercise: Identifying Uses of Data Science Tools

Delivering Dashboards: Management Patterns

Course: 34 Minutes

  • Course Overview
  • Analytical Visualization
  • Dashboard Types
  • Data Management
  • Dashboard Components
  • Dashboard Best Practices
  • Dashboard Using ELK
  • Dashboard Using Power BI
  • Chart Selection Criteria
  • Leaderboards and Scorecards
  • Scorecard Types
  • Exercise: Create Dashboards with Power BI and ELK

Delivering Dashboards: Exploration & Analytics

Course: 31 Minutes

  • Course Overview
  • Data Exploration Using Charts
  • Analytical Visualization Tools
  • Bar and Line Charts
  • Dashboarding with Kibana
  • Dashboard Sharing with Kibana
  • Dashboarding with Tableau
  • Dashboarding with QlikView
  • Data Ingest and Dashboards
  • Dashboard Patterns
  • Monitoring Dashboards
  • Exercise: Create Dashboards Using Kibana and Tableau

Cloud Data Architecture: DevOps & Containerization

Course: 45 Minutes

  • Course Overview
  • Containerization on the Cloud
  • Benefits of Containers
  • Serverless Computing
  • DevOps in the Cloud
  • AWS OpsWorks
  • Storage Classification
  • Cloud and Machine Learning
  • Cloud and BI Analytics
  • Exercise: Containerization and Serverless Computing

Compliance Issues and Strategies: Data Compliance

Course: 44 Minutes

  • Course Overview
  • Data Compliance Issues
  • Data Regulations
  • The Importance of Global Standards
  • Risk and Company Standards
  • Myths and Facts of Data Compliance
  • Compliance Training for Users
  • Compliance Training for Management
  • The Benefits of a Data Compliance Program
  • Elements of a Good Compliance Strategy
  • Building a Compliance Strategy
  • Reporting and Response Procedures
  • Exercise: Explain the Importance of Data Compliance

Implementing Governance Strategies

Course: 46 Minutes

  • Course Overview
  • Governance and its Relationship with Big Data
  • Why Big Data Requires Governance
  • Requirements for Big Data Governance
  • Why is Big Data Different?
  • Identifying Data
  • Identifying Stakeholders
  • Cloud Technologies and Data Governance
  • Designing a Data Governance Process
  • Managing a Data Governance Strategy
  • Monitoring a Data Governance Strategy
  • Maintaining a Data Governance Strategy
  • Exercise: Defining Data Governance Strategies

Data Access & Governance Policies: Data Access Oversight and IAM

Course: 59 Minutes

  • Course Overview
  • Data Access Governance
  • Risk and Data Safety Compliance
  • Data Access Patterns
  • Data Breach Prevention
  • Least Privilege
  • Assign and View Effective File System Permissions
  • Identity and Access Management
  • Create an AWS IAM User and Group
  • Assign AWS IAM Group Permissions
  • Vulnerability Assessments
  • Implement Effective Security Controls
  • Exercise: Implement Data Access Governance Solutions

Data Access & Governance Policies: Data Classification, Encryption, and Monitoring

Course: 1 Hour, 19 Minutes

  • Course Overview
  • Data Classification
  • Classify Data Using Microsoft FSRM
  • Data Encryption
  • Encrypt Data at Rest
  • Encrypt Data in Motion
  • Implement Security Compliance Checking
  • Examine Data Access Trends
  • Data Access Monitoring Solutions
  • Logging, Auditing, and Data Analytics
  • Configure a Custom Filtered Log View
  • Enable Windows Data Access Auditing
  • Exercise: Implement Data Confidentiality

Streaming Data Architectures: An Introduction to Streaming Data

Course: 51 Minutes

  • Course Overview
  • Introduction to Streaming Data
  • The Stream Processing Model
  • The Message Transport
  • Stream Processing with RDDs
  • Structured Streaming for Continuous Applications
  • Streaming vs. Structured Streaming
  • Triggers and Output Modes
  • Exercise: Working with Streaming Data

Streaming Data Architectures: Processing Streaming Data

Course: 53 Minutes

  • Course Overview
  • PySpark Setup
  • Setting Up a Socket Stream with Netcat
  • The Update Output Mode
  • Using a File Input Stream
  • The Append Output Mode
  • The Complete Output Mode
  • Aggregations on Streaming Data
  • SQL Operations on Streaming Data
  • User-Defined Functions (UDFs)
  • Exercise: Processing Streaming Data

Scalable Data Architectures: Introduction

Course: 53 Minutes

  • Course Overview
  • Scalable Architectures with Distributed Computing
  • Introducing Data Warehouses
  • Contrasting Warehouses with Relational Databases
  • Data Warehouses for Analytical Processing
  • Data Warehouse Architectural Components
  • Amazon Redshift - A Data Warehouse on the Cloud
  • Exercise: Scalable Data Architectures

Scalable Data Architectures: Introduction to Amazon Redshift

Course: 55 Minutes

  • Course Overview
  • Provisioning a Redshift Cluster Using Quick Launch
  • Creating a Redshift Cluster With Additional Detail
  • Exploring the Redshift Configs and Metrics
  • Attaching an IAM Role to a Redshift Cluster
  • Creating an AWS User to Work With Redshift
  • Installing and Configuring the AWS CLI
  • Running Queries from the Redshift Query Editor
  • Exercise: An Introduction to Amazon Redshift

Scalable Data Architectures: Working with Amazon Redshift & QuickSight

Course: 1 Hour, 18 Minutes

  • Course Overview
  • Loading Data from Amazon S3 to a Redshift Cluster
  • Running Queries and Evaluating Their Execution
  • Querying a Redshift Cluster Using a SQL Client
  • Working with Automated Snapshots
  • Restoring Tables from a Snapshot
  • Horizontal Scaling of a Redshift Cluster
  • Vertical and Horizontal Scaling of a Cluster
  • Configuring Access from QuickSight to Redshift
  • Loading a Dataset to QuickSight
  • Creating Visualizations with QuickSight
  • Exercise: Working with Redshift and QuickSight

Building Data Pipelines

Course: 1 Hour, 10 Minutes

  • Course Overview
  • Data Pipelines Overview
  • Traditional ETL Pipeline with Batch Processing
  • Data Pipeline Tools
  • Set Up and Install Airflow
  • Apache Airflow
  • Airflow Workflows
  • Airflow Tasks
  • Airflow Dependencies
  • ETL Pipeline with Airflow
  • Automated Pipeline without ETL
  • Airflow Command Line Testing
  • Exercise: Using Apache Airflow

Data Pipeline: Process Implementation Using Tableau & AWS

Course: 39 Minutes

  • Course Overview
  • Data Pipeline
  • Data Pipeline Processes
  • Data Pipeline Stages
  • Data Pipeline Technologies
  • Data Source Types
  • Scheduled Data Pipeline
  • Tableau Server and Utilities
  • Data Pipeline Using Tableau
  • Data Pipeline on AWS
  • Exercise: Build Data Pipelines with Tableau

Data Pipeline: Using Frameworks for Advanced Data Management

Course: 33 Minutes

  • Course Overview
  • Celery and Luigi
  • Data Pipeline with Python Luigi
  • Working with Dask Library
  • Dask Arrays
  • Data Exploration and Visualization Frameworks
  • Spark and Tableau
  • Streaming Data Visualization with Python
  • Data Pipeline Open Source Tools
  • Exercise: Implement Data Pipelines with Luigi

Data Sources: Integration

Course: 40 Minutes

  • Course Overview
  • Elements of IoT Solutions
  • Service Categories in IoT
  • IoT Capabilities and Maturity Model
  • IoT Design Principles
  • IoT Cloud Architectures
  • MQTT and XMPP
  • IoT Controllers
  • IoT Data Management
  • Securing IoT
  • Exercise: Generating Data Streams

Data Sources: Implementing Edge on the Cloud

Course: 31 Minutes

  • Course Overview
  • AWS IoT Greengrass
  • GCP IoT Edge
  • AWS IoT over WebSockets
  • IoT Device Simulator
  • Generating Streams of Data Using MQTT
  • Exercise: Working with IoT Device Simulators

Securing Big Data Streams

Course: 1 Hour, 3 Minutes

  • Course Overview
  • Big Data Security Concerns
  • Streaming Data Security Concerns
  • NoSQL Database Security Concerns
  • Distributed Processing Security Risks
  • Data Mining and Analytics Privacy Flaws
  • End-Point Device Tampering Risks
  • Secure Big Data
  • Secure Data Streams
  • Secure Data in Motion
  • End-Point Input Validation and Filtering
  • Secure Data at Rest with Symmetric Ciphers
  • Exercise: Securing Big Data Streams

Harnessing Data Volume & Velocity: Big Data to Smart Data

Course: 39 Minutes

  • Course Overview
  • Comparing Big Data and Smart Data
  • Smart Data and Edge Technologies
  • Big Data to Smart Data Formation
  • Smart Data and Smart Processes
  • Smart Data Use Cases
  • Smart Data Life Cycle
  • Big Data to Smart Data Using k-NN
  • Smart Data Frameworks
  • Smart Data to Business
  • Clustering Smart Data
  • Smart Data Integration
  • Exercise: Transform Big Data to Smart Data

Data Rollbacks: Transaction Rollbacks & Their Impact

Course: 36 Minutes

  • Course Overview
  • Rollback Process
  • State of Transactions
  • Transaction Types
  • SQL Transaction Management
  • Transaction Log Operations
  • Deadlock Management
  • SQL Server Rollback Mechanism
  • SQL Server Rollback Mechanism Implementation
  • Exercise: Implement Transactions with SQL Server

Data Rollbacks: Transaction Management & Rollbacks in NoSQL

Course: 29 Minutes

  • Course Overview
  • NoSQL and SQL Transaction Management
  • MongoDB Transactions
  • Manage Multi-Document Transactions in MongoDB
  • Change Data Capture
  • Change Stream in MongoDB
  • MongoDB Change Stream Implementation
  • Exercise: MongoDB Transactions and Change Streams

Online Mentor

You can reach your mentor via chat or email.

Final Exam assessment

Estimated duration: 90 minutes

Practice Labs: Implementing Data Ops with Python (estimated duration: 8 hours)

Perform data ops tasks with Python including working with row subsets, creating new columns with Regex, performing joins and spreading rows. Then, test your skills by answering assessment questions after working with field subsets and computed columns, and performing set operations and binding rows.

Data Science Track 4: Data Scientist

For this track, the focus is on the Data Scientist role. Here we will explore areas such as visualization, APIs, and machine learning (ML) and deep learning (DL) algorithms.
Content:
E-learning courses

Balancing the Four Vs of Data: The Four Vs of Data

Course: 40 Minutes

  • Course Overview
  • Overview of the Four Vs
  • The Importance of Volume
  • The Importance of Variety
  • The Importance of Velocity
  • The Importance of Veracity
  • The Relationship Between the Four Vs
  • Variety and Data Structure
  • Validity and Volatility
  • Finding Balance in the Four Vs
  • Use Cases
  • Extracting Value from the Four Vs
  • Exercise: Describe the Four Vs of Big Data

Data Driven Organizations

Course: 1 Hour, 15 Minutes

  • Course Overview
  • Data Driven Organizations
  • Decision Making
  • Analytic Maturity
  • Analytic Roles
  • Data Source Priority
  • Facets of Data Quality
  • Power BI Data Visualization
  • Missing Data
  • Duplicate Data
  • Truncated Data
  • Data Provenance

Raw Data to Insights: Data Ingestion & Statistical Analysis

Course: 54 Minutes

  • Course Overview
  • Statistical Analysis
  • Data Correction
  • Outlier Detection
  • Data Architecture Pattern
  • Data Ingestion Tools
  • Kafka and Apache NiFi
  • Apache Sqoop Ingest
  • Ingest Using Wavefront

Raw Data to Insights: Data Management & Decision Making

Course: 57 Minutes

  • Course Overview
  • Data-driven Decision Making Framework
  • Loading Data into R
  • Preparing Data
  • Data Correction Approach
  • Data Correction Using Simple Transformation
  • Data Correction Using Deductive Correction
  • Distributed Data Management
  • Data Analytics
  • Data Analytics Using R
  • Predictive Modeling

Tableau Desktop: Real Time Dashboards

Course: 1 Hour, 8 Minutes

  • Course Overview
  • Introducing Real Time Dashboards
  • Creating Real Time Dashboards with Tableau
  • Build a Tableau Dashboard
  • Real Time Dashboard Updates in Tableau
  • Organizing Your Tableau Dashboard
  • Formatting Your Tableau Dashboard
  • Interactive Tableau Dashboard
  • Tableau Dashboard Starters
  • Tableau Dashboard Extensions
  • Tableau Dashboards and Story Points
  • Sharing Your Tableau Dashboard

Storytelling with Data: Introduction

Course: 47 Minutes

  • Course Overview
  • Storytelling Process
  • Interpreting Context
  • Analysis Types
  • Who, What, and How of Storytelling
  • Visualization for Storytelling
  • Graphical Tools for Data Elaboration
  • Storytelling Scenarios
  • Storyboarding

Storytelling with Data: Tableau & Power BI

Course: 57 Minutes

  • Course Overview
  • Visual Selection
  • Slopegraphs
  • Bar Charts and Types of Bar Charts
  • Clutter and Clutter Elimination
  • Gestalt Principle
  • Story Design Best Practices
  • Tools for Storytelling
  • Decluttering
  • Crafting Visual Data
  • Visual Design Concerns
  • Storytelling with Power BI
  • Model Visual and Tableau

Python for Data Science: Basic Data Visualization Using Seaborn

Course: 1 Hour, 7 Minutes

  • Course Overview
  • Introduction to Seaborn
  • Install Seaborn
  • Simple Univariate Distributions
  • Configure Univariate Distribution Plots
  • Simple Bivariate Distributions
  • Explore Different Types of Bivariate Distributions
  • Analyze Multiple Variable Pairs
  • Regression Plots
  • Themes and Styles in Seaborn

Python for Data Science: Advanced Data Visualization Using Seaborn

Course: 1 Hour, 4 Minutes

  • Course Overview
  • Searching for Patterns in a Dataset
  • Configuring Plot Aesthetics
  • Normal Distribution and Outliers
  • Distributions Within Categories - Part 1
  • Distributions Within Categories - Part 2
  • Analyzing Categories with Facet Grids - Part 1
  • Analyzing Categories with Facet Grids - Part 2
  • Introducing Color Palettes
  • Using Color Palettes

Data Science Statistics: Using Python to Compute & Visualize Statistics

Course: 1 Hour, 16 Minutes

  • Course Overview
  • An Introduction to Matplotlib
  • Analyzing Data Using NumPy and Pandas
  • Visualizing Univariate and Bivariate Distributions
  • Summary Statistics Using Native Python Functions
  • Summary Statistics Using NumPy
  • Summary Statistics Using the SciPy Library
  • Correlation and Covariance
  • Z-score

R for Data Science: Data Visualization

Course: 33 Minutes

  • Course Overview

Advanced Visualizations & Dashboards: Visualization Using Python

Course: 38 Minutes

  • Course Overview
  • Relevance of Data Visualization for Business
  • Libraries for Data Visualization in Python
  • Python Data Visualization Environment Configuration
  • Matplotlib Libraries for Visualization
  • Bar Chart Using ggplot
  • Bokeh and Pygal
  • Select Visualization Libraries
  • Interactive Graphs and Image Files
  • Plot Graphs
  • Multiple Lines in Graphs

Advanced Visualizations & Dashboards: Visualization Using R

Course: 35 Minutes

  • Course Overview
  • Chart Types
  • Stacked Bar Plot
  • Animate Plots with Matplotlib
  • Plotting in Jupyter Notebook
  • Graphics in R
  • Heat Map and Scatter Plot in R
  • Correlogram and Area Chart in R
  • ggplot2 Capabilities
  • Customize ggplot2 Graphs

Powering Recommendation Engines: Recommendation Engines

Course: 1 Hour, 5 Minutes

  • Course Overview
  • Describing Recommendation Engines
  • Comparing the Types of Recommendation Engines
  • Collecting and Manipulating Data
  • Manipulating Data in R
  • Describing Similarity and Neighborhoods
  • Creating a Recommendation Engine
  • Recommending Another Item
  • Finding Items to Recommend
  • Recommending Items Based on Other Items
  • Evaluating a Recommendation System
  • Validating a Recommendation System

Data Insights, Anomalies, & Verification: Handling Anomalies

Course: 46 Minutes

  • Course Overview
  • Data and Anomaly Sources
  • Decomposition and Forecasting
  • Examine Data Using Randomization Tests
  • Anomaly Detection
  • Anomaly Detection Techniques
  • Anomaly Detection with scikit-learn
  • Anomaly Detection Tools
  • Anomaly Detection Rules

Data Insights, Anomalies, & Verification: Machine Learning & Visualization Tools

Course: 51 Minutes

  • Course Overview
  • Machine Learning Anomaly Detection Techniques
  • Comparing Anomaly Detection Algorithms
  • Anomaly Detection Using R
  • Online Anomaly Detection Components
  • Online Anomaly Detection Approaches
  • Anomaly Detection Use Cases
  • Anomaly Detection with Visualization Tools
  • Anomaly Detection with Mathematical Approaches
  • Cluster-Based Anomaly Detection

Data Science Statistics: Applied Inferential Statistics

Course: 1 Hour, 19 Minutes

  • Course Overview
  • The One-Sample T-test
  • Independent and Paired T-tests
  • Testing Hypotheses with T-tests
  • Loading and Analyzing a Skewed Dataset
  • Measuring Skewness and Kurtosis
  • Preparing a Dataset for Regression
  • Simple Linear Regression
  • Multiple Linear Regression

Data Research Techniques

Course: 33 Minutes

  • Course Overview
  • Data Research Fundamentals
  • Data Research Steps
  • Values, Variables, and Observations
  • JMP Scale of Measurement
  • Non-experimental and Experimental Research
  • Descriptive and Inferential Statistical Analysis
  • Inferential Tests
  • Case Study of Clinical Data Research
  • Data Research in Sales Management

Data Research Exploration Techniques

Course: 50 Minutes

  • Course Overview
  • Fundamentals of Exploratory Data Analysis
  • Data Exploration Types
  • Working with R
  • Data Exploration in R
  • Data Exploration Using Plots
  • Python Packages for Data Exploration
  • Data Exploration Using Python
  • Data Research Using Linear Algebra
  • Linear Algebra for Data Research

Data Research Statistical Approaches

Course: 43 Minutes

  • Course Overview
  • Role of Statistics in Data Research
  • Discrete vs. Continuous Distribution
  • PDF and CDF
  • Binomial Distribution
  • Interval Estimation
  • Point and Interval Estimation
  • Data Visualization Techniques
  • Data Visualization Using R
  • Data Integration Techniques
  • Creating Plots
  • Missing Values and Outliers

Machine & Deep Learning Algorithms: Introduction

Course: 46 Minutes

  • Course Overview
  • Machine Learning Algorithms
  • How Machine Learning Works
  • Introduction to Pandas ML
  • Support Vector Machines
  • Overfitting

Machine & Deep Learning Algorithms: Regression & Clustering

Course: 49 Minutes

  • Course Overview
  • The Confusion Matrix
  • An Introduction to Regression
  • Applications of Regression
  • Supervised and Unsupervised Learning
  • Clustering
  • Principal Component Analysis

Machine & Deep Learning Algorithms: Data Preparation in Pandas ML

Course: 1 Hour, 4 Minutes

  • Course Overview
  • Data Preparation in scikit-learn
  • Training and Evaluating Models in scikit-learn
  • Introducing the Pandas ML ModelFrame
  • Training and Evaluating Models in Pandas ML
  • Preparing Data for Regression
  • Evaluating Regression Models
  • Preparing Data for Clustering
  • The K-Means Clustering Algorithm

Machine & Deep Learning Algorithms: Imbalanced Datasets Using Pandas ML

Course: 1 Hour, 24 Minutes

  • Course Overview
  • Analyzing an Imbalanced Dataset
  • The RandomOverSampler
  • The SMOTE Oversampler
  • Undersampling Using imbalanced-learn
  • Ensemble Classifiers for Imbalanced Data
  • Combination Samplers
  • Finding Correlations in a Dataset
  • Building a Multi-Label Classification Model
  • Dimensionality Reduction with PCA
  • Imbalanced Learn and PCA

Creating Data APIs Using Node.js

Course: 1 Hour, 31 Minutes

  • Course Overview
  • API Prerequisites
  • Building a RESTful API Using Node.js and Express.js
  • RESTful API with OAuth
  • HTTP Server with Hapi.js
  • API Modules
  • Returning Data with JSON
  • Nodemon for Development Workflow
  • API Requests
  • Postman for APIs
  • Deploying APIs
  • Social Media APIs
  • Exercise: Building RESTful APIs

Online Mentor

You can reach your mentor via chat or email.

Final Exam assessment

Estimated duration: 90 minutes

Practice Labs: Data Visualization with Python (estimated duration: 8 hours)

Perform data visualization tasks with Python, such as creating scatter plots, plotting linear regressions, using logistic regression, and creating decision trees. Then, test your skills by answering assessment questions after creating time-series graphs, resampling observations, creating histograms, and using a pair grid.

Language: English
Qualifications of the instructor: Certified
Course format and length: Teaching videos with subtitles, interactive elements, assignments, and tests
Lesson duration: 120 hours
Assessments: The assessment tests your knowledge and application skills of the topics in the learning path. It is available for 365 days after activation.
Online mentor: You will have 24/7 access to an online mentor for all your specific technical questions on the study topic. The online mentor is available for 365 days after activation, depending on the chosen Learning Kit.
Online virtual labs: Receive 12 months of access to virtual labs matching the traditional course configuration. Active for 365 days after activation; availability varies by training.
Progress monitoring: Yes
Access to material: 365 days
Technical requirements: A computer or mobile device, a stable internet connection, and a web browser such as Chrome, Firefox, Safari, or Edge.
Support or assistance: Helpdesk and online knowledge base, 24/7
Certification: Certificate of participation in PDF format
Price and costs: Course price only, with no extra costs
Cancellation policy and money-back guarantee: We assess this on a case-by-case basis
Award-winning e-learning: Yes
Tip! Make sure you have a quiet learning environment, time and motivation, audio equipment such as headphones or speakers, and your account information (such as login details) for the e-learning platform.

There are no reviews written yet about this product.


OEM Office Elearning Menu Nominated for 'Beste Opleider van Nederland' (Best Training Provider in the Netherlands)

OEM Office Elearning Menu is proud to have been nominated for the title 'Beste Opleider van Nederland' (Best Training Provider in the Netherlands) by Springest, part of Archipel. This recognition confirms our quality and dedication. Many thanks to all our students.


25,000+ participants trained

Average rating: Springest 9.1, Edubookers 8.9

3,500+ companies trained

20+ years of experience
