Data Infrastructure with Apache Kafka E-Learning
In the Data Infrastructure with Apache Kafka LearningKit, you will explore Apache Kafka, integrate Kafka with Python, work with consumer groups, integrate Apache Kafka with Spark, and use Kafka with Cassandra and Confluent.
You will explore the Kafka architecture for event streaming, set up topics, create brokers, and handle messages. You will also learn how to produce and consume messages with Kafka and how to tweak Kafka broker configurations. The LearningKit also covers Kafka performance optimization and Structured Streaming with Apache Spark, including building Spark applications that process data streamed to Kafka topics using DataFrames and integrating Kafka with Spark and Cassandra for NoSQL data.
This LearningKit, with more than 7 hours of learning, is divided into the following tracks:
Course content
Track 1: Intro to Data Infrastructure
In this track, the focus will be on data infrastructure in an organization, data mesh architecture, data tools, messaging platforms, and data stores.
Courses (½ hour +):
Setting up the Data Infrastructure in an Organization
Course: 46 Minutes
- Course Overview
- Data Infrastructure in an Organization
- Data Mesh Architecture
- Tools for Data Management
- Messaging Platforms
- Data Stores
- Course Summary
Track 2: Apache Kafka
In this track, the focus will be on Apache Kafka and Apache Spark. Apache Kafka is a popular event streaming platform used by Fortune 100 companies for both real-time and batch data processing. Apache Spark is a powerful distributed data processing engine that can handle petabytes of data by chunking it and distributing the work across a cluster of resources. A short, illustrative code sketch follows each course outline in this track.
Courses (7 hours +):
Processing Data: Getting Started with Apache Kafka
Course: 1 Hour, 32 Minutes
- Course Overview
- Work with Real-time Data
- Stream Events with Kafka
- Kafka Topics
- Downloading and Installing Kafka
- Creating a Topic, Producer, and Consumer with Kafka
- Working with Multiple Kafka Topics
- Configuring a Multi-node Kafka Cluster
- Monitoring a Kafka Cluster
- Using Partitions and Replicas with Kafka
- Course Summary
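
The sketch below illustrates the kind of topic/producer/consumer workflow this course demonstrates, using the third-party kafka-python client. The broker address (localhost:9092), the topic name "orders", and the partition count are illustrative assumptions, not values taken from the course.

    from kafka import KafkaProducer, KafkaConsumer
    from kafka.admin import KafkaAdminClient, NewTopic

    BROKER = "localhost:9092"  # assumed single-node broker

    # Create a topic with three partitions
    admin = KafkaAdminClient(bootstrap_servers=BROKER)
    admin.create_topics([NewTopic(name="orders", num_partitions=3, replication_factor=1)])

    # Produce a few messages
    producer = KafkaProducer(bootstrap_servers=BROKER)
    for i in range(5):
        producer.send("orders", value=f"order-{i}".encode("utf-8"))
    producer.flush()

    # Consume them from the beginning of the topic
    consumer = KafkaConsumer("orders",
                             bootstrap_servers=BROKER,
                             auto_offset_reset="earliest",
                             consumer_timeout_ms=5000)
    for message in consumer:
        print(message.partition, message.offset, message.value.decode("utf-8"))
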
Processing Data: Integrating Kafka with Python & Using Consumer Groups
Course: 1 Hour, 24 Minutes
- Course Overview
- Developing Kafka Producers and Consumers in Python
- Processing Messages at the Consumer
- Tweaking Kafka Broker Configurations
- Defining Automatic Topic Creation in Kafka
- Generating Fake Data for Kafka Consumption
- Setting a Destination Partition in Kafka
- Kafka Consumer Groups
- Creating and Using Consumer Groups in Kafka
- Working with Consumer Groups and Partitions in Kafka
- Kafka Configuration
- Course Summary
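
A sketch of the consumer group and destination partition features covered in this course, again using the kafka-python client. The group ID "order-processors", the topic name, and the broker address are illustrative assumptions.

    from kafka import KafkaProducer, KafkaConsumer

    BROKER = "localhost:9092"  # assumed broker address

    # Send a message to an explicit destination partition
    producer = KafkaProducer(bootstrap_servers=BROKER)
    producer.send("orders", value=b"priority-order", partition=0)
    producer.flush()

    # Consumers that share a group_id split the topic's partitions between them;
    # running this script in two terminals shows the group rebalance in action.
    consumer = KafkaConsumer("orders",
                             bootstrap_servers=BROKER,
                             group_id="order-processors",
                             auto_offset_reset="earliest",
                             enable_auto_commit=True)
    for message in consumer:
        print(f"partition={message.partition} offset={message.offset} value={message.value}")
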
Processing Data: Introducing Apache Spark
Course: 1 Hour, 44 Minutes
- Course Overview
- Apache Spark
- Apache Spark Architecture
- Structured Streaming in Apache Spark
- Downloading and Installing Spark
- Deploying a Spark Cluster
- Launching a Spark Job
- Monitoring Spark Apps with the Web UI
- Configuring a Spark Cluster
- Building a Spark Streaming App
- Running Apps on a Standalone Cluster
- Running Apps on Spark Local
- Course Summary
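
A minimal Structured Streaming application of the kind built in this course. It uses Spark's built-in "rate" source so it runs without any external feed; the app name, row rate, and transformation are illustrative assumptions.

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col

    spark = SparkSession.builder.appName("StreamingDemo").getOrCreate()

    # The built-in "rate" source emits (timestamp, value) rows at a fixed pace,
    # which is convenient for testing a streaming pipeline end to end.
    stream = spark.readStream.format("rate").option("rowsPerSecond", 5).load()

    # A simple transformation on the streaming DataFrame
    doubled = stream.withColumn("doubled", col("value") * 2)

    # Write each micro-batch to the console; awaitTermination() blocks until stopped
    query = doubled.writeStream.outputMode("append").format("console").start()
    query.awaitTermination()
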
Processing Data: Integrating Kafka with Apache Spark
Course: 1 Hour, 46 Minutes
- Course Overview
- Integrating Spark with Kafka
- Transforming Kafka Messages with PySpark
- Reading from Multiple Kafka Topics
- Setting up a Producer and Consumer with Kafka
- Publishing to Kafka from PySpark
- Transforming Data with Spark SQL
- Aggregations on Streaming Data
- Exploring Grouping and Ordering
- Defining Window Operations
- Creating Tumbling and Sliding Windows
- Course Summary
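
A sketch of reading a Kafka topic from PySpark and aggregating over a tumbling window, in the spirit of the windowing modules above. The broker address, topic name, and window length are illustrative assumptions, and the spark-sql-kafka connector is assumed to be on the classpath (for example, passed with --packages when submitting the job).

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col, window

    spark = SparkSession.builder.appName("KafkaWindowDemo").getOrCreate()

    # Subscribe to a Kafka topic as a streaming DataFrame
    events = (spark.readStream
              .format("kafka")
              .option("kafka.bootstrap.servers", "localhost:9092")
              .option("subscribe", "orders")
              .load())

    # Kafka values arrive as bytes; cast to string and keep the record timestamp
    parsed = events.selectExpr("CAST(value AS STRING) AS payload", "timestamp")

    # Count records per 1-minute tumbling window
    counts = parsed.groupBy(window(col("timestamp"), "1 minute")).count()

    query = (counts.writeStream
             .outputMode("update")
             .format("console")
             .option("truncate", "false")
             .start())
    query.awaitTermination()

Passing a third argument to window() (a slide duration shorter than the window duration) turns the same aggregation into a sliding window.
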
Processing Data: Using Kafka with Cassandra & Confluent
Course: 42 Minutes
- Course Overview
- Installing and Setting up Apache Cassandra
- Integrating Spark with Kafka and Cassandra
- Confluent and Kafka
- Setting up the Confluent Platform
- Working with Kafka Using Confluent
- Course Summary
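
A sketch of the Kafka-to-Cassandra handoff explored in this course, using the kafka-python client and the DataStax cassandra-driver. The keyspace "demo", the table and topic names, and the contact points are illustrative assumptions.

    from kafka import KafkaConsumer
    from cassandra.cluster import Cluster

    # Connect to a local Cassandra node and prepare a keyspace and table (assumed names)
    cluster = Cluster(["127.0.0.1"])
    session = cluster.connect()
    session.execute("CREATE KEYSPACE IF NOT EXISTS demo "
                    "WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1}")
    session.execute("CREATE TABLE IF NOT EXISTS demo.orders (id text PRIMARY KEY, payload text)")

    insert = session.prepare("INSERT INTO demo.orders (id, payload) VALUES (?, ?)")

    # Drain a Kafka topic and persist each message to Cassandra
    consumer = KafkaConsumer("orders",
                             bootstrap_servers="localhost:9092",
                             auto_offset_reset="earliest",
                             consumer_timeout_ms=5000)
    for message in consumer:
        row_id = f"{message.partition}-{message.offset}"
        session.execute(insert, (row_id, message.value.decode("utf-8")))

    cluster.shutdown()
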
Assessment: