Maxtrain.com - info@maxtrain.com - 513-322-8888 - 866-595-6863

TTSK7503

Spark for Developers | Using Spark for Big Data & Machine Learning

Description

Spark for Developers | Using Spark for Big Data & Machine Learning Introduction

“Spark for Developers: Using Spark for Big Data & Machine Learning” is specifically designed for experienced developers who aim to excel in enterprise-grade Spark programming. Engaging participants with significant components of Spark, this course empowers students to develop comprehensive data science solutions.

Apache Spark, pivotal in the Hadoop ecosystem, is a cluster computing framework that is widely utilized for Big Data applications.

Leveraging the Hadoop YARN and HDFS infrastructures, Spark delivers processing speeds significantly faster than traditional MapReduce, making it a powerful tool for a variety of in-memory computing tasks.

In this course, developers will appreciate Spark’s compatibility with several popular programming languages, including Java, Scala, Python, and R, as well as SQL-based interfaces. Furthermore, integration with advanced libraries such as Mahout and MLlib for Machine Learning, and GraphX or Neo4j for complex data graph processing, showcases Spark’s versatility.

The framework’s compatibility with various NoSQL data stores, rule engines, and other critical enterprise components solidifies its role as a key player in the modern Big Data and Data Science landscape.

Spark for Developers | Using Spark for Big Data & Machine Learning Course Objectives

Master Spark Programming: Understand the fundamentals of Spark within the broader Hadoop ecosystem to perform scalable Big Data processing.
Develop Comprehensive Solutions: Learn to develop and optimize Big Data solutions using Spark’s advanced in-memory computing and extensive library support.
Hands-On Application: Engage in practical exercises that reinforce your ability to implement robust data science and machine learning solutions with Spark.
Explore Language and Library Integration: Gain proficiency in using Spark alongside multiple programming languages and explore its integration with essential machine learning and graph processing libraries.
Prepare for Advanced Use Cases: Equip yourself with the skills necessary to apply Spark in diverse environments, including integration with NoSQL data stores and other enterprise components.

Prerequisites

This foundation-level course is ideal for intermediate-skilled, experienced developers and architects who have basic Python experience and seek to master advanced, modern development skills with Apache Spark in an enterprise data environment.

Audience

Spark for Developers is recommended for the following:

This foundation-level course is geared toward intermediate-skilled, experienced developers and architects (with basic Python experience) who seek to become proficient in advanced, modern development skills working with Apache Spark in an enterprise data environment.

Spark for Developers | Using Spark for Big Data & Machine Learning Outline

Introduction

Big data, Hadoop, Spark
Spark concepts and architecture
Spark components overview
Labs: installing and running Spark

The first look at Spark for Developers

Spark shell
Spark web UIs
Analyzing dataset – part 1
Labs: Spark shell exploration

Data structures with Spark

Partitions
Distributed execution
Operations: transformations and actions
Labs: Unstructured data analytics using RDDs

Caching

Caching overview
Various caching mechanisms available in Spark
In memory file systems
Caching use cases and best practices
Labs: Benchmark of caching performance

DataFrames and Datasets

DataFrames Intro
Loading structured data (JSON, CSV) using DataFrames
Using schema
Specifying schema for DataFrames
Labs: DataFrames, Datasets, Schema

Spark SQL

Spark SQL concepts and overview
Defining tables and importing datasets
Querying data using SQL
Handling various storage formats: JSON, Parquet, ORC
Labs: querying structured data using SQL; evaluating data formats

Hadoop and Spark for Developers

Hadoop Primer: HDFS, YARN
Hadoop + Spark architecture
Running Spark on Hadoop YARN
Processing HDFS files using Spark
Spark & Hive

API

Overview of Spark APIs in Scala / Python
The lifecycle of a Spark application
Spark APIs
Deploying Spark applications on YARN
Labs: Developing and deploying a Spark application

ML and Spark Overview

Machine Learning primer
Machine Learning in Spark: MLlib / ML
Spark ML overview (newer Spark 2 version)
Algorithms overview: Clustering, Classification, Recommendations
Labs: Writing ML applications in Spark

GraphX

GraphX library overview
GraphX APIs
Creating a graph and navigating it
Shortest distance
Pregel API
Labs: Processing graph data using Spark

Time Permitting Topics

Streaming Spark

Streaming concepts
Evaluating Streaming platforms
Spark streaming library overview
Streaming operations
Sliding window operations
Structured Streaming
Continuous streaming
Spark & Kafka streaming
Labs: Writing Spark Streaming applications

Workshop

Attendees will work on solving real-world data analysis problems using Spark

$2195.00

3-Day Course

Group Training Available

Ohio TechCred

Category: Data Analysis & Business Intelligence

Class Dates