Maxtrain.com - info@maxtrain.com - 513-322-8888 - 866-595-6863
Spark for Developers | Using Spark for Big Data & Machine Learning
Description
Spark for Developers | Using Spark for Big Data & Machine Learning Introduction
“Spark for Developers: Using Spark for Big Data & Machine Learning” is designed for experienced developers who aim to excel in enterprise-grade Spark programming. The course works through the major components of Spark and prepares students to develop end-to-end data science solutions.
Apache Spark, pivotal in the Hadoop ecosystem, is a cluster computing framework that is widely utilized for Big Data applications.
Running on Hadoop YARN and HDFS, Spark keeps working data in memory and so delivers processing speeds significantly faster than traditional MapReduce for many workloads.
In this course, developers will work with Spark’s support for several popular programming languages, including Java, Scala, Python, and R, as well as its SQL-based interfaces. Integration with libraries such as Mahout and MLlib for machine learning, and GraphX or Neo4j for graph processing, showcases Spark’s versatility.
The framework’s compatibility with various NoSQL data stores, rule engines, and other critical enterprise components solidifies its role as a key player in the modern Big Data and Data Science landscape.
Spark for Developers | Using Spark for Big Data & Machine Learning Course Objectives
- Master Spark Programming: Participants will understand the fundamentals of Spark within the broader Hadoop ecosystem to perform scalable Big Data processing.
- Develop Comprehensive Solutions: You will learn to develop and optimize Big Data solutions using Spark’s advanced in-memory computing and extensive library support.
- Hands-On Application: Engage in practical exercises that reinforce your ability to implement robust data science and machine learning solutions with Spark.
- Explore Language and Library Integration: Gain proficiency in utilizing Spark alongside multiple programming languages and dive into its integration with essential machine learning and graph processing libraries.
- Prepare for Advanced Use Cases: Equip yourself with the skills necessary to apply Spark in diverse environments, including integration with NoSQL data stores and other enterprise components.
Prerequisites
- Attendees should be intermediate-level, experienced developers or architects with basic Python experience.
Audience
Spark for Developers is recommended for:
- Intermediate-level, experienced developers and architects (with basic Python experience) who want to become proficient in advanced, modern development with Apache Spark in an enterprise data environment. The course itself is foundation-level.
Spark for Developers | Using Spark for Big Data & Machine Learning Outline
Introduction
- Big data, Hadoop, Spark
- Spark concepts and architecture
- Spark components overview
- Labs: installing and running Spark
First look at Spark
- Spark shell
- Spark web UIs
- Analyzing dataset – part 1
- Labs: Spark shell exploration
Data structures with Spark
- Partitions
- Distributed execution
- Operations: transformations and actions
- Labs: Unstructured data analytics using RDDs
Caching
- Caching overview
- Various caching mechanisms available in Spark
- In-memory file systems
- Caching use cases and best practices
- Labs: Benchmark of caching performance
DataFrames and Datasets
- DataFrames Intro
- Loading structured data (JSON, CSV) using DataFrames
- Using schema
- Specifying schema for DataFrames
- Labs: DataFrames, Datasets, Schema
Spark SQL
- Spark SQL concepts and overview
- Defining tables and importing datasets
- Querying data using SQL
- Handling various storage formats: JSON, Parquet, ORC
- Labs: querying structured data using SQL; evaluating data formats
Hadoop and Spark for Developers
- Hadoop Primer: HDFS, YARN
- Hadoop + Spark architecture
- Running Spark on Hadoop YARN
- Processing HDFS files using Spark
- Spark & Hive
API
- Overview of Spark APIs in Scala / Python
- The lifecycle of a Spark application
- Spark APIs
- Deploying Spark applications on YARN
- Labs: Developing and deploying a Spark application
ML and Spark Overview
- Machine Learning primer
- Machine Learning in Spark: MLlib / ML
- Spark ML overview (newer Spark 2 version)
- Algorithms overview: Clustering, Classification, Recommendations
- Labs: Writing ML applications in Spark
GraphX
- GraphX library overview
- GraphX APIs
- Creating a graph and navigating it
- Shortest distance
- Pregel API
- Labs: Processing graph data using Spark
Time Permitting Topics
Streaming Spark
- Streaming concepts
- Evaluating Streaming platforms
- Spark Streaming library overview
- Streaming operations
- Sliding window operations
- Structured Streaming
- Continuous streaming
- Spark & Kafka streaming
- Labs: Writing Spark Streaming applications
Workshop
- Attendees will work on solving real-world data analysis problems using Spark
$2195.00 | 3-Day Course