Classbaze

Disclosure: when you buy through links on our site, we may earn an affiliate commission.

Spark SQL and PySpark 3 using Python 3 (Formerly CCA175)

A comprehensive course on Spark SQL and Data Frame APIs using PySpark 2 and 3, leveraging Python 3.
4.3/5
(1,845 reviews)
11,644 students
Created by Durga Viswanatha Raju Gadiraju

9.7

Classbaze Grade®

10.0

Freshness

8.6

Popularity

9.8

Material

Cloudera Certified Associate Spark and Hadoop Developer using Python as Programming Language
Platform: Udemy
Video: 28h 30m
Language: English
Next start: On Demand


Classbaze Rating

Classbaze Grade®

9.7 / 10

The Classbaze Grade® helps students find the best classes. We aggregate 18 factors, including freshness, student feedback and content diversity.

Freshness

10.0 / 10
This course was last updated in June 2022.

Course content can become outdated quite quickly. After analyzing 71,530 courses, we found that the highest-rated courses are updated every year. If a course has not been updated for more than 2 years, you should carefully evaluate it before enrolling.

Popularity

8.6 / 10
We analyzed factors such as the rating (4.3/5) and the ratio between the number of reviews and the number of students, which is a great signal of student commitment.

New courses are hard to evaluate because they have few or no student ratings, but the Student Feedback Score helps you find great courses even with fewer reviews.

Material

9.8 / 10
Video Score: 10.0 / 10
The course includes 28h 30m of video content. Courses with more videos usually have a higher average rating. We have found that the sweet spot is 16 hours of video, which is long enough to teach a topic comprehensively but not overwhelming. Courses with over 16 hours of video get the maximum score.
The average video length across the 69 Hadoop courses on Udemy is 7 hours 35 minutes.
Detail Score: 10.0 / 10

The top online courses contain a detailed description of the course, what you will learn, and a detailed description of the instructor.

Extra Content Score: 9.5 / 10

Tests, exercises, articles and other resources help students deepen their understanding of the topic.

This course contains:

0 articles.
2 resources.
0 exercises.
0 tests.


About the course

As part of this course, you will learn all the key skills to build Data Engineering Pipelines using Spark SQL and Spark Data Frame APIs, with Python as the programming language. This course used to be a CCA 175 Spark and Hadoop Developer course preparing students for the certification exam. As of 10/31/2021 the exam has been sunset, so we have renamed the course to Apache Spark 2 and 3 using Python 3, as it covers industry-relevant topics beyond the scope of the certification.
About Data Engineering
Data Engineering means processing data according to downstream needs. It involves building different pipelines, such as batch pipelines and streaming pipelines. All roles related to data processing are consolidated under Data Engineering; conventionally they were known as ETL Development, Data Warehouse Development, etc. Apache Spark has evolved into a leading technology for Data Engineering at scale.
I have prepared this course for anyone who would like to transition into a Data Engineer role using PySpark (Python + Spark). I am a Data Engineering Solution Architect with proven experience designing solutions using Apache Spark.
Let us go through the details of what you will be learning in this course. Keep in mind that the course is built around a lot of hands-on tasks, which will give you plenty of practice with the right tools, and there are tons of tasks and exercises to evaluate yourself. We provide details about resources and environments for learning Spark SQL and PySpark 3 using Python 3, as well as reference material on GitHub for practice. You can either use the cluster at your workplace, set up the environment using the provided instructions, or use ITVersity Lab to take this course.
Setup of Single Node Big Data Cluster
Many of you would like to transition to Big Data from conventional technologies such as Mainframes, Oracle PL/SQL, etc., and might not have access to Big Data clusters. It is very important that you set up the environment in the right manner. Don't worry if you do not have a cluster handy; we will guide you through it with support via Udemy Q&A.
•Set up an Ubuntu-based AWS Cloud9 instance with the right configuration
•Ensure Docker is set up
•Set up Jupyter Lab and other key components
•Set up and validate Hadoop, Hive, YARN, and Spark
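The Docker portion of a setup like the one above might be sketched as a Compose file. To be clear, this is an illustrative assumption, not the course's actual configuration; the course provides its own step-by-step instructions, and the image name, port, and volume here are placeholders:

```yaml
# Illustrative sketch only: image names, ports, and volumes are assumptions,
# not the course's actual setup.
version: "3.8"
services:
  jupyterlab:
    image: jupyter/pyspark-notebook   # assumed public image with Spark + Python 3
    ports:
      - "8888:8888"                   # Jupyter Lab UI
    volumes:
      - ./notebooks:/home/jovyan/work # persist notebooks on the host
```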
A quick recap of Python
This course requires a decent knowledge of Python. To make sure you understand Spark from a Data Engineering perspective, we added a module to quickly warm up with Python. If you are not familiar with Python, then we suggest you go through our other course Data Engineering Essentials – Python, SQL, and Spark.
Master required Hadoop Skills to build Data Engineering Applications
As part of this section, you will primarily focus on HDFS commands so that we can copy files into HDFS. The data copied into HDFS will be used to build data engineering pipelines with Spark and Hadoop, using Python as the programming language.
•Overview of HDFS commands
•Copy files into HDFS using the put or copyFromLocal command
•Verify that files were copied into HDFS properly
•Get the size of files using HDFS commands such as du, df, etc.
•Fundamental HDFS concepts such as block size, replication factor, etc.
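The bullets above correspond to commands along the following lines. This is a sketch: the paths and file names are placeholders, and the commands require a running HDFS cluster, so they are shown for reference rather than as a runnable script:

```
# Copy a local file into HDFS (put and copyFromLocal behave the same here)
hdfs dfs -put orders.csv /user/itversity/retail_db/orders/

# Verify that the file landed where expected
hdfs dfs -ls /user/itversity/retail_db/orders/

# Total size of files under a directory, human-readable
hdfs dfs -du -s -h /user/itversity/retail_db/orders/

# Free and used capacity of the file system
hdfs dfs -df -h
```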
Data Engineering using Spark SQL
Let us deep-dive into Spark SQL to understand how it can be used to build Data Engineering pipelines. Spark SQL gives us the distributed computing capabilities of Spark coupled with an easy-to-use, developer-friendly SQL-style syntax.
•Getting Started with Spark SQL
•Basic Transformations using Spark SQL
•Managing Tables – Basic DDL and DML in Spark SQL
•Managing Tables – DML and Create Partitioned Tables using Spark SQL
•Overview of Spark SQL Functions to manipulate strings, dates, null values, etc
•Windowing Functions using Spark SQL for ranking, advanced aggregations, etc.
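Spark SQL's syntax largely follows standard SQL, so the aggregation and windowing patterns listed above can be previewed with any SQL engine. Below is a minimal, self-contained sketch using Python's built-in sqlite3; the table and values are invented for illustration, and on a cluster the same queries would run through spark.sql(...):

```python
import sqlite3

# Invented example data standing in for the course's datasets
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INT, customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "alice", 30.0), (2, "alice", 50.0), (3, "bob", 20.0)],
)

# Basic aggregation, as in the "Basic Transformations" module
totals = conn.execute(
    "SELECT customer, SUM(amount) AS total FROM orders "
    "GROUP BY customer ORDER BY total DESC"
).fetchall()
print(totals)  # [('alice', 80.0), ('bob', 20.0)]

# Ranking with a window function, as in the "Windowing Functions" module
ranked = conn.execute(
    "SELECT order_id, customer, "
    "RANK() OVER (PARTITION BY customer ORDER BY amount DESC) AS rnk "
    "FROM orders"
).fetchall()
print(ranked)
```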
Data Engineering using Spark Data Frame APIs
Spark Data Frame APIs are an alternative way of building Data Engineering applications at scale leveraging distributed computing capabilities of Spark. Data Engineers from application development backgrounds might prefer Data Frame APIs over Spark SQL to build Data Engineering applications.
•Data Processing overview using Spark (PySpark) Data Frame APIs
•Projecting or selecting data from Spark Data Frames, renaming columns, providing aliases, and dropping columns using PySpark Data Frame APIs
•Processing column data using PySpark Data Frame APIs: functions to manipulate strings, dates, null values, etc.
•Basic transformations on Spark Data Frames: filtering, aggregations, and sorting using functions such as filter/where, groupBy with agg, and sort or orderBy
•Joining data sets on Spark Data Frames using join: inner joins, outer joins, etc., with the right examples
•Windowing functions on Spark Data Frames to perform advanced aggregations, ranking, and analytic functions
•Spark Metastore databases and tables, and integration between Spark SQL and Data Frame APIs
Apache Spark Application Development and Deployment Life Cycle
Once you have gone through the Spark content in the Jupyter-based environment, we will also walk you through how Spark applications are typically developed using Python, deployed, and reviewed.
•Set up a Python Virtual Environment and Project for Spark Application Development using PyCharm
•Understand the complete Spark Application Development Lifecycle using PyCharm and Python
•Build a zip file for the Spark Application, copy it to the environment where it is supposed to run, and run it
•Understand how to review the Spark Application Execution Life Cycle
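The build-and-deploy step above amounts to packaging the project as a zip and handing it to spark-submit. A minimal sketch of the packaging half using only the Python standard library follows; the project layout and names (myapp, jobs.py, driver.py) are assumptions for illustration, not the course's actual project:

```python
import os
import shutil
import tempfile
import zipfile

# Hypothetical project layout; real projects will have their own structure
root = tempfile.mkdtemp()
pkg = os.path.join(root, "myapp")
os.makedirs(pkg)
with open(os.path.join(pkg, "__init__.py"), "w") as f:
    f.write("")
with open(os.path.join(pkg, "jobs.py"), "w") as f:
    f.write("def run():\n    return 'ok'\n")

# Build myapp.zip, ready to ship alongside the driver with --py-files
archive = shutil.make_archive(os.path.join(root, "myapp"), "zip", root, "myapp")
names = sorted(zipfile.ZipFile(archive).namelist())
print(names)

# Typical run command on the target environment (not executed here):
#   spark-submit --master yarn --py-files myapp.zip driver.py
shutil.rmtree(root)
```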
All the demos are given on our state-of-the-art Big Data cluster. You can get one month of complimentary lab access by reaching out to support@itversity.com with a Udemy receipt.

What can you learn from this course?

✓ Set up single-node Hadoop and Spark using Docker locally or on AWS Cloud9
✓ Review ITVersity Labs (exclusively for ITVersity Lab Customers)
✓ All the HDFS commands relevant to validating files and folders in HDFS
✓ A quick recap of the Python needed to learn Spark
✓ Ability to use Spark SQL to solve problems using SQL-style syntax
✓ PySpark Dataframe APIs to solve problems using Dataframe-style APIs
✓ Relevance of the Spark Metastore for converting Dataframes into Temporary Views so that data in Dataframes can be processed using Spark SQL
✓ Apache Spark Application Development Life Cycle
✓ Apache Spark Application Execution Life Cycle and Spark UI
✓ Setup SSH Proxy to access Spark Application logs
✓ Deployment Modes of Spark Applications (Cluster and Client)
✓ Passing Application Properties Files and External Dependencies while running Spark Applications

What you need to start the course?

• Basic programming skills using any programming language
• Self-supported lab (instructions provided) or ITVersity lab at additional cost for an appropriate environment
• A 64-bit operating system with the minimum memory required for your chosen environment
• 4 GB RAM with access to proper clusters, or 16 GB RAM to set up the environment using Docker

Who is this course made for?

• Any IT aspirant/professional willing to learn Data Engineering using Apache Spark
• Python Developers who want to learn Spark to add the key skill to be a Data Engineer
• Scala based Data Engineers who would like to learn Spark using Python as Programming Language

Are there coupons or discounts for Spark SQL and PySpark 3 using Python 3 (Formerly CCA175) ? What is the current price?

The course costs $17.99. Currently there is a 28% discount on the original price of $24.99, so you save $7 if you enroll in the course now.
The average price of the 69 Hadoop courses on Udemy is $19.20, so this course is 6% cheaper than the average Hadoop course.

Will I be refunded if I'm not satisfied with the Spark SQL and PySpark 3 using Python 3 (Formerly CCA175) course?

YES, Spark SQL and PySpark 3 using Python 3 (Formerly CCA175) has a 30-day money back guarantee. The 30-day refund policy is designed to allow students to study without risk.

Is there any financial aid for this course?

Currently we could not find a scholarship for the Spark SQL and PySpark 3 using Python 3 (Formerly CCA175) course, but there is a $7 discount from the original price ($24.99), so the current price is just $17.99.

Who will teach this course? Can I trust Durga Viswanatha Raju Gadiraju?

Durga Viswanatha Raju Gadiraju has created 15 courses with generally positive reviews. He has taught 235,985 students and holds an average rating of 4.4 across 9,798 reviews. Based on the available information, we think Durga Viswanatha Raju Gadiraju is an instructor you can trust.
CEO at ITVersity and CTO at Analytiqs, Inc
20+ years of experience in executing complex projects using a vast array of technologies including Big Data and the Cloud.
ITVersity, Inc. is a US-based organization that provides quality training for IT professionals, with a track record of training hundreds of thousands of professionals globally.
Building IT careers by providing people with the required tools, such as high-quality material, labs, and live support, to upskill and cross-skill is paramount for our organization.
At this time our training offerings are focused on the following areas:
* Application Development using Python and SQL
* Big Data and Business Intelligence
* Cloud
* Data Warehousing and Databases
Browse all courses by Durga Viswanatha Raju Gadiraju on Classbaze.
