Classbaze

Spark SQL & Hadoop (For Data Science)

Learn HDFS commands, Hadoop, Spark SQL, SQL Queries, ETL & Data Analysis | Spark Hadoop Cluster VM | Fully Solved Qs
4.1/5 (36 reviews)
5,365 students
Created by Matthew Barr

Classbaze Grade®: 9.0
Freshness: 9.9
Popularity: 7.2
Material: 9.4

Platform: Udemy
Video: 5h 41m
Language: English
Next start: On Demand

Classbaze Rating

Classbaze Grade®

9.0 / 10

CourseMarks Score® helps students find the best classes. We aggregate 18 factors, including freshness, student feedback and content diversity.

Freshness

9.9 / 10
This course was last updated on 3/2022.

Course content can become outdated quite quickly. After analysing 71,530 courses, we found that the highest rated courses are updated every year. If a course has not been updated for more than 2 years, you should carefully evaluate the course before enrolling.

Popularity

7.2 / 10
We analyzed factors such as the rating (4.1/5) and the ratio between the number of reviews and the number of students, which is a great signal of student commitment.

New courses are hard to evaluate because they have few or no student ratings, but the Student Feedback Score helps you find great courses even with fewer reviews.

Material

9.4 / 10
Video Score: 8.4 / 10
The course includes 5h 41m of video content. Courses with more video usually have a higher average rating. We have found that the sweet spot is 16 hours of video, which is long enough to teach a topic comprehensively without being overwhelming. Courses with more than 16 hours of video get the maximum score.
The average video length across 113 Apache Spark courses on Udemy is 6 hours 47 minutes.
Detail Score: 10.0 / 10

Top online courses contain a detailed description of the course, what you will learn, and a detailed description of the instructor.

Extra Content Score: 9.9 / 10

Tests, exercises, articles and other resources help students deepen their understanding of the topic.

This course contains:

6 articles.
12 resources.
0 exercises.
0 tests.

About the course

Apache Spark is currently one of the most popular systems for processing big data.

Apache Hadoop continues to be used by many organizations that look to store data locally on premises. Hadoop allows these organisations to efficiently store big datasets ranging in size from gigabytes to petabytes.

As the number of vacancies for data science, big data analysis and data engineering roles continues to grow, so too will the demand for individuals with knowledge of Spark and Hadoop technologies to fill these vacancies.

This course has been designed specifically for data scientists, big data analysts and data engineers looking to leverage the power of Hadoop and Apache Spark to make sense of big data.

This course will help those individuals that are looking to interactively analyse big data or to begin writing production applications to prepare data for further analysis using Spark SQL in a Hadoop environment.

The course is also well suited for university students and recent graduates that are keen to gain exposure to Spark & Hadoop or anyone who simply wants to apply their SQL skills in a big data environment using Spark-SQL.

This course has been designed to be concise and to provide students with a necessary and sufficient amount of theory, enough for them to be able to use Hadoop & Spark without getting bogged down in too much theory about older low-level APIs such as RDDs.

By solving the questions in this course, students will begin to develop the skills and confidence needed to handle real-world scenarios that come their way in a production environment.

(a) There are just under 30 problems in this course. These cover HDFS commands, basic data engineering tasks and data analysis.
(b) Fully worked-out solutions to all the problems.
(c) Also included is the Verulam Blue virtual machine, an environment with a Spark Hadoop cluster already installed so that you can practice working on the problems.

•The VM contains a Spark Hadoop environment which allows students to read and write data to & from the Hadoop file system as well as to store metastore tables on the Hive metastore.
•All the datasets students will need for the problems are already loaded onto HDFS, so there is no need for students to do any extra work.
•The VM also has Apache Zeppelin installed. This is a web-based notebook commonly used with Spark and is similar to a Jupyter notebook.

This course will allow students to get hands-on experience working in a Spark Hadoop environment as they practice:

•Converting a set of data values in a given format stored in HDFS into new data values or a new data format and writing them into HDFS.
•Loading data from HDFS for use in Spark applications & writing the results back into HDFS using Spark.
•Reading and writing files in a variety of file formats.
•Performing standard extract, transform, load (ETL) processes on data using the Spark API.
•Using metastore tables as an input source or an output sink for Spark applications.
•Applying the fundamentals of querying datasets in Spark.
•Filtering data using Spark.
•Writing queries that calculate aggregate statistics.
•Joining disparate datasets using Spark.
•Producing ranked or sorted data.
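
The querying skills in the list above (filtering, aggregating, joining and sorting) can be sketched in plain SQL. The example below is a minimal illustration only: it uses Python's built-in sqlite3 module as a stand-in for Spark SQL so that it runs without a cluster, and the tables (customers, orders), their columns and all values are hypothetical and not taken from the course. A query of this shape should carry over to Spark SQL largely unchanged, although SQL dialects differ in places.

```python
import sqlite3

# Hypothetical sample data; none of these tables or values come from the course.
# SQLite is used here only as a stand-in for Spark SQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace'), (3, 'Linus');
    INSERT INTO orders VALUES
        (1, 1, 20.0), (2, 1, 5.0), (3, 2, 50.0), (4, 2, 15.0), (5, 3, 8.0);
""")

QUERY = """
    SELECT c.name, COUNT(*) AS n_orders, SUM(o.amount) AS total
    FROM orders AS o
    JOIN customers AS c ON o.customer_id = c.id   -- join two datasets
    WHERE o.amount >= 10                          -- filter rows
    GROUP BY c.name                               -- aggregate per customer
    ORDER BY total DESC                           -- sort the result
"""

rows = conn.execute(QUERY).fetchall()
for name, n_orders, total in rows:
    print(name, n_orders, total)
```

Only orders of at least 10 are kept, so the customer whose single order is below that threshold does not appear in the result.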

What can you learn from this course?

✓ Students will get hands-on experience working in a Spark Hadoop environment that’s free and downloadable as part of this course.
✓ Students will have opportunities to solve Data Engineering and Data Analysis problems using Spark on a Hadoop cluster in the sandbox environment that comes as part of this course.
✓ Issuing HDFS commands.
✓ Converting a set of data values in a given format stored in HDFS into new data values or a new data format and writing them into HDFS.
✓ Loading data from HDFS for use in Spark applications & writing the results back into HDFS using Spark.
✓ Reading and writing files in a variety of file formats.
✓ Performing standard extract, transform, load (ETL) processes on data using the Spark API.
✓ Using metastore tables as an input source or an output sink for Spark applications.
✓ Applying the fundamentals of querying datasets in Spark.
✓ Filtering data using Spark.
✓ Writing queries that calculate aggregate statistics.
✓ Joining disparate datasets using Spark.
✓ Producing ranked or sorted data.

What do you need to start the course?

• This course has been designed for individuals that are new to Hadoop and Spark, so the course does not assume any prior knowledge of Hadoop or Spark theory.
• A basic knowledge of SQL queries is helpful. But students with no prior knowledge of SQL are provided with a good enough introduction to SQL queries to ensure that they hit the ground running.
• The Verulam Blue VM, which comes as part of this course, has a Spark Hadoop environment and requires a PC or laptop with a minimum of 8 GB RAM and 20 GB of free disk space (instructions on how to download and run the VM are provided).

Who is this course made for?

• This course has been designed specifically for data scientists, big data analysts and data engineers looking to leverage the power of Hadoop and Apache Spark to make sense of big data.
• This course is also well suited for university students and recent graduates who are keen to land a job with a company that's looking to fill big data-related positions, or anyone who simply wants to apply their SQL skills in a big data environment using Spark-SQL.
• Software engineers & developers who are looking to break into the Data Engineering field will also find this course helpful.

Are there coupons or discounts for Spark SQL & Hadoop (For Data Science)? What is the current price?

The course costs $14.99. There is currently a 70% discount on the original price of $49.99, so you save $35 if you enroll in the course now.
The average price of 113 Apache Spark courses on Udemy is $17.10, so this course is 12% cheaper than the average Apache Spark course on Udemy.

Will I be refunded if I'm not satisfied with the Spark SQL & Hadoop (For Data Science) course?

YES, Spark SQL & Hadoop (For Data Science) has a 30-day money back guarantee. The 30-day refund policy is designed to allow students to study without risk.

Is there any financial aid for this course?

Currently we could not find a scholarship for the Spark SQL & Hadoop (For Data Science) course, but there is a $35 discount from the original price ($49.99). So the current price is just $14.99.

Who will teach this course? Can I trust Matthew Barr?

Matthew Barr has created 4 courses, which have received 133 generally positive reviews with an average rating of 4.7. He has taught 10,083 students. Based on the information available, we think Matthew Barr is an instructor you can trust.
Data Scientist | Founder of Verulam Blue
Verulam Blue is a UK based start-up founded by Matthew Barr, a data scientist.
Matthew has worked on a number of projects, ranging from data cleansing through to working on developing prediction models for clinical use.
Matthew is now focused on running Verulam Blue, a start-up which aims to train and prepare individuals to pass big data-related certification exams.
Prior to transitioning to the world of Data Science Matthew worked for nearly a decade in the financial services sector as an actuarial analyst in London.
Matthew also holds two master's degrees from University College London, one in data science and the other in mathematics.
