How to Install Apache Spark on Ubuntu 20.04

Create a Directory for Apache Spark

The first step is to create a new directory in the home directory, where we will download and install Apache Spark.

You can create the directory manually: open your File Manager, go to the Home directory, right-click and choose New Folder, name the folder “spark”, and click Create.

Or, you can create it from the terminal with the command below – 

mkdir -p ~/spark

Install the Java Development Kit (JDK)

To run Apache Spark, we need the Java Development Kit (JDK) installed on our Ubuntu system. To check whether Java is already installed, run the following command in the terminal.

java --version

If it says “Command 'java' not found”, then Java is not installed. To install it, use the commands given below.

sudo apt-get update
sudo apt-get install default-jdk -y

Close the terminal window and open a new one, then verify the installation by typing “java --version” in the terminal.
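The check above can also be scripted so it behaves cleanly whether or not Java is present. A minimal sketch using the standard `command -v` shell builtin:

```shell
# Sketch: print the Java version if installed, otherwise report that it
# is missing (instead of a "command not found" error).
if command -v java >/dev/null 2>&1; then
    java --version
else
    echo "java is not installed"
fi
```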

Download Apache Spark

You can download Apache Spark through a web browser or from the terminal. I’ll show you both ways, starting with the terminal. 

Download Apache Spark using Terminal

First, open your terminal and change your working directory to the “spark” directory that you created in the first step.

cd ~/spark

Then we will download Apache Spark to that directory using wget. To check whether wget is installed, type “wget --version”; if the command is not found, install wget using the command –

sudo apt-get install wget -y

After installing wget, enter the following command to download the latest version of Apache Spark, which at the time of writing is 3.2.0.

wget https://dlcdn.apache.org/spark/spark-3.2.0/spark-3.2.0-bin-hadoop3.2.tgz

Here we are downloading Apache Spark version 3.2.0 with package type “Pre-built for Apache Hadoop 3.3 and later”.

You can get links for different versions from https://spark.apache.org/downloads.html.
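Apache also publishes a checksum file next to each release, so you can verify that the download was not corrupted. This is a hedged sketch: it demonstrates `sha512sum` on a small dummy file; on a real system you would run it against the downloaded `.tgz` and compare the hash with the `.sha512` file published alongside it on the download page.

```shell
# Sketch: verify a download with sha512sum. A dummy file stands in here;
# for Spark you would run: sha512sum spark-3.2.0-bin-hadoop3.2.tgz
# and compare the result with the published .sha512 checksum.
printf 'example download\n' > demo.tgz
sha512sum demo.tgz   # prints "<128-hex-char hash>  demo.tgz"
rm demo.tgz
```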

Download Apache Spark using Browser

If you want to download Apache Spark using the web browser, go to https://spark.apache.org/downloads.html.

In the first dropdown, select the version of Apache Spark you want to install; in the second dropdown, select the package type. After you select both, the download link appears in the third line.

[Screenshot: selecting the Apache Spark version to download]

Click on the download link; you will be redirected to another page with the download mirrors for Apache Spark. Click on the top link to start the download. 

[Screenshot: downloading Apache Spark]

You can also copy that link and download it from the terminal using wget.

After downloading Apache Spark, right-click the downloaded file and move it to the “spark” folder that you created in the home directory.

Install Apache Spark In Ubuntu

The installation of Apache Spark is straightforward. The first step is to extract the tar file that we downloaded. To do that, change your working directory to the “spark” directory if you haven’t already.

cd ~/spark

After that, extract the archive with the command shown below. The general form is “tar -xvf filename”, where filename is the name of the downloaded tar file.

tar -xvf spark-3.2.0-bin-hadoop3.2.tgz

After extracting the archive, you need to set the environment variables for Apache Spark. First, change your working directory to the home directory using the command below –

cd ~

Now, edit the .bashrc file with the gedit editor.

gedit ~/.bashrc

This command opens the .bashrc file in the gedit text editor. Scroll down to the bottom of the file and add these lines –

export SPARK_HOME=/home/username/spark/spark-3.2.0-bin-hadoop3.2
export PATH=$PATH:$SPARK_HOME/bin:$SPARK_HOME/sbin

Change “username” to your own username; the directory name at the end must match the name of the folder you extracted. Then click the Save button in the top-right corner and exit the editor.
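If you prefer not to use gedit, the same two lines can be appended from the terminal. A minimal sketch, using a temporary file as a stand-in for ~/.bashrc (on a real system, swap in ~/.bashrc and your own username in the path):

```shell
# Sketch: append the Spark environment variables from the command line.
# "$rc" is a temp file standing in for ~/.bashrc in this demonstration.
rc=$(mktemp)
cat >> "$rc" <<'EOF'
export SPARK_HOME=/home/username/spark/spark-3.2.0-bin-hadoop3.2
export PATH=$PATH:$SPARK_HOME/bin:$SPARK_HOME/sbin
EOF
grep SPARK_HOME "$rc"   # confirm both lines were written
rm "$rc"
```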

Apply the changes with the source command –

source ~/.bashrc

That’s it, the installation of Apache Spark is now complete. Close the current terminal window and open a new one.

Verify the Installation of Apache Spark

Now, the last step is to verify that Spark was installed correctly. You can do that by entering the following command in the terminal –

spark-shell 

If this command opens the Spark shell then it means that the Apache Spark installation was successful.
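If you want a check that does not open an interactive shell (for example, inside a script), `spark-submit --version` prints the installed version and exits. A hedged sketch that also handles the case where Spark is not yet on the PATH:

```shell
# Sketch: non-interactive check that Spark is on the PATH.
# spark-submit --version prints version info (to stderr) and exits.
if command -v spark-submit >/dev/null 2>&1; then
    spark-submit --version 2>&1
else
    echo "spark-submit not found on PATH - re-check your .bashrc changes"
fi
```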

Access the Spark Context Web UI

You can access the Spark Web UI by typing “spark-shell” in the terminal, which starts the Spark shell and prints the URL of the Spark Context Web UI (by default http://localhost:4040).

And that is how you can install Apache Spark on Ubuntu 20.04. If you face any problems with the installation, comment below or contact me on Twitter. Thank you.

FAQ -

How to Uninstall or Remove Apache Spark from Ubuntu?

You can remove or uninstall Apache Spark by deleting the directory where it was installed. Open your File Manager, right-click the spark folder, and click “Move to Trash”.

You can also remove the directory using the terminal by using this command –

rm -rf ~/spark

How to Fix Apache Spark crashing on Ubuntu?

Whenever you run Spark on Ubuntu, make sure there is enough free RAM available. If Spark crashes with out-of-memory errors, close other memory-heavy applications or cap Spark’s own usage with the --driver-memory option.
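To see how much memory is free before launching Spark, something like the following works (`--driver-memory` is a standard spark-shell/spark-submit option; the 2g value is just an example):

```shell
# Sketch: check available memory before starting Spark.
free -h            # look at the "available" column
# Then start Spark with an explicit memory cap, e.g.:
# spark-shell --driver-memory 2g
```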
