In this post I will show you how to install Apache Spark on Ubuntu 20.04, with step-by-step instructions. Now let’s start the tutorial.
Create a Directory for Apache Spark
Okay, so the first step will be to create a new directory in the home directory where we will download and install Apache Spark.
You can create the new directory manually by going to your File Manager > Home directory, then right-click > New Folder, name the folder “spark”, and click on Create.
Or, you can create the directory from the terminal using the command given below (the ~ makes sure it is created in your home directory) –
mkdir -p ~/spark
Install Java Development kit (JDK)
Now, to be able to run and use Apache Spark we will need the Java Development Kit (JDK) installed on our Ubuntu system. To check whether Java is already installed, run the following command in the terminal.
java --version
If it says “Command ‘java’ not found”, then Java is not installed. To install Java, you can use the commands given below.
sudo apt-get update
sudo apt-get install default-jdk -y
Close the terminal window and open a new one; you can then verify the Java installation by typing “java --version” in the terminal.
Download Apache Spark
You can download Apache Spark through a web browser or through the terminal. I’ll show you both ways; first, I’ll show you how to download Apache Spark using the terminal.
Download Apache Spark using Terminal
First, open your terminal and change your working directory to the “spark” directory that you created in the first step.
cd spark
Then, using wget, we will download Apache Spark to that directory. To check whether you have wget installed, type “wget --version”; if the command is not found, install wget using the command –
sudo apt-get install wget -y
After installing wget, enter the following command to download the latest version of Apache Spark, which at the time of writing is 3.2.0.
wget https://dlcdn.apache.org/spark/spark-3.2.0/spark-3.2.0-bin-hadoop3.2.tgz
Here we are downloading Apache Spark version 3.2.0 with the package type “Pre-built for Apache Hadoop 3.3 and later”.
You can get links for different versions from https://spark.apache.org/downloads.html.
Download Apache Spark using Browser
If you want to download Apache spark using the web browser then just go to this link here – https://spark.apache.org/downloads.html.
Here, select the version of Apache Spark you want to install in the first dropdown and the package type in the second dropdown. After you select both, the download link will appear in the third line.
Click on the download link; you will be redirected to another page with mirror links for Apache Spark. Click on the top link to start the download.
You can also copy that link and download it from the terminal using wget.
After downloading Apache Spark, cut the downloaded file (right-click > Cut) and paste it into the “spark” folder that you created in the home directory.
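If you prefer the terminal, the same move can be done with mv. This is a sketch that assumes your browser saved the archive to ~/Downloads; adjust the path if yours differs.

```shell
# Move the downloaded archive into the spark directory
# (assumes the browser saved it to ~/Downloads; adjust if needed)
mv ~/Downloads/spark-3.2.0-bin-hadoop3.2.tgz ~/spark/
```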
Install Apache Spark In Ubuntu
Now, the installation process for Apache Spark is very easy. The first step will be to extract the tar file that we downloaded. To do that, first change your working directory to the “spark” directory if you haven’t already.
cd spark
After that, enter the following command to extract the tar file. The general form is “tar -xvf filename”, where filename is the name of the downloaded tar file that you want to extract.
tar -xvf spark-3.2.0-bin-hadoop3.2.tgz
After you extract the tar file, you will need to set up the environment variable path for Apache Spark. To do that, first change your working directory to the home directory using the command below –
cd ~
Now, after changing the directory, edit the .bashrc file using the gedit editor.
gedit ~/.bashrc
This command will open the .bashrc file in the gedit text editor. Scroll down to the bottom of the file and add these lines at the end –
export SPARK_HOME=/home/username/spark/spark-3.2.0-bin-hadoop3.2
export PATH=$PATH:$SPARK_HOME/bin:$SPARK_HOME/sbin
Change “username” to your own username; the directory name at the end is the name of the tar file that you extracted (without the .tgz extension). After that, click the Save button in the top-right corner and exit the editor.
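If you would rather stay in the terminal, you can append equivalent lines to .bashrc with echo instead of gedit. Using $HOME instead of a hard-coded /home/username avoids having to edit in your username.

```shell
# Append the Spark environment variables to ~/.bashrc
# (single quotes keep $HOME and $PATH from expanding until the file is sourced)
echo 'export SPARK_HOME=$HOME/spark/spark-3.2.0-bin-hadoop3.2' >> ~/.bashrc
echo 'export PATH=$PATH:$SPARK_HOME/bin:$SPARK_HOME/sbin' >> ~/.bashrc
```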
Finish the installation by reloading .bashrc with the source command –
source ~/.bashrc
That’s it, the installation of Apache Spark is now complete. Close the current terminal window and open a new one.
Verify the Installation of Apache Spark
Now, the last step in the installation of Apache Spark will be to verify whether Spark was installed correctly or not. You can do that by entering the following command in the terminal –
spark-shell
If this command opens the Spark shell then it means that the Apache Spark installation was successful.
Access Spark context Web UI
You can access the Spark Web UI by typing “spark-shell” in the terminal, which will start the Spark shell and also print the URL of the Spark context Web UI (by default, http://localhost:4040).
And that is how you can install Apache Spark on Ubuntu 20.04. If you are facing any problems with the installation, comment below or contact me on Twitter. Thank you.
FAQ –
How to Uninstall or Remove Apache Spark from Ubuntu?
You can remove or uninstall Apache Spark by deleting the directory where it was installed. Open your File Manager, right-click the “spark” folder, and click “Move to Trash”.
You can also remove the directory using the terminal by using this command –
rm -rf ~/spark
How to Fix Apache Spark crashing on Ubuntu?
Whenever you run Spark on your Ubuntu machine, make sure there is plenty of free RAM. If Spark crashes with out-of-memory errors, close other memory-hungry applications or increase the memory allocated to the Spark driver.
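For example, you can raise the driver’s memory explicitly when starting the shell with the --driver-memory option; the 4g value below is only an illustrative figure to tune for your machine.

```shell
# Give the Spark driver 4 GB of heap for this session (illustrative value)
spark-shell --driver-memory 4g
```

To make a setting like this permanent, you can instead put `spark.driver.memory 4g` in the conf/spark-defaults.conf file inside your Spark directory.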