
Install Java 8 on MOJ Data Engineering Macbook laptop guide

Apache Spark is a high-performance cluster computing system designed for big data processing and is written in Scala. Spark runs on the Java Virtual Machine (JVM), so an appropriate JDK (Java Development Kit) must be installed before you can use it. The version needed for Spark is JDK 8. The latest JDK versions are not always compatible with Spark, so it's safer to use a version that is known to work and is officially supported. In case Scala development is needed (for some User Defined Functions), the appropriate Scala versions are also included in the following table.

| JDK Version | Spark Version | Scala Version |
| --- | --- | --- |
| 8 | 2.4.x | 2.11.x |
| 8 | 3.x | 2.12.x |

Please note that these are the officially supported versions; other combinations may work, but this is not guaranteed.
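As a small illustration, the table above can be expressed as a lookup. This is only a sketch: the names `SUPPORTED` and `supported_versions` are hypothetical, and the data is copied from the table rather than taken from any Spark API.

```python
# Supported combinations, copied from the table above.
# Keys are Spark release lines; values are (JDK version, Scala release line).
SUPPORTED = {
    "2.4": ("8", "2.11"),
    "3": ("8", "2.12"),
}


def supported_versions(spark_version):
    """Return (jdk, scala) for a Spark version such as '3.3.0', or None if unlisted."""
    parts = spark_version.split(".")
    major_minor = ".".join(parts[:2])
    # Try the exact release line first (e.g. "2.4"), then the major line (e.g. "3").
    if major_minor in SUPPORTED:
        return SUPPORTED[major_minor]
    return SUPPORTED.get(parts[0])
```

A result of `None` only means the combination is not in the table; it may still work, but it is not one of the officially supported pairings listed here.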

JDK 8? How can I install this on my MacBook?

Install sdkman

An easy way to install a Java 8 JDK on your MacBook is to use sdkman.

Open a new terminal and run:

curl -s "https://get.sdkman.io" | bash

Follow the instructions on screen, then close the terminal. Open a new one and run:

source "$HOME/.sdkman/bin/sdkman-init.sh"

If you use zsh (you probably do on current Macs), ensure that these lines are the last lines of your .zshrc file:

#THIS MUST BE AT THE END OF THE FILE FOR SDKMAN TO WORK!!!
export SDKMAN_DIR="$HOME/.sdkman"
[[ -s "$HOME/.sdkman/bin/sdkman-init.sh" ]] && source "$HOME/.sdkman/bin/sdkman-init.sh"

Please ensure that the last line of the snippet above is entered as a single line. Also check that the lines you added are the last lines of your .zshrc file.
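If you would like to double-check this rather than eyeball it, the following is a minimal sketch that inspects the text of a .zshrc file. The function name `sdkman_init_is_last` is hypothetical, and the check is deliberately simple: it only looks at the last line that is neither blank nor a comment.

```python
from pathlib import Path

INIT_SCRIPT = "sdkman-init.sh"


def sdkman_init_is_last(zshrc_text):
    """True if the last non-blank, non-comment line sources sdkman-init.sh."""
    lines = [line.strip() for line in zshrc_text.splitlines()]
    meaningful = [line for line in lines if line and not line.startswith("#")]
    if not meaningful:
        return False
    last = meaningful[-1]
    return "source" in last and INIT_SCRIPT in last


def check_my_zshrc():
    """Apply the check to ~/.zshrc; returns None if the file does not exist."""
    zshrc = Path.home() / ".zshrc"
    if not zshrc.exists():
        return None
    return sdkman_init_is_last(zshrc.read_text(errors="replace"))
```

Calling `check_my_zshrc()` returns `True` when the sdkman line is in the right place, `False` when something else comes after it, and `None` when there is no .zshrc file.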

You can then run sdk version in the terminal to check that your installation of sdkman was successful.

Install appropriate Java version via sdkman

As discussed above, it is important to install a JDK version that is compatible with Spark.

Now that sdkman is installed, we can install an appropriate version by entering the following in the terminal:

sdk install java 8.0.352-zulu

This version has the advantage of working on both M1 (ARM) Macs and Intel-based ones. Additionally, it is the version used for the integration testing scripts in GitHub Actions.

Run java -version to check that it's installed properly.
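The same check can be scripted, for example at the start of a local test run. The sketch below assumes `java` is on your PATH; the function names are hypothetical. Note that Java 8 reports itself as version 1.8 (for example "1.8.0_352"), and that `java -version` writes to stderr rather than stdout.

```python
import re
import subprocess


def java_major_version(version_output):
    """Extract the major Java version from `java -version` output, or None."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        return None
    first, second = match.group(1), match.group(2)
    # Java 8 and earlier use the 1.x scheme, so "1.8.0_352" means Java 8.
    if first == "1" and second is not None:
        return int(second)
    return int(first)


def installed_java_major():
    """Run `java -version` and return the major version, or None if unavailable."""
    try:
        result = subprocess.run(
            ["java", "-version"], capture_output=True, text=True
        )
    except FileNotFoundError:
        return None
    # The version banner goes to stderr; stdout is included as a fallback.
    return java_major_version(result.stderr + result.stdout)
```

If `installed_java_major()` returns anything other than 8, the active JDK is probably not the one installed above; running `sdk current java` shows which version sdkman has selected.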

Voilà. You now have a Java 8 JDK set up on your machine. You will now be able to use PySpark (and any other JVM-based software).

AWS Glue versions

These are the versions that Amazon uses in its AWS Glue configurations. It is recommended that any testing is done on the same versions, to avoid surprises where something works when tested locally but not in AWS Glue.

| Dependency | Version in AWS Glue 1.0 | Version in AWS Glue 2.0 | Version in AWS Glue 3.0 | Version in AWS Glue 4.0 |
| --- | --- | --- | --- | --- |
| Spark | 2.4.3 | 2.4.3 | 3.1.1 | 3.3.0 |
| Hadoop | 2.8.5 | 2.8.5 | 3.2.1 | 3.2.1 |
| Python | Python 3.6 | Python 3.7 | Python 3.7 | Python 3.10 |
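One way to catch a mismatch early is to compare your local versions against the table above before running tests. The following is a rough sketch: `GLUE_VERSIONS` and `mismatches` are hypothetical names, and the values are copied from the table, not fetched from AWS.

```python
# Versions per AWS Glue release, copied from the table above.
GLUE_VERSIONS = {
    "1.0": {"spark": "2.4.3", "hadoop": "2.8.5", "python": "3.6"},
    "2.0": {"spark": "2.4.3", "hadoop": "2.8.5", "python": "3.7"},
    "3.0": {"spark": "3.1.1", "hadoop": "3.2.1", "python": "3.7"},
    "4.0": {"spark": "3.3.0", "hadoop": "3.2.1", "python": "3.10"},
}


def mismatches(glue_version, local_spark, local_python):
    """Return a list of differences between local versions and a Glue release."""
    target = GLUE_VERSIONS[glue_version]
    problems = []
    if local_spark != target["spark"]:
        problems.append(f"Spark {local_spark} != {target['spark']}")
    # The table lists Python without a patch level, so compare major.minor only.
    local_python_mm = ".".join(local_python.split(".")[:2])
    if local_python_mm != target["python"]:
        problems.append(f"Python {local_python_mm} != {target['python']}")
    return problems
```

In practice the local values could come from `pyspark.__version__` and `platform.python_version()`, for example `mismatches("4.0", pyspark.__version__, platform.python_version())`. An empty list means the local Spark and Python versions match the chosen Glue release.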

Last update: July 8, 2024
Created: July 8, 2024