Installing Java 8 on an MOJ Data Engineering MacBook
Apache Spark is a high-performance cluster computing system designed for big data processing and is written in Scala. Spark runs on the Java Virtual Machine (JVM), so an appropriate JDK (Java Development Kit) must be installed before you can use it. The version needed for Spark is JDK 8. The latest JDK versions are not always compatible with Spark, so it's safer to use a version that is known to work and is officially supported. In case Scala development is needed (for example, for some User Defined Functions), the matching Scala versions are also included in the following table.
JDK Version | Spark Version | Scala Version |
---|---|---|
8 | 2.4.x | 2.11.x |
8 | 3.x | 2.12.x |
Please note that these are the officially supported versions; other combinations may work, but that is not guaranteed.
JDK 8? How can I install this on my MacBook?
Install sdkman
An easy way to install a Java 8 JDK on your MacBook is to use sdkman.
Open a new terminal and run:
curl -s "https://get.sdkman.io" | bash
and follow the instructions on screen.
Then close the terminal, open a new one and run:
source "$HOME/.sdkman/bin/sdkman-init.sh"
If you use zsh (you probably do on current Macs), ensure that these lines are the last lines of your .zshrc file.
#THIS MUST BE AT THE END OF THE FILE FOR SDKMAN TO WORK!!!
export SDKMAN_DIR="$HOME/.sdkman"
[[ -s "$HOME/.sdkman/bin/sdkman-init.sh" ]] && source "$HOME/.sdkman/bin/sdkman-init.sh"
Please ensure that the code in the last line above is all on a single line. Also check that the lines you added are the last lines of your .zshrc file.
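The "last lines" requirement can be checked with a small shell function. This is only a minimal sketch: the name `sdkman_init_is_last` is hypothetical (it is not part of sdkman), and it assumes that it is enough to look at the last non-blank line of the file.

```shell
# Hypothetical helper (not part of sdkman): succeeds if the last non-blank
# line of the given file mentions sdkman-init.sh.
# Defaults to ~/.zshrc when no file is given.
sdkman_init_is_last() {
  rc_file="${1:-$HOME/.zshrc}"
  # Drop blank lines, then keep only the final remaining line.
  last_line="$(grep -v '^[[:space:]]*$' "$rc_file" 2>/dev/null | tail -n 1)"
  case "$last_line" in
    *sdkman-init.sh*) return 0 ;;
    *) return 1 ;;
  esac
}
```

For example, `sdkman_init_is_last && echo "sdkman init is last"` prints the message only when the check passes for your .zshrc.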
You can then run `sdk version` in the terminal to check that your installation of sdkman was successful.
Install appropriate Java version via sdkman
As discussed above, it is important to install a JDK version that is compatible with Spark. Now that sdkman is installed, we can install an appropriate version by entering the following in the terminal:
sdk install java 8.0.352-zulu
This version has the advantage of working on both M1 (ARM) and Intel-based Macs. Additionally, this is the version used by the integration testing scripts in GitHub Actions.
Run `java -version` to check that it's installed properly.
Voilà. You now have a Java 8 JDK set up on your machine. You will now be able to use PySpark (and any other JVM-based software).
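If you would rather script this check than read the output by eye, something like the following could work. It is a sketch under two assumptions: the helper name `is_java8` is made up for illustration, and the active JDK reports its version in the Java 8 style `version "1.8.0_..."`.

```shell
# Hypothetical helper: reads `java -version` output on stdin and succeeds
# only if it contains a Java 8 style version string such as "1.8.0_352".
is_java8() {
  grep -q 'version "1\.8\.'
}
```

Because `java -version` writes to stderr, redirect it when piping: `java -version 2>&1 | is_java8 && echo "Java 8 is active"`.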
AWS Glue versions
These are the versions that Amazon uses in its AWS Glue configurations. It is recommended that any testing is done on the same versions, to avoid surprises where something works when tested locally but not in AWS Glue.
Dependency | Version in AWS Glue 1.0 | Version in AWS Glue 2.0 | Version in AWS Glue 3.0 | Version in AWS Glue 4.0 |
---|---|---|---|---|
Spark | 2.4.3 | 2.4.3 | 3.1.1 | 3.3.0 |
Hadoop | 2.8.5 | 2.8.5 | 3.2.1 | 3.2.1 |
Python | Python 3.6 | Python 3.7 | Python 3.7 | Python 3.10 |
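To keep a local environment in step with the table above, the Spark row can be expressed as a small lookup. This is an illustrative sketch only: `spark_for_glue` is a hypothetical name, and the values are copied from the table, not queried from AWS.

```shell
# Hypothetical helper: prints the Spark version that the table above lists
# for a given AWS Glue version. Values are copied from the table.
spark_for_glue() {
  case "$1" in
    1.0|2.0) echo "2.4.3" ;;
    3.0)     echo "3.1.1" ;;
    4.0)     echo "3.3.0" ;;
    *)       echo "unknown Glue version: $1" >&2; return 1 ;;
  esac
}
```

One possible use is pinning PySpark to the matching release, for example `pip install "pyspark==$(spark_for_glue 4.0)"`.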
Created: July 8, 2024