Update Your Spark/Scala Development Environment in IntelliJ using Maven
A previous tutorial walked through setting up the original environment and running a simple Spark application from scratch. This tutorial updates that same environment to the current versions, which, as of this writing, are Spark 2.4.3 and Scala 2.11.12.
There are later versions of Scala, but Spark 2.4.3 is built for Scala 2.11, so 2.11.12 is the version used here.
It assumes you already have IntelliJ and Maven installed.
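A quick way to confirm the prerequisites from the command line (Java is required by both Maven and Spark, so it is worth checking as well):
% java -version
% mvn -version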
SPARK
Download Spark from https://spark.apache.org/downloads.html
1. Choose a Spark release: 2.4.3 (May 07 2019)
2. Choose a package type: Pre-built for Apache Hadoop 2.7 and later
3. Download Spark: spark-2.4.3-bin-hadoop2.7.tgz into /usr/local/spark
4. Verify the release using its checksum (compare against the value on the download site)
shasum -a 512 spark-2.4.3-bin-hadoop2.7.tgz
5. Untar and place the contents into /usr/local/spark (see the sketch below)
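A minimal sketch of step 5, assuming the tarball sits in the current directory and /usr/local is writable (prefix the commands with sudo otherwise):
% tar -xzf spark-2.4.3-bin-hadoop2.7.tgz
% mkdir -p /usr/local/spark
% mv spark-2.4.3-bin-hadoop2.7/* /usr/local/spark/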
SCALA
Visit https://www.scala-lang.org/download/2.11.12.html
1. Download: scala-2.11.12.tgz
2. Untar and place the extracted scala-2.11.12 directory into /usr/local/share (see the sketch below)
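Same idea as with Spark, except the extracted scala-2.11.12 directory is moved as-is, which matches the SCALA_HOME path set in the next section:
% tar -xzf scala-2.11.12.tgz
% mv scala-2.11.12 /usr/local/share/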
CONFIGURE PROFILE
Edit your shell profile (here called .myprofile) and add the following exports:
export SCALA_HOME=/usr/local/share/scala-2.11.12
export SCALA_VERSION=2.11.12
export SCALA_BINARY_VERSION=2.11
export SPARK_HOME=/usr/local/spark
export PATH=$PATH:$SCALA_HOME/bin
export PATH=$PATH:$SPARK_HOME/bin
And source it
% source .myprofile
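With the profile sourced, both tools should report the expected versions; spark-submit ships with Spark, so this also confirms SPARK_HOME and PATH are set correctly:
% scala -version
% spark-submit --version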
TEST A SIMPLE APP
Open IntelliJ
Choose: File > New > Project > Maven [Next]
GroupId: machinecreek
ArtifactId: SimpleApp [Next]
Project name: Test, Project location: ~/Test [Finish]
When the "Maven projects need to be imported" dialog box appears, select Enable Auto-Import
Right-click the src/main/java folder
Refactor > Rename: scala
Right-click the src/test/java folder
Refactor > Rename: scala
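After the renames, the project layout should look roughly like this (IntelliJ's generated metadata omitted):
~/Test
├── pom.xml
└── src
    ├── main
    │   └── scala
    └── test
        └── scala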
Open the pom.xml file and paste the following under the groupId, artifactId, and version:
<properties>
  <scala.version>2.11.12</scala.version>
  <scala.minor.version>2.11</scala.minor.version>
  <spark.version>2.4.3</spark.version>
</properties>

<dependencies>
  <dependency>
    <groupId>org.scala-lang</groupId>
    <artifactId>scala-library</artifactId>
    <version>${scala.version}</version>
  </dependency>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_${scala.minor.version}</artifactId>
    <version>${spark.version}</version>
  </dependency>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql_${scala.minor.version}</artifactId>
    <version>${spark.version}</version>
  </dependency>
</dependencies>

<build>
  <pluginManagement>
    <plugins>
      <plugin>
        <groupId>net.alchim31.maven</groupId>
        <artifactId>scala-maven-plugin</artifactId>
        <version>3.2.1</version>
      </plugin>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-compiler-plugin</artifactId>
        <version>2.0.2</version>
      </plugin>
    </plugins>
  </pluginManagement>
  <plugins>
    <plugin>
      <groupId>net.alchim31.maven</groupId>
      <artifactId>scala-maven-plugin</artifactId>
      <executions>
        <execution>
          <id>scala-compile-first</id>
          <phase>process-resources</phase>
          <goals>
            <goal>add-source</goal>
            <goal>compile</goal>
          </goals>
        </execution>
        <execution>
          <id>scala-test-compile</id>
          <phase>process-test-resources</phase>
          <goals>
            <goal>testCompile</goal>
          </goals>
        </execution>
      </executions>
    </plugin>
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-compiler-plugin</artifactId>
      <executions>
        <execution>
          <phase>compile</phase>
          <goals>
            <goal>compile</goal>
          </goals>
        </execution>
      </executions>
    </plugin>
  </plugins>
</build>
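Once Auto-Import has picked up the new pom.xml, a quick command-line build is a good way to confirm that the dependencies resolve (run from the project root, assuming mvn is on your PATH):
% cd ~/Test
% mvn clean compile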
Add the application’s code
Right-click the src/main/scala folder > New > Package:
Name it: com.machinecreek
Right-click src/main/scala/com.machinecreek and choose New > Scala Class
Name: SimpleApp
Kind: Object [OK]
Open SimpleApp.scala and paste the following:
package com.machinecreek

import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SparkSession

object SimpleApp {
  def main(args: Array[String]): Unit = {
    println("Hello from top of SimpleApp")

    // Build a local SparkSession for this test run
    val spark = SparkSession
      .builder()
      .appName("SimpleApp")
      .master("local")
      .getOrCreate()

    println("Spark.version")
    println(spark.version)
    println("scala.util.Properties.releaseVersion")
    println(scala.util.Properties.releaseVersion)

    // Parallelize a small list into an RDD and count its elements
    val rdd: RDD[Int] = spark.sparkContext.parallelize(List(1, 2, 3, 4, 5, 6, 7, 8))
    println(rdd.count())

    println("Hello from Bottom of SimpleApp")
    spark.stop()
  }
}
Run the application from IntelliJ
Right-click SimpleApp > compile
Right-click SimpleApp > run
Review results
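If everything is wired up correctly, the application's own println output (interleaved with Spark's INFO logging) should look roughly like this:
Hello from top of SimpleApp
Spark.version
2.4.3
scala.util.Properties.releaseVersion
Some(2.11.12)
8
Hello from Bottom of SimpleApp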
Run the Spark application from the command line:
cd ~/Test
mvn clean package
cd ~/Test/target
spark-submit --master local --class com.machinecreek.SimpleApp SimpleApp-1.0-SNAPSHOT.jar
INTELLIJ CLEANUP
If you have already been working with previous versions of Spark/Scala, you may find IntelliJ has gotten confused about which versions it should be using. IntelliJ can be refreshed by removing two items from your project directory.
First, shut down IntelliJ.
Then, from the project directory, delete the following:
% rm -rf .idea
% rm Test.iml
Restart IntelliJ
Choose Import Project and select the project directory (~/Test)