Update Spark & Scala Development Environment with IntelliJ and Maven

Update Your Spark/Scala Development Environment in IntelliJ using Maven

A previous tutorial covered setting up the original environment and running a simple Spark application from scratch. This tutorial updates that same environment to the current versions, which, as of this writing, are Spark 2.4.3 and Scala 2.11.12.

Later versions of Scala exist, but the prebuilt Spark 2.4.3 packages are built for Scala 2.11, so Scala 2.11.12 is the version to pair with them.

This tutorial assumes you already have IntelliJ and Maven installed.
 

SPARK

Download Spark from https://spark.apache.org/downloads.html

1. Choose a Spark release: 2.4.3 May 07 2019

2. Choose a package type: Pre-built for Apache Hadoop 2.7 and later

3. Download Spark: spark-2.4.3-bin-hadoop2.7.tgz into /usr/local/spark

4. Verify the download by computing its SHA-512 checksum and comparing the result with the checksum published on the download site:

shasum -a 512 spark-2.4.3-bin-hadoop2.7.tgz

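5. Untar the archive (tar -xzf spark-2.4.3-bin-hadoop2.7.tgz) and move the contents of the resulting spark-2.4.3-bin-hadoop2.7 directory into /usr/local/spark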
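If you prefer to do the comparison in step 4 programmatically, the sketch below computes the same SHA-512 digest as shasum -a 512, using the JDK's java.security.MessageDigest. Checksum is a hypothetical helper, not part of Spark. It assumes the published value may be upper case and split into space-separated groups, so it strips whitespace and lower-cases that value before comparing; paste only the hex digits, not the file name that may precede them.

```scala
import java.io.{BufferedInputStream, File, FileInputStream}
import java.security.MessageDigest

// Hypothetical helper: verifies a downloaded archive against a published SHA-512 value.
object Checksum {
  /** SHA-512 digest of a file as lowercase hex (the same digest `shasum -a 512` prints). */
  def sha512Hex(file: File): String = {
    val digest = MessageDigest.getInstance("SHA-512")
    val in = new BufferedInputStream(new FileInputStream(file))
    try {
      // Stream the file in chunks so large archives are not loaded into memory at once
      val buffer = new Array[Byte](8192)
      var n = in.read(buffer)
      while (n != -1) {
        digest.update(buffer, 0, n)
        n = in.read(buffer)
      }
    } finally in.close()
    digest.digest().map(b => f"${b & 0xff}%02x").mkString
  }

  /** Assumption: the published value may be upper case and grouped with spaces, so normalize it first. */
  def matches(file: File, published: String): Boolean =
    sha512Hex(file) == published.replaceAll("\\s", "").toLowerCase
}
```

For example, Checksum.matches(new File("spark-2.4.3-bin-hadoop2.7.tgz"), publishedValue) returns true only when the computed digest equals the published one.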

 

SCALA

Visit https://www.scala-lang.org/download/2.11.12.html

1. Download: scala-2.11.12.tgz

2. Untar it into /usr/local/share; this creates /usr/local/share/scala-2.11.12, the directory SCALA_HOME points to below

 

CONFIGURE PROFILE

Edit your shell profile and add the following lines:

export SCALA_HOME=/usr/local/share/scala-2.11.12

export SCALA_VERSION=2.11.12

export SCALA_BINARY_VERSION=2.11

export SPARK_HOME=/usr/local/spark

export PATH=$PATH:$SCALA_HOME/bin

export PATH=$PATH:$SPARK_HOME/bin

Then source it so the current shell picks up the changes:

% source .myprofile
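After sourcing the profile, a quick sanity check can catch typos such as a dash where a dot belongs in a version string. The sketch below is a hypothetical EnvCheck object (not part of Spark or Scala). It assumes the variable names exported above and the standard distribution layouts, in which the Scala launcher is bin/scala and the Spark launcher is bin/spark-submit. The file-existence test is passed in as a function so the logic can be exercised without touching the file system.

```scala
import java.io.File

// Hypothetical checker for the variables exported by the profile above.
object EnvCheck {
  val required = Seq("SCALA_HOME", "SCALA_VERSION", "SCALA_BINARY_VERSION", "SPARK_HOME")

  /** Returns a list of problems; an empty list means the profile looks consistent. */
  def problems(env: Map[String, String], exists: String => Boolean): Seq[String] = {
    val missing = required.filterNot(env.contains).map(v => s"$v is not set")

    // Full version must look like 2.11.12
    val versionProblems = env.get("SCALA_VERSION").toSeq
      .filterNot(_.matches("""\d+\.\d+\.\d+"""))
      .map(v => s"SCALA_VERSION '$v' is not of the form 2.11.12")

    // Binary version must be the leading major.minor part of the full version
    val binaryProblems = for {
      full <- env.get("SCALA_VERSION").toSeq
      bin <- env.get("SCALA_BINARY_VERSION").toSeq
      if !full.startsWith(bin + ".")
    } yield s"SCALA_BINARY_VERSION '$bin' does not match SCALA_VERSION '$full'"

    // Assumed launcher locations inside each distribution
    val pathProblems = Seq(
      env.get("SCALA_HOME").map(_ + "/bin/scala"),
      env.get("SPARK_HOME").map(_ + "/bin/spark-submit")
    ).flatten.filterNot(exists).map(p => s"$p does not exist")

    missing ++ versionProblems ++ binaryProblems ++ pathProblems
  }

  def main(args: Array[String]): Unit = {
    val found = problems(sys.env, p => new File(p).exists)
    if (found.isEmpty) println("Environment looks consistent")
    else found.foreach(println)
  }
}
```

Run EnvCheck.main from a shell that has sourced the profile; each line it prints names one variable or path to fix.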

 

Test a simple app

Open IntelliJ

Choose: File > New > Project > Maven [Next]

GroupId: machinecreek

ArtifactId: SimpleApp [Next]

Project name: Test, location ~/Test [Finish]

When the "Maven projects need to be imported" notification appears, select Enable Auto-Import

Right-click the src/main/java folder

Refactor > Rename: scala

Right-click the src/test/java folder

Refactor > Rename: scala

Open the pom.xml file and paste the following below the groupId, artifactId, and version elements (inside <project>):

     <properties>
        <scala.version>2.11.12</scala.version>
        <scala.minor.version>2.11</scala.minor.version>
        <spark.version>2.4.3</spark.version>
    </properties>

    <dependencies>
        <dependency>
            <groupId>org.scala-lang</groupId>
            <artifactId>scala-library</artifactId>
            <version>${scala.version}</version>
        </dependency>

        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-core_${scala.minor.version}</artifactId>
            <version>${spark.version}</version>
        </dependency>

        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-sql_${scala.minor.version}</artifactId>
            <version>${spark.version}</version>
        </dependency>

    </dependencies>
    <build>
        <pluginManagement>
            <plugins>
                <plugin>
                    <groupId>net.alchim31.maven</groupId>
                    <artifactId>scala-maven-plugin</artifactId>
                    <version>3.2.1</version>
                </plugin>
                <plugin>
                    <groupId>org.apache.maven.plugins</groupId>
                    <artifactId>maven-compiler-plugin</artifactId>
                    <version>2.0.2</version>
                </plugin>
            </plugins>
        </pluginManagement>
        <plugins>
            <plugin>
                <groupId>net.alchim31.maven</groupId>
                <artifactId>scala-maven-plugin</artifactId>
                <executions>
                    <execution>
                        <id>scala-compile-first</id>
                        <phase>process-resources</phase>
                        <goals>
                            <goal>add-source</goal>
                            <goal>compile</goal>
                        </goals>
                    </execution>
                    <execution>
                        <id>scala-test-compile</id>
                        <phase>process-test-resources</phase>
                        <goals>
                            <goal>testCompile</goal>
                        </goals>
                    </execution>
                </executions>
            </plugin>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-compiler-plugin</artifactId>
                <executions>
                    <execution>
                        <phase>compile</phase>
                        <goals>
                            <goal>compile</goal>
                        </goals>
                    </execution>
                </executions>
            </plugin>
        </plugins>
    </build>
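The _${scala.minor.version} suffix in the Spark artifactIds exists because Scala libraries are published separately for each Scala binary version (the major.minor part of the full version), so spark-core_2.11 is the build that matches scala-library 2.11.12. The hypothetical ScalaBinary helper below shows that derivation and adds a runtime check you could call from the project to confirm which Scala library is on the classpath; it relies only on scala.util.Properties.versionNumberString.

```scala
// Hypothetical helper illustrating Scala binary-version naming.
object ScalaBinary {
  /** "2.11.12" -> "2.11": libraries are cross-published per major.minor version. */
  def binaryVersion(full: String): String = full.split('.').take(2).mkString(".")

  /** Cross-versioned artifactId, e.g. spark-core_2.11, mirroring spark-core_${scala.minor.version}. */
  def artifactId(name: String, scalaVersion: String): String =
    s"${name}_${binaryVersion(scalaVersion)}"

  /** True if the Scala library running this code has the expected binary version. */
  def runningBinaryIs(expected: String): Boolean =
    binaryVersion(scala.util.Properties.versionNumberString) == expected
}
```

Inside the project, ScalaBinary.runningBinaryIs("2.11") should return true when the pom's scala-library 2.11.12 is the one on the classpath.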

 

Add the application’s code

Right-click the src/main/scala folder > New > Package:

Name it: com.machinecreek

Right-click the com.machinecreek package and choose New > Scala Class

Name: SimpleApp

Kind: Object [OK]

Open SimpleApp.scala and replace its contents with the following:

package com.machinecreek

import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SparkSession

object SimpleApp {

  def main(args: Array[String]): Unit = {
    println("Hello from top of SimpleApp")

    // Run Spark locally in a single thread
    val spark = SparkSession
      .builder()
      .appName("SimpleApp")
      .master("local")
      .getOrCreate()

    println("Spark.version")
    println(spark.version)

    println("scala.util.Properties.releaseVersion")
    println(scala.util.Properties.releaseVersion)

    // Distribute a small list as an RDD and count its elements
    val rdd: RDD[Int] = spark.sparkContext.parallelize(List(1, 2, 3, 4, 5, 6, 7, 8))
    println(rdd.count())

    // Inside main, so it prints after the Spark output
    println("Hello from Bottom of SimpleApp")

    spark.stop()
  }
}
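A Scala detail worth knowing when reading the output: statements placed directly in an object's body, outside main, run once when the object is first initialized, and that happens before main's body executes. The sketch below demonstrates the ordering with a hypothetical InitOrderDemo object that records events in a buffer instead of printing them.

```scala
import scala.collection.mutable.ListBuffer

// Hypothetical demo of object initialization order.
object InitOrderDemo {
  val log = ListBuffer.empty[String]

  // Stand-in for an application object such as SimpleApp
  object App {
    log += "object body" // runs once, when App is first accessed

    def main(args: Array[String]): Unit = {
      log += "main: top"
      log += "main: bottom"
    }
  }

  def run(): List[String] = {
    App.main(Array.empty[String]) // first access initializes App, then calls main
    log.toList
  }
}
```

InitOrderDemo.run() returns List("object body", "main: top", "main: bottom"), so a println left in an object body appears above everything main prints.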

 

Run the application from IntelliJ

Right-click SimpleApp > Compile

Right-click SimpleApp > Run

Review the results in the Run window

 

Run the Spark application from the command line:

cd ~/Test
mvn clean package
cd target
spark-submit --master local --class com.machinecreek.SimpleApp SimpleApp-1.0-SNAPSHOT.jar
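The jar name in the spark-submit line follows Maven's default naming, artifactId-version.jar written to the target directory; here the artifactId is SimpleApp and the version is 1.0-SNAPSHOT. The hypothetical SubmitCommand helper below assembles the same command from those pieces. It assumes spark-submit is on your PATH (which the profile above arranges) and that you launch it from the project root, so the jar path includes target/.

```scala
// Hypothetical helper that rebuilds the spark-submit command shown above.
object SubmitCommand {
  /** Maven's default jar location: target/<artifactId>-<version>.jar */
  def jarPath(artifactId: String, version: String): String =
    s"target/${artifactId}-${version}.jar"

  /** Argument list for spark-submit, run from the project root. */
  def args(mainClass: String, jar: String, master: String = "local"): Seq[String] =
    Seq("spark-submit", "--master", master, "--class", mainClass, jar)
}
```

To launch it from Scala, import scala.sys.process._ and call .! on the returned Seq, for example SubmitCommand.args("com.machinecreek.SimpleApp", SubmitCommand.jarPath("SimpleApp", "1.0-SNAPSHOT")).!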

 

IntelliJ CLEANUP

If you have already been working with previous versions of Spark/Scala, IntelliJ may get confused about which versions it should use. I found that IntelliJ can be refreshed by removing its project metadata, the .idea directory and the .iml file, from your project directory.

First, shut down IntelliJ.

Then delete the following:

myProject% rm -rf .idea

myProject% rm Test.iml

Restart IntelliJ

Choose Import Project and select the myProject directory
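The deletion step can also be scripted. The sketch below is a hypothetical IdeaCleanup helper that uses only java.io.File: it removes the .idea directory and any *.iml files at the top level of the project directory you pass in, and returns the names it removed. As with the manual steps, shut IntelliJ down before running it.

```scala
import java.io.File

// Hypothetical cleanup helper mirroring the manual rm commands above.
object IdeaCleanup {
  // Delete children first, then the file or directory itself
  private def deleteRecursively(f: File): Unit = {
    if (f.isDirectory) Option(f.listFiles).getOrElse(Array.empty[File]).foreach(deleteRecursively)
    f.delete()
  }

  /** Removes projectDir/.idea and top-level *.iml files; returns the removed names, sorted. */
  def clean(projectDir: File): Seq[String] = {
    val targets = Option(projectDir.listFiles).getOrElse(Array.empty[File]).toSeq
      .filter(f => f.getName == ".idea" || (f.isFile && f.getName.endsWith(".iml")))
    targets.foreach(deleteRecursively)
    targets.map(_.getName).sorted
  }
}
```

For example, IdeaCleanup.clean(new File(sys.props("user.home"), "Test")) would clean the ~/Test project created earlier, leaving pom.xml and src untouched.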