If we look at the net.gpedro pom.xml file, we can see that net.gpedro relies on Gson. The dependencies of a library quickly become your dependencies as soon as you add the library to your project, so you'll want to be very careful to minimize your project dependencies. You'll also want to rely on external libraries that themselves have minimal dependencies.
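If you decide a transitive dependency like Gson shouldn't ride along in your fat JAR, sbt lets you exclude it at the point where the library is declared. A sketch follows; the exact coordinates and version are assumptions for illustration:

```scala
// build.sbt — excluding a transitive dependency (coordinates and version are assumptions)
libraryDependencies += "net.gpedro.integrations.slack" % "slack-webhook" % "1.2.1" exclude("com.google.code.gson", "gson")
```

The tradeoff: the excluded classes must then be supplied at runtime some other way (for example by the cluster environment), or the library will fail with NoClassDefFoundError.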
Sbt assembly provides us with the com/github/mrpowers/spark/slack, net/gpedro/, and org/json4s/ files as expected. But why does our fat JAR file include com/google/gson/ code as well?
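One way to answer questions like this is to print the project's dependency tree. Recent sbt versions ship a dependencyTree task; on older versions you can add the sbt-dependency-graph plugin, sketched below (the version number is an assumption):

```scala
// project/plugins.sbt — enables dependency-tree reporting on older sbt versions
// (plugin version is an assumption; unnecessary on sbt 1.4+, where dependencyTree is built in)
addSbtPlugin("net.virtual-void" % "sbt-dependency-graph" % "0.9.2")
```

Running sbt dependencyTree then shows each library along with the transitive dependencies it drags in.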
If you run sbt package, SBT will build a thin JAR file that only includes your project files. The thin JAR file will not include the uJson files. If you run sbt assembly, SBT will build a fat JAR file that includes both your project files and the uJson files. Let's dig into the gruesome details!

Building a Thin JAR File

As discussed, sbt package builds a thin JAR file of your project. Spark-daria is a good example of an open source project that is distributed as a thin JAR file. This is an excerpt of the spark-daria build.sbt file:

libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.3.0" % "provided"
libraryDependencies += "com.lihaoyi" %% "utest" % "0.6.3" % "test"
testFrameworks += new TestFramework("utest.runner.Framework")
artifactName := ...

artifactName customizes the JAR file name that's created by sbt package, while assemblyJarName customizes the JAR file name that's created by sbt assembly. Notice that sbt package and sbt assembly require different code to customize the JAR file name. Let's build the JAR file with sbt assembly and then inspect the contents:

$ jar tvf target/scala-2.11/spark-slack_2.11-2.3.0_0.0.1.jar
com/github/mrpowers/spark/
com/github/mrpowers/spark/slack/
com/github/mrpowers/spark/slack/slash_commands/
com/google/gson/annotations/
com/google/gson/internal/
com/google/gson/internal/bind/
com/google/gson/reflect/
com/google/gson/stream/
com/thoughtworks/paranamer/
net/gpedro/integrations/
net/gpedro/integrations/slack/
org/json4s/scalap/scalasig/
com/github/mrpowers/spark/slack/Notifier.class
com/github/mrpowers/spark/slack/SparkSessionWrapper$class.class
com/github/mrpowers/spark/slack/SparkSessionWrapper.class
com/github/mrpowers/spark/slack/slash_commands/SlashParser.class
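To make the naming difference concrete, here is a hedged sketch of the two settings side by side. The name formats are illustrative, not the actual spark-daria or spark-slack code:

```scala
// build.sbt — sketch of customizing both JAR names (format strings are illustrative)

// artifactName controls the thin JAR produced by `sbt package`
artifactName := { (sv: ScalaVersion, module: ModuleID, artifact: Artifact) =>
  artifact.name + "_" + sv.binary + "-" + module.revision + "." + artifact.extension
}

// assemblyJarName controls the fat JAR produced by `sbt assembly`
assemblyJarName in assembly := s"${name.value}-assembly-${version.value}.jar"
```

Keeping the two formats distinct makes it obvious at a glance whether a given JAR in target/ is thin or fat.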
Let's say you add the uJson library to your build.sbt file as a library dependency.

libraryDependencies += "com.lihaoyi" %% "ujson" % "0.6.5"
Thin JAR files only include the project's classes / objects / traits and don't include any of the project dependencies. You can build "fat" JAR files by adding sbt-assembly to your project. Fat JAR files include all the code from your project and all the code from the dependencies.
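sbt-assembly is an sbt plugin, so it gets wired in via project/plugins.sbt rather than build.sbt. A sketch follows; the version shown is one of many published releases, so treat it as an assumption:

```scala
// project/plugins.sbt — adds the sbt-assembly plugin
// (the version number is an assumption; pick the latest release for your sbt version)
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.6")
```

After reloading the build, the sbt assembly command becomes available alongside sbt package.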
Hopefully it will help you make the leap and start writing Spark code in SBT projects with a powerful IDE by your side!

JAR File Basics

A JAR (Java ARchive) is a package file format typically used to aggregate many Java class files and associated metadata and resources (text, images, etc.) into one file for distribution. JAR files can be attached to Databricks clusters or launched via spark-submit. You can build a "thin" JAR file with the sbt package command.
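As a rough sketch, a minimal build.sbt for a project packaged this way might look like the following; the project name and version numbers here are placeholders, not taken from any real project:

```scala
// build.sbt — minimal sketch; name and versions are placeholders
name := "my-spark-project"

scalaVersion := "2.11.12"

version := "0.0.1"
```

With these settings, sbt package writes the thin JAR to target/scala-2.11/my-spark-project_2.11-0.0.1.jar.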
Scala is a difficult language, and it's especially challenging when you can't leverage the development tools provided by an IDE like IntelliJ. This episode will demonstrate how to build JAR files with the sbt package and assembly commands and how to customize the code that's included in JAR files.
Spark JAR files let you package a project into a single file so it can be run on a Spark cluster. A lot of developers write Spark code in browser-based notebooks because they're unfamiliar with JAR files.