Using Deequ 1.1 with Spark 3

aloneguid

Ivan G

Posted on February 25, 2021

Using Deequ 1.1 with Spark 3

If you try to upgrade AWS Deequ to latest version (1.1.0) atm and use with Spark 3.0.1 you will get following error:

[error] (update) Conflicting cross-version suffixes in: org.apache.spark:spark-launcher, org.apache.spark:spark-sketch, org.apache.spark:spark-kvstore, org.json4s:json4s-ast, org.apache.spark:spark-catalyst, org.apache.spark:spark-network-shuffle, com.twitter:chill, org.apache.spark:spark-sql, org.scala-lang.modules:scala-xml, org.json4s:json4s-jackson, com.fasterxml.jackson.module:jackson-module-scala, org.json4s:json4s-core, org.apache.spark:spark-unsafe, org.json4s:json4s-scalap, org.scala-lang.modules:scala-parser-combinators, org.apache.spark:spark-tags, org.apache.spark:spark-core, org.apache.spark:spark-network-common
[error] Total time: 6 s, completed 10-Feb-2021 13:07:46
Enter fullscreen mode Exit fullscreen mode

This is due to the fact that Deque has transitive dependencies to Scala 2.11 for some unknown reason (a bug?). You can fix that by using the following build.sbt:

name := "dq"

scalaVersion := "2.12.12"

val sparkVersion = "3.0.1"

libraryDependencies += "org.apache.spark" %% "spark-sql" % sparkVersion % "provided"


// https://mvnrepository.com/artifact/com.amazon.deequ/deequ
// issue with Deequ transitive libs cross-compiled to Scala 2.11
libraryDependencies += ("com.amazon.deequ" % "deequ" % "1.1.0_spark-3.0-scala-2.12")
  .exclude("org.scalanlp", "breeze_2.11")
  .exclude("com.chuusai", "shapeless_2.11")
  .exclude("org.apache.spark", "spark-core_2.11")
  .exclude("org.apache.spark", "spark-sql_2.11")
Enter fullscreen mode Exit fullscreen mode

P.S. Originally published on my blog.

💖 💪 🙅 🚩
aloneguid
Ivan G

Posted on February 25, 2021

Join Our Newsletter. No Spam, Only the good stuff.

Sign up to receive the latest update from our blog.

Related

Using Deequ 1.1 with Spark 3
apachespark Using Deequ 1.1 with Spark 3

February 25, 2021