Using Deequ 1.1 with Spark 3
Ivan G
Posted on February 25, 2021
If you try to upgrade AWS Deequ to latest version (1.1.0) atm and use with Spark 3.0.1 you will get following error:
[error] (update) Conflicting cross-version suffixes in: org.apache.spark:spark-launcher, org.apache.spark:spark-sketch, org.apache.spark:spark-kvstore, org.json4s:json4s-ast, org.apache.spark:spark-catalyst, org.apache.spark:spark-network-shuffle, com.twitter:chill, org.apache.spark:spark-sql, org.scala-lang.modules:scala-xml, org.json4s:json4s-jackson, com.fasterxml.jackson.module:jackson-module-scala, org.json4s:json4s-core, org.apache.spark:spark-unsafe, org.json4s:json4s-scalap, org.scala-lang.modules:scala-parser-combinators, org.apache.spark:spark-tags, org.apache.spark:spark-core, org.apache.spark:spark-network-common
[error] Total time: 6 s, completed 10-Feb-2021 13:07:46
This is due to the fact that Deque has transitive dependencies to Scala 2.11 for some unknown reason (a bug?). You can fix that by using the following build.sbt
:
name := "dq"
scalaVersion := "2.12.12"
val sparkVersion = "3.0.1"
libraryDependencies += "org.apache.spark" %% "spark-sql" % sparkVersion % "provided"
// https://mvnrepository.com/artifact/com.amazon.deequ/deequ
// issue with Deequ transitive libs cross-compiled to Scala 2.11
libraryDependencies += ("com.amazon.deequ" % "deequ" % "1.1.0_spark-3.0-scala-2.12")
.exclude("org.scalanlp", "breeze_2.11")
.exclude("com.chuusai", "shapeless_2.11")
.exclude("org.apache.spark", "spark-core_2.11")
.exclude("org.apache.spark", "spark-sql_2.11")
P.S. Originally published on my blog.
💖 💪 🙅 🚩
Ivan G
Posted on February 25, 2021
Join Our Newsletter. No Spam, Only the good stuff.
Sign up to receive the latest update from our blog.