java.lang.NoSuchMethodError: com.mongodb.internal.operation.SyncOperations.aggregate

I am trying to build an application as a proof of concept that uses Spark in Scala and taps into MongoDB. So far I am able to connect to Spark and to MongoDB separately, but when I try to connect Spark and MongoDB together, I get the following error:

Exception in thread "main" java.lang.NoSuchMethodError: com.mongodb.internal.operation.SyncOperations.aggregate(Ljava/util/List;Ljava/lang/Class;JJLjava/lang/Integer;Lcom/mongodb/client/model/Collation;Lorg/bson/conversions/Bson;Ljava/lang/String;Ljava/lang/String;Lorg/bson/conversions/Bson;Ljava/lang/Boolean;Lcom/mongodb/internal/client/model/AggregationLevel;)Lcom/mongodb/internal/operation/ExplainableReadOperation;
    at com.mongodb.client.internal.AggregateIterableImpl.asAggregateOperation(AggregateIterableImpl.java:213)
    at com.mongodb.client.internal.AggregateIterableImpl.asReadOperation(AggregateIterableImpl.java:208)
    at com.mongodb.client.internal.MongoIterableImpl.execute(MongoIterableImpl.java:135)
    at com.mongodb.client.internal.MongoIterableImpl.iterator(MongoIterableImpl.java:92)
    at com.mongodb.client.internal.MongoIterableImpl.forEach(MongoIterableImpl.java:121)
    at com.mongodb.client.internal.MongoIterableImpl.into(MongoIterableImpl.java:130)
    at com.mongodb.spark.sql.connector.schema.InferSchema.lambda$inferSchema$0(InferSchema.java:81)
    at com.mongodb.spark.sql.connector.config.AbstractMongoConfig.withCollection(AbstractMongoConfig.java:170)
    at com.mongodb.spark.sql.connector.config.ReadConfig.withCollection(ReadConfig.java:45)
    at com.mongodb.spark.sql.connector.schema.InferSchema.inferSchema(InferSchema.java:81)
    at com.mongodb.spark.sql.connector.MongoTableProvider.inferSchema(MongoTableProvider.java:62)
    at org.apache.spark.sql.execution.datasources.v2.DataSourceV2Utils$.getTableFromProvider(DataSourceV2Utils.scala:81)
    at org.apache.spark.sql.DataFrameReader.$anonfun$load$1(DataFrameReader.scala:295)
    at scala.Option.map(Option.scala:230)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:265)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:225)
    at WordCountCucumberScala_GhislainGripon.SparkMapReduceTask.execute(SparkMapReduceTask.scala:24)
    at WordCountCucumberScala_GhislainGripon.MapReduce$.main(MapReduce.scala:12)
    at WordCountCucumberScala_GhislainGripon.MapReduce.main(MapReduce.scala)

This only happens when I try to tap into MongoDB. I am using Spark 3.1.3, MongoDB 5.0.7, mongo-spark-connector 10.0.1, Scala 2.12.15 and Java 1.8, all of it in IntelliJ IDEA.

package WordCountCucumberScala_GhislainGripon

import org.apache.spark._
import org.apache.spark.sql._
import org.apache.spark.SparkConf
import com.mongodb.spark._

class SparkMapReduceTask(config: Configuration) extends Task{
  override def execute(): Unit = {

    val connectionString = s"mongodb://${config.getUsername}:${config.getPassword}@${config.host}:${config.port}/?authSource=${config.database}"

    val sparkConf = new SparkConf()
      .set("spark.mongodb.read.connection.uri", connectionString)
      .set("spark.mongodb.write.connection.uri", connectionString)
      .setMaster("local")
      .setAppName("MapReduce")

    val spark = SparkSession.builder()
      .config(sparkConf)
      .getOrCreate()

    import spark.implicits._
    //spark.sparkContext.setLogLevel("WARN")
    //val textRDD = spark.read.text(config.data_dir + "/" + config.main_data + ".txt")
    //textRDD.flatMap(line => "(((?U)\\w)+)".r.findAllIn(line.mkString).toList)
      //.groupBy($"value").count().orderBy($"count".desc).write.mode("overwrite").csv(config.data_dir + "/MapReduceResults")

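    // Reading through the Spark connector; per the stack trace, this load() call
    // (SparkMapReduceTask.scala:24) is where the NoSuchMethodError is thrown.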
    val testRDD = spark.read.format("mongodb")
      .option("database", config.database)
      .option("collection", config.text_table)
      .load()
    testRDD.show()

  }

}

I use a configuration class to read parameters from a config file and build the MongoDB connection information from its data. I tried tinkering with the versions of the Spark, MongoDB and connector libraries, but to no avail.
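
For reference, the configuration class exposes roughly the following fields and accessors. This is only a trimmed illustration limited to what the code above uses; the real class is populated from the config file.

case class Configuration(
    host: String,
    port: Int,
    database: String,
    text_table: String,
    data_dir: String,
    main_data: String,
    username: String,
    password: String) {
  // Accessors used when building the connection string above.
  def getUsername: String = username
  def getPassword: String = password
}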

Here is the pom.xml


<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>wewyse</groupId>
  <artifactId>WordCountCucumberScala_GhislainGripon</artifactId>
  <version>1.0-SNAPSHOT</version>
  <name>${project.artifactId}</name>
  <description>My wonderful scala app</description>
  <inceptionYear>2018</inceptionYear>
  <licenses>
    <license>
      <name>My License</name>
      <url>http://....</url>
      <distribution>repo</distribution>
    </license>
  </licenses>

  <properties>
    <maven.compiler.source>1.8</maven.compiler.source>
    <maven.compiler.target>1.8</maven.compiler.target>
    <encoding>UTF-8</encoding>
    <scala.version>2.12.15</scala.version>
    <scala.compat.version>2.12</scala.compat.version>
    <spec2.version>4.15.0</spec2.version>
    <spark.version>3.1.3</spark.version>
  </properties>

  <dependencies>
    <dependency>
      <groupId>org.scala-lang</groupId>
      <artifactId>scala-library</artifactId>
      <version>${scala.version}</version>
      <scope>compile</scope>
    </dependency>

    <dependency>
      <groupId>junit</groupId>
      <artifactId>junit</artifactId>
      <version>4.13.2</version>
      <scope>test</scope>
    </dependency>
    <dependency>
      <groupId>org.scalatestplus</groupId>
      <artifactId>junit-4-13_${scala.compat.version}</artifactId>
      <version>3.2.12.0</version>
      <scope>test</scope>
    </dependency>
    <dependency>
      <groupId>org.mongodb.scala</groupId>
      <artifactId>mongo-scala-driver_${scala.compat.version}</artifactId>
      <version>4.6.0</version>
      <scope>compile</scope>
    </dependency>
    <dependency>
      <groupId>org.mongodb.spark</groupId>
      <artifactId>mongo-spark-connector</artifactId>
      <version>10.0.1</version>
      <scope>compile</scope>
    </dependency>
    <dependency>
      <groupId>org.mongodb.scala</groupId>
      <artifactId>mongo-scala-bson_${scala.compat.version}</artifactId>
      <version>4.6.0</version>
      <scope>compile</scope>
    </dependency>
    <dependency>
      <groupId>org.mongodb</groupId>
      <artifactId>bson</artifactId>
      <version>4.6.0</version>
      <scope>compile</scope>
    </dependency>
    <dependency>
      <groupId>org.mongodb</groupId>
      <artifactId>mongodb-driver-reactivestreams</artifactId>
      <version>4.6.0</version>
      <scope>compile</scope>
    </dependency>
    <dependency>
      <groupId>org.mongodb</groupId>
      <artifactId>mongodb-driver-core</artifactId>
      <version>4.6.0</version>
      <scope>compile</scope>
    </dependency>
    <dependency>
      <groupId>org.mongodb</groupId>
      <artifactId>bson-record-codec</artifactId>
      <version>4.6.0</version>
      <scope>compile</scope>
    </dependency>
    <dependency>
      <groupId>org.scalatest</groupId>
      <artifactId>scalatest_${scala.compat.version}</artifactId>
      <version>3.2.12</version>
      <scope>test</scope>
    </dependency>
    <dependency>
      <groupId>org.specs2</groupId>
      <artifactId>specs2-core_${scala.compat.version}</artifactId>
      <version>${spec2.version}</version>
      <scope>test</scope>
    </dependency>
    <dependency>
      <groupId>org.specs2</groupId>
      <artifactId>specs2-junit_${scala.compat.version}</artifactId>
      <version>${spec2.version}</version>
      <scope>test</scope>
    </dependency>
    <dependency>
      <groupId>io.cucumber</groupId>
      <artifactId>cucumber-scala_${scala.compat.version}</artifactId>
      <version>8.2.6</version>
      <scope>test</scope>
    </dependency>
    <dependency>
      <groupId>io.cucumber</groupId>
      <artifactId>cucumber-junit-platform-engine</artifactId>
      <version>7.3.3</version>
      <scope>test</scope>
    </dependency>
    <dependency>
      <groupId>io.cucumber</groupId>
      <artifactId>cucumber-junit</artifactId>
      <version>7.3.3</version>
      <scope>test</scope>
    </dependency>
    <dependency>
      <groupId>io.cucumber</groupId>
      <artifactId>cucumber-core</artifactId>
      <version>7.3.3</version>
      <scope>test</scope>
    </dependency>
    <dependency>
      <groupId>io.circe</groupId>
      <artifactId>circe-yaml_${scala.compat.version}</artifactId>
      <version>0.14.1</version>
      <scope>compile</scope>
    </dependency>
    <dependency>
      <groupId>io.circe</groupId>
      <artifactId>circe-core_${scala.compat.version}</artifactId>
      <version>0.14.1</version>
      <scope>compile</scope>
    </dependency>
    <dependency>
      <groupId>io.circe</groupId>
      <artifactId>circe-parser_${scala.compat.version}</artifactId>
      <version>0.14.1</version>
      <scope>compile</scope>
    </dependency>
    <dependency>
      <groupId>io.circe</groupId>
      <artifactId>circe-generic-extras_${scala.compat.version}</artifactId>
      <version>0.14.1</version>
      <scope>compile</scope>
    </dependency>
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-core_${scala.compat.version}</artifactId>
      <version>${spark.version}</version>
    </dependency>
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-hive_${scala.compat.version}</artifactId>
      <version>${spark.version}</version>
    </dependency>
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-sql_${scala.compat.version}</artifactId>
      <version>${spark.version}</version>
    </dependency>
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-catalyst_${scala.compat.version}</artifactId>
      <version>${spark.version}</version>
    </dependency>
    <dependency>
      <groupId>org.scala-lang</groupId>
      <artifactId>scala-reflect</artifactId>
      <version>2.12.15</version>
    </dependency>
  </dependencies>

  <build>
    <sourceDirectory>src/main/scala</sourceDirectory>
    <testSourceDirectory>src/test/scala</testSourceDirectory>
    <plugins>
      <plugin>
        <!-- see http://davidb.github.com/scala-maven-plugin -->
        <groupId>net.alchim31.maven</groupId>
        <artifactId>scala-maven-plugin</artifactId>
        <version>3.3.2</version>
        <executions>
          <execution>
            <goals>
              <goal>compile</goal>
              <goal>testCompile</goal>
            </goals>
            <configuration>
              <args>
                <arg>-dependencyfile</arg>
                <arg>${project.build.directory}/.scala_dependencies</arg>
              </args>
            </configuration>
          </execution>
        </executions>
      </plugin>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-surefire-plugin</artifactId>
        <version>2.21.0</version>
        <configuration>
          <!-- Tests will be run with scalatest-maven-plugin instead -->
          <skipTests>true</skipTests>
        </configuration>
      </plugin>
      <plugin>
        <groupId>org.scalatest</groupId>
        <artifactId>scalatest-maven-plugin</artifactId>
        <version>2.0.0</version>
        <configuration>
          <reportsDirectory>${project.build.directory}/surefire-reports</reportsDirectory>
          <junitxml>.</junitxml>
          <filereports>TestSuiteReport.txt</filereports>
        </configuration>
        <executions>
          <execution>
            <id>test</id>
            <goals>
              <goal>test</goal>
            </goals>
          </execution>
        </executions>
      </plugin>
    </plugins>
  </build>
</project>

I already looked around for answers to this issue and saw that the Spark version I was using might be too recent and unsupported by the connector, so I downgraded it from 3.2.1 to 3.1.3. This, however, did not make the error vanish, and I don't understand the source of the problem. I checked for compatibility: mongo-spark-connector 10.0.1 is built for Spark 3.1.x and MongoDB 4.0 or later, though perhaps this means only 4.0.x?

I can access MongoDB from Scala using the URI shown above, and I followed the instructions given on the MongoDB website for version 10.0.1 of the connector.
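
For example, this kind of direct access runs without problems, using the Scala driver declared in the pom. This is only a rough sketch with placeholder names (MongoSmokeTest, the URI values, "mydb"), not my actual code:

import org.mongodb.scala._

import scala.concurrent.Await
import scala.concurrent.duration._

object MongoSmokeTest extends App {
  // Placeholder values; the real ones come from the Configuration class shown earlier.
  val connectionString = "mongodb://user:password@localhost:27017/?authSource=mydb"
  val client = MongoClient(connectionString)
  val database = client.getDatabase("mydb")

  // Listing the collection names is enough to confirm the connection and credentials work.
  val names = Await.result(database.listCollectionNames().toFuture(), 10.seconds)
  names.foreach(println)
  client.close()
}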

CodePudding user response:

After downgrading Spark to 3.0.3 and MongoDB to 4.0, and removing every MongoDB library from my pom.xml except mongo-spark-connector (i.e. dropping the explicit mongo-scala-driver, mongo-scala-bson, bson, mongodb-driver-reactivestreams, mongodb-driver-core and bson-record-codec dependencies), it now works most of the time, although it occasionally throws an error for code that otherwise runs fine.

The error was most likely a driver version clash rather than a problem in the code itself: the connector uses the MongoDB Java driver under the hood, and the driver artifacts declared explicitly in the pom put a different version of the same internal classes on the classpath than the one the connector was compiled against. At runtime, the aggregate method with the expected signature no longer exists, hence the NoSuchMethodError. Letting the connector pull in its own driver transitively removes the conflict.
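
Running mvn dependency:tree -Dverbose on the original pom helps make the conflicting driver artifacts visible. As a purely illustrative runtime check, you can also print which jar the internal class from the stack trace is actually loaded from:

// If the printed location is not the driver jar that mongo-spark-connector pulls in
// transitively, the classpath still contains a conflicting MongoDB driver version.
val probe = Class.forName("com.mongodb.internal.operation.SyncOperations")
println(probe.getProtectionDomain.getCodeSource.getLocation)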
