I have an Apache Beam project that works fine when I run it directly. But when I build it with mvn clean package, Maven creates an uber jar using the Maven Shade plugin. When I then pass this uber jar to spark-submit and run it, the job throws an exception:
Exception in thread "main" java.lang.NoSuchMethodError: com.amazonaws.util.StringUtils.trim(Ljava/lang/String;)Ljava/lang/String;
at com.amazonaws.auth.profile.internal.AwsProfileNameLoader.getEnvProfileName(AwsProfileNameLoader.java:72)
at com.amazonaws.auth.profile.internal.AwsProfileNameLoader.loadProfileName(AwsProfileNameLoader.java:54)
at com.amazonaws.regions.AwsProfileRegionProvider.<init>(AwsProfileRegionProvider.java:40)
at com.amazonaws.regions.DefaultAwsRegionProviderChain.<init>(DefaultAwsRegionProviderChain.java:23)
at com.amazonaws.client.builder.AwsClientBuilder.<clinit>(AwsClientBuilder.java:60)
at org.apache.beam.sdk.io.aws.s3.DefaultS3ClientBuilderFactory.createBuilder(DefaultS3ClientBuilderFactory.java:39)
at org.apache.beam.sdk.io.aws.s3.S3FileSystemConfiguration.getBuilder(S3FileSystemConfiguration.java:102)
at org.apache.beam.sdk.io.aws.s3.S3FileSystemConfiguration.fromS3Options(S3FileSystemConfiguration.java:94)
at org.apache.beam.sdk.io.aws.s3.DefaultS3FileSystemSchemeRegistrar.fromOptions(DefaultS3FileSystemSchemeRegistrar.java:39)
at org.apache.beam.sdk.io.aws.s3.S3FileSystemRegistrar.lambda$fromOptions$0(S3FileSystemRegistrar.java:52)
at java.util.stream.ReferencePipeline$7$1.accept(ReferencePipeline.java:269)
at java.util.Iterator.forEachRemaining(Iterator.java:116)
at java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482)
at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472)
at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:566)
at org.apache.beam.sdk.io.aws.s3.S3FileSystemRegistrar.fromOptions(S3FileSystemRegistrar.java:55)
at org.apache.beam.sdk.io.FileSystems.verifySchemesAreUnique(FileSystems.java:550)
at org.apache.beam.sdk.io.FileSystems.setDefaultPipelineOptions(FileSystems.java:540)
at org.apache.beam.sdk.PipelineRunner.fromOptions(PipelineRunner.java:47)
at org.apache.beam.sdk.Pipeline.create(Pipeline.java:155)
at org.propellyr.Main.main(Main.java:128)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:845)
at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:920)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:929)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Here is my pom.xml file:
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>org.propellyr</groupId>
<artifactId>beam-etl</artifactId>
<version>1.0-SNAPSHOT</version>
<properties>
<maven.compiler.source>8</maven.compiler.source>
<maven.compiler.target>8</maven.compiler.target>
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
<project.reporting.outputEncoding>UTF-8</project.reporting.outputEncoding>
<beam.version>2.38.0</beam.version>
<bigquery.version>v2-rev20211129-1.32.1</bigquery.version>
<google-api-client.version>1.32.1</google-api-client.version>
<guava.version>31.0.1-jre</guava.version>
<hamcrest.version>2.1</hamcrest.version>
<jackson.version>2.13.0</jackson.version>
<joda.version>2.10.10</joda.version>
<libraries-bom.version>24.4.0</libraries-bom.version>
<maven-compiler-plugin.version>3.7.0</maven-compiler-plugin.version>
<maven-exec-plugin.version>1.6.0</maven-exec-plugin.version>
<maven-jar-plugin.version>3.0.2</maven-jar-plugin.version>
<maven-shade-plugin.version>3.1.0</maven-shade-plugin.version>
<mockito.version>3.7.7</mockito.version>
<pubsub.version>v1-rev20211130-1.32.1</pubsub.version>
<spark.version>2.4.6</spark.version>
<hadoop.version>2.10.1</hadoop.version>
<maven-surefire-plugin.version>3.0.0-M5</maven-surefire-plugin.version>
<nemo.version>0.1</nemo.version>
<flink.artifact.name>beam-runners-flink-1.14</flink.artifact.name>
</properties>
<repositories>
<repository>
<id>apache.snapshots</id>
<name>Apache Development Snapshot Repository</name>
<url>https://repository.apache.org/content/repositories/snapshots/</url>
<releases>
<enabled>false</enabled>
</releases>
<snapshots>
<enabled>true</enabled>
</snapshots>
</repository>
</repositories>
<build>
<plugins>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-compiler-plugin</artifactId>
<version>${maven-compiler-plugin.version}</version>
<configuration>
<source>8</source>
<target>8</target>
</configuration>
</plugin>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-surefire-plugin</artifactId>
<version>${maven-surefire-plugin.version}</version>
<configuration>
<parallel>all</parallel>
<threadCount>4</threadCount>
<redirectTestOutputToFile>true</redirectTestOutputToFile>
</configuration>
</plugin>
<!-- Ensure that the Maven jar plugin runs before the Maven
shade plugin by listing the plugin higher within the file. -->
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-jar-plugin</artifactId>
<version>${maven-jar-plugin.version}</version>
</plugin>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-shade-plugin</artifactId>
<version>${maven-shade-plugin.version}</version>
<configuration>
<artifactSet>
<excludes>
<exclude>META-INF.versions.11.module-info</exclude>
</excludes>
</artifactSet>
<createDependencyReducedPom>false</createDependencyReducedPom>
<filters>
<filter>
<artifact>*:*</artifact>
<excludes>
<exclude>module-info.class</exclude>
<exclude>META-INF/*.SF</exclude>
<exclude>META-INF/*.DSA</exclude>
<exclude>META-INF/*.RSA</exclude>
</excludes>
</filter>
</filters>
</configuration>
<executions>
<execution>
<phase>package</phase>
<goals>
<goal>shade</goal>
</goals>
<configuration>
<shadedArtifactAttached>true</shadedArtifactAttached>
<shadedClassifierName>shaded</shadedClassifierName>
<transformers>
<transformer
implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>
</transformers>
</configuration>
</execution>
</executions>
</plugin>
</plugins>
<pluginManagement>
<plugins>
<plugin>
<groupId>org.codehaus.mojo</groupId>
<artifactId>exec-maven-plugin</artifactId>
<version>${maven-exec-plugin.version}</version>
<configuration>
<cleanupDaemonThreads>false</cleanupDaemonThreads>
</configuration>
</plugin>
</plugins>
</pluginManagement>
</build>
<profiles>
<profile>
<id>direct-runner</id>
<activation>
<activeByDefault>false</activeByDefault>
</activation>
<!-- Makes the DirectRunner available when running a pipeline. -->
<dependencies>
<dependency>
<groupId>org.apache.beam</groupId>
<artifactId>beam-runners-direct-java</artifactId>
<version>${beam.version}</version>
<scope>runtime</scope>
</dependency>
</dependencies>
</profile>
<!-- <profile>-->
<!-- <id>flink-runner</id>-->
<!-- <!– Makes the FlinkRunner available when running a pipeline. –>-->
<!-- <dependencies>-->
<!-- <dependency>-->
<!-- <groupId>org.apache.beam</groupId>-->
<!-- <!– Please see the Flink Runner page for an up-to-date list-->
<!-- of supported Flink versions and their artifact names:-->
<!-- https://beam.apache.org/documentation/runners/flink/ –>-->
<!-- <artifactId>${flink.artifact.name}</artifactId>-->
<!-- <version>${beam.version}</version>-->
<!-- <scope>runtime</scope>-->
<!-- </dependency>-->
<!-- </dependencies>-->
<!-- </profile>-->
<profile>
<id>spark-runner</id>
<activation>
<activeByDefault>true</activeByDefault>
</activation>
<!-- Makes the SparkRunner available when running a pipeline. Additionally,
overrides some Spark dependencies to Beam-compatible versions. -->
<properties>
<netty.version>4.1.17.Final</netty.version>
</properties>
<dependencies>
<dependency>
<groupId>org.apache.beam</groupId>
<artifactId>beam-runners-spark</artifactId>
<version>${beam.version}</version>
<scope>runtime</scope>
<exclusions>
<exclusion>
<groupId>org.slf4j</groupId>
<artifactId>jul-to-slf4j</artifactId>
</exclusion>
</exclusions>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_2.11</artifactId>
<version>${spark.version}</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-hive_2.12</artifactId>
<version>${spark.version}</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-avro_2.13</artifactId>
<version>3.2.1</version>
</dependency>
<dependency>
<groupId>org.apache.avro</groupId>
<artifactId>avro</artifactId>
<version>1.11.0</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-streaming_2.11</artifactId>
<version>${spark.version}</version>
</dependency>
</dependencies>
</profile>
</profiles>
<dependencies>
<dependency>
<groupId>org.apache.beam</groupId>
<artifactId>beam-sdks-java-core</artifactId>
<version>${beam.version}</version>
<!-- <exclusions>-->
<!-- <exclusion>-->
<!-- <groupId>org.apache.parquet</groupId>-->
<!-- <artifactId>parquet-column</artifactId>-->
<!-- </exclusion>-->
<!-- </exclusions>-->
</dependency>
<dependency>
<groupId>org.apache.beam</groupId>
<artifactId>beam-runners-direct-java</artifactId>
<version>${beam.version}</version>
</dependency>
<dependency>
<groupId>org.apache.beam</groupId>
<artifactId>beam-sdks-java-io-parquet</artifactId>
<version>2.38.0</version>
<exclusions>
<exclusion>
<groupId>commons-beanutils</groupId>
<artifactId>commons-beanutils-core</artifactId>
</exclusion>
<exclusion>
<groupId>commons-beanutils</groupId>
<artifactId>commons-beanutils</artifactId>
</exclusion>
<exclusion>
<groupId>com.google.guava</groupId>
<artifactId>guava</artifactId>
</exclusion>
<!-- <exclusion>-->
<!-- <groupId>org.apache.parquet</groupId>-->
<!-- <artifactId>parquet-avro</artifactId>-->
<!-- </exclusion>-->
<!-- <exclusion>-->
<!-- <groupId>org.apache.parquet</groupId>-->
<!-- <artifactId>parquet-hadoop</artifactId>-->
<!-- </exclusion>-->
</exclusions>
</dependency>
<!-- https://mvnrepository.com/artifact/org.slf4j/slf4j-reload4j -->
<dependency>
<groupId>org.slf4j</groupId>
<artifactId>slf4j-reload4j</artifactId>
<version>1.7.36</version>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.apache.parquet</groupId>
<artifactId>parquet-hadoop</artifactId>
<version>1.8.1</version>
</dependency>
<dependency>
<groupId>tech.allegro.schema.json2avro</groupId>
<artifactId>converter</artifactId>
<version>0.2.13</version>
</dependency>
<dependency>
<groupId>org.apache.beam</groupId>
<artifactId>beam-sdks-java-io-amazon-web-services</artifactId>
<scope>compile</scope>
<version>2.38.0</version>
</dependency>
<!-- https://mvnrepository.com/artifact/com.google.code.gson/gson -->
<dependency>
<groupId>com.google.code.gson</groupId>
<artifactId>gson</artifactId>
<version>2.9.0</version>
</dependency>
<!-- https://mvnrepository.com/artifact/com.fasterxml.jackson.core/jackson-core -->
<dependency>
<groupId>com.fasterxml.jackson.core</groupId>
<artifactId>jackson-core</artifactId>
<version>2.13.2</version>
</dependency>
</dependencies>
<dependencyManagement>
<dependencies>
<dependency>
<groupId>com.google.guava</groupId>
<artifactId>guava</artifactId>
<version>${guava.version}</version> <!-- "-jre" for Java 8 or higher -->
</dependency>
<!-- GCP libraries BOM sets the version for google http client -->
<dependency>
<groupId>com.google.cloud</groupId>
<artifactId>libraries-bom</artifactId>
<version>${libraries-bom.version}</version>
<type>pom</type>
<scope>import</scope>
</dependency>
</dependencies>
</dependencyManagement>
</project>
I want to know how to solve this type of dependency resolution error. I tried using Maven's exclude and include tags to fix it, and I looked for overlapping dependencies with mvn dependency:tree, but nothing worked; I am still getting the same error.
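For reference, this is roughly how I filtered the dependency tree down to the AWS SDK artifacts (the com.amazonaws group id is taken from the stack trace):

mvn dependency:tree -Dincludes=com.amazonaws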
Answer:
As mentioned by @Gregoire, the issue is caused by a jar that is already present in the Spark environment: classes from Spark's own jars are always loaded before classes from a shaded jar, so at runtime the pipeline picks up the older com.amazonaws.util.StringUtils that ships with the cluster (which lacks the trim(String) method) instead of the version bundled in your uber jar. You can check which jars were actually loaded from the Spark UI.
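One quick thing to try is telling Spark to prefer the classes in your jar over its own. This is only a sketch, and the two userClassPathFirst settings are marked experimental in Spark, so they can introduce other conflicts (the main class and jar name below are inferred from the stack trace and the pom.xml):

spark-submit \
  --conf spark.driver.userClassPathFirst=true \
  --conf spark.executor.userClassPathFirst=true \
  --class org.propellyr.Main \
  target/beam-etl-1.0-SNAPSHOT-shaded.jar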
Spark uber JAR files let you package a project into a single file so it can be run on a Spark cluster. Shading is an extension of the uber JAR idea: besides bundling all dependencies, it can rename (relocate) their packages, and it is typically used when the JAR is a library that has to run inside another application or library without clashing with that host's own dependencies.
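That is exactly what helps here: relocating the com.amazonaws packages inside the uber jar means they can no longer collide with the copies on the cluster. A minimal sketch, added to the existing maven-shade-plugin <configuration> (the repackaged.com.amazonaws prefix is an arbitrary name chosen for this example):

<relocations>
  <relocation>
    <!-- Rewrite every reference to the AWS SDK inside the uber jar -->
    <pattern>com.amazonaws</pattern>
    <shadedPattern>repackaged.com.amazonaws</shadedPattern>
  </relocation>
</relocations>

The shade plugin rewrites the references in every class it bundles, so Beam's S3 filesystem will call the relocated copy while Spark keeps using its own. Be aware that relocation can break code that loads AWS classes reflectively by their original names, so test the pipeline end to end; you can verify the relocation took effect with jar tf target/beam-etl-1.0-SNAPSHOT-shaded.jar | grep amazonaws.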