Amazon EMR with Flink uses an old version of the Percentile class from commons-math3 causing a NoSuc

Time:08-04

I'm trying to run a Flink application in Amazon EMR. I'm using the latest versions, so EMR is at 6.7.0 and Flink is at 1.14.2.

I'm using Maven to build my application and its dependencies into a jar for EMR to run. When I run my Flink application as a step in EMR, I get this error:

Exception in thread "main" java.lang.NoSuchMethodError: org.apache.commons.math3.stat.descriptive.rank.Percentile.withNaNStrategy(Lorg/apache/commons/math3/stat/ranking/NaNStrategy;)Lorg/apache/commons/math3/stat/descriptive/rank/Percentile;
    at MyFlinkApp.main(MyFlinkApp.java:85)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:323)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:236)

I've added some logging to find out which version of commons-math3 (the library the Percentile class comes from) is actually being loaded. It's version 3.1.1, which, as the error says, doesn't include the withNaNStrategy method my code is trying to call.
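For reference, a minimal sketch of the kind of check that shows which jar a class was loaded from. The class name WhichJar and the helper locationOf are made up for illustration; in the real application the argument would be Percentile.class rather than the helper class itself.

```java
import java.security.CodeSource;

public class WhichJar {
    /** Returns the jar or directory a class was loaded from, or a note if the JVM exposes none. */
    public static String locationOf(Class<?> type) {
        CodeSource source = type.getProtectionDomain().getCodeSource();
        if (source == null || source.getLocation() == null) {
            // Classes loaded by the bootstrap loader usually have no code source.
            return "(no code source available)";
        }
        return source.getLocation().toString();
    }

    public static void main(String[] args) {
        // Hypothetical usage: replace WhichJar.class with Percentile.class in the Flink job.
        System.out.println(locationOf(WhichJar.class));
    }
}
```

On the cluster, a line like this logged before line 85 of MyFlinkApp should print the path of whichever commons-math3 jar wins.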

My assumption was that I could run with a newer version of commons-math3 that does have that method. In my pom.xml I specify commons-math3 version 3.6.1, and I can verify that it gets built into my jar:

        <dependency>
            <groupId>org.apache.commons</groupId>
            <artifactId>commons-math3</artifactId>
            <version>3.6.1</version>
        </dependency>

I still get the same error, and I can see that it's still using version 3.1.1 of that library. Where is this coming from? Is it from EMR itself? How can I get around this?

I can provide my full list of dependencies from my pom if that would be helpful.

CodePudding user response:

Unfortunately, you've hit a very common issue running applications on Apache Flink or Apache Spark (on a cluster).

There are two different classpaths involved:

  • The system classpath on the cluster itself, with all Flink dependencies. That classpath contains commons-math3:3.1.1 as a transitive dependency of hadoop-common.
  • Your application classpath, with commons-math3:3.6.1.

When you run a Flink application on a cluster, your application classpath is simply appended to the existing system classpath, and whichever copy of a class comes first wins. No further dependency resolution takes place to decide which version should actually be used. Also, the application classpath typically isn't available yet when the executors are started ...
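To make the "first wins" behaviour concrete, here is a toy simulation, not Flink or Hadoop code. The classpath is modelled as an ordered map from jar name to the classes it contains; the jar names and the class FirstWinsClasspath are invented for illustration.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

public class FirstWinsClasspath {
    /** Returns the first entry (in insertion order) that contains the class, or null if none does. */
    public static String resolve(Map<String, Set<String>> classpath, String className) {
        for (Map.Entry<String, Set<String>> entry : classpath.entrySet()) {
            if (entry.getValue().contains(className)) {
                return entry.getKey();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String percentile = "org.apache.commons.math3.stat.descriptive.rank.Percentile";
        // LinkedHashMap keeps insertion order, standing in for classpath order.
        Map<String, Set<String>> classpath = new LinkedHashMap<>();
        // Hypothetical jar names: the cluster's copy comes first, the application jar is appended.
        classpath.put("commons-math3-3.1.1.jar (cluster)", Collections.singleton(percentile));
        classpath.put("my-flink-app.jar (bundles 3.6.1)",
                new HashSet<>(Arrays.asList(percentile, "MyFlinkApp")));

        System.out.println(resolve(classpath, percentile));   // commons-math3-3.1.1.jar (cluster)
        System.out.println(resolve(classpath, "MyFlinkApp")); // my-flink-app.jar (bundles 3.6.1)
    }
}
```

The application's own classes resolve from the application jar, but any class that also exists earlier on the classpath is taken from there, regardless of which version the pom declares.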

In local mode everything will typically just work, in most cases at least, because all dependencies are resolved into a single classpath.

The solution is to shade (that is, relocate the packages of) the conflicting libraries using the Maven Shade Plugin. With this plugin you can build an uber jar that contains the conflicting classes under a different package name, so that they can be used independently of what's already present on the cluster.

<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <version>3.3.0</version>
  <executions>
    <execution>
      <phase>package</phase>
      <goals>
        <goal>shade</goal>
      </goals>
      <configuration>
        <shadedArtifactAttached>true</shadedArtifactAttached>
        <shadedClassifierName>allinone</shadedClassifierName>
        <artifactSet>
          <includes>
            <include>*:*</include>
          </includes>
        </artifactSet>
        <relocations>
          <relocation>
            <pattern>org.apache.commons.math3</pattern>
            <shadedPattern>org.apache.commons.math3_1_1</shadedPattern>
          </relocation>
        </relocations>
      </configuration>
    </execution>
  </executions>
</plugin>
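One way to sanity-check the result is to count the jar entries under the relocated package. The sketch below is only an illustration: the class RelocationCheck and its command-line arguments are invented here, and the jar path is whatever your build produces. Note that with shadedArtifactAttached and shadedClassifierName set as above, the shaded jar is attached as a separate artifact with the allinone classifier, so that is the jar to inspect and submit.

```java
import java.io.IOException;
import java.util.jar.JarFile;

public class RelocationCheck {
    /** Counts the entries in the jar whose path starts with the given prefix. */
    public static long countEntries(String jarPath, String prefix) throws IOException {
        try (JarFile jar = new JarFile(jarPath)) {
            return jar.stream()
                    .filter(entry -> entry.getName().startsWith(prefix))
                    .count();
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 1) {
            System.err.println("usage: RelocationCheck <path-to-shaded-jar>");
            return;
        }
        // Package paths derived from the <pattern> and <shadedPattern> in the plugin config.
        long relocated = countEntries(args[0], "org/apache/commons/math3_1_1/");
        long original = countEntries(args[0], "org/apache/commons/math3/");
        System.out.println("relocated entries: " + relocated);
        System.out.println("unrelocated entries: " + original);
    }
}
```

If the relocation worked, the first count should be positive and the second should be zero; the application code itself does not need to change, because the plugin rewrites the references in the compiled classes.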