In my Flink project I cannot find certain libraries for connectors (specifically I need to ingest a CSV once and read several TBs of parquet data in either batch or streaming mode). I think I have all the required packages, but I am still getting:
[ERROR] import org.apache.flink.connector.file.src.FileSource
[ERROR] ^
[ERROR] C:\Users\alias\project\...\MyFlinkJob.scala:46: error: not found: type FileSource
My POM.xml is rather large, but I think I have the relevant imports:
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-parquet</artifactId>
<version>1.15.2</version>
</dependency>
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-connector-filesystem_${scala.binary.version}</artifactId>
<version>1.11.6</version>
</dependency>
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-hadoop-bulk_2.12</artifactId>
<version>1.14.6</version>
</dependency>
I am using the following versions:
<scala.version>2.12.16</scala.version>
<scala.binary.version>2.12</scala.binary.version>
<log4j.version>2.17.1</log4j.version>
<flink.version>1.15.1</flink.version>
Do I need a different import path for Scala than Java?
I wish the Flink documentation had the imports in example code snippets as I spend a long time trying to figure out the imports. What are recommended ._
imports?
I've looked through the symbols in the package but didn't find FileSystem. I looked for different tutorials and example projects showing how to read/listen-to parquet and CSV files with Flink. I made some progress this way, but of the few examples I found in Scala (not Java) for using Parquet files as a source the imports still didn't work even after adding their dependencies and running mvn clean install
.
CodePudding user response:
I tried using GitHub's advance search to find a public Scala project using FileSource and eventually found one with the following dependency:
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-connector-files</artifactId>
<version>${project.version}</version>
</dependency>
This package was missing on index.scala-lang.org
where I thought I should be looking for dependencies (this is my first Scala project so I thought that was the place to find packages like PyPi in Python). It seems that MVN Repository may be a better place to look.
CodePudding user response:
Flink 1.15 has a Scala-free classpath, which has resulted in a number of Flink artifacts no longer having a Scala suffix. You can read all about it in the dedicated Flink blog on this topic: https://flink.apache.org/2022/02/22/scala-free.html
You can also see in that blog how you can use any Scala version with Flink instead of being limited to Scala 2.12.6. TL;DR: you should use the Java APIs in your application. The Scala APIs will also be deprecated as of Flink 1.17.
Last but not least: don't mix & match Flink version. That won't work.