Flink Scala Missing Import

Time:11-15

In my Flink project I cannot find certain libraries for connectors (specifically I need to ingest a CSV once and read several TBs of parquet data in either batch or streaming mode). I think I have all the required packages, but I am still getting:

[ERROR] import org.apache.flink.connector.file.src.FileSource
[ERROR]                                   ^
[ERROR] C:\Users\alias\project\...\MyFlinkJob.scala:46: error: not found: type FileSource

My pom.xml is rather large, but I think I have the relevant dependencies:

        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-parquet</artifactId>
            <version>1.15.2</version>
        </dependency>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-connector-filesystem_${scala.binary.version}</artifactId>
            <version>1.11.6</version>
        </dependency>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-hadoop-bulk_2.12</artifactId>
            <version>1.14.6</version>
        </dependency>

I am using the following versions:

<scala.version>2.12.16</scala.version>
<scala.binary.version>2.12</scala.binary.version>
<log4j.version>2.17.1</log4j.version>
<flink.version>1.15.1</flink.version>

Do I need a different import path for Scala than Java?

I wish the Flink documentation included the imports in its example code snippets, since I spend a long time trying to figure them out. What are the recommended ._ imports?

I've looked through the symbols in the package but didn't find FileSystem. I also looked for tutorials and example projects showing how to read or listen to Parquet and CSV files with Flink. I made some progress this way, but the few Scala (not Java) examples I found that use Parquet files as a source had imports that still didn't work, even after I added their dependencies and ran mvn clean install.

CodePudding user response:

I used GitHub's advanced search to find a public Scala project that uses FileSource and eventually found one with the following dependency:

        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-connector-files</artifactId>
            <version>${project.version}</version>
        </dependency>

This package was missing from index.scala-lang.org, which is where I thought I should be looking for dependencies (this is my first Scala project, so I assumed it was the equivalent of PyPI in Python). It seems that MVN Repository may be a better place to look.

CodePudding user response:

Flink 1.15 has a Scala-free classpath, which has resulted in a number of Flink artifacts no longer having a Scala suffix. You can read all about it in the dedicated Flink blog on this topic: https://flink.apache.org/2022/02/22/scala-free.html

You can also see in that blog how you can use any Scala version with Flink instead of being limited to Scala 2.12.6. TL;DR: you should use the Java APIs in your application. The Scala APIs will also be deprecated as of Flink 1.17.

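Last but not least: don't mix and match Flink versions. That won't work. The POM in the question combines artifacts at 1.15.2, 1.11.6, and 1.14.6 with a flink.version of 1.15.1; every Flink dependency should use the same version.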
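
As a rough sketch of what aligned dependencies could look like for the POM in the question, assuming the flink.version property shown there (1.15.1) and that only the file source and the Parquet format are needed. The `${project.version}` in the snippet found on GitHub presumably resolves to that project's own version, so `${flink.version}` is substituted here as an assumption. Any additional Hadoop or Parquet dependencies the job may need at runtime are not covered.

```xml
<properties>
    <!-- Single source of truth for the Flink version (value taken from the question) -->
    <flink.version>1.15.1</flink.version>
</properties>

<dependencies>
    <!-- Provides org.apache.flink.connector.file.src.FileSource; no Scala suffix -->
    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-connector-files</artifactId>
        <version>${flink.version}</version>
    </dependency>
    <!-- Parquet format; same artifact name as in the question, now on the shared version -->
    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-parquet</artifactId>
        <version>${flink.version}</version>
    </dependency>
</dependencies>
```

With every Flink artifact referencing the same property, upgrading Flink later means changing one line instead of hunting for hard-coded versions.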
