How to Load data from CSV into separate Hadoop HDFS directories based on fields

Time:11-09

I have a CSV file and need to load its rows into separate HDFS directories based on a certain field (year). I plan to use Java. I have looked at using BufferedReader, but I am having trouble implementing it. Is that the right tool for this task, or is there a better way?
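The BufferedReader approach the question mentions can work: read the file line by line, extract the year field, and route each line to a writer for that year's directory. Below is a minimal sketch under a few assumptions — the year is in the first column (adjust `YEAR_COL`), the first line is a header, and output goes to a local filesystem path (for HDFS proper you would open the output streams through Hadoop's `FileSystem` API instead of `java.nio.file`). All class and path names here are illustrative.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.PrintWriter;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

public class CsvPartitioner {
    // Column index holding the year value; an assumption for this sketch.
    static final int YEAR_COL = 0;

    // Splits the input CSV into one file per distinct year,
    // under directories named year=<value> (matching Hive/Spark layout).
    public static void partition(Path input, Path outputDir) throws IOException {
        Map<String, PrintWriter> writers = new HashMap<>();
        try (BufferedReader reader = Files.newBufferedReader(input)) {
            String header = reader.readLine(); // assume first line is a header
            String line;
            while ((line = reader.readLine()) != null) {
                String year = line.split(",", -1)[YEAR_COL].trim();
                PrintWriter w = writers.get(year);
                if (w == null) {
                    // First time we see this year: create its directory and file.
                    Path dir = outputDir.resolve("year=" + year);
                    Files.createDirectories(dir);
                    w = new PrintWriter(Files.newBufferedWriter(dir.resolve("part-0.csv")));
                    if (header != null) w.println(header); // repeat header per partition
                    writers.put(year, w);
                }
                w.println(line);
            }
        } finally {
            for (PrintWriter w : writers.values()) w.close();
        }
    }
}
```

Note this naive `split(",")` does not handle quoted fields containing commas; a CSV library (e.g. OpenCSV) would be safer for real data.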

CodePudding user response:

Use Spark to read the CSV into a DataFrame, then call `partitionBy("year")` when writing to HDFS. Spark will create a subdirectory under the output path named `year=<value>` for each unique value of that column.
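Since the asker is working in Java, the Spark approach above might look like the following sketch. It assumes a Spark dependency on the classpath, a header row in the CSV, and hypothetical input/output paths — substitute your own.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class PartitionByYear {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("partition-csv-by-year")
                .getOrCreate();

        // Read the CSV with a header row, inferring column types.
        Dataset<Row> df = spark.read()
                .option("header", "true")
                .option("inferSchema", "true")
                .csv("hdfs:///data/input.csv");   // hypothetical input path

        // Write one subdirectory per distinct year:
        //   .../by_year/year=2020/, .../by_year/year=2021/, ...
        df.write()
                .partitionBy("year")
                .mode("overwrite")
                .csv("hdfs:///data/by_year");     // hypothetical output path

        spark.stop();
    }
}
```

One thing to be aware of: the partition column is encoded in the directory name and dropped from the data files themselves, which is the standard Hive-style layout and is reconstructed automatically when Spark reads the partitioned path back.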
