I have a folder that contains
Sales_December.csv
Sales_January.csv
Sales_February.csv
etc.
How can I make PySpark read all of them into one DataFrame?
CodePudding user response:
- create an empty list
- read your CSV files one by one and append each resulting DataFrame to the list
- use reduce(DataFrame.unionAll, <list>) to combine them into a single DataFrame, as in the sketch below
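
A minimal sketch of those three steps, assuming the file names from the question sit in the working directory and that each file has a header row (the header/inferSchema options are assumptions, adjust them to your data):

```python
from functools import reduce

from pyspark.sql import DataFrame, SparkSession

spark = SparkSession.builder.getOrCreate()

# File names taken from the question; adjust the paths to your folder.
paths = [
    "Sales_December.csv",
    "Sales_January.csv",
    "Sales_February.csv",
]

# Read each CSV into its own DataFrame and collect them in a list.
dfs = [spark.read.csv(path, header=True, inferSchema=True) for path in paths]

# Fold the list into a single DataFrame by repeated unionAll.
sales = reduce(DataFrame.unionAll, dfs)
```

Note that unionAll matches columns by position, so every file must have the same columns in the same order; if the column names match but the order may differ, use DataFrame.unionByName instead.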