How to concatenate a list of CSV files (including empty ones) using Pandas

I have a list of .csv files stored in a local folder and I'm trying to concatenate them into one single dataframe.

Here is the code I'm using:

import pandas as pd
import os

folder = r'C:\Users\_M92\Desktop\myFolder'

df = pd.concat([pd.read_csv(os.path.join(folder, f), delimiter=';') for f in os.listdir(folder)])
display(df)

The only problem is that one of the files is sometimes empty (0 columns, 0 rows). In that case, pandas raises "EmptyDataError: No columns to parse from file" on line 6 (the pd.concat call).

Do you have any suggestions for how to skip the empty CSV file?
I would also welcome a simpler or more efficient way to concatenate the CSV files.

Ideally, I would also like to add a column to the dataframe df that holds the name of the .csv file each row came from.

CodePudding user response：

You can check if a file is empty with:

import os

os.stat(FILE_PATH).st_size == 0

In your use case:

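import os
import pandas as pd

folder = r'C:\Users\_M92\Desktop\myFolder'

# Read only the files whose size on disk is non-zero, then concatenate them.
df = pd.concat([
    pd.read_csv(os.path.join(folder, f), delimiter=';')
    for f in os.listdir(folder)
    if os.stat(os.path.join(folder, f)).st_size != 0
])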
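
A variant of the snippet above that also adds the file-name column asked for in the question might look like the sketch below. The helper name `concat_non_empty` and the column name `origin` are made up for illustration, and the code assumes every non-empty file uses `;` as its delimiter. Keep in mind that the size check only skips files of exactly 0 bytes; a file that contains nothing but blank lines has a non-zero size and would still be passed to `read_csv`.

```python
import os

import pandas as pd


def concat_non_empty(folder, delimiter=';'):
    """Concatenate every non-empty .csv file in `folder` (hypothetical helper)."""
    frames = []
    for name in sorted(os.listdir(folder)):
        path = os.path.join(folder, name)
        # Skip anything that is not a .csv file, and files of exactly 0 bytes.
        if not name.lower().endswith('.csv') or os.stat(path).st_size == 0:
            continue
        # assign() returns a copy with an extra column holding the file name.
        frames.append(pd.read_csv(path, delimiter=delimiter).assign(origin=name))
    # pd.concat raises ValueError on an empty list, so fall back to an empty frame.
    return pd.concat(frames, ignore_index=True) if frames else pd.DataFrame()
```

Calling `concat_non_empty(folder)` with the folder from the question would then return a single dataframe with one extra `origin` column.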

CodePudding user response：

Personally, I would filter out the empty files first with a basic try-except, then concatenate the rest.

import pandas as pd
import os

folder = r'C:\Users\_M92\Desktop\myFolder'
data = []

for f in os.listdir(folder):
    try:
        temp = pd.read_csv(os.path.join(folder, f), delimiter=';')
        # adding original filename column as per request
        temp['origin'] = f
        data.append(temp)
    except pd.errors.EmptyDataError:
        continue

df = pd.concat(data)

display(df)
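
The same try-except idea can be wrapped in a small reusable function. This is only a sketch under a few assumptions: the name `read_folder`, its parameters, and the returned list of skipped files are all hypothetical, and `pathlib` is used here instead of `os.listdir` so that only files matching `*.csv` are read. It also guards against the case where every file is empty, because `pd.concat` raises a `ValueError` when given an empty list.

```python
from pathlib import Path

import pandas as pd


def read_folder(folder, pattern='*.csv', delimiter=';'):
    """Read all files matching `pattern` in `folder`, skipping empty ones.

    Returns the combined dataframe and the names of the skipped files.
    """
    frames = []
    skipped = []
    for path in sorted(Path(folder).glob(pattern)):
        try:
            frame = pd.read_csv(path, delimiter=delimiter)
        except pd.errors.EmptyDataError:
            # Raised by pandas when the file has no columns to parse.
            skipped.append(path.name)
            continue
        # Record which file each row came from.
        frame['origin'] = path.name
        frames.append(frame)
    if not frames:
        # Nothing to concatenate: return an empty frame instead of raising.
        return pd.DataFrame(), skipped
    return pd.concat(frames, ignore_index=True), skipped
```

For example, `df, skipped = read_folder(folder)` gives the combined dataframe plus a list of the files that were ignored, which makes it easy to log or inspect them afterwards. Passing `ignore_index=True` renumbers the rows from 0 rather than repeating each file's own index.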