I am trying to concatenate about 30 CSV files into one file. The CSV files are located in different folders.
However, I get an error when appending the files together: OSError: Initializing from file failed
Here is my code:
import os
import glob
import pandas as pd
path = 'xxx'
target_folders = ['Apples', 'Oranges', 'Bananas', 'Raspberry', 'Strawberry', 'Blackberry', 'Gooseberry', 'Liche']
output = 'yyy'
path_list = []
for idx in target_folders:
    lst_of_files = glob.glob(path + idx + '\\*.csv')
    latest_files = max(lst_of_files, key=os.path.getmtime)
    path_list.append(latest_files)
df_list = []
for file in path_list:
    df = pd.read_csv(file)
    df_list.append(df)
final_df = df.append(df for df in df_list)
combined_csv = pd.concat([pd.read_csv(f) for f in latest_files])
combined_csv.to_csv(output + "combined_csv.csv", index=False)
OSError Traceback (most recent call last)
<ipython-input-126-677d09511b64> in <module>
1 df_list = []
2 for file in latest_files:
----> 3 df = pd.read_csv(file)
4 df_list.append(df)
5 final_df = df.append(df for df in df_list)
OSError: Initializing from file failed
CodePudding user response:
Try to simplify your code:
import pandas as pd
import pathlib
data_dir = 'xxx'
out_dir = 'yyy'
data = []
# Recursively collect every CSV below data_dir
for filename in pathlib.Path(data_dir).glob('**/*.csv'):
    df = pd.read_csv(filename)
    data.append(df)
# Concatenate the list of frames, not the last frame read
df = pd.concat(data, ignore_index=True)
df.to_csv(pathlib.Path(out_dir) / 'combined_csv.csv', index=False)
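Note that the recursive glob reads every CSV under data_dir, whereas the question picks only the most recently modified file in each folder. Also, in the question latest_files is a single path string (the result of max), so looping over it yields individual characters rather than file paths, which may be contributing to the error. The following is a minimal sketch of the "newest file per folder" variant; the function names newest_csv and combine_newest are hypothetical and not part of pandas.

```python
import os
from pathlib import Path

import pandas as pd


def newest_csv(folder):
    """Return the most recently modified CSV in folder, or None if there is none."""
    candidates = list(Path(folder).glob("*.csv"))
    if not candidates:
        return None
    return max(candidates, key=os.path.getmtime)


def combine_newest(base_dir, folder_names):
    """Concatenate the newest CSV from each named subfolder of base_dir."""
    frames = []
    for name in folder_names:
        newest = newest_csv(Path(base_dir) / name)
        if newest is not None:  # skip folders that contain no CSV files
            frames.append(pd.read_csv(newest))
    # pd.concat raises ValueError if frames is empty
    return pd.concat(frames, ignore_index=True)
```

With the question's variables this would be called as combine_newest(path, target_folders), and the result written out with to_csv as before.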
CodePudding user response:
Without seeing your CSV files it's hard to be sure, but I've come across this problem before with unusually formatted CSVs. The CSV parser may be having difficulty determining the structure of the files (separators, etc.).
Try df = pd.read_csv(file, engine='python')
From the docs: "The C engine is faster while the python engine is currently more feature-complete."
Pass the engine='python' argument when reading a single CSV file first and see whether the read succeeds. That way you can narrow the problem down to either reading the files or traversing them.
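One way to do that narrowing is sketched below, assuming you already have a list of candidate paths. It tries the default engine, falls back to the python engine, and records any file that neither can read. The helper names read_with_fallback and find_unreadable are hypothetical.

```python
import pandas as pd


def read_with_fallback(path):
    """Try the default C engine first, then fall back to the python engine."""
    try:
        return pd.read_csv(path)
    except (OSError, pd.errors.ParserError):
        return pd.read_csv(path, engine="python")


def find_unreadable(paths):
    """Return (path, error) pairs for files that neither engine can read."""
    failures = []
    for path in paths:
        try:
            read_with_fallback(path)
        except Exception as exc:  # broad on purpose: this is only a diagnostic
            failures.append((path, exc))
    return failures
```

If find_unreadable(path_list) reports entries that are not real file paths, the problem is in how the list is built rather than in the CSV contents.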