Reading multiple csv files into separate dataframes in Python

Time:01-08

I have read multiple answers, but none have worked in my case so far. I want to read multiple CSV files (which may not be in the same directory as my Python file) without specifying their names, as I may have to read thousands of such files. I want to do something like the last example in this, but I am not sure how to add my desktop path.

I tried the following, as given in the link:

# Assign path. The folder "Healthy" contains all the csv files
path, dirs, files = next(os.walk("/Users/my_name/Desktop/All hypnograms/Healthy"))
file_count = len(files)
# create empty list
dataframes_list = []
 
# append datasets to the list
for i in range(file_count):
    temp_df = pd.read_csv("./csv/" + files[i])
    dataframes_list.append(temp_df)

However, I got the following error: "FileNotFoundError: [Errno 2] No such file or directory:". I am using macOS. Can someone please help? Thank you!

CodePudding user response:

You can use pathlib to do that easily:

import pandas as pd
import pathlib

DATA_DIR = pathlib.Path.home() / 'Desktop' / 'All hypnograms' / 'Healthy' / 'csv'

dataframes_list = []
for csvfile in DATA_DIR.glob('**/*.csv'):
    temp_df = pd.read_csv(csvfile)
    dataframes_list.append(temp_df)
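
If you need to tell the dataframes apart later, one option is to key them by file name instead of appending to a list. This is only a rough sketch: the helper names csv_paths_by_name and read_csvs_by_name are made up for illustration, and the reader is passed in as an argument (pd.read_csv in practice). Note that two files with the same name in different subfolders would overwrite each other here.

```python
from pathlib import Path


def csv_paths_by_name(data_dir):
    """Map each CSV file's stem (name without '.csv') to its full path.

    Hypothetical helper: searches data_dir and all of its subfolders.
    If two files share a stem, the one that sorts later wins.
    """
    return {p.stem: p for p in sorted(Path(data_dir).glob('**/*.csv'))}


def read_csvs_by_name(data_dir, reader):
    """Apply `reader` (e.g. pandas.read_csv) to every CSV file found."""
    return {name: reader(path)
            for name, path in csv_paths_by_name(data_dir).items()}
```

With pandas imported as pd, this would be called as frames = read_csvs_by_name(DATA_DIR, pd.read_csv), and frames['some_file'] would then hold the dataframe read from some_file.csv.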

CodePudding user response:

I guess you should pass the whole path to read_csv by prepending the path variable to the concatenated string. Something like:

for i in range(file_count):
    temp_df = pd.read_csv(path + "/csv/" + files[i])
    dataframes_list.append(temp_df)

If your CSV files are directly in the Healthy directory, you can drop the "/csv/" part and use path + "/" + files[i] instead. Keep the "/" separator, because path does not end with one.

CodePudding user response:

In your example, path is the root of each file in files, so you can do

temp_df = pd.read_csv(os.path.join(path, files[i]))

But this isn't how it would usually be done. Suppose the directory doesn't exist or can't be read: os.walk then yields nothing, and next(os.walk("/Users/my_name/Desktop/All hypnograms/Healthy")) raises a StopIteration error that you don't handle. I think it would be more natural to use os.listdir, glob.glob or even pathlib.Path. Since pathlib keeps track of the root for you, a good choice is

from pathlib import Path
import pandas as pd

healthy = Path("/Users/my_name/Desktop/All hypnograms/Healthy")
# glob("*.csv") skips non-CSV entries such as macOS's .DS_Store
dataframes_list = [pd.read_csv(file) for file in healthy.glob("*.csv")
    if file.is_file()]
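
If you would rather keep os.walk, the StopIteration problem can be sidestepped by giving next() a default. As far as I know, os.walk silently yields nothing when the directory is missing or unreadable (that is when a bare next() raises StopIteration), while an existing but empty directory still yields one entry with empty lists. A minimal sketch, where list_files is a made-up helper name:

```python
import os


def list_files(directory):
    """Return the sorted names of the non-directory entries in `directory`.

    Hypothetical helper: returns an empty list when the directory is
    missing or unreadable instead of raising StopIteration.
    """
    # os.walk ignores OS errors by default, so a missing directory yields
    # nothing; the default passed to next() covers that case.
    _, _, files = next(os.walk(directory), (directory, [], []))
    return sorted(files)
```

Each name can then be combined with the directory using os.path.join(directory, name) before it is passed to pd.read_csv.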