Home > Software engineering >  Combining multiple CSVs in pandas
Combining multiple CSVs in pandas

Time:11-03

I have multiple csv files (which I've moved into pandas dataframes) in a folder, each of which holds monthly website data

January.csv:
URL    Value
page1   10
page2   52
page3   17

February.csv:
URL     Value
page1   20
page2   7
page3   15

March.csv:
URL     Value
page1   7
page2   15
page3   23

and need to combine them by copying the Value column from each to make a new dataframe (which will ultimately be exported to another csv)

URL     January   February   March
page1    10        20          7
page2    52         7         15
page3    17        15         23

A new csv file will be added to the folder each month, so I need to keep it as dynamic as possible. I'm currently using all_filenames = [i for i in glob.glob('*.{}'.format('csv'))] to get the files with the hope that I can then use something like pd.read_csv(f)['URL'] for f in all_filenames, but that may be totally the wrong approach?

Can anyone point me in the right direction?

Thanks

CodePudding user response:

While @robinood's answer is correct, I think it's more efficient to use generator expression (in round brackets):

all_filenames = [i for i in glob.glob('*.{}'.format('csv'))]
result = pd.concat((pd.read_csv(f, index_col='URL', usecols=['URL', 'Value']) for f in all_filenames), axis=1)

Note that the above snippet also sets URL as index to avoid potential concat errors due to different sorting of the rows.

CodePudding user response:

The start of you approach is good, with the glob and read_csv functions.

all_filenames = [i for i in glob.glob('*.{}'.format('csv'))]
all_df = [pd.read_csv(f) for f in all_filenames]

Then when you have the list containing your dfs, you can concat them.

result = pd.concat(all_df, axis=1)

It is more efficient to read all the dataframes first, and then use once the concatenation function than concatenating each df separately.

  • Related