I have multiple csv files in a folder (each of which I've read into a pandas dataframe), and each file holds one month of website data:
January.csv:
URL Value
page1 10
page2 52
page3 17
February.csv:
URL Value
page1 20
page2 7
page3 15
March.csv:
URL Value
page1 7
page2 15
page3 23
and I need to combine them by pulling the Value column from each into a new dataframe (which will ultimately be exported to another csv):
URL January February March
page1 10 20 7
page2 52 7 15
page3 17 15 23
A new csv file will be added to the folder each month, so I need to keep it as dynamic as possible. I'm currently using
all_filenames = [i for i in glob.glob('*.{}'.format('csv'))]
to get the files, with the hope that I can then use something like pd.read_csv(f)['URL'] for f in all_filenames, but that may be totally the wrong approach.
Can anyone point me in the right direction?
Thanks
CodePudding user response:
While @robinood's answer is correct, I think it's more efficient to use a generator expression (in round brackets):
all_filenames = glob.glob('*.csv')
result = pd.concat(
    (pd.read_csv(f, index_col='URL', usecols=['URL', 'Value']) for f in all_filenames),
    axis=1,
)
Note that the above snippet also sets URL as the index, which makes concat align the rows by URL rather than by position, so files whose rows are sorted differently won't be misaligned.
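Neither snippet so far renames the repeated Value columns, so the combined frame will contain several columns all called Value rather than the January/February/March headers in the desired output. Below is a minimal sketch of one way to do that; it assumes each file is named after its month (January.csv, February.csv, ...), and combined.csv is just a placeholder output name:
import glob
import os
import pandas as pd

all_filenames = glob.glob('*.csv')
frames = (
    pd.read_csv(f, index_col='URL', usecols=['URL', 'Value'])
      # rename Value to the filename stem, e.g. 'January' for January.csv
      .rename(columns={'Value': os.path.splitext(os.path.basename(f))[0]})
    for f in all_filenames
)
result = pd.concat(frames, axis=1)
result.to_csv('combined.csv')  # placeholder output filename
Keep in mind that glob does not guarantee any particular file order, so if the columns must appear chronologically you will need to sort all_filenames by month yourself.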
CodePudding user response:
The start of your approach is good, with the glob and read_csv functions.
all_filenames = glob.glob('*.csv')
all_df = [pd.read_csv(f) for f in all_filenames]
Then, once you have the list containing your dataframes, you can concat them:
result = pd.concat(all_df, axis=1)
It is more efficient to read all the dataframes first and then call the concatenation function once than it is to concatenate each dataframe separately.
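One caveat with this version: without setting an index, pd.concat with axis=1 aligns rows purely by position and keeps each file's URL column, so the result contains one duplicated URL column per file. A small sketch building on the snippet above (using the same index_col idea as the other answer) sets URL as the index while reading and then exports the result; monthly_totals.csv is just an example name:
all_df = [pd.read_csv(f, index_col='URL', usecols=['URL', 'Value']) for f in all_filenames]
result = pd.concat(all_df, axis=1)
result.to_csv('monthly_totals.csv')  # example output filename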