Home > Blockchain >  pandas: Use truncated filename as header for column in new dataframe from multiple csv files, read s
pandas: Use truncated filename as header for column in new dataframe from multiple csv files, read s

Time:03-15

I read multiple questions similar to this one but not specifically addressing this use case.

I have multiple ticker.csv files in a folder such as:

ZZZ.TO.csv containing:

    Date        Open        High        Low         Close       Volume
0   2017-03-14  28.347332   28.347332   27.871055   28.267952   22400
1   2017-03-15  28.320875   28.400254   27.959257   28.188574   39200
2   2017-03-16  28.179758   28.797155   28.126837   28.708954   51600
3   2017-03-17  28.576658   28.691315   28.091559   28.550196   57400

I would like to create a dataframe containing all 'Date' and 'Close' data from each file. Set 'Date' as the index and have each ticker as the column header in the final dataframe.

So the final dataframe would look like this:

Date        FOO.TO      ZOMD.V      ZEN.V       TICKER.BAR
2017-03-14  28.347332   28.347332   27.871055   28.267952
2017-03-15  28.320875   28.400254   27.959257   28.188574
2017-03-16  28.179758   28.797155   28.126837   28.708954
2017-03-17  28.576658   28.691315   28.091559   28.550196

This is what I tried:

import pandas as pd
import glob

path = r'/path_where_files_are/'
all_files = glob.glob(path   "/*.csv")
all_files.sort()

fields = ['Date','Close']
list = []

for filename in all_files:
df = pd.read_csv(filename, header=0, usecols=fields)
df.set_index(['Date'], inplace=True)
list.append(df)

frame = pd.concat(list, axis=0)

but it produces:

Date        Close   
2017-03-14  0.050000
2017-09-21  0.040000
2017-09-22  0.040000
2017-10-13  0.100000
2017-10-16  0.110000

Any help is welcome. Cheers.

CodePudding user response:

You can try:

import pandas as pd
import pathlib

path = pathlib.Path(r'./data2')

data = {}
for filename in sorted(path.glob('*.csv')):
    data[filename.stem] = pd.read_csv(filename, index_col='Date', 
                                      usecols=['Date', 'Close'],
                                      parse_dates=['Date']).squeeze()
df = pd.concat(data, axis=1)

Output:

>>> df
                ZEN.V     ZZZ.TO
Date                            
2017-03-14  28.267952  28.267952
2017-03-15  28.188574  28.188574
2017-03-16  28.708954  28.708954
2017-03-17  28.550196  28.550196

CodePudding user response:

Few things that can help you:

  • You want to concatenate horizontally, so use pd.concat(..., axis=1) or pd.concat(..., axis='columns');
  • Don't forget to rename the Close column in your dataframe after you read it;
  • It is good practice not to overwrite names of Python built-ins – so instead of list, use something descriptive e.g. dfs_to_merge.
  • Related