I read multiple questions similar to this one but not specifically addressing this use case.
I have multiple ticker.csv files in a folder such as:
ZZZ.TO.csv containing:
Date Open High Low Close Volume
0 2017-03-14 28.347332 28.347332 27.871055 28.267952 22400
1 2017-03-15 28.320875 28.400254 27.959257 28.188574 39200
2 2017-03-16 28.179758 28.797155 28.126837 28.708954 51600
3 2017-03-17 28.576658 28.691315 28.091559 28.550196 57400
I would like to create a dataframe containing all 'Date' and 'Close' data from each file. Set 'Date' as the index and have each ticker as the column header in the final dataframe.
So the final dataframe would look like this:
Date FOO.TO ZOMD.V ZEN.V TICKER.BAR
2017-03-14 28.347332 28.347332 27.871055 28.267952
2017-03-15 28.320875 28.400254 27.959257 28.188574
2017-03-16 28.179758 28.797155 28.126837 28.708954
2017-03-17 28.576658 28.691315 28.091559 28.550196
This is what I tried:
import pandas as pd
import glob
path = r'/path_where_files_are/'
all_files = glob.glob(path "/*.csv")
all_files.sort()
fields = ['Date','Close']
list = []
for filename in all_files:
df = pd.read_csv(filename, header=0, usecols=fields)
df.set_index(['Date'], inplace=True)
list.append(df)
frame = pd.concat(list, axis=0)
but it produces:
Date Close
2017-03-14 0.050000
2017-09-21 0.040000
2017-09-22 0.040000
2017-10-13 0.100000
2017-10-16 0.110000
Any help is welcome. Cheers.
CodePudding user response:
You can try:
import pandas as pd
import pathlib
path = pathlib.Path(r'./data2')
data = {}
for filename in sorted(path.glob('*.csv')):
data[filename.stem] = pd.read_csv(filename, index_col='Date',
usecols=['Date', 'Close'],
parse_dates=['Date']).squeeze()
df = pd.concat(data, axis=1)
Output:
>>> df
ZEN.V ZZZ.TO
Date
2017-03-14 28.267952 28.267952
2017-03-15 28.188574 28.188574
2017-03-16 28.708954 28.708954
2017-03-17 28.550196 28.550196
CodePudding user response:
Few things that can help you:
- You want to concatenate horizontally, so use
pd.concat(..., axis=1)
orpd.concat(..., axis='columns')
; - Don't forget to rename the Close column in your dataframe after you read it;
- It is good practice not to overwrite names of Python built-ins – so instead of
list
, use something descriptive e.g.dfs_to_merge
.