I have adjusted my csv files of different stocks to have the date column as index and the adj close price as another column. Then I renamed my adj close price as the filename, so different type of stock labels.
from google.colab import drive
drive.mount('/content/drive')
import numpy as np
import os
import pandas as pd
os.chdir("/content/drive/My Drive")
path = "/content/drive/MyDrive/Stock"
files=os.listdir(path)
df=pd.DataFrame()
for file in files:
if file.endswith(".csv"):
columns = ['Adj Close', 'Date']
df_temp = pd.read_csv(file, usecols=columns, index_col='Date')
df_temp = df_temp.rename(columns={'Adj Close': file})
df = pd.concat([df_temp], ignore_index=False, axis=1)
display(df)
Then I tried to concat the stock files However, the result was still multiple csv files. How can I concat the file so that the final file looks like this?
| Stock A | Stock B |
Date | -------- | -------- |
date1 | Adj Close 1A | Adj Close 1B |
date2 | Adj Close 2A | Adj Close 2B |
CodePudding user response:
From your code what I see is in the following line your concat syntax is wrong.
Instead of
df = pd.concat([df_temp], ignore_index=False, axis=1)
You should also pass df in while concatenating:
df = pd.concat([df,df_temp], ignore_index=False, axis=1)
And for more information please share some sample data of the input and the output data so that we can understand them properly.
Thanks!
CodePudding user response:
I would pd.concat
outside of the for loop, which would, instead, be used to create a Python list to pass into pd.concat
.
Also, initialising an empty DataFrame using df = pd.DataFrame()
is not required.
Full Code
from google.colab import drive
drive.mount('/content/drive')
import numpy as np
import os
import pandas as pd
os.chdir("/content/drive/My Drive")
path = "/content/drive/MyDrive/Stock"
files = os.listdir(path)
df_list = []
for file in files:
if file.endswith(".csv"):
columns = ['Adj Close', 'Date']
df_temp = pd.read_csv(file, usecols=columns, index_col='Date')
df_temp = df_temp.rename(columns={'Adj Close': file})
df_list.append(df_temp)
df = pd.concat(df_list, ignore_index=False, axis=1)
display(df)