How can you append to an existing df from inside a for loop? For example:
import pandas as pd
from pandas_datareader import data as web
stocks = ['amc', 'aapl']
colnames = ['Datetime', 'Open', 'High', 'Low', 'Close', 'Adj Close', 'Volume', 'Name']
df1 = pd.DataFrame(data=None, columns=colnames)
for stock in stocks:
df = web.DataReader(stock, 'yahoo')
df['Name'] = stock
What should I do next so that df
is appended to df1
?
CodePudding user response:
You could try pandas.concat()
df1 = pd.DataFrame(data=None, columns=colnames)
for stock in stocks:
df = web.DataReader(stock, 'yahoo')
df['Name'] = stock
df1 = pd.concat([df1, df], ignore_index=True)
Instead of concat dataframe in each loop, you could also try append dataframe to a list
dfs = []
for stock in stocks:
df = web.DataReader(stock, 'yahoo')
df['Name'] = stock
dfs.append(df)
df_ = pd.concat(dfs, ignore_index=True)
print(df_)
High Low Open Close Volume Adj Close Name
0 32.049999 31.549999 31.900000 31.549999 1867000.0 24.759203 amc
1 31.799999 30.879999 31.750000 31.000000 1225900.0 24.327585 amc
2 31.000000 30.350000 30.950001 30.799999 932100.0 24.170631 amc
3 30.900000 30.250000 30.700001 30.350000 1099000.0 23.817492 amc
4 30.700001 30.100000 30.549999 30.650000 782500.0 24.052916 amc
... ... ... ... ... ... ... ...
2515 179.009995 176.339996 176.690002 178.960007 100589400.0 178.960007 aapl
2516 179.610001 176.699997 178.550003 177.770004 92633200.0 177.770004 aapl
2517 178.029999 174.399994 177.839996 174.610001 103049300.0 174.610001 aapl
2518 174.880005 171.940002 174.029999 174.309998 78699800.0 174.309998 aapl
2519 174.880005 171.940002 174.029999 174.309998 78751328.0 174.309998 aapl
[2520 rows x 7 columns]
CodePudding user response:
What you're trying to do won't quite work, since the data retrieved by DataReader has several columns and you need that data for several stocks. However, each of those columns is a time series.
So what you probably want is something that looks like this:
Stock amc
Field High Low Open ...
2022-03-30 29.230000 25.350000 ...
2022-03-31 25.920000 23.260000 ...
2022-04-01 25.280001 22.340000 ...
2022-04-01 25.280001 22.340000 ...
...
And you'd be able to access like df[('amc', 'Low')]
to get a time series for that stock, or like df[('amc', 'Low')]['2022-04-01'][0]
to get the 'Low' value for 'amc' on April 1st.
This gets you exactly that:
import pandas as pd
from pandas_datareader import data as web
stocks = ['amc', 'aapl']
df = pd.DataFrame()
for stock_name in stocks:
stock_df = web.DataReader(stock_name, data_source='yahoo')
for col in stock_df:
df[(stock_name, col)] = stock_df[col]
df.columns = pd.MultiIndex.from_tuples(df.columns, names=['Stock', 'Field'])
print(f'\nall data:\n{"-"*40}\n', df)
print(f'\none series:\n{"-"*40}\n', df[('aapl', 'Volume')])
print(f'\nsingle value:\n{"-"*40}\n', df[('amc', 'Low')]['2022-04-01'][0])
The solution uses a MultiIndex
to achieve what you need. It first loads all the data as retrieved from the API into columns labeled with tuples of stock name and field, and it then converts that into a proper MultiIndex
after loading completes.
Output:
all data:
----------------------------------------
Stock amc ... aapl
Field High Low ... Volume Adj Close
Date ...
2017-04-04 32.049999 31.549999 ... 79565600.0 34.171505
2017-04-05 31.799999 30.879999 ... 110871600.0 33.994480
2017-04-06 31.000000 30.350000 ... 84596000.0 33.909496
2017-04-07 30.900000 30.250000 ... 66688800.0 33.833969
2017-04-10 30.700001 30.100000 ... 75733600.0 33.793839
... ... ... ... ... ...
2022-03-29 34.330002 26.410000 ... 100589400.0 178.960007
2022-03-30 29.230000 25.350000 ... 92633200.0 177.770004
2022-03-31 25.920000 23.260000 ... 103049300.0 174.610001
2022-04-01 25.280001 22.340000 ... 78699800.0 174.309998
2022-04-01 25.280001 22.340000 ... 78751328.0 174.309998
[1260 rows x 12 columns]
one series:
----------------------------------------
Date
2017-04-04 79565600.0
2017-04-05 110871600.0
2017-04-06 84596000.0
2017-04-07 66688800.0
2017-04-10 75733600.0
...
2022-03-29 100589400.0
2022-03-30 92633200.0
2022-03-31 103049300.0
2022-04-01 78699800.0
2022-04-01 78751328.0
Name: (aapl, Volume), Length: 1260, dtype: float64
single value:
----------------------------------------
22.34000015258789