Home > Software design >  How to append to another df from inside a for loop
How to append to another df from inside a for loop

Time:04-04

How can you append to an existing df from inside a for loop? For example:

import pandas as pd
from pandas_datareader import data as web

stocks = ['amc', 'aapl']
colnames = ['Datetime', 'Open', 'High', 'Low', 'Close', 'Adj Close', 'Volume', 'Name']
df1 = pd.DataFrame(data=None, columns=colnames)
for stock in stocks:
    df = web.DataReader(stock,  'yahoo')
    df['Name'] = stock

What should I do next so that df is appended to df1?

CodePudding user response:

You could try pandas.concat()

df1 = pd.DataFrame(data=None, columns=colnames)

for stock in stocks:
    df = web.DataReader(stock,  'yahoo')
    df['Name'] = stock
    df1 = pd.concat([df1, df], ignore_index=True)

Instead of concat dataframe in each loop, you could also try append dataframe to a list

dfs = []
for stock in stocks:
    df = web.DataReader(stock,  'yahoo')
    df['Name'] = stock
    dfs.append(df)
df_ = pd.concat(dfs, ignore_index=True)
print(df_)

            High         Low        Open       Close       Volume   Adj Close  Name
0      32.049999   31.549999   31.900000   31.549999    1867000.0   24.759203   amc
1      31.799999   30.879999   31.750000   31.000000    1225900.0   24.327585   amc
2      31.000000   30.350000   30.950001   30.799999     932100.0   24.170631   amc
3      30.900000   30.250000   30.700001   30.350000    1099000.0   23.817492   amc
4      30.700001   30.100000   30.549999   30.650000     782500.0   24.052916   amc
...          ...         ...         ...         ...          ...         ...   ...
2515  179.009995  176.339996  176.690002  178.960007  100589400.0  178.960007  aapl
2516  179.610001  176.699997  178.550003  177.770004   92633200.0  177.770004  aapl
2517  178.029999  174.399994  177.839996  174.610001  103049300.0  174.610001  aapl
2518  174.880005  171.940002  174.029999  174.309998   78699800.0  174.309998  aapl
2519  174.880005  171.940002  174.029999  174.309998   78751328.0  174.309998  aapl

[2520 rows x 7 columns]

CodePudding user response:

What you're trying to do won't quite work, since the data retrieved by DataReader has several columns and you need that data for several stocks. However, each of those columns is a time series.

So what you probably want is something that looks like this:

Stock       amc
Field       High       Low        Open  ...
2022-03-30  29.230000  25.350000  ...
2022-03-31  25.920000  23.260000  ...
2022-04-01  25.280001  22.340000  ...
2022-04-01  25.280001  22.340000  ...
...

And you'd be able to access like df[('amc', 'Low')] to get a time series for that stock, or like df[('amc', 'Low')]['2022-04-01'][0] to get the 'Low' value for 'amc' on April 1st.

This gets you exactly that:

import pandas as pd
from pandas_datareader import data as web

stocks = ['amc', 'aapl']
df = pd.DataFrame()
for stock_name in stocks:
    stock_df = web.DataReader(stock_name, data_source='yahoo')
    for col in stock_df:
        df[(stock_name, col)] = stock_df[col]
df.columns = pd.MultiIndex.from_tuples(df.columns, names=['Stock', 'Field'])

print(f'\nall data:\n{"-"*40}\n', df)

print(f'\none series:\n{"-"*40}\n', df[('aapl', 'Volume')])

print(f'\nsingle value:\n{"-"*40}\n', df[('amc', 'Low')]['2022-04-01'][0])

The solution uses a MultiIndex to achieve what you need. It first loads all the data as retrieved from the API into columns labeled with tuples of stock name and field, and it then converts that into a proper MultiIndex after loading completes.

Output:

all data:
----------------------------------------
 Stock             amc             ...         aapl            
Field            High        Low  ...       Volume   Adj Close
Date                              ...                         
2017-04-04  32.049999  31.549999  ...   79565600.0   34.171505
2017-04-05  31.799999  30.879999  ...  110871600.0   33.994480
2017-04-06  31.000000  30.350000  ...   84596000.0   33.909496
2017-04-07  30.900000  30.250000  ...   66688800.0   33.833969
2017-04-10  30.700001  30.100000  ...   75733600.0   33.793839
...               ...        ...  ...          ...         ...
2022-03-29  34.330002  26.410000  ...  100589400.0  178.960007
2022-03-30  29.230000  25.350000  ...   92633200.0  177.770004
2022-03-31  25.920000  23.260000  ...  103049300.0  174.610001
2022-04-01  25.280001  22.340000  ...   78699800.0  174.309998
2022-04-01  25.280001  22.340000  ...   78751328.0  174.309998

[1260 rows x 12 columns]

one series:
----------------------------------------
 Date
2017-04-04     79565600.0
2017-04-05    110871600.0
2017-04-06     84596000.0
2017-04-07     66688800.0
2017-04-10     75733600.0
                 ...     
2022-03-29    100589400.0
2022-03-30     92633200.0
2022-03-31    103049300.0
2022-04-01     78699800.0
2022-04-01     78751328.0
Name: (aapl, Volume), Length: 1260, dtype: float64

single value:
----------------------------------------
 22.34000015258789
  • Related