**I need to extract "adj close" from all into a new DF and rename based on source and map based on Date
new_DF = date AAL AAPL ALK .....(containing adj close) please help**
AAL = pd.read_csv("AAL.csv")
AAPL = pd.read_csv("AAPL.csv")
ALK = pd.read_csv("ALK.csv")
AMZN = pd.read_csv("AMZN.csv")
BHC = pd.read_csv("BHC.csv")
CS = pd.read_csv("CS.csv")
DB = pd.read_csv("DB.csv")
GS = pd.read_csv("GS.csv")
GOOG = pd.read_csv("GOOG.csv")
HA = pd.read_csv("HA.csv")
JNJ = pd.read_csv("JNJ.csv")
MRK = pd.read_csv("MRK.csv")
SP500 = pd.read_csv("S&P500.csv")
df = date | Open| high | low |close | adj close |volume
CodePudding user response:
Try this:
# load csv data
# define relative path to folder containing csv data
files_folder = '/path/to/csv/'
# load all csv files in one dataframe
df_list = []
for file in glob.glob(os.path.join(files_folder, '*.csv')):
df = pd.read_csv(file)
# write here column you want to select
df_column = df['column_name'].rename(columns={'column_name':file[:-4]})
df_list.append(df_column)
# concatenate the list of dataframes into one
df_final = pd.concat(df_list, axis=1)
CodePudding user response:
Here is an example. Not having your .csv files means we need to be creative in terms of how to get the data, but imagine you have a dict
of DataFrames
, one per ticker.
Here we use Yahoo finance to get similar data. The column we are looking for ('adj close'
) is not in that data, so for this example we'll use Close
instead.
import yfinance as yf
tickers = ['AAL', 'AAPL', 'AMZN', 'GOOG']
sources = {ticker: yf.Ticker(ticker).history(period='5d') for ticker in tickers}
At this point, we've got data for each ticker. For example:
>>> sources['AAPL']
Open High Low Close Volume Dividends Stock Splits
Date
2022-03-30 178.550003 179.610001 176.699997 177.770004 92633200 0 0
2022-03-31 177.839996 178.029999 174.399994 174.610001 103049300 0 0
2022-04-01 174.029999 174.880005 171.940002 174.309998 78699800 0 0
2022-04-04 174.570007 178.490005 174.440002 178.440002 76468400 0 0
2022-04-05 177.500000 178.300003 174.419998 175.059998 73311300 0 0
In your case, you'd get the data from CSV files, so instead:
sources = {k: pd.read_csv(f'{k}.csv') for k in tickers}
Now, the answer to your question:
df = pd.concat([v['Close'].to_frame(k) for k, v in sources.items()], axis=1)
>>> df
AAL AAPL AMZN GOOG
Date
2022-03-30 18.049999 177.770004 3326.020020 2852.889893
2022-03-31 18.250000 174.610001 3259.949951 2792.989990
2022-04-01 18.240000 174.309998 3271.199951 2814.000000
2022-04-04 18.230000 178.440002 3366.929932 2872.850098
2022-04-05 17.840000 175.059998 3281.100098 2821.260010
Again, in your case you would select the 'adj close'
column instead.