I have a big dataframe; a sample of it looks as follows:
etf_list = pd.DataFrame({'ISIN':['LU1737652583', 'IE00B44T3H88', 'IE0005042456', 'IE00B1FZS574', 'IE00BYMS5W68'],
'ETF_Vendor':['Amundi', 'HSBC', 'iShares', 'iShares', 'Invesco']})
In my local folder 'ETF/Input/', among many other files, the files IE00B1FZS574.csv and IE0005042456.csv are stored.
I would like to create a dataframe from each csv file, but only for the rows where ETF_Vendor in etf_list equals 'iShares'. So I wrote the following for loop:
iShares = []
for i, row in etf_list.iterrows():
    if row['ETF_Vendor'] == 'iShares':
        ISIN = row['ISIN']
        iShares.append(ISIN) # At each iteration, the list is filled with the ISINs for the relevant dataframes
        # Assign downloaded file the name of the relevant ISIN
        df[row['ISIN']] = 'ETF/Input/' + row['ISIN'] + '.csv'
        # Define file as DataFrame, again specifying the ISIN as the name for the DataFrame.
        df[row['ISIN']] = pd.read_csv(df[row['ISIN']], sep=',', skiprows=2, thousands='.', decimal=',')
    else:
        pass
The problem with this loop is that the dataframes end up named like df['IE00B1FZS574']. I want each dataframe to be named after its ISIN instead, e.g. IE00B1FZS574.
How do I have to change my code in order to name the dataframes as e.g. IE00B1FZS574 instead of df['IE00B1FZS574']?
Thanks in advance.
CodePudding user response:
There are a couple of ways to go about it.
Let's say you read the data as in your question. Here I'm storing each dataframe in a dict called dataframes. Orderly and Pythonic, so far so good.
import pandas as pd
dataframes = {}
for i, row in something_you_have: # Your details
    name = row['ISIN']
    dataframes[name] = pd.read_csv(....)
Now we can access the dataframes using dataframes['IE00B1FZS574'] and so on.
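For reference, here is a minimal, self-contained sketch of the dict approach applied to the etf_list from the question. The helper name load_vendor_frames, the temporary folder standing in for 'ETF/Input/', and the tiny made-up csv contents are all my own assumptions, added only so the snippet runs on its own. Pass your real folder and your own read_csv options (skiprows, thousands, decimal, ...) instead.

```python
import tempfile
from pathlib import Path

import pandas as pd


def load_vendor_frames(etf_list, vendor, input_dir, **read_csv_kwargs):
    """Return {ISIN: DataFrame} for every row whose ETF_Vendor matches.

    Hypothetical helper, not part of pandas.
    """
    # A boolean mask replaces the iterrows() loop plus the if/else.
    isins = etf_list.loc[etf_list["ETF_Vendor"] == vendor, "ISIN"]
    return {
        isin: pd.read_csv(Path(input_dir) / f"{isin}.csv", **read_csv_kwargs)
        for isin in isins
    }


etf_list = pd.DataFrame({
    "ISIN": ["LU1737652583", "IE00B44T3H88", "IE0005042456",
             "IE00B1FZS574", "IE00BYMS5W68"],
    "ETF_Vendor": ["Amundi", "HSBC", "iShares", "iShares", "Invesco"],
})

# Stand-in for 'ETF/Input/': a temporary folder with two made-up files.
input_dir = Path(tempfile.mkdtemp())
for isin in ["IE0005042456", "IE00B1FZS574"]:
    (input_dir / f"{isin}.csv").write_text("Ticker,Weight\nAAA,1.5\nBBB,2.5\n")

dataframes = load_vendor_frames(etf_list, "iShares", input_dir)
print(sorted(dataframes))  # the ISINs that were loaded
```

The keys of the dict double as the list of loaded ISINs, so the separate iShares list from the question is probably not needed anymore.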
How to make this a bit more fluent?
A. Keep the dataframes in the dict as they are. This is a valid option in its own right.
B. We can use a namespace
import types
datans = types.SimpleNamespace(**dataframes)
datans.IE00B1FZS574
With the namespace we can access the items from the dict as plain attributes. Of course, the keys in the dict need to be valid Python identifiers for that, so datans.IE00B1FZS574 works here.
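If you go with the namespace, it may be worth checking the keys up front, since a key that is not a valid identifier would only be reachable through getattr. A small sketch, with two made-up dataframes standing in for the files from the question:

```python
import types

import pandas as pd

# Made-up stand-ins for the frames read from the csv files.
dataframes = {
    "IE00B1FZS574": pd.DataFrame({"Weight": [1.5, 2.5]}),
    "IE0005042456": pd.DataFrame({"Weight": [3.0]}),
}

# Attribute access (datans.NAME) only works for valid Python identifiers.
bad_keys = [key for key in dataframes if not key.isidentifier()]
if bad_keys:
    raise ValueError(f"Keys not usable as attribute names: {bad_keys}")

datans = types.SimpleNamespace(**dataframes)
print(datans.IE00B1FZS574)

# vars() gives back a dict view of the namespace, so looping still works.
for isin, frame in vars(datans).items():
    print(isin, len(frame))
```

ISINs start with a two-letter country code, so they should pass this check, but names that begin with a digit or contain a dash would not.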
C. We can add items from the dataframes dict directly into the current module-global namespace.
When is this appropriate? In a notebook maybe. Some would say this is bad style.
# update the "globals" (current module namespace) with the dict
globals().update(dataframes)
IE00B1FZS574
Now we can access the dataframes using just IE00B1FZS574 etc. in the current module.
In my analyses I usually go with option A, though option B can be good too. I normally avoid C. The reason is that the analysis should be maintainable and somewhat agile: data is data, so the analysis should be data-driven and easy to update when the dataset changes slightly.
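To illustrate what "data-driven" buys you with option A: when the frames live in a dict, the same code handles two ISINs or two hundred, without a single ISIN written into the analysis. A rough sketch with made-up data (the Weight column is an assumption, not something from the question):

```python
import pandas as pd

# Made-up stand-ins for the frames read from the csv files.
dataframes = {
    "IE00B1FZS574": pd.DataFrame({"Weight": [1.5, 2.5]}),
    "IE0005042456": pd.DataFrame({"Weight": [3.0]}),
}

# pd.concat accepts a dict and uses its keys as the outer index level.
combined = pd.concat(dataframes, names=["ISIN", "row"])

# One aggregate per ISIN, no variable names hard-coded anywhere.
totals = combined.groupby(level="ISIN")["Weight"].sum()
print(totals)
```

With separate global variables per ISIN (option C), every new file would mean touching the analysis code as well.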