How to iterate through a list of data frames and update each data frame
I have 3 data frames and want to keep only the columns whose names contain 'FLAG'. I used the code below:
import pandas as pd
df1 = pd.DataFrame(columns=['FREQUNECY_ID', 'START_DATE', 'END_DATE', 'FLAG_ACTIVE', 'FLAG_CURRENT'])
df2 = pd.DataFrame(columns=['PRODUCT_ID', 'PURCHASE_DATE', 'FLAG_ACTIVE', 'FLAG_CURRENT'])
df3 = pd.DataFrame(columns=['FREQUNECY_ID', 'START_DATE', 'END_DATE', 'FLAG_ACTIVE', 'FLAG_CURRENT'])
for df in [df1, df2, df3]:
    # col_lst = [for col in df.columns if col in 'FLAG_']
    df = df.filter(regex='FLAG')
print(df3.columns)
Output: df3 still has all five of its original columns,
but if I assign each data frame separately, like
df1 = df1.filter(regex='FLAG')
I get the expected result. How can I iterate through the list of data frames to get the desired result?
CodePudding user response:
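Inside the loop, df = df.filter(...) only rebinds the loop variable df to a new filtered copy, which is discarded at the next iteration; df1, df2, and df3 themselves are never modified.
You can instead use drop with inplace=True, which modifies each existing data frame: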
for df in [df1, df2, df3]:
    df.drop(columns=df.columns.difference(df.filter(regex='FLAG').columns), inplace=True)
print(df3.columns)
output:
Index(['FLAG_ACTIVE', 'FLAG_CURRENT'], dtype='object')
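The difference between rebinding the loop variable and modifying the object in place can be checked with a minimal sketch. The frame named original and its sample values are made up for illustration and do not come from the question:

```python
import pandas as pd

# Hypothetical sample frame, used only to illustrate the behaviour.
original = pd.DataFrame({"PRODUCT_ID": [1, 2], "FLAG_ACTIVE": ["Y", "N"]})

for df in [original]:
    # Rebinding: `df` now names a new filtered copy; `original` is untouched.
    df = df.filter(regex="FLAG")

cols_after_rebinding = list(original.columns)

for df in [original]:
    # In place: the object that `original` refers to is itself modified.
    to_drop = df.columns.difference(df.filter(regex="FLAG").columns)
    df.drop(columns=to_drop, inplace=True)

cols_after_inplace = list(original.columns)

print(cols_after_rebinding)  # ['PRODUCT_ID', 'FLAG_ACTIVE']
print(cols_after_inplace)    # ['FLAG_ACTIVE']
```

Only the second loop changes what original contains, which is why print(df3.columns) shows the filtered columns with the drop approach.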
CodePudding user response:
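We can use enumerate to get the index of the current list item in the loop and replace that item with the filtered data frame. Note that this updates the entries of the list dfs; the names df1, df2, and df3 still refer to the unfiltered data frames.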
dfs = [df1, df2, df3]
for i, df in enumerate(dfs):
    dfs[i] = df.filter(like="FLAG")
print(dfs[0])
Empty DataFrame
Columns: [FLAG_ACTIVE, FLAG_CURRENT]
Index: []
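If the original names df1, df2, and df3 should end up pointing at the filtered data frames, one option is to unpack a list comprehension back into those names. This is a sketch of that variation rather than part of the answer above, and the column lists are shortened for brevity:

```python
import pandas as pd

df1 = pd.DataFrame(columns=["FREQUNECY_ID", "START_DATE", "FLAG_ACTIVE", "FLAG_CURRENT"])
df2 = pd.DataFrame(columns=["PRODUCT_ID", "PURCHASE_DATE", "FLAG_ACTIVE", "FLAG_CURRENT"])
df3 = pd.DataFrame(columns=["FREQUNECY_ID", "END_DATE", "FLAG_ACTIVE", "FLAG_CURRENT"])

# Build all filtered frames first, then rebind the three names in one assignment.
df1, df2, df3 = [df.filter(like="FLAG") for df in (df1, df2, df3)]

print(list(df3.columns))  # ['FLAG_ACTIVE', 'FLAG_CURRENT']
```

This stays readable for a handful of data frames; for a larger or variable number, keeping them in a list or dictionary is usually easier to manage.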
A dictionary would be a clearer data structure to use here, since each data frame can then be looked up by name:
dfs = {"df1": df1, "df2": df2, "df3": df3}
for name, df in dfs.items():
    dfs[name] = df.filter(like="FLAG")
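The same dictionary update can also be written as a dictionary comprehension, which builds a new dictionary instead of reassigning values while iterating. A minimal sketch, where the names frames and flag_frames are illustrative and the column lists are shortened:

```python
import pandas as pd

# Hypothetical container names; the keys mirror the variable names in the question.
frames = {
    "df1": pd.DataFrame(columns=["FREQUNECY_ID", "START_DATE", "FLAG_ACTIVE", "FLAG_CURRENT"]),
    "df2": pd.DataFrame(columns=["PRODUCT_ID", "PURCHASE_DATE", "FLAG_ACTIVE", "FLAG_CURRENT"]),
}

# Keep only the columns whose name contains the substring "FLAG".
flag_frames = {name: df.filter(like="FLAG") for name, df in frames.items()}

# Look up a filtered frame by name; the unfiltered one is still in `frames`.
print(list(flag_frames["df2"].columns))  # ['FLAG_ACTIVE', 'FLAG_CURRENT']
print(list(frames["df2"].columns))
```

Keeping the unfiltered dictionary around is optional; assigning the comprehension result back to the same name gives the same end state as the loop.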