I have multiple dataframes, each with multiple columns, like this:
DF =
      A  B  C  metadata_Colunm
r1    6  3  9  r1
r2    2  1  1  r2
r3    5  7  2  r3
How can I use a for-loop to iterate over each column to make a new dataframe and then remove rows where values are below 5 for each new dataframe? The result should look like this:
DF_A=
A  metadata_Colunm
6  r1
5  r3
DF_B=
B  metadata_Colunm
7  r3
DF_C=
C  metadata_Colunm
9  r1
What I have done so far is to make a list of the columns I will use (all except the metadata column) and then go through the columns, creating a new dataframe for each. Since I also need to preserve the metadata, I add the metadata column to each new dataframe:
DF = DF.drop("metadata_Colunm")
ColList = list(DF)
for item in ColList:
    locals()[f"DF_{str(item)}"] = DF[[item, "metadata_Colunm"]]
    locals()[f"DF_{str(item)}"] = locals()[f"DF_{str(item)}"].drop(locals()[f"DF_{str(item)}"][locals()[f"DF_{str(item)}"].item > 0.5].index, inplace=True)
But using this I get "AttributeError: 'DataFrame' object has no attribute 'item'".
Any suggestions for making this work, or any other solutions, would be greatly appreciated!
Thanks in advance!
CodePudding user response:
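You can wrap the filtering in a function and apply it to each dataframe. Note that this keeps only the rows where every numeric column is at least the threshold:
def filter_rows(df, threshold=5):
    # Compare numeric columns only; the metadata column holds strings
    for column in df.select_dtypes(include="number").columns:
        df = df[df[column] >= threshold]
    return df
Then apply the filter to all your dataframes:
dfs = [df1, df2, df3...]
dfs = [filter_rows(df) for df in dfs]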
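As a rough illustration of what filtering on every column does, here is a minimal sketch using the sample data from the question. The function name `filter_rows`, the use of `select_dtypes` to skip the string metadata column, and the second threshold of 2 are assumptions made for this example only. With a threshold of 5 no row survives, because no row has all of A, B and C at 5 or above, so this approach is stricter than the per-column split the question asks for.

```python
import pandas as pd

# Sample data copied from the question
DF = pd.DataFrame(
    {
        "A": [6, 2, 5],
        "B": [3, 1, 7],
        "C": [9, 1, 2],
        "metadata_Colunm": ["r1", "r2", "r3"],
    },
    index=["r1", "r2", "r3"],
)


def filter_rows(df, threshold=5):
    """Keep rows where every numeric column is >= threshold (hypothetical helper)."""
    for column in df.select_dtypes(include="number").columns:
        df = df[df[column] >= threshold]
    return df


strict = filter_rows(DF)               # threshold 5: no row passes on all columns
loose = filter_rows(DF, threshold=2)   # threshold 2: rows r1 and r3 pass
print(strict)
print(loose)
```

If the goal is one filtered dataframe per column, the filter has to be applied to each column separately rather than cumulatively.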
CodePudding user response:
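dfs = {}
# Assumes the metadata column is the last column of df
for col in df.columns[:-1]:
    df_new = df[[col, 'metadata_Colunm']]
    # Keep only the rows where this column is at least 5
    dfs[col] = df_new[df_new[col] >= 5]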
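For reference, a self-contained sketch of the dictionary approach on the question's sample data might look like the following. Picking the value columns by name with `Index.drop` (instead of by position) and the variable names `meta`, `value_cols` and `sub` are assumptions of this sketch. It also sidesteps the original error: `some_df.item` looks up an attribute literally called "item", whereas `some_df[item]` uses the column name stored in the loop variable.

```python
import pandas as pd

# Sample data copied from the question
DF = pd.DataFrame(
    {
        "A": [6, 2, 5],
        "B": [3, 1, 7],
        "C": [9, 1, 2],
        "metadata_Colunm": ["r1", "r2", "r3"],
    },
    index=["r1", "r2", "r3"],
)

meta = "metadata_Colunm"
# All column labels except the metadata column; DF itself is left unchanged
value_cols = DF.columns.drop(meta)

dfs = {}
for col in value_cols:
    sub = DF[[col, meta]]            # one value column plus the metadata
    dfs[col] = sub[sub[col] >= 5]    # bracket access uses the value of col

for name, frame in dfs.items():
    print(f"DF_{name}=")
    print(frame)
```

Storing the results in a dictionary keyed by column name avoids writing to `locals()`, and each result is then available as `dfs["A"]`, `dfs["B"]`, and so on.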