Home > Software design >  How to select dataframe in dictionary of dataframes that contains a column with specific substring
How to select dataframe in dictionary of dataframes that contains a column with specific substring

Time:01-18

I have a dictionary of dataframes df_dict. I then have a substring "blue". I want to identify the name of the dataframe in my dictionary of dataframes that has at least one column that has a name containing the substring "blue".

I am thinking of trying something like:

for df in df_dict:
     if df.columns.contains('blue'):
          return df
     else:
          pass

However, I am not sure if a for loop is necessary here. How can I find the name of the dataframe I am looking for in my dictionary of dataframes?

CodePudding user response:

I think loops are necessary for iterate items of dictionary:

df1 = pd.DataFrame({"aa_blue": [1,2,3],
                   'col':list('abc')})
df2 = pd.DataFrame({"f": [1,2,3],
                   'col':list('abc')})
df3 = pd.DataFrame({"g": [1,2,3],
                   'bluecol':list('abc')})

df_dict = {'df1_name' : df1, 'df2_name' : df2, 'df3_name' : df3}

out = [name for name, df in df_dict.items() if df.columns.str.contains('blue').any()]
print (out)
['df1_name', 'df3_name']

Or:

out = [name for name, df in df_dict.items() if any('blue' in y for y in df.columns)]
print (out)
['df1_name', 'df3_name']

For list of DataFrames use:

out = [df for name, df in df_dict.items() if df.columns.str.contains('blue').any()]
out = [df for name, df in df_dict.items() if any('blue' in y for y in df.columns)]
print (out)
[   aa_blue col
0        1   a
1        2   b
2        3   c,    g bluecol
0  1       a
1  2       b
2  3       c]
  • Related