How to filter for substring in list comprehension in python

Time:01-18

I have a dictionary of dataframes. I want to create a list of the names of all the dataframes in this dictionary that have a column whose name contains the substring "blue". No dataframe in the dictionary has a column called simply "blue". It is always some variation of "blue", including "blue_max", "blue_min", "blue_average", etc. The point is that "blue" is a substring in the column names of all the dataframes in my dictionary of dataframes.

To create a list of all the dataframes in my dictionary that contain a column named exactly "blue_max", I run the following list comprehension:

df_list = [x for x in df_dictionary if "blue_max" in df_dictionary[x].columns]

print(df_list)

And this prints a list of all dataframe names in my dictionary of dataframes that have a column called exactly "blue_max".

However, this is not what I want. I want a list of all the dataframe names in my dictionary of dataframes that contain at least one column whose name contains the substring "blue". So if one of these dataframes has a column called "blue_max", "blue_min", or "blue_average", I want the name of that dataframe added to my list "df_list".

However, when I try to find the dataframes that have a column containing the substring "blue" by running:

df_list = [x for x in df_dictionary if "blue" in df_dictionary[x].columns]

print(df_list)

I just get an empty list: [].

This is not what I want. I know I have dataframes in my dictionary of dataframes with column names (headers) containing the substring "blue", because I know the columns "blue_max", "blue_min", and "blue_average" exist. How can I fix my code so that it looks for the substring "blue" within the column names of my dataframes, rather than for a column with the exact name "blue"?

CodePudding user response:

df_list = [x for x in df_dictionary if df_dictionary[x].columns.str.contains('blue').any()]
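
The original attempt returns [] because `in` applied to `.columns` tests whether a whole column label equals "blue", not whether "blue" occurs inside a label. Here is a minimal sketch of the difference, assuming a small made-up dictionary (the keys "sky", "grass", "sea" and the column "green_max" are hypothetical and only there for illustration). It also shows a plain-Python variant using any(), which may be handy if some column labels are not strings, since the `.str` accessor expects string labels.

```python
import pandas as pd

# Hypothetical sample data: the keys and most column names are invented.
df_dictionary = {
    "sky": pd.DataFrame({"blue_max": [1, 2], "blue_min": [0, 1]}),
    "grass": pd.DataFrame({"green_max": [3, 4]}),
    "sea": pd.DataFrame({"depth": [10, 20], "blue_average": [0.5, 0.7]}),
}

# `in` on the column index checks for an exact label, so nothing matches.
exact = [name for name, df in df_dictionary.items() if "blue" in df.columns]

# .str.contains checks every column name for the substring.
# regex=False treats "blue" as literal text rather than a regular expression.
by_str = [
    name for name, df in df_dictionary.items()
    if df.columns.str.contains("blue", regex=False).any()
]

# Same idea in plain Python; str(col) guards against non-string labels.
by_any = [
    name for name, df in df_dictionary.items()
    if any("blue" in str(col) for col in df.columns)
]

print(exact)   # []
print(by_str)  # ['sky', 'sea']
print(by_any)  # ['sky', 'sea']
```

Iterating with .items() avoids looking up df_dictionary[x] again inside the condition, but the result is the same as the one-liner above.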