I have a pandas DataFrame and I want to filter/select columns based on the elements of an input list. So, for example, I have something like:
import pandas as pd

filters = ['category', 'name']
# I am just trying to select the columns which would match as follows:
data = {'category_name_courses': ["Spark", "PySpark", "Python", "pandas"],
        'category_name_area': ["cloud", "cloud", "prog", "ds"],
        'some_other_column': [0, 0, 0, 0]}
x = pd.DataFrame(data)
selections = list()
for col in x.columns:
    if ('name' in col) and ('category' in col):
        selections.append(col)
In my case, this if condition (or some other way of selecting the columns) should be built by 'AND'-ing together every element of the input list, rather than hard-coding 'name' and 'category'.
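For reference, once selections is built I would subset the frame with it, e.g. (continuing from the snippet above):

x[selections]   # expected to keep category_name_courses and category_name_area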
CodePudding user response:
IIUC, do you want this? Build a regex that requires every filter term to be present (one lookahead per term, so the terms are effectively AND-ed) and pass it to DataFrame.filter:

reg_str = ''.join(f'(?=.*{f})' for f in filters)   # '(?=.*category)(?=.*name)'
x.filter(regex=reg_str)
Output:
  category_name_courses category_name_area
0                 Spark              cloud
1               PySpark              cloud
2                Python               prog
3                pandas                 ds
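As an alternative sketch, if you would rather avoid regular expressions, you can build one boolean mask per filter term and AND the masks together:

import numpy as np

mask = np.logical_and.reduce([x.columns.str.contains(f) for f in filters])
x.loc[:, mask]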
CodePudding user response:
Your edit shows that you want to filter columns based on their name.
Simply use:
filters = ['category', 'name']
for col in x.columns:
    if all(f in col for f in filters):
        print(col)
Output:
category_name_courses
category_name_area
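To go one step further and actually select the matching columns, a short sketch building on the loop above:

keep = [col for col in x.columns if all(f in col for f in filters)]
x[keep]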
older answer: filtering values
You can do almost what you suggested:
x = pd.DataFrame([['flow', 'x', 'category'], ['x', 'x', 'flow']])
for col in x.columns:
    if ('flow' in x[col].values) and ('category' in x[col].values):
        # Do something with this column...
        print(f'column "{col}" matches')
Using a list of matches:
filters = ['category', 'flow']
for col in x.columns:
    if all(f in x[col].values for f in filters):
        # Do something with this column...
        print(f'column "{col}" matches')
Or, more efficiently, using a set:
filters = {'category', 'flow'}
for col in x.columns:
    if set(x[col]) >= filters:
        # Do something with this column...
        print(f'column "{col}" matches')
Example:
column "2" matches
CodePudding user response:
Try this (if I understand your problem correctly): you have a list of conditions, and you want to make sure that the column name includes all of them.
filters = ['category', 'name']
# x is some pandas DataFrame
# this is equivalent to what you were trying to build:
for col in x.columns:
    boo = True
    for condition in filters:
        if condition not in col:
            boo = False
            break
    if boo:
        # do something with the matching column
        print(col)
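The flag-and-break pattern above is exactly what the built-in all() condenses into one line; a minimal equivalent sketch:

for col in x.columns:
    if all(condition in col for condition in filters):
        print(col)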
CodePudding user response:
I did not really understand the question. I assume you want to find all column names which contain any of the values in the given list.
A pandas DataFrame has loads of methods and functionality. You may be familiar with the dt accessor, which lets you apply datetime functionality to a given column; in the same way you can use the str accessor for string functionality.
So you can check whether a string occurs in each value (row) of a Series (or Index):
a_series.str.contains("string")
It returns a boolean mask that is True where the condition is met.
You can use | (regex "or") to check for multiple strings at once:
a_series.str.contains("string|something")
So the line should look like:
x.columns.str.contains("|".join(filters))
Now that you have the mask, you can apply it:
x.columns[x.columns.str.contains("|".join(filters))]
Since you have the column names, you can access the column data itself:
x[x.columns[x.columns.str.contains("|".join(filters))]]
Output:
  category_name_courses category_name_area
0                 Spark              cloud
1               PySpark              cloud
2                Python               prog
3                pandas                 ds
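Note that joining with | gives "any of the terms" semantics. If, as in the question, every term must be present, a sketch of the same idea with AND semantics (one regex lookahead per term) and .loc to avoid the double indexing:

all_mask = x.columns.str.contains("".join(f"(?=.*{f})" for f in filters))
x.loc[:, all_mask]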