Home > other >  Python/Pandas - Check if multiple columns has any of three items in a list
Python/Pandas - Check if multiple columns has any of three items in a list

Time:12-07

I am creating a binary target variable that is 1 if any of these select columns (Current, Month1, Month2, Month3, Month4, Month5, Month6) has any number of these three strings ('Item1', 'Item2', 'Item3'). If none of these columns have these items then the target variable for this row will be populated with 0. For example:

Current Month1 Month2 Month3 Month4 Month5 Month6 Target
Item8 Item7 Item7 Item8 Item8 Item8 Item8 0
Item3 Item4 Item4 Item4 Item4 Item4 Item4 1
Item3 Item4 Item1 Item2 Item3 Item3 Item3 1

For the first row, none of the columns Current-Month6 contain either Item1, Item2, or Item3 so the Target is 0. For row 2, At least one of the key items (Item3) is in at least one of the select columns (Current) so the Target is 1. And finally, Item1, Item2, and Item3, are throughout the 6 columns so the target is 1.

I currently have this loop set up that looks at all the select columns in each row to see if they contain any of the three items. If so, the list is appends a 1 at the index, otherwise 0:

Target = []
for i in range(df.shape[0]):
    if (((df[['Current', 'Month1', 'Month2', 'Month3', 'Month4', 'Month5', 'Month6']].iloc[i].eq('Item1')).any()) == True) or  (((df[['Current', 'Month1', 'Month2', 'Month3', 'Month4', 'Month5', 'Month6']].iloc[i].eq('Item2')).any()) == True) or (((df[['Current', 'Month1', 'Month2', 'Month3', 'Month4', 'Month5', 'Month6']].iloc[i].eq('Item3')).any()) == True):
        Target.append(1)
    else:
        Target.append(0)

I was wondering if there was a faster/more efficient way to do this, or a way to include multiple items in the .eq() function?

Thanks!

CodePudding user response:

df = pd.read_clipboard()

# get str cols
str_cols = df.select_dtypes(include=["object", "string"])

# find any number of references
row_ = str_cols.apply(lambda row: row.str.contains(r'Item1|Item2|Item3', regex=True, case=False, na=False)).any(axis=1)

# assign to target
df['target'] = np.where(row_.where(lambda x: x).notna(), 1, 0)
  • Related