Pandas DataFrame: set a value if a column doesn't contain certain substrings

Time:12-16

I'm writing a program to get an overview of my expenses by category. I can't list every place where I use my card, so as a final step I want every transaction whose 'Category' column does not already contain one of my category names to be set to "Other".

Below is my attempt. I searched for solutions, and some people suggested putting a ~ in front of the condition to negate it, like a regular logical NOT. It isn't working. What's the best way to do this?

My idea is that wherever the Category isn't Entertainment, Microinvestments, etc., the Category cell in that row is set to "Other".

df['Category'] = np.where(~df['Category'].str.contains('Entertainment|Microinvestments|Food|Transport|Transfers|Cash|Bills|Apparel|Consumer goods|Services', case=False), 'Other', df['Category'])

CodePudding user response:

Try this:

df.loc[~df['Category'].str.contains('Entertainment|Microinvestments|Food|Transport|Transfers|Cash|Bills|Apparel|Consumer goods|Services', case=False), 'Category'] = 'Other'
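
The question doesn't show the actual error, so the following is only a guess at the cause: if the 'Category' column has missing values, str.contains returns NaN for those rows, and negating the result with ~ can fail or give unexpected results. Passing na=False treats missing values as "no match". A minimal sketch, assuming made-up sample data (the rows below and the names known, pattern and mask are illustrative, not from the question):

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the real DataFrame comes from the asker's expense file.
df = pd.DataFrame(
    {"Category": ["Food", "transport", "Gym membership", np.nan, "Bills"]}
)

# Category names taken from the question, joined into one regex alternation.
known = ["Entertainment", "Microinvestments", "Food", "Transport", "Transfers",
         "Cash", "Bills", "Apparel", "Consumer goods", "Services"]
pattern = "|".join(known)

# na=False makes missing values count as "no match", so the mask is purely boolean.
mask = df["Category"].str.contains(pattern, case=False, na=False)

# Rows that match none of the known names (including missing ones) become "Other".
df.loc[~mask, "Category"] = "Other"

print(df["Category"].tolist())
```

With this sample data, the "Gym membership" row and the missing row end up as "Other", while the other rows keep their category. If the column should hold exactly one of the names rather than merely contain it, df['Category'].isin(known) would be a stricter alternative, though it is case-sensitive.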