pandas contains regex

Time:06-01

I would like to match all cells that begin with the number 978, but the following code also matches 397854 and NaN.

an_transaction_product["kniha"] = np.where(an_transaction_product["zbozi_ean"].str.contains('^978', regex=True), 1, 0)

What am I doing wrong?

CodePudding user response:

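.str.contains checks whether the regex occurs anywhere in the string, so the match is restricted to the start only when the pattern is anchored with ^. It also returns NaN for missing values, and np.where treats NaN as true, which is why those rows get 1.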

If you insist on using regex, .str.match does what you want.

But for this simple case .str.startswith("978") is clearer.
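A minimal sketch of both options, assuming the column holds strings with some missing values. The sample data is made up for illustration; only the column names come from the question. Passing na=False makes missing values count as "no match", so np.where no longer turns them into 1.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
an_transaction_product = pd.DataFrame(
    {"zbozi_ean": ["9781234567897", "397854", np.nan, "978"]}
)
ean = an_transaction_product["zbozi_ean"]

# na=False makes missing values count as "no match" instead of NaN.
starts_with_978 = ean.str.startswith("978", na=False)
an_transaction_product["kniha"] = np.where(starts_with_978, 1, 0)

# Regex alternative: .str.match anchors the pattern at the start of the string.
matches_978 = ean.str.match("978", na=False)
```

If the column is numeric rather than text, the .str accessor will raise an error, so the values would need to be converted to strings first.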

CodePudding user response:

Apart from regex, you can use .loc to find cells that start with '978'. The code below will assign 1 to such cells in column 'A', just as an example:

df.loc[df['A'].astype(str).str[:3]=='978', 'A'] = 1

Note: astype(str) converts the values to strings, str[:3] takes the first three characters, and the result is compared with '978'.
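The same idea can be sketched on a numeric column with a missing value. The data and the choice to write the flag into a separate "kniha" column (instead of overwriting 'A') are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data: EANs stored as numbers, with one missing value.
df = pd.DataFrame({"A": [9781234567897, 397854, np.nan]})

# astype(str) turns NaN into the text "nan", so it never equals "978".
is_978 = df["A"].astype(str).str[:3] == "978"

# Store the flag in its own column so the original values are preserved.
df["kniha"] = is_978.astype(int)
```

Because the comparison always yields True or False, missing values end up as 0 without any extra handling.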
