Hi, I am searching a string column for an exact string only, and I want to return True/False.
Rows 3, 4 and 5 contain the string 'abc' (case-sensitive), but my attempt returns True for all rows.
Below is the code I have tried.
df['try_output'] = df['String1'].str.contains('ABC',case = False)
Is there any modification to the above statement that would give the output column 'Required_Output'?
CodePudding user response:
I don't think str.contains is what you are looking for here. Rather, you want an exact match that ignores upper/lower case. You can simply convert the column to upper case with str.upper() and check whether the result equals 'ABC':
df['output'] = df.string_1.str.upper() == 'ABC'
print(df)
string_1 output
0 ABC True
1 abc True
2 XYZabc False
3 XyzABC False
4 ABCqqqq False
5 AbC True
6 aBC True
Your code returns True for every row because str.contains tests for a substring, and every row contains 'abc' somewhere once you tell it to ignore case (case=False).
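To see the difference side by side, here is a minimal sketch that rebuilds the frame from the printed output above (the variable names contains_result and exact_result are mine, not from the question) and runs both checks:

```python
import pandas as pd

# Sample data copied from the printed output above.
df = pd.DataFrame(
    {"string_1": ["ABC", "abc", "XYZabc", "XyzABC", "ABCqqqq", "AbC", "aBC"]}
)

# Substring test: True wherever 'abc' occurs anywhere in the value, ignoring case.
contains_result = df["string_1"].str.contains("ABC", case=False)

# Whole-value test: True only when the entire value equals 'ABC' after upper-casing.
exact_result = df["string_1"].str.upper() == "ABC"

print(contains_result.tolist())
print(exact_result.tolist())
```

The first list is True for all seven rows, while the second is True only for the rows whose whole value is some casing of 'abc'.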
CodePudding user response:
Use str.fullmatch (Pandas >= 1.1.0) without any conversion:
df['output'] = df['string_1'].str.fullmatch('abc', case=False)
print(df)
# Output:
string_1 output
0 ABC True
1 abc True
2 XYZabc False
3 XyzABC False
4 ABCqqqq False
5 AbC True
6 aBC True
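One thing to keep in mind: str.fullmatch treats its pattern as a regular expression. That makes no difference for 'abc', but if the text you are matching could contain characters such as '.' or '+', you probably want to escape it first. A small sketch, where the series s and the search term are made-up values for illustration only:

```python
import re

import pandas as pd

# Hypothetical data and search term, chosen to include a regex metacharacter.
s = pd.Series(["a.c", "abc", "A.C", "a.cd"])
term = "a.c"

# Unescaped: '.' matches any single character, so 'abc' matches as well.
loose = s.str.fullmatch(term, case=False)

# Escaped: '.' is matched literally, so only 'a.c' and 'A.C' match.
strict = s.str.fullmatch(re.escape(term), case=False)

print(loose.tolist())
print(strict.tolist())
```

re.escape comes from the standard library, so this adds no extra dependency beyond pandas.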