I am using str.contains in my dataframe to see if a certain value is inside the values of a Series. Instead of the output being True or False, I want to see the actual value that I pass to contains.
A B
1 Fer
2 Ger
3 Tir
My expected output:
A B C
1 Fer er
2 Ger er
3 Tir NaN
Is there a built-in way to do this with pandas?
CodePudding user response:
Series.str.extract is perfect for this:
df['C'] = df['B'].str.extract('(er)')
Output:
>>> df
A B C
0 1 Fer er
1 2 Ger er
2 3 Tir NaN
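For a self-contained run, here is a minimal sketch that rebuilds the sample frame from the question and contrasts str.contains with str.extract. The frame construction and the variable name mask are assumptions for illustration. expand=False is an optional argument that makes str.extract return a Series instead of a one-column dataframe.

```python
import pandas as pd

# Sample frame rebuilt from the question (columns A and B as shown there).
df = pd.DataFrame({"A": [1, 2, 3], "B": ["Fer", "Ger", "Tir"]})

# str.contains only reports whether the pattern occurs in each value.
mask = df["B"].str.contains("er")

# str.extract returns the text captured by the group, or NaN when nothing matches.
# expand=False yields a Series rather than a one-column DataFrame.
df["C"] = df["B"].str.extract("(er)", expand=False)

print(mask.tolist())
print(df)
```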
The parentheses in (er) are important; they signify a capture group. If the regular expression within them matches any text, the first match is copied into the output column. If the regular expression doesn't match, NaN is copied to the output column. By default, .str.extract returns a dataframe with one column per capture group, so (er)(abc)(def) would return a dataframe with 3 columns.
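To illustrate the one-column-per-group behaviour, here is a small sketch with two capture groups. The pattern ([A-Z])(er) and the group names initial and suffix are made up for this example and are not part of the question.

```python
import pandas as pd

s = pd.Series(["Fer", "Ger", "Tir"])

# Two capture groups give two columns, labelled 0 and 1 by default.
parts = s.str.extract(r"([A-Z])(er)")

# Named groups, written (?P<name>...), are used as the column names.
named = s.str.extract(r"(?P<initial>[A-Z])(?P<suffix>er)")

print(parts)
print(named)
```

Rows where the whole pattern fails to match, such as "Tir" here, get NaN in every column.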