I want to add a column to my DataFrame indicating whether my regex matched in another column. For example, to go from:
Column A |
---|
word regex word |
word word word |
word word word |
word regex word |
to
Column A | Column B |
---|---|
word regex word | True |
word word word | False |
word word word | False |
word regex word | True |
I double-checked my regex and it works fine, so the problem does not come from that.
I tried
- iterating over the rows and changing them depending on whether the regex is matched
    for row in FILE.itertuples():
        if FILE.COLUMNTOSEARCH.contains(REGEX):
            FILE.at[row.Index, "NEWCOLUMN"] = "string1"
        else:
            FILE.at[row.Index, "NEWCOLUMN"] = "string2"
This returns the error: "AttributeError: 'Series' object has no attribute 'contains'"
- duplicating the first column and then using replace
FILE.replace(REGEX, regex=True, value="string1", inplace=True)
FILE.replace(REGEX, regex=False, value="string2", inplace=True)
With this, only "string1" appears, and it does not replace the whole entry, only the part where the regex matches, whereas I want "string1" to be the only string in the entry.
I have looked through the documentation and Stack Overflow without being able to figure this out. I feel that both of these approaches are highly inefficient, but I cannot work out how to write something better. Thanks in advance for any help.
CodePudding user response:
You can use .str.contains:
df["Column B"] = df["Column A"].str.contains(r"\bregex\b")
This outputs:
          Column A  Column B
0  word regex word      True
1   word word word     False
2   word word word     False
3  word regex word      True
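The AttributeError in the question arises because contains is not a method of a Series itself; it lives under the .str accessor, and it tests the whole column at once, so no row loop is needed. If you want string labels such as "string1" and "string2" instead of booleans, one possible sketch is to pass the boolean mask to numpy.where. The sample data and the NEWCOLUMN name below are assumptions taken from the question, and na=False is an optional choice that treats missing values as non-matches.

```python
import numpy as np
import pandas as pd

# Sample data mirroring the table in the question (assumed for illustration)
df = pd.DataFrame(
    {
        "Column A": [
            "word regex word",
            "word word word",
            "word word word",
            "word regex word",
        ]
    }
)

# Vectorized regex test over the whole column; na=False counts NaN as "no match"
matched = df["Column A"].str.contains(r"\bregex\b", regex=True, na=False)

# Boolean column, as in the answer above
df["Column B"] = matched

# String labels: "string1" where the regex matched, "string2" elsewhere
df["NEWCOLUMN"] = np.where(matched, "string1", "string2")

print(df)
```

This replaces both the itertuples loop and the two replace calls with a single pass over the column.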