I'm looking to create/update a column, 'dept', based on whether the text in column A contains a given string. It works without a for loop, but when I try to iterate, the column ends up with the default value instead of the detected one.
Surely I shouldn't have to add the same line manually 171 times. I've scoured the internet and SO for hints or solutions and can't seem to find any good info.
Working Code:
depts = ['PHYS', 'PSYCH']
df['dept'] = np.where(df.a.str.contains("PHYS"), "PHYS", "Unknown")
df['dept'] = np.where(df.a.str.contains("PSYCH"), "PSYCH", "Unknown")
But when I try:
for dept in depts:
    df['dept'] = np.where(df.a.str.contains(dept), dept, "Unknown")
    print(dept)
I get all "Unknown" values, even though each dept prints correctly. I've also tried making sure dept is passed in as a string by explicitly setting dept = str(dept), to no avail.
Thanks in advance for any and all help. I feel like this is a simple issue that should be easily sorted but I'm experiencing a block.
CodePudding user response:
We usually do
df['dept'] = df.a.str.findall('|'.join(depts)).str[0]
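Note that with findall, rows with no match come out as NaN rather than "Unknown". A minimal sketch of the same idea with the default filled in, assuming a small sample frame like the one in the question (the sample data and the re.escape call are my additions, the latter in case a dept name contains regex special characters):

```python
import re
import pandas as pd

depts = ['PHYS', 'PSYCH']
# Sample data assumed for illustration only
df = pd.DataFrame({'a': ['ewfefPHYS', 'QWQiPSYCH', 'fwfew']})

# One alternation pattern covering every dept; escape each name defensively
pattern = '|'.join(re.escape(d) for d in depts)

# findall returns a list per row; .str[0] takes the first hit (NaN if the list is empty)
df['dept'] = df['a'].str.findall(pattern).str[0].fillna('Unknown')
print(df)
```

If a row could match several depts, this keeps whichever one appears first in the text.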
CodePudding user response:
I prefer str.extract:
df['dept'] = df['a'].str.extract(f"({'|'.join(depts)})").fillna("Unknown")
Or:
df['dept'] = df['a'].str.extract('(' + '|'.join(depts) + ')').fillna("Unknown")
Both versions output:
>>> df
           a     dept
0  ewfefPHYS     PHYS
1  QWQiPSYCH    PSYCH
2      fwfew  Unknown
>>>
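As for why the loop in the question misbehaves: each iteration reassigns the whole 'dept' column, so every row that doesn't match the current dept is reset to "Unknown", and only the matches for the last item in depts survive. If you do want to keep a loop, here is a rough sketch that sets the default once and then only touches the matching rows (the sample frame is assumed, and regex=False / na=False are my choices for plain substring matching and for treating missing values as non-matches):

```python
import pandas as pd

depts = ['PHYS', 'PSYCH']
# Sample data assumed for illustration only
df = pd.DataFrame({'a': ['ewfefPHYS', 'QWQiPSYCH', 'fwfew']})

# Set the default once, outside the loop
df['dept'] = 'Unknown'

for dept in depts:
    # Boolean mask of rows whose text contains this dept as a plain substring
    mask = df['a'].str.contains(dept, regex=False, na=False)
    # Update only the matching rows; earlier assignments are left alone
    df.loc[mask, 'dept'] = dept

print(df)
```

With this version, a row that matches more than one dept ends up with the one that comes last in depts. For 171 depts the single-regex approaches above will probably be faster, since they scan the column once.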