Pandas: Updating Column B value if A contains string

Time:10-20

I'm looking to create or update a column, 'dept', when the text in column A contains a given string. This works without a for loop, but when I iterate, the column is set to the default instead of the detected value.

Surely I shouldn't have to add the same line manually 171 times. I've scoured the internet and SO for hints or solutions and can't find any good info.

Working Code:

depts = ['PHYS', 'PSYCH']
df['dept'] = np.where(df.a.str.contains("PHYS"), "PHYS", "Unknown")
df['dept'] = np.where(df.a.str.contains("PSYCH"), "PSYCH", "Unknown")

But when I try:

for dept in depts:
    df['dept'] = np.where(df.a.str.contains(dept), dept, "Unknown")
    print(dept)

I get "Unknown" for every row, even though each dept prints correctly. I've also tried making sure dept is passed as a string by explicitly setting dept = str(dept), to no avail.

Thanks in advance for any and all help. I feel like this is a simple issue that should be easily sorted but I'm experiencing a block.

CodePudding user response:

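A common approach is to join the names into one regex alternation and take the first match per row. Rows with no match come back as NaN, so chain fillna to get the "Unknown" default:

df['dept'] = df.a.str.findall('|'.join(depts)).str[0].fillna("Unknown")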
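For anyone who wants to run this end to end, here is a minimal sketch of the findall approach. The sample rows are made up for illustration, and the re.escape step is my own addition (an assumption that some department names might contain regex metacharacters), not part of the answer above.

```python
import re

import pandas as pd

# Hypothetical sample data; the column name `a` follows the question.
df = pd.DataFrame({"a": ["ewfefPHYS", "QWQiPSYCH", "fwfew"]})
depts = ["PHYS", "PSYCH"]

# Escape each name so regex metacharacters are matched literally,
# then join the names into a single alternation pattern.
pattern = "|".join(re.escape(d) for d in depts)

# findall returns a list of matches per row; .str[0] takes the first one.
# Rows with an empty list become NaN, which fillna replaces with the default.
df["dept"] = df["a"].str.findall(pattern).str[0].fillna("Unknown")

print(df)
```

If one name is a prefix of another, the order inside the alternation decides which one wins, so listing longer names first is probably safer.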

CodePudding user response:

I prefer str.extract:

df['dept'] = df['a'].str.extract(f"({'|'.join(depts)})").fillna("Unknown")

Or, building the same pattern with string concatenation:

df['dept'] = df['a'].str.extract('(' + '|'.join(depts) + ')').fillna("Unknown")

Both snippets output:

>>> df
           a     dept
0  ewfefPHYS     PHYS
1  QWQiPSYCH    PSYCH
2      fwfew  Unknown
>>> 
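As for why the loop in the question misbehaves: each pass rebuilds the whole column, so every row that does not match the current dept is reset to "Unknown", and only the last department's matches survive. If you want to keep a loop, a minimal sketch (with made-up sample rows, and regex=False as my own assumption that the names are plain substrings) is to set the default once and then overwrite only the matching rows:

```python
import pandas as pd

# Hypothetical sample data; the column name `a` follows the question.
df = pd.DataFrame({"a": ["ewfefPHYS", "QWQiPSYCH", "fwfew"]})
depts = ["PHYS", "PSYCH"]

# Set the default once, outside the loop.
df["dept"] = "Unknown"

for dept in depts:
    # Boolean mask of rows whose text contains this department name
    # (plain substring match, not a regex).
    mask = df["a"].str.contains(dept, regex=False)
    # Write only to the matching rows so earlier matches are preserved.
    df.loc[mask, "dept"] = dept

print(df)
```

With 171 names the single-regex versions above are likely faster, but this keeps the original structure and makes the overwrite problem easy to see.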