Home > OS >  Remove string from one column if present in string of another column pandas
Remove string from one column if present in string of another column pandas

Time:02-11

I feel like I'm close but I am looking for something like this where the new column writes the company name without the city in it:

                           company  postal_code  name state         city  \
2000-01-01          abc gresham co        97080  john    mi      gresham   
2000-01-01             startup llc        97080  jeff    hi     portland   
2001-01-01  beaverton business biz        99999   sam    ca    beaverton   
2002-01-01                 andy co        92222  joey    or  los angeles   

                 new_col  
2000-01-01        abc co  
2000-01-01   startup llc  
2001-01-01  business biz  
2002-01-01       andy co  

This is what I have so far but it's throwing a TypeError: unhashable type: 'Series':

for idx in df1.index:
    if df1["city"].loc[idx] in df1['company'].loc[idx]:
        print("figure out how to print to new column the company name without the city included")
    else:
        print(df1['company'].loc[idx])

Thanks!

CodePudding user response:

Here's a solution:

df = (
    df.reset_index()
    .assign(new_col=df.reset_index()
        .pipe(lambda x: x.assign(x=x['company'].str.split(' ')))
        .explode('x')
        .loc[lambda x: x['x'] != x['city'], 'x']
        .groupby(level=0)
        .agg(list)
        .str.join(' ')
    )
    .set_index('index')
)

Output:

>>> df
                           company  postal_code  name state         city       new_col
index                                                                                 
2000-01-01          abc gresham co        97080  john    mi      gresham        abc co
2000-01-01             startup llc        97080  jeff    hi     portland   startup llc
2001-01-01  beaverton business biz        99999   sam    ca    beaverton  business biz
2002-01-01                 andy co        92222  joey    or  los angeles       andy co

One-liner:

df = df.reset_index().assign(new_col=df.reset_index().pipe(lambda x: x.assign(x=x['company'].str.split(' '))).explode('x').loc[lambda x: x['x'] != x['city'], 'x'].groupby(level=0).agg(list).str.join(' ')).set_index('index')
  • Related