Error subtracting string based on another column in pandas

Time:03-24

I have two columns, and I want to keep only the first part of a string in one column, using the value in the other column as the delimiter.

Example: I would like to drop the colour, so I need to delete everything in "product_name" from the "storage" value onward.

I'm getting an error using this code:

import pandas as pd

aux = [['iphone 64gb white','64gb'],['samsung 128gb blue','128gb']]

df = pd.DataFrame (aux, columns = ['product_name', 'storage'])

df['product_name'] = df['product_name'].str.split(df['storage']).str[0]

TypeError: unhashable type: 'Series'

CodePudding user response:

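str.split() takes a single delimiter for the whole column, so passing a Series to it fails. If you need a different delimiter for each row, taken from another column, you can try apply() with axis=1. My proposed solution: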

df['product_name'] = df.apply(lambda x: x['product_name'].split(x['storage'])[0],axis=1)

Outputs:

  product_name storage
0      iphone     64gb
1     samsung    128gb

This removes the color: split() uses the row's storage value as the delimiter, and [0] keeps the first element of the resulting list.
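One detail worth checking in the apply() approach is whitespace. The sketch below rebuilds the question's DataFrame and compares the raw split result with a stripped version; the variable names raw and stripped are just illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    [["iphone 64gb white", "64gb"], ["samsung 128gb blue", "128gb"]],
    columns=["product_name", "storage"],
)

# Split each product name on its own row's storage value and keep the left part.
raw = df.apply(lambda row: row["product_name"].split(row["storage"])[0], axis=1)

# The left part still ends with the space that preceded the storage token,
# so strip trailing whitespace if an exact match like "iphone" is needed.
stripped = raw.str.rstrip()

print(raw.tolist())       # ['iphone ', 'samsung ']
print(stripped.tolist())  # ['iphone', 'samsung']
```

The printed DataFrame hides the difference because pandas right-aligns string columns, so the trailing space only shows up when you inspect the values directly.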

CodePudding user response:

I'd use zip() to pair the two columns and iterate row by row.

df.assign(
    product_split=
    [pn.split(s, 1)[0].rstrip()
     for pn, s in zip(df['product_name'], df['storage'])]
)

         product_name storage product_split
0   iphone 64gb white    64gb        iphone
1  samsung 128gb blue   128gb       samsung

If you want to assign the result back to the same column, go ahead and do:

df['product_name'] = [
    pn.split(s, 1)[0].rstrip()
    for pn, s in zip(df['product_name'], df['storage'])
]

df

  product_name storage
0       iphone    64gb
1      samsung   128gb
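If some rows might not contain their storage value at all, the same zip() idea can be wrapped in a small function. This is only a sketch: strip_after_storage is a hypothetical helper name, and the third row ("nokia black") is made-up data added to show the fallback behaviour.

```python
import pandas as pd


def strip_after_storage(name: str, storage: str) -> str:
    """Hypothetical helper: keep the text before the first occurrence of storage."""
    if not storage:
        # An empty separator would raise ValueError, so return the name as-is.
        return name
    # partition() returns the whole string as the head when storage is absent.
    head, _, _ = name.partition(storage)
    return head.rstrip()


df = pd.DataFrame(
    [
        ["iphone 64gb white", "64gb"],
        ["samsung 128gb blue", "128gb"],
        ["nokia black", "32gb"],  # assumed row: storage value not in the name
    ],
    columns=["product_name", "storage"],
)

df["product_name"] = [
    strip_after_storage(pn, s)
    for pn, s in zip(df["product_name"], df["storage"])
]

print(df["product_name"].tolist())  # ['iphone', 'samsung', 'nokia black']
```

Rows whose name does not contain the storage value are left unchanged apart from trailing whitespace, which may or may not be what you want for your data.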