Error subtracting string based on another column in pandas

I have 2 columns. I want to keep only the first part of the string in one column, using the value in the other column as the delimiter.

Example: I would like to remove the colour, so I need to drop everything in "product_name" from the value in the "storage" column onwards.

I'm getting an error using this code:

import pandas as pd

aux = [['iphone 64gb white','64gb'],['samsung 128gb blue','128gb']]

df = pd.DataFrame(aux, columns=['product_name', 'storage'])

df['product_name'] = df['product_name'].str.split(df['storage']).str[0]

TypeError: unhashable type: 'Series'

CodePudding user response：

If you are going to use multiple columns, perhaps you can try using apply(). My proposed solution:

df['product_name'] = df.apply(lambda x: x['product_name'].split(x['storage'])[0], axis=1)

Outputs:

  product_name storage
0      iphone     64gb
1     samsung    128gb

This removes the color: split() uses each row's storage value as the delimiter, and only the first element of the resulting list is kept.
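One detail the printed output hides: splitting on the storage value leaves the space that preceded it, so the result is 'iphone ' rather than 'iphone'. A minimal sketch of stripping it, reusing the question's data (the intermediate name `before` is only for illustration):

```python
import pandas as pd

df = pd.DataFrame(
    [['iphone 64gb white', '64gb'], ['samsung 128gb blue', '128gb']],
    columns=['product_name', 'storage'],
)

# Split each name on that row's own storage value and keep the part before it.
before = df.apply(lambda x: x['product_name'].split(x['storage'])[0], axis=1)
print(before.tolist())  # ['iphone ', 'samsung ']  (note the trailing space)

# Remove the trailing whitespace left in front of the delimiter.
df['product_name'] = before.str.rstrip()
print(df['product_name'].tolist())  # ['iphone', 'samsung']
```

Whether the trailing space matters depends on what you do next; it usually shows up later as failed equality checks or merges.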

CodePudding user response：

I'd use zip to iterate over both columns in parallel, row by row.

df.assign(
    product_split=
    [pn.split(s, 1)[0].rstrip()
     for pn, s in zip(df['product_name'], df['storage'])]
)

         product_name storage product_split
0   iphone 64gb white    64gb        iphone
1  samsung 128gb blue   128gb       samsung

If you want to assign the result back to the same column, go ahead and do:

df['product_name'] = [
    pn.split(s, 1)[0].rstrip()
    for pn, s in zip(df['product_name'], df['storage'])
]

df

  product_name storage
0       iphone    64gb
1      samsung   128gb
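If some rows might have a missing storage value, the comprehension above would fail on them, since str.split() needs a non-empty string delimiter. A possible variant, assuming you want to leave such rows unchanged, is sketched below. The helper name `name_before_storage` and the extra 'nokia black' row are made up for illustration and are not part of the question.

```python
import pandas as pd

def name_before_storage(product_name, storage):
    """Return the text before the first occurrence of storage, right-stripped."""
    # Leave the name untouched when either value is missing or the delimiter is empty.
    if not isinstance(product_name, str) or not isinstance(storage, str) or not storage:
        return product_name
    # partition() splits at the first occurrence only; if storage is not found,
    # head is the whole name.
    head, _sep, _tail = product_name.partition(storage)
    return head.rstrip()

df = pd.DataFrame(
    [['iphone 64gb white', '64gb'],
     ['samsung 128gb blue', '128gb'],
     ['nokia black', None]],  # hypothetical row with no storage value
    columns=['product_name', 'storage'],
)

df['product_name'] = [
    name_before_storage(pn, s)
    for pn, s in zip(df['product_name'], df['storage'])
]
print(df['product_name'].tolist())  # ['iphone', 'samsung', 'nokia black']
```

str.partition() behaves like split(s, 1) here, but always returns exactly three parts, so there is no indexing to get wrong.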