I have two columns, and I want to keep only the first part of the string in one column, using the value in the other column as the delimiter.
Example: I would like to delete the colour, so I need to remove from "product_name" everything that follows the value in the "storage" column.
I'm getting an error using this code:
import pandas as pd
aux = [['iphone 64gb white','64gb'],['samsung 128gb blue','128gb']]
df = pd.DataFrame(aux, columns=['product_name', 'storage'])
df['product_name'] = df['product_name'].str.split(df['storage']).str[0]
TypeError: unhashable type: 'Series'
CodePudding user response:
If you are going to use multiple columns, perhaps you can try using apply(). My proposed solution:
df['product_name'] = df.apply(lambda x: x['product_name'].split(x['storage'])[0],axis=1)
Outputs:
  product_name storage
0      iphone     64gb
1     samsung    128gb
This effectively removes the color by keeping the first element of the list returned by split(), with the storage column's value as the delimiter.
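For anyone who wants to run this end to end, here is a minimal self-contained sketch of the apply() approach. As far as I can tell, .str.split() takes one pattern for the whole column, which would explain why passing a Series fails, so the split has to happen row by row. Note that splitting on the storage string keeps the space in front of it ('iphone ' rather than 'iphone'); the .str.rstrip() step and the names raw and clean are my additions, not part of the answer above.

```python
import pandas as pd

aux = [['iphone 64gb white', '64gb'], ['samsung 128gb blue', '128gb']]
df = pd.DataFrame(aux, columns=['product_name', 'storage'])

# Row-wise split: each row uses its own 'storage' value as the delimiter.
raw = df.apply(lambda x: x['product_name'].split(x['storage'])[0], axis=1)

# The space before the delimiter survives the split; rstrip() removes it.
clean = raw.str.rstrip()

df['product_name'] = clean
```

If the trailing space does not matter for your use case, the rstrip step can be dropped.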
CodePudding user response:
I'd use zip to iterate over the two columns as tuples, row by row.
df.assign(
    product_split=[
        pn.split(s, 1)[0].rstrip()
        for pn, s in zip(df['product_name'], df['storage'])
    ]
)
        product_name storage product_split
0  iphone 64gb white    64gb        iphone
1 samsung 128gb blue   128gb       samsung
If you want to assign the result back to the same column, go ahead and do:
df['product_name'] = [
    pn.split(s, 1)[0].rstrip()
    for pn, s in zip(df['product_name'], df['storage'])
]
df
  product_name storage
0       iphone    64gb
1      samsung   128gb
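One caveat with the list comprehension: str.split raises if the delimiter is missing (None/NaN) or an empty string. Below is a rough sketch of a guard, assuming you want such rows left unchanged. The helper name keep_before and the third row ('nokia black' with no storage) are made up for illustration only.

```python
import pandas as pd

def keep_before(name, delimiter):
    # Hypothetical helper: return the name untouched when either value
    # is not a string (e.g. None/NaN) or the delimiter is empty.
    if not isinstance(name, str) or not isinstance(delimiter, str) or not delimiter:
        return name
    # Keep the text before the first occurrence and drop the trailing space.
    return name.split(delimiter, 1)[0].rstrip()

# Illustrative data; the last row has no storage value.
df = pd.DataFrame({
    'product_name': ['iphone 64gb white', 'samsung 128gb blue', 'nokia black'],
    'storage': ['64gb', '128gb', None],
})

df['product_name'] = [
    keep_before(pn, s)
    for pn, s in zip(df['product_name'], df['storage'])
]
```

Rows where the storage string does not occur in the name also come back unchanged, since split() then returns the whole string as its only element.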