I have a dataframe where for each user I have always 2 pairs by goods_type and business KPI for EACH USER such as:
user | amount | goods_type | business
1 | $10 | used | cost
1 | $30 | used | revenue
1 | $15 | new | cost
1 | $50 | new | revenue
Now, I want to add a segment column using np.where in pandas but my logic applies only to rows where business = revenue. As far as the cost rows go, the segment value should be the same as the revenue for the SAME USER and SAME GOODS_TYPE pair, not based on the np.where clause. The segment value for all "revenue" rows is determined like this:
np.where( (df.amount < 15) & (df.goods_type =='used') , 'low', 'high' )
How would I do this so it looks like this:
user | amount | goods_type | business | segment
1 | $10 | used | cost | low (this should be based on the segment value of the associated revenue row below!)
1 | $30 | used | revenue. | low (this comes from the logic)
1 | $15 | new | cost | high (this should be based on the segment value of the associated revenue row below!)
1 | $50 | new | revenue | high (this comes from the logic)
CodePudding user response:
First, replace the dollar sign with blank. Use np.where
to replace your values as you did before. Then set the segment values in business that are in the same row as cost to NaN values and fill them with the next value, using bfill
.
df.amount=df.amount.str.replace('$','').astype(int)
df['segment'] = np.where(((df.amount <= 30) & (df.goods_type =='used')), 'low', 'high')
df.loc[df['business']=='cost', 'segment'] = np.nan
df.fillna(method='bfill', inplace=True)
Output:
user amount goods_type business segment
0 1 10 used cost low
1 1 30 used revenue low
2 1 15 new cost high
3 1 50 new revenue high