How to replace values by users correctly based on reference value?-CodePudding

I have a dataframe where for each user I have always 2 pairs by goods_type and business KPI for EACH USER such as:

user | amount   | goods_type | business
1    | $10      | used       | cost
1    | $30      | used       | revenue
1    | $15      | new        | cost
1    | $50      | new        | revenue

Now, I want to add a segment column using np.where in pandas but my logic applies only to rows where business = revenue. As far as the cost rows go, the segment value should be the same as the revenue for the SAME USER and SAME GOODS_TYPE pair, not based on the np.where clause. The segment value for all "revenue" rows is determined like this:

np.where( (df.amount < 15) & (df.goods_type =='used') , 'low', 'high' )

How would I do this so it looks like this:

user | amount   | goods_type | business | segment
1    | $10      | used       | cost     | low (this should be based on the segment value of the associated revenue row below!)
1    | $30      | used       | revenue. | low (this comes from the logic)
1    | $15      | new        | cost     | high (this should be based on the segment value of the associated revenue row below!)
1    | $50      | new        | revenue  | high (this comes from the logic)

CodePudding user response：

First, replace the dollar sign with blank. Use np.where to replace your values as you did before. Then set the segment values in business that are in the same row as cost to NaN values and fill them with the next value, using bfill.

df.amount=df.amount.str.replace('$','').astype(int)
df['segment'] = np.where(((df.amount <= 30) & (df.goods_type =='used')), 'low', 'high')
df.loc[df['business']=='cost', 'segment'] = np.nan
df.fillna(method='bfill', inplace=True)

Output:

    user    amount  goods_type  business    segment
0   1         10    used    cost    low
1   1         30    used    revenue low
2   1         15    new cost    high
3   1         50    new revenue high