Calculate previous row within distinct groups

Time:08-05

For every person and each change in fruit, how could I create a boolean column that shows whether, within that person-and-fruit group, the purchased column has already been filled in on any row above?

df
   person  fruit    purchased  time     has_purchased_already_filled_in_for_the_group
0  amy     apple     stall     10:00    False (this is start of new group so nothing above)
1  amy     apple     counter   10:01    True (because stall been filled at 10:00)
2  amy     apple     store     10:01    True  (because stall and counter been filled above)
3  amy     banana    online    10:02    False (this is start of new group so nothing above)
4  amy     banana              10:03    True (because online filled)
5  amy     apple               10:04    True  (this is start of new group so nothing above)        
6  amy     apple   inperson    10:05    False (because the 10:04 apple purchase is not filled in)
7  ben  ...

I'm struggling to tell Python that the first and last apple groups are distinct, because bananas were bought in between.

CodePudding user response:

Is there any problem with simply iterating for this? The loop below still needs an extra check for when the person changes. You could also wrap the logic in a function and use df.apply or np.vectorize.

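dest_col = 'has_purchased'  # assumed name for the output column
gr = ''
for ind, val in df['fruit'].items():  # items() yields (index, value) pairs
    if ind == 0:
        # first row (assumes a default 0-based index): start of the first group
        gr = val
        df.at[ind, dest_col] = False
    elif val == gr:
        # same fruit as the row above
        df.at[ind, dest_col] = True
    else:
        # fruit changed: start of a new group
        gr = val
        df.at[ind, dest_col] = False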
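
To match the rule in the question, a loop like this would probably also need to track the person and whether any earlier row in the current run actually had purchased filled in. A minimal sketch, assuming the sample data from the question, empty cells stored as None, and a hypothetical output column named has_purchased:

```python
import pandas as pd

# Sample data from the question; empty purchased cells are assumed to be None.
df = pd.DataFrame({
    "person": ["amy"] * 7,
    "fruit": ["apple", "apple", "apple", "banana", "banana", "apple", "apple"],
    "purchased": ["stall", "counter", "store", "online", None, None, "inperson"],
    "time": ["10:00", "10:01", "10:01", "10:02", "10:03", "10:04", "10:05"],
})

flags = []
current_key = None    # (person, fruit) of the run currently being scanned
seen_filled = False   # has an earlier row in this run had purchased filled?

for person, fruit, purchased in zip(df["person"], df["fruit"], df["purchased"]):
    key = (person, fruit)
    if key != current_key:
        # A change of person or fruit starts a new run, so nothing is above yet.
        current_key = key
        seen_filled = False
    flags.append(seen_filled)
    # Update only after recording the flag, so a row never counts itself.
    if pd.notna(purchased):
        seen_filled = True

df["has_purchased"] = flags  # hypothetical column name
print(df)
```

Because the key is the (person, fruit) pair, the second apple run at 10:04 is treated as a new group even though the person has not changed.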

CodePudding user response:

You could identify changed groups (person or fruit) and assign a group number to those and then do a groupby on that.

Identify changed groups and number them

chg_grp = ((df['person'] != df['person'].shift()) | (df['fruit'] != df['fruit'].shift())).cumsum()
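
To see what that expression produces, here is a small sketch using only the person and fruit columns from the question's sample data. The intermediate name starts_new_run is introduced here purely for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "person": ["amy"] * 7,
    "fruit": ["apple", "apple", "apple", "banana", "banana", "apple", "apple"],
})

# A row starts a new run when person or fruit differs from the row above.
# shift() puts NaN in the first row, so the first comparison is always True.
starts_new_run = (df["person"] != df["person"].shift()) | (df["fruit"] != df["fruit"].shift())

# The cumulative sum of the True values numbers the runs 1, 2, 3, ...
chg_grp = starts_new_run.cumsum()
print(chg_grp.tolist())  # [1, 1, 1, 2, 2, 3, 3]
```

The two apple runs get different numbers (1 and 3) because the banana run sits between them, which is what keeps them distinct in the groupby.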

Groupby those groups and set has_purchased

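def b(x):
    # True once purchased has been filled on some earlier row of the group
    rc = ~x.ffill().isna().shift(fill_value=False)
    # the first row of a group has nothing above it
    rc.iloc[0] = False
    return rc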

dft = df.assign(has_purchased=df.groupby(chg_grp)['purchased'].transform(b))
print(dft)

Result

  person   fruit purchased   time  has_p  has_purchased
0    amy   apple     stall  10:00  False          False
1    amy   apple   counter  10:01   True           True
2    amy   apple     store  10:01   True           True
3    amy  banana    online  10:02  False          False
4    amy  banana       NaN  10:03   True           True
5    amy   apple       NaN  10:04   True          False
6    amy   apple  inperson  10:05  False          False

Note that has_p is the expected column from your original dataframe example. At 10:04, has_p differs from has_purchased. I am assuming that the True value at 10:04 was an error, given your problem definition.
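
As a cross-check, the same flags can probably be obtained without a custom function, by counting filled rows within each run. This is a different technique from the transform above, offered only as a sketch. It assumes the question's sample data with empty cells stored as None.

```python
import pandas as pd

df = pd.DataFrame({
    "person": ["amy"] * 7,
    "fruit": ["apple", "apple", "apple", "banana", "banana", "apple", "apple"],
    "purchased": ["stall", "counter", "store", "online", None, None, "inperson"],
    "time": ["10:00", "10:01", "10:01", "10:02", "10:03", "10:04", "10:05"],
})

# Number each consecutive run of identical (person, fruit) values.
chg_grp = ((df["person"] != df["person"].shift())
           | (df["fruit"] != df["fruit"].shift())).cumsum()

# 1 where purchased is filled in, 0 where it is empty.
filled = df["purchased"].notna().astype(int)

# Running count of filled rows within each run, minus the current row's own
# contribution, gives the number of filled rows strictly above.
filled_above = filled.groupby(chg_grp).cumsum() - filled

df["has_purchased"] = filled_above > 0
print(df)
```

On this sample the result agrees with the has_purchased column shown in the result table, including False at 10:04.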
