How do I filter a 2D array so that, when two clicks come one after another and are followed by a tocart, the first of those clicks is dropped? Example:
import pandas as pd

df = pd.DataFrame({
    'a': ['Jason', 'Jason', 'Boby', 'Boby', 'Boby', 'Boby', 'Boby', 'Cob'],
    'b': [1, 2, 5, 5, 4, 2, 1, 6],
    'c': ['x', 'y', 'z', 'x', 'y', 'd', 'd', 'z'],
    'd': ['click', 'click', 'tocart', 'click', 'tocart', 'click', 'click', 'tocart']
})
# Sort each user's rows by 'b'
df = df.groupby(["a"]).apply(lambda x: x.sort_values(["b"], ascending=True)).reset_index(drop=True)
# Pack b, c and d into one list per row
df['combine'] = df[['b', 'c', 'd']].values.tolist()
# Collect the per-row lists into one nested list per user
df = df[['a', 'combine']].groupby('a').agg(pd.Series.tolist).reset_index()
df
For example, in the case of Boby:
a | combine |
---|---|
Boby | [[1, d, click],[2, d, click], [4, y, tocart], [5, x, click],[5, z, tocart]] |
Cob | [[6, z, tocart]] |
I want to drop the first click from Boby's array because it is followed by one more click and then a tocart. Cob should not be in the resulting df because there is no "click" in his array, and Jason should not be in it because there is no "tocart" in his array.
The outcome I expect:
a | combine |
---|---|
Boby | [[2, d, click], [4, y, tocart], [5, x, click],[5, z, tocart]] |
CodePudding user response:
Would something like this work? Basically does more or less what you describe:
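import numpy as np

# df here is the original, un-aggregated DataFrame (columns a, b, c, d)
def slicing(y):
    # Keep only rows whose event differs from the previous row's event
    x = y[y['d'].shift() != y['d']].to_numpy()
    # Keep the group only if it contains both 'click' and 'tocart'
    if np.isin(['click', 'tocart'], x[:, -1]).all():
        return x
    else:
        return np.nan
out = df.sort_values(by='b').groupby('a').apply(slicing).dropna()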
Output:
a
Boby [[5, z, click], [5, x, tocart]]
dtype: object
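
Note that `y['d'].shift() != y['d']` keeps the first row of each run of equal events, whereas the question asks to keep the last click before a tocart. Here is a minimal sketch of that variant. It assumes the original, un-aggregated df, and it assumes ties in b are broken by d so that 'click' sorts before 'tocart' (my assumption, chosen to match the expected outcome in the question). The names `ordered`, `next_event`, `kept` and `result` are just illustrative:

```python
import pandas as pd

df = pd.DataFrame({
    'a': ['Jason', 'Jason', 'Boby', 'Boby', 'Boby', 'Boby', 'Boby', 'Cob'],
    'b': [1, 2, 5, 5, 4, 2, 1, 6],
    'c': ['x', 'y', 'z', 'x', 'y', 'd', 'd', 'z'],
    'd': ['click', 'click', 'tocart', 'click', 'tocart', 'click', 'click', 'tocart'],
})

# Assumption: order by user, then 'b', then 'd' (so 'click' precedes 'tocart' on ties in 'b').
ordered = df.sort_values(['a', 'b', 'd']).reset_index(drop=True)

# Event on the following row of the same user (NaN on each user's last row).
next_event = ordered.groupby('a')['d'].shift(-1)

# Drop every click that is immediately followed by another click.
kept = ordered[~((ordered['d'] == 'click') & (next_event == 'click'))]

# Keep only users whose remaining rows contain both event types.
kept = kept.groupby('a').filter(lambda g: {'click', 'tocart'} <= set(g['d']))

# One nested list of [b, c, d] per remaining user.
result = {
    name: group[['b', 'c', 'd']].values.tolist()
    for name, group in kept.groupby('a')
}
print(result)
```

With the sample data only Boby remains: Jason has no tocart and Cob has no click, so both are filtered out.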