date consumption customer_id
2018-01-01 12 111
2018-01-02 12 111
*2018-01-03* 14 111
*2018-01-05* 12 111
2018-01-06 45 111
2018-01-07 34 111
2018-01-01 23 112
2018-01-02 23 112
2018-01-03 45 112
2018-01-04 34 112
2018-01-05 23 112
2018-01-06 34 112
2018-01-01 23 113
2018-01-02 34 113
2018-01-03 45 113
2018-01-04 34 113
The values in customer 111 is not continuous, it has missing value in 2018-01-04, so i want to remove all 111 from my dataframe in pandas.
date consumption customer_id
2018-01-01 23 112
2018-01-02 23 112
2018-01-03 45 112
2018-01-04 34 112
2018-01-05 23 112
2018-01-06 34 112
2018-01-01 23 113
2018-01-02 34 113
2018-01-03 45 113
2018-01-04 34 113
i want result like this ? how does it possible in pandas?
CodePudding user response:
You can compute the successive delta and check if any is greater than 1d:
drop = (pd.to_datetime(df['date'])
.groupby(df['customer_id'])
.apply(lambda s: s.diff().gt('1d').any())
)
out = df[df['customer_id'].isin(drop[~drop].index)]
Or with groupby.filter
:
df['date'] = pd.to_datetime(df['date'])
out = (df.groupby(df['customer_id'])
.filter(lambda d: ~d['date'].diff().gt('1d').any())
)
Output:
date consumption customer_id
6 2018-01-01 23 112
7 2018-01-02 23 112
8 2018-01-03 45 112
9 2018-01-04 34 112
10 2018-01-05 23 112
11 2018-01-06 34 112
12 2018-01-01 23 113
13 2018-01-02 34 113
14 2018-01-03 45 113
15 2018-01-04 34 113
If you the dates are not necessarily increasing, also check you cannot go back in time:
df['date'] = pd.to_datetime(df['date'])
out = (df.groupby(df['customer_id'])
.filter(lambda d: d['date'].diff().iloc[1:].eq('1d').all())
)