Home > Net >  How to get first and last value of each group in pandas with no group by column?
How to get first and last value of each group in pandas with no group by column?

Time:03-08

Hi Folks, I need to take first and last value from each group(where the counter value is 1 consecutively )

My Input :-

TIMESTAMP,COUNTER        
2019-03-19:13:50,0
2019-03-19:14:00,0
2019-03-19:14:10,0
2019-03-19:14:20,0
2019-03-19:14:30,0
2019-03-19:14:40,1
2019-03-19:14:50,1
2019-03-19:15:00,1
2019-03-19:15:10,0
2019-03-19:15:20,0
2019-03-19:15:30,0
2019-03-19:15:40,1
2019-03-19:15:50,1
2019-03-19:16:00,1

Desired Output :-

2019-03-19:14:40,2019-03-19:15:00
2019-03-19:15:40,2019-03-19:16:00

CodePudding user response:

You can aggregate by consecutive 1 values with aggregate minimal and maximal TIMESTAMP:

m = df['COUNTER'].ne(1)

df = (df[~m].groupby((m | m.shift()).cumsum())
            .agg(TIMESTAMP_min=('TIMESTAMP','min'), TIMESTAMP_max=('TIMESTAMP','max'))
            .reset_index(drop=True))
print (df)
      TIMESTAMP_min     TIMESTAMP_max
0  2019-03-19:14:40  2019-03-19:15:00
1  2019-03-19:15:40  2019-03-19:16:00

EDIT: Test groups:

print (df)

           TIMESTAMP  COUNTER
0   2019-03-19:13:50        0
1   2019-03-19:14:00        0
2   2019-03-19:14:10        0
3   2019-03-19:14:20        0
4   2019-03-19:14:30        0
5   2019-03-19:14:40        1
6   2019-03-19:14:50        1
7   2019-03-19:15:00        1
8   2019-03-19:15:10        0
9   2019-03-19:15:20        0
10  2019-03-19:15:30        0
11  2019-03-19:15:40        1
12  2019-03-19:15:50        1
13  2019-03-19:16:00        1


m = df['COUNTER'].ne(1)
print ((m | m.shift()).cumsum()[~m])
5      6
6      6
7      6
11    10
12    10
13    10
Name: COUNTER, dtype: int32

CodePudding user response:

You could use GroupBy.agg.

Assuming you have strings, and given your YYYY-MM-DD:HH:MM format, you can directly use min/max to get the first/last as string sorting will give you logical time sorting.

(df.loc[df['COUNTER'].eq(1), 'TIMESTAMP']
   .groupby(df['COUNTER'].diff().eq(1).cumsum())
   .agg(lambda x: ','.join((x.min(), x.max())))
)

output:

COUNTER
1    2019-03-19:14:40,2019-03-19:15:00
2    2019-03-19:15:40,2019-03-19:16:00
Name: TIMESTAMP, dtype: object
  • Related