Home > Back-end >  Count values of each row in pandas dataframe only for consecutive numbers
Count values of each row in pandas dataframe only for consecutive numbers

Time:11-08

I got a pandas dataframe that looks like this: enter image description here

I want to count how many rows are for each id and print the result. The problem is I want to count that ONLY for consecutive numbers in "frame num".

For example: if frame num is: [1,2,3,45,47,122,123,124,125] and id is [1,1,1,1,1,1,1,1,1] it should print: 3 1 1 4 (and do that for EACH id).

Is there any way to do that? I got crazy trying to figure it out! To count rows for each id should be enought to use a GROUP BY. But with this new condition its difficult.

CodePudding user response:

You can use pandas.DataFrame.shift() for finding consecutive numbers then use itertools.groupby for creating a list of counting consecutive.

import pandas as pd
from itertools import chain
from itertools import groupby

# Example input dataframe
df = pd.DataFrame({
    'num' : [1,2,3,45,47,122,123,124,125,1,2,3,45,47,122,123,124,125], 
    'id' : [1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2]
})

df['s'] = (df['num']-1 == df['num'].shift()) | (df['num'] 1 == df['num'].shift(-1))

res = df.groupby('id')['s'].apply(lambda g: list(chain.from_iterable([[len(list(group))] if key else [1]*len(list(group))
                                                                      for key, group in groupby( g )])))

print(res)

Output:

id
1    [3, 1, 1, 4]
2    [3, 1, 1, 4]
Name: s, dtype: object
  • Related