counting rows by sampling in a pandas dataframe

Time:02-16

I have a pandas DataFrame of over 1000 rows, and I want to read values 10 rows at a time. For example, I want to count how many times Logic is 1 in the first 10 rows, then in the next 10 rows, and so on.

Time  Logic
1     0
2     1
3     0
4     0
...   ...
997   1
998   0
999   0
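The computation described above can be written as a straightforward loop over 10-row slices with iloc. This is a minimal sketch on a small hypothetical frame (the values are made up for illustration, not the asker's data). It is slower than the vectorized answers below, but it makes the intended result explicit, including a shorter final block when the row count is not a multiple of 10.

```python
import pandas as pd

# Hypothetical sample data: 25 rows, so the last block has only 5 rows
df = pd.DataFrame({
    "Time": range(1, 26),
    "Logic": [1, 0, 0, 1, 1, 0, 1, 0, 0, 0,   # block 0: four 1s
              1, 1, 1, 0, 0, 0, 0, 0, 0, 1,   # block 1: four 1s
              0, 0, 1, 1, 0],                 # block 2 (partial): two 1s
})

N = 10  # block size
counts = []
for start in range(0, len(df), N):
    block = df["Logic"].iloc[start:start + N]   # next N rows (fewer at the end)
    counts.append(int((block == 1).sum()))      # how many rows have Logic == 1

print(counts)  # [4, 4, 2]
```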

Answer:

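If you just need an array, reshape the underlying NumPy array into rows of 10 and sum each row. Note that reshape(-1, 10) only works when the number of rows is an exact multiple of 10; otherwise it raises a ValueError: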

df['Logic'].to_numpy().reshape(-1, 10).sum(axis=1)

output:

array([8, 3, 6, 6, 7, 6, 3, 5, 7, 5, 4, 3, 9, 3, 7, 5, 4, 4, 3, 3, 4, 4,
       5, 5, 5, 7, 6, 5, 5, 5, 7, 8, 6, 7, 4, 5, 4, 6, 6, 7, 6, 7, 2, 6,
       6, 3, 3, 7, 4, 6, 5, 4, 5, 4, 3, 4, 9, 7, 4, 3, 5, 6, 5, 4, 5, 6,
       6, 8, 4, 4, 7, 7, 5, 4, 3, 5, 7, 4, 3, 3, 5, 3, 5, 8, 6, 5, 6, 6,
       5, 5, 7, 3, 3, 4, 7, 2, 2, 4, 6, 1])
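If the row count is not a multiple of 10 (the sample in the question ends at row 999), one option is to pad the array with zeros up to the next multiple of 10 before reshaping. Padded zeros add nothing to a sum, so the last entry is simply the count for the shorter final block. This is a sketch, not part of the original answer; block_sums is a hypothetical helper name.

```python
import numpy as np

def block_sums(values, n=10):
    """Sum consecutive blocks of n values; the last block may be shorter."""
    arr = np.asarray(values)
    pad = (-len(arr)) % n                   # zeros needed to reach a multiple of n
    arr = np.pad(arr, (0, pad))             # np.pad fills with 0 by default
    return arr.reshape(-1, n).sum(axis=1)   # one row per block, sum across columns

print(block_sums([1, 0, 1, 1, 0, 0, 1, 0, 0, 0, 1, 1, 0]))  # [4 2]
```

With the question's data this would be called as block_sums(df['Logic'].to_numpy()).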

With pandas, you can also group the rows by index // 10 (every block of 10 consecutive rows gets the same label) and sum each group:

df.groupby(df.index//10)['Logic'].sum()
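To see why this works: with the default RangeIndex, df.index // 10 maps rows 0 to 9 to label 0, rows 10 to 19 to label 1, and so on, so each group is one block of 10 rows, and a trailing partial block just becomes a smaller group. A small sketch with made-up data:

```python
import pandas as pd

# Hypothetical data: 23 rows with the default RangeIndex 0..22
df = pd.DataFrame({"Logic": [1, 0] * 11 + [1]})

labels = df.index // 10            # 0 for rows 0-9, 1 for rows 10-19, 2 for rows 20-22
print(labels.unique().tolist())    # [0, 1, 2]

out = df.groupby(labels)["Logic"].sum()
print(out.tolist())                # [5, 5, 2]
```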

Or, if the DataFrame does not have a default RangeIndex (0, 1, 2, ...), group by the row positions instead:

import numpy as np
df.groupby(np.arange(len(df))//10)['Logic'].sum()
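Because np.arange(len(df)) // 10 is based on row positions, it also works when the index is something else, such as the Time column. Summing Logic only counts the 1s because the column holds 0/1 values; if it could contain other values (an assumption beyond the question), comparing with .eq(1) first keeps the count correct. A sketch under those assumptions, with made-up data:

```python
import numpy as np
import pandas as pd

# Hypothetical data indexed by Time (1..12) instead of a default RangeIndex;
# Logic is assumed here to sometimes hold values other than 0 and 1
df = pd.DataFrame(
    {"Logic": [1, 2, 1, 0, 1, 1, 0, 2, 1, 0, 1, 2]},
    index=pd.Index(range(1, 13), name="Time"),
)

positions = np.arange(len(df)) // 10       # block number for each row position
is_one = df["Logic"].eq(1).astype(int)     # True/False -> 1/0
ones_per_block = is_one.groupby(positions).sum()
print(ones_per_block.tolist())             # [5, 1]
```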

Example:

# reproducible input
import numpy as np
import pandas as pd
np.random.seed(0)
df = pd.DataFrame({'Time': np.arange(1000) + 1,
                   'Logic': np.random.choice([0, 1], 1000)})

# calculation
N = 10
out = df.groupby(df.index//N)['Logic'].sum()

output:

0     8
1     3
2     6
3     6
4     7
     ..
95    2
96    2
97    4
98    6
99    1
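As a sanity check (not part of the original answer), the NumPy reshape result and the groupby result can be compared on the same seeded input; they should agree whenever the row count is a multiple of N.

```python
import numpy as np
import pandas as pd

# Same reproducible input as in the example above
np.random.seed(0)
df = pd.DataFrame({'Time': np.arange(1000) + 1,
                   'Logic': np.random.choice([0, 1], 1000)})

N = 10
via_groupby = df.groupby(df.index // N)['Logic'].sum()
via_reshape = df['Logic'].to_numpy().reshape(-1, N).sum(axis=1)

# Both give one count per block of N rows, in the same order
assert len(via_groupby) == len(df) // N
assert (via_groupby.to_numpy() == via_reshape).all()
print(via_groupby.sum() == df['Logic'].sum())  # True: every row is counted exactly once
```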