Counting rows by sampling in a pandas dataframe

I have a pandas dataframe with over 1000 rows, and I want to read values from it 10 rows at a time. For example, I want to count the number of times Logic is 1 in the first 10 rows, then in the next 10 rows, and so on.

Time	Logic
1	0
2	1
3	0
4	0
.	.
.	.
.	.
997	1
998	0
999	0

User response:

If you just need an array, use the underlying numpy array (the reshape only works when the number of rows is a multiple of 10):

df['Logic'].to_numpy().reshape(-1,10).sum(1)

output:

array([8, 3, 6, 6, 7, 6, 3, 5, 7, 5, 4, 3, 9, 3, 7, 5, 4, 4, 3, 3, 4, 4,
       5, 5, 5, 7, 6, 5, 5, 5, 7, 8, 6, 7, 4, 5, 4, 6, 6, 7, 6, 7, 2, 6,
       6, 3, 3, 7, 4, 6, 5, 4, 5, 4, 3, 4, 9, 7, 4, 3, 5, 6, 5, 4, 5, 6,
       6, 8, 4, 4, 7, 7, 5, 4, 3, 5, 7, 4, 3, 3, 5, 3, 5, 8, 6, 5, 6, 6,
       5, 5, 7, 3, 3, 4, 7, 2, 2, 4, 6, 1])
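
If the row count is not a multiple of the block size, reshape(-1, 10) raises a ValueError. One possible workaround, shown here only as a sketch, is np.add.reduceat, which sums between explicit start offsets and lets the last block be shorter. The frame df_small and its 25 rows are made up for illustration and are not from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical 25-row frame (not from the question), so the last block is incomplete
df_small = pd.DataFrame({'Time': np.arange(25) + 1,
                         'Logic': [1, 0] * 12 + [1]})

N = 10
values = df_small['Logic'].to_numpy()

# Start offset of each block: 0, 10, 20
starts = np.arange(0, len(values), N)

# reduceat sums values[starts[i]:starts[i+1]]; the final block runs to the end
block_sums = np.add.reduceat(values, starts)
print(block_sums)  # [5 5 3]
```

The last entry covers only the 5 leftover rows, so compare it with the others with that in mind.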

With pandas, you could also use groupby sum:

df.groupby(df.index//10)['Logic'].sum()

or, if you don't have a range index:

import numpy as np
df.groupby(np.arange(len(df))//10)['Logic'].sum()
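
To see why the positional version matters, here is a small sketch on a made-up frame (df_demo, its index labels, and the block size of 3 are all assumptions chosen for illustration). When the index is not 0, 1, 2, ..., grouping by df.index//N groups by label rather than by row position, so the blocks no longer hold N consecutive rows.

```python
import numpy as np
import pandas as pd

# Hypothetical frame whose index is not 0..n-1 (for instance after filtering rows)
df_demo = pd.DataFrame({'Logic': [1, 1, 0, 1, 0, 0, 1]},
                       index=[3, 7, 8, 12, 20, 21, 30])

N = 3  # small block size, for illustration only

# Groups by row position: rows 0-2, rows 3-5, row 6
by_position = df_demo.groupby(np.arange(len(df_demo)) // N)['Logic'].sum()

# Groups by index label // N, which does not correspond to consecutive blocks here
by_label = df_demo.groupby(df_demo.index // N)['Logic'].sum()

print(by_position.tolist())  # [2, 1, 1]
print(len(by_label))         # 6
```

Calling df.reset_index(drop=True) first would be another way to get a range index back, if changing the index is acceptable.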

Example:

# reproducible input
import numpy as np
import pandas as pd
np.random.seed(0)
df = pd.DataFrame({'Time': np.arange(1000) + 1,
                   'Logic': np.random.choice([0,1], 1000)})

# calculation
N = 10
out = df.groupby(df.index//N)['Logic'].sum()

output: