Pandas: count number of times between specific range-CodePudding

I have a dataset of part numbers, and for each of those part numbers, they were replaced at a certain cycle count. For example, in the below table is an example of my data, first column being the part number, and the second being the cycle count it was replaced (ie: part abc was replaced at 100 cycles, and then again at 594, and then at 1230, and 2291):

Part #	Cycle Count
abc	100
abc	594
abc	1230
abc	2291
def	329
def	2001
ghi	1671
jkl	29
jkl	190
mno	700
mno	1102
pqr	2991

With this data, I am trying to create a new table that counts the number of times a part was replaced within certain cycle ranges, and create a table such as the example below:

Part #	Cycle Count Range (1-1000)	Cycle Count Range (1001-2000)	Cycle Count Range (2001-3000)
abc	2	1	1
def	1	0	1
ghi	0	1	0
jkl	2	0	0
mno	1	1	0
pqr	0	0	1

I tried doing this in SQL but I am not proficient enough to do it.

CodePudding user response：

We can use np.arange to create some Cycle Count Range bins and pd.cut to assign the values of Cycle Count to said bins.

from io import StringIO
import numpy as np
import pandas as pd


df = pd.read_csv(StringIO("""Part # Cycle Count
abc 100
abc 594
abc 1230
abc 2291
def 329
def 2001
ghi 1671
jkl 29
jkl 190
mno 700
mno 1102
pqr 2991"""), sep="\\t ")

# make bins of size 1_000 using numpy.arange
bins = np.arange(0, df["Cycle Count"].max() 1_000, step=1_000)

# bin the Cycle Count series
df["Cycle Count Range"] = pd.cut(df["Cycle Count"], bins, retbins=False)

# count the Cycle Counts within the Part #/Cycle Count Range groups
out = df.pivot_table(
    values="Cycle Count",
    index="Part #",
    columns="Cycle Count Range",
    aggfunc="count"
)

print(out)

Cycle Count Range  (0, 1000]  (1000, 2000]  (2000, 3000]
Part #                                                  
abc                        2             1             1
def                        1             0             1
ghi                        0             1             0
jkl                        2             0             0
mno                        1             1             0
pqr                        0             0             1

CodePudding user response：

With crosstab and interval_range:

#This is number of periods
p = math.ceil((df['Cycle Count'].max() - df['Cycle Count'].min())/1000)

#These are bins in which pd.cut needs to cut the series into
b = pd.interval_range(start=1, freq=1000, periods=p, closed='neither')

#Then cut the series
df['Cycle Count Range'] = pd.cut(df['Cycle Count'], b)

#Do a crosstab to compute the aggregation.
out = pd.crosstab(df['Part#'], df['Cycle Count Range'])

print(out):

Cycle Count Range  (1, 1001)  (1001, 2001)  (2001, 3001)
Part#                                                   
abc                        2             1             1
def                        1             0             0
ghi                        0             1             0
jkl                        2             0             0
mno                        1             1             0
pqr                        0             0             1