Home > Blockchain >  Group by to maintain pattern and keep count in each group in pandas {SQL DENSE RANK}
Group by to maintain pattern and keep count in each group in pandas {SQL DENSE RANK}

Time:08-19

df

ColA ColB ColC CodD ColF 
 1.    2.   3.   1.   T
 2.    2.   3.   1.   T
 3.    2.   3.   1.   F
 4.    2.   3.   1.   F
 5.    2.   3.   1.   T
 6.    2.   3.   2.   T
 7.    2.   3.   2.   T
 8.    2.   3.   2.   T
 9.    2.   3.   2.   F
 10.   2.   3.   2.   F
 11.   2.   3.   2.   F
 12.   2.   3.   2.   T
 13.   2.   3.   2.   T

desired output

 ColB ColC CodD ColF grp grpcount

 2.   3.   1.   T.    1.   2

 2.   3.   1.   F.    2.   2

 2.   3.   1.   T.    3.   1

 2.   3.   2.   T.    1.   3

 2.   3.   2.   F.    2.   3

 2.   3.   2.   T.    3.   2

I tried

df_2 = df.sort_values(['ColA'],ascending=True).groupby(['ColB','ColC','ColD','ColF'])['ColA'].count().reset_index(name='grpcount')

the T and F in each group returns just two groups but I want the pattern of each grouping. any help. So even though we have T , F in a group. the sequence should be counted and maintained.

recreating the data :

Data = {
        'ColA':['1','2','3','4','5','6','7','8','9','10','11','12','13'],
        'ColB':['2','2','2','2','2','2','2','2','2','2','2','2','2'],
        'ColC':['3','3','3','3','3','3','3','3','3','3','3','3','3'],
        'ColD':['1','1','1','1','1','2','2','2','2','2','2','2','2'],
        'ColF':['T','T','F','F','T','T','T','T','F','F','F','T','T']}

df = pd.DataFrame(Data,columns=['ColA','ColB','ColC','ColD','ColF'])
print(df)

CodePudding user response:

Does this work for you?

# shift ColF and flag changes
df["Shift"] = df.ColF != df.ColF.shift()

# cumsum the bools in Shift to flag when the sequence changes (i.e. make groups)
df["grp"] = df.groupby([
    "ColB",
    "ColC",
    "ColD",
])["Shift"].cumsum()

# groupby the cols and count!
out = df.groupby([
    "ColB",
    "ColC",
    "ColD",
    "ColF",
    "grp"
], as_index=False).ColA.count().rename(columns={
    "ColA": "grpcnt"
})

# sort to match desired output
out.sort_values([
    "ColB",
    "ColC",
    "ColD",
    "grp",
    "grpcnt",
    "ColF",
], inplace=True)

out

output

Reference: https://stackoverflow.com/a/53542712/6509519

  • Related