Home > Software design >  pandas groupby and give a new id every n times
pandas groupby and give a new id every n times

Time:12-30

i have a dataframe like this:

df=pd.DataFrame({'id':[100,100,100,100,100,100,100,100,100,100,100,200,200,200,200,200,200]})
'''
    id
0   100
1   100
2   100
3   100
4   100
5   100
6   100
7   100
8   100
9   100
10  100
11  200
12  200
13  200
14  200
15  200

'''

I want to assign a new id every 5 entry using the id column. For example Expected output:

out=pd.DataFrame({'100_1','100_2','200_1'})
'''
    0
0   100_1  #100 is id. _1 is which group. (0-5)
1   100_2  #_2 is second period of 5. (5-10)
2   200_1

'''

For example if there are 15 of the same id (let's say id=400) the output should be like this: 400_1, 400_2, 400_3

CodePudding user response:

One not-so-elegant solution I can think of is using cumcount()

df['cumcount'] = df.groupby('id').cumcount('id')
cond = (df['cumcount'] - 1) % 5 == 0
(df[cond]['id'].astype(str)   '_'   (df[cond]['cumcount'] // 5   1).astype(str)).reset_index(drop=True)

CodePudding user response:

This could work:

# Set the flags when to create count
df['cumcount'] = df.groupby('id').cumcount('id')   1
df["val"] = np.where(df["cumcount"] % 5 == 0, 1, np.nan)

# Create the new df
df1 = df.dropna().copy().reset_index(drop=True)
df1["cumsum"] = df1.groupby('id')["val"].cumsum().astype(int)
df1["result"] = df1["id"].astype(str)   "_"    df1["cumsum"].astype(str)
  • Related