i have a dataframe like this:
df=pd.DataFrame({'id':[100,100,100,100,100,100,100,100,100,100,100,200,200,200,200,200,200]})
'''
id
0 100
1 100
2 100
3 100
4 100
5 100
6 100
7 100
8 100
9 100
10 100
11 200
12 200
13 200
14 200
15 200
'''
I want to assign a new id every 5 entry using the id column. For example Expected output:
out=pd.DataFrame({'100_1','100_2','200_1'})
'''
0
0 100_1 #100 is id. _1 is which group. (0-5)
1 100_2 #_2 is second period of 5. (5-10)
2 200_1
'''
For example if there are 15 of the same id (let's say id=400) the output should be like this:
400_1, 400_2, 400_3
CodePudding user response:
One not-so-elegant solution I can think of is using cumcount()
df['cumcount'] = df.groupby('id').cumcount('id')
cond = (df['cumcount'] - 1) % 5 == 0
(df[cond]['id'].astype(str) '_' (df[cond]['cumcount'] // 5 1).astype(str)).reset_index(drop=True)
CodePudding user response:
This could work:
# Set the flags when to create count
df['cumcount'] = df.groupby('id').cumcount('id') 1
df["val"] = np.where(df["cumcount"] % 5 == 0, 1, np.nan)
# Create the new df
df1 = df.dropna().copy().reset_index(drop=True)
df1["cumsum"] = df1.groupby('id')["val"].cumsum().astype(int)
df1["result"] = df1["id"].astype(str) "_" df1["cumsum"].astype(str)