In pandas dataframe, I'm trying to generate , 2 counters in a df.groupby on type_pgm :
=> counter1 looping from 1 to 3
=> counter2 counts everytime counter1 restart to 1
Here is a df_sample :
df_sample = pd.DataFrame([[1,"pgm_1","type_A"],
[1,"pgm_1","type_A"],
[1,"pgm_1","type_A"],
[2,"pgm_2","type_A"],
[2,"pgm_2","type_A"],
[2,"pgm_2","type_A"],
[2,"pgm_2","type_A"],
[2,"pgm_2","type_A"],
[3,"pgm_3","type_B"],
[3,"pgm_3","type_B"],
[3,"pgm_3","type_B"],
[3,"pgm_3","type_B"],
[4,"pgm_4","type_B"]
])
df_sample.columns = ["pgm_id","pgm_name","pgm_type"]
print(df_sample)
pgm_id pgm_name pgm_type
0 1 pgm_1 type_A
1 1 pgm_1 type_A
2 1 pgm_1 type_A
3 2 pgm_2 type_A
4 2 pgm_2 type_A
5 2 pgm_2 type_A
6 2 pgm_2 type_A
7 2 pgm_2 type_A
8 3 pgm_3 type_B
9 3 pgm_3 type_B
10 3 pgm_3 type_B
11 3 pgm_3 type_B
12 4 pgm_4 type_B
Here is the targeted df_target :
df_target = pd.DataFrame([[1,"pgm_1","type_A",1,1],
[1,"pgm_1","type_A",2,1],
[1,"pgm_1","type_A",3,1],
[2,"pgm_2","type_A",1,2],
[2,"pgm_2","type_A",2,2],
[2,"pgm_2","type_A",3,2],
[2,"pgm_2","type_A",1,3],
[2,"pgm_2","type_A",2,3],
[3,"pgm_3","type_B",1,1],
[3,"pgm_3","type_B",2,1],
[3,"pgm_3","type_B",3,1],
[3,"pgm_3","type_B",1,2],
[4,"pgm_4","type_B",2,2]
])
df_target.columns = ["pgm_id","pgm_name","pgm_type","counter1","counter2"]
print(df_target)
pgm_id pgm_name pgm_type counter1 counter2
0 1 pgm_1 type_A 1 1
1 1 pgm_1 type_A 2 1
2 1 pgm_1 type_A 3 1
3 2 pgm_2 type_A 1 2
4 2 pgm_2 type_A 2 2
5 2 pgm_2 type_A 3 2
6 2 pgm_2 type_A 1 3
7 2 pgm_2 type_A 2 3
8 3 pgm_3 type_B 1 1
9 3 pgm_3 type_B 2 1
10 3 pgm_3 type_B 3 1
11 3 pgm_3 type_B 1 2
12 4 pgm_4 type_B 1 2
I tried many waves unsuccessfull...
Thanks a lot in advance for your help
CodePudding user response:
I hope I've understood your question right:
df_sample["counter1"] = (
df_sample.groupby(["pgm_type", "pgm_name"])
.apply(lambda x: pd.Series(i % 3 1 for i in range(len(x))))
.values
)
df_sample["counter2"] = (
df_sample.groupby("pgm_type")
.apply(lambda x: pd.Series(i // 3 1 for i in range(len(x))))
.values
)
print(df_sample)
Prints:
pgm_id pgm_name pgm_type counter1 counter2
0 1 pgm_1 type_A 1 1
1 1 pgm_1 type_A 2 1
2 1 pgm_1 type_A 3 1
3 2 pgm_2 type_A 1 2
4 2 pgm_2 type_A 2 2
5 2 pgm_2 type_A 3 2
6 2 pgm_2 type_A 1 3
7 2 pgm_2 type_A 2 3
8 3 pgm_3 type_B 1 1
9 3 pgm_3 type_B 2 1
10 3 pgm_3 type_B 3 1
11 3 pgm_3 type_B 1 2
12 4 pgm_4 type_B 1 2