I have the following dataframe:
Group from to
1 2 1
1 1 2
1 3 2
1 3 1
2 1 4
2 3 1
2 1 2
2 3 1
I want to create a 4th column that counts the occurrences of each unique (from, to) combination within each group, and drop any repeated combination within each group so that only one row remains.
Expected output:
Group from to weight
1 2 1 1
1 1 2 1
1 3 2 1
1 3 1 1
2 1 4 1
2 3 1 2
2 1 2 1
In the expected output, the second row with from 3, to 1 in group 2 was dropped because it is a duplicate, and the remaining row has a weight of 2.
CodePudding user response:
In your case we just need groupby with size:
out = df.groupby(df.columns.tolist()).size().to_frame(name='weight').reset_index()
Out[258]:
Group from to weight
0 1 1 2 1
1 1 2 1 1
2 1 3 1 1
3 1 3 2 1
4 2 1 2 1
5 2 1 4 1
6 2 3 1 2
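For anyone who wants to reproduce this, here is a minimal self-contained sketch. The frame is rebuilt by hand from the sample in the question, so the constructor call is an assumption about how df was created. Note that groupby sorts by the group keys by default, which is why the rows come out in a different order than in the expected output.

```python
import pandas as pd

# Sample data retyped from the question (assumed construction of df)
df = pd.DataFrame({
    "Group": [1, 1, 1, 1, 2, 2, 2, 2],
    "from":  [2, 1, 3, 3, 1, 3, 1, 3],
    "to":    [1, 2, 2, 1, 4, 1, 2, 1],
})

# Group on every column; size() counts the rows in each (Group, from, to) group.
# Default sort=True orders the result by the key values.
out = df.groupby(df.columns.tolist()).size().to_frame(name="weight").reset_index()
print(out)
```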
CodePudding user response:
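You can group by the 3 columns using .groupby() and take the size of each group with GroupBy.size(). Passing sort=False keeps the groups in the order they first appear, which matches the expected output: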
df_out = df.groupby(['Group', 'from', 'to'], sort=False).size().reset_index(name='weight')
Result:
print(df_out)
Group from to weight
0 1 2 1 1
1 1 1 2 1
2 1 3 2 1
3 1 3 1 1
4 2 1 4 1
5 2 3 1 2
6 2 1 2 1
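As a quick sanity check, the sketch below rebuilds the sample frame (the constructor call is an assumption, retyped from the question) and compares the row order of the sort=False result with what drop_duplicates() keeps. The names first_rows and same_order are only illustrative.

```python
import pandas as pd

# Sample data retyped from the question (assumed construction of df)
df = pd.DataFrame({
    "Group": [1, 1, 1, 1, 2, 2, 2, 2],
    "from":  [2, 1, 3, 3, 1, 3, 1, 3],
    "to":    [1, 2, 2, 1, 4, 1, 2, 1],
})

# sort=False keeps groups in order of first appearance
df_out = df.groupby(["Group", "from", "to"], sort=False).size().reset_index(name="weight")

# drop_duplicates() keeps the first occurrence of each row, in original order
first_rows = df.drop_duplicates().reset_index(drop=True)

# The key columns of df_out should list the same rows in the same order
same_order = df_out[["Group", "from", "to"]].values.tolist() == first_rows.values.tolist()
print(same_order)
```

If the original row order does not matter, the sort=False argument can be dropped.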