I have two lists:
task=[1,1,1,1,2,2,3,4,5,5]
hours=[1,7,6,2,3,6,5,2,4,6]
Suppose the distinct values in task are 1, 2, 3, 4, 5. Counting positions from 1, I want to sum the following: the max of the first four entries of hours (because 1 is repeated 4 times in task), the max of entries 5 and 6 of hours (because 2 is repeated 2 times in task), the max of entry 7, the max of entry 8, and the max of entries 9 and 10. In other words, for each distinct value in task I need the max of the hours at the matching positions, and then the sum of those maxima.
CodePudding user response:
You can build a DataFrame from your lists, group by task, and aggregate with sum:
import pandas as pd
task=[1,1,1,1,2,2,3,4,5,5]
hours=[1,7,6,2,3,6,5,2,4,6]
df = pd.DataFrame({'task': task, 'hours': hours})
# Pass the aggregation by name; recent pandas versions warn when given the builtin sum
print(df.groupby('task').agg('sum'))
Output:
      hours
task
1        16
2         9
3         5
4         2
5        10
Edit: It seems I misunderstood the question.
You can use the same logic to find the max of hours for each task and then sum those values:
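# Max of hours within each task group
max_val = df.groupby('task').agg('max')
# Select the column first so that sum() returns a scalar rather than a one-element Series
print(int(max_val['hours'].sum()))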
Output:
# max_val:
      hours
task
1         7
2         6
3         5
4         2
5         6
# sum: 26
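If you would rather not depend on pandas, the same group-max-then-sum idea can be sketched in plain Python. This is only a minimal illustration, not part of the answer above: the helper name sum_of_group_max is made up here, and it assumes both lists have the same length.

```python
task = [1, 1, 1, 1, 2, 2, 3, 4, 5, 5]
hours = [1, 7, 6, 2, 3, 6, 5, 2, 4, 6]


def sum_of_group_max(keys, values):
    """Return the sum of the largest value seen for each distinct key.

    Hypothetical helper for illustration; assumes len(keys) == len(values).
    """
    best = {}  # maps each key to the largest value seen so far
    for key, value in zip(keys, values):
        if key not in best or value > best[key]:
            best[key] = value
    return sum(best.values())


total = sum_of_group_max(task, hours)
print(total)  # 7 + 6 + 5 + 2 + 6 = 26
```

Because the dictionary is keyed on the task value, this does not require equal task values to be adjacent in the list.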