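I need to find the average of all delay times for each flight. Negative values should not be taken into account, i.e. they contribute nothing to the sum, although their rows still count toward the divisor (see the hand solution below). How can this be done?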
Dataset:
FlightNum ArrDelay DepDelay
1 10 14
1 -3 0
2 20 13
2 14 15
3 -3 -1
3 2 1
3 -10 -3
4 1 -2
4 11 9
5 10 8
5 20 11
5 -4 -1
Solution (by hand):
1: (10 + 14) / 2 = 12
2: (20 + 13 + 14 + 15) / 2 = 31
3: (2 + 1) / 3 = 1
4: (1 + 11 + 9) / 2 = 10.5
5: (10 + 8 + 20 + 11) / 3 = 16.3
The final result should be:
FlightNum AverageAllDelay
2 31
5 16.3
1 12
4 10.5
3 1
Tools: Pandas, Python.
Dataset (for your convenience):
data = {'FlightNum': {0: '1', 1: '1', 2: '2', 3: '2', 4: '3',
5: '3', 6: '3', 7: '4', 8: '4', 9: '5',
10: '5', 11: '5'},
'ArrDelay': {0: 10, 1: -3, 2: 20, 3: 14, 4: -3, 5: 2, 6: -10, 7: 1, 8: 11,
9: 10, 10: 20, 11: -4},
'DepDelay': {0: 14, 1: 0, 2: 13, 3: 15, 4: -1, 5: 1, 6: -3, 7: -2, 8: 9,
9: 8, 10: 11, 11: -1}}
CodePudding user response:
Use groupby to sum only the positive values of each delay column, divide by the number of rows per flight, add the two columns together, and then sort:
(df.groupby('FlightNum').agg(lambda x: x[x > 0].sum() / x.count())
   .sum(axis=1).sort_values(ascending=False).reset_index())
FlightNum 0
0 2 31.000000
1 5 16.333333
2 1 12.000000
3 4 10.500000
4 3 1.000000
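For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same idea. It assumes `df` is built from the dictionary in the question (written here with plain lists, which gives the same default 0..11 index), and it passes `name='AverageAllDelay'` to `reset_index` so the column gets the requested header instead of `0`. The variable name `result` is just for illustration.

```python
import pandas as pd

# Same values as the dictionary in the question, written as plain lists.
data = {
    'FlightNum': ['1', '1', '2', '2', '3', '3', '3', '4', '4', '5', '5', '5'],
    'ArrDelay': [10, -3, 20, 14, -3, 2, -10, 1, 11, 10, 20, -4],
    'DepDelay': [14, 0, 13, 15, -1, 1, -3, -2, 9, 8, 11, -1],
}
df = pd.DataFrame(data)

result = (
    df.groupby('FlightNum')
      # per column: sum of the positive values divided by the number of rows
      .agg(lambda x: x[x > 0].sum() / x.count())
      # add the ArrDelay and DepDelay contributions for each flight
      .sum(axis=1)
      .sort_values(ascending=False)
      # name the value column while turning the Series into a DataFrame
      .reset_index(name='AverageAllDelay')
)
print(result)
```

Because the divisor is `x.count()`, rows with negative delays still count toward the average; only their values are left out of the sum.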
CodePudding user response:
You can use pandas.DataFrame.mask to replace the negative values with 0, then pandas.core.groupby.GroupBy.mean to calculate the mean of the per-row totals.
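# Mask only the delay columns, so the FlightNum column
# (strings in the question's dictionary) is never compared with 0.
cols = ['ArrDelay', 'DepDelay']
out = (
    df.assign(AverageAllDelay=df[cols].mask(df[cols].lt(0), 0).sum(axis=1))
    .groupby('FlightNum', as_index=False)['AverageAllDelay'].mean()
)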
print(out)
# Output:
FlightNum AverageAllDelay
0 1 12.000000
1 2 31.000000
2 3 1.000000
3 4 10.500000
4 5 16.333333
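As a cross-check, the same numbers can also be reached with `clip(lower=0)`, which sets every negative delay to 0 before the per-row totals are averaged. This is only a sketch of an alternative, not the method above: it assumes `df` is built from the question's data (written here as plain lists), and the names `delay_cols`, `per_row` and `out` are illustrative.

```python
import pandas as pd

# Same values as the dictionary in the question, written as plain lists.
df = pd.DataFrame({
    'FlightNum': ['1', '1', '2', '2', '3', '3', '3', '4', '4', '5', '5', '5'],
    'ArrDelay': [10, -3, 20, 14, -3, 2, -10, 1, 11, 10, 20, -4],
    'DepDelay': [14, 0, 13, 15, -1, 1, -3, -2, 9, 8, 11, -1],
})

delay_cols = ['ArrDelay', 'DepDelay']

# Negative delays become 0, then the two delays are added row by row.
per_row = df[delay_cols].clip(lower=0).sum(axis=1)

# Average the per-row totals within each flight.
out = (
    per_row.groupby(df['FlightNum'])
           .mean()
           .rename('AverageAllDelay')
           .reset_index()
)
print(out)
```

Clipping touches only the numeric delay columns, so the type of `FlightNum` does not matter here.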