An average value for each item


I need to find the average of all delay times for each flight. Negative values should not be taken into account. How can this be done?


FlightNum ArrDelay DepDelay
1 10 14
1 -3 0
2 20 13
2 14 15
3 -3 -1
3 2 1
3 -10 -3
4 1 -2
4 11 9
5 10 8
5 20 11
5 -4 -1

Solution (by hand):

1: (10 + 14) / 2 = 12
2: (20 + 13 + 14 + 15) / 2 = 31
3: (2 + 1) / 3 = 1
4: (1 + 11 + 9) / 2 = 10.5
5: (10 + 8 + 20 + 11) / 3 = 16.3
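
For reference, the hand calculation can be reproduced in plain Python. This is only a sketch of the rule as I read it from the example: add up every delay greater than zero across both columns, then divide by the number of records for that flight (not by the number of positive values). With that rule, flight 3 comes out as (2 + 1) / 3 = 1. The names `rows`, `positive_total`, `row_count` and `averages` are made up for illustration.

```python
from collections import defaultdict

# (FlightNum, ArrDelay, DepDelay) rows copied from the table in the question
rows = [
    (1, 10, 14), (1, -3, 0), (2, 20, 13), (2, 14, 15),
    (3, -3, -1), (3, 2, 1), (3, -10, -3), (4, 1, -2),
    (4, 11, 9), (5, 10, 8), (5, 20, 11), (5, -4, -1),
]

positive_total = defaultdict(int)  # sum of delays > 0 per flight
row_count = defaultdict(int)       # number of records per flight

for flight, arr, dep in rows:
    positive_total[flight] += sum(v for v in (arr, dep) if v > 0)
    row_count[flight] += 1

# Average = positive delays divided by the record count of the flight
averages = {f: positive_total[f] / row_count[f] for f in positive_total}

# Print flights from the largest average delay to the smallest
for flight, avg in sorted(averages.items(), key=lambda kv: kv[1], reverse=True):
    print(flight, round(avg, 2))
```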

The final result should be:

FlightNum AverageAllDelay
2 31
5 16.3
1 12
4 10.5
3 1

Tools: Pandas, Python.

Dataset (for your convenience):

data = {'FlightNum': {0: '1', 1: '1', 2: '2', 3: '2', 4: '3', 
                 5: '3', 6: '3', 7: '4', 8: '4', 9: '5', 
                 10: '5', 11: '5'}, 
        'ArrDelay': {0: 10, 1: -3, 2: 20, 3: 14, 4: -3, 5: 2, 6: -10, 7: 1, 8: 11, 
                9: 10, 10: 20, 11: -4}, 
        'DepDelay': {0: 14, 1: 0, 2: 13, 3: 15, 4: -1, 5: 1, 6: -3, 7: -2, 8: 9, 
                9: 8, 10: 11, 11: -1}}
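
A minimal sketch of turning this dict into a DataFrame, since the snippets in this thread refer to a `df`. Note that `FlightNum` is stored as strings here, so a numeric comparison applied to the whole frame would also hit that column. The cast to `int` is optional, and `df_numeric` is just an illustrative name.

```python
import pandas as pd

# Dataset copied from the question (column -> {row index: value})
data = {
    'FlightNum': {0: '1', 1: '1', 2: '2', 3: '2', 4: '3', 5: '3',
                  6: '3', 7: '4', 8: '4', 9: '5', 10: '5', 11: '5'},
    'ArrDelay': {0: 10, 1: -3, 2: 20, 3: 14, 4: -3, 5: 2, 6: -10,
                 7: 1, 8: 11, 9: 10, 10: 20, 11: -4},
    'DepDelay': {0: 14, 1: 0, 2: 13, 3: 15, 4: -1, 5: 1, 6: -3,
                 7: -2, 8: 9, 9: 8, 10: 11, 11: -1},
}

df = pd.DataFrame(data)

# Optional: cast FlightNum from strings to integers
df_numeric = df.astype({'FlightNum': int})
print(df_numeric.head())
```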

CodePudding user response:

Use groupby to sum only the positive values in each delay column, divide by the number of rows per flight, add the two columns together, and then sort:

import pandas as pd
df = pd.DataFrame(data)  # `data` is the dict from the question
(df.groupby('FlightNum').agg(lambda x: x[x > 0].sum() / x.count())
   .sum(axis=1).sort_values(ascending=False).reset_index())

  FlightNum          0
0         2  31.000000
1         5  16.333333
2         1  12.000000
3         4  10.500000
4         3   1.000000
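
A variant of the same idea, in case a named result column is preferred over the default `0`. This is a sketch, not the answerer's code: it swaps the lambda for `clip(lower=0)`, which turns negative delays into zeros before summing, and it rebuilds the data as plain lists so that it runs on its own. `result` is just an illustrative name.

```python
import pandas as pd

# Same values as the dataset in the question, written as plain lists
df = pd.DataFrame({
    'FlightNum': ['1', '1', '2', '2', '3', '3', '3', '4', '4', '5', '5', '5'],
    'ArrDelay': [10, -3, 20, 14, -3, 2, -10, 1, 11, 10, 20, -4],
    'DepDelay': [14, 0, 13, 15, -1, 1, -3, -2, 9, 8, 11, -1],
})

result = (
    df[['ArrDelay', 'DepDelay']]
      .clip(lower=0)                  # negative delays contribute 0
      .sum(axis=1)                    # total positive delay per record
      .groupby(df['FlightNum'])       # group the records by flight
      .mean()                         # divide by the number of records
      .sort_values(ascending=False)   # largest average first
      .reset_index(name='AverageAllDelay')
)
print(result)
```

Because the mean is taken over all records of a flight, records with only negative delays still count in the divisor, which matches the hand calculation in the question.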

CodePudding user response:

You can use pandas.DataFrame.mask to mask the negative values then pandas.core.groupby.GroupBy.mean to calculate the mean.

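# mask only the delay columns: FlightNum holds strings in the question's dataset
delay_cols = ['ArrDelay', 'DepDelay']
out = (
        df.assign(AverageAllDelay=lambda x: x[delay_cols].mask(x[delay_cols].lt(0), 0).sum(axis=1))
          .groupby('FlightNum', as_index=False)['AverageAllDelay'].mean()
)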

# Output:

   FlightNum  AverageAllDelay
0          1        12.000000
1          2        31.000000
2          3         1.000000
3          4        10.500000
4          5        16.333333