An average value for each item

Time:10-07

I need to find the average of all delay times for each flight. Negative values should not be taken into account. How can this be done?

Dataset:

FlightNum ArrDelay DepDelay
1 10 14
1 -3 0
2 20 13
2 14 15
3 -3 -1
3 2 1
3 -10 -3
4 1 -2
4 11 9
5 10 8
5 20 11
5 -4 -1

Solution (by hand):

1: (10 + 14) / 2 = 12
2: (20 + 13 + 14 + 15) / 2 = 31
3: (2 + 1) / 3 = 1
4: (1 + 11 + 9) / 2 = 10.5
5: (10 + 8 + 20 + 11) / 3 = 16.3

The final result should be:

FlightNum AverageAllDelay
2 31
5 16.3
1 12
4 10.5
3 1
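
For reference, the hand calculation can be reproduced in plain Python. This is only a sketch of the rule as it reads from the worked example: negative delays are dropped from the sum, but the divisor is still the number of rows for the flight. The names `rows`, `totals`, `counts` and `averages` are made up for illustration.

```python
from collections import defaultdict

# (FlightNum, ArrDelay, DepDelay) rows copied from the dataset in the question
rows = [
    ('1', 10, 14), ('1', -3, 0),
    ('2', 20, 13), ('2', 14, 15),
    ('3', -3, -1), ('3', 2, 1), ('3', -10, -3),
    ('4', 1, -2), ('4', 11, 9),
    ('5', 10, 8), ('5', 20, 11), ('5', -4, -1),
]

totals = defaultdict(int)  # sum of non-negative delays per flight
counts = defaultdict(int)  # number of rows per flight

for flight, arr, dep in rows:
    # max(value, 0) turns a negative delay into 0 so it adds nothing
    totals[flight] += max(arr, 0) + max(dep, 0)
    counts[flight] += 1

averages = {flight: totals[flight] / counts[flight] for flight in totals}

# Print flights from the largest average delay to the smallest
for flight, avg in sorted(averages.items(), key=lambda kv: kv[1], reverse=True):
    print(flight, round(avg, 2))
```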

Tools: Pandas, Python.

Dataset (for your convenience):

import pandas as pd

data = {'FlightNum': {0: '1', 1: '1', 2: '2', 3: '2', 4: '3',
                      5: '3', 6: '3', 7: '4', 8: '4', 9: '5',
                      10: '5', 11: '5'},
        'ArrDelay': {0: 10, 1: -3, 2: 20, 3: 14, 4: -3, 5: 2, 6: -10, 7: 1, 8: 11,
                     9: 10, 10: 20, 11: -4},
        'DepDelay': {0: 14, 1: 0, 2: 13, 3: 15, 4: -1, 5: 1, 6: -3, 7: -2, 8: 9,
                     9: 8, 10: 11, 11: -1}}

df = pd.DataFrame(data)

CodePudding user response:

Use groupby to sum only the positive values in each delay column, divide by the number of rows per flight, add the two columns together, and then sort:

(df.groupby('FlightNum').agg(lambda x: x[x > 0].sum() / x.count())
   .sum(axis=1).sort_values(ascending=False)
   .reset_index(name='AverageAllDelay'))

  FlightNum  AverageAllDelay
0         2        31.000000
1         5        16.333333
2         1        12.000000
3         4        10.500000
4         3         1.000000
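
A variation on the same idea, offered as a sketch and not as the answer's own method: clip negative delays to 0 with `DataFrame.clip`, add the two delay columns per row, and take the per-flight mean of those row totals. Because every flight has the same number of rows in both columns, this gives the same numbers as summing per column and dividing by the row count. The names `row_total` and `result` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'FlightNum': ['1', '1', '2', '2', '3', '3', '3', '4', '4', '5', '5', '5'],
    'ArrDelay': [10, -3, 20, 14, -3, 2, -10, 1, 11, 10, 20, -4],
    'DepDelay': [14, 0, 13, 15, -1, 1, -3, -2, 9, 8, 11, -1],
})

# Replace negative delays with 0, then add the two columns row by row
row_total = df[['ArrDelay', 'DepDelay']].clip(lower=0).sum(axis=1)

# Average the row totals per flight and sort from largest to smallest
result = (
    row_total.groupby(df['FlightNum']).mean()
             .sort_values(ascending=False)
             .rename('AverageAllDelay')
             .reset_index()
)
print(result)
```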

CodePudding user response:

You can use pandas.DataFrame.mask to replace the negative values with 0, then pandas.core.groupby.GroupBy.mean to calculate the mean of the row totals for each flight.

delays = df[['ArrDelay', 'DepDelay']]  # FlightNum holds strings, so mask only the delay columns

out = (
        df.assign(AverageAllDelay=delays.mask(delays.lt(0), 0).sum(axis=1))
          .groupby('FlightNum', as_index=False)['AverageAllDelay'].mean()
        )

# Output :

print(out)

   FlightNum  AverageAllDelay
0          1        12.000000
1          2        31.000000
2          3         1.000000
3          4        10.500000
4          5        16.333333
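
To see what `mask` does on its own before any grouping, here is a small sketch using only the three rows of flight 5. `mask(cond, other)` replaces the entries where `cond` is True, so negatives become 0 and everything else is left alone. The names `delays` and `masked` are illustrative.

```python
import pandas as pd

# The three rows of flight 5 from the dataset
delays = pd.DataFrame({
    'ArrDelay': [10, 20, -4],
    'DepDelay': [8, 11, -1],
})

# Entries where the condition is True (value < 0) are replaced by 0
masked = delays.mask(delays.lt(0), 0)
print(masked)

# Row totals are 18, 31 and 0, so the average for this flight is 49 / 3
flight_5_average = masked.sum(axis=1).mean()
print(flight_5_average)
```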