An error with Float64 values during sum: '<' not supported between instances of 'str' and 'int'

The task is the following: find the mean time of all delays for each airline (carrier). Negative values should not be counted.

Dataset of flights:

Carrier ArrDelay DepDelay

Car1 10 14
Car1 -3 0
Car2 20 13
Car2 14 15
Car3 -3 -1
Car3 2 1
Car3 -10 -3

Solution (by hand):

Car1: (10 + 14) / 2 = 12
Car2: (20 + 13 + 14 + 15) / 2 = 31
Car3: (2 + 1) / 3 = 1

Output result should be:

Car2 31
Car1 12
Car3 1

One of the users proposed this good solution (maybe he will see it):

out = (
       df.mask(df.lt(0), 0)
          .assign(AverageAllDelay= lambda x: x['ArrDelay'].add(x['DepDelay']))
          .groupby('Carrier', as_index=False)['AverageAllDelay'].mean()
      )
print(out)

However, it raises the error '<' not supported between instances of 'str' and 'int'. Initially, ArrDelay and DepDelay have the float64 dtype after being read from the CSV file. I converted these values to integers in several ways suggested in other Stack Overflow topics, but nothing helped. How can I solve it?

Dataset, if needed (in the CSV file, all the information about a flight is written in a single string):

import pandas as pd

data = {
        'Carrier': {0: '1', 1: '1', 2: '2', 3: '2', 4: '3', 5: '3', 6: '3'},
        'ArrDelay': {0: 10, 1: -3, 2: 20, 3: 14, 4: -3, 5: 2, 6: -10},
        'DepDelay': {0: 14, 1: 0, 2: 13, 3: 15, 4: -1, 5: 1, 6: -3}
}
df = pd.DataFrame(data)

CodePudding user response：

The error occurs because df.mask(df.lt(0), 0) compares every element of df with 0, and the 'Carrier' column contains strings, which cannot be compared with an integer. The float64 delay columns are not the cause.
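To check which column triggers the comparison error, it can help to inspect the dtypes and run the comparison on its own. The snippet below is a minimal sketch that rebuilds the sample data from the question; the Carrier column is given object dtype explicitly as an assumption about how the strings are stored, and the `raised` flag is just an illustrative name.

```python
import pandas as pd

# Sample data from the question; Carrier holds strings (object dtype assumed).
df = pd.DataFrame({
    "Carrier": pd.Series(["1", "1", "2", "2", "3", "3", "3"], dtype=object),
    "ArrDelay": [10, -3, 20, 14, -3, 2, -10],
    "DepDelay": [14, 0, 13, 15, -1, 1, -3],
})

# Show the dtype of every column; the string column stands out here.
print(df.dtypes)

# Comparing the whole frame with 0 also compares the strings with 0.
try:
    df.lt(0)
    raised = False
except TypeError as exc:
    raised = True
    print(exc)

# Only these columns can safely be compared with a number.
numeric_cols = list(df.select_dtypes("number").columns)
print(numeric_cols)
```

If the TypeError appears here, the fix is to keep the string column out of the comparison rather than to change the dtype of the delay columns.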

If the carrier labels are numeric strings, as in the sample dictionary, you can try df['Carrier'] = df['Carrier'].astype(int), and then your code runs without errors:

>>> out = (
...        df.mask(df.lt(0), 0)
...           .assign(AverageAllDelay= lambda x: x['ArrDelay'].add(x['DepDelay']))
...           .groupby('Carrier', as_index=False)['AverageAllDelay'].mean()
...       )
>>> out
   Carrier  AverageAllDelay
0        1             12.0
1        2             31.0
2        3              1.0
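Converting Carrier to int only works when the labels are numeric. If they look like 'Car1', as in the table at the top of the question, one possible alternative is to leave Carrier untouched and zero out negatives in the delay columns only. The sketch below assumes that approach; it uses clip(lower=0) in place of mask, and the `delay_cols` name is introduced here for illustration.

```python
import pandas as pd

# Sample data with non-numeric carrier labels, as in the question's table.
df = pd.DataFrame({
    "Carrier": ["Car1", "Car1", "Car2", "Car2", "Car3", "Car3", "Car3"],
    "ArrDelay": [10, -3, 20, 14, -3, 2, -10],
    "DepDelay": [14, 0, 13, 15, -1, 1, -3],
})

# Restrict the numeric work to the delay columns (hypothetical helper name).
delay_cols = ["ArrDelay", "DepDelay"]

out = (
    # Replace negative delays with 0, then add the two delays per flight.
    df.assign(AverageAllDelay=df[delay_cols].clip(lower=0).sum(axis=1))
      # Average the per-flight totals within each carrier.
      .groupby("Carrier", as_index=False)["AverageAllDelay"]
      .mean()
)
print(out)
```

On the sample data this gives the same per-carrier means as the output above (12.0, 31.0, 1.0) while keeping the original carrier names.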