The task is the following: find the mean of all delays for each airline (carrier). Negative values should not be counted.
Dataset of flights:
Carrier ArrDelay DepDelay
Car1 10 14
Car1 -3 0
Car2 20 13
Car2 14 15
Car3 -3 -1
Car3 2 1
Car3 -10 -3
Solution (by hand):
Car1: (10 + 14) / 2 = 12
Car2: (20 + 13 + 14 + 15) / 2 = 31
Car3: (2 + 1) / 3 = 1
The output should be:
Car2 31
Car1 12
Car3 1
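The hand calculation can be double-checked with a few lines of plain Python. This is only a sketch of the rule as I read it: negative delays count as 0, and each carrier's total is divided by its number of flights. The names `flights`, `totals`, `counts` and `means` are made up for the illustration. Under this rule, Car3 works out to (2 + 1) / 3 = 1.0.

```python
from collections import defaultdict

# Sample rows from the table above: (Carrier, ArrDelay, DepDelay)
flights = [
    ("Car1", 10, 14),
    ("Car1", -3, 0),
    ("Car2", 20, 13),
    ("Car2", 14, 15),
    ("Car3", -3, -1),
    ("Car3", 2, 1),
    ("Car3", -10, -3),
]

totals = defaultdict(int)  # sum of non-negative delays per carrier
counts = defaultdict(int)  # number of flights per carrier

for carrier, arr, dep in flights:
    # Assumption: a negative delay contributes 0 instead of being subtracted
    totals[carrier] += max(arr, 0) + max(dep, 0)
    counts[carrier] += 1

means = {carrier: totals[carrier] / counts[carrier] for carrier in totals}

# Print carriers from the largest mean delay to the smallest
for carrier, mean in sorted(means.items(), key=lambda kv: kv[1], reverse=True):
    print(carrier, mean)
```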
One of the users proposed this good solution (maybe he will see it):
out = (
    df.mask(df.lt(0), 0)
    .assign(AverageAllDelay=lambda x: x['ArrDelay'].add(x['DepDelay']))
    .groupby('Carrier', as_index=False)['AverageAllDelay'].mean()
)
print(out)
However, it raises the error: TypeError: '<' not supported between instances of 'str' and 'int'.
Initially, ArrDelay and DepDelay have the float64 dtype when read from the CSV file. I converted these values to integers using several approaches from other Stack Overflow topics, but nothing helped. How can I solve it?
Dataset (in case it is needed; in the CSV file, all information about a flight is written in one string):
data = {
    'Carrier': {0: '1', 1: '1', 2: '2', 3: '2', 4: '3', 5: '3', 6: '3'},
    'ArrDelay': {0: 10, 1: -3, 2: 20, 3: 14, 4: -3, 5: 2, 6: -10},
    'DepDelay': {0: 14, 1: 0, 2: 13, 3: 15, 4: -1, 5: 1, 6: -3}
}
Answer:
The error occurs because df.mask(df.lt(0), 0) performs an element-wise comparison of the whole df against 0, and the 'Carrier' column of your df contains strings.
You can try df['Carrier'] = df['Carrier'].astype(int) (this works for the sample data, where the carrier codes are numeric strings such as '1'), after which your code runs without errors:
>>> out = (
...     df.mask(df.lt(0), 0)
...     .assign(AverageAllDelay=lambda x: x['ArrDelay'].add(x['DepDelay']))
...     .groupby('Carrier', as_index=False)['AverageAllDelay'].mean()
... )
>>> out
   Carrier  AverageAllDelay
0        1             12.0
1        2             31.0
2        3              1.0
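If the carrier codes are not numeric (for example 'Car1', as in the table in the question), astype(int) will fail. A possible alternative is to leave 'Carrier' untouched and zero out the negatives only in the delay columns. The sketch below assumes the delay columns are already numeric and rebuilds the sample data by hand. The name `delay_cols` is just illustrative, and the final sort is added only to match the ordering shown in the question.

```python
import pandas as pd

# Sample data from the question, with non-numeric carrier names
df = pd.DataFrame({
    "Carrier": ["Car1", "Car1", "Car2", "Car2", "Car3", "Car3", "Car3"],
    "ArrDelay": [10, -3, 20, 14, -3, 2, -10],
    "DepDelay": [14, 0, 13, 15, -1, 1, -3],
})

# Only these columns are compared with 0, so the string column is never touched
delay_cols = ["ArrDelay", "DepDelay"]

out = (
    df.assign(
        # clip(lower=0) replaces negative delays with 0;
        # sum(axis=1) adds ArrDelay and DepDelay for each flight
        AverageAllDelay=df[delay_cols].clip(lower=0).sum(axis=1)
    )
    .groupby("Carrier", as_index=False)["AverageAllDelay"]
    .mean()
    .sort_values("AverageAllDelay", ascending=False)
)
print(out)
```

With this approach the 'Carrier' column can keep any dtype, since the comparison is restricted to the two delay columns.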