I have the dataframe below:
Date A B
2022-05-01 08:00:00 43 100
2022-05-01 08:01:00 NaN 54
2022-05-01 08:02:00 41 100
I will like to filter the rows if:
- Column
A
is NaN - Column
B
is < 100
I tried this line of code:
dff = df.loc[((df[df['A'].isna()]) & (df['B'] < 100))]
However I get an error:
TypeError: cannot compare a dtyped [float64] array with a scalar of type [bool]
I think the error is pretty clear (I can't compare A with B due to different data type?)
Is there a way to compare the two columns and ideally to add a new column with comments for every filtered rows?
The expected output is:
Date A B Comments
2022-05-01 08:01:00 NaN 54 Damaged item
Thanks.
CodePudding user response:
Problem in your Code Snippet:
Before moving towards a solution let's discuss the problem in your Code Snippet
:
The primary reason for
Code Breaking
is theReturn
type of yourConditions
:df['A'].isna()
will returnbool
values[df['B'] < 100]
will return aData
So, according to the above condition we can't compare
Bool
andFloat
To know more about pd.dataFrame.isna(): Click Here
Solution :
Now, let's deep dive into the Solution
of your given Scenario.
- The easiest way to
solve
such an error is that we have to useConditions
in which thereturn
would be the same for both conditions. So, here I have used thelt
operator to compare values thatreturn bool
based on the condition.
So, for detailed solution refer the Code Snippet
mentioned below:
# Import all the important Modules
import pandas as pd
import numpy as np
# Data Regenration
data_df = pd.DataFrame({
'Date': ['2022-05-01 08:00:00', '2022-05-01 08:01:00', '2022-05-01 08:02:00'],
'A': [43, np.nan, 41],
'B': [100, 54, 100]
})
# Condition for 'Damage Items'
conditions = [
(data_df['A'].isna() & data_df['B'].lt(100))
]
# 'Choice' to return when above 'Condition' is true
choices = ['Damaged item']
# Store Results
data_df['Comments'] = np.select(conditions, choices, default='')
# Output of above 'Code'
Date A B Comments
0 2022-05-01 08:00:00 43.0 100
1 2022-05-01 08:01:00 NaN 54 Damaged item
2 2022-05-01 08:02:00 41.0 100
To know more about np.select(): Click Here