How to extract a DataFrame subset based on a condition in pandas?

Time:10-15

Please help me

The problem is below:

Write an expression to extract a new dataframe containing those days where the temperature reached at least 70 degrees, and assign that to the variable at_least_70. (You might need to think some about what the different columns in the full dataframe represent to decide how to extract the subset of interest.) After that, write another expression that computes how many days reached at least 70 degrees, and assign that to the variable num_at_least_70.

This is the original DataFrame

        Date  Maximum Temperature  Minimum Temperature  \
0    2018-01-01                    5                    0   
1    2018-01-02                   13                    1   
2    2018-01-03                   19                   -2   
3    2018-01-04                   22                    1   
4    2018-01-05                   18                   -2   
..          ...                  ...                  ...   
360  2018-12-27                   33                   23   
361  2018-12-28                   40                   21   
362  2018-12-29                   50                   37   
363  2018-12-30                   37                   24   
364  2018-12-31                   35                   25   

     Average Temperature  Precipitation  Snowfall  Snow Depth  
0                    2.5           0.04       1.0         3.0  
1                    7.0           0.03       0.6         4.0  
2                    8.5           0.00       0.0         4.0  
3                   11.5           0.00       0.0         3.0  
4                    8.0           0.09       1.2         4.0  
..                   ...            ...       ...         ...  
360                 28.0           0.00       0.0         1.0  
361                 30.5           0.07       0.0         0.0  
362                 43.5           0.04       0.0         0.0  
363                 30.5           0.02       0.7         1.0  
364                 30.0           0.00       0.0         0.0  

[365 rows x 7 columns]

The code I wrote for the above problem is:

at_least_70 = dfc.loc[dfc['Minimum Temperature']>=70,['Date']]
print(at_least_70)

num_at_least_70 = at_least_70.count()
print(num_at_least_70)

The result it shows:

      Date
204  2018-07-24
240  2018-08-29
245  2018-09-03
Date    3
dtype: int64

But when I run the test case, it shows: Incorrect! You are not correctly extracting the subset.

CodePudding user response:

As suggested by @HenryYik, remove the column selector:

at_least_70 = dfc.loc[dfc['Minimum Temperature'] >= 70]
num_at_least_70 = len(at_least_70)
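
To see why the column selector matters, here is a minimal sketch. The three rows are made-up sample data (not the asker's dataset), and the names only_date, full_rows, count_per_column and n_rows are just for illustration. With ['Date'] in .loc you get a one-column frame, and .count() on a DataFrame gives a per-column Series rather than a single number.

```python
import pandas as pd

# Hypothetical sample data, NOT the real dataset from the question.
dfc = pd.DataFrame({
    "Date": ["2018-07-23", "2018-07-24", "2018-07-25"],
    "Maximum Temperature": [85, 91, 88],
    "Minimum Temperature": [66, 70, 68],
})

mask = dfc["Minimum Temperature"] >= 70

# With a column selector: only the 'Date' column survives.
only_date = dfc.loc[mask, ["Date"]]

# Without a column selector: the matching rows keep every column.
full_rows = dfc.loc[mask]

# DataFrame.count() returns a Series of non-null counts per column,
# while len() returns the number of rows as a plain int.
count_per_column = only_date.count()
n_rows = len(full_rows)

print(only_date.columns.tolist())
print(full_rows.columns.tolist())
print(type(count_per_column).__name__, n_rows)
```

A test that expects the full subset and an integer count would likely reject the first form even though the selected rows are the same.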

CodePudding user response:

Use boolean indexing, and to count the True values in the mask, use sum:

mask = dfc['Minimum Temperature'] >= 70

at_least_70 = dfc[mask]
num_at_least_70 = mask.sum()
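
One more thing worth checking: the problem says the temperature "reached" at least 70 degrees and hints at thinking about what the columns represent. One possible reading is that the condition belongs on 'Maximum Temperature' rather than 'Minimum Temperature'. That is an assumption about what the test expects, not something confirmed in the question. A minimal sketch of the mask approach under that assumption, on made-up sample rows:

```python
import pandas as pd

# Hypothetical sample data, NOT the real dataset from the question.
dfc = pd.DataFrame({
    "Date": ["2018-07-23", "2018-07-24", "2018-07-25", "2018-12-31"],
    "Maximum Temperature": [85, 91, 69, 35],
    "Minimum Temperature": [66, 70, 55, 25],
})

# Assumption: a day "reached" 70 if its daily maximum was >= 70.
mask = dfc["Maximum Temperature"] >= 70

# Boolean indexing keeps every column of the matching rows.
at_least_70 = dfc[mask]

# True counts as 1, so summing the mask counts the matching days.
# int() turns the NumPy integer into a plain Python int.
num_at_least_70 = int(mask.sum())

print(at_least_70)
print(num_at_least_70)
```

On this sample the minimum-temperature condition would match only one day, while the maximum-temperature condition matches two, so the choice of column changes the answer.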