So I'm trying to extract the number of rows from two datasets that are between -20 to -15 and between -15 to -5 and between -5 and 5.
eg. the dfne dataframe
Latitude Longitude Altitude Value O18nd
0 30.0 -30.0 0.0 -3.776897 -3.776897
1 30.0 -29.9 0.0 -3.765203 -3.765203
2 30.0 -29.8 0.0 -3.753469 -3.753469
3 30.0 -29.7 0.0 -3.741694 -3.741694
4 30.0 -29.6 0.0 -3.729879 -3.729879
... ... ... ... ... ...
359995 69.9 59.5 68.0 -8.602635 -8.738635
359996 69.9 59.6 74.0 -8.594053 -8.742053
359997 69.9 59.7 52.0 -8.585532 -8.689532
I would then like to create a database that is structured as follows (example data)
For this I have written the following code, but I do not know how to extract this data range
import pandas as pd
import scipy.stats as scs
from Extract_elevation import *
dfe = dfle #daten alt
dfne = dflne #daten neu
dfn= dfne.loc[dfne.O18nd == '-20', 'O18nd'].count()
dfa = dfle.loc[dfle.O18ad == '-20', 'O18ad'].count()
df = pd.DataFrame({
'old' : dfa,
'new' : dfn,
});
CodePudding user response:
The first strategy i thought of is just to perform filtering using pandas.query
method.
dfn = dfne.query('O18nd >= -20').query('O18nd <= -10') ['O18nd'].count()
In this example, you get count values for the interval [-20 , -10].
You can automatize the procedure by using groupby and apply count operation on these cut range of values:
tmp = dfne.groupby(pd.cut(dfne["O18nd"], np.arange(-20, 10, 5)))['O18nd'].count()
dfa = pd.DataFrame(tmp).rename(columns={'O18nd': 'O18nd_count_a'})
You will get a result dataframe with a row for each value range:
O18nd_count
O18nd
(-20, -15] 2
(-15, -10] 1
...
(5, 10] 2
(the values are just random).
If you want both two dataframes, you can just apply a join based on the O18nd
column (after a reset_index).
df_final = pd.merge(dfe, dfa, on=['O18nd'])