Extract Rows from different df`s-CodePudding

So I'm trying to extract the number of rows from two datasets that are between -20 to -15 and between -15 to -5 and between -5 and 5.

eg. the dfne dataframe

        Latitude  Longitude  Altitude     Value     O18nd
0           30.0      -30.0       0.0 -3.776897 -3.776897
1           30.0      -29.9       0.0 -3.765203 -3.765203
2           30.0      -29.8       0.0 -3.753469 -3.753469
3           30.0      -29.7       0.0 -3.741694 -3.741694
4           30.0      -29.6       0.0 -3.729879 -3.729879
...          ...        ...       ...       ...       ...
359995      69.9       59.5      68.0 -8.602635 -8.738635
359996      69.9       59.6      74.0 -8.594053 -8.742053
359997      69.9       59.7      52.0 -8.585532 -8.689532

I would then like to create a database that is structured as follows (example data)

For this I have written the following code, but I do not know how to extract this data range

import pandas as pd
import scipy.stats as scs
from Extract_elevation import *

dfe = dfle  #daten alt
dfne = dflne    #daten neu 


dfn= dfne.loc[dfne.O18nd == '-20', 'O18nd'].count()
dfa = dfle.loc[dfle.O18ad == '-20', 'O18ad'].count()

df = pd.DataFrame({
   'old' : dfa,
   'new' : dfn,
});

CodePudding user response：

The first strategy i thought of is just to perform filtering using pandas.query method.

dfn = dfne.query('O18nd >= -20').query('O18nd <= -10') ['O18nd'].count()

In this example, you get count values for the interval [-20 , -10].

You can automatize the procedure by using groupby and apply count operation on these cut range of values:

tmp = dfne.groupby(pd.cut(dfne["O18nd"], np.arange(-20, 10, 5)))['O18nd'].count()
dfa = pd.DataFrame(tmp).rename(columns={'O18nd': 'O18nd_count_a'})

You will get a result dataframe with a row for each value range:

            O18nd_count
O18nd                   
(-20, -15]  2
(-15, -10]  1
...
(5, 10]    2

(the values are just random).

If you want both two dataframes, you can just apply a join based on the O18nd column (after a reset_index).

df_final = pd.merge(dfe, dfa, on=['O18nd'])