pandas DataFrame of particle distributions: group by ID and find the half flux and the half flux radius

I am using a pandas DataFrame. I have distributions of particles, the distance of each particle from the center of its distribution, and the associated fluxes. I want to find the flux enclosed within the "half flux radius" (or "half light radius"), which by definition is the radius that encloses half of the total flux. I give an example below and then ask whether you have any idea how to do this.

Here are two particle distributions, identified by dist_ID, with the distance R of each particle from the center of its distribution and the flux of each particle.

     dist_ID          R        flux
0    702641.0    5.791781  0.097505
1    702641.0    2.806051  0.015750
2    702641.0    3.254907  0.086941
3    702641.0    8.291544  0.081764
4    702641.0    4.901959  0.053561
5    702641.0    8.630691  0.144661
...
228  802663.0   95.685763  0.025735
229  802663.0  116.070396  0.026012
230  802663.0  112.806001  0.022163
231  802663.0  229.388117  0.026154

For example, for the particle distribution with dist_ID=702641.0 (using the six rows shown), the total flux is the sum of "flux": total_flux=0.48, so the half flux is half_flux=total_flux/2=0.24. Sorting the particles by R and accumulating their flux, the enclosed flux is 0.16 out to particle 4 (R_4=4.90) and 0.25 out to particle 0 (R_0=5.79), so the radius that encloses half of the flux satisfies R_4<R_hf<R_0. I would take R_hf as the upper limit of that interval, i.e. R_hf=R_0.

I want a way, grouping by dist_ID in pandas, to compute half_flux and R_hf for each distribution. Thanks.

CodePudding user response：

If you want the half flux, it can be computed with

df.groupby("dist_ID").apply(lambda x: x.flux.sum()/2)

Output

dist_ID
702641.0    0.240091
802663.0    0.050032
dtype: float64

Not sure how you want to compute the radius but hopefully this will help you figure it out.
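One possible way to get the radius, as a minimal sketch rather than a definitive answer: sort each distribution by R, take the running sum of flux (the flux enclosed within each R), and keep the first row where that sum reaches half_flux. This assumes fluxes are non-negative and that R_hf should be the upper limit of the bracketing interval, as described in the question. The column names cum_flux and half_flux and the variable names s and result are made up for illustration; the data are only the rows shown in the question.

```python
import pandas as pd

# Sample rows copied from the question (only the rows displayed there).
df = pd.DataFrame({
    "dist_ID": [702641.0] * 6 + [802663.0] * 4,
    "R": [5.791781, 2.806051, 3.254907, 8.291544, 4.901959, 8.630691,
          95.685763, 116.070396, 112.806001, 229.388117],
    "flux": [0.097505, 0.015750, 0.086941, 0.081764, 0.053561, 0.144661,
             0.025735, 0.026012, 0.022163, 0.026154],
})

# Sort by radius within each distribution so that the running sum of flux
# is the flux enclosed within R (assumption: fluxes are non-negative).
s = df.sort_values(["dist_ID", "R"]).copy()
s["cum_flux"] = s.groupby("dist_ID")["flux"].cumsum()
s["half_flux"] = s.groupby("dist_ID")["flux"].transform("sum") / 2

# Keep rows whose enclosed flux has reached half_flux; the first of them
# (smallest R) in each distribution is taken as R_hf.
result = (
    s[s["cum_flux"] >= s["half_flux"]]
    .groupby("dist_ID")
    .first()[["half_flux", "R"]]
    .rename(columns={"R": "R_hf"})
)
print(result)
```

On these sample rows, R_hf comes out as 5.791781 for 702641.0 and 116.070396 for 802663.0. Note that the sort by R matters: accumulating flux in the original row order would give a different crossing point.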

CodePudding user response：

Is R_hf simply the maximum value of R within each group?

One way to do it is this:

import pandas as pd

# Create data (sample rows from the question)
data = {'dist_ID':  [702641.0,702641.0,702641.0,702641.0,702641.0,702641.0,802663.0,802663.0,802663.0,802663.0],
        'R':        [5.791781,2.806051,3.254907,8.291544,4.901959,8.630691,95.685763,116.070396,112.806001,229.388117],
        'flux':     [0.097505,0.015750,0.086941,0.081764,0.053561,0.144661,0.025735,0.026012,0.022163,0.026154]}
df = pd.DataFrame(data)

# Total flux and maximum R per distribution
grouped_df = df.groupby('dist_ID').agg({'flux': 'sum', 'R': 'max'}).rename(columns={'flux': 'total_flux', 'R': 'R_hf'})
# Half flux is half of the total flux
grouped_df['half_flux'] = grouped_df['total_flux'] / 2
print(grouped_df)

Output:

          total_flux        R_hf  half_flux
dist_ID
702641.0    0.480182    8.630691   0.240091
802663.0    0.100064  229.388117   0.050032
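If R_hf is not the maximum R but the radius enclosing half of the flux, as defined in the question, the 'max' aggregation could be swapped for a small per-group helper. The sketch below is only one way to do it: half_flux_summary, summary and flux_within_R_hf are hypothetical names, fluxes are assumed non-negative (so the enclosed flux is non-decreasing and np.searchsorted applies), and the data are again the sample rows from the question.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "dist_ID": [702641.0] * 6 + [802663.0] * 4,
    "R": [5.791781, 2.806051, 3.254907, 8.291544, 4.901959, 8.630691,
          95.685763, 116.070396, 112.806001, 229.388117],
    "flux": [0.097505, 0.015750, 0.086941, 0.081764, 0.053561, 0.144661,
             0.025735, 0.026012, 0.022163, 0.026154],
})

def half_flux_summary(group: pd.DataFrame) -> pd.Series:
    """Hypothetical helper: half flux, R_hf and flux enclosed within R_hf."""
    g = group.sort_values("R")
    # Flux enclosed within each radius (non-decreasing if flux >= 0).
    enclosed = g["flux"].cumsum().to_numpy()
    half = enclosed[-1] / 2
    # Index of the first particle whose enclosed flux reaches half_flux.
    k = int(np.searchsorted(enclosed, half, side="left"))
    return pd.Series({
        "total_flux": enclosed[-1],
        "half_flux": half,
        "R_hf": g["R"].iloc[k],
        "flux_within_R_hf": enclosed[k],
    })

# Select the needed columns explicitly so the helper never sees dist_ID.
summary = df.groupby("dist_ID")[["R", "flux"]].apply(half_flux_summary)
print(summary)
```

Besides R_hf, this also reports the flux actually enclosed within R_hf, which is somewhat larger than half_flux because R_hf is taken at the upper particle of the bracketing interval.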