I am using a pandas DataFrame. I have distributions of particles, with each particle's distance from the center of its distribution and its associated flux. I want to find the "half flux radius" (or "half light radius"), which by definition is the radius that encloses half of the total flux, together with the flux it encloses. Here is an example, followed by my question.
Below are two particle distributions, identified by dist_ID, with the distance R of each particle from the center of its distribution and the flux of each particle.
      dist_ID           R      flux
0    702641.0    5.791781  0.097505
1    702641.0    2.806051  0.015750
2    702641.0    3.254907  0.086941
3    702641.0    8.291544  0.081764
4    702641.0    4.901959  0.053561
5    702641.0    8.630691  0.144661
...
228  802663.0   95.685763  0.025735
229  802663.0  116.070396  0.026012
230  802663.0  112.806001  0.022163
231  802663.0  229.388117  0.026154
For example, for the particle distribution with dist_ID=702641.0, the total flux is the sum of "flux": total_flux=0.48; the half flux is half_flux=total_flux/2=0.24; the radius that encloses half of the flux satisfies R_2<R_hf<R_3 (where R_2=3.25 is the R of particle 2 and R_3=8.29 is the R of particle 3), so I would take R_hf to be the upper limit of that interval, i.e. R_hf=R_3.
I am looking for a way, grouping by dist_ID in a pandas DataFrame, to compute half_flux and R_hf for each distribution. Thanks.
CodePudding user response:
If you want the half flux, it can be computed with
df.groupby("dist_ID").apply(lambda x: x.flux.sum()/2)
Output
dist_ID
702641.0    0.240091
802663.0    0.050032
dtype: float64
Not sure how you want to compute the radius but hopefully this will help you figure it out.
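For the radius, one possible approach is sketched below. It assumes that "enclosed" means the particles should be sorted by R within each distribution, and that R_hf is the R of the first particle at which the cumulative flux reaches half of the total (the "upper limit" convention from the question). The helper name half_flux_radius is made up for this example, and the data are only the ten sample rows shown in the question.

```python
import pandas as pd

# The ten sample rows from the question (not the full 232-row dataset).
df = pd.DataFrame({
    "dist_ID": [702641.0] * 6 + [802663.0] * 4,
    "R": [5.791781, 2.806051, 3.254907, 8.291544, 4.901959, 8.630691,
          95.685763, 116.070396, 112.806001, 229.388117],
    "flux": [0.097505, 0.015750, 0.086941, 0.081764, 0.053561, 0.144661,
             0.025735, 0.026012, 0.022163, 0.026154],
})


def half_flux_radius(group: pd.DataFrame) -> pd.Series:
    """Hypothetical helper: half flux and half-flux radius of one distribution."""
    # Assumption: flux is accumulated from the center outwards, so sort by R.
    ordered = group.sort_values("R")
    cumulative = ordered["flux"].cumsum()
    half_flux = ordered["flux"].sum() / 2
    # R of the first particle whose cumulative flux reaches half of the total.
    r_hf = ordered.loc[cumulative >= half_flux, "R"].iloc[0]
    return pd.Series({"half_flux": half_flux, "R_hf": r_hf})


# Selecting the two needed columns keeps the grouping key out of each group.
result = df.groupby("dist_ID")[["R", "flux"]].apply(half_flux_radius)
print(result)
```

On these sample rows, sorting by R gives R_hf = 5.791781 for dist_ID 702641.0, because the cumulative flux out to that radius is already about 0.254. This differs from the R_3 = 8.29 in the question, which follows from accumulating the flux in row order rather than in order of increasing R. If the row order is what you want, drop the sort_values call.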
CodePudding user response:
Is R_hf simply the maximum value of R within each group?
One way to do it is this:
import pandas as pd

# Create the sample data (ten rows taken from the question)
data = {'dist_ID': [702641.0,702641.0,702641.0,702641.0,702641.0,702641.0,802663.0,802663.0,802663.0,802663.0],
'R': [5.791781,2.806051,3.254907,8.291544,4.901959,8.630691,95.685763,116.070396,112.806001,229.388117],
'flux': [0.097505,0.015750,0.086941,0.081764,0.053561,0.144661,0.025735,0.026012,0.022163,0.026154]}
df = pd.DataFrame(data)
# Per group: total flux, and the maximum R (taken here as R_hf)
grouped_df = df.groupby('dist_ID').agg({'flux': 'sum', 'R': 'max'}).rename(columns={'flux': 'total_flux', 'R': 'R_hf'})
# Half flux is half of each group's total flux
grouped_df['half_flux'] = grouped_df['total_flux'] / 2
Output:
total_flux R_hf half_flux
dist_ID
702641.0 0.480182 8.630691 0.240091
802663.0 0.100064 229.388117 0.050032
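Note that the maximum R of a group encloses all of its flux, not half of it. If R_hf is instead meant to be the smallest R at which the cumulative flux reaches half of the total, the same table layout can be built without apply. This is only a sketch: it assumes the flux should be accumulated in order of increasing R, and it uses the same ten sample rows.

```python
import pandas as pd

# Same ten sample rows as above.
df = pd.DataFrame({
    "dist_ID": [702641.0] * 6 + [802663.0] * 4,
    "R": [5.791781, 2.806051, 3.254907, 8.291544, 4.901959, 8.630691,
          95.685763, 116.070396, 112.806001, 229.388117],
    "flux": [0.097505, 0.015750, 0.086941, 0.081764, 0.053561, 0.144661,
             0.025735, 0.026012, 0.022163, 0.026154],
})

# Assumption: accumulate flux from the center outwards within each group.
ordered = df.sort_values(["dist_ID", "R"])
cum_flux = ordered.groupby("dist_ID")["flux"].cumsum()
total_flux = ordered.groupby("dist_ID")["flux"].transform("sum")

# Rows at or beyond the point where half of the group's flux is enclosed.
reached = ordered[cum_flux >= total_flux / 2]

summary = df.groupby("dist_ID").agg(total_flux=("flux", "sum"))
summary["half_flux"] = summary["total_flux"] / 2
# Rows are sorted by R within each group, so the first reached row has the smallest R.
summary["R_hf"] = reached.groupby("dist_ID")["R"].first()
print(summary)
```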