Home > OS >  pandas data frame transformations for calculating mean & sem & std for ranges of records
pandas data frame transformations for calculating mean & sem & std for ranges of records

Time:10-17

I have this data frame structure:

AU01_r  AU02_r  AU04_r  AU05_r  AU06_r  AU07_r  AU09_r  AU10_r  AU12_r  

AU14_r  AU15_r  AU17_r  AU20_r  AU23_r  AU25_r  AU26_r  AU45_r  Segment
0   0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1
1   0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1
2   0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1
3   0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1
4   0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1

where each 7500 records have the same segment, I mean segments holds the following records ranges: [[1, 7500], [7501, 15000], [15001, 22500], [22501, 30000], [30001, 37500], [37501, 45000], [45001, 52500], [52501, 60000], [60001, 67500], [67501, 75000], [75001, 82626]]

Currently I have a code - using pandas - that calculate the mean for each segment:

df_AU_r_2_mean = df_AU_r_2.groupby(['Segment']).mean()


AU01_r  AU02_r  AU04_r  AU05_r  AU06_r  AU07_r  AU09_r  AU10_r  

AU12_r  AU14_r  AU15_r  AU17_r  AU20_r  AU23_r  AU25_r  AU26_r  AU45_r  WD  CF
Segment                                                                         
1   0.192525    0.156520    0.888929    0.049577    0.092363    0.609992    0.039349    0.385985    0.242643    0.395441    0.456475    0.504961    0.253471    0.074785    0.509816    0.307315    0.093600    1   1
2   0.190215    0.155545    1.027495    0.144367    0.121984    0.872449    0.103985    0.582804    0.311179    0.685669    0.358625    0.605624    0.182963    0.187416    0.530021    0.521449    0.158552    1   0
3   0.187849    0.114435    1.028465    0.110275    0.045937    0.755899    0.088371    0.395693    0.128856    0.376444    0.491379    0.528315    0.245704    0.086708    0.483681    0.442268    0.173515    1   0

But I need to enhance it, in such a way that I'll be able to calculate mean/sem/std for each one of the AU columns for each 1500 records (to divide each segment into smaller parts).

I wondered if it can be done using pandas data frame transformations?

CodePudding user response:

First add a new column as an incremental id. This will be used to create your newer and smaller segments.

df.insert(0, 'id', range(1, 1   len(df)))

After that create a new column that indicates each 1500 rows.

df["new_Segment"] = pd.to_numeric(df.id//1500).shift(fill_value=0).add(1)

Now, you can do the calculations based on new segment column.

df_AU_r_2_mean = df_AU_r_2.groupby(['new_Segment']).mean()

At the end, the dataframe will be:

id    A B C Segment new_Segment
1     x x x    1        1
2     x x x    1        1
..
1500  x x x    1        1
1501  x x x    1        2
..
7500  x x x    1        5
7501  x x x    2        6
..

For creating new columns with calculations:

df["A_mean"] = df["A"].groupby(['new_Segment']).mean()
  • Related