I have this data frame structure:
AU01_r AU02_r AU04_r AU05_r AU06_r AU07_r AU09_r AU10_r AU12_r
AU14_r AU15_r AU17_r AU20_r AU23_r AU25_r AU26_r AU45_r Segment
0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1
1 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1
2 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1
3 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1
4 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1
where each 7500 records have the same segment, I mean segments holds the following records ranges: [[1, 7500], [7501, 15000], [15001, 22500], [22501, 30000], [30001, 37500], [37501, 45000], [45001, 52500], [52501, 60000], [60001, 67500], [67501, 75000], [75001, 82626]]
Currently I have a code - using pandas - that calculate the mean for each segment:
df_AU_r_2_mean = df_AU_r_2.groupby(['Segment']).mean()
AU01_r AU02_r AU04_r AU05_r AU06_r AU07_r AU09_r AU10_r
AU12_r AU14_r AU15_r AU17_r AU20_r AU23_r AU25_r AU26_r AU45_r WD CF
Segment
1 0.192525 0.156520 0.888929 0.049577 0.092363 0.609992 0.039349 0.385985 0.242643 0.395441 0.456475 0.504961 0.253471 0.074785 0.509816 0.307315 0.093600 1 1
2 0.190215 0.155545 1.027495 0.144367 0.121984 0.872449 0.103985 0.582804 0.311179 0.685669 0.358625 0.605624 0.182963 0.187416 0.530021 0.521449 0.158552 1 0
3 0.187849 0.114435 1.028465 0.110275 0.045937 0.755899 0.088371 0.395693 0.128856 0.376444 0.491379 0.528315 0.245704 0.086708 0.483681 0.442268 0.173515 1 0
But I need to enhance it, in such a way that I'll be able to calculate mean/sem/std for each one of the AU columns for each 1500 records (to divide each segment into smaller parts).
I wondered if it can be done using pandas data frame transformations?
CodePudding user response:
First add a new column as an incremental id. This will be used to create your newer and smaller segments.
df.insert(0, 'id', range(1, 1 len(df)))
After that create a new column that indicates each 1500 rows.
df["new_Segment"] = pd.to_numeric(df.id//1500).shift(fill_value=0).add(1)
Now, you can do the calculations based on new segment column.
df_AU_r_2_mean = df_AU_r_2.groupby(['new_Segment']).mean()
At the end, the dataframe will be:
id A B C Segment new_Segment
1 x x x 1 1
2 x x x 1 1
..
1500 x x x 1 1
1501 x x x 1 2
..
7500 x x x 1 5
7501 x x x 2 6
..
For creating new columns with calculations:
df["A_mean"] = df["A"].groupby(['new_Segment']).mean()