I have a data set that was collected at a sampling rate of 10 Hz (10 samples per second). The data are not normally distributed, so I want to calculate the median of every 10 values down each column, which gives one median per second. Then, from those medians, I would like to calculate the median of every 60 values down each column, and I am totally stumped as to how to do this. I have both Python and RStudio. The data consist of 16 columns and 397939 rows. Thank you so much in advance if you can help!
Please forgive me for being such a coding rookie. New to this but really keen to learn.
CodePudding user response:
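This groups the rows into consecutive blocks of N and takes the median of each block, column by column:
N = 10
df.reset_index(drop=True).groupby(by=lambda x: x // N).median()
Note the floor division (//): it maps rows 0-9 to block 0, rows 10-19 to block 1, and so on, whereas plain / would give every row its own group. Rows are grouped by default, which is what you want for per-column medians, so you should not need to set the axis.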
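To chain both steps (10 samples to one value, then 60 of those values to one value), something along these lines might work. It is only a sketch: the random toy data, the column names and the `block_median` helper are made up for illustration and are not part of the original question.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the real data: 1200 rows (2 minutes at 10 Hz), 3 columns.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.exponential(size=(1200, 3)), columns=["a", "b", "c"])


def block_median(frame, n):
    """Median of every n consecutive rows, computed separately for each column."""
    block = np.arange(len(frame)) // n  # one block label per row: 0,0,...,1,1,...
    return frame.groupby(block).median()


first_pass = block_median(df, 10)           # median of every 10 raw samples
second_pass = block_median(first_pass, 60)  # median of every 60 first-pass medians

print(first_pass.shape)   # (120, 3)
print(second_pass.shape)  # (2, 3)
```

Building the labels from `np.arange(len(frame))` rather than from the index means the helper does not care what the DataFrame's index looks like, so no `reset_index` is needed between the two passes.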
CodePudding user response:
Using R and the tidyverse I would do something like:
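library(tidyverse)
# Example data: 50 rows with one numeric column
df <- tibble(id = 1:50, x = runif(50, 0, 100))
# Label each run of 10 consecutive rows with a block number,
# then take the median of x within each block
df %>%
  mutate(block = rep(1:(nrow(df)/10), each = 10)) %>%
  group_by(block) %>%
  summarise(median = median(x))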
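You will need to decide how to deal with the fact that your number of rows (397939) is not a multiple of 10; the rep() call above only produces one label per row when it is. You can repeat the process on the resulting medians with a block size of 60 to get your second set of medians, or just run the same code with different numbers in the block definition.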
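For the leftover rows, the two obvious choices are to keep the short final block or to drop it. Here is a rough pandas sketch of both, using a made-up 25-row column `x` so that the last block has only 5 rows; which option is appropriate depends on your data.

```python
import numpy as np
import pandas as pd

N = 10
# Hypothetical toy data: 25 rows, so the last block is incomplete (5 rows).
df = pd.DataFrame({"x": np.arange(25, dtype=float)})

block = np.arange(len(df)) // N  # block label for every row

# Option 1: keep the short final block; its median is based on fewer rows.
keep_partial = df.groupby(block).median()

# Option 2: drop the leftover rows so that every block has exactly N rows.
n_full = (len(df) // N) * N
drop_partial = df.iloc[:n_full].groupby(block[:n_full]).median()

print(keep_partial["x"].tolist())  # [4.5, 14.5, 22.0]
print(drop_partial["x"].tolist())  # [4.5, 14.5]
```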