In R or python, how can I calculate the median of every 10 data points per column, in a data set of

Time:04-03

I have a data set that was collected at a sampling rate of 10 Hz (10 samples per second). The data is not normally distributed, so I want to calculate the median of every 10 values down each column, which gives 1 median per second. Then, from the generated medians, I would like to calculate the median of every 60 values down each column. I am totally stumped as to how to do this. I have both Python and RStudio; the data consists of 16 columns and 397939 rows. Thank you so much in advance if you can help me!

Please forgive me for being such a coding rookie. New to this but really keen to learn.

CodePudding user response:

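This groups the rows into consecutive blocks of N and takes the median of each block, per column:

N = 10
# Integer division (//) maps rows 0-9 to block 0, rows 10-19 to block 1, and so on.
df.reset_index(drop=True).groupby(by=lambda x: x // N).median()

Note the `//`: plain `/` is float division in Python 3, which would put every row in its own group. groupby works along rows by default, so no axis argument is needed.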
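To get both stages asked about in the question (every 10 rows, then every 60 of those medians), the same row-blocking idea can be applied twice. The following is a minimal sketch, assuming pandas and NumPy are installed; `block_median` and the synthetic `df` are illustrative names and data, not part of the original post.

```python
import numpy as np
import pandas as pd


def block_median(frame: pd.DataFrame, size: int) -> pd.DataFrame:
    """Median of each consecutive block of `size` rows, computed per column."""
    # Row positions 0..size-1 fall in block 0, size..2*size-1 in block 1, etc.
    block = np.arange(len(frame)) // size
    return frame.groupby(block).median()


# Synthetic stand-in for the real data: 1200 rows (2 minutes at 10 Hz), 3 columns.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(1200, 3)), columns=["a", "b", "c"])

per_second = block_median(df, 10)          # one row per 10 samples
per_minute = block_median(per_second, 60)  # one row per 60 of those medians
```

Using row positions from `np.arange` rather than the index means the result does not depend on what the DataFrame's index happens to be.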

CodePudding user response:

Using R and the tidyverse I would do something like:

library(tidyverse)
df <- tibble(id = 1:50, x = runif(50, 0, 100))
df %>% mutate(block = rep(1:(nrow(df)/10), each = 10)) %>% 
    group_by(block) %>% 
    summarise(median = median(x))

You will need to decide how to deal with the fact that your number of rows (397939) is not a multiple of 10; as written, the `rep()` call above only works when it is. You can repeat the process on the resulting medians to get the second set of medians (every 60), or just run the same code with a different block size.
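The leftover-rows decision can be made concrete with a little arithmetic. The sketch below uses Python with pandas (an assumption; the answer above uses R) and a small made-up column `x` to show the two obvious choices: drop the incomplete final block, or keep it as a shorter block.

```python
import numpy as np
import pandas as pd

# For the data set in the question: 39793 full blocks of 10, with 9 rows left over.
full_blocks, leftover = divmod(397939, 10)

# Small illustrative frame: 25 rows, so the last block has only 5 rows.
demo = pd.DataFrame({"x": np.arange(25, dtype=float)})
block_size = 10

# Option 1: drop the incomplete final block before grouping.
n_full = (len(demo) // block_size) * block_size
trimmed = demo.iloc[:n_full]
dropped = trimmed.groupby(np.arange(len(trimmed)) // block_size).median()

# Option 2: keep it, so the last median is computed from fewer rows.
kept = demo.groupby(np.arange(len(demo)) // block_size).median()
```

Which option is appropriate depends on whether a median based on a partial second is acceptable for the analysis.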
