I have a data frame describing a population of particles by size. The first column holds the size (x value) and the other columns hold the density (y values) at that size. I need to calculate the median size for each of the density columns.
Since median() works on raw observations (the kind of data you would pass to hist()), I decided to transform my dataset to that form: for each row I append the value of the first column N times to a vector, where N is the value in the given column. This works, but it is really slow with my 1200-row data frames, so I wonder if you have a more efficient solution.
df <- data.frame(Size = c(1:100),
                 val1 = sample(0:9, 100, replace = TRUE),
                 val2 = sample(0:9, 100, replace = TRUE))
get.median <- function(dataset){
  results <- list()
  # loop over every density column (all columns except the first)
  for(col in colnames(dataset)[2:ncol(dataset)]){
    col.results <- c()
    for(i in 1:nrow(dataset)){
      size <- dataset[i,"Size"]
      count <- dataset[i,col]
      # repeat the size as many times as its count and grow the vector
      out <- rep(size,count)
      col.results <- c(col.results,out)
    }
    med <- median(col.results)
    results <- append(results,med)
  }
  return(results)
}
get.median(df)
CodePudding user response:
Without transforming:
lapply(df[,2:3], function(y) median(rep(df$Size, times = y)))
$val1
[1] 49
$val2
[1] 47
Data:
set.seed(99)
df <- data.frame(Size = c(1:100),
                 val1 = sample(0:9, 100, replace = TRUE),
                 val2 = sample(0:9, 100, replace = TRUE))
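If building the expanded vector is itself too costly, the same median can be read off the cumulative counts instead. The following is only a minimal sketch, assuming the density columns hold non-negative integer counts; weighted_median_counts is a hypothetical helper name, not a function from any package.

```r
# Hypothetical helper: median of rep(size, times = count) without
# materialising the repeated vector. Assumes non-negative integer counts.
weighted_median_counts <- function(size, count) {
  ord   <- order(size)
  size  <- size[ord]
  count <- count[ord]
  cum <- cumsum(count)
  n   <- cum[length(cum)]          # length of the virtual expanded vector
  if (n == 0) return(NA_real_)
  # positions of the middle element(s); they coincide when n is odd
  lo <- floor((n + 1) / 2)
  hi <- ceiling((n + 1) / 2)
  # the first size whose cumulative count reaches each position
  v_lo <- size[which(cum >= lo)[1]]
  v_hi <- size[which(cum >= hi)[1]]
  (v_lo + v_hi) / 2
}

set.seed(99)
df <- data.frame(Size = 1:100,
                 val1 = sample(0:9, 100, replace = TRUE),
                 val2 = sample(0:9, 100, replace = TRUE))

# one median per density column
res <- sapply(df[-1], function(y) weighted_median_counts(df$Size, y))
# reference values from the expansion approach, for comparison
ref <- sapply(df[-1], function(y) median(rep(df$Size, times = y)))
```

Here res and ref should agree, since both describe the median of the same expanded data; the cumulative version just avoids allocating a vector whose length is the sum of the counts.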
CodePudding user response:
You can use sapply and median to calculate the median of the values in each column (this is the plain column median, not a median of Size weighted by the densities) like this:
sapply(df, median)
Output:
Size val1 val2
50.5  6.0  3.5
CodePudding user response:
Using weighted.median() from the spatstat package together with dplyr::across:
library(dplyr)
library(spatstat)
df %>% summarize(across(-Size, ~weighted.median(Size, .x, na.rm = TRUE)))
val1 val2
1 42.5 47.5