With a dataset like original:
id <- c("JF", "GH", "GH", "ANN", "GH", "ROG", "JF")
group <- c("most", "least", "most", "least", "least", "most", "least")
NP <- c(4,6,18,1,3,12,8)
iso_USA <- c(1, 0, 0, 0, 0, 1, 1)
iso_CHN <- c(0, 1, 1, 0, 0, 0, 0)
color <- c("blue", "orange", "blue", "blue", "red", "orange", "black")
original <- data.frame(id, group, NP, iso_USA, iso_CHN, color)
numeric <- unlist(lapply(original, is.numeric))
numeric <- names(original[ , numeric])
char <- unlist(lapply(original, is.character))
char <- names(original[ , char])
char <- char[-1] #remove id from variables of interest
I want to group by "group" and calculate the median for the numeric variables and the mode for the character variables. Therefore, the data looks like original2. Note that my actual dataset has way more columns than the mock version presented here:
group <- c("least", "most")
NP <- c(6,12)
iso_USA <- c(0,1)
iso_CHN <- c(0, 0)
color <- c("orange", "blue")
original2 <- data.frame(group, NP, iso_USA, iso_CHN, color)
Any clue?
CodePudding user response:
Using dplyr
's across
functionality and the the accepted answer at the FAQ about implementing a mode
function:
Mode <- function(x) {
ux <- unique(x)
ux[which.max(tabulate(match(x, ux)))]
}
library(dplyr)
original %>%
select(-id) %>%
group_by(group) %>%
summarize(
across(where(is.numeric), median),
across(where(is.character), Mode)
)
# # A tibble: 2 × 6
# group NP iso_USA iso_CHN color
# <chr> <dbl> <dbl> <dbl> <chr>
# 1 least 4.5 0 0 orange
# 2 most 12 1 0 blue
CodePudding user response:
Here's a base R solution. Basically I was just curious if its possible to do this relatively easily without additional packages.
For this I need
- a function that can tell me if elements are castable to numeric
and - obviously -
- a Mode function (which I will copy blatantly from @Gregor Thomas's solution).
Functions
isCastableNumeric <- function( str ){ if(is.character(str)){
all(sapply( str, function(x) sapply( strsplit( x, "\\." ), function(y)
grepl( "^-?[0-9] $|^[0-9] $",paste0(y, collapse="") )&length(y)<=2 ) ) ) }
else{ T } }
Mode <- function(x){ ux <- unique(x)
ux[which.max(tabulate(match(x, ux)))] }
Application
aggregate( cbind(NP, iso_USA, iso_CHN, color) ~ group, original, function(x){
'if'( isCastableNumeric(x[1]),median(as.numeric(x)),Mode(as.character(x)) ) } )
group NP iso_USA iso_CHN color
1 least 4.5 0 0 orange
2 most 12.0 1 0 blue