Median Split of one variable to create another variable

I am currently struggling with a median split in RStudio. I want to create a new column in my data frame that is a median split of another column, but I do not know how to accomplish this. Any and all help will be appreciated. This is the code I have run so far:

library(dplyr)  # filter() comes from dplyr
medianpcr <- median(honourswork$PCR.x)
highmedian <- filter(honourswork, PCR.x > medianpcr)
lowmedian <- filter(honourswork, PCR.x <= medianpcr)

CodePudding user response：

Let's first create some data:

set.seed(123)
honourswork <- data.frame(PCR.x = rnorm(100))

In dplyr, you might do:

library(tidyverse)
highmedian <- honourswork %>% filter(PCR.x > median(PCR.x)) %>% select(PCR.x)
lowmedian <- honourswork %>% filter(PCR.x <= median(PCR.x)) %>% select(PCR.x)

Equivalently in base R:

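# drop = FALSE keeps the result a data frame even when honourswork has a single column
highmedian <- honourswork[honourswork$PCR.x > median(honourswork$PCR.x), , drop = FALSE]
lowmedian <- honourswork[honourswork$PCR.x <= median(honourswork$PCR.x), , drop = FALSE]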
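
If the goal is a new column rather than two separate data frames, something like the following may be closer to what was asked. This is only a minimal sketch in base R: the column name PCR.split and the labels "high" and "low" are made up for illustration, and values equal to the median are assumed to belong to the low group. Note that median() returns NA when the column contains missing values unless na.rm = TRUE is passed.

```r
set.seed(123)
honourswork <- data.frame(PCR.x = rnorm(100))

# PCR.split is a hypothetical column name chosen for this sketch.
# Rows strictly above the median are labelled "high", all others "low".
honourswork$PCR.split <- ifelse(
  honourswork$PCR.x > median(honourswork$PCR.x),
  "high",
  "low"
)

# Count how many rows fall into each group
table(honourswork$PCR.split)
```
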

CodePudding user response：

When you post a question on SO, it's always a good idea to include an example dataframe so that the answerer doesn't have to create one themselves.

On to your question: if I understand you correctly, you can use mutate() and case_when() from the dplyr package:


# Load the dplyr library
library(dplyr)

# Create an example dataframe
data <- data.frame(
  rowID = c(1:20),
  value = runif(20, 0, 50)
)

# Use case_when to mutate a new column 'category' with values based on 
# the 'value' column
data2 <- data %>%
  dplyr::mutate(category = 
    dplyr::case_when(
      value > median(value) ~ "Highmedian",
      value < median(value) ~ "Lowmedian",
      value == median(value) ~ "Median"
    )
  )

More about case_when() is available in the dplyr documentation.
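
A strict median split usually has only two groups, so a two-level version may be all you need. The sketch below assumes that values equal to the median should join the lower group (the same <= rule as in the first answer) and uses dplyr's if_else(). The seed and the name data3 are assumptions added for this example only.

```r
library(dplyr)

set.seed(42)  # seed added here so the example is reproducible
data <- data.frame(
  rowID = 1:20,
  value = runif(20, 0, 50)
)

# Two-level split: strictly above the median is "Highmedian",
# everything else (including ties with the median) is "Lowmedian"
data3 <- data %>%
  mutate(
    category = if_else(value > median(value), "Highmedian", "Lowmedian"),
    # Store the result as a factor with an explicit level order
    category = factor(category, levels = c("Lowmedian", "Highmedian"))
  )

# Count rows per group
count(data3, category)
```

Storing the result as a factor keeps the group order fixed, which tends to be convenient when the new column is later used in plots or models.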

Hope this helps!