How do I add a conditional indicator variable to a dataframe in R?

Time:11-11

The variable I am trying to create is called "high".

It will take a value of "1" if totexp (which is already in the dataframe) is above the median of totexp.

It will take a value of "0" if totexp (which is already in the dataframe) is below the median of totexp.

Below is the code I have used to create the variable.

high = rep(0, nrow(df))

high[df$totexp > median(df$totexp)] = 1

How do I add it to the dataframe?

Thank you!

CodePudding user response:

You can do an ifelse call:

Data:

df <- data.frame(totexp = c(1,2,3,4,1,2,35,6))

Solution:

df$high <- ifelse(df$totexp > median(df$totexp, na.rm = TRUE),
                  1,
                  0)

Result:

df
  totexp high
1      1    0
2      2    0
3      3    1
4      4    1
5      1    0
6      2    0
7     35    1
8      6    1
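
As a side note, the comparison on its own already returns a logical vector, so the same indicator can be built without ifelse by converting TRUE/FALSE to 1/0. The following is only a minimal sketch on the example data above. It assumes that rows exactly equal to the median should get 0, and any row where totexp is NA would stay NA.

```r
# Example data from this answer
df <- data.frame(totexp = c(1, 2, 3, 4, 1, 2, 35, 6))

# The comparison yields TRUE/FALSE; as.integer() maps that to 1/0
df$high <- as.integer(df$totexp > median(df$totexp, na.rm = TRUE))

df
```

This stores high as an integer column rather than a double, which is usually fine for an indicator variable.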

CodePudding user response:

Here is a reproducible solution:

# load library
library(tidyverse)

# create some data (set a seed so the random draw is reproducible)
set.seed(42)
dat <- tibble(year = seq(2000, 2020, by = 1),
              totexp = rnorm(n = 21, mean = 1000, sd = 300))

# check the median
median(dat$totexp)

# add our new variable to the dataset
dat$high <- ifelse(dat$totexp > median(dat$totexp), 1, 0)
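
Since the tidyverse is already loaded here, the same step could also be written with dplyr's mutate() and if_else(). This is a sketch rather than a required change. The seed value 42 is an arbitrary assumption so that the example gives the same numbers on every run, and a plain data.frame is used to keep it self-contained.

```r
library(dplyr)

# Arbitrary seed (an assumption) so the simulated data are repeatable
set.seed(42)
dat <- data.frame(year = seq(2000, 2020, by = 1),
                  totexp = rnorm(n = 21, mean = 1000, sd = 300))

# Inside mutate(), columns can be referenced by name without dat$
dat <- dat %>%
  mutate(high = if_else(totexp > median(totexp), 1, 0))

# Count how many rows fall above the median versus at or below it
table(dat$high)
```

With 21 distinct values, the median is itself one of the observations, so that row gets 0 along with everything below it.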

In R you can add a variable to a dataframe by assigning to a new column name with the $ and <- operators, e.g. df$high <- high, where the right-hand side is any vector with one value per row.
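
To connect this back to the question, here is a minimal sketch that attaches a vector built the way the question builds it. The data are the small made-up example from the first answer, not the asker's real data, and the name df2 is only illustrative.

```r
# Made-up example data standing in for the asker's dataframe
df <- data.frame(totexp = c(1, 2, 3, 4, 1, 2, 35, 6))

# Build the indicator as in the question (note df$totexp, not bare totexp)
high <- rep(0, nrow(df))
high[df$totexp > median(df$totexp)] <- 1

# Option 1: assign the vector to a new column
df$high <- high

# Option 2: bind it on as a new column, leaving df itself unchanged
df2 <- cbind(df["totexp"], high = high)

df
```

Both options rely on high having exactly nrow(df) elements in the same row order as the dataframe.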
