How to use if, then statements in R?

Time:02-15

I am learning R and want to identify whether a specific family member in my dataset was under the age of 18 at the time of testing. I want to check the year of birth when a participant meets certain criteria, and my first thought was to use an if statement. I'm not sure this is the best way to go about it.

Example dataframe:

# sample dataframe
data <- data.frame(
  ID = c(1, 2, 3, 4),
  relationship = c(1, 4, 6, 2, 5),
  relatechild = c(2, 8, 9, 4),
  year_dob = c(1994, 2001, 1987, 2005)
)

I want to create a column in a new dataframe that contains 'year_dob' if 'relationship' is in 3:6 and 'relatechild' is in 7:9. I was thinking the following might be an option, but ideally it would produce a new column holding the birth year, filled in only when the row meets the criteria.

if(data$relationship =  3:6 && data$relatechild = 7:9)
    print(data$year_dob) 

Answer:

data <- data.frame(
  ID = c(1, 2, 3, 4),
  relationship = c(1, 4, 6, 2),
  relatechild = c(2, 8, 9, 4),
  year_dob = c(1994, 2001, 1987, 2005)
)

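Your example data had an additional value in relationship (five values versus four in the other columns), which makes data.frame() stop with a "differing number of rows" error, so I’ve removed the 5.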

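There are a few things going on here. First, if only works on a single TRUE/FALSE value, not a whole vector. In older versions of R, a longer condition gave a warning and only its first element was used, so your code would only look at the first row of your data (since R 4.2.0 a condition of length greater than one is an error instead), i.e.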

data[1,]
#>   ID relationship relatechild year_dob
#> 1  1            1           2     1994

It doesn’t check the whole vector (column). As it happens, that row doesn’t fulfill the condition, so nothing would be printed. If you want to test each element of a vector, ifelse is more appropriate.
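To make the if versus ifelse difference concrete, here is a small sketch of my own using only the birth years from the example data (the "after 2000" cut-off is arbitrary, just for illustration). if needs one TRUE/FALSE, so a vector condition has to be collapsed with any() or all(); ifelse() returns one result per element:

```r
# Birth years from the example data
years <- c(1994, 2001, 1987, 2005)

# if needs a single TRUE/FALSE, so collapse the vector with any() or all()
if (any(years > 2000)) {
  message("At least one participant was born after 2000")
}

# ifelse() is vectorised: it returns one label per birth year
labels <- ifelse(years > 2000, "after 2000", "2000 or earlier")
labels
```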

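Second, = is used to assign values to objects; to test equality you use ==. But because you are comparing against a set of values, %in% is what we want: we want to know whether relationship is one of 3 to 6 and whether relatechild is one of 7 to 9. A single value can’t be equal to all of those at once, and data$relationship == 3:6 compares the two vectors position by position, which is not a membership test. For the same reason, combine the two conditions with the element-wise & rather than &&, which only handles single values (and is an error for longer vectors since R 4.3.0).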
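Here is a quick sketch of the difference, using the corrected relationship and relatechild columns as plain vectors. == pairs up positions (recycling the shorter vector), while %in% checks each value against the whole set:

```r
relationship <- c(1, 4, 6, 2)
relatechild  <- c(2, 8, 9, 4)

# == compares position by position (1 vs 3, 4 vs 4, 6 vs 5, 2 vs 6)
eq <- relationship == 3:6
eq
#> [1] FALSE  TRUE FALSE FALSE

# %in% asks whether each value appears anywhere in 3:6
inset <- relationship %in% 3:6
inset
#> [1] FALSE  TRUE  TRUE FALSE

# & combines the two conditions element by element
both <- relationship %in% 3:6 & relatechild %in% 7:9
both
#> [1] FALSE  TRUE  TRUE FALSE
```

Note that == happens to flag row 2 but misses row 3, where relationship is 6.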

I’m not entirely clear on what you mean by a new dataframe. From your code it looks like you want to create a new column in your existing data.frame, but I think you might be trying to subset. In that case, neither if nor ifelse is what we want; we can index the rows with a logical condition instead. Note the c() below is technically redundant, but I’m using it to illustrate that when selecting several values you generally need to supply them as a vector (a set of values of some sort). When you use : for a sequence of numbers, that vector is created for you.

df <- data[data$relationship %in% c(3:6) & data$relatechild %in% c(7:9) , ]

df
#>   ID relationship relatechild year_dob
#> 2  2            4           8     2001
#> 3  3            6           9     1987
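If you prefer to avoid repeating data$, base R’s subset() evaluates the condition inside the data frame and should give the same rows. This is an alternative I’m suggesting, not something from the question; the check on c(3:6) just confirms that c() adds nothing to a sequence:

```r
data <- data.frame(
  ID = c(1, 2, 3, 4),
  relationship = c(1, 4, 6, 2),
  relatechild = c(2, 8, 9, 4),
  year_dob = c(1994, 2001, 1987, 2005)
)

# c() around a sequence changes nothing: 3:6 is already a vector
identical(c(3:6), 3:6)
#> [1] TRUE

# subset() looks up the column names inside data
df2 <- subset(data, relationship %in% 3:6 & relatechild %in% 7:9)
df2
#>   ID relationship relatechild year_dob
#> 2  2            4           8     2001
#> 3  3            6           9     1987
```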

If you want to create a new column using ifelse:

data$dob <- with(data, ifelse(relationship %in% 3:6 & relatechild %in% 7:9, year_dob, NA))

data
#>   ID relationship relatechild year_dob  dob
#> 1  1            1           2     1994   NA
#> 2  2            4           8     2001 2001
#> 3  3            6           9     1987 1987
#> 4  4            2           4     2005   NA
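An equivalent way to fill the column without ifelse is to start with NA everywhere and copy the year only into the rows that match. This is a sketch of a common base R idiom; the helper name meets is just for illustration:

```r
data <- data.frame(
  ID = c(1, 2, 3, 4),
  relationship = c(1, 4, 6, 2),
  relatechild = c(2, 8, 9, 4),
  year_dob = c(1994, 2001, 1987, 2005)
)

# TRUE for rows meeting both criteria (meets is a hypothetical helper name)
meets <- with(data, relationship %in% 3:6 & relatechild %in% 7:9)

# Start with a numeric NA column, then copy year_dob where meets is TRUE
data$dob <- NA_real_
data$dob[meets] <- data$year_dob[meets]
data$dob
#> [1]   NA 2001 1987   NA
```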

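You could also create an indicator rather than copying the year. I’m not sure what indicates under 18, but I’m assuming later birth years are under 18. Note that the indicator below is simply 0 for rows that meet the criteria and 1 for all other rows; it doesn’t use year_dob itself.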

data$under_18 <- with(data, ifelse(relationship %in% 3:6 & relatechild %in% 7:9, 0, 1))

data
#>   ID relationship relatechild year_dob  dob under_18
#> 1  1            1           2     1994   NA        1
#> 2  2            4           8     2001 2001        0
#> 3  3            6           9     1987 1987        0
#> 4  4            2           4     2005   NA        1
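If you have (or can add) the year each participant was tested, the under-18 flag can be computed from the birth year directly. The sketch below assumes a hypothetical test_year column filled with a made-up value of 2015; with only years available, the age is approximate because birthdays are ignored:

```r
data <- data.frame(
  ID = c(1, 2, 3, 4),
  relationship = c(1, 4, 6, 2),
  relatechild = c(2, 8, 9, 4),
  year_dob = c(1994, 2001, 1987, 2005),
  test_year = rep(2015, 4)  # hypothetical testing year, not in the original data
)

# Approximate age at testing (year only, so birthdays are ignored)
data$age_at_test <- data$test_year - data$year_dob

# 1 = meets the relationship criteria AND was under 18 at testing, otherwise 0
data$under_18 <- with(data, as.integer(
  relationship %in% 3:6 & relatechild %in% 7:9 & age_at_test < 18
))

data[, c("ID", "year_dob", "age_at_test", "under_18")]
```

Here only ID 2 is flagged: ID 3 meets the relationship criteria but was 28 in 2015.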

Answer:

If you're testing values against a set of reference values, you are better off using %in%, as @akrun mentioned in the comments.

One solution is to use dplyr::filter() with the criteria you stated, joined by &, the logical AND operator, so that only rows meeting both conditions are kept in the output. Then dplyr::pull() the year_dob column and do whatever you like with it.

library(tidyverse)

d <- data.frame(
  ID = c(1, 2, 3, 4),
  relationship = c(1, 4, 6, 2),
  relatechild = c(2, 8, 9, 4),
  year_dob = c(1994, 2001, 1987, 2005)
)

d %>% 
  filter(relationship %in% 3:6 & relatechild %in% 7:9) %>% 
  pull(year_dob)
#> [1] 2001 1987

Created on 2022-02-14 by the reprex package (v2.0.1)
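For comparison, the same vector can be obtained in base R without loading any packages, by indexing the column with the logical condition. This is my own equivalent, not part of the answer above:

```r
d <- data.frame(
  ID = c(1, 2, 3, 4),
  relationship = c(1, 4, 6, 2),
  relatechild = c(2, 8, 9, 4),
  year_dob = c(1994, 2001, 1987, 2005)
)

# Logical indexing on one column gives the same result as filter() + pull()
years <- d$year_dob[d$relationship %in% 3:6 & d$relatechild %in% 7:9]
years
#> [1] 2001 1987
```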
