My dataset has two IDs per row, one for a parent and one for a child, but I don't know which is which. I do, however, have their ages. This is the table I am working with:
ID1 ID2 sex1 sex2 age1 age2
1 8 9 1 2 44 11
2 17 7 1 1 56 76
3 1 44 NA NA 16 55
4 3 13 NA NA NA NA
5 55 6 2 NA 56 10
6 4 33 2 NA 45 9
7 2 66 1 NA 12 45
8 72 99 NA NA NA NA
9 12 11 2 2 30 12
Using an if statement, I want to work out who's who from their ages.
Here is the code I wrote, but it is not working:
install.packages('seqinr')
library(seqinr)
for (i in 1:nrow(data)){
if (data$age2[i]> data$age1[i]){
swap(data$age1[i], data$age2[i])
}
}
The error message:
Error in if (data$age2[i] > data$age1[i]) { :
missing value where TRUE/FALSE needed
I want to put the parent's age in age1 and the child's age in age2. Does anyone have a better idea of how to do it?
Answer:
Welcome to SO!
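You can do it without a for loop if you only need to put the higher value in age1 and the lower value in age2: compare the two columns row by row with pmax() and pmin(). Both return NA for a row where either age is missing, so rows 4 and 8 stay NA (with na.rm = TRUE, a row with only one known age would get that age in both columns):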
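# Larger age -> age_1 (parent), smaller -> age_2 (child); new columns keep the originals for comparison
df$age_1 <- pmax(df$age1, df$age2)
df$age_2 <- pmin(df$age1, df$age2)
# To overwrite age1/age2 in place, compute the larger age before assigning,
# otherwise pmin() would see the already-updated age1:
# hi <- pmax(df$age1, df$age2); df$age2 <- pmin(df$age1, df$age2); df$age1 <- hi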
With result:
ID1 ID2 sex1 sex2 age1 age2 age_1 age_2
1 8 9 1 2 44 11 44 11
2 17 7 1 1 56 76 76 56
3 1 44 NA NA 16 55 55 16
4 3 13 NA NA NA NA NA NA
5 55 6 2 NA 56 10 56 10
6 4 33 2 NA 45 9 45 9
7 2 66 1 NA 12 45 45 12
8 72 99 NA NA NA NA NA NA
9 12 11 2 2 30 12 30 12
With data:
df <- read.table(text = 'ID1 ID2 sex1 sex2 age1 age2
1 8 9 1 2 44 11
2 17 7 1 1 56 76
3 1 44 NA NA 16 55
4 3 13 NA NA NA NA
5 55 6 2 NA 56 10
6 4 33 2 NA 45 9
7 2 66 1 NA 12 45
8 72 99 NA NA NA NA
9 12 11 2 2 30 12', header = TRUE)
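If you want to keep the loop from the question instead, the error comes from rows 4 and 8, where both ages are NA: `NA > NA` is NA, and if() needs a single TRUE or FALSE. Below is a minimal sketch of the same loop, assuming (as in the question) that only the ages should move. It wraps the comparison in isTRUE(), which turns NA into FALSE, and swaps through a temporary variable instead of seqinr::swap(). So that the block runs on its own, it rebuilds just the two age columns from the data above.

```r
# Copy of the two age columns from the data above, so the block is self-contained.
df <- data.frame(age1 = c(44, 56, 16, NA, 56, 45, 12, NA, 30),
                 age2 = c(11, 76, 55, NA, 10, 9, 45, NA, 12))

for (i in seq_len(nrow(df))) {
  # isTRUE() is FALSE for NA, so rows with a missing age are skipped
  # instead of triggering "missing value where TRUE/FALSE needed".
  if (isTRUE(df$age2[i] > df$age1[i])) {
    # Plain swap through a temporary variable (no extra package needed).
    tmp <- df$age1[i]
    df$age1[i] <- df$age2[i]
    df$age2[i] <- tmp
  }
}
df
```

The vectorised pmax()/pmin() version is still the simpler choice; the loop is only worth keeping if each swap needs extra per-row logic.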
Answer:
library(tidyverse)
df <- read_table(
"ID1 ID2 sex1 sex2 age1 age2
8 9 1 2 44 11
17 7 1 1 56 76
1 44 NA NA 16 55
3 13 NA NA NA NA
55 6 2 NA 56 10
4 33 2 NA 45 9
2 66 1 NA 12 45
72 99 NA NA NA NA
12 11 2 2 30 12"
)
Method 1:
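# transform() evaluates every argument on the original columns,
# so the age2 expression still sees the unswapped age1.
df %>%
  transform(age1 = case_when(age1 > age2 ~ age1,   # larger age
                             TRUE ~ age2),
            age2 = case_when(age2 < age1 ~ age2,   # smaller age
                             TRUE ~ age1))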
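Method 2 (this also relies on transform() evaluating both arguments against the original columns; with dplyr::mutate() the second line would see the already-updated age1, and rows that need swapping would end up with the older age in both columns):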
df %>%
transform(age1 = pmax(age1, age2),
age2 = pmin(age1, age2))
Both methods return:
   ID1 ID2 sex1 sex2 age1 age2
1 8 9 1 2 44 11
2 17 7 1 1 76 56
3 1 44 NA NA 55 16
4 3 13 NA NA NA NA
5 55 6 2 NA 56 10
6 4 33 2 NA 45 9
7 2 66 1 NA 45 12
8 72 99 NA NA NA NA
9 12 11 2 2 30 12
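Both answers move only the ages, so after the swap ID1 and sex1 may no longer belong to the person in age1. If ID1, sex1 and age1 describe the same person (the question implies this but doesn't state it), a minimal base-R sketch that swaps all three column pairs together on the rows where the second person is older could look like this. The loop over the prefixes "ID", "sex" and "age" assumes the column naming shown in the question; which() drops the NA comparisons, so rows with a missing age are left alone.

```r
# Copy of the question's data, so the block is self-contained.
df <- data.frame(
  ID1  = c(8, 17, 1, 3, 55, 4, 2, 72, 12),
  ID2  = c(9, 7, 44, 13, 6, 33, 66, 99, 11),
  sex1 = c(1, 1, NA, NA, 2, 2, 1, NA, 2),
  sex2 = c(2, 1, NA, NA, NA, NA, NA, NA, 2),
  age1 = c(44, 56, 16, NA, 56, 45, 12, NA, 30),
  age2 = c(11, 76, 55, NA, 10, 9, 45, NA, 12)
)

# Rows where person 2 is older; which() silently drops the NA comparisons.
i <- which(df$age2 > df$age1)

# Swap each *1/*2 column pair on those rows so ID, sex and age move together.
for (v in c("ID", "sex", "age")) {
  a <- paste0(v, "1")
  b <- paste0(v, "2")
  tmp <- df[[a]][i]
  df[[a]][i] <- df[[b]][i]
  df[[b]][i] <- tmp
}
df
```

Here rows 2, 3 and 7 are swapped, so each row ends up with the parent (older person) in the *1 columns and the child in the *2 columns.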