In R, I want to create a "category" variable that depends on differences between the date of birth and the initial date of individuals born within a group (min) (which is in another dataframe), so that an individual will be in a category 1 if was born between the first 10 days within his category, in category 2 if he was born in the second 10 days, etc. and I want to generate categories until all the possible dates are filled.
I hope my example clarifies it. I made a loop that is running but has some disadvantages: 1) It requires to merge the two dataframes, and 2) It requires to specify one by one the categories.
group = c("A","B","C","D","E","F","G","H","I","J")
min = c(100,125,120,98,89,110,115,130,100,90)
max = c(140,185,220,200,145,150,145,170,170,140)
DATES = data.frame(group,min,max)
ind = c(1:20)
group = c("B","B","B","A","C","D","D","E","J","I","H","A","F","I","E","F","E","C","C","A")
birth_date = c(130,180,150,110,130,200,100,140,130,100,155,130,110,120,135,115,100,150,200,105)
BIRTH = data.frame(ind,group,birth_date)
BIRTH = merge(BIRTH, DATES, by = "group")
BIRTH$category = NA
for (i in 1:nrow(BIRTH)){
if(isTRUE(BIRTH$birth_date[i] <= BIRTH$min[i] 10)==TRUE){
BIRTH$category[i] = 1
} else if (isTRUE(BIRTH$birth_date[i] >= BIRTH$min[i] 10 & BIRTH$birth_date[i] < BIRTH$min[i] 10*2)==TRUE) {
BIRTH$category[i] = 2
} else if (isTRUE(BIRTH$birth_date[i] >= BIRTH$min[i] 10*2 & BIRTH$birth_date[i] < BIRTH$min[i] 10*3)==TRUE) {
BIRTH$category[i] = 3
} else if (isTRUE(BIRTH$birth_date[i] >= BIRTH$min[i] 10*3 & BIRTH$birth_date[i] < BIRTH$min[i] 10*4)==TRUE) {
BIRTH$category[i] = 4
} else if (isTRUE(BIRTH$birth_date[i] >= BIRTH$min[i] 10*4 & BIRTH$birth_date[i] < BIRTH$min[i] 10*5)==TRUE) {
BIRTH$category[i] = 5
} else if (isTRUE(BIRTH$birth_date[i] >= BIRTH$min[i] 10*5 & BIRTH$birth_date[i] < BIRTH$min[i] 10*6)==TRUE) {
BIRTH$category[i] = 6
} else if (isTRUE(BIRTH$birth_date[i] >= BIRTH$min[i] 10*6 & BIRTH$birth_date[i] < BIRTH$min[i] 10*7)==TRUE) {
BIRTH$category[i] = 7
} else if (isTRUE(BIRTH$birth_date[i] >= BIRTH$min[i] 10*7 & BIRTH$birth_date[i] < BIRTH$min[i] 10*8)==TRUE) {
BIRTH$category[i] = 8
} else if (isTRUE(BIRTH$birth_date[i] >= BIRTH$min[i] 10*8 & BIRTH$birth_date[i] < BIRTH$min[i] 10*9)==TRUE) {
BIRTH$category[i] = 9
} else {
BIRTH$category[i] = 10
}
}
Any suggestions to make it more elegant/efficient?
CodePudding user response:
You can just check how often the difference can be divied by 10 using the %/%
operator. For example
transform(BIRTH, category=(birth_date-min) %/% 10 1)
# group ind birth_date min max category
# 1 A 4 110 100 140 2
# 2 A 12 130 100 140 4
# 3 A 20 105 100 140 1
# 4 B 1 130 125 185 1
# 5 B 2 180 125 185 6
# 6 B 3 150 125 185 3
# 7 C 5 130 120 220 2
# ...
Note that a few of these are different than your desired output because you seem to have a different of 10 in both category 1 and 2. It seems all your other categories are open on the right other than the first which causes the confusion. The first 10 days would be 0,1,2,3,4,5,6,7,8,9 so if you have a difference of 10, then that would be in the second set of 10 days.