Consider the dataset:
example1 = data.frame("year"=c(1,1,3,4,1,2,3,4,1,2,3,4,5),
"household"=c(1,1,1,1,2,2,2,2,2,2,2,2,2),
"person"= c(1,1,1,1,1,1,1,1,2,2,2,2,2),
"expected income" = c(seq(140,260,10)),
"income" = c(seq(110,230,10)))
Just to have an idea person=1 is the father of the family and person=2 is the mother of the family, in the complete dataset there will be also the children, but it doesn't matter right now. I need to calculate the ratio between column(4) "expected income" in year(i) and column(5)"income" in year (i 1). Furthermore the ratio has to be done only when the "person" and "household" is the same. for example it doesn't have to be calculated the ratio between col(4)-row(4) and col(5)-row(5) because they are two man of different household, the same for col(5)-row(8) and col(5)-row(9) because they are two different person within the same household. Instead of the ratio between the "expected income" and the "income" of two different people I need an NA. It has to be done generically since it is just a semplification of a dataset with more than 60000 row.
CodePudding user response:
Is this what you are looking for:
library(dplyr)
example1 %>%
mutate(ratio = ifelse(person == household, expected.income/income, NA))
Output:
year household person expected.income income ratio
1 1 1 1 140 110 1.272727
2 1 1 1 150 120 1.250000
3 3 1 1 160 130 1.230769
4 4 1 1 170 140 1.214286
5 1 2 1 180 150 NA
6 2 2 1 190 160 NA
7 3 2 1 200 170 NA
8 4 2 1 210 180 NA
9 1 2 2 220 190 1.157895
10 2 2 2 230 200 1.150000
11 3 2 2 240 210 1.142857
12 4 2 2 250 220 1.136364
13 5 2 2 260 230 1.130435
CodePudding user response:
It sounds like you need to group by household and person, then find the ratio of the expected income to the lead
value of income
:
library(tidyverse)
example1 %>%
group_by(person, household) %>%
mutate(ratio = expected.income / lead(income))
#> # A tibble: 13 x 6
#> # Groups: person, household [3]
#> year household person expected.income income ratio
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 1 1 1 140 110 1.17
#> 2 2 1 1 150 120 1.15
#> 3 3 1 1 160 130 1.14
#> 4 4 1 1 170 140 NA
#> 5 1 2 1 180 150 1.12
#> 6 2 2 1 190 160 1.12
#> 7 3 2 1 200 170 1.11
#> 8 4 2 1 210 180 NA
#> 9 1 2 2 220 190 1.1
#> 10 2 2 2 230 200 1.10
#> 11 3 2 2 240 210 1.09
#> 12 4 2 2 250 220 1.09
#> 13 5 2 2 260 230 NA
Created on 2022-05-11 by the reprex package (v2.0.1)
CodePudding user response:
First order
by household, person and year. Then calculate the ratio and set all rations to NA
where the next lien is not the next year or not the same household or not the same person.
. <- example1
. <- .[order(.$household, .$person, .$year),]
.$ratio <- .$expected.income / c(.$income[-1], NA)
is.na(.$ratio) <- (1 .$year) != c(.$year[-1], NA) |
.$household != c(.$household[-1], NA) | .$person != c(.$person[-1], NA)
.
# year household person expected.income income ratio
#1 1 1 1 140 110 NA
#2 1 1 1 150 120 NA
#3 3 1 1 160 130 1.142857
#4 4 1 1 170 140 NA
#5 1 2 1 180 150 1.125000
#6 2 2 1 190 160 1.117647
#7 3 2 1 200 170 1.111111
#8 4 2 1 210 180 NA
#9 1 2 2 220 190 1.100000
#10 2 2 2 230 200 1.095238
#11 3 2 2 240 210 1.090909
#12 4 2 2 250 220 1.086957
#13 5 2 2 260 230 NA
Don't know if stating two times with year 1 is a typo, but it shows if the condition of next year is considered.