Home > Mobile >  Calculate the ratio between different column, with some restrictions
Calculate the ratio between different column, with some restrictions

Time:05-11

Consider the dataset:

example1 = data.frame("year"=c(1,1,3,4,1,2,3,4,1,2,3,4,5),
                   "household"=c(1,1,1,1,2,2,2,2,2,2,2,2,2),
                   "person"= c(1,1,1,1,1,1,1,1,2,2,2,2,2),
                   "expected income" = c(seq(140,260,10)),
                   "income" = c(seq(110,230,10)))

Just to have an idea person=1 is the father of the family and person=2 is the mother of the family, in the complete dataset there will be also the children, but it doesn't matter right now. I need to calculate the ratio between column(4) "expected income" in year(i) and column(5)"income" in year (i 1). Furthermore the ratio has to be done only when the "person" and "household" is the same. for example it doesn't have to be calculated the ratio between col(4)-row(4) and col(5)-row(5) because they are two man of different household, the same for col(5)-row(8) and col(5)-row(9) because they are two different person within the same household. Instead of the ratio between the "expected income" and the "income" of two different people I need an NA. It has to be done generically since it is just a semplification of a dataset with more than 60000 row.

CodePudding user response:

Is this what you are looking for:

library(dplyr)
example1 %>%
  mutate(ratio = ifelse(person == household, expected.income/income, NA))

Output:

   year household person expected.income income    ratio
1     1         1      1             140    110 1.272727
2     1         1      1             150    120 1.250000
3     3         1      1             160    130 1.230769
4     4         1      1             170    140 1.214286
5     1         2      1             180    150       NA
6     2         2      1             190    160       NA
7     3         2      1             200    170       NA
8     4         2      1             210    180       NA
9     1         2      2             220    190 1.157895
10    2         2      2             230    200 1.150000
11    3         2      2             240    210 1.142857
12    4         2      2             250    220 1.136364
13    5         2      2             260    230 1.130435

CodePudding user response:

It sounds like you need to group by household and person, then find the ratio of the expected income to the lead value of income:

library(tidyverse)

example1 %>%
  group_by(person, household) %>%
  mutate(ratio = expected.income / lead(income))
#> # A tibble: 13 x 6
#> # Groups:   person, household [3]
#>     year household person expected.income income ratio
#>    <dbl>     <dbl>  <dbl>           <dbl>  <dbl> <dbl>
#>  1     1         1      1             140    110  1.17
#>  2     2         1      1             150    120  1.15
#>  3     3         1      1             160    130  1.14
#>  4     4         1      1             170    140 NA   
#>  5     1         2      1             180    150  1.12
#>  6     2         2      1             190    160  1.12
#>  7     3         2      1             200    170  1.11
#>  8     4         2      1             210    180 NA   
#>  9     1         2      2             220    190  1.1 
#> 10     2         2      2             230    200  1.10
#> 11     3         2      2             240    210  1.09
#> 12     4         2      2             250    220  1.09
#> 13     5         2      2             260    230 NA

Created on 2022-05-11 by the reprex package (v2.0.1)

CodePudding user response:

First order by household, person and year. Then calculate the ratio and set all rations to NA where the next lien is not the next year or not the same household or not the same person.

. <- example1
. <- .[order(.$household, .$person, .$year),]
.$ratio <- .$expected.income / c(.$income[-1], NA)
is.na(.$ratio) <- (1   .$year) != c(.$year[-1], NA) |
  .$household != c(.$household[-1], NA) | .$person != c(.$person[-1], NA)
.
#   year household person expected.income income    ratio
#1     1         1      1             140    110       NA
#2     1         1      1             150    120       NA
#3     3         1      1             160    130 1.142857
#4     4         1      1             170    140       NA
#5     1         2      1             180    150 1.125000
#6     2         2      1             190    160 1.117647
#7     3         2      1             200    170 1.111111
#8     4         2      1             210    180       NA
#9     1         2      2             220    190 1.100000
#10    2         2      2             230    200 1.095238
#11    3         2      2             240    210 1.090909
#12    4         2      2             250    220 1.086957
#13    5         2      2             260    230       NA

Don't know if stating two times with year 1 is a typo, but it shows if the condition of next year is considered.

  •  Tags:  
  • r
  • Related