Home > Software design >  How to create a variable that counts years since a string (incident)?
How to create a variable that counts years since a string (incident)?

Time:09-19

I have a data frame containing information for each country-year on natural disasters. The structure is similar to the following:

test <- data.frame(year = rep(c(1990:1995) , times = 3) ,
                   country = rep(c("aaa" , "bbb" , "ccc") , each = 6) ,
                   incident = c(NA , "Drought" , rep(NA , times = 7) , "Flood" , "Flood" , NA ,
                                rep(NA , times = 6))  ) 

Giving:

   year country incident
1  1990     aaa     <NA>
2  1991     aaa  Drought
3  1992     aaa     <NA>
4  1993     aaa     <NA>
5  1994     aaa     <NA>
6  1995     aaa     <NA>
7  1990     bbb     <NA>
8  1991     bbb     <NA>
9  1992     bbb     <NA>
10 1993     bbb    Flood
11 1994     bbb    Flood
12 1995     bbb     <NA>
13 1990     ccc     <NA>
14 1991     ccc     <NA>
15 1992     ccc     <NA>
16 1993     ccc     <NA>
17 1994     ccc     <NA>
18 1995     ccc     <NA>

I would like to introduce a new variable into this data frame that counts the number of years for each country since the last incident, and replace each string encountered with 0. Something like this:

   year country incident years_since
1  1990     aaa     <NA>          NA
2  1991     aaa  Drought           0
3  1992     aaa     <NA>           1
4  1993     aaa     <NA>           2
5  1994     aaa     <NA>           3
6  1995     aaa     <NA>           4
7  1990     bbb     <NA>          NA
8  1991     bbb     <NA>          NA
9  1992     bbb     <NA>          NA
10 1993     bbb    Flood           0
11 1994     bbb    Flood           0
12 1995     bbb     <NA>           1
13 1990     ccc     <NA>          NA
14 1991     ccc     <NA>          NA
15 1992     ccc     <NA>          NA
16 1993     ccc     <NA>          NA
17 1994     ccc     <NA>          NA
18 1995     ccc     <NA>          NA

Is there a way to do this efficiently using one line of code?

CodePudding user response:

1) Create a years_since function based on counting the number of positions since the indicated cumsum and apply it by country.

library(dplyr)

years_since <- function(x) {
  cs <- cumsum(!is.na(x))
  ifelse(cs, seq_along(cs) - match(cs, cs), NA)
}

test %>%
  group_by(country) %>%
  mutate(years_since = years_since(incident)) %>%
  ungroup

giving:

# A tibble: 18 × 4
    year country incident years_since
   <int> <chr>   <chr>          <int>
 1  1990 aaa     <NA>              NA
 2  1991 aaa     Drought            0
 3  1992 aaa     <NA>               1
 4  1993 aaa     <NA>               2
 5  1994 aaa     <NA>               3
 6  1995 aaa     <NA>               4
 7  1990 bbb     <NA>              NA
 8  1991 bbb     <NA>              NA
 9  1992 bbb     <NA>              NA
10  1993 bbb     Flood              0
11  1994 bbb     Flood              0
12  1995 bbb     <NA>               1
13  1990 ccc     <NA>              NA
14  1991 ccc     <NA>              NA
15  1992 ccc     <NA>              NA
16  1993 ccc     <NA>              NA
17  1994 ccc     <NA>              NA
18  1995 ccc     <NA>              NA

2) Alternately using only base R:

transform(test, years_since = ave(incident, country, FUN = years_since))

CodePudding user response:

There might be better solutions, but here's one with cumsum cumany

library(dplyr)
test %>% 
  group_by(country) %>% 
  mutate(years_since = ifelse(cumany(!is.na(incident)), 
                              cumsum(cumsum(lag(!is.na(incident), default = 0) & is.na(incident))), 
                              NA))

output

# A tibble: 18 × 4
# Groups:   country [3]
    year country incident years_since
   <int> <chr>   <chr>          <int>
 1  1990 aaa     NA                NA
 2  1991 aaa     Drought            0
 3  1992 aaa     NA                 1
 4  1993 aaa     NA                 2
 5  1994 aaa     NA                 3
 6  1995 aaa     NA                 4
 7  1990 bbb     NA                NA
 8  1991 bbb     NA                NA
 9  1992 bbb     NA                NA
10  1993 bbb     Flood              0
11  1994 bbb     Flood              0
12  1995 bbb     NA                 1
13  1990 ccc     NA                NA
14  1991 ccc     NA                NA
15  1992 ccc     NA                NA
16  1993 ccc     NA                NA
17  1994 ccc     NA                NA
18  1995 ccc     NA                NA
  • Related