Home > OS >  Extract cells where columns contain NA values and create a new column based on the result
Extract cells where columns contain NA values and create a new column based on the result

Time:05-17

I have data which looks like:

     grp REGIONNAME RegionName `Año 2004_1` `Año 2004_2` `Año 2004_3`
   <int> <chr>      <chr>             <dbl>        <dbl>        <dbl>
 1     1 ANDALUCÍA  ANDALUCÍA         32143        37962        32374
 2     1 ANDALUCÍA  Almería              NA           NA           NA
 3     1 ANDALUCÍA  Abla                 58           61           54
 4     1 ANDALUCÍA  Abrucena              6            2            1
 5     1 ANDALUCÍA  Adra                146          211          101
 6     1 ANDALUCÍA  Albánchez            12            3            3
 7     1 ANDALUCÍA  Alboloduy             2            2            2
 8     1 ANDALUCÍA  Albox                33           66           35
 9     1 ANDALUCÍA  Alcolea               0            1            1
10     1 ANDALUCÍA  Alcóntar              1            1            2

In this sample, it contains 2 NA rows, one for Almeria and the other for Balanegra.

I want to create a new column RegionName lets say. Where it will be populated by the these two cells. i.e. the expected output would be:

     grp REGIONNAME RegionName    RegionName
   <int> <chr>      <chr>            <chr>
 1     1 ANDALUCÍA  ANDALUCÍA        ANDALUCIA/NA
 2     1 ANDALUCÍA  Almería            Almeria
 3     1 ANDALUCÍA  Abla               Almeria
 4     1 ANDALUCÍA  Abrucena           Almeria
 5     1 ANDALUCÍA  Adra               Almeria
 6     1 ANDALUCÍA  Albánchez          Almeria
 7     1 ANDALUCÍA  Alboloduy          Almeria
 8     1 ANDALUCÍA  Albox                ...
 9     1 ANDALUCÍA  Alcolea              ...
10     1 ANDALUCÍA  Alcóntar             ...
               ...............
  
 1     1 ANDALUCÍA  Bacares              ...
 2     1 ANDALUCÍA  Balanegra          Balanegra
 3     1 ANDALUCÍA  Bayárcal           Balanegra
 4     1 ANDALUCÍA  Bayarque           Balanegra
 5     1 ANDALUCÍA  Bédar              Balanegra
 6     1 ANDALUCÍA  Beires    
 7     1 ANDALUCÍA  Benahadux           ....
 8     1 ANDALUCÍA  Benitagla           ....
 9     1 ANDALUCÍA  Benizalón 
10     1 ANDALUCÍA  Bentarique         Balanegra

So when it sees the NA value in the 3 columns, it indicates a new "region".

Finally, I wanted to group_by this newly created region and compute a cumsum in order to fill in the NA values.

I did something "similar" to the REGIONNAME column when I wanted to fill in the NA values for ANDALUCIA.

... %>%
  group_by(grp = cumsum(RegionName == toupper(RegionName))) %>%
  mutate(REGIONNAME = first(RegionName)) %>% 
  relocate(REGIONNAME, .before = RegionName) %>% 
  mutate(across(starts_with("Año"), 
                ~ ifelse(REGIONNAME == RegionName, sum(.x[REGIONNAME != RegionName], na.rm = T), .x)))

Data:

df = structure(list(grp = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L), REGIONNAME = c("ANDALUCÍA", "ANDALUCÍA", "ANDALUCÍA", 
"ANDALUCÍA", "ANDALUCÍA", "ANDALUCÍA", "ANDALUCÍA", "ANDALUCÍA", 
"ANDALUCÍA", "ANDALUCÍA", "ANDALUCÍA", "ANDALUCÍA", "ANDALUCÍA", 
"ANDALUCÍA", "ANDALUCÍA", "ANDALUCÍA", "ANDALUCÍA", "ANDALUCÍA", 
"ANDALUCÍA", "ANDALUCÍA", "ANDALUCÍA", "ANDALUCÍA", "ANDALUCÍA", 
"ANDALUCÍA", "ANDALUCÍA", "ANDALUCÍA", "ANDALUCÍA", "ANDALUCÍA", 
"ANDALUCÍA", "ANDALUCÍA"), RegionName = c("ANDALUCÍA", "Almería", 
"Abla", "Abrucena", "Adra", "Albánchez", "Alboloduy", "Albox", 
"Alcolea", "Alcóntar", "Alcudia de Monteagud", "Alhabia", "Alhama de Almería", 
"Alicún", "Almería", "Almócita", "Alsodux", "Antas", "Arboleas", 
"Armuña de Almanzora", "Bacares", "Balanegra", "Bayárcal", 
"Bayarque", "Bédar", "Beires", "Benahadux", "Benitagla", "Benizalón", 
"Bentarique"), `Año 2004_1` = c(32143, NA, 58, 6, 146, 12, 2, 
33, 0, 1, 1, 1, 13, 0, 748, 0, 1, 6, 16, 0, 2, NA, 0, 0, 8, 0, 
18, 1, 2, 0), `Año 2004_2` = c(37962, NA, 61, 2, 211, 3, 2, 
66, 1, 1, 1, 0, 15, 1, 770, 0, 10, 12, 16, 0, 1, NA, 1, 0, 2, 
0, 21, 0, 0, 0), `Año 2004_3` = c(32374, NA, 54, 1, 101, 3, 
2, 35, 1, 2, 0, 0, 14, 0, 701, 0, 3, 26, 14, 0, 0, NA, 0, 3, 
8, 0, 25, 0, 2, 0)), class = c("grouped_df", "tbl_df", "tbl", 
"data.frame"), row.names = c(NA, -30L), groups = structure(list(
    grp = 1L, .rows = structure(list(1:30), ptype = integer(0), class = c("vctrs_list_of", 
    "vctrs_vctr", "list"))), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -1L), .drop = TRUE))

CodePudding user response:

You can use c_across and fill:

library(tidyverse)

df %>% 
  rowwise() %>% 
  mutate(Region = case_when(all(is.na(c_across(starts_with("Año")))) ~ RegionName)) %>% 
  ungroup() %>% 
  fill(Region)

# A tibble: 30 × 7
     grp REGIONNAME RegionName           `Año 2004_1` `Año 2004_2` `Año 2004_3` Region   
   <int> <chr>      <chr>                       <dbl>        <dbl>        <dbl> <chr>    
 1     1 ANDALUCÍA  ANDALUCÍA                   32143        37962        32374 NA       
 2     1 ANDALUCÍA  Almería                        NA           NA           NA Almería  
 3     1 ANDALUCÍA  Abla                           58           61           54 Almería  
 4     1 ANDALUCÍA  Abrucena                        6            2            1 Almería  
 5     1 ANDALUCÍA  Adra                          146          211          101 Almería  
 6     1 ANDALUCÍA  Albánchez                      12            3            3 Almería  
 7     1 ANDALUCÍA  Alboloduy                       2            2            2 Almería  
 8     1 ANDALUCÍA  Albox                          33           66           35 Almería  
 9     1 ANDALUCÍA  Alcolea                         0            1            1 Almería  
10     1 ANDALUCÍA  Alcóntar                        1            1            2 Almería  
11     1 ANDALUCÍA  Alcudia de Monteagud            1            1            0 Almería  
12     1 ANDALUCÍA  Alhabia                         1            0            0 Almería  
13     1 ANDALUCÍA  Alhama de Almería              13           15           14 Almería  
14     1 ANDALUCÍA  Alicún                          0            1            0 Almería  
15     1 ANDALUCÍA  Almería                       748          770          701 Almería  
16     1 ANDALUCÍA  Almócita                        0            0            0 Almería  
17     1 ANDALUCÍA  Alsodux                         1           10            3 Almería  
18     1 ANDALUCÍA  Antas                           6           12           26 Almería  
19     1 ANDALUCÍA  Arboleas                       16           16           14 Almería  
20     1 ANDALUCÍA  Armuña de Almanzora             0            0            0 Almería  
21     1 ANDALUCÍA  Bacares                         2            1            0 Almería  
22     1 ANDALUCÍA  Balanegra                      NA           NA           NA Balanegra
23     1 ANDALUCÍA  Bayárcal                        0            1            0 Balanegra
24     1 ANDALUCÍA  Bayarque                        0            0            3 Balanegra
25     1 ANDALUCÍA  Bédar                           8            2            8 Balanegra
26     1 ANDALUCÍA  Beires                          0            0            0 Balanegra
27     1 ANDALUCÍA  Benahadux                      18           21           25 Balanegra
28     1 ANDALUCÍA  Benitagla                       1            0            0 Balanegra
29     1 ANDALUCÍA  Benizalón                       2            0            2 Balanegra
30     1 ANDALUCÍA  Bentarique                      0            0            0 Balanegra
  • Related