R: In a list of tibbles, I need to add a unique number to a column in each tibble

Time:08-17

I have a list of tibbles (named with sequential numbers), and each tibble has several columns. The column colA exists in every tibble, but while some tibbles have a value filled down that column, others contain only NA.

I would like to write a function that, wherever colA is NA, fills it with the string "i missing", where i is a number. Each tibble needs its own unique number in that string.

My idea was to use the tibble names for this, since they are numeric, but it can be any number as long as it is unique to each tibble.
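
Note that in the example data below, List1 is built with list() from unnamed arguments, so it has no names (it prints [[1]], [[2]], ... rather than $`1`, $`2`, ...). A minimal sketch of two ways to attach sequential names, using two shortened stand-in tibbles t1 and t2 (hypothetical, for illustration only):

```r
library(tibble)

# t1, t2: shortened stand-ins for the tibbles in the question (hypothetical)
t1 <- tibble(colA = c("a123", "a123"), colB = c("abc", "def"))
t2 <- tibble(colA = c(NA, NA), colB = c("hij", "klm"))

# list() with unnamed arguments gives a list without names
unnamed <- list(t1, t2)
is.null(names(unnamed))  # TRUE

# Option 1: name the elements explicitly when building the list
named1 <- list(`1` = t1, `2` = t2)

# Option 2: attach sequential names afterwards (numbers become "1", "2", ...)
named2 <- setNames(unnamed, seq_along(unnamed))
names(named2)  # "1" "2"
```

Either way, names(List1) then gives the numbers as character strings that can be pasted into "i missing".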

I suspect I need some combination of ifelse() and paste(), but I'm having a hard time figuring out how to put it together.
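
For reference, the ifelse()/paste() idea does work once it is applied to each tibble separately. A minimal base-R sketch, assuming the tibble's position in the list (from seq_along()) is an acceptable unique number; it replaces NA values individually, so a partly-NA column would also be handled:

```r
library(tibble)

# Shortened version of the question's data (two tibbles only)
List1 <- list(
  tibble(colA = c("a123", "a123"), colB = c("abc", "def")),
  tibble(colA = c(NA, NA), colB = c("hij", "klm"))
)

# Loop over positions so each tibble gets its own number i
out <- lapply(seq_along(List1), function(i) {
  d <- List1[[i]]
  # ifelse() is vectorised: NA entries become "<i> missing", others are kept.
  # as.character() is needed because an all-NA column is stored as logical.
  d$colA <- ifelse(is.na(d$colA), paste(i, "missing"), as.character(d$colA))
  d
})
```

If the list is named, paste(names(List1)[i], "missing") would use the names instead of the positions.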

 library(tibble)
 `1` <- tibble(colA = c("a123", "a123", "a123", "a123"), colB = c("abc", "def", "ghi", "jkl"))
 `2` <- tibble(colA = c(NA, NA, NA, NA), colB = c("hij", "klm", "nop", "qrs"))
 `3` <- tibble(colA = c(NA, NA, NA, NA), colB = c("abc", "def", "ghi", "jkl"))
 `4` <- tibble(colA = c("e2b4", "e2b4", "e2b4", "e2b4"), colB = c("tuv", "wxy", "zab", "cde"))
 List1 <- list(`1`, `2`, `3`, `4`)
 List1

#> [[1]]
#> # A tibble: 4 x 2
#>   colA  colB 
#>   <chr> <chr>
#> 1 a123  abc  
#> 2 a123  def  
#> 3 a123  ghi  
#> 4 a123  jkl  
#> 
#> [[2]]
#> # A tibble: 4 x 2
#>   colA  colB 
#>   <lgl> <chr>
#> 1 NA    hij  
#> 2 NA    klm  
#> 3 NA    nop  
#> 4 NA    qrs  
#> 
#> [[3]]
#> # A tibble: 4 x 2
#>   colA  colB 
#>   <lgl> <chr>
#> 1 NA    abc  
#> 2 NA    def  
#> 3 NA    ghi  
#> 4 NA    jkl  
#> 
#> [[4]]
#> # A tibble: 4 x 2
#>   colA  colB 
#>   <chr> <chr>
#> 1 e2b4  tuv  
#> 2 e2b4  wxy  
#> 3 e2b4  zab  
#> 4 e2b4  cde

Answer:

Something like this:

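We first bind the list into a single tibble with an id column (bind_rows() with .id = 'id'; for an unnamed list the id is each tibble's position). Then group_split() splits it back into a list, and the first map() applies @akrun's formula using id instead of row_number().

Finally, the second map() removes the id column: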

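library(dplyr)
library(purrr)
library(stringr)  # for str_c()

bind_rows(List1, .id = 'id') %>%        # stack all tibbles; id = list position
  group_split(id) %>%                    # split back into one tibble per id
  map(~ .x %>% mutate(colA = case_when(is.na(colA) ~ str_c(id, ' missing'),
                                       TRUE ~ as.character(colA)))) %>%
  map(~ .x %>% select(-id))              # drop the helper id column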
[[1]]
# A tibble: 4 x 2
  colA  colB 
  <chr> <chr>
1 a123  abc  
2 a123  def  
3 a123  ghi  
4 a123  jkl  

[[2]]
# A tibble: 4 x 2
  colA      colB 
  <chr>     <chr>
1 2 missing hij  
2 2 missing klm  
3 2 missing nop  
4 2 missing qrs  

[[3]]
# A tibble: 4 x 2
  colA      colB 
  <chr>     <chr>
1 3 missing abc  
2 3 missing def  
3 3 missing ghi  
4 3 missing jkl  

[[4]]
# A tibble: 4 x 2
  colA  colB 
  <chr> <chr>
1 e2b4  tuv  
2 e2b4  wxy  
3 e2b4  zab  
4 e2b4  cde  
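
One caveat with this approach: group_split() returns the groups sorted by id, and because id is a character column, with ten or more tibbles "10" sorts before "2", so the result may no longer be in the same order as List1. A minimal sketch that avoids the bind/split round-trip by naming the list and using purrr::imap() (a swapped-in technique, not the answer's own code), shown on shortened data:

```r
library(dplyr)
library(purrr)
library(tibble)

List1 <- list(
  tibble(colA = c("a123", "a123"), colB = c("abc", "def")),
  tibble(colA = c(NA, NA), colB = c("hij", "klm"))
)

# Give the list sequential names so imap() can pass each name as .y
names(List1) <- seq_along(List1)

# imap() keeps the original order and names; if_else() fills NA values only
out <- imap(List1, ~ .x %>%
  mutate(colA = if_else(is.na(colA), paste(.y, "missing"), as.character(colA))))
```

Because each tibble is processed on its own, no helper id column has to be added and removed afterwards.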

Answer:

One possible solution uses the base R Map() function. Note that it replaces colA only when the entire column is NA:

Map(\(d, i) if(all(is.na(d$colA))) {d$colA = paste0(i, " missing"); d} else d, 
    List1, 
    seq_along(List1))
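
The \(d, i) lambda shorthand used above needs R 4.1.0 or later. A sketch of the same all-NA logic written with function(), which also runs on older R, and which (as an assumption about what you want) takes the number from names(List1) when the list is named, falling back to positions otherwise:

```r
library(tibble)

List1 <- list(
  `1` = tibble(colA = c("a123", "a123"), colB = c("abc", "def")),
  `2` = tibble(colA = c(NA, NA), colB = c("hij", "klm"))
)

# Use the list names as the unique numbers if present, otherwise positions
ids <- if (is.null(names(List1))) seq_along(List1) else names(List1)

# Replace colA only when the whole column is NA, as in the one-liner
out <- Map(function(d, i) {
  if (all(is.na(d$colA))) d$colA <- paste0(i, " missing")
  d
}, List1, ids)
```

Map() carries over the names of its first argument, so out keeps the names "1" and "2".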