Fill NA with a series of characters in R dplyr

Time:04-13

I have a large data frame that looks like this. Each player is assigned to a group.

library(tidyverse)

df <- tibble(player=c(1,2,3,4,5),groups=c("group1","group2","group2",NA,NA))
df
#> # A tibble: 5 × 2
#>   player groups
#>    <dbl> <chr> 
#> 1      1 group1
#> 2      2 group2
#> 3      3 group2
#> 4      4 <NA>  
#> 5      5 <NA>

Created on 2022-04-12 by the reprex package (v2.0.1)

Some players are not assigned to a group, and I want to fill in the missing values serially, continuing the group numbering, like this:

#> # A tibble: 5 × 2
#>   player groups
#>    <dbl> <chr> 
#> 1      1 group1
#> 2      2 group2
#> 3      3 group2
#> 4      4 group3
#> 5      5 group4

CodePudding user response:

dplyr

library(dplyr)
df %>%
  mutate(
    maxgrp = max(as.integer(gsub("[^0-9]", "", groups)), na.rm = TRUE),
    groups = if_else(is.na(groups), paste0("group", maxgrp + cumsum(is.na(groups))), groups)
  ) %>%
  select(-maxgrp)
# # A tibble: 5 x 2
#   player groups
#    <dbl> <chr> 
# 1      1 group1
# 2      2 group2
# 3      3 group2
# 4      4 group3
# 5      5 group4

data.table

library(data.table)
DT <- as.data.table(df)
DT[, groups := fifelse(
  is.na(groups),
  paste0("group", cumsum(is.na(groups)) + max(as.integer(gsub("[^0-9]", "", groups)), na.rm = TRUE)),
  groups) ]

CodePudding user response:

This was tricky, but I think we could do it this way:

library(dplyr)

df %>% 
  mutate(x = cumsum(groups %in% NA) + 1) %>% 
  mutate(groups = ifelse(is.na(groups), paste0("group", x + 1), groups), .keep="unused")
  player groups
   <dbl> <chr> 
1      1 group1
2      2 group2
3      3 group2
4      4 group3
5      5 group4

CodePudding user response:

You could do:

df |>
  mutate(new_group = max(parse_number(groups), na.rm = TRUE) + cumsum(is.na(groups)),
         groups = if_else(is.na(groups), paste0("group", new_group), groups)) |> 
  select(-new_group)

With slightly different data, where another group appears after the missing values, this gives:

Input:

library(tidyverse)
df <- tibble(player=c(1,2,3,4,5,6),groups=c("group1","group2","group2",NA,NA, "group3"))
# A tibble: 6 x 2
  player groups
   <dbl> <chr> 
1      1 group1
2      2 group2
3      3 group2
4      4 NA    
5      5 NA    
6      6 group3

Output:

# A tibble: 6 x 2
  player groups
   <dbl> <chr> 
1      1 group1
2      2 group2
3      3 group2
4      4 group4
5      5 group5
6      6 group3
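
If you need this in more than one place, the same idea (largest existing group number plus a running count of the NAs) could be wrapped in a small helper. This is only a sketch: fill_na_groups() and its prefix argument are made-up names, not part of dplyr or data.table, and it assumes every non-missing label ends in a number.

```r
# Hypothetical helper (not from any package): label each NA with the next
# unused group number, in order of appearance.
fill_na_groups <- function(groups, prefix = "group") {
  missing <- is.na(groups)
  # Numeric part of the existing labels, e.g. "group2" -> 2
  # (assumes every non-missing label contains a number)
  existing <- as.integer(gsub("[^0-9]", "", groups[!missing]))
  # Start counting from 0 when there are no existing labels at all
  start <- if (length(existing) > 0) max(existing) else 0L
  groups[missing] <- paste0(prefix, start + seq_len(sum(missing)))
  groups
}

fill_na_groups(c("group1", "group2", "group2", NA, NA))
#> [1] "group1" "group2" "group2" "group3" "group4"
fill_na_groups(c("group1", "group2", "group2", NA, NA, "group3"))
#> [1] "group1" "group2" "group2" "group4" "group5" "group3"
```

Because it only takes and returns a character vector, it should also work inside mutate(groups = fill_na_groups(groups)).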