Home > Enterprise >  Replacing NULL with NA in R after Pivot Wider
Replacing NULL with NA in R after Pivot Wider

Time:10-13

I have a huge dataset that I transformed using pivot_wider function. This resulted in a lot of values containing NULL. Now I want to transform those values to NA but none of my approaches seems to work:

df %>% replace(.=="NULL", NA)

df[df==0] <- NA

Also when using the pivot_wider function the values_fill = NA argument doesn't produce any NA's too.

Thanks in advance for your help.

CodePudding user response:

I think you have data which is something like this -

df <- data.frame(a1 = c(1, 1, 1, 1,2), a2 = 1:5, a3 = c('a', 'b', 'c', 'a', 'b'))

df
#  a1 a2 a3
#1  1  1  a
#2  1  2  b
#3  1  3  c
#4  1  4  a
#5  2  5  b

Then you are using pivot_wider which returns

df1 <- tidyr::pivot_wider(df, names_from = a3, values_from = a2)

#     a1   a         b         c        
#  <dbl> <list>    <list>    <list>   
#1     1 <dbl [2]> <dbl [1]> <dbl [1]>
#2     2 <NULL>    <dbl [1]> <NULL>   

This also returns the warning messages -

Warning message: Values are not uniquely identified; output will contain list-cols.

  • Use values_fn = list to suppress this warning.
  • Use values_fn = length to identify where the duplicates arise
  • Use values_fn = {summary_fun} to summarise duplicates

To answer your question about replacing these NULL values with NA you can do -

df1 <- df1 %>% mutate(across(a:c, ~replace(., lengths(.) == 0, NA)))
df1

#    a1    a         b         c        
#  <dbl> <list>    <list>    <list>   
#1     1 <dbl [2]> <dbl [1]> <dbl [1]>
#2     2 <lgl [1]> <dbl [1]> <lgl [1]>

But you should not ignore those warnings and check pivot_wider issue "Values in `values_from` are not uniquely identified; output will contain list-cols" if this would be a better option for your data.

  •  Tags:  
  • r
  • Related