I have a huge dataset that I transformed using pivot_wider function. This resulted in a lot of values containing NULL. Now I want to transform those values to NA but none of my approaches seems to work:
df %>% replace(.=="NULL", NA)
df[df==0] <- NA
Also when using the pivot_wider function the values_fill = NA argument doesn't produce any NA's too.
Thanks in advance for your help.
CodePudding user response:
I think you have data which is something like this -
df <- data.frame(a1 = c(1, 1, 1, 1,2), a2 = 1:5, a3 = c('a', 'b', 'c', 'a', 'b'))
df
# a1 a2 a3
#1 1 1 a
#2 1 2 b
#3 1 3 c
#4 1 4 a
#5 2 5 b
Then you are using pivot_wider
which returns
df1 <- tidyr::pivot_wider(df, names_from = a3, values_from = a2)
# a1 a b c
# <dbl> <list> <list> <list>
#1 1 <dbl [2]> <dbl [1]> <dbl [1]>
#2 2 <NULL> <dbl [1]> <NULL>
This also returns the warning messages -
Warning message: Values are not uniquely identified; output will contain list-cols.
- Use
values_fn = list
to suppress this warning.- Use
values_fn = length
to identify where the duplicates arise- Use
values_fn = {summary_fun}
to summarise duplicates
To answer your question about replacing these NULL
values with NA
you can do -
df1 <- df1 %>% mutate(across(a:c, ~replace(., lengths(.) == 0, NA)))
df1
# a1 a b c
# <dbl> <list> <list> <list>
#1 1 <dbl [2]> <dbl [1]> <dbl [1]>
#2 2 <lgl [1]> <dbl [1]> <lgl [1]>
But you should not ignore those warnings and check pivot_wider issue "Values in `values_from` are not uniquely identified; output will contain list-cols" if this would be a better option for your data.