How do I mutate all columns in a data set to keep only distinct values?

Time:04-05

Consider the mtcars data. I would like to do something like:

mtcars %>% select(mpg,cyl,disp,hp) %>% mutate_all(distinct())

I want to keep only the distinct values in each column. I understand that this will make the data frame's columns unequal in length, so I also wanted to know whether the shorter columns can be padded with NAs.

In short, I want to apply unique() to every column separately without having to write something like unique(mtcars$cyl) for each of the columns.

CodePudding user response:

A base solution:

lapply(mtcars, unique)

Here, unique() accepts a vector x and returns a (possibly shorter) vector consisting of the unique values. As you noted, the unique collections will differ in length, so we use lapply() to obtain the answer as a list.

Given what I think you're trying to do, this might be a more sensible approach than padding NA entries, because it seems like the only thing you want is the list of unique values.
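If the NA padding is still wanted, it can be layered on top of the list. The following is only a sketch, assuming the four columns from the question; the names cols, uniq, n and padded are made up for illustration. It relies on the fact that extending a vector with length<- fills the new positions with NA.

```r
# Columns taken from the question; any other set of columns works the same way.
cols <- c("mpg", "cyl", "disp", "hp")

# One vector of unique values per column (the lengths differ).
uniq <- lapply(mtcars[cols], unique)
lengths(uniq)

# Pad every vector with NA up to the longest one, then bind into a data frame.
n <- max(lengths(uniq))
padded <- as.data.frame(lapply(uniq, function(x) {
  length(x) <- n   # extending a vector this way fills the new slots with NA
  x
}))
head(padded)
```

Note that the rows of padded no longer correspond to individual cars: each column is just its own unique values stacked at the top, followed by NA.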

CodePudding user response:

If I understand correctly, you are looking for this. First convert the data frame columns to a list of vectors, then replace the duplicates in each vector with NA so that every column keeps the same length, and pass that replacement step to map_dfr to get a data frame back:

library(tidyverse)
mtcars %>% 
  dplyr::select(mpg,cyl,disp,hp) %>% 
  as.list() %>% 
  map_dfr(~replace(., duplicated(.), NA))
    mpg   cyl  disp    hp
   <dbl> <dbl> <dbl> <dbl>
 1  21       6  160    110
 2  NA      NA   NA     NA
 3  22.8     4  108     93
 4  21.4    NA  258     NA
 5  18.7     8  360    175
 6  18.1    NA  225    105
 7  14.3    NA   NA    245
 8  24.4    NA  147.    62
 9  NA      NA  141.    95
10  19.2    NA  168.   123
# ... with 22 more rows
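The same replacement can probably also stay inside mutate(), which is closer to the attempt in the question. This is a sketch rather than a tested drop-in, assuming dplyr 1.0.0 or later so that across() is available; the name result is only for illustration.

```r
library(dplyr)

result <- mtcars %>%
  select(mpg, cyl, disp, hp) %>%
  # keep the first occurrence of each value in a column, set repeats to NA
  mutate(across(everything(), ~ replace(.x, duplicated(.x), NA)))

head(result)
```

As with the map_dfr version, the data frame keeps all 32 rows, and the surviving values stay in the row where they first appeared rather than being moved to the top.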