R: distinct function with a list

Time:10-30

I want to apply the distinct function to many variables. Let's say I have this data frame:

df <- data.frame(
  id = c(1,1,1,2,3,3),
  `sitting position` = c("A","B","A","A","B","B"),
  `movement haed` = c("left", "left", "right", "right", "left", "left"),
  `colesterol level` = c(50, 30, 45, 80, 90, 130),
  check.names = FALSE)

Now I put the variables to which I want to apply the distinct function into a character vector (my real data frame has more variables). Let's say this is the vector:

columns <- dput(colnames(df))[-3]

Output:
c("id", "sitting position", "colesterol level")

Is there a way to pass columns to the distinct function directly (something like distinct(df, columns), which unfortunately doesn't work)? Or do I always have to type the variables one by one, like

df_new <- distinct(df, id, `sitting position`, `colesterol level`)

Output:
df_new
  id sitting position colesterol level
1  1                A               50
2  1                B               30
3  1                A               45
4  2                A               80
5  3                B               90
6  3                B              130

which does work, but would take too much time. If I pass columns directly, I always get an error message, and I don't know how to solve this problem.

Thank you very much for your help!

Answer:

You can pass a character vector of column names to distinct() by wrapping it in across(all_of(columns)). To deduplicate on every column instead, dplyr's distinct_all() works too, but the scoped *_all verbs are superseded, so an across(everything()) version is included as well.

library(dplyr)

# using the columns variable
df %>%
  distinct(across(all_of(columns)))

  id sitting position colesterol level
1  1                A               50
2  1                B               30
3  1                A               45
4  2                A               80
5  3                B               90
6  3                B              130
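
With the three columns above, every row is already unique, so the effect of distinct() is hard to see. The following is a minimal sketch, assuming a hypothetical shorter vector key_cols (not part of the original question) whose combinations do repeat. It also shows the .keep_all argument of distinct(), which keeps the remaining columns from the first row of each group. In dplyr 1.1.0 and later, pick(all_of(key_cols)) should work in the same position, though that variant is not shown in the original answer.

```r
library(dplyr)

# Same data frame as in the question
df <- data.frame(
  id = c(1, 1, 1, 2, 3, 3),
  `sitting position` = c("A", "B", "A", "A", "B", "B"),
  `movement haed` = c("left", "left", "right", "right", "left", "left"),
  `colesterol level` = c(50, 30, 45, 80, 90, 130),
  check.names = FALSE
)

# Hypothetical selection, chosen so that duplicate combinations exist
key_cols <- c("id", "sitting position")

# Only the key columns: one row per unique (id, sitting position) pair
unique_keys <- df %>%
  distinct(across(all_of(key_cols)))

# All columns, taken from the first row of each unique pair
first_rows <- df %>%
  distinct(across(all_of(key_cols)), .keep_all = TRUE)

print(unique_keys)  # 4 rows, 2 columns
print(first_rows)   # 4 rows, 4 columns
```

Using all_of() makes the lookup strict: if a name in key_cols is missing from df, the call fails with an error instead of silently ignoring it.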

dplyr::distinct_all(df)

# or

df %>%
  distinct(across(.cols = everything()))

  id sitting position movement haed colesterol level
1  1                A          left               50
2  1                B          left               30
3  1                A         right               45
4  2                A         right               80
5  3                B          left               90
6  3                B          left              130

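or, if you want to select certain variables by name, select them first and then deduplicate. Running distinct_all() before select() would compare all columns, including the ones that are dropped afterwards:

df %>%
  select(id, `sitting position`, `colesterol level`) %>%
  distinct()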
  id sitting position colesterol level
1  1                A               50
2  1                B               30
3  1                A               45
4  2                A               80
5  3                B               90
6  3                B              130
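
The order of select() and deduplication only matters once the chosen columns contain repeated combinations, which the example data with three columns does not. A minimal sketch of the difference, assuming a hypothetical two-column selection key_cols that is not part of the original question:

```r
library(dplyr)

# Same data frame as in the question
df <- data.frame(
  id = c(1, 1, 1, 2, 3, 3),
  `sitting position` = c("A", "B", "A", "A", "B", "B"),
  `movement haed` = c("left", "left", "right", "right", "left", "left"),
  `colesterol level` = c(50, 30, 45, 80, 90, 130),
  check.names = FALSE
)

# Hypothetical selection with repeated combinations
key_cols <- c("id", "sitting position")

# Deduplicate on ALL columns first, then drop columns:
# rows that differ only in the dropped columns survive
dedupe_then_select <- df %>%
  distinct() %>%
  select(all_of(key_cols))

# Select first, then deduplicate on what is left
select_then_dedupe <- df %>%
  select(all_of(key_cols)) %>%
  distinct()

nrow(dedupe_then_select)  # 6
nrow(select_then_dedupe)  # 4
```

The second pipeline gives the same rows as distinct(df, across(all_of(key_cols))), so either form can be used when the column names are stored in a vector.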