I want to apply the distinct function to many variables. So let's say I have the data frame:
df <- data.frame(
id = c(1,1,1,2,3,3),
`sitting position` = c("A","B","A","A","B","B"),
`movement haed` = c("left", "left", "right", "right", "left", "left"),
`colesterol level` = c(50, 30, 45, 80, 90, 130),
check.names = FALSE)
Now I put the variables to which I want to apply the distinct function into a character vector (my real data frame has more variables). Let's say this is the vector:
columns <- dput(colnames(df))[-3]
Output:
c("id", "sitting position", "colesterol level")
Is there a way to use columns with the distinct function directly (something like distinct(df, columns), which unfortunately doesn't work)? Or do I always have to type the variables one by one, like
df_new <- distinct(df, id, `sitting position`, `colesterol level`)
Output:
df_new
  id sitting position colesterol level
1  1                A               50
2  1                B               30
3  1                A               45
4  2                A               80
5  3                B               90
6  3                B              130
which does work, but would take too much time. If I pass columns directly I always get an error message, and I don't really know how to solve this problem.
Thank you very much for your help!
CodePudding user response:
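We can make use of dplyr's distinct_all here (dplyr is part of the tidyverse). Note that its .funs argument transforms the columns before duplicates are removed; it does not choose which columns are compared. To restrict the comparison to certain columns, the related distinct_at takes them through its .vars argument. Because the scoped *_all and *_at variants are superseded, I have included an across version.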
library(dplyr)
# using the columns variable: all_of() treats it as a vector of column names
df %>%
  distinct(across(all_of(columns)))
  id sitting position colesterol level
1  1                A               50
2  1                B               30
3  1                A               45
4  2                A               80
5  3                B               90
6  3                B              130
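In the example data every row is already unique in the chosen columns, so the effect of distinct is hard to see. The following is a minimal sketch on a made-up data frame (toy and cols are hypothetical names, not part of the question) that contains real duplicates. It also shows the .keep_all argument of distinct, which keeps the remaining columns of the first row in each group of duplicates.

```r
library(dplyr)

# Hypothetical data: rows 1-2 and rows 3-4 agree on id and `sitting position`
toy <- data.frame(
  id = c(1, 1, 2, 2),
  `sitting position` = c("A", "A", "B", "B"),
  `movement haed` = c("left", "right", "left", "left"),
  check.names = FALSE
)

# Column names stored as a character vector, as in the question
cols <- c("id", "sitting position")

# Deduplicate on the columns named in `cols`; only those columns are returned
by_cols <- toy %>%
  distinct(across(all_of(cols)))

# Same key columns, but keep every column of the first row per combination
by_cols_keep <- toy %>%
  distinct(across(all_of(cols)), .keep_all = TRUE)

print(by_cols)
print(by_cols_keep)
```

Wrapping the vector in all_of() tells dplyr to read cols as column names. A plain distinct(toy, cols) would instead treat cols as a vector of values, which is presumably why the direct call in the question raises an error.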
# using all columns
dplyr::distinct_all(df)
# or
df %>%
  distinct(across(.cols = everything()))
  id sitting position movement haed colesterol level
1  1                A          left               50
2  1                B          left               30
3  1                A         right               45
4  2                A         right               80
5  3                B          left               90
6  3                B          left              130
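or if you want to select certain variables afterwards. Note that this removes duplicates based on all columns first and only then drops columns, so rows that are identical in the selected columns can still remain:
df %>%
  distinct_all() %>%
  select(id, `sitting position`, `colesterol level`)
  id sitting position colesterol level
1  1                A               50
2  1                B               30
3  1                A               45
4  2                A               80
5  3                B               90
6  3                B              130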
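The order of select and distinct matters once the data contains rows that differ only in a column that is later dropped. A small sketch, again on a made-up data frame (toy and cols are hypothetical names), compares both orders:

```r
library(dplyr)

# Hypothetical data: rows 1 and 2 differ only in `movement haed`
toy <- data.frame(
  id = c(1, 1, 2),
  `sitting position` = c("A", "A", "B"),
  `movement haed` = c("left", "right", "left"),
  check.names = FALSE
)
cols <- c("id", "sitting position")

# distinct() on all columns first: rows 1 and 2 both survive,
# and look like duplicates once `movement haed` is dropped
distinct_then_select <- toy %>%
  distinct() %>%
  select(all_of(cols))

# select() first: duplicates within `cols` are actually removed
select_then_distinct <- toy %>%
  select(all_of(cols)) %>%
  distinct()

nrow(distinct_then_select)  # 3
nrow(select_then_distinct)  # 2
```

If the goal is one row per combination of the listed columns, selecting first (or passing the columns to distinct directly) is the safer choice.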