I have a data frame that I would like to reduce in size by keeping only the unique observations. However, I want uniqueness to be judged on one column only, while keeping the rest of the data frame. Because some of the other columns contain repeated values, I cannot simply pass the entire data frame to the unique() function. How can I do this and still get the entire data frame back?
For example, with the following data frame, I would like to keep only the rows with unique values of variable a (column 1):
a b c d e
1 2 3 4 5
1 2 3 4 6
3 4 5 6 8
4 5 2 3 6
So I would remove only row 2, because the value 1 in column a is repeated. Other columns also contain repeated values, but those rows are kept, because only the uniqueness of column a (column 1) matters.
Desired outcome:
a b c d e
1 2 3 4 5
3 4 5 6 8
4 5 2 3 6
How can I do this and then retrieve the entire data frame? Is there an option for the unique() function that does this, or do I need an alternative?
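To make the example reproducible, the input table can be entered as a data frame named dat, which is the name the answer uses. Storing the columns as integers is an assumption, chosen to match the <int> column types printed in the tibble output.

```r
# Example data from the question; integer storage (1L, 2L, ...) is an assumption
dat <- data.frame(
  a = c(1L, 1L, 3L, 4L),
  b = c(2L, 2L, 4L, 5L),
  c = c(3L, 3L, 5L, 2L),
  d = c(4L, 4L, 6L, 3L),
  e = c(5L, 6L, 8L, 6L)
)
```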
Answer:
base R
dat[!duplicated(dat$a),]
# a b c d e
# 1 1 2 3 4 5
# 3 3 4 5 6 8
# 4 4 5 2 3 6
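The base R approach works because duplicated() returns a logical vector that is TRUE for every occurrence of a value after its first one; negating it selects the first row for each value of a, and all columns come along with that row. A sketch of the intermediate vector, plus the fromLast = TRUE variant that keeps the last occurrence instead:

```r
dat <- data.frame(
  a = c(1L, 1L, 3L, 4L), b = c(2L, 2L, 4L, 5L), c = c(3L, 3L, 5L, 2L),
  d = c(4L, 4L, 6L, 3L), e = c(5L, 6L, 8L, 6L)
)

# TRUE marks repeats of an earlier value of a
dup <- duplicated(dat$a)
dup
# [1] FALSE  TRUE FALSE FALSE

# fromLast = TRUE scans from the bottom, so the last row per value of a is kept
keep_last <- dat[!duplicated(dat$a, fromLast = TRUE), ]
keep_last
#   a b c d e
# 2 1 2 3 4 6
# 3 3 4 5 6 8
# 4 4 5 2 3 6
```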
dplyr
dplyr::distinct(dat, a, .keep_all = TRUE)
# a b c d e
# 1 1 2 3 4 5
# 2 3 4 5 6 8
# 3 4 5 2 3 6
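The .keep_all argument is what preserves the other columns: without it, distinct() returns only the columns named in the call. With .keep_all = TRUE, the remaining columns are taken from the first row of each distinct value. A short comparison, assuming dplyr is installed:

```r
library(dplyr)

dat <- data.frame(
  a = c(1L, 1L, 3L, 4L), b = c(2L, 2L, 4L, 5L), c = c(3L, 3L, 5L, 2L),
  d = c(4L, 4L, 6L, 3L), e = c(5L, 6L, 8L, 6L)
)

# Only column a survives
only_a <- distinct(dat, a)
# All columns survive; values come from the first row for each a
all_cols <- distinct(dat, a, .keep_all = TRUE)

names(only_a)
# [1] "a"
names(all_cols)
# [1] "a" "b" "c" "d" "e"
```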
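Another option: within each group of rows sharing the same value of a, choose which row to keep based on another column. Here the row with the largest e is kept, so for a = 1 the second row (e = 6) survives instead of the first: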
library(dplyr)
dat %>%
group_by(a) %>%
slice(which.max(e)) %>%
ungroup()
# # A tibble: 3 x 5
# a b c d e
# <int> <int> <int> <int> <int>
# 1 1 2 3 4 6
# 2 3 4 5 6 8
# 3 4 5 2 3 6
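The same "largest e per value of a" selection can also be sketched in base R without extra packages: sort so the preferred row comes first within each a, then apply the duplicated() filter. Because order() leaves remaining ties in their original order, ties on e resolve to the earlier row, which matches which.max().

```r
dat <- data.frame(
  a = c(1L, 1L, 3L, 4L), b = c(2L, 2L, 4L, 5L), c = c(3L, 3L, 5L, 2L),
  d = c(4L, 4L, 6L, 3L), e = c(5L, 6L, 8L, 6L)
)

# Within each value of a, put the row with the largest e first
ord <- dat[order(dat$a, -dat$e), ]
# Keep the first (that is, largest-e) row for each value of a
best <- ord[!duplicated(ord$a), ]
best
#   a b c d e
# 2 1 2 3 4 6
# 3 3 4 5 6 8
# 4 4 5 2 3 6
```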
library(data.table)
as.data.table(dat)[, .SD[which.max(e),], by = .(a) ]
# a b c d e
# <int> <int> <int> <int> <int>
# 1: 1 2 3 4 6
# 2: 3 4 5 6 8
# 3: 4 5 2 3 6
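If data.table is already in use, its unique() method answers the original question directly: it accepts a by argument naming the columns that define a duplicate, keeps the first row per group, and returns all columns (fromLast = TRUE keeps the last row instead). A minimal sketch, assuming data.table is installed:

```r
library(data.table)

dat <- data.frame(
  a = c(1L, 1L, 3L, 4L), b = c(2L, 2L, 4L, 5L), c = c(3L, 3L, 5L, 2L),
  d = c(4L, 4L, 6L, 3L), e = c(5L, 6L, 8L, 6L)
)

dt <- as.data.table(dat)
# Duplicates are judged on column a only; the other columns are kept
res <- unique(dt, by = "a")
res
```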
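As for unique(), it does have an incomparables argument, but the data frame method does not implement it yet (and incomparables takes values that are never marked as duplicates, not column names to ignore):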
unique(dat, incomparables = c("b", "c", "d", "e"))
# Error: argument 'incomparables != FALSE' is not used (yet)
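Since base unique() has no option for choosing key columns, one way to get a reusable equivalent is to wrap the duplicated() approach in a small function. The name unique_by() is hypothetical and not part of base R; it accepts one or more key columns because duplicated() also works on a data frame, comparing whole rows of the selected columns.

```r
# unique_by() is a hypothetical helper, not a base R function:
# keep the first (or, with fromLast = TRUE, the last) row for each
# combination of values in `cols`, returning every column of `df`
unique_by <- function(df, cols, fromLast = FALSE) {
  df[!duplicated(df[cols], fromLast = fromLast), , drop = FALSE]
}

dat <- data.frame(
  a = c(1L, 1L, 3L, 4L), b = c(2L, 2L, 4L, 5L), c = c(3L, 3L, 5L, 2L),
  d = c(4L, 4L, 6L, 3L), e = c(5L, 6L, 8L, 6L)
)

by_a  <- unique_by(dat, "a")            # rows 1, 3 and 4
by_ae <- unique_by(dat, c("a", "e"))    # all 4 rows: no (a, e) pair repeats
```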