How to find rows in a dataframe that show unique results across multiple columns

Time:11-09

This may have a simple answer, but I'm having trouble finding a solution and could use some help, please.

> fruit.names <- c(rep("apple",3), rep("pear",3), rep("pepper", 3), rep("rice",3))
> adj <- c(rep("red", 3), rep("not round", 2), "yellow", rep("hot", 3), "grain", "white", "starch")
> df.start <- data.frame(fruit.names, adj)
> df.start
   fruit.names       adj
1        apple       red
2        apple       red
3        apple       red
4         pear not round
5         pear not round
6         pear    yellow
7       pepper       hot
8       pepper       hot
9       pepper       hot
10        rice     grain
11        rice     white
12        rice    starch

I need code that returns each unique value of df.start$fruit.names only when every row for that name has the same value in df.start$adj, listed once together with that shared adj value.

The result should look like this. I'd prefer to use only base R if possible (i.e., no tidyr/dplyr).

> df.results
  fruit.names adj
1       apple red
2      pepper hot

Answer:

A couple of ways:

base R

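# "TRUE" for every row whose fruit.names group has exactly one distinct adj value
ind <- ave(df.start$adj, df.start$fruit.names, FUN = function(z) length(unique(z)) == 1) == "TRUE"
# keep those rows, then drop the duplicated ones
unique(df.start[ind,])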
#   fruit.names adj
# 1       apple red
# 7      pepper hot

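The comparison against the string "TRUE" is needed because ave() writes FUN's results back into a copy of its input vector. The logical result is therefore coerced to the class of df.start$adj (character here), so each group comes back as "TRUE" or "FALSE" rather than as a logical value.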
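A minimal sketch of that coercion, plus one way to sidestep it (my own variant, not part of the answer above): pass ave() the row indices, which are integers, and look up adj inside FUN. The logical result is then stored as 1/0 and as.logical() turns it back into TRUE/FALSE. The names raw and df.results are just illustrative, and stringsAsFactors = FALSE is set explicitly (it is the data.frame() default since R 4.0.0) because the trick assumes adj is a character column.

```r
fruit.names <- c(rep("apple", 3), rep("pear", 3), rep("pepper", 3), rep("rice", 3))
adj <- c(rep("red", 3), rep("not round", 2), "yellow", rep("hot", 3), "grain", "white", "starch")
df.start <- data.frame(fruit.names, adj, stringsAsFactors = FALSE)

# ave() assigns FUN's results into a copy of x, so with a character x
# the logical results come back as the strings "TRUE"/"FALSE"
raw <- ave(df.start$adj, df.start$fruit.names,
           FUN = function(z) length(unique(z)) == 1)
class(raw)  # "character"

# Variant: group the row indices (an integer vector) instead, so the
# logical results are stored as 1/0 and can be converted with as.logical()
ind <- as.logical(ave(seq_len(nrow(df.start)), df.start$fruit.names,
                      FUN = function(i) length(unique(df.start$adj[i])) == 1))

# keep the qualifying rows and collapse duplicates
df.results <- unique(df.start[ind, ])
df.results
```

This gives the same apple/red and pepper/hot rows (row names 1 and 7) as the string-comparison version.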

dplyr

(Offered for the crowd, though I know you said you preferred base R.)

library(dplyr)
df.start %>%
  group_by(fruit.names) %>%
  filter(length(unique(adj)) == 1) %>%
  ungroup() %>%
  distinct()
# # A tibble: 2 x 2
#   fruit.names adj  
#   <chr>       <chr>
# 1 apple       red  
# 2 pepper      hot  
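For comparison, the same group, filter and distinct steps of the dplyr pipeline can be sketched in base R with tapply() and %in%, avoiding ave() altogether. This is an alternative I'm suggesting, not something from the answer; n_adj and keep are hypothetical helper names.

```r
fruit.names <- c(rep("apple", 3), rep("pear", 3), rep("pepper", 3), rep("rice", 3))
adj <- c(rep("red", 3), rep("not round", 2), "yellow", rep("hot", 3), "grain", "white", "starch")
df.start <- data.frame(fruit.names, adj, stringsAsFactors = FALSE)

# per-group step: number of distinct adj values for each fruit name
n_adj <- tapply(df.start$adj, df.start$fruit.names,
                function(z) length(unique(z)))

# filter step: keep rows whose fruit name has exactly one distinct adj value
keep <- df.start$fruit.names %in% names(n_adj)[n_adj == 1]

# distinct step: collapse the duplicated rows
df.results <- unique(df.start[keep, ])
df.results
```

Computing the per-name counts once with tapply() keeps the result logical throughout, so no string comparison is needed.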