I have a dataset of about 1567 entries with both numerical and categorical columns. I would like to extract only the categorical data, without duplicates. A small example:
df <- data.frame(
aninimal = c('cat', 'cat', 'cat', 'cat', 'dog', 'dog', 'dog', 'dog', 'dog'),
fur_col = c('tan', 'tan', 'tan', 'white', 'black', 'black', 'white', 'brown', 'brown'),
age = c(2, 2, 3, 5, 7, 3, 1, 6, 5))
I tried the following code, but it returns every row, including the duplicate category combinations:
library(dplyr)
summary <- df %>%
  group_by(aninimal, fur_col) %>%
  summarize(aninimal, fur_col)
It gives me:
anim | fur |
---|---|
cat | tan |
cat | tan |
cat | tan |
cat | white |
dog | black |
dog | black |
The result I want is:
anim | fur |
---|---|
cat | tan |
cat | white |
dog | black |
dog | white |
dog | brown |
Answer:
Use distinct():
library(dplyr)
df %>%
distinct(aninimal, fur_col)
aninimal fur_col
1 cat tan
2 cat white
3 dog black
4 dog white
5 dog brown
Or, to select all character columns automatically instead of naming them:
distinct(df, across(where(is.character)))
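The is.character test relies on the text columns being stored as character, which is the data.frame() default since R 4.0.0. If some categorical columns are factors instead (for example, data read with stringsAsFactors = TRUE), a sketch like the one below, assuming dplyr 1.0 or later, treats both types as categorical. The df_fct data and the is_categorical helper are hypothetical, added only for illustration.

```r
library(dplyr)

# Hypothetical variant of the question's data in which fur_col is a factor.
df_fct <- data.frame(
  aninimal = c('cat', 'cat', 'cat', 'cat', 'dog', 'dog', 'dog', 'dog', 'dog'),
  fur_col = factor(c('tan', 'tan', 'tan', 'white', 'black', 'black', 'white', 'brown', 'brown')),
  age = c(2, 2, 3, 5, 7, 3, 1, 6, 5)
)

# Hypothetical helper: count a column as categorical if it is character or factor.
is_categorical <- function(x) is.character(x) || is.factor(x)

cats <- df_fct %>%
  select(where(is_categorical)) %>%  # keep only the categorical columns
  distinct()                         # drop repeated rows

print(cats)
```

Selecting first and then calling distinct() with no arguments keeps the two steps (which columns, which rows) separate and easy to adjust.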
In base R, use unique():
unique(df[sapply(df, is.character)])
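For data frames, unique() keeps the first occurrence of each row, which is the same as filtering with !duplicated(). A minimal base R sketch, assuming R 4.0 or later so that the text columns are character rather than factor, makes that filter explicit; the row names of the result show which original rows were kept.

```r
df <- data.frame(
  aninimal = c('cat', 'cat', 'cat', 'cat', 'dog', 'dog', 'dog', 'dog', 'dog'),
  fur_col = c('tan', 'tan', 'tan', 'white', 'black', 'black', 'white', 'brown', 'brown'),
  age = c(2, 2, 3, 5, 7, 3, 1, 6, 5))

# Keep only the character (categorical) columns.
cats <- df[sapply(df, is.character)]

# duplicated() flags rows that repeat an earlier row; negating it keeps
# the first occurrence of each combination.
first_seen <- cats[!duplicated(cats), , drop = FALSE]

print(first_seen)                           # row names 1, 4, 5, 7, 8
print(identical(first_seen, unique(cats)))  # TRUE
```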
Answer:
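An additional option uses expand() with nesting() from tidyr (loaded by tidyverse). Note that expand() returns the combinations sorted, whereas distinct() keeps them in order of first appearance: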
df <- data.frame(
aninimal = c('cat', 'cat', 'cat', 'cat', 'dog', 'dog', 'dog', 'dog', 'dog'),
fur_col = c('tan', 'tan', 'tan', 'white', 'black', 'black', 'white', 'brown', 'brown'),
age = c(2, 2, 3, 5, 7, 3, 1, 6, 5))
library(tidyverse)
df %>%
expand(nesting(aninimal, fur_col))
#> # A tibble: 5 x 2
#> aninimal fur_col
#> <chr> <chr>
#> 1 cat tan
#> 2 cat white
#> 3 dog black
#> 4 dog brown
#> 5 dog white
Created on 2022-08-25 with reprex v2.0.2
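The group_by() approach from the question can also produce the unique combinations. A summarise() call with no summary expressions returns just the group keys, one row per group, and count() returns the same combinations plus a frequency column n. This is a sketch assuming dplyr 1.0 or later (for the .groups argument); like expand(), both return the groups in sorted order.

```r
library(dplyr)

df <- data.frame(
  aninimal = c('cat', 'cat', 'cat', 'cat', 'dog', 'dog', 'dog', 'dog', 'dog'),
  fur_col = c('tan', 'tan', 'tan', 'white', 'black', 'black', 'white', 'brown', 'brown'),
  age = c(2, 2, 3, 5, 7, 3, 1, 6, 5))

# Group by the categorical columns and summarise with no expressions:
# only the grouping columns remain, one row per unique combination.
combos <- df %>%
  group_by(aninimal, fur_col) %>%
  summarise(.groups = "drop")

# count() gives the same combinations plus how many rows share each one.
counts <- count(df, aninimal, fur_col)

print(combos)
print(counts)
```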