Summarise categorical data in R

Time:08-26

I have a dataset with about 1567 entries that consists of numerical and categorical data. I would like to extract only the categorical columns, without duplicate rows. For example:

df <- data.frame(
  aninimal = c('cat', 'cat', 'cat', 'cat', 'dog', 'dog', 'dog', 'dog', 'dog'),
  fur_col = c('tan', 'tan', 'tan', 'white', 'black', 'black', 'white', 'brown', 'brown'),
  age = c(2, 2, 3, 5, 7, 3, 1, 6, 5))

I used the following code, but it returns every row of the categorical columns, duplicates included:

summary <- df %>% 
  group_by(aninimal, fur_col) %>% 
  summarize(aninimal, fur_col)

It gives me:

anim fur
cat tan
cat tan
cat tan
cat white
dog black
dog black

The result I want is:

anim fur
cat tan
cat white
dog black
dog white
dog brown

CodePudding user response:

Use distinct:

library(dplyr)
df %>% 
  distinct(aninimal, fur_col)

  aninimal fur_col
1      cat     tan
2      cat   white
3      dog   black
4      dog   white
5      dog   brown

Or, to pick up every character column automatically instead of naming them:

distinct(df, across(where(is.character)))

In base R, use unique:

unique(df[sapply(df, is.character)])
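One caveat with the is.character test: if the categorical columns are stored as factors (the default for data.frame() before R 4.0.0, or whenever stringsAsFactors = TRUE), it selects nothing. A minimal base R sketch that accepts both types follows; df_fct and is_categorical are hypothetical names introduced here for illustration, not part of the original answer.

```r
# Same data as in the question, but with the categorical columns stored as factors
df_fct <- data.frame(
  aninimal = c('cat', 'cat', 'cat', 'cat', 'dog', 'dog', 'dog', 'dog', 'dog'),
  fur_col = c('tan', 'tan', 'tan', 'white', 'black', 'black', 'white', 'brown', 'brown'),
  age = c(2, 2, 3, 5, 7, 3, 1, 6, 5),
  stringsAsFactors = TRUE)

# Hypothetical helper: TRUE for a character or a factor column
is_categorical <- function(x) is.character(x) || is.factor(x)

# Keep only the categorical columns, then drop duplicate rows
result <- unique(df_fct[sapply(df_fct, is_categorical)])
rownames(result) <- NULL  # renumber the rows 1..n
result
```

unique() on a data frame keeps the original row names of the rows it retains, which is why the sketch resets them at the end.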

CodePudding user response:

An additional option, using expand() with nesting() from tidyr:

df <- data.frame(
  aninimal = c('cat', 'cat', 'cat', 'cat', 'dog', 'dog', 'dog', 'dog', 'dog'),
  fur_col = c('tan', 'tan', 'tan', 'white', 'black', 'black', 'white', 'brown', 'brown'),
  age = c(2, 2, 3, 5, 7, 3, 1, 6, 5))

library(tidyverse)
df %>% 
  expand(nesting(aninimal, fur_col))
#> # A tibble: 5 x 2
#>   aninimal fur_col
#>   <chr>    <chr>  
#> 1 cat      tan    
#> 2 cat      white  
#> 3 dog      black  
#> 4 dog      brown  
#> 5 dog      white

Created on 2022-08-25 with reprex v2.0.2
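Note that the two answers return the rows in a different order: distinct() keeps the combinations in order of first appearance (dog white before dog brown), while the expand(nesting()) output above is sorted. If the sorted order is wanted without tidyr, a base R sketch along these lines should give it. The names combos and combos_sorted are hypothetical, and the column names are assumed to match the question's data.

```r
# Data from the question
df <- data.frame(
  aninimal = c('cat', 'cat', 'cat', 'cat', 'dog', 'dog', 'dog', 'dog', 'dog'),
  fur_col = c('tan', 'tan', 'tan', 'white', 'black', 'black', 'white', 'brown', 'brown'),
  age = c(2, 2, 3, 5, 7, 3, 1, 6, 5))

# Unique combinations, in order of first appearance
combos <- unique(df[c("aninimal", "fur_col")])

# Sort by animal, then by fur colour
combos_sorted <- combos[order(combos$aninimal, combos$fur_col), ]
rownames(combos_sorted) <- NULL  # renumber the rows 1..n
combos_sorted
```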
