How do I count the values of a variable when they are spelled slightly differently?

Time:12-16

I have a problem counting in R. The values of my variable are spelled slightly differently (mixed upper and lower case), as shown below:

df<-data.frame(sweets= c("cookie", "CANDY", "Cookie", "cake", "IceCream", "Candy", "Chocolate", "COOKIE", "CAKE"))
df

I want to get a result like the one below. To do that, I want to make the values consistent:

df2<-data.frame(sweets= c("Cookie", "Candy", "Cookie", "Cake", "IceCream", "Candy", "Chocolate", "Cookie", "Cake"))               
df3<- table(df2)

I tried if_else and if ... else if, but it was confusing. It would be great if you could write some sample code showing how to do it.

CodePudding user response:

Using str_to_title from stringr inside mutate, you can convert every value of the variable to title case. Then you can use count to count the number of observations for each sweet.

Code

library(dplyr)
library(stringr)
   

df <- data.frame(sweets= c("cookie", "CANDY", "Cookie", "cake", "IceCream", "Candy", "Chocolate", "COOKIE", "CAKE"))

df %>% 
  mutate(sweets = str_to_title(sweets)) %>%
  count(sweets)

Output

     sweets n
1      Cake 2
2     Candy 2
3 Chocolate 1
4    Cookie 3
5  Icecream 1
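
Note that title-casing turns "IceCream" into "Icecream", whereas the question's desired result keeps "IceCream". If the exact labels matter, one possible approach is to normalise the case only for matching and then map each value to a preferred label. The sketch below assumes the set of sweets is known in advance; the lookup vector `canonical` and the column `sweets_clean` are hypothetical names introduced for illustration.

```r
df <- data.frame(sweets = c("cookie", "CANDY", "Cookie", "cake", "IceCream",
                            "Candy", "Chocolate", "COOKIE", "CAKE"))

# Hypothetical lookup: lowercase key -> preferred display label
canonical <- c(cookie = "Cookie", candy = "Candy", cake = "Cake",
               icecream = "IceCream", chocolate = "Chocolate")

# Match on the lowercased value, then keep the preferred label
df$sweets_clean <- unname(canonical[tolower(df$sweets)])

counts <- table(df$sweets_clean)
counts
```

Any value that has no entry in the lookup becomes NA and is left out of the table by default; `table(df$sweets_clean, useNA = "ifany")` would show such values.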

CodePudding user response:

Convert all to lowercase then table:

table(tolower(df$sweets))
# cake     candy chocolate    cookie  icecream 
#    2         2         1         3         1
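
If a data frame of counts is more convenient than a table object, the same idea can be sketched in base R as follows. The column name `n` is just a choice made here to mirror the output of dplyr's count; it is not required.

```r
df <- data.frame(sweets = c("cookie", "CANDY", "Cookie", "cake", "IceCream",
                            "Candy", "Chocolate", "COOKIE", "CAKE"))

# Overwrite the column with its lowercase version so all spellings agree
df$sweets <- tolower(df$sweets)

# Tabulate, then convert to a data frame with the count column named "n"
counts <- as.data.frame(table(sweets = df$sweets), responseName = "n")
counts
```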

Or use capwords, a helper function defined in the examples of ?tolower:

capwords <- function(s, strict = FALSE) {
  cap <- function(s) paste(toupper(substring(s, 1, 1)),
                           {s <- substring(s, 2); if(strict) tolower(s) else s},
                           sep = "", collapse = " " )
  sapply(strsplit(s, split = " "), cap, USE.NAMES = !is.null(names(s)))
}

table(capwords(df$sweets, strict = TRUE))
# Cake     Candy Chocolate    Cookie  Icecream 
#    2         2         1         3         1 