I have a problem counting in R. Each value is spelled slightly differently, as shown below:
df<-data.frame(sweets= c("cookie", "CANDY", "Cookie", "cake", "IceCream", "Candy", "Chocolate", "COOKIE", "CAKE"))
df
I want to get a count like the one below. To do that, I want to make the spelling of each value consistent:
df2<-data.frame(sweets= c("Cookie", "Candy", "Cookie", "Cake", "IceCream", "Candy", "Chocolate", "Cookie", "Cake"))
df3<- table(df2)
I tried if_else and if ... else if, but it got confusing. It would be great if you could write some sample code showing how to do it.
CodePudding user response:
Using str_to_title from stringr inside mutate, you can convert the case of your variable. Afterwards, you can use count to count the number of observations for each sweet. Note that str_to_title also turns "IceCream" into "Icecream".
Code
library(dplyr)
library(stringr)
df <- data.frame(sweets= c("cookie", "CANDY", "Cookie", "cake", "IceCream", "Candy", "Chocolate", "COOKIE", "CAKE"))
df %>%
  mutate(sweets = str_to_title(sweets)) %>%
  count(sweets)
Output
     sweets n
1      Cake 2
2     Candy 2
3 Chocolate 1
4    Cookie 3
5  Icecream 1
CodePudding user response:
Convert all to lowercase then table:
table(tolower(df$sweets))
#      cake     candy chocolate    cookie  icecream
#         2         2         1         3         1
Alternatively, the examples in ?tolower define a helper function, capwords:
capwords <- function(s, strict = FALSE) {
  cap <- function(s) paste(toupper(substring(s, 1, 1)),
                           {s <- substring(s, 2); if(strict) tolower(s) else s},
                           sep = "", collapse = " " )
  sapply(strsplit(s, split = " "), cap, USE.NAMES = !is.null(names(s)))
}
table(capwords(df$sweets, strict = TRUE))
#      Cake     Candy Chocolate    Cookie  Icecream
#         2         2         1         3         1
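Both approaches above produce "Icecream", whereas the question asks for "IceCream". If the exact spelling matters, one option is a lookup vector keyed by the lowercase form. This is only a minimal sketch: the name canonical and its contents are assumptions made up for illustration, and you would list your own preferred spellings there.

```r
# Hypothetical lookup table: lowercase key -> preferred spelling
canonical <- c(cookie = "Cookie", candy = "Candy", cake = "Cake",
               icecream = "IceCream", chocolate = "Chocolate")

df <- data.frame(sweets = c("cookie", "CANDY", "Cookie", "cake", "IceCream",
                            "Candy", "Chocolate", "COOKIE", "CAKE"))

# Lowercase each value so the match ignores case, then look up its label
df$sweets <- unname(canonical[tolower(df$sweets)])

# Count the now-consistent values
counts <- table(df$sweets)
counts
```

Any value missing from the lookup becomes NA, and table() drops NA by default, so it is worth checking anyNA(df$sweets) before trusting the counts.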