I am a beginner in R and still learning how to write code. In the data frame below, I want to check whether eggs in the commodity column has the same unit in all rows.
Data frame:
df <- structure(list(commodity = c("eggs", "lentils (green)", "oil (vegetable)",
"rice", "sugar (white)", "eggs", "lentils (green)", "oil (vegetable)",
"rice", "sugar (white)", "eggs"), unit = c("1.8 kg", "900 g",
"810 g", "kg", "kg", "1.8 kg", "900 g", "810 g", "kg", "kg",
"1.8 kg")), class = "data.frame", row.names = c(NA, -11L))
         commodity   unit
1             eggs 1.8 kg
2  lentils (green)  900 g
3  oil (vegetable)  810 g
4             rice     kg
5    sugar (white)     kg
6             eggs 1.8 kg
7  lentils (green)  900 g
8  oil (vegetable)  810 g
9             rice     kg
10   sugar (white)     kg
11            eggs 1.8 kg
I do not know how to do this.
CodePudding user response:
One way could be to first create a column that keeps only the alphabetic characters of each unit, then use distinct():
library(dplyr)

df %>%
  mutate(unit1 = gsub("[^a-zA-Z]", "", unit)) %>%
  distinct(unit1)
  unit1
1    kg
2     g
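The result above lists the distinct units across the whole data frame, not for eggs specifically. A minimal sketch of a per-commodity check follows, assuming the full unit string (e.g. "1.8 kg") is what should be compared. The names `unit_check`, `n_units` and `same_unit` are introduced here for illustration only, and the data frame is rebuilt compactly so the snippet runs on its own:

```r
library(dplyr)

# Same 11 rows as in the question, rebuilt compactly
df <- data.frame(
  commodity = rep(c("eggs", "lentils (green)", "oil (vegetable)",
                    "rice", "sugar (white)"), length.out = 11),
  unit = rep(c("1.8 kg", "900 g", "810 g", "kg", "kg"), length.out = 11),
  stringsAsFactors = FALSE
)

# Count the distinct unit strings within each commodity
unit_check <- df %>%
  group_by(commodity) %>%
  summarise(n_units = n_distinct(unit), same_unit = n_units == 1)

# Keep only the row for eggs
unit_check %>%
  filter(commodity == "eggs")
```

For this data, `same_unit` is TRUE for eggs, since every eggs row has the unit "1.8 kg".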
CodePudding user response:
In base R, we could use
length(unique(trimws(df$unit, whitespace = "[0-9.]+\\s+"))) == 1
[1] FALSE
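The check above compares units across all commodities, which is why it returns FALSE. As a sketch restricted to eggs, the same idea can be applied to a subset. This assumes an exact match on the commodity name is intended, and `egg_units` and `same_by_commodity` are illustrative names:

```r
# Same 11 rows as in the question, rebuilt compactly
df <- data.frame(
  commodity = rep(c("eggs", "lentils (green)", "oil (vegetable)",
                    "rice", "sugar (white)"), length.out = 11),
  unit = rep(c("1.8 kg", "900 g", "810 g", "kg", "kg"), length.out = 11),
  stringsAsFactors = FALSE
)

# Units recorded for eggs only
egg_units <- df$unit[df$commodity == "eggs"]

# TRUE when every eggs row carries the same unit string
length(unique(egg_units)) == 1

# The same check for every commodity at once
same_by_commodity <- tapply(df$unit, df$commodity,
                            function(u) length(unique(u)) == 1)
same_by_commodity
```

Comparing the full string treats "1.8 kg" and "2 kg" as different units. Stripping the numeric part first, as in the answers above, would treat them as the same.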