I have this table in R. In the measure column, the values "Zero-Loss Condensate Drain", "Zero Loss Condensate Drain", and "Zero-Loss Condensate Drains" mean the same thing, and so do "Wi-Fi Thermostats", "Wi-Fi Thermostat", and "Wi-Fi thermostats", but R treats them as different values and counts them separately. I want the Wi-Fi Thermostat variants to be treated as the same value with a count of 4, not 1, 1, and 2 separately. I want the same result for the Zero-Loss Condensate Drain variants.
measure | Freq |
---|---|
Thermostatic Radiator Valves (TRVs) | 45 |
Smart Thermostatic Radiator Enclosure | 42 |
Smart Thermostats | 4 |
Thermostatic radiator valves | 3 |
Wi-Fi Enabled Thermostats | 2 |
Wi-Fi Thermostats | 1 |
Smart Thermostat | 2 |
Thermostatic and Float Steam Traps | 1 |
Thermostatic Radiator Valves | 2 |
Dual Fuel Thermostat | 1 |
Programmable Setback Thermostats | 1 |
Wi-Fi Thermostat | 1 |
Wi-Fi thermostats | 2 |
Zero-Loss Condensate Drain | 1 |
Zero Loss Condensate Drain | 1 |
Zero-Loss Condensate Drains | 2 |
Answer:
You need to tidy up your `measure` values before summarizing:
library(tidyverse)
df %>%
  # tidy up values in `measure`:
  mutate(
    # get rid of plural -s:
    measure = str_replace(measure, "(?<=thermostat|Drain)s", ""),
    # capitalize "thermostat"
    measure = str_replace(measure, "thermostat", "Thermostat"),
    # remove hyphen:
    measure = str_replace(measure, "(?<=Zero)-(?=Loss)", " ")) %>%
  # for each `measure` value...:
  group_by(measure) %>%
  # ...give frequency:
  summarise(Frequ = n())
# A tibble: 2 × 2
  measure                    Frequ
  <chr>                      <int>
1 Wi-Fi Thermostat               2
2 Zero Loss Condensate Drain     3
Data:
df <- data.frame(
  measure = c("Wi-Fi Thermostat", "Wi-Fi thermostats",
              "Zero-Loss Condensate Drain", "Zero Loss Condensate Drain", "Zero-Loss Condensate Drains")
)
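The answer above counts rows with `n()`, which suits raw data with one row per observation. The table in the question is already aggregated and has a `Freq` column, so there the counts would have to be summed instead. A minimal sketch of that variant follows, assuming a data frame with `measure` and `Freq` columns; the name `tab` and the case-insensitive plural rule are illustrative choices, not part of the answer above.

```r
library(dplyr)
library(stringr)

# Assumption: an already-aggregated table like the one in the question,
# reduced here to the six rows that should collapse into two.
tab <- data.frame(
  measure = c("Wi-Fi Thermostats", "Wi-Fi Thermostat", "Wi-Fi thermostats",
              "Zero-Loss Condensate Drain", "Zero Loss Condensate Drain",
              "Zero-Loss Condensate Drains"),
  Freq = c(1, 1, 2, 1, 1, 2)
)

result <- tab %>%
  mutate(
    # drop a trailing plural "s" after "thermostat" or "drain", ignoring case
    measure = str_replace(measure,
                          regex("(thermostat|drain)s$", ignore_case = TRUE),
                          "\\1"),
    # capitalize "thermostat"
    measure = str_replace(measure, "thermostat", "Thermostat"),
    # replace the hyphen in "Zero-Loss" with a space
    measure = str_replace(measure, "(?<=Zero)-(?=Loss)", " ")
  ) %>%
  group_by(measure) %>%
  # sum the existing counts instead of counting rows
  summarise(Freq = sum(Freq))

print(result)
```

With these six rows, both cleaned-up values should end up with a total of 4.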
Answer:
For this example, with `df` holding the `measure` and `Freq` columns from the question, we could do:
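library(dplyr)
library(stringr)
df %>%
  # build a normalized key: upper case, no trailing "S", first hyphen as space
  mutate(helper = toupper(measure),
         helper = ifelse(str_ends(helper, 'S'), substring(helper, 1, nchar(helper) - 1), helper),
         helper = str_replace(helper, '\\-', ' ')) %>%
  # label every row with the first spelling seen for its key
  group_by(helper) %>%
  mutate(measure = first(measure)) %>%
  # sum the counts per label, largest first
  group_by(measure) %>%
  summarise(Freq = sum(Freq)) %>%
  arrange(-Freq)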
   measure                                Freq
   <chr>                                 <dbl>
 1 Thermostatic Radiator Valves (TRVs)      45
 2 Smart Thermostatic Radiator Enclosure    42
 3 Smart Thermostats                         6
 4 Thermostatic radiator valves              5
 5 Wi-Fi Thermostats                         4
 6 Zero-Loss Condensate Drain                4
 7 Wi-Fi Enabled Thermostats                 2
 8 Dual Fuel Thermostat                      1
 9 Programmable Setback Thermostats          1
10 Thermostatic and Float Steam Traps        1
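The same helper-key idea can also be written without any packages. The sketch below is a base R variant under the same assumptions (a data frame with `measure` and `Freq` columns); the names `tab`, `key`, `label`, and `out` are made up for illustration. One difference from the pipeline above: `gsub()` replaces every hyphen, whereas `str_replace()` replaces only the first one, which makes no difference to the grouping for these values.

```r
# Assumption: a subset of the question's table, enough to show the grouping.
tab <- data.frame(
  measure = c("Wi-Fi Thermostats", "Wi-Fi Thermostat", "Wi-Fi thermostats",
              "Zero-Loss Condensate Drain", "Zero Loss Condensate Drain",
              "Zero-Loss Condensate Drains",
              "Smart Thermostats", "Smart Thermostat"),
  Freq = c(1, 1, 2, 1, 1, 2, 4, 2)
)

# Normalized key: upper case, no trailing "S", hyphens turned into spaces.
key <- toupper(tab$measure)
key <- sub("S$", "", key)
key <- gsub("-", " ", key, fixed = TRUE)

# match(key, key) gives the index of the first row with the same key,
# so each row is labeled with the first spelling seen for its group.
tab$label <- tab$measure[match(key, key)]

# Sum the counts per label and sort from largest to smallest.
out <- aggregate(Freq ~ label, data = tab, FUN = sum)
out <- out[order(-out$Freq), ]

print(out)
```

Stripping every trailing "S" is a blunt rule: it would also shorten a value that merely ends in "s" without being a plural, so it is worth checking the resulting groups before trusting the totals.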