I have a df that is only looking at 1 ID with their respective assets:
ID | Asset | CONF_1 | CONF_2 | CONF_3 |
1 A PERFECT HIGH LOW
1 B PERFECT LOW LOW
1 C LOW HIGH VERY LOW
1 D NA MEDIUM MEDIUM
1 E MEDIUM MEDIUM PERFECT
1 F MEDIUM VERY LOW NA
1 G VERY LOW VERY LOW VERY LOW
1 H NA PERFECT HIGH
The goal is to reorganize the df so I can break down the % of each confidence level (PERFECT, HIGH, MEDIUM, ETC) given each ID and the 3 Conf fields.
Desired output
ID | CONFIDENCE | CONF_1 % | CONF_2 % | CONF_3 %
1 PERFECT 25 % 12.5 % 12.5 %
1 HIGH 0 25 % 12.5 %
1 MEDIUM 25 % 25 % 12.5 %
1 LOW 12.5 % 12.5 % 25 %
1 VERY LOW 12.5 % 25 % 25 %
1 NA 25 % 0 12.5 %
CodePudding user response:
Grouped by 'ID', summarise
across
the 'CONF' columns, get the frequency count with table
on a factor
column with levels
specified in the order, and find the proportions
library(dplyr)
df1 %>%
group_by(ID) %>%
summarise(lvls = c("PERFECT", "HIGH", "MEDIUM", "LOW", "VERY LOW", NA),
across(starts_with("CONF"),
~ 100 * proportions(table(factor(., levels = na.omit(lvls)),
useNA = "always"))), .groups = 'drop') %>%
rename(CONFIDENCE = lvls)
-output
# A tibble: 6 × 5
ID CONFIDENCE CONF_1 CONF_2 CONF_3
<int> <chr> <table> <table> <table>
1 1 PERFECT 25.0 12.5 12.5
2 1 HIGH 0.0 25.0 12.5
3 1 MEDIUM 25.0 25.0 12.5
4 1 LOW 12.5 12.5 25.0
5 1 VERY LOW 12.5 25.0 25.0
6 1 <NA> 25.0 0.0 12.5
--
Or another option is to reshape to 'long' format with pivot_longer
, do the count
and reshape back to 'wide' format with pivot_wider
library(tidyr)
df1 %>%
select(-Asset) %>%
pivot_longer(cols = starts_with("CONF"), values_to = 'CONFIDENCE') %>%
count(ID, name, CONFIDENCE) %>%
group_by(ID, name) %>%
mutate(n = 100 *n/sum(n) ) %>%
ungroup %>%
pivot_wider(names_from = name, values_from = n, values_fill = 0)
-output
# A tibble: 6 × 5
ID CONFIDENCE CONF_1 CONF_2 CONF_3
<int> <chr> <dbl> <dbl> <dbl>
1 1 LOW 12.5 12.5 25
2 1 MEDIUM 25 25 12.5
3 1 PERFECT 25 12.5 12.5
4 1 VERY LOW 12.5 25 25
5 1 <NA> 25 0 12.5
6 1 HIGH 0 25 12.5
data
df1 <- structure(list(ID = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), Asset = c("A",
"B", "C", "D", "E", "F", "G", "H"), CONF_1 = c("PERFECT", "PERFECT",
"LOW", NA, "MEDIUM", "MEDIUM", "VERY LOW", NA), CONF_2 = c("HIGH",
"LOW", "HIGH", "MEDIUM", "MEDIUM", "VERY LOW", "VERY LOW", "PERFECT"
), CONF_3 = c("LOW", "LOW", "VERY LOW", "MEDIUM", "PERFECT",
NA, "VERY LOW", "HIGH")), class = "data.frame", row.names = c(NA,
-8L))