I have sample data
s_name <- c("AL", "AL", "CA", "CA", "WI", "WI", "NJ", "NJ", "UT", "UT")
n_unit <- c(40, 30, 150, 110, 45, 80, 70, 40, 50, 90)
li_unit <- c(30, 30, 70, 40, 15, 80, 50, 40, 45, 45)
pv_lvl <- c("High", "Very High", "Medium", "Low", "Very Low", "Medium", "Very High", "low", "Very Low", "High")
mydata <- as.data.frame(cbind(s_name, n_unit, li_unit, pv_lvl))
mydata$n_unit <- as.numeric(mydata$n_unit)
mydata$li_unit <- as.numeric(mydata$li_unit)
mydata$per_li = mydata$li_unit/mydata$n_unit*100
print(mydata)
What I am trying to generate is a table that shows the percent of li_unit
in each type of pv_lvl
grouped by s_name
.
Something like this:
s_name Very Low Low Medium High Very High
AL 0.00 0.00 0.00 75.00 100.00
CA 0.00 36.36 46.67 0.00 0.00
WI 33.33 0.00 100.00 0.00 0.00
NJ 100.00 0.00 0.00 0.00 71.43
UT 90.00 0.00 0.00 50.00 0.00
I tried the group_by
function but did not get the desired result.
Thanks for your time and help!
CodePudding user response:
Using reshape2
library(reshape2)
dcast(mydata,s_name~pv_lvl,value.var="per_li",mean)
s_name High low Low Medium Very High Very Low
1 AL 75 NaN NaN NaN 100.00000 NaN
2 CA NaN NaN 36.36364 46.66667 NaN NaN
3 NJ NaN 100 NaN NaN 71.42857 NaN
4 UT 50 NaN NaN NaN NaN 90.00000
5 WI NaN NaN NaN 100.00000 NaN 33.33333
Note: you should fix some pv_lvl values (lower/uppercase), also you can factor it to have it in a specific order.
CodePudding user response:
Here's the standard pivot_wider
way:
library(tidyr)
mydata$pv_lvl <- factor(mydata$pv_lvl, levels = c("Very Low", "Low", "Medium", "High", "Very High"))
pivot_wider(mydata, s_name,
names_from = pv_lvl,
values_from = per_li,
values_fill = 0,
names_sort = TRUE)
output
# A tibble: 5 × 6
s_name `Very Low` Low Medium High `Very High`
<chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 AL 0 0 0 75 100
2 CA 0 36.4 46.7 0 0
3 WI 33.3 0 100 0 0
4 NJ 0 100 0 0 71.4
5 UT 90 0 0 50 0
Note that you don't have to have to use cbind
to create the dataframe,
mydata <- data.frame(s_name, n_unit, li_unit, pv_lvl)
is enough. Also, I created a factor out of pv_lvl
so that you can sort them as desired using names_sort
in pivot_wider
.