Suppose, I have a data set as follows:
State <- c("CA", "WI", "TX", "MS", "NY", "KT", "UT", "CO", "PA", "SC")
Pov_rt <- c(25, 30, 35, 40, 45, 50, 10, 15, 25, 40)
df <- data. Frame(State, Pov_rt)
I have created a new column
df[["pov_level"]] <- cut(
df$Pov_rt,
breaks = c(-Inf, 10, 20, 30, 40, Inf),
labels = c(
"Very Low Poverty (<10%)",
"Low Poverty (10-20%)",
"Medium Poverty (20-30%)",
"High Poverty (30-40%)",
"Very High Poverty (>40%)"
)
)
I want to, further, get a table
table(df$pov_level, df$State)
CA CO KT MS NY PA SC TX UT WI
Very Low Poverty (<10%) 0 0 0 0 0 0 0 0 1 0
Low Poverty (10-20%) 0 1 0 0 0 0 0 0 0 0
Medium Poverty (20-30%) 1 0 0 0 0 1 0 0 0 1
High Poverty (30-40%) 0 0 0 1 0 0 1 1 0 0
Very High Poverty (>40%) 0 0 1 0 1 0 0 0 0 0
If I want to convert this table as a dataframe, I go about it as
x <- data.frame(unclass(table(df$pov_level, df$State)))
instead of x <- as.data.frame(table(df$pov_level, df$State))
since this gives me a dataframe of an ungrouped list.
But using the first one has an issue where sometimes I'd need to mutate
the rownames as a new column.
I was wondering if there is a better way of doing this.
Much thanks!
CodePudding user response:
We could use as.data.frame.matrix
df2 <- as.data.frame.matrix(table(df$pov_level, df$State))
-output
> df2
CA CO KT MS NY PA SC TX UT WI
Very Low Poverty (<10%) 0 0 0 0 0 0 0 0 1 0
Low Poverty (10-20%) 0 1 0 0 0 0 0 0 0 0
Medium Poverty (20-30%) 1 0 0 0 0 1 0 0 0 1
High Poverty (30-40%) 0 0 0 1 0 0 1 1 0 0
Very High Poverty (>40%) 0 0 1 0 1 0 0 0 0 0
> str(df2)
'data.frame': 5 obs. of 10 variables:
$ CA: int 0 0 1 0 0
$ CO: int 0 1 0 0 0
$ KT: int 0 0 0 0 1
$ MS: int 0 0 0 1 0
$ NY: int 0 0 0 0 1
$ PA: int 0 0 1 0 0
$ SC: int 0 0 0 1 0
$ TX: int 0 0 0 1 0
$ UT: int 1 0 0 0 0
$ WI: int 0 0 1 0 0
Column can be created from rownames with rownames_to_column
(tibble
)
library(tibble)
library(dplyr)
df2 %>%
rownames_to_column("pov_levels")
pov_levels CA CO KT MS NY PA SC TX UT WI
1 Very Low Poverty (<10%) 0 0 0 0 0 0 0 0 1 0
2 Low Poverty (10-20%) 0 1 0 0 0 0 0 0 0 0
3 Medium Poverty (20-30%) 1 0 0 0 0 1 0 0 0 1
4 High Poverty (30-40%) 0 0 0 1 0 0 1 1 0 0
5 Very High Poverty (>40%) 0 0 1 0 1 0 0 0 0 0
Or if we want to use tidyverse
library(tidyr)
pivot_wider(df, names_from = State, values_from = Pov_rt,
values_fn = length, values_fill = 0)
# A tibble: 5 × 11
pov_level CA WI TX MS NY KT UT CO PA SC
<fct> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int>
1 Medium Poverty (20-30%) 1 1 0 0 0 0 0 0 1 0
2 High Poverty (30-40%) 0 0 1 1 0 0 0 0 0 1
3 Very High Poverty (>40%) 0 0 0 0 1 1 0 0 0 0
4 Very Low Poverty (<10%) 0 0 0 0 0 0 1 0 0 0
5 Low Poverty (10-20%) 0 0 0 0 0 0 0 1 0 0
Or more directly with tabyl
library(janitor)
df %>%
tabyl(pov_level, State)
pov_level CA CO KT MS NY PA SC TX UT WI
Very Low Poverty (<10%) 0 0 0 0 0 0 0 0 1 0
Low Poverty (10-20%) 0 1 0 0 0 0 0 0 0 0
Medium Poverty (20-30%) 1 0 0 0 0 1 0 0 0 1
High Poverty (30-40%) 0 0 0 1 0 0 1 1 0 0
Very High Poverty (>40%) 0 0 1 0 1 0 0 0 0 0
CodePudding user response:
Here is an alternative approach with model.matrix
:
library(dplyr)
df %>%
cbind(model.matrix(~ State 0, .) == 1) %>%
mutate(across(-c(1:3), ~as.integer(.)))
State Pov_rt pov_level StateCA StateCO StateKT StateMS StateNY StatePA StateSC StateTX StateUT StateWI
1 CA 25 Medium Poverty (20-30%) 1 0 0 0 0 0 0 0 0 0
2 WI 30 Medium Poverty (20-30%) 0 0 0 0 0 0 0 0 0 1
3 TX 35 High Poverty (30-40%) 0 0 0 0 0 0 0 1 0 0
4 MS 40 High Poverty (30-40%) 0 0 0 1 0 0 0 0 0 0
5 NY 45 Very High Poverty (>40%) 0 0 0 0 1 0 0 0 0 0
6 KT 50 Very High Poverty (>40%) 0 0 1 0 0 0 0 0 0 0
7 UT 10 Very Low Poverty (<10%) 0 0 0 0 0 0 0 0 1 0
8 CO 15 Low Poverty (10-20%) 0 1 0 0 0 0 0 0 0 0
9 PA 25 Medium Poverty (20-30%) 0 0 0 0 0 1 0 0 0 0
10 SC 40 High Poverty (30-40%) 0 0 0 0 0 0 1 0 0 0