I have the following table in R:
What I want to do is group some of the educational attainment values together:
1) group 102-111 --> less than 9th grade
2) group 113-116 --> 9-12th grade no hs degree
3) 201 --> high school diploma
4) 301 --> some college no degree
5) 302-303 --> associate degree
6) 400 --> bachelor degree
7) 501 --> master degree
8) 502 --> professional degree
9) 503 --> doctorate degree (PhD)
How would I go about doing this? Thanks
The dput R output:
structure(c(99, 500, 31, 44, 64, 68, 100, 312, 147, 405, 444,
514, 692, 624, 7055, 986, 6260, 2235, 1761, 6732, 3212, 439,
581, 33305, 39, 207, 10, 21, 28, 18, 33, 120, 51, 178, 211, 267,
320, 214, 2088, 487, 2071, 636, 477, 1213, 493, 71, 76, 9329,
65, 402, 14, 28, 50, 27, 45, 151, 79, 209, 316, 367, 437, 354,
4340, 748, 4186, 1440, 1155, 3824, 1671, 253, 303, 20464, 203,
1109, 55, 93, 142, 113, 178, 583, 277, 792, 971, 1148, 1449,
1192, 13483, 2221, 12517, 4311, 3393, 11769, 5376, 763, 960,
63098), .Dim = c(24L, 4L), .Dimnames = list(EDUC = c("102", "103",
"104", "105", "106", "107", "108", "109", "110", "111", "113",
"114", "115", "116", "201", "202", "301", "302", "303", "400",
"501", "502", "503", "Sum"), DEPFEELEVL = c("1", "2", "3", "Sum"
)), class = c("table", "matrix", "array"))
Want to group similar elements and not just rename them:
DEPFEELEVL
EDUC 1 2 3 Sum
less than 9th grade 99 39 65 203
less than 9th grade 500 207 402 1109
less than 9th grade 31 10 14 55
less than 9th grade 44 21 28 93
less than 9th grade 64 28 50 142
less than 9th grade 68 18 27 113
less than 9th grade 100 33 45 178
less than 9th grade 312 120 151 583
less than 9th grade 147 51 79 277
less than 9th grade 405 178 209 792
9-12th grade no hs degree 444 211 316 971
9-12th grade no hs degree 514 267 367 1148
9-12th grade no hs degree 692 320 437 1449
9-12th grade no hs degree 624 214 354 1192
high school diploma 7055 2088 4340 13483
doctorate degree (PhD) 986 487 748 2221
some college no degree 6260 2071 4186 12517
doctorate degree (PhD) 2235 636 1440 4311
doctorate degree (PhD) 1761 477 1155 3393
bachelor degree 6732 1213 3824 11769
master degree 3212 493 1671 5376
professional degree 439 71 253 763
doctorate degree (PhD) 581 76 303 960
Sum 33305 9329 20464 63098
So for example, let's consider the respondents with the education of 9-12th grade, and for simplicity the depression level 1.
The table should display
444 514 692 624 = 2274
for that particular cell I used as an example
CodePudding user response:
Set groups with breaks defining intervals open on left and closed on right (a, b].
groups <-
c('na' = 0,
'less than 9th' = 111,
'no hs degree' = 116,
'hs diploma' = 201,
'some college' = 301,
'associate' = 303,
'bachelor' = 400,
'master' = 501,
'professional' = 502,
'PhD' = 503)
Starting with the provided data as tbl
. Convert to tibble (data.frame), then define grouping variable using groups above, and summarise.
library(dplyr, warn.conflicts = FALSE)
library(tidyr)
tbl %>%
as_tibble() %>%
pivot_wider(EDUC, names_from = DEPFEELEVL, values_from = n,
names_prefix = 'l') %>%
mutate(EDUC = as.numeric(EDUC)) %>%
group_by(educ =
names(groups)[1 cut(as.numeric(EDUC), groups, labels = FALSE)]
) %>%
summarise(across(everything(), sum))
#> Warning in mask$eval_all_mutate(quo): NAs introduced by coercion
#> # A tibble: 10 × 6
#> educ EDUC l1 l2 l3 lSum
#> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 associate 605 3996 1113 2595 7704
#> 2 bachelor 400 6732 1213 3824 11769
#> 3 hs diploma 201 7055 2088 4340 13483
#> 4 less than 9th 1065 1770 705 1070 3545
#> 5 master 501 3212 493 1671 5376
#> 6 no hs degree 458 2274 1012 1474 4760
#> 7 PhD 503 581 76 303 960
#> 8 professional 502 439 71 253 763
#> 9 some college 503 7246 2558 4934 14738
#> 10 <NA> NA 33305 9329 20464 63098
Created on 2021-12-10 by the reprex package (v2.0.1)
CodePudding user response:
Here's a possible idea using base R.
x <- as.numeric(rownames(tbl))
x <-
cut(x, breaks = c(0, 111, 116, 201, 301, 303, 400, 501, 502, 503),
labels = c("less than 9th grade",
"9-12th grade no hs degree",
"high school diploma",
"some college no degree",
"associate degree",
"bachelor degree",
"master degree",
"professional degree",
"doctorate degree (PhD)"))
x <- as.character(x)
x[is.na(x)] <- "Sum"
rownames(tbl) <- x
df <- aggregate(Freq ~ EDUC DEPFEELEVL, data = tbl, FUN = sum)
df <- reshape(data = df, idvar = "EDUC", timevar = "DEPFEELEVL", direction = "wide")
names(df) <- c("EDUC", "DEPFEELEVL_1", "DEPFEELEVL_2", "DEPFEELEVL_3", "DEPFEELEVL_Sum")
output would look like this:
> df
EDUC DEPFEELEVL_1 DEPFEELEVL_2 DEPFEELEVL_3 DEPFEELEVL_Sum
1 less than 9th grade 1770 705 1070 3545
2 9-12th grade no hs degree 2274 1012 1474 4760
3 high school diploma 7055 2088 4340 13483
4 some college no degree 7246 2558 4934 14738
5 associate degree 3996 1113 2595 7704
6 bachelor degree 6732 1213 3824 11769
7 master degree 3212 493 1671 5376
8 professional degree 439 71 253 763
9 doctorate degree (PhD) 581 76 303 960
10 Sum 33305 9329 20464 63098
```