Home > Software design >  Grouping Column Values in R (table function)
Grouping Column Values in R (table function)

Time:12-11

I have the following table in R:

table image

What I want to do is group some of the educational attainment values together:

1) group 102-111 --> less than 9th grade
2) group 113-116 --> 9-12th grade no hs degree
3) 201 --> high school diploma
4) 301 --> some college no degree
5) 302-303 --> associate degree
6) 400 --> bachelor degree
7) 501 --> master degree
8) 502 --> professional degree
9) 503 --> doctorate degree (PhD)

How would I go about doing this? Thanks

The dput R output:

structure(c(99, 500, 31, 44, 64, 68, 100, 312, 147, 405, 444, 
514, 692, 624, 7055, 986, 6260, 2235, 1761, 6732, 3212, 439, 
581, 33305, 39, 207, 10, 21, 28, 18, 33, 120, 51, 178, 211, 267, 
320, 214, 2088, 487, 2071, 636, 477, 1213, 493, 71, 76, 9329, 
65, 402, 14, 28, 50, 27, 45, 151, 79, 209, 316, 367, 437, 354, 
4340, 748, 4186, 1440, 1155, 3824, 1671, 253, 303, 20464, 203, 
1109, 55, 93, 142, 113, 178, 583, 277, 792, 971, 1148, 1449, 
1192, 13483, 2221, 12517, 4311, 3393, 11769, 5376, 763, 960, 
63098), .Dim = c(24L, 4L), .Dimnames = list(EDUC = c("102", "103", 
"104", "105", "106", "107", "108", "109", "110", "111", "113", 
"114", "115", "116", "201", "202", "301", "302", "303", "400", 
"501", "502", "503", "Sum"), DEPFEELEVL = c("1", "2", "3", "Sum"
)), class = c("table", "matrix", "array"))

Want to group similar elements and not just rename them:

                           DEPFEELEVL
EDUC                            1     2     3   Sum
  less than 9th grade          99    39    65   203
  less than 9th grade         500   207   402  1109
  less than 9th grade          31    10    14    55
  less than 9th grade          44    21    28    93
  less than 9th grade          64    28    50   142
  less than 9th grade          68    18    27   113
  less than 9th grade         100    33    45   178
  less than 9th grade         312   120   151   583
  less than 9th grade         147    51    79   277
  less than 9th grade         405   178   209   792
  9-12th grade no hs degree   444   211   316   971
  9-12th grade no hs degree   514   267   367  1148
  9-12th grade no hs degree   692   320   437  1449
  9-12th grade no hs degree   624   214   354  1192
  high school diploma        7055  2088  4340 13483
  doctorate degree (PhD)      986   487   748  2221
  some college no degree     6260  2071  4186 12517
  doctorate degree (PhD)     2235   636  1440  4311
  doctorate degree (PhD)     1761   477  1155  3393
  bachelor degree            6732  1213  3824 11769
  master degree              3212   493  1671  5376
  professional degree         439    71   253   763
  doctorate degree (PhD)      581    76   303   960
  Sum                       33305  9329 20464 63098

So for example, let's consider the respondents with the education of 9-12th grade, and for simplicity the depression level 1.

The table should display

444 514 692 624 = 2274

for that particular cell I used as an example

CodePudding user response:

Set groups with breaks defining intervals open on left and closed on right (a, b].

groups <- 
  c('na' = 0, 
    'less than 9th' = 111, 
    'no hs degree' = 116, 
    'hs diploma' = 201, 
    'some college' = 301, 
    'associate' = 303, 
    'bachelor' = 400,
    'master' = 501,
    'professional' = 502,
    'PhD' = 503)
    

Starting with the provided data as tbl. Convert to tibble (data.frame), then define grouping variable using groups above, and summarise.

library(dplyr, warn.conflicts = FALSE)
library(tidyr)


tbl %>% 
  as_tibble() %>% 
  pivot_wider(EDUC, names_from = DEPFEELEVL, values_from = n, 
              names_prefix = 'l') %>% 
  mutate(EDUC = as.numeric(EDUC)) %>% 
  group_by(educ =  
    names(groups)[1   cut(as.numeric(EDUC), groups, labels = FALSE)]
  ) %>% 
  summarise(across(everything(), sum))
#> Warning in mask$eval_all_mutate(quo): NAs introduced by coercion
#> # A tibble: 10 × 6
#>    educ           EDUC    l1    l2    l3  lSum
#>    <chr>         <dbl> <dbl> <dbl> <dbl> <dbl>
#>  1 associate       605  3996  1113  2595  7704
#>  2 bachelor        400  6732  1213  3824 11769
#>  3 hs diploma      201  7055  2088  4340 13483
#>  4 less than 9th  1065  1770   705  1070  3545
#>  5 master          501  3212   493  1671  5376
#>  6 no hs degree    458  2274  1012  1474  4760
#>  7 PhD             503   581    76   303   960
#>  8 professional    502   439    71   253   763
#>  9 some college    503  7246  2558  4934 14738
#> 10 <NA>             NA 33305  9329 20464 63098

Created on 2021-12-10 by the reprex package (v2.0.1)

CodePudding user response:

Here's a possible idea using base R.

x <- as.numeric(rownames(tbl))
x <- 
  cut(x, breaks = c(0, 111, 116, 201, 301, 303, 400, 501, 502, 503),
         labels = c("less than 9th grade",
                    "9-12th grade no hs degree",
                    "high school diploma",
                    "some college no degree",
                    "associate degree",
                    "bachelor degree",
                    "master degree",
                    "professional degree",
                    "doctorate degree (PhD)"))
x <- as.character(x)
x[is.na(x)] <- "Sum"
rownames(tbl) <- x
df <- aggregate(Freq ~ EDUC   DEPFEELEVL, data = tbl, FUN = sum)
df <- reshape(data = df, idvar = "EDUC", timevar = "DEPFEELEVL", direction = "wide")
names(df) <- c("EDUC", "DEPFEELEVL_1", "DEPFEELEVL_2", "DEPFEELEVL_3", "DEPFEELEVL_Sum")

output would look like this:

> df
                        EDUC DEPFEELEVL_1 DEPFEELEVL_2 DEPFEELEVL_3 DEPFEELEVL_Sum
1        less than 9th grade         1770          705         1070           3545
2  9-12th grade no hs degree         2274         1012         1474           4760
3        high school diploma         7055         2088         4340          13483
4     some college no degree         7246         2558         4934          14738
5           associate degree         3996         1113         2595           7704
6            bachelor degree         6732         1213         3824          11769
7              master degree         3212          493         1671           5376
8        professional degree          439           71          253            763
9     doctorate degree (PhD)          581           76          303            960
10                       Sum        33305         9329        20464          63098
```
  •  Tags:  
  • r
  • Related