Home > Blockchain >  How do I transpose observations into variables/columns?
How do I transpose observations into variables/columns?

Time:11-02

I've got a data frame that looks something like this:

precinct, race, age, people
1001, black, 18-40, 1
1001, white, 18-40, 2
1001, hispanic, 18-40, 3
1001, asian, 18-40, 4
1001, black, 40 or older, 5
1001, white, 40 or older, 6
1001, hispanic, 40 or older, 7
1001, asian, 40 or older, 8

I want to make it look like this:

precinct, black, white, hispanic, asian, 18-40, 40 or older
1001, 6, 8, 10, 12, 10, 26

I've used dcast

dcast(
  data = mydataframe,
  formula = Precinct ~ race   age,
  fun.aggregate = sum,
  value.var = 'people'
)

but this does not produce my desired result.

CodePudding user response:

When we create formula with on the rhs of ~ it creates the combinations between those columns instead of having every single unique element from those columns. In order to have the latter, we may need to melt to long format and then use dcast on the single column (assuming those columns are of the same type)

library(data.table)
dcast(melt(setDT(mydataframe), id.var = c('precinct', 'people')),
    precinct ~ value, fun.aggregate = sum, value.var = 'people')

-output

Key: <precinct>
   precinct 18-40 40 or older asian black hispanic white
      <int> <int>       <int> <int> <int>    <int> <int>
1:     1001    10          26    12     6       10     8

library(dplyr)
library(tidyr)
mydataframe %>% 
  pivot_longer(cols = c(race, age), names_to = NULL) %>% 
  pivot_wider(names_from = value, values_from = people, values_fn = sum)

-output

# A tibble: 1 × 7
  precinct black `18-40` white hispanic asian `40 or older`
     <int> <int>   <int> <int>    <int> <int>         <int>
1     1001     6      10     8       10    12            26

data

mydataframe <- structure(list(precinct = c(1001L, 1001L, 1001L, 1001L, 1001L, 
1001L, 1001L, 1001L), race = c("black", "white", "hispanic", 
"asian", "black", "white", "hispanic", "asian"), age = c("18-40", 
"18-40", "18-40", "18-40", "40 or older", "40 or older", "40 or older", 
"40 or older"), people = 1:8), row.names = c(NA, -8L), 
class = "data.frame")
  • Related