How can I create a columns with the values in other column (R)?-CodePudding

I would like to create a table that assigns a column to each value of one column.

The data looks like this:


   Person       Task      

   John          4        
   Michael       1           
   Florence      3

Expected result:

  Person       Task     1     2     3      4     5    6    7    8   

   John          4      1     1     1      1     0    0    0    0  
   Michael       1      0     0     0      0     1    0    0    0     
   Florence      3      0     0     0      0     0    1    1    1

It's important that the column values are filled in an orderly manner. The first row, then the second and so on.

Thanks!

CodePudding user response：

in base R:

cbind(df, t(unname(model.matrix(~with(df, factor(rep(Person, Task), Person))-1))))

    Person Task 1 2 3 4 5 6 7 8
1     John    4 1 1 1 1 0 0 0 0
2  Michael    1 0 0 0 0 1 0 0 0
3 Florence    3 0 0 0 0 0 1 1 1

For easier code:

Create a dataframe that lookes like below:

df1 <- with(df, data.frame(lengths = Task, values = factor(Person, Person)))
df1
  lengths   values
1       4     John
2       1  Michael
3       3 Florence

Note that values is now a factor column with levels same as the values.

Then you could simply do:

cbind(df, t(unname(model.matrix(~inverse.rle(df1)-1))))
    Person Task 1 2 3 4 5 6 7 8
1     John    4 1 1 1 1 0 0 0 0
2  Michael    1 0 0 0 0 1 0 0 0
3 Florence    3 0 0 0 0 0 1 1 1

CodePudding user response：

Setting up your dataframe:

> df <- data.frame(Name=factor(c("John", "Michael", "Florence"), levels=c("John", "Michael", "Florence")), Task=c(4,1,3))
> df
      Name Task
1     John    4
2  Michael    1
3 Florence    3

First I will make a 'long' dataframe, expanding each name and task by the number of entries needed. The id will make sure that when I reshape the dataframe wide the columns have the correct names:

df2 <- data.frame(Name=rep(df$Name, df$Task),
           Task=rep(df$Task, df$Task),
           id = 1:sum(df$Task))
> df2
      Name Task id
1     John    4  1
2     John    4  2
3     John    4  3
4     John    4  4
5  Michael    1  5
6 Florence    3  6
7 Florence    3  7
8 Florence    3  8

Now I can reshape wide using the dcast function from reshape2

reshape2::dcast(df2, Name Task ~ id, fun.aggregate = length, value.var="id")
      Name Task 1 2 3 4 5 6 7 8
1     John    4 1 1 1 1 0 0 0 0
2  Michael    1 0 0 0 0 1 0 0 0
3 Florence    3 0 0 0 0 0 1 1 1

CodePudding user response：

A solution with diag(), since the added values have the shape of a diagonal matrix, just with repeated columns:

n <- length(dat$Task)
cbind( dat, matrix( unlist( apply( rbind(as.integer(dat$Task), 1:n), 2, 
  function(x) rep(diag(n)[,x[2]], x[1]) ) ), n ) )

    Person Task 1 2 3 4 5 6 7 8
1     John    4 1 1 1 1 0 0 0 0
2  Michael    1 0 0 0 0 1 0 0 0
3 Florence    3 0 0 0 0 0 1 1 1

Data:

dat <- structure(list(Person = c("John", "Michael", "Florence"), Task = c(4L,
1L, 3L)), class = "data.frame", row.names = c(NA, -3L))

CodePudding user response：

You could use

library(dplyr)
library(tidyr)

df %>% 
  uncount(Task, .remove = FALSE) %>% 
  mutate(rn = row_number(),
         value = 1) %>% 
  pivot_wider(c(Person, Task), 
              names_from = rn, 
              values_from = value,
              values_fill = 0)

This returns

# A tibble: 3 x 10
  Person    Task   `1`   `2`   `3`   `4`   `5`   `6`   `7`   `8`
  <chr>    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 John         4     1     1     1     1     0     0     0     0
2 Michael      1     0     0     0     0     1     0     0     0
3 Florence     3     0     0     0     0     0     1     1     1