I would like to create a table that assigns a column to each value of one column.
The data looks like this:
Person Task
John 4
Michael 1
Florence 3
Expected result:
Person Task 1 2 3 4 5 6 7 8
John 4 1 1 1 1 0 0 0 0
Michael 1 0 0 0 0 1 0 0 0
Florence 3 0 0 0 0 0 1 1 1
It's important that the column values are filled in an orderly manner. The first row, then the second and so on.
Thanks!
CodePudding user response:
in base R:
cbind(df, t(unname(model.matrix(~with(df, factor(rep(Person, Task), Person))-1))))
Person Task 1 2 3 4 5 6 7 8
1 John 4 1 1 1 1 0 0 0 0
2 Michael 1 0 0 0 0 1 0 0 0
3 Florence 3 0 0 0 0 0 1 1 1
For easier code:
Create a dataframe that lookes like below:
df1 <- with(df, data.frame(lengths = Task, values = factor(Person, Person)))
df1
lengths values
1 4 John
2 1 Michael
3 3 Florence
Note that values is now a factor column with levels same as the values.
Then you could simply do:
cbind(df, t(unname(model.matrix(~inverse.rle(df1)-1))))
Person Task 1 2 3 4 5 6 7 8
1 John 4 1 1 1 1 0 0 0 0
2 Michael 1 0 0 0 0 1 0 0 0
3 Florence 3 0 0 0 0 0 1 1 1
CodePudding user response:
Setting up your dataframe:
> df <- data.frame(Name=factor(c("John", "Michael", "Florence"), levels=c("John", "Michael", "Florence")), Task=c(4,1,3))
> df
Name Task
1 John 4
2 Michael 1
3 Florence 3
First I will make a 'long' dataframe, expanding each name and task by the number of entries needed. The id will make sure that when I reshape the dataframe wide the columns have the correct names:
df2 <- data.frame(Name=rep(df$Name, df$Task),
Task=rep(df$Task, df$Task),
id = 1:sum(df$Task))
> df2
Name Task id
1 John 4 1
2 John 4 2
3 John 4 3
4 John 4 4
5 Michael 1 5
6 Florence 3 6
7 Florence 3 7
8 Florence 3 8
Now I can reshape wide using the dcast
function from reshape2
reshape2::dcast(df2, Name Task ~ id, fun.aggregate = length, value.var="id")
Name Task 1 2 3 4 5 6 7 8
1 John 4 1 1 1 1 0 0 0 0
2 Michael 1 0 0 0 0 1 0 0 0
3 Florence 3 0 0 0 0 0 1 1 1
CodePudding user response:
A solution with diag()
, since the added values have the shape of a diagonal matrix, just with repeated columns:
n <- length(dat$Task)
cbind( dat, matrix( unlist( apply( rbind(as.integer(dat$Task), 1:n), 2,
function(x) rep(diag(n)[,x[2]], x[1]) ) ), n ) )
Person Task 1 2 3 4 5 6 7 8
1 John 4 1 1 1 1 0 0 0 0
2 Michael 1 0 0 0 0 1 0 0 0
3 Florence 3 0 0 0 0 0 1 1 1
Data:
dat <- structure(list(Person = c("John", "Michael", "Florence"), Task = c(4L,
1L, 3L)), class = "data.frame", row.names = c(NA, -3L))
CodePudding user response:
You could use
library(dplyr)
library(tidyr)
df %>%
uncount(Task, .remove = FALSE) %>%
mutate(rn = row_number(),
value = 1) %>%
pivot_wider(c(Person, Task),
names_from = rn,
values_from = value,
values_fill = 0)
This returns
# A tibble: 3 x 10
Person Task `1` `2` `3` `4` `5` `6` `7` `8`
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 John 4 1 1 1 1 0 0 0 0
2 Michael 1 0 0 0 0 1 0 0 0
3 Florence 3 0 0 0 0 0 1 1 1