Converting unique values from data frame intro a reference Matrix-CodePudding

Hello lovely people of SO!

Guys I have the following raw dataset

ID_TRIAL<-c(1,1,1,2,3,4,5,5,5,6,6,6,7,7,8,8,8,8)
TYPE_FAIL<-c("A","B","C","F","A","A","A","B","K","T","F","A","A","B","B","Q","P","I")

ID TRIAL	TYPE_FAIL
1	A
1	B
1	C
2	F
3	A
4	A
5	A
5	B
5	K
6	T
6	F
6	A
7	A
7	B
8	B
8	Q
8	P
8	I

I need to transform this dataset in such manner that I am able to create a matrix whose columns are the TYPE OF FAILS in alphabetical order and its rows are a binary representation of all unique TYPE OF FAILS a TRIAL had for instance

all the TYPES OF FAILS are in alphabetical order: A B C F I K P Q T

So for TRAIL 8 the matrix row will look like this

A	B	C	F	I	K	P	Q	T
0	1	0	0	1	0	1	1	0

The zeros in all other cells represent that during trial 8 for example FAIL TYPE A did not occurred and so on

my desired output would look like this:

TRIAL	A	B	C	F	I	K	P	Q	T
1	1	1	1	0	0	0	0	0	0
2	0	0	0	1	0	0	0	0	0
3	1	0	0	0	0	0	0	0	0
4	1	0	0	0	0	0	0	0	0
5	1	1	0	0	0	1	0	0	0
6	1	0	0	1	0	0	0	0	1
7	1	1	0	0	0	0	0	0	0
8	0	1	0	0	1	0	1	1	0

Thank you all of you guys for helping me out I will be super attentive to read and response to all of your comments

Some of my thought-process behind my solution: First I need to group by ID TRIAL then I need to find a function or a routine that will look for a letter lets say "B" and add a number one to my matrix under the column B for the row of the TRIAL in case, I can do this using multiple ifelse lines but my real dataset is quite large and I dont know if there is a way to perform this faster so thank you so much for helping me out on this

CodePudding user response：

Here's a tidyverse solution using dplyr::count and tidyr::pivot_wider.

library(dplyr)
library(tidyr)

df1 <- data.frame(ID_TRIAL = c(1, 1, 1, 2, 3, 4, 5, 5 , 5, 6, 6, 6, 7, 7, 8, 8, 8, 8),  
                  TYPE_FAIL = c("A", "B", "C", "F", "A", "A", "A", "B", "K", "T", "F", 
                                "A", "A", "B", "B", "Q", "P", "I"))

df1 %>% 
  count(ID_TRIAL, TYPE_FAIL) %>% 
  pivot_wider(names_from = "TYPE_FAIL", 
              values_from = "n", 
              names_sort = TRUE) %>% 
  replace(is.na(.), 0)

Result:

# A tibble: 8 × 10
  ID_TRIAL     A     B     C     F     I     K     P     Q     T
     <dbl> <int> <int> <int> <int> <int> <int> <int> <int> <int>
         1     1     1     1     0     0     0     0     0     0
         2     0     0     0     1     0     0     0     0     0
         3     1     0     0     0     0     0     0     0     0
         4     1     0     0     0     0     0     0     0     0
         5     1     1     0     0     0     1     0     0     0
         6     1     0     0     1     0     0     0     0     1
         7     1     1     0     0     0     0     0     0     0
         8     0     1     0     0     1     0     1     1     0

CodePudding user response：

Matrix format. Generating a matrix from the table returned values


ID_TRIAL<-c(1,1,1,2,3,4,5,5,5,6,6,6,7,7,8,8,8,8)  
TYPE_FAIL<-c("A","B","C","F","A","A","A","B","K","T","F","A","A","B","B","Q","P","I")

df <- data.frame(ID_TRIAL = ID_TRIAL, TYPE_FAIL = TYPE_FAIL)

mat <- table(df) |> matrix(nrow = 8, dimnames = list(unique(df$ID_TRIAL),
sort(unique(df$TYPE_FAIL))))

  A B C F I K P Q T
1 1 1 1 0 0 0 0 0 0
2 0 0 0 1 0 0 0 0 0
3 1 0 0 0 0 0 0 0 0
4 1 0 0 0 0 0 0 0 0
5 1 1 0 0 0 1 0 0 0
6 1 0 0 1 0 0 0 0 1
7 1 1 0 0 0 0 0 0 0
8 0 1 0 0 1 0 1 1 0

I thought you meant a literal matrix.

If you meant data.frame you can do. Using the table function to generate some values we can use to pivot wider

data.frame(table(df)) |> 
  pivot_wider(id_cols = ID_TRIAL, names_from = TYPE_FAIL, values_from = Freq)

  ID_TRIAL     A     B     C     F     I     K     P     Q     T
  <fct>    <int> <int> <int> <int> <int> <int> <int> <int> <int>
1 1            1     1     1     0     0     0     0     0     0
2 2            0     0     0     1     0     0     0     0     0
3 3            1     0     0     0     0     0     0     0     0
4 4            1     0     0     0     0     0     0     0     0
5 5            1     1     0     0     0     1     0     0     0
6 6            1     0     0     1     0     0     0     0     1
7 7            1     1     0     0     0     0     0     0     0
8 8            0     1     0     0     1     0     1     1     0

TRIAL	A	B	C	F	I	K	P	Q	T
1	1	1	1	0	0	0	0	0	0
2	0	0	0	1	0	0	0	0	0
3	1	0	0	0	0	0	0	0	0
4	1	0	0	0	0	0	0	0	0
5	1	1	0	0	0	1	0	0	0
6	1	0	0	1	0	0	0	0	1
7	1	1	0	0	0	0	0	0	0
8	0	1	0	0	1	0	1	1	0

TRIAL	A	B	C	F	I	K	P	Q	T
1	1	1	1	0	0	0	0	0	0
2	0	0	0	1	0	0	0	0	0
3	1	0	0	0	0	0	0	0	0
4	1	0	0	0	0	0	0	0	0
5	1	1	0	0	0	1	0	0	0
6	1	0	0	1	0	0	0	0	1
7	1	1	0	0	0	0	0	0	0
8	0	1	0	0	1	0	1	1	0

TRIAL	A	B	C	F	I	K	P	Q	T
1	1	1	1	0	0	0	0	0	0
2	0	0	0	1	0	0	0	0	0
3	1	0	0	0	0	0	0	0	0
4	1	0	0	0	0	0	0	0	0
5	1	1	0	0	0	1	0	0	0
6	1	0	0	1	0	0	0	0	1
7	1	1	0	0	0	0	0	0	0
8	0	1	0	0	1	0	1	1	0