Hello lovely people of SO!
Guys I have the following raw dataset
ID_TRIAL<-c(1,1,1,2,3,4,5,5,5,6,6,6,7,7,8,8,8,8)
TYPE_FAIL<-c("A","B","C","F","A","A","A","B","K","T","F","A","A","B","B","Q","P","I")
ID TRIAL | TYPE_FAIL |
---|---|
1 | A |
1 | B |
1 | C |
2 | F |
3 | A |
4 | A |
5 | A |
5 | B |
5 | K |
6 | T |
6 | F |
6 | A |
7 | A |
7 | B |
8 | B |
8 | Q |
8 | P |
8 | I |
I need to transform this dataset in such manner that I am able to create a matrix whose columns are the TYPE OF FAILS in alphabetical order and its rows are a binary representation of all unique TYPE OF FAILS
a TRIAL
had for instance
all the TYPES OF FAILS
are in alphabetical order: A B C F I K P Q T
So for TRAIL 8 the matrix row will look like this
A | B | C | F | I | K | P | Q | T |
---|---|---|---|---|---|---|---|---|
0 | 1 | 0 | 0 | 1 | 0 | 1 | 1 | 0 |
The zeros in all other cells represent that during trial 8 for example FAIL TYPE
A did not occurred and so on
my desired output would look like this:
TRIAL | A | B | C | F | I | K | P | Q | T |
---|---|---|---|---|---|---|---|---|---|
1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
2 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
3 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
4 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
5 | 1 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
6 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 |
7 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
8 | 0 | 1 | 0 | 0 | 1 | 0 | 1 | 1 | 0 |
Thank you all of you guys for helping me out I will be super attentive to read and response to all of your comments
Some of my thought-process behind my solution:
First I need to group by ID TRIAL then
I need to find a function or a routine that will look for
a letter lets say "B"
and add a number one to my matrix under the column B for the
row of the TRIAL
in case, I can do this using multiple ifelse
lines but
my real dataset is quite large and I dont know if there is a way to perform this faster so thank you so much for helping me out on this
CodePudding user response:
Here's a tidyverse solution using dplyr::count
and tidyr::pivot_wider
.
library(dplyr)
library(tidyr)
df1 <- data.frame(ID_TRIAL = c(1, 1, 1, 2, 3, 4, 5, 5 , 5, 6, 6, 6, 7, 7, 8, 8, 8, 8),
TYPE_FAIL = c("A", "B", "C", "F", "A", "A", "A", "B", "K", "T", "F",
"A", "A", "B", "B", "Q", "P", "I"))
df1 %>%
count(ID_TRIAL, TYPE_FAIL) %>%
pivot_wider(names_from = "TYPE_FAIL",
values_from = "n",
names_sort = TRUE) %>%
replace(is.na(.), 0)
Result:
# A tibble: 8 × 10
ID_TRIAL A B C F I K P Q T
<dbl> <int> <int> <int> <int> <int> <int> <int> <int> <int>
1 1 1 1 0 0 0 0 0 0
2 0 0 0 1 0 0 0 0 0
3 1 0 0 0 0 0 0 0 0
4 1 0 0 0 0 0 0 0 0
5 1 1 0 0 0 1 0 0 0
6 1 0 0 1 0 0 0 0 1
7 1 1 0 0 0 0 0 0 0
8 0 1 0 0 1 0 1 1 0
CodePudding user response:
Matrix format. Generating a matrix from the table
returned values
ID_TRIAL<-c(1,1,1,2,3,4,5,5,5,6,6,6,7,7,8,8,8,8)
TYPE_FAIL<-c("A","B","C","F","A","A","A","B","K","T","F","A","A","B","B","Q","P","I")
df <- data.frame(ID_TRIAL = ID_TRIAL, TYPE_FAIL = TYPE_FAIL)
mat <- table(df) |> matrix(nrow = 8, dimnames = list(unique(df$ID_TRIAL),
sort(unique(df$TYPE_FAIL))))
A B C F I K P Q T
1 1 1 1 0 0 0 0 0 0
2 0 0 0 1 0 0 0 0 0
3 1 0 0 0 0 0 0 0 0
4 1 0 0 0 0 0 0 0 0
5 1 1 0 0 0 1 0 0 0
6 1 0 0 1 0 0 0 0 1
7 1 1 0 0 0 0 0 0 0
8 0 1 0 0 1 0 1 1 0
I thought you meant a literal matrix.
If you meant data.frame you can do. Using the table
function to generate some values we can use to pivot wider
data.frame(table(df)) |>
pivot_wider(id_cols = ID_TRIAL, names_from = TYPE_FAIL, values_from = Freq)
ID_TRIAL A B C F I K P Q T
<fct> <int> <int> <int> <int> <int> <int> <int> <int> <int>
1 1 1 1 1 0 0 0 0 0 0
2 2 0 0 0 1 0 0 0 0 0
3 3 1 0 0 0 0 0 0 0 0
4 4 1 0 0 0 0 0 0 0 0
5 5 1 1 0 0 0 1 0 0 0
6 6 1 0 0 1 0 0 0 0 1
7 7 1 1 0 0 0 0 0 0 0
8 8 0 1 0 0 1 0 1 1 0