(R) Table where each column represents whether value exists in a particular vector

Time:12-16

The title might be a bit confusing but allow me to explain.

Say we have vectors marked as such:

v1 <- c("a", "x", "y", "z")
v2 <- c("b", "g", "m", "r", "s", "x", "z")
v3 <- c("a", "m", "x", "y", "z", "b", "r", "g")
v4 <- c("d", "h", "a", "g", "s", "x")

I want to create a data table in R such that it represents whether each possible value in the set of vectors is present in the respective vector. Preferably, I would like this done without having to iterate through the list of vectors more than once.

Desired output:

ID v1 v2 v3 v4
a  1  0  1  1
x  1  1  1  1
y  1  0  1  0
z  1  1  1  0
b  0  1  1  0
g  0  1  1  1
m  0  1  1  0
r  0  1  1  0
s  0  1  0  1
d  0  0  0  1
h  0  0  0  1

Sorry if the explanation is a bit weird, I don't really know how to explain it in words. But hopefully the example code and desired output explain it clearly enough.

Thank you!


What I already tried:

I have a basic idea of how this could be done with two passes over the list of vectors. The first pass would go through the list once to collect the full set of unique IDs. The second pass would go through each vector and add a 1 or 0 to the data table cell by cell, depending on whether that row's ID exists in the vector.

But this process would take an absurd amount of time if there are many vectors to parse, if the vectors are more diverse, or if the vectors are simply larger, because even with only two passes I'd be checking each value in the list of possible IDs individually.
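For reference, here is a minimal sketch of the two-pass idea described above. It assumes the vectors are gathered in a named list; the names vecs, ids and res are made up for illustration.

```r
# Hypothetical container for the example vectors (the name `vecs` is an assumption)
vecs <- list(
  v1 = c("a", "x", "y", "z"),
  v2 = c("b", "g", "m", "r", "s", "x", "z"),
  v3 = c("a", "m", "x", "y", "z", "b", "r", "g"),
  v4 = c("d", "h", "a", "g", "s", "x")
)

# Pass 1: collect every distinct ID, in order of first appearance
ids <- unique(unlist(vecs, use.names = FALSE))

# Pass 2: start from an all-zero table and fill it one cell at a time
res <- matrix(0L, nrow = length(ids), ncol = length(vecs),
              dimnames = list(ids, names(vecs)))
for (j in seq_along(vecs)) {
  for (i in seq_along(ids)) {
    if (ids[i] %in% vecs[[j]]) res[i, j] <- 1L
  }
}

res
```

The inner loop tests one ID at a time against each vector, which is the cell-by-cell work I would like to avoid.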

There must be some easier way to do this.

CodePudding user response:

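v1 <- c("a", "x", "y", "z")
v2 <- c("b", "g", "m", "r", "s", "x", "z")
v3 <- c("a", "m", "x", "y", "z", "b", "r", "g")
v4 <- c("d", "h", "a", "g", "s", "x")

# Collect the vectors into a list by looking each one up by name
nms <- paste0("v", 1:4)
list_to_do <- lapply(nms, FUN = get)

# All distinct values, in order of first appearance
# (the outer parentheses print the result as well as assigning it)
(unqs <- unique(unlist(list_to_do)))

# One column per vector: 1 if the value occurs in that vector, 0 otherwise
mymat <- sapply(list_to_do, function(x) as.integer(unqs %in% x))

# Label rows with the values and columns with the vector names
rownames(mymat) <- unqs
colnames(mymat) <- nms

mymat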
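
A possible alternative, shown only as a sketch and not as part of the answer above: base R's stack() and table() can build the same kind of presence table from a named list. The names vecs, long, ids, counts and presence are made up for illustration, and the example assumes the same four vectors.

```r
# Hypothetical named list holding the example vectors
vecs <- list(
  v1 = c("a", "x", "y", "z"),
  v2 = c("b", "g", "m", "r", "s", "x", "z"),
  v3 = c("a", "m", "x", "y", "z", "b", "r", "g"),
  v4 = c("d", "h", "a", "g", "s", "x")
)

# Long format: one row per (value, vector name) pair
long <- stack(vecs)

# Fix the row order to order of first appearance instead of alphabetical
ids <- factor(long$values, levels = unique(long$values))

# Cross-tabulate values against vector names
counts <- table(ids, long$ind)

# Turn counts into 0/1 flags, so a value repeated within a vector still gives 1
presence <- unclass(counts) > 0
storage.mode(presence) <- "integer"

presence
```

Whether this is faster than the sapply() version for large inputs has not been measured here; it mainly avoids the explicit per-vector function call.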