Compare dataframes and vectors in R

Time:09-28

I have a huge data frame in R and I want to compare the first four columns with a vector. This is my data frame:

> head(MyTable)
  config.1.         config.2.         config.3.       config.4.                              kernel orden.1. orden.2. orden.3. orden.4.
1    gen(1) 2*gen(2) 2*gen(1) 3*gen(2) 2*gen(1) 2*gen(2) gen(1)     gen(4)-2*gen(3) 2*gen(2)-gen(1)        1        2        3        4
2    gen(1) 4*gen(2) 2*gen(1) 3*gen(2) 2*gen(1) 2*gen(2) gen(1)     gen(4) 2*gen(3)-2*gen(2)-gen(1)        1        3        2        4
3    gen(1) 6*gen(2) 2*gen(1) 3*gen(2) 2*gen(1) 2*gen(2) gen(1) 3*gen(4) 2*gen(3)-2*gen(2)-3*gen(1)        3        1        4        2
4    gen(1) 7*gen(2) 2*gen(1) 3*gen(2) 2*gen(1) 2*gen(2) gen(1)     2*gen(4) gen(3)-gen(2)-2*gen(1)        3        1        4        2
5    gen(1) 8*gen(2) 2*gen(1) 3*gen(2) 2*gen(1) 2*gen(2) gen(1) 5*gen(4) 2*gen(3)-2*gen(2)-5*gen(1)        3        1        4        2
6    gen(1) 9*gen(2) 2*gen(1) 3*gen(2) 2*gen(1) 2*gen(2) gen(1)     3*gen(4) gen(3)-gen(2)-3*gen(1)        3        1        4        2

The first columns are character strings. I want to compare the first four columns of every row with this vector:

> MyConfig
[1] "gen(1)"            "gen(2)"            "3*gen(2) 2*gen(1)" "2*gen(2) gen(1)"  

In every row, three of those four columns are equal to the corresponding elements of the vector and one is different. I need a new data frame of TRUE/FALSE values showing which cells match.

I tried comparing the subset MyTable[,1:4] with the vector MyConfig using ==, and I get:

> head(MyTable[,1:4]==MyConfig)
     config.1. config.2. config.3. config.4.
[1,]      TRUE     FALSE     FALSE     FALSE
[2,]     FALSE     FALSE     FALSE     FALSE
[3,]     FALSE     FALSE      TRUE     FALSE
[4,]     FALSE     FALSE     FALSE      TRUE
[5,]      TRUE     FALSE     FALSE     FALSE
[6,]     FALSE     FALSE     FALSE     FALSE

But if I compare individual cells manually, this is the result:

> MyTable[1,3]==MyConfig[3]
[1] TRUE

So cell by cell I get the result I'm expecting, but comparing the whole table gives a different result.

The output I'm expecting is:

> head(MyTable[,1:4]==MyConfig)
     config.1. config.2. config.3. config.4.
[1,]      TRUE     FALSE      TRUE      TRUE
[2,]      TRUE     FALSE      TRUE      TRUE
[3,]      TRUE     FALSE      TRUE      TRUE
[4,]      TRUE     FALSE      TRUE      TRUE
[5,]      TRUE     FALSE      TRUE      TRUE
[6,]      TRUE     FALSE      TRUE      TRUE
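A minimal reproducible version of the problem, built on a hypothetical toy data frame (toy, 8 rows) standing in for MyTable[,1:4]: columns 1, 3 and 4 always equal the matching MyConfig element and column 2 never does. Because the row count is a multiple of 4, the naive comparison puts its TRUE values in the same cells as the first six rows of the == output shown earlier. The compared_with matrix is a hypothetical helper added only to show which MyConfig element each cell is actually tested against.

```r
# Hypothetical stand-in for MyTable[, 1:4]: columns 1, 3 and 4 always match
# MyConfig; column 2 holds "2*gen(2)", "3*gen(2)", ... and never matches.
MyConfig <- c("gen(1)", "gen(2)", "3*gen(2) 2*gen(1)", "2*gen(2) gen(1)")
n <- 8
toy <- data.frame(
  config.1. = rep(MyConfig[1], n),
  config.2. = paste0(seq_len(n) + 1, "*gen(2)"),
  config.3. = rep(MyConfig[3], n),
  config.4. = rep(MyConfig[4], n),
  stringsAsFactors = FALSE
)

# Naive comparison: the length-4 vector is recycled over the n * 4 cells
# in column-major order, not matched column by column.
naive <- toy == MyConfig
print(naive)

# Hypothetical helper: the MyConfig element each cell was really compared with.
compared_with <- matrix(rep_len(MyConfig, n * 4), nrow = n)
print(compared_with)
```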

Answer:

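The == operator recycles MyConfig over the cells of MyTable[,1:4] in column-major order (down config.1., then down config.2., and so on). So in config.1. row 1 is compared with MyConfig[1], row 2 with MyConfig[2], row 5 with MyConfig[1] again, instead of every cell of column j being compared with MyConfig[j]. To get the column-wise comparison, we can either replicate the MyConfig values so that each one is repeated for every row of its column: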

MyTable[,1:4]==MyConfig[col(MyTable[, 1:4])]
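A small sketch of why this works, assuming a hypothetical three-row data frame sub in place of MyTable[, 1:4]: col(sub) is an integer matrix whose cells hold their own column number, so indexing MyConfig with it yields MyConfig[j] for every cell of column j. The rep(MyConfig, each = nrow(sub)) line builds the same column-major vector and is included only as an equivalent alternative.

```r
MyConfig <- c("gen(1)", "gen(2)", "3*gen(2) 2*gen(1)", "2*gen(2) gen(1)")

# Hypothetical 3-row stand-in for MyTable[, 1:4]
sub <- data.frame(
  config.1. = rep(MyConfig[1], 3),
  config.2. = c("2*gen(2)", "4*gen(2)", "6*gen(2)"),
  config.3. = rep(MyConfig[3], 3),
  config.4. = rep(MyConfig[4], 3),
  stringsAsFactors = FALSE
)

# col(sub) is a 3 x 4 matrix of column indices (1 1 1, 2 2 2, ...), so
# MyConfig[col(sub)] repeats MyConfig[j] once per row of column j
idx <- col(sub)
res_col <- sub == MyConfig[idx]

# Equivalent: repeat each MyConfig element nrow(sub) times
res_rep <- sub == rep(MyConfig, each = nrow(sub))

print(res_col)
```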

or transpose the data, do the comparison (MyConfig is then recycled along each original row) and transpose the result back:

t(t(MyTable[1:4]) == MyConfig)
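A sketch of the transposed version on the same kind of hypothetical three-row data frame sub: t() first turns the data frame into a 4 x n character matrix, so the length-4 MyConfig is recycled down each of its columns, i.e. along one original row at a time. The mapply() line is an alternative that is not part of the answer above: it compares column j with MyConfig[j] directly, without transposing. rowSums() then counts matching columns per row, which should be 3 for every row of the data described in the question.

```r
MyConfig <- c("gen(1)", "gen(2)", "3*gen(2) 2*gen(1)", "2*gen(2) gen(1)")

# Hypothetical 3-row stand-in for MyTable[, 1:4]
sub <- data.frame(
  config.1. = rep(MyConfig[1], 3),
  config.2. = c("2*gen(2)", "4*gen(2)", "6*gen(2)"),
  config.3. = rep(MyConfig[3], 3),
  config.4. = rep(MyConfig[4], 3),
  stringsAsFactors = FALSE
)

# t(sub) is a 4 x 3 character matrix: each column is one original row,
# so MyConfig (length 4) lines up element by element with it.
res_t <- t(t(sub) == MyConfig)

# Alternative: compare column j of sub with MyConfig[j] directly.
res_map <- mapply(function(column, value) column == value, sub, MyConfig)

print(res_t)
print(rowSums(res_t))  # number of matching columns per row
```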