Compare rows across factors in two different dataframes

Time:10-13

I have two dataframes, each with about 150,000 rows. The first few rows look like this:

head(df1)
Variable ID
A        ENSG00000185352
A        ENSG00000136267
A        ENSG00000141668
B        ENSG00000154975
B        ENSG00000169855
B        ENSG00000173406
head(df2)
Variable ID
A          ENSG00000999999
A          ENSG00000136267
A          ENSG00000141668
B          ENSG00000111588
B          ENSG00000000000
B          ENSG00000173987

For each value of Variable, I want to count how many IDs in df1 do not appear in df2 under the same Variable. For the example above, the output would be:

Comparison  Number of differences
A           1
B           3

How can I do this? Appreciate any help!

Answer:

# 1. create sample data --------------------------------------------------------

df1 <- c(
  'Excitatory.Neuron.fc_versus_Excitatory.Neuron3.fc', 'ENSG00000185352',
  'Excitatory.Neuron.fc_versus_Excitatory.Neuron3.fc', 'ENSG00000136267',
  'Excitatory.Neuron.fc_versus_Excitatory.Neuron3.fc', 'ENSG00000141668',
  'Excitatory.Neuron.fc_versus_microglia.fc', 'ENSG00000154975',
  'Excitatory.Neuron.fc_versus_microglia.fc', 'ENSG00000169855',
  'Excitatory.Neuron.fc_versus_microglia.fc', 'ENSG00000173406'
)
df1 <- data.frame(Comparison = df1[ c(1, 3, 5, 7, 9, 11) ], 
                  ENSG.ID = df1[ c(2, 4, 6, 8, 10, 12) ])

df2 <- c(
  'Excitatory.Neuron.fc_versus_Excitatory.Neuron3.fc', 'ENSG00000999999',
  'Excitatory.Neuron.fc_versus_Excitatory.Neuron3.fc', 'ENSG00000136267',
  'Excitatory.Neuron.fc_versus_Excitatory.Neuron3.fc', 'ENSG00000141668',
  'Excitatory.Neuron.fc_versus_microglia.fc', 'ENSG00000111588',
  'Excitatory.Neuron.fc_versus_microglia.fc', 'ENSG00000000000',
  'Excitatory.Neuron.fc_versus_microglia.fc', 'ENSG00000173987'
)
df2 <- data.frame(Comparison = df2[ c(1, 3, 5, 7, 9, 11) ], 
                  ENSG.ID = df2[ c(2, 4, 6, 8, 10, 12) ])

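# 2. perform row comparisons ---------------------------------------------------
# note: the native pipe |> requires R >= 4.1

# every comparison label that occurs in either dataframe
comparisons <- c(df1$Comparison, df2$Comparison) |> 
  unique()
# for each label, count the distinct IDs in df1 that are absent from df2
differences <- lapply(comparisons, function(x) {
  idx <- which(df1$Comparison == x)
  d1 <- df1[ idx, 'ENSG.ID' ]
  idx <- which(df2$Comparison == x)
  d2 <- df2[ idx, 'ENSG.ID' ]
  setdiff(d1, d2) |> 
    length()
}) |> unlist()
# one row per comparison, with the column names requested in the question
results_df <- data.frame(comparisons, differences) |> 
  `colnames<-`(c('Comparison', 'Number of differences'))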

# 3. inspect results -----------------------------------------------------------

View(results_df)
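
With around 150,000 rows and many distinct comparisons, the per-group loop above may become slow, since it rescans both dataframes for every label. Below is a minimal vectorized sketch of the same one-directional count (IDs in df1 that are missing from df2 within the same comparison). It assumes the column names Comparison and ENSG.ID used in the answer, uses the short labels A and B from the question as sample data, and assumes that a tab character never occurs in either column, so it is safe as a key separator. The names u1, key1, key2, missing, counts and results_vec are made up for this illustration.

```r
# sample data from the question (labels shortened to A and B)
df1 <- data.frame(
  Comparison = rep(c("A", "B"), each = 3),
  ENSG.ID = c("ENSG00000185352", "ENSG00000136267", "ENSG00000141668",
              "ENSG00000154975", "ENSG00000169855", "ENSG00000173406")
)
df2 <- data.frame(
  Comparison = rep(c("A", "B"), each = 3),
  ENSG.ID = c("ENSG00000999999", "ENSG00000136267", "ENSG00000141668",
              "ENSG00000111588", "ENSG00000000000", "ENSG00000173987")
)

# drop duplicate (Comparison, ID) pairs so each ID counts once, as setdiff() does
u1 <- unique(df1[, c("Comparison", "ENSG.ID")])

# one string key per (Comparison, ID) pair; assumes no tab inside the values
key1 <- paste(u1$Comparison, u1$ENSG.ID, sep = "\t")
key2 <- paste(df2$Comparison, df2$ENSG.ID, sep = "\t")

# TRUE where a df1 pair has no match in df2 under the same comparison
missing <- !(key1 %in% key2)

# sum the TRUE flags per comparison
counts <- tapply(missing, u1$Comparison, sum)

results_vec <- data.frame(
  Comparison = names(counts),
  `Number of differences` = as.vector(counts),
  check.names = FALSE
)
print(results_vec)
```

One difference from the loop version: a comparison that appears only in df2 gets no row here, whereas the loop reports it with a count of 0. If those rows matter, the labels from df2 would need to be merged in afterwards.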
