I have two data frames that look like this (out of about 150,000 rows):
head(df1)
Variable ID
A ENSG00000185352
A ENSG00000136267
A ENSG00000141668
B ENSG00000154975
B ENSG00000169855
B ENSG00000173406
head(df2)
Variable ID
A ENSG00000999999
A ENSG00000136267
A ENSG00000141668
B ENSG00000111588
B ENSG00000000000
B ENSG00000173987
For each Variable, I want to count how many IDs in df1 do not appear in df2 under the same Variable. For example, the output would be:
Comparison Number of differences
A 1
B 3
How can I do this? Appreciate any help!
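A minimal base-R sketch, assuming the columns are named `Variable` and `ID` as in the `head()` output above. It rebuilds the sample rows, splits the IDs by `Variable`, and counts the unique IDs of `df1` that are absent from `df2` for the same `Variable`. Counting in that one direction (a one-sided `setdiff()`) is what reproduces the expected 1 and 3; a symmetric difference would give 2 and 6 instead.

```r
# Sample data copied from the question (column names Variable and ID assumed)
df1 <- data.frame(
  Variable = rep(c("A", "B"), each = 3),
  ID = c("ENSG00000185352", "ENSG00000136267", "ENSG00000141668",
         "ENSG00000154975", "ENSG00000169855", "ENSG00000173406")
)
df2 <- data.frame(
  Variable = rep(c("A", "B"), each = 3),
  ID = c("ENSG00000999999", "ENSG00000136267", "ENSG00000141668",
         "ENSG00000111588", "ENSG00000000000", "ENSG00000173987")
)

# Named lists of ID vectors, one element per Variable
ids1 <- split(df1$ID, df1$Variable)
ids2 <- split(df2$ID, df2$Variable)

# For every Variable in df1, count the unique IDs missing from df2
# under the same Variable (a Variable absent from df2 counts as empty)
n_diff <- vapply(names(ids1), function(v) {
  other <- if (v %in% names(ids2)) ids2[[v]] else character(0)
  length(setdiff(ids1[[v]], other))
}, integer(1))

result <- data.frame(Comparison = names(n_diff),
                     `Number of differences` = unname(n_diff),
                     check.names = FALSE)
print(result)
```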
Answer:
# 1. create sample data --------------------------------------------------------
df1 <- data.frame(
  Comparison = rep(c('Excitatory.Neuron.fc_versus_Excitatory.Neuron3.fc',
                     'Excitatory.Neuron.fc_versus_microglia.fc'), each = 3),
  ENSG.ID = c('ENSG00000185352', 'ENSG00000136267', 'ENSG00000141668',
              'ENSG00000154975', 'ENSG00000169855', 'ENSG00000173406')
)
df2 <- data.frame(
  Comparison = rep(c('Excitatory.Neuron.fc_versus_Excitatory.Neuron3.fc',
                     'Excitatory.Neuron.fc_versus_microglia.fc'), each = 3),
  ENSG.ID = c('ENSG00000999999', 'ENSG00000136267', 'ENSG00000141668',
              'ENSG00000111588', 'ENSG00000000000', 'ENSG00000173987')
)
# 2. perform row comparisons ---------------------------------------------------
# every comparison that appears in either data frame
comparisons <- unique(c(df1$Comparison, df2$Comparison))
# for each comparison, count the unique IDs in df1 that are absent from df2;
# setdiff() is one-directional, so IDs found only in df2 are not counted
differences <- vapply(comparisons, function(x) {
  d1 <- df1$ENSG.ID[ df1$Comparison == x ]
  d2 <- df2$ENSG.ID[ df2$Comparison == x ]
  length(setdiff(d1, d2))
}, integer(1), USE.NAMES = FALSE)
results_df <- data.frame(Comparison = comparisons,
                         `Number of differences` = differences,
                         check.names = FALSE)
# 3. inspect results -----------------------------------------------------------
print(results_df)
# View(results_df) opens a spreadsheet-style viewer in interactive sessions
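With around 150,000 rows, a vectorized alternative to looping over comparisons may be faster: build one comparison/ID key per row, flag the `df1` rows whose key never appears in `df2`, and tabulate the flagged rows by comparison. This is a sketch assuming the `Comparison` and `ENSG.ID` column names used above; the helper name `count_differences` is hypothetical. `unique()` is applied first so duplicated rows are counted once, matching `setdiff()`, and the factor levels keep comparisons with zero differences in the table.

```r
# Sample data in the same shape as the answer (Comparison, ENSG.ID)
cmp <- rep(c("Excitatory.Neuron.fc_versus_Excitatory.Neuron3.fc",
             "Excitatory.Neuron.fc_versus_microglia.fc"), each = 3)
df1 <- data.frame(Comparison = cmp,
                  ENSG.ID = c("ENSG00000185352", "ENSG00000136267", "ENSG00000141668",
                              "ENSG00000154975", "ENSG00000169855", "ENSG00000173406"))
df2 <- data.frame(Comparison = cmp,
                  ENSG.ID = c("ENSG00000999999", "ENSG00000136267", "ENSG00000141668",
                              "ENSG00000111588", "ENSG00000000000", "ENSG00000173987"))

# Hypothetical helper: per comparison, count unique IDs of `a` missing from `b`
count_differences <- function(a, b) {
  a <- unique(a[, c("Comparison", "ENSG.ID")])  # drop duplicate rows, like setdiff()
  # One key per row; assumes "\t" never occurs inside a comparison name or ID
  key_a <- paste(a$Comparison, a$ENSG.ID, sep = "\t")
  key_b <- paste(b$Comparison, b$ENSG.ID, sep = "\t")
  missing_in_b <- !(key_a %in% key_b)
  # Factor levels keep comparisons that have zero differences
  counts <- table(factor(a$Comparison[missing_in_b], levels = unique(a$Comparison)))
  data.frame(Comparison = names(counts),
             `Number of differences` = as.integer(counts),
             check.names = FALSE)
}

results_df <- count_differences(df1, df2)
print(results_df)
```

Like the loop above, this counts only IDs present in `df1` and missing from `df2`; comparisons that occur only in `df2` do not appear in the result.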