I have a dataset with several columns, and I would like to compare each column with its replicate. I have two replicates for each sample. Replicate 1 is named like name_sample_1 and replicate 2 like name_sample_1_2.
I would like to compare replicate 1 and replicate 2 for each sample: if a value is present in one replicate and is 0 in the other, I would like to replace both values with NA.
Original
replicate_1 | replicate_1_2 |
---|---|
0 | 0 |
750 | 0 |
0 | 850 |
650 | 950 |
Wanted
replicate_1 | replicate_1_2 |
---|---|
0 | 0 |
NA | NA |
NA | NA |
650 | 950 |
CodePudding user response:
You can use a vectorized base R approach with logical indexing: it selects the rows where exactly one of the two replicates is 0 and replaces the values in all columns of those rows with NA:
df[(df$replicate_1 == 0 | df$replicate_1_2 == 0) &
!(df$replicate_1 == 0 & df$replicate_1_2 == 0), ] <- NA
Output:
# replicate_1 replicate_1_2
# 1 0 0
# 2 NA NA
# 3 NA NA
# 4 650 950
# Data
df <- read.table(text = "replicate_1 replicate_1_2
0 0
750 0
0 850
650 950", header = TRUE)
Note that this replaces the values in every column of the matching rows with NA. If you only want to replace values in certain columns, you can specify them.
Example data building on what you provided, with an extra column whose values should be kept:
df2 <- read.table(text = "replicate_1 replicate_1_2 ignore_column
0 0 A
750 0 B
0 850 C
650 950 D", header = TRUE)
df2[(df2$replicate_1 == 0 | df2$replicate_1_2 == 0) &
!(df2$replicate_1 == 0 & df2$replicate_1_2 == 0),
c("replicate_1", "replicate_1_2")] <- NA
Output:
# replicate_1 replicate_1_2 ignore_column
# 1 0 0 A
# 2 NA NA B
# 3 NA NA C
# 4 650 950 D
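The condition used above is an exclusive-or: exactly one of the two values is 0. As a minimal sketch, the same replacement can be written with base R's xor() and wrapped in a helper. The function name na_if_one_zero is made up for illustration and is not part of the original answer:

```r
# Hypothetical helper (illustrative name, not from the original answer).
# Sets both columns to NA in rows where exactly one of them is 0.
na_if_one_zero <- function(df, col_a, col_b) {
  one_zero <- xor(df[[col_a]] == 0, df[[col_b]] == 0)
  # which() drops NA positions, so rows that already contain NA are left as-is
  df[which(one_zero), c(col_a, col_b)] <- NA
  df
}

df2 <- read.table(text = "replicate_1 replicate_1_2 ignore_column
0 0 A
750 0 B
0 850 C
650 950 D", header = TRUE)

res <- na_if_one_zero(df2, "replicate_1", "replicate_1_2")
res
```

Passing the column names as arguments makes it possible to call the helper once per replicate pair without repeating the condition.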
CodePudding user response:
Here's a solution that extends to n pairs of columns with the same prefix.
First, use reproducible data. There are two pairs of columns with the same prefix:
dat <- structure(list(some_name_replicate_1 = c(0L, 750L, 0L, 650L),
some_name_replicate_1_2 = c(0L, 0L, 850L, 950L), some_othername_replicate_1 = c(0L,
750L, 0L, 0L), some_othername_replicate_1_2 = c(0L, 0L, 0L,
950L)), class = "data.frame", row.names = c(NA, -4L))
# some_name_replicate_1 some_name_replicate_1_2 some_othername_replicate_1 some_othername_replicate_1_2
# 1 0 0 0 0
# 2 750 0 750 0
# 3 0 850 0 0
# 4 650 950 0 950
The code consists of:
- Split the columns according to their prefix and create a list
- Replace the necessary values with NA
- Reduce the list to the original dataframe format
dat |>
  split.default(gsub("_replicate_1.*", "", colnames(dat))) |>
  lapply(function(x) {
    # product is 0 when at least one value is 0; sum is non-zero when not both are 0
    x[x[1] * x[2] == 0 & x[1] + x[2] != 0, ] <- NA
    x
  }) |>
  Reduce(f = cbind)
# some_name_replicate_1 some_name_replicate_1_2 some_othername_replicate_1 some_othername_replicate_1_2
# 1 0 0 0 0
# 2 NA NA NA NA
# 3 NA NA 0 0
# 4 650 950 NA NA
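The split approach assumes that every column belongs to a pair. As an alternative sketch, assuming the second replicate is always named like the first with _2 appended (as in the question), a loop can pair the columns by name and leave any other columns, and the column order, untouched. The variable names here are illustrative:

```r
dat <- data.frame(
  some_name_replicate_1        = c(0L, 750L, 0L, 650L),
  some_name_replicate_1_2      = c(0L, 0L, 850L, 950L),
  some_othername_replicate_1   = c(0L, 750L, 0L, 0L),
  some_othername_replicate_1_2 = c(0L, 0L, 0L, 950L)
)

# First replicates: names ending in "_1" whose "_2" partner also exists
first <- grep("_1$", names(dat), value = TRUE)
first <- first[paste0(first, "_2") %in% names(dat)]

out <- dat
for (a in first) {
  b <- paste0(a, "_2")
  # TRUE where exactly one of the two replicates is 0
  one_zero <- xor(dat[[a]] == 0, dat[[b]] == 0)
  out[which(one_zero), c(a, b)] <- NA
}
out
```

Each pair is checked independently, so a row can be set to NA in one pair and kept in another.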