I have two lists of dataframes. One list of dataframes is structured as follows:
data1
Label Pred n
1 Mito-0001_Series007_blue.tif Pear 10
2 Mito-0001_Series007_blue.tif Orange 223
3 Mito-0001_Series007_blue.tif Apple 890
4 Mito-0001_Series007_blue.tif Peach 34
This repeats with different numbers, e.g.:
Label Pred n
1 Mito-0002_Series007_blue.tif Pear 90
2 Mito-0002_Series007_blue.tif Orange 127
3 Mito-0002_Series007_blue.tif Apple 76
4 Mito-0002_Series007_blue.tif Peach 344
The second list of dataframes is structured like this:
data2
Slice Area
Mask of Mask-0001Series007_blue-1.tif. 789.21
etc
Question
I want to make the row names match up by:
a) Removing the "Mito-" from data1
b) Removing the "Mask of Mask-" from data2
c) Removing the "-1" towards the end of data2
Keeping in mind that this is a list of dataframes.
So far:
I have used the information from the post named "How can I remove certain part of row names in data frame"
They suggest using
data2$Slice <- sub("Mask of Mask-", "", data2$Slice)
This obviously doesn't work for a list of dataframes; it returns an empty character vector:
character(0)
Thanks in advance, I have been amazed at how great people are at answering questions on this site :)
CodePudding user response:
First, we could define a function f that applies gsub with a regex that fits all cases.
f <- \(x) gsub('.*(\\d{4}_?Series\\d{3}_blue).*(\\.tif)?\\.?', '\\1\\2', x)
Explanation:
.*       any single character, repeatedly
\\d{4}   four digits
_?       an underscore, if present
Series   the literal text "Series"
(...)    a capture group (groups get numbered internally)
\\.      a period (needs to be escaped, otherwise we say "any character")
\\1      the contents of capture group 1
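To make the explanation concrete, here is a small sketch that runs the pattern on one string of each shape. The shorter function g is a hypothetical simplification, not part of the original answer: because the .* after the first group is greedy, it already swallows ".tif", so the second group appears never to capture anything and \\2 is empty.

```r
# Pattern from the answer above.
f <- function(x) gsub('.*(\\d{4}_?Series\\d{3}_blue).*(\\.tif)?\\.?', '\\1\\2', x)

# Hypothetical simplification: drop the second group, which the greedy .* leaves empty.
g <- function(x) gsub('.*(\\d{4}_?Series\\d{3}_blue).*', '\\1', x)

x <- c("Mito-0001_Series007_blue.tif",
       "Mask of Mask-0001Series007_blue-1.tif.")

f(x)
# [1] "0001_Series007_blue" "0001Series007_blue"

# Both versions give the same result on these inputs.
identical(f(x), g(x))
# [1] TRUE
```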
## test it
(x <- c(names(data1), data1[[1]]$Label, data2$Slice))
# [1] "Mito-0001_Series007_blue" "Mito-0002_Series007_blue"
# [3] "Mito-0001_Series007_blue.tif" "Mito-0001_Series007_blue.tif"
# [5] "Mito-0001_Series007_blue.tif" "Mito-0001_Series007_blue.tif"
# [7] "Mask of Mask-0001Series007_blue-1.tif."
f(x)
# [1] "0001_Series007_blue" "0002_Series007_blue" "0001_Series007_blue" "0001_Series007_blue"
# [5] "0001_Series007_blue" "0001_Series007_blue" "0001Series007_blue"
Seems to work, so we can apply it.
names(data1) <- f(names(data1))
data1 <- lapply(data1, \(x) {x$Label <- f(x$Label); x})
data2$Slice <- f(data2$Slice)
data1
# $`0001_Series007_blue`
# Label Pred n
# 1 0001_Series007_blue Pear 10
# 2 0001_Series007_blue Orange 223
# 3 0001_Series007_blue Apple 890
# 4 0001_Series007_blue Peach 34
#
# $`0002_Series007_blue`
# Label Pred n
# 1 0002_Series007_blue Pear 90
# 2 0002_Series007_blue Orange 127
# 3 0002_Series007_blue Apple 76
# 4 0002_Series007_blue Peach 344
data2
# Slice Area
# 1 0001Series007_blue 789.21
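Note that the cleaned data2 value ("0001Series007_blue") still lacks the underscore that the data1 values have, so the two would not compare equal yet. If an exact match is needed, one possible follow-up is sketched below; add_underscore is a made-up helper name, and it assumes every cleaned value starts with a four-digit ID followed by "Series".

```r
# Hypothetical helper: insert "_" between the four-digit ID and "Series"
# when it is missing, and leave values that already have it unchanged.
add_underscore <- function(x) sub('^(\\d{4})_?(Series)', '\\1_\\2', x)

cleaned <- c("0001_Series007_blue", "0001Series007_blue")
add_underscore(cleaned)
# [1] "0001_Series007_blue" "0001_Series007_blue"
```

It could be applied after f, for example as data2$Slice <- add_underscore(data2$Slice).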
Data:
data1 <- list(`Mito-0001_Series007_blue` = structure(list(Label = c("Mito-0001_Series007_blue.tif",
"Mito-0001_Series007_blue.tif", "Mito-0001_Series007_blue.tif",
"Mito-0001_Series007_blue.tif"), Pred = c("Pear", "Orange", "Apple",
"Peach"), n = c(10L, 223L, 890L, 34L)), class = "data.frame", row.names = c("1",
"2", "3", "4")), `Mito-0002_Series007_blue` = structure(list(
Label = c("Mito-0002_Series007_blue.tif", "Mito-0002_Series007_blue.tif",
"Mito-0002_Series007_blue.tif", "Mito-0002_Series007_blue.tif"
), Pred = c("Pear", "Orange", "Apple", "Peach"), n = c(90L,
127L, 76L, 344L)), class = "data.frame", row.names = c("1",
"2", "3", "4")))
data2 <- structure(list(Slice = "Mask of Mask-0001Series007_blue-1.tif.",
Area = 789.21), class = "data.frame", row.names = c(NA, -1L
))
CodePudding user response:
Using the given info
The answer by @jay.sf was really helpful, but it only worked for data1, not data2. To apply it to data2 as well, I added a few extra lines of code:
# Old code
f <- function(x) gsub('.*(\\d{4}_?Series\\d{3}_blue).*(\\.tif)?\\.?', '\\1\\2', x)
# I added the [[1]] after data2 as well
(x <- c(names(data1), data1[[1]]$Label, data2[[1]]$Slice))
f(x)
names(data1) <- f(names(data1))
data1 <- lapply(data1, function(x) {x$Label <- f(x$Label); x})
# This line of code was causing problems, so I removed it
# data2$Slice <- f(data2$Slice)
# And added the following to apply it to data2
names(data2) <- f(names(data2))
data2 <- lapply(data2, function(x) {x$Slice <- f(x$Slice); x})
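A minimal sketch of why the direct assignment failed, assuming data2 is a named list of data frames like data1 (the element name below is made up for illustration): a list has no element called Slice, so data2$Slice is NULL, and sub() on NULL gives character(0).

```r
f <- function(x) gsub('.*(\\d{4}_?Series\\d{3}_blue).*(\\.tif)?\\.?', '\\1\\2', x)

# Assumed shape: data2 as a named list of data frames (element name is hypothetical).
data2 <- list(
  `Mask of Mask-0001Series007_blue-1` = data.frame(
    Slice = "Mask of Mask-0001Series007_blue-1.tif.",
    Area = 789.21,
    stringsAsFactors = FALSE
  )
)

# `$` on the list looks for an element named "Slice"; none exists, so this is NULL.
is.null(data2$Slice)
# [1] TRUE

# sub() on NULL returns an empty character vector, as seen in the question.
broken <- sub("Mask of Mask-", "", data2$Slice)
broken
# character(0)

# Cleaning the list names and the column inside each data frame works instead.
names(data2) <- f(names(data2))
data2 <- lapply(data2, function(d) { d$Slice <- f(d$Slice); d })
data2[[1]]$Slice
# [1] "0001Series007_blue"
```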