Remove a part of a row name in a list of dataframes

Time:12-23

I have two lists of dataframes. One list of dataframes is structured as follows:

data1 

                         Label   Pred   n
1 Mito-0001_Series007_blue.tif   Pear  10
2 Mito-0001_Series007_blue.tif Orange 223
3 Mito-0001_Series007_blue.tif  Apple 890
4 Mito-0001_Series007_blue.tif  Peach  34

This repeats with different numbers, e.g.:

                         Label   Pred   n
1 Mito-0002_Series007_blue.tif   Pear  90
2 Mito-0002_Series007_blue.tif Orange 127
3 Mito-0002_Series007_blue.tif  Apple  76
4 Mito-0002_Series007_blue.tif  Peach 344

The second list of dataframes is structured like this:

data2

Slice                                       Area
Mask of Mask-0001Series007_blue-1.tif.      789.21

etc

Question

I want to

  1. Make the row names match up by:

    a) Remove the "Mito-" from data1

    b) Remove the "Mask of Mask-" from data2

    c) Remove the "-1" towards the end of data2

Keeping in mind that this is a list of dataframes.

So far:

I have used the information from the post named "How can I remove certain part of row names in data frame"

They suggest using

data2$Slice <- sub("Mask of Mask-", "", data2$Slice)

This doesn't work on the list of dataframes. It returns an empty character vector:

character(0)

Thanks in advance, I have been amazed at how great people are at answering questions on this site :)

CodePudding user response:

First, we could define a function f that applies gsub with a single regex that fits all of the names.

f <- \(x) gsub('.*(\\d{4}_?Series\\d{3}_blue).*(\\.tif)?\\.?', '\\1\\2', x)

Explanation:

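  • .* any character, zero or more times
  • \\d{4} exactly four digits
  • _? an underscore, if present
  • Series the literal text "Series"
  • (...) capture group (they get numbered internally, from left to right)
  • \\. a literal period (needs to be escaped, otherwise we say "any character")
  • \\1, \\2 the contents of capture groups 1 and 2 (the .* before group 2 is greedy, so group 2 ends up empty here and the .tif extension is dropped)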
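
The pieces above can be checked one group at a time. This is only a small sketch using the same pattern as f; the names pat, x, g1 and g2 are illustrative and not part of the original answer.

```r
# Same pattern as in f, stored separately so each group can be inspected
pat <- '.*(\\d{4}_?Series\\d{3}_blue).*(\\.tif)?\\.?'
x <- c("Mito-0001_Series007_blue.tif",
       "Mask of Mask-0001Series007_blue-1.tif.")

# Keep only capture group 1
g1 <- sub(pat, '\\1', x)
g1
# [1] "0001_Series007_blue" "0001Series007_blue"

# Wrap capture group 2 in brackets to make an empty match visible
g2 <- sub(pat, '[\\2]', x)
g2
# [1] "[]" "[]"
```

Group 2 is empty in both cases, which is why the .tif extension does not appear in the output of f.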

Test the regex

## test it
(x <- c(names(data1), data1[[1]]$Label, data2$Slice))
# [1] "Mito-0001_Series007_blue"               "Mito-0002_Series007_blue"              
# [3] "Mito-0001_Series007_blue.tif"           "Mito-0001_Series007_blue.tif"          
# [5] "Mito-0001_Series007_blue.tif"           "Mito-0001_Series007_blue.tif"          
# [7] "Mask of Mask-0001Series007_blue-1.tif."

f(x)
# [1] "0001_Series007_blue" "0002_Series007_blue" "0001_Series007_blue" "0001_Series007_blue"
# [5] "0001_Series007_blue" "0001_Series007_blue" "0001Series007_blue" 

Seems to work, so we can apply it.

names(data1) <- f(names(data1))
data1 <- lapply(data1, \(x) {x$Label <- f(x$Label); x})
data2$Slice <- f(data2$Slice)

data1
# $`0001_Series007_blue`
#                 Label   Pred   n
# 1 0001_Series007_blue   Pear  10
# 2 0001_Series007_blue Orange 223
# 3 0001_Series007_blue  Apple 890
# 4 0001_Series007_blue  Peach  34
# 
# $`0002_Series007_blue`
#                 Label   Pred   n
# 1 0002_Series007_blue   Pear  90
# 2 0002_Series007_blue Orange 127
# 3 0002_Series007_blue  Apple  76
# 4 0002_Series007_blue  Peach 344

data2
#                Slice   Area
# 1 0001Series007_blue 789.21
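
Note that the cleaned data2 value (0001Series007_blue) still lacks the underscore that the data1 labels have (0001_Series007_blue). If that difference is really in the data and not a typo in the question, one possible way to make the two identical is to insert the underscore where it is missing. A minimal sketch, where add_underscore is a hypothetical helper name:

```r
# Hypothetical helper: put "_" between the four leading digits and "Series".
# Labels that already contain the underscore are returned unchanged.
add_underscore <- function(x) sub('^(\\d{4})_?(Series)', '\\1_\\2', x)

fixed <- add_underscore(c("0001Series007_blue", "0001_Series007_blue"))
fixed
# [1] "0001_Series007_blue" "0001_Series007_blue"
```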

Data:

data1 <- list(`Mito-0001_Series007_blue` = structure(list(Label = c("Mito-0001_Series007_blue.tif", 
"Mito-0001_Series007_blue.tif", "Mito-0001_Series007_blue.tif", 
"Mito-0001_Series007_blue.tif"), Pred = c("Pear", "Orange", "Apple", 
"Peach"), n = c(10L, 223L, 890L, 34L)), class = "data.frame", row.names = c("1", 
"2", "3", "4")), `Mito-0002_Series007_blue` = structure(list(
    Label = c("Mito-0002_Series007_blue.tif", "Mito-0002_Series007_blue.tif", 
    "Mito-0002_Series007_blue.tif", "Mito-0002_Series007_blue.tif"
    ), Pred = c("Pear", "Orange", "Apple", "Peach"), n = c(90L, 
    127L, 76L, 344L)), class = "data.frame", row.names = c("1", 
"2", "3", "4")))

data2 <- structure(list(Slice = "Mask of Mask-0001Series007_blue-1.tif.", 
    Area = 789.21), class = "data.frame", row.names = c(NA, -1L
))

CodePudding user response:

Using the given info

The answer by @jay.sf was really helpful, but as written it only worked for data1, not data2, because my data2 is also a list of dataframes. To apply it to data2 as well, I changed the code as follows:

# Same function as in the answer, written with function() instead of \(x)
f <- function(x) gsub('.*(\\d{4}_?Series\\d{3}_blue).*(\\.tif)?\\.?', '\\1\\2', x)

# data2 is a list too, so I added the [[1]] after data2 as well
(x <- c(names(data1), data1[[1]]$Label, data2[[1]]$Slice))
f(x)

names(data1) <- f(names(data1))
data1 <- lapply(data1, function(x) {x$Label <- f(x$Label); x})

# This line assumes data2 is a single data frame, so I removed it
# data2$Slice <- f(data2$Slice)

# Instead, apply f to the names of data2 and to Slice in each data frame
names(data2) <- f(names(data2))
data2 <- lapply(data2, function(x) {x$Slice <- f(x$Slice); x})
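
To see why the single-data-frame line returned character(0) on a list, here is a small sketch. The list data2_list and its element name img1 are made up for illustration; the real list names are not shown in the question.

```r
f <- function(x) gsub('.*(\\d{4}_?Series\\d{3}_blue).*(\\.tif)?\\.?', '\\1\\2', x)

# Hypothetical one-element list of data frames ("img1" is a made-up name)
data2_list <- list(
  img1 = data.frame(Slice = "Mask of Mask-0001Series007_blue-1.tif.",
                    Area = 789.21, stringsAsFactors = FALSE)
)

# The list itself has no element called Slice, so $ returns NULL ...
slice_is_null <- is.null(data2_list$Slice)
slice_is_null
# [1] TRUE

# ... and sub() on NULL gives an empty character vector
empty <- sub("Mask of Mask-", "", data2_list$Slice)
empty
# character(0)

# Applying f inside each data frame with lapply works
data2_list <- lapply(data2_list, function(d) {d$Slice <- f(d$Slice); d})
data2_list$img1$Slice
# [1] "0001Series007_blue"
```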

