Modify names in df column

I want to build one large table from data that come from several sources, but some of the names are identical, so after merging it is not possible to tell which source a row came from.

I have a solution in mind, but I don't know if it's possible to achieve.

Here is a part of my data:

name            id      sym
ENSG00000135821 2752    GLUL
ENSG00000135821 2752    GLUL
ENSG00000135821 2752    GLUL
ENSG00000135821 2752    GLUL

As you can see, I cannot tell where each row came from. My idea is to modify the values in the name column of each separate dataframe before merging them, so that the merged df looks like this:

name                    id      sym
ENSG00000135821_sample1 2752    GLUL
ENSG00000135821_sample2 2752    GLUL
ENSG00000135821_sample3 2752    GLUL
ENSG00000135821_sample4 2752    GLUL

Is it possible to append a suffix to every value in a df column while keeping the original name as the prefix?

For a separate df I would like to get:

name                    id      sym
ENSG00000135821_sample1 2752    GLUL
ENSG00000182667_sample1 50863   NTM
ENSG00000155495_sample1 9947    MAGEC1
ENSG00000198959_sample1 7052    TGM2

Thank you!

CodePudding user response：

A dplyr solution. Group by id and sym and use seq_along to number the rows within each group consecutively.

df1 <- 'name            id      sym
ENSG00000135821 2752    GLUL
ENSG00000135821 2752    GLUL
ENSG00000135821 2752    GLUL
ENSG00000135821 2752    GLUL'
df1 <- read.table(textConnection(df1), header = TRUE)

df2 <-"name                    id      sym
ENSG00000135821 2752    GLUL
ENSG00000182667 50863   NTM
ENSG00000155495 9947    MAGEC1
ENSG00000198959 7052    TGM2"
df2 <- read.table(textConnection(df2), header = TRUE)

suppressPackageStartupMessages(
  library(dplyr)
)

df1 %>%
  group_by(id, sym) %>%
  mutate(name = paste0(name, "_sample", seq_along(name))) %>%
  ungroup()
#> # A tibble: 4 × 3
#>   name                       id sym  
#>   <chr>                   <int> <chr>
#> 1 ENSG00000135821_sample1  2752 GLUL 
#> 2 ENSG00000135821_sample2  2752 GLUL 
#> 3 ENSG00000135821_sample3  2752 GLUL 
#> 4 ENSG00000135821_sample4  2752 GLUL

Created on 2022-10-14 with reprex v2.0.2
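
For anyone who prefers to avoid a package dependency, the same within-group numbering can probably be done in base R with ave(). This is only a sketch of the idea, not part of the original answer; the data frame is rebuilt with data.frame() so the snippet runs on its own.

```r
# Same toy data as df1 above: one gene repeated four times.
df1 <- data.frame(
  name = rep("ENSG00000135821", 4),
  id   = rep(2752L, 4),
  sym  = rep("GLUL", 4)
)

# ave() applies seq_along() within each id/sym group, so repeated rows
# are numbered 1, 2, 3, ... in their original order.
counter <- ave(seq_along(df1$name), df1$id, df1$sym, FUN = seq_along)

# Paste the counter onto the original name.
df1$name <- paste0(df1$name, "_sample", counter)
df1$name
```

As with the dplyr version, the number reflects how often a row repeats inside one data frame, not which data frame it came from.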

This can be written as a function and applied to any data set, as long as the column names are the same: name, id and sym.

newname <- function(x) {
  x %>%
    group_by(id, sym) %>%
    mutate(name = paste0(name, "_sample", seq_along(name))) %>%
    ungroup()
}

newname(df1)
#> # A tibble: 4 × 3
#>   name                       id sym  
#>   <chr>                   <int> <chr>
#> 1 ENSG00000135821_sample1  2752 GLUL 
#> 2 ENSG00000135821_sample2  2752 GLUL 
#> 3 ENSG00000135821_sample3  2752 GLUL 
#> 4 ENSG00000135821_sample4  2752 GLUL

newname(df2)
#> # A tibble: 4 × 3
#>   name                       id sym   
#>   <chr>                   <int> <chr> 
#> 1 ENSG00000135821_sample1  2752 GLUL  
#> 2 ENSG00000182667_sample1 50863 NTM   
#> 3 ENSG00000155495_sample1  9947 MAGEC1
#> 4 ENSG00000198959_sample1  7052 TGM2

Created on 2022-10-14 with reprex v2.0.2
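
Note that newname() numbers repeated rows within a single data frame, so every gene that occurs once gets _sample1 no matter which data frame it is in. If the suffix is meant to identify the source data frame instead, one possible variation is to pass the label in as an argument. The helper name tag_names and its label parameter are made up for this sketch.

```r
library(dplyr)

# Hypothetical helper: append a caller-supplied label to every name,
# leaving the other columns untouched.
tag_names <- function(x, label) {
  mutate(x, name = paste0(name, "_", label))
}

# Same toy data as df2 above, rebuilt so the snippet is self-contained.
df2 <- data.frame(
  name = c("ENSG00000135821", "ENSG00000182667",
           "ENSG00000155495", "ENSG00000198959"),
  id   = c(2752L, 50863L, 9947L, 7052L),
  sym  = c("GLUL", "NTM", "MAGEC1", "TGM2")
)

# Every row of this data frame is marked as coming from "sample2".
tagged <- tag_names(df2, "sample2")
tagged$name
```

Calling tag_names() once per data frame with a different label, and merging afterwards, should give names that stay traceable to their source.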

CodePudding user response：

Here is another option. Put all the dataframes in a list, append the list position to the names in each dataframe, then row-bind the renamed dataframes into one:

library(tidyverse)

# example data
df3 <- df2 <- df1 <- read_table("name                    id      sym
ENSG00000135821 2752    GLUL
ENSG00000182667 50863   NTM
ENSG00000155495 9947    MAGEC1
ENSG00000198959 7052    TGM2")


# imap_dfr() passes each data frame together with its position in the
# (unnamed) list, then row-binds the results
list(df1, df2, df3) |>
  imap_dfr(\(df, num) mutate(df, name = glue::glue("{name}_sample{num}"))) |>
  arrange(name, id)
#> # A tibble: 12 x 3
#>    name                       id sym   
#>    <glue>                  <dbl> <chr> 
#>  1 ENSG00000135821_sample1  2752 GLUL  
#>  2 ENSG00000135821_sample2  2752 GLUL  
#>  3 ENSG00000135821_sample3  2752 GLUL  
#>  4 ENSG00000155495_sample1  9947 MAGEC1
#>  5 ENSG00000155495_sample2  9947 MAGEC1
#>  6 ENSG00000155495_sample3  9947 MAGEC1
#>  7 ENSG00000182667_sample1 50863 NTM   
#>  8 ENSG00000182667_sample2 50863 NTM   
#>  9 ENSG00000182667_sample3 50863 NTM   
#> 10 ENSG00000198959_sample1  7052 TGM2  
#> 11 ENSG00000198959_sample2  7052 TGM2  
#> 12 ENSG00000198959_sample3  7052 TGM2
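
A closely related approach, sketched here under the assumption that the data frames can be given meaningful names, is to let dplyr::bind_rows() record the source through its .id argument and build the suffix from that column. The list names sample1 and sample2 and the column name source are illustrative choices, not something taken from the question.

```r
library(dplyr)

# Two small example data frames with identical content.
df1 <- data.frame(
  name = c("ENSG00000135821", "ENSG00000182667"),
  id   = c(2752L, 50863L),
  sym  = c("GLUL", "NTM")
)
df2 <- df1

# bind_rows() stores each list name in the column given by .id,
# so every row remembers which data frame it came from.
merged <- bind_rows(list(sample1 = df1, sample2 = df2), .id = "source") |>
  mutate(name = paste0(name, "_", source)) |>
  select(-source)  # drop the helper column; keep it if a separate key is useful

merged$name
```

Keeping the source column rather than pasting it into name may be more convenient later, since the original gene identifier then stays available for joins.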