I have a gene expression file whose row names look like this: GTEX.1117F.3226.SM.5N9CT. I want to edit the row names to look like this:
GTEX-1117F and so on.
I used these commands:
row.names(gene_exp_transpose) <- data
gsub(".","-",row.names(gene_exp_transpose)) #this just gives ----- to all the rownames data
row.names(gene_exp) <- substr(data, 0, 5) ## but for the last rows, it has 4 characters instead of 5.
CodePudding user response:
We could do it this way:
- Convert the row names to a column with rownames_to_column from the tibble package.
- Use a regular expression: sub('^([^.]+.[^.]+).*', '\\1', X) removes everything after the second dot.
- Replace the remaining . by -
- Convert the column back to row names with column_to_rownames.
library(tibble)
library(dplyr)
df %>%
  rownames_to_column("X") %>%
  # keep everything up to the second dot
  mutate(X = sub('^([^.]+.[^.]+).*', '\\1', X),
         # replace the remaining dot with a dash
         X = sub('\\.', '-', X)) %>%
  column_to_rownames("X")
output:
ENSG00000223972.5 ENSG00000227232.5 ENSG00000278267.1 ENSG00000243485.5
GTEX-1117F 1.0705061 319.01082 0.0000000 0.0000000
GTEX-111FC 0.0000000 137.62750 0.8192113 1.6384227
GTEX-1128S 0.9312597 98.71353 0.0000000 0.9312597
GTEX-117XS 0.0000000 140.96666 0.0000000 0.7661232
GTEX-1192X 0.9374262 139.67650 0.0000000 0.9374262
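A side note on why the gsub call in the question turned every row name into dashes: in a regular expression an unescaped "." matches any character, so every character gets replaced. The short sketch below, using one row name from the question as a toy input (the variable names are only illustrative), compares that call with two ways of matching a literal period.

```r
# One of the row names from the question, used as a toy input
x <- "GTEX.1117F.3226.SM.5N9CT"

# "." is a regex wildcard here, so every character becomes a dash
all_dashes <- gsub(".", "-", x)

# fixed = TRUE makes gsub treat "." as a literal period
literal <- gsub(".", "-", x, fixed = TRUE)

# Escaping the period in the regex has the same effect
escaped <- gsub("\\.", "-", x)

print(all_dashes)
print(literal)
print(escaped)
```

Note that both literal variants replace every period, which is why the answer first trims the name to its first two fields and only then swaps the one remaining period.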
data:
structure(list(ENSG00000223972.5 = c(1.0705061, 0, 0.9312597,
0, 0.9374262), ENSG00000227232.5 = c(319.01082, 137.6275, 98.71353,
140.96666, 139.6765), ENSG00000278267.1 = c(0, 0.8192113, 0,
0, 0), ENSG00000243485.5 = c(0, 1.6384227, 0.9312597, 0.7661232,
0.9374262)), class = "data.frame", row.names = c("GTEX.1117F.3226.SM.5N9CT",
"GTEX.111FC.3126.SM.5GZZ2", "GTEX.1128S.2726.SM.5H12C", "GTEX.117XS.3026.SM.5N9CA",
"GTEX.1192X.3126.SM.5N9BY"))
CodePudding user response:
A base R solution. Data borrowed from TarJae's answer.
In the first instruction, the regex is almost identical to TarJae's, with two differences:
- The first period to be matched is escaped;
- the end of string is made explicit.
Then the only remaining period is replaced by a dash, "-".
row.names(df) <- sub('^([^.]+\\.[^.]+).*$', '\\1', row.names(df))
row.names(df) <- sub('\\.', '-', row.names(df))
row.names(df)
#> [1] "GTEX-1117F" "GTEX-111FC" "GTEX-1128S" "GTEX-117XS" "GTEX-1192X"
Created on 2022-07-02 by the reprex package (v2.0.1)
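If a single call is preferred, the two sub() steps can probably be merged by capturing the first two fields separately and joining them with a dash in the replacement. This is only a sketch of that variant, assuming every row name has at least two dot-separated fields; the vector ids is a hypothetical stand-in for row.names(df).

```r
# Two row names from the example data, standing in for row.names(df)
ids <- c("GTEX.1117F.3226.SM.5N9CT", "GTEX.111FC.3126.SM.5GZZ2")

# Capture the first and second dot-separated fields, drop the rest,
# and join the two captured fields with a dash
short_ids <- sub("^([^.]+)\\.([^.]+).*$", "\\1-\\2", ids)

print(short_ids)
```

Names that do not match the pattern (for example, ones without any period) are returned unchanged by sub(), so it may be worth checking the result for leftovers.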