How to add quotes to every 2nd word in a string in R

Time:02-24

I want to add double quotes around every second word in this single string.

From this

gene_id ENSG00000081237; gene_version 20; transcript_id ENST00000442510; transcript_version 8; 
gene_type protein_coding; gene_name CD45A;

to this

gene_id "ENSG00000081237"; gene_version "20"; transcript_id "ENST00000442510"; transcript_version "8"; 
gene_type "protein_coding"; gene_name "CD45A";

I have been looking through tidyverse and stringr but have not yet found a good way to do this.

Thanks!

CodePudding user response:

Here's a way to split the string apart, add the quotes to every other item, and paste it back together.

x = "gene_id ENSG00000081237; gene_version 20; transcript_id ENST00000442510; transcript_version 8; gene_type protein_coding; gene_name CD45A;"
x = unlist(strsplit(x, " "))
evens = seq(2, length(x), by = 2)
x[evens] = paste0('"', x[evens])
x[evens] = sub(';', '";', x[evens], fixed = TRUE)
x = paste(x, collapse = " ")
cat(x)
# gene_id "ENSG00000081237"; gene_version "20"; transcript_id "ENST00000442510"; transcript_version "8"; gene_type "protein_coding"; gene_name "CD45A";
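
As an alternative sketch (not part of the answer above), the same result can probably be had with a single regular-expression replacement. It assumes that every value is a run of non-space characters directly followed by ;, as in the example string, rather than literally counting every second word.

```r
x <- "gene_id ENSG00000081237; gene_version 20; transcript_id ENST00000442510; transcript_version 8; gene_type protein_coding; gene_name CD45A;"

# Match a space, then a run of characters that are neither space nor ';',
# then the ';' itself, and put double quotes around the captured run.
quoted <- gsub(" ([^ ;]+);", " \"\\1\";", x)

cat(quoted, "\n")
```

If a value could itself contain spaces or semicolons, this pattern would need adjusting.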

CodePudding user response:

Here is an approach that combines stringr with base R.

First remove the trailing ; from the string, then split the gene information on "; " into key-value pairs, then split each pair again on the space " " and save the result to a new list vec_apply.

After that, paste back the unmodified split strings together with the modified strings (the strings that have new double quotes).

Note that when the result is printed in the console, each double quote is shown with a preceding backslash \ to "escape" it. The backslash is only part of the printed representation, not of the string itself, so it does not appear once the vector is written to a text file.

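library(stringr)  # provides str_split_fixed(), str_count(), and the %>% pipe

vec <- c("gene_id ENSG00000081237; gene_version 20; transcript_id ENST00000442510; transcript_version 8; gene_type protein_coding; gene_name CD45A;")

# Drop the trailing ";" so the split does not produce an empty last piece
vec <- gsub(";$", "", vec)

# Split into "key value" pairs, then split each pair on the space
vec_apply <- str_split_fixed(vec, "; ", n = str_count(string = vec, pattern = ";") + 1) %>% 
  strsplit(split = " ")

# Paste each key back together with its double-quoted value
paste(sapply(vec_apply, `[[`, 1), 
      sapply(vec_apply, function(x) paste0(shQuote(x[[2]], type = "cmd"), ";")), collapse = " ")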

Output in console

"gene_id \"ENSG00000081237\"; gene_version \"20\"; transcript_id \"ENST00000442510\"; transcript_version \"8\"; gene_type \"protein_coding\"; gene_name \"CD45A\";"

Output in text file

gene_id "ENSG00000081237"; gene_version "20"; transcript_id "ENST00000442510"; transcript_version "8"; gene_type "protein_coding"; gene_name "CD45A";
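
To check that the backslashes are only part of how print() displays the string, here is a small sketch. The sample string s and the temporary file are made up for illustration and are not from the answer.

```r
s <- 'gene_id "ENSG00000081237";'

print(s)       # console display escapes the double quotes
cat(s, "\n")   # cat() shows the raw characters

# Search for a literal backslash in the stored string
has_backslash <- grepl("\\", s, fixed = TRUE)

# Write the string to a temporary file and read it back
tmp <- tempfile(fileext = ".txt")
writeLines(s, tmp)
round_trip <- readLines(tmp)

has_backslash            # FALSE: no backslash is stored
identical(round_trip, s) # TRUE: the file holds the same characters
```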

Or as suggested by @GregorThomas in another answer, use cat() to view the output to check if double quotes are added successfully.

cat(paste(sapply(vec_apply, `[[`, 1), 
          sapply(vec_apply, function(x) paste0(shQuote(x[[2]], type = "cmd"), ";")), collapse = " "))

gene_id "ENSG00000081237"; gene_version "20"; transcript_id "ENST00000442510"; transcript_version "8"; gene_type "protein_coding"; gene_name "CD45A";