Inserting quotes depending on the length of character strings in an R data frame

Time:04-07

I have a data frame and I would like to add double quotes to the values that are longer than 6 characters (i.e. longer than a single ID).

Name   ID
John   a105BD;f648FE
Alice   t487EF
Bob   l984MQ;x204ER;p674WS
Tom   y549JJ
Clem   h852KF;o195TV
...   ...

I would like to get this:

Name   ID
John   "a105BD;f648FE"
Alice   t487EF
Bob   "l984MQ;x204ER;p674WS"
Tom   y549JJ
Clem   "h852KF;o195TV"
...   ...

So I've tried

for (nchar(as.character(annot3$GO))>6) {
  annot3$GO <- dQuote(annot3$GO) 
}

But I get the following error message:

Error: unexpected '}' in "}"

If you have an explanation or a solution, I would be grateful.

CodePudding user response:

Not sure why you would need manual selective quoting, but we could either write the output with quotes around every value, or add quotes manually using paste0:

#example data
x <- read.table(text = "
                   Name   ID
   John   a105BD;f648FE
  Alice   t487EF
    Bob   l984MQ;x204ER;p674WS
    Tom   y549JJ
   Clem   h852KF;o195TV", header = TRUE)

# output with quotes
write.table(x, "tmp.txt", quote = TRUE, row.names = FALSE)
# tmp.txt
# "Name" "ID"
# "John" "a105BD;f648FE"
# "Alice" "t487EF"
# "Bob" "l984MQ;x204ER;p674WS"
# "Tom" "y549JJ"
# "Clem" "h852KF;o195TV"

# or add quotes manually using paste0, only where the value is longer than 6 characters
x$IDquoted <- ifelse(nchar(x$ID) > 6, paste0('"', x$ID, '"') , x$ID)
#    Name                   ID               IDquoted
# 1  John        a105BD;f648FE        "a105BD;f648FE"
# 2 Alice               t487EF                 t487EF
# 3   Bob l984MQ;x204ER;p674WS "l984MQ;x204ER;p674WS"
# 4   Tom               y549JJ                 y549JJ
# 5  Clem        h852KF;o195TV        "h852KF;o195TV"
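If the selectively quoted values are meant to end up in a file, one thing to watch is that write.table() with quote = TRUE would wrap the already quoted strings a second time. A minimal sketch, assuming a tab-separated output and a temporary file (both are choices made for this illustration, not part of the question):

```r
# small example data, same shape as in the question
x <- data.frame(
  Name = c("John", "Alice", "Bob"),
  ID   = c("a105BD;f648FE", "t487EF", "l984MQ;x204ER;p674WS"),
  stringsAsFactors = FALSE
)

# quote only the values longer than one ID (6 characters)
x$ID <- ifelse(nchar(x$ID) > 6, paste0('"', x$ID, '"'), x$ID)

# quote = FALSE so write.table() does not add its own quotes on top
out <- tempfile(fileext = ".txt")  # hypothetical output location
write.table(x, out, quote = FALSE, row.names = FALSE, sep = "\t")

# inspect the file as plain text
written <- readLines(out)
print(written)
```

With quote = FALSE the file contains exactly the quotes added by paste0() and nothing else.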

CodePudding user response:

Here is a dplyr approach. You can use ifelse() to identify the values with more than 6 characters (via nchar()), then wrap those values in quotes with paste0().

library(dplyr)

df %>% mutate(ID = ifelse(nchar(ID) > 6, paste0('"', ID, '"'), ID))

If you don't want to load an external package, you can assign the output of the same ifelse() directly to the ID column.

df$ID <- ifelse(nchar(df$ID) > 6, paste0('"', df$ID, '"'), df$ID)

Both give the same output (note that the dplyr method WILL NOT overwrite the original df unless you assign the result back to df, whereas the base R alternative modifies df directly):

   Name                     ID
1  John        "a105BD;f648FE"
2 Alice                 t487EF
3   Bob "l984MQ;x204ER;p674WS"
4   Tom                 y549JJ
5  Clem        "h852KF;o195TV"
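
As for the error in the question: R's for loop has the form for (variable in sequence), so a bare condition inside the parentheses cannot be parsed, and the parser then also rejects the closing brace. The condition can instead be used as a logical index, which is a rough equivalent of what the loop was trying to do. A minimal sketch, assuming the column is called ID as in the example data (the question's own code uses annot3$GO), and using paste0() rather than dQuote() because dQuote() may produce typographic quotes depending on the useFancyQuotes option:

```r
# example data, same shape as in the question
df <- data.frame(
  Name = c("John", "Alice", "Bob"),
  ID   = c("a105BD;f648FE", "t487EF", "l984MQ;x204ER;p674WS"),
  stringsAsFactors = FALSE
)

# TRUE for every value longer than a single ID (6 characters);
# as.character() guards against the column being a factor
long <- nchar(as.character(df$ID)) > 6

# modify only the selected rows, leave the others untouched
df$ID[long] <- paste0('"', df$ID[long], '"')

print(df)
```

No loop is needed because nchar() and paste0() are both vectorised over the whole column.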