How do I make a more concise extraction from a character vector in R?

Time:04-22

We use Google Calendar to reserve several machines, and I am making graphs of their usage. I have a line that extracts the calendar title for plotting, but it seems long and clunky: it uses a regex twice, once with grep to find the index of the calendar name and once with gsub to extract the name itself. I don't want to assume that the calendar name will always be at the same index. The calendar data was downloaded as a .ics file and imported with read_lines() from readr (part of the tidyverse). Is there a more concise way to get the calendar name?

> calendar_raw[1:20]
 [1] "BEGIN:VCALENDAR"                                         
 [2] "PRODID:-//Google Inc//Google Calendar 70.9054//EN"       
 [3] "VERSION:2.0"                                             
 [4] "CALSCALE:GREGORIAN"                                      
 [5] "METHOD:PUBLISH"                                          
 [6] "X-WR-CALNAME:Calendar Name"                  
 [7] "X-WR-TIMEZONE:America/Los_Angeles"                       
 [8] "X-WR-CALDESC:Schedule for the machine"
 [9] "BEGIN:VEVENT"                                            
[10] "DTSTART:20180223T210000Z"                                
[11] "DTEND:20180223T220000Z"                                  
[12] "DTSTAMP:20220421T162943Z"                                
[13] "UID:[email protected]"               
[14] "CREATED:20180222T195641Z"                                
[15] "DESCRIPTION:"                                            
[16] "LAST-MODIFIED:20180222T200100Z"                          
[17] "LOCATION:"                                               
[18] "SEQUENCE:0"                                              
[19] "STATUS:CONFIRMED"                                        
[20] "SUMMARY:Username"   

> gsub("X-WR-CALNAME:(.*$)","\\1", calendar_raw[grep("X-WR-CALNAME:",calendar_raw)])
[1] "Calendar Name"          

CodePudding user response:

You still need to find the element of the character vector that contains X-WR-CALNAME: and then strip that prefix from it, so what you have is fine.

What you can do is

  • Use sub instead of gsub, since there is only a single search-and-replace to perform.
  • Drop the $ from your regex, and do not match the rest of the string after X-WR-CALNAME: only to restore it with the \1 backreference. Text that the pattern does not consume is left untouched by sub, so matching the prefix alone is enough.
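As a quick check of the two points above, here is a minimal sketch comparing the original pattern with the simplified one. The input string is a hypothetical single line copied from the sample output in the question, and the variable names are made up for illustration.

```r
# Hypothetical one-line input, copied from the sample output in the question
line <- "X-WR-CALNAME:Calendar Name"

# Original approach: capture the rest of the line and restore it with \\1
with_capture <- gsub("X-WR-CALNAME:(.*$)", "\\1", line)

# Simplified approach: delete only the prefix; the rest is never matched,
# so it is left in place without needing a backreference
prefix_only <- sub("^X-WR-CALNAME:", "", line)

# Both approaches should give the same string
identical(with_capture, prefix_only)
```

Both calls should return the same string, so the capture group and the $ anchor add nothing here.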

You can thus use

sub("^X-WR-CALNAME:", "", calendar_raw[grep("X-WR-CALNAME:", calendar_raw)])
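If the indexing step itself feels clunky, one possible variant (not part of the answer above, just a sketch) is to pass value = TRUE to grep so that it returns the matching line directly instead of its index. The calendar_raw vector below is a shortened, hypothetical excerpt of the data shown in the question.

```r
# Hypothetical excerpt of the imported .ics lines, taken from the question
calendar_raw <- c(
  "BEGIN:VCALENDAR",
  "VERSION:2.0",
  "X-WR-CALNAME:Calendar Name",
  "X-WR-TIMEZONE:America/Los_Angeles",
  "BEGIN:VEVENT"
)

# value = TRUE makes grep() return the matching strings, not their indices
name_line <- grep("^X-WR-CALNAME:", calendar_raw, value = TRUE)

# Strip the property name, leaving only the calendar title
calendar_name <- sub("^X-WR-CALNAME:", "", name_line)
calendar_name
```

Note that if the file contains no X-WR-CALNAME line, this returns a zero-length character vector, so it may be worth checking length(calendar_name) before plotting.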