I want to extract titles (Mr, Mrs, Miss) from within the Name column and put those extracted titles into a new column, Title. Relevant data looks like this:
snippet <- data_frame(Name = c('Braund, Mr. Owen Harris', 'Cumings, Mrs. John Bradley', 'Heikkinen, Miss. Laina'), Column = c('blah', 'blah', 'blah'))
I've reviewed this answer, but I must be missing something.
Here's the best code I could come up with:
snippet <- mutate(snippet, Title = str_extract(snippet$Name, "(?<=,)[^,]*(?=.)"))
This does add the Title column, but all values within that column are NA. Where's my error? Thanks.
CodePudding user response:
Maybe this helps. In the 'Name' column there is a space after the comma, so we use regex lookarounds to match one or more non-whitespace characters (\\S+) that come after the comma and space ((?<=, )) and come before the . ((?=\\.)). The . is a regex metacharacter, so we escape it; otherwise it matches any character.
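To see why the escape matters, here is a minimal sketch in base R rather than stringr. It is an illustration, not part of the original answer: the helper name extract_first is hypothetical, and perl = TRUE is assumed so that the lookbehind is supported.

```r
# Hypothetical helper: return the first match of `pattern` in `x`.
# perl = TRUE enables lookbehind/lookahead in base R regular expressions.
extract_first <- function(pattern, x) {
  regmatches(x, regexpr(pattern, x, perl = TRUE))
}

name <- "Braund, Mr. Owen Harris"

# Escaped dot: the lookahead requires a literal "." right after the match.
escaped <- extract_first("(?<=, )\\S+(?=\\.)", name)

# Unescaped dot: the lookahead accepts any character, so the greedy \\S+
# also consumes the "." and the following space satisfies the lookahead.
unescaped <- extract_first("(?<=, )\\S+(?=.)", name)

print(escaped)    # "Mr"
print(unescaped)  # "Mr."
```

With the unescaped dot the trailing period ends up inside the extracted title, which is why the answer escapes it.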
library(dplyr)
library(stringr)
snippet <- snippet %>%
  mutate(Title = str_extract(Name, "(?<=, )\\S+(?=\\.)"))
-output
snippet
# A tibble: 3 × 3
  Name                       Column Title
  <chr>                      <chr>  <chr>
1 Braund, Mr. Owen Harris    blah   Mr
2 Cumings, Mrs. John Bradley blah   Mrs
3 Heikkinen, Miss. Laina     blah   Miss
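If stringr is not available, the same lookaround pattern can be tried in base R. This is a rough sketch under a few assumptions, not the answer's own method: perl = TRUE is needed for the lookbehind, the vector people is made-up test data, and its fourth entry (a name without a title) is added only to show that non-matching rows can be kept as NA, similar to what str_extract returns.

```r
# Made-up test data; the last entry has no title and should yield NA.
people <- c("Braund, Mr. Owen Harris",
            "Cumings, Mrs. John Bradley",
            "Heikkinen, Miss. Laina",
            "Smith, John")

# regexpr() returns the start position of the first match, or -1 if none.
m <- regexpr("(?<=, )\\S+(?=\\.)", people, perl = TRUE)

# regmatches() drops non-matching elements, so fill a vector of NAs
# and assign the matches only at the positions that matched.
titles <- rep(NA_character_, length(people))
titles[m != -1L] <- regmatches(people, m)

print(titles)  # "Mr" "Mrs" "Miss" NA
```

The result can then be attached as a column, for example with snippet$Title <- titles when the vector comes from snippet$Name.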
data
snippet <- structure(list(Name = c("Braund, Mr. Owen Harris",
"Cumings, Mrs. John Bradley",
"Heikkinen, Miss. Laina"), Column = c("blah", "blah", "blah")),
class = c("tbl_df",
"tbl", "data.frame"), row.names = c(NA, -3L))