I have a paragraph such as the one below:
y = "I have been working with ABC CORPORATION nearly about 4 years now. And today remark my last day working at this company. I am proudly announce that I will joining XYZ SDN BHD starting this Monday."
I need to extract only the company names from this paragraph, so that the output looks like this:
"ABC CORPORATION", "XYZ SDN BHD"
Is there a way to do this in R? I'm not yet familiar with text analysis in R.
Is it better to split the string with dplyr, or to use grep?
CodePudding user response:
Using stringr's str_extract_all:
y <- "I have been working with ABC CORPORATION nearly about 4 years now. And today remark my last day working at this company. I am proudly announce that I will joining XYZ SDN BHD starting this Monday."
# Extract every run of two or more uppercase letters and/or whitespace characters
uppercase_words <- unlist(stringr::str_extract_all(y, pattern = '([:upper:]|[:space:]){2,}'))
# Drop runs that contain only a single letter, such as " I " or " A"
uppercase_words <- uppercase_words[nchar(gsub('[[:blank:]]', '', uppercase_words)) != 1]
uppercase_words
Output:
[1] " ABC CORPORATION " " XYZ SDN BHD "
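The matches above keep the spaces on either side of each name. A possible variation, assuming the company names are written entirely in capitals, trims each match before filtering out the single-letter leftovers. The variable names `matches` and `companies` are illustrative only:

```r
library(stringr)

y <- "I have been working with ABC CORPORATION nearly about 4 years now. And today remark my last day working at this company. I am proudly announce that I will joining XYZ SDN BHD starting this Monday."

# Runs of two or more uppercase letters and/or whitespace characters
matches <- unlist(str_extract_all(y, "[[:upper:][:space:]]{2,}"))

# Remove the leading and trailing whitespace picked up by the pattern
matches <- str_trim(matches)

# Keep only matches longer than one character (drops "I", "A", "M")
companies <- matches[nchar(matches) > 1]
companies
# [1] "ABC CORPORATION" "XYZ SDN BHD"
```

This sketch would also pick up any other all-caps run, such as an acronym, so it is only suitable when capitals reliably mark company names.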
CodePudding user response:
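We could use str_extract_all with a regex pattern that matches a capitalized word followed by one or more further capitalized words:
library(stringr)
# "\\s+" is the whitespace between words; the final "+" requires at least two words
str_extract_all(y, "[A-Z][\\w-]*(\\s+[A-Z][\\w-]*)+")
output:
[[1]]
[1] "ABC CORPORATION" "XYZ SDN BHD"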
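On the grep part of the question: grep() only reports which elements of a vector contain a match, so on its own it cannot pull the names out of the string. A minimal base R sketch, assuming a company name is always a run of two or more capitalized words, pairs gregexpr() with regmatches() instead. The names `pattern` and `companies` are illustrative:

```r
y <- "I have been working with ABC CORPORATION nearly about 4 years now. And today remark my last day working at this company. I am proudly announce that I will joining XYZ SDN BHD starting this Monday."

# A capitalized word followed by at least one more capitalized word
pattern <- "[A-Z][\\w-]*(?:\\s+[A-Z][\\w-]*)+"

# gregexpr() locates every match; regmatches() extracts the matched text.
# perl = TRUE enables the \\w and \\s shorthand classes inside brackets.
companies <- regmatches(y, gregexpr(pattern, y, perl = TRUE))[[1]]
companies
# [1] "ABC CORPORATION" "XYZ SDN BHD"
```

Single capitalized words such as "I", "And" and "Monday" are skipped because the pattern requires at least two capitalized words in a row.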