mystring <- "\n\n-Acanthosis nigricans\n-Hyperpigmentation\n-Hyperkeratosis\n-Skin fold regions\n-Neck\n-Groin\n-Axillae\n-Obesity \n-Drug-induced AN\n-Malignant AN"
I would like to extract the terms between \n- and \n and store them as a vector:
> mystring_extracted
[1] "Acanthosis nigricans" "Hyperpigmentation" "Hyperkeratosis" "Skin fold regions"
[5] "Neck" "Groin" "Axillae" "Obesity"
[9] "Drug-induced AN" "Malignant AN"
I tried the following, but it didn't do what I wanted:
> gsub("\n-", "", mystring)
[1] "\nAcanthosis nigricansHyperpigmentationHyperkeratosisSkin fold regionsNeckGroinAxillaeObesity Drug-induced ANMalignant AN"
Answer:
Use strsplit. It returns a list, which here contains a single component that is almost the desired character vector. Use [[1]] to extract that component, then drop the junk first element ("\n", the text before the first "\n-" separator). No packages are needed.
strsplit(mystring, "\n-")[[1]][-1]
giving:
[1] "Acanthosis nigricans" "Hyperpigmentation" "Hyperkeratosis"
[4] "Skin fold regions" "Neck" "Groin"
[7] "Axillae" "Obesity " "Drug-induced AN"
[10] "Malignant AN"
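The printed result keeps the trailing space in "Obesity ", because the input contains "Obesity \n-". A minimal sketch, assuming you want the output to match the desired vector exactly, wraps the same split in trimws():

```r
mystring <- "\n\n-Acanthosis nigricans\n-Hyperpigmentation\n-Hyperkeratosis\n-Skin fold regions\n-Neck\n-Groin\n-Axillae\n-Obesity \n-Drug-induced AN\n-Malignant AN"

# Split on "\n-", take the single list component, drop the leading "\n" piece,
# then strip surrounding whitespace (e.g. the trailing space after "Obesity")
mystring_extracted <- trimws(strsplit(mystring, "\n-")[[1]][-1])
print(mystring_extracted)
```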
A variation first removes the junk at the beginning with trimws(), then splits, and finally calls unlist() to get the character vector. It uses the native pipe |>, which requires R 4.1.0 or later:
mystring |>
  trimws(whitespace = "[\n-]") |>
  strsplit("\n-") |>
  unlist()
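If you would rather extract the terms directly instead of splitting, one alternative (a different technique from the answer above, shown as a sketch) is regmatches() with gregexpr() and a PCRE lookbehind. The lookbehind requires perl = TRUE:

```r
mystring <- "\n\n-Acanthosis nigricans\n-Hyperpigmentation\n-Hyperkeratosis\n-Skin fold regions\n-Neck\n-Groin\n-Axillae\n-Obesity \n-Drug-induced AN\n-Malignant AN"

# (?<=\n-) matches only text preceded by a newline followed by a dash;
# [^\n]+ then takes everything up to the next newline or the end of the string
matches <- regmatches(mystring, gregexpr("(?<=\n-)[^\n]+", mystring, perl = TRUE))[[1]]

# Remove the trailing space captured after "Obesity"
mystring_extracted <- trimws(matches)
print(mystring_extracted)
```

Because the lookbehind needs a newline before the dash, the hyphen inside "Drug-induced AN" is not treated as a separator.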