I have a data.frame in which I would like to split the IV column into separate rows, one for every piece of text separated by a comma ",", except that commas inside parentheses should not cause a split, e.g. ",text (string, string, string),".
Example of the current data:
df1 <- structure(list(Article.Title = "Random title",
Sample = "Sample information",
IV = "Union voice, HRM practices (participation, teams, incentives, development, recruitment), implict contracts, Crisis impact, dominant individual or family owner, no dominant individual or family owner, market growth, no market growth,",
Moderator = NA_character_, Mediator = NA_character_, DV = "Performance"), row.names = c(NA,
-1L), class = c("tbl_df", "tbl", "data.frame"))
Expected result:
structure(list(Article.Title = c("Random title", "Random title",
"Random title", "Random title", "Random title", "Random title",
"Random title", "Random title"), Sample = c("Sample information",
"Sample information", "Sample information", "Sample information",
"Sample information", "Sample information", "Sample information",
"Sample information"), IV = c("Union voice", "HRM practices (participation, teams, incentives, development, recruitment)",
"implict contracts", "Crisis impact", "dominant individual or family owner",
"no dominant individual or family owner", "market growth", "no market growth"
), Moderator = c("NA", "NA", "NA", "NA", "NA", "NA", "NA", "NA"
), Mediator = c("NA", "NA", "NA", "NA", "NA", "NA", "NA", "NA"
), DV = c("Performance", "Performance", "Performance", "Performance",
"Performance", "Performance", "Performance", "Performance")), class = "data.frame", row.names = c(NA,
-8L))
Answer:
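We could do this in base R with strsplit: split the 'IV' column at each comma (plus any whitespace after it), using the PCRE verbs (*SKIP)(*FAIL) to skip over the text inside the parentheses, and then replicate the rows of the data with rep according to the lengths of the list created by strsplit. Because strsplit drops the empty piece after a trailing comma, this gives eight rows.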
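# split IV at commas, skipping any "(..." group so commas inside parentheses are kept
lst1 <- strsplit(df1$IV, "\\([^)]+(*SKIP)(*FAIL)|,\\s*", perl = TRUE)
# repeat each row once per piece, add the pieces back as IV, restore the column order
df2 <- transform(df1[setdiff(names(df1), "IV")][rep(seq_len(nrow(df1)),
        lengths(lst1)), ], IV = unlist(lst1))[names(df1)]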
-output
> df2
Article.Title Sample IV Moderator Mediator DV
1 Random title Sample information Union voice <NA> <NA> Performance
2 Random title Sample information HRM practices (participation, teams, incentives, development, recruitment) <NA> <NA> Performance
3 Random title Sample information implict contracts <NA> <NA> Performance
4 Random title Sample information Crisis impact <NA> <NA> Performance
5 Random title Sample information dominant individual or family owner <NA> <NA> Performance
6 Random title Sample information no dominant individual or family owner <NA> <NA> Performance
7 Random title Sample information market growth <NA> <NA> Performance
8 Random title Sample information no market growth <NA> <NA> Performance
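To show why the pattern works, here is a minimal sketch that wraps the same two steps in a reusable function and runs it on a small two-row data frame (the function name split_outside_parens and the toy data are hypothetical, made up for illustration). The first alternative, \\([^)]+, matches an opening parenthesis and everything up to the next ")"; (*SKIP)(*FAIL) then makes PCRE discard that match and resume searching after it, so only commas outside parentheses act as separators. It assumes parentheses are not nested.

```r
# Hypothetical helper: split a character column at commas that are not inside
# parentheses (non-nested only), repeating the other columns once per piece.
split_outside_parens <- function(df, col) {
  # "\\([^)]+" swallows "(...", then (*SKIP)(*FAIL) discards it and resumes after it;
  # ",\\s*" (a comma plus any following whitespace) is the actual separator
  pieces <- strsplit(df[[col]], "\\([^)]+(*SKIP)(*FAIL)|,\\s*", perl = TRUE)
  # repeat row i once for each piece produced from row i
  out <- df[rep(seq_len(nrow(df)), lengths(pieces)), , drop = FALSE]
  out[[col]] <- unlist(pieces)
  rownames(out) <- NULL
  out
}

# hypothetical toy data: the trailing comma in row 1 yields no empty piece
toy <- data.frame(
  id = c(1, 2),
  IV = c("a, b (x, y), c,", "d (p, q)"),
  stringsAsFactors = FALSE
)
res <- split_outside_parens(toy, "IV")
print(res)
```

Applied to df1 as split_outside_parens(df1, "IV"), it should give the same eight IV values as df2.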
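Or use the same regex in separate_rows (as suggested in the comments). Unlike strsplit, it keeps the empty piece after the trailing comma, so the result has a ninth row with an empty IV: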
library(tidyr)
separate_rows(df1, IV, sep = "\\([^)]+(*SKIP)(*FAIL)|,\\s*")
-output
# A tibble: 9 × 6
Article.Title Sample IV Moderator Mediator DV
<chr> <chr> <chr> <chr> <chr> <chr>
1 Random title Sample information "Union voice" <NA> <NA> Performance
2 Random title Sample information "HRM practices (participation, teams, incentives, development, recruitment)" <NA> <NA> Performance
3 Random title Sample information "implict contracts" <NA> <NA> Performance
4 Random title Sample information "Crisis impact" <NA> <NA> Performance
5 Random title Sample information "dominant individual or family owner" <NA> <NA> Performance
6 Random title Sample information "no dominant individual or family owner" <NA> <NA> Performance
7 Random title Sample information "market growth" <NA> <NA> Performance
8 Random title Sample information "no market growth" <NA> <NA> Performance
9 Random title Sample information "" <NA> <NA> Performance
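If the empty ninth row is unwanted, one option (a sketch, not part of the original answer) is to strip the trailing comma before calling separate_rows; filtering out rows where IV == "" afterwards would work as well. The toy data frame below is hypothetical.

```r
library(tidyr)

# hypothetical toy data with a trailing comma, like the IV column in df1
toy <- data.frame(id = 1, IV = "a, b (x, y), c,", stringsAsFactors = FALSE)

# drop a trailing comma (plus any whitespace after it) so that the split
# does not produce an empty last piece
toy$IV <- sub(",\\s*$", "", toy$IV)

# same regex as above: commas inside "(...)" are skipped
res <- separate_rows(toy, IV, sep = "\\([^)]+(*SKIP)(*FAIL)|,\\s*")
res$IV
```

Applied to df1, the same two steps should give eight rows instead of nine, matching the strsplit version.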