I have a data frame which is one column of parts of names. It's formatted so that some rows are first name, some are middle initial and some are last name. Last name is always the row below first name, and middle initial is always above first name. Like this
df <-
tibble(
name = c(
"Q",
"Name: John",
"Smith",
"X",
"J",
"Name: Homer",
"Simpson",
"X",
"X",
"Q",
"Name: John",
"Citizen"
)
)
Which gives us
> df
# A tibble: 12 × 1
name
<chr>
1 Q
2 Name: John
3 Smith
4 X
5 J
6 Name: Homer
7 Simpson
8 X
9 X
10 Q
11 Name: John
12 Citizen
What I want to do is collapse the rows so that if a row starts with "Name:", then it pastes the middle row, row above and row below to get something like:
> df_fixed
# A tibble: 6 × 1
name
<chr>
1 Name: John Q Smith
2 X
3 Name: Homer J Simpson
4 X
5 X
6 Name: John Q Citizen
CodePudding user response:
we could do it this way:
According to your provided rules using lag
and lead
will solve it. The rest is some tweaking:
library(dplyr)
library(stringr)
df %>%
mutate(x = lag(name),
y = lead(name)) %>%
mutate(across(-name, ~ifelse(str_detect(name, "Name"),., ""))) %>%
transmute(name = str_trim(paste(name, x, y))) %>%
filter(str_detect(name, "Name") | str_detect(name, "X"))
name
<chr>
1 Name: John Q Smith
2 X
3 Name: Homer J Simpson
4 X
5 X
6 Name: John Q Citizen