I have a dataframe with a single column like the below:
df <- data.frame(
Level = c(
'Midwest',
'Wisconsin: Good',
'Wisconsin: Neutral',
'Wisconsin: Bad',
NA,
'Minnesota: Good',
'Minnesota: Neutral',
'Minnesota: Bad',
NA,
'New England',
'New Hampshire: Good',
'New Hampshire: Neutral',
'New Hampshire: Bad',
NA,
'Vermont: Good',
'Vermont: Neutral',
'Vermont: Bad'
)
)
I'm interested in doing two things to this column:
- Grab the first instance of the
Good
level for each state and place it in a new row about theGood
level, and... - Remove the state names from each level so it just shows
Good
,Neutral
, andBad
below each state name.
The result would look like this:
df_desired <- data.frame(
Level = c(
'Midwest',
'Wisconsin',
'Good',
'Neutral',
'Bad',
NA,
'Minnesota',
'Good',
'Neutral',
'Bad',
NA,
'New England',
'New Hampshire',
'Good',
'Neutral',
'Bad',
NA,
'Vermont',
'Good',
'Neutral',
'Bad'
)
)
What's the best way to go about achieving this using R, ideally using dplyr
if possible?
CodePudding user response:
Here is one way in tidyverse
- Extract the prefix part before the :
in 'Level' with str_extract
, then split the 'Level' column at the :
with separate_rows
, remove the duplicated
elements and remove the prefix column 'new'
library(dplyr)
library(stringr)
library(tidyr)
out <- df %>%
mutate(new = str_extract(Level, "\\w (?=:)")) %>%
separate_rows(Level, sep = ":\\s*") %>%
filter(!duplicated(cur_data()) | is.na(Level)) %>%
select(-new)
-checking with OP's desired
> all.equal(out, df_desired, check.attributes = FALSE)
[1] TRUE