How to move the first instance of a string in a column so it's before that string's group?-CodePudding

I have a dataframe with a single column like the below:

df <- data.frame(
  Level = c(
    'Midwest',
    'Wisconsin: Good',
    'Wisconsin: Neutral',
    'Wisconsin: Bad',
    NA,
    'Minnesota: Good',
    'Minnesota: Neutral',
    'Minnesota: Bad',
    NA,
    'New England',
    'New Hampshire: Good',
    'New Hampshire: Neutral',
    'New Hampshire: Bad',
    NA,
    'Vermont: Good',
    'Vermont: Neutral',
    'Vermont: Bad'
  )
)

I'm interested in doing two things to this column:

Grab the first instance of the Good level for each state and place it in a new row about the Good level, and...
Remove the state names from each level so it just shows Good, Neutral, and Bad below each state name.

The result would look like this:

df_desired <- data.frame(
  Level = c(
    'Midwest',
    'Wisconsin',
    'Good',
    'Neutral',
    'Bad',
    NA,
    'Minnesota',
    'Good',
    'Neutral',
    'Bad',
    NA,
    'New England',
    'New Hampshire',
    'Good',
    'Neutral',
    'Bad',
    NA,
    'Vermont',
    'Good',
    'Neutral',
    'Bad'
  )
)

What's the best way to go about achieving this using R, ideally using dplyr if possible?

CodePudding user response：

Here is one way in tidyverse - Extract the prefix part before the : in 'Level' with str_extract, then split the 'Level' column at the : with separate_rows, remove the duplicated elements and remove the prefix column 'new'

library(dplyr)
library(stringr)
library(tidyr)
out <- df %>%
     mutate(new = str_extract(Level, "\\w (?=:)")) %>% 
     separate_rows(Level, sep = ":\\s*") %>% 
     filter(!duplicated(cur_data()) | is.na(Level)) %>%
     select(-new)

-checking with OP's desired

> all.equal(out, df_desired, check.attributes = FALSE)
[1] TRUE