Home > Back-end >  How to move the first instance of a string in a column so it's before that string's group
How to move the first instance of a string in a column so it's before that string's group

Time:11-08

I have a dataframe with multiple columns like this:

df <- data.frame(
  Level = c(
    'Midwest',
    'Wisconsin: Good',
    'Wisconsin: Neutral',
    'Wisconsin: Bad',
    NA,
    'Minnesota: Good',
    'Minnesota: Neutral',
    'Minnesota: Bad',
    NA,
    'New England',
    'New Hampshire: Good',
    'New Hampshire: Neutral',
    'New Hampshire: Bad',
    NA,
    'Vermont: Good',
    'Vermont: Neutral',
    'Vermont: Bad'
  ),
  TotalPct = c(
    NA,
    '75%',
    '71%',
    '65%',
    NA,
    '82%',
    '76%',
    '53%',
    NA,
    NA,
    '68%',
    '98%',
    '45%',
    NA,
    '79%',
    '93%',
    '48%'
  )
)

I'm interested in doing two things to this dataframe:

  1. Grab the first instance of the Good level for each state and place it in a new row about the Good level, and...
  2. Remove the state names from each level so it just shows Good, Neutral, and Bad below each state name.

The result would look like this:

df_desired <- data.frame(
  Level = c(
    'Midwest',
    'Wisconsin',
    'Good',
    'Neutral',
    'Bad',
    NA,
    'Minnesota',
    'Good',
    'Neutral',
    'Bad',
    NA,
    'New England',
    'New Hampshire',
    'Good',
    'Neutral',
    'Bad',
    NA,
    'Vermont',
    'Good',
    'Neutral',
    'Bad'
  ),
  TotalPct = c(
    NA,
    NA,
    '75%',
    '71%',
    '65%',
    NA,
    NA,
    '82%',
    '76%',
    '53%',
    NA,
    NA,
    NA,
    '68%',
    '98%',
    '45%',
    NA,
    NA,
    '79%',
    '93%',
    '48%'
  )
)

What's the best way to go about achieving this using R, ideally using dplyr if possible?

CodePudding user response:

I'm a bit unclear as to what you want, but this is usually very useful format:

df %>%
  separate(Level, c("State", "Level"), sep = ": ") %>%
  na.omit() %>% 
  print(n = Inf)
# A tibble: 12 x 3
   State         Level   TotalPct
   <chr>         <chr>   <chr>   
 1 Wisconsin     Good    75%     
 2 Wisconsin     Neutral 71%     
 3 Wisconsin     Bad     65%     
 4 Minnesota     Good    82%     
 5 Minnesota     Neutral 76%     
 6 Minnesota     Bad     53%     
 7 New Hampshire Good    68%     
 8 New Hampshire Neutral 98%     
 9 New Hampshire Bad     45%     
10 Vermont       Good    79%     
11 Vermont       Neutral 93%     
12 Vermont       Bad     48%     

CodePudding user response:

This feels clunky but I think it gets there:

library(tidyverse)
df %>%
  separate(Level, c("region", "rating"), sep = ": ") %>%
  mutate(copies = if_else(rating != "Good" | is.na(rating), 1, 2)) %>%
  uncount(copies, .id = "id") %>%
  mutate(TotalPct = if_else(rating == "Good" & id == 1, NA_character_, TotalPct),
         level = if_else(is.na(TotalPct), region, rating)) %>%
  select(level, TotalPct)

Result

           level TotalPct
1        Midwest     <NA>
2      Wisconsin     <NA>
3           Good      75%
4        Neutral      71%
5            Bad      65%
6           <NA>     <NA>
7      Minnesota     <NA>
8           Good      82%
9        Neutral      76%
10           Bad      53%
11          <NA>     <NA>
12   New England     <NA>
13 New Hampshire     <NA>
14          Good      68%
15       Neutral      98%
16           Bad      45%
17          <NA>     <NA>
18       Vermont     <NA>
19          Good      79%
20       Neutral      93%
21           Bad      48%
  • Related