I have a dataframe with multiple columns like this:
df <- data.frame(
Level = c(
'Midwest',
'Wisconsin: Good',
'Wisconsin: Neutral',
'Wisconsin: Bad',
NA,
'Minnesota: Good',
'Minnesota: Neutral',
'Minnesota: Bad',
NA,
'New England',
'New Hampshire: Good',
'New Hampshire: Neutral',
'New Hampshire: Bad',
NA,
'Vermont: Good',
'Vermont: Neutral',
'Vermont: Bad'
),
TotalPct = c(
NA,
'75%',
'71%',
'65%',
NA,
'82%',
'76%',
'53%',
NA,
NA,
'68%',
'98%',
'45%',
NA,
'79%',
'93%',
'48%'
)
)
I'm interested in doing two things to this dataframe:
- Grab the first instance of the
Good
level for each state and place it in a new row about theGood
level, and... - Remove the state names from each level so it just shows
Good
,Neutral
, andBad
below each state name.
The result would look like this:
df_desired <- data.frame(
Level = c(
'Midwest',
'Wisconsin',
'Good',
'Neutral',
'Bad',
NA,
'Minnesota',
'Good',
'Neutral',
'Bad',
NA,
'New England',
'New Hampshire',
'Good',
'Neutral',
'Bad',
NA,
'Vermont',
'Good',
'Neutral',
'Bad'
),
TotalPct = c(
NA,
NA,
'75%',
'71%',
'65%',
NA,
NA,
'82%',
'76%',
'53%',
NA,
NA,
NA,
'68%',
'98%',
'45%',
NA,
NA,
'79%',
'93%',
'48%'
)
)
What's the best way to go about achieving this using R, ideally using dplyr
if possible?
CodePudding user response:
I'm a bit unclear as to what you want, but this is usually very useful format:
df %>%
separate(Level, c("State", "Level"), sep = ": ") %>%
na.omit() %>%
print(n = Inf)
# A tibble: 12 x 3
State Level TotalPct
<chr> <chr> <chr>
1 Wisconsin Good 75%
2 Wisconsin Neutral 71%
3 Wisconsin Bad 65%
4 Minnesota Good 82%
5 Minnesota Neutral 76%
6 Minnesota Bad 53%
7 New Hampshire Good 68%
8 New Hampshire Neutral 98%
9 New Hampshire Bad 45%
10 Vermont Good 79%
11 Vermont Neutral 93%
12 Vermont Bad 48%
CodePudding user response:
This feels clunky but I think it gets there:
library(tidyverse)
df %>%
separate(Level, c("region", "rating"), sep = ": ") %>%
mutate(copies = if_else(rating != "Good" | is.na(rating), 1, 2)) %>%
uncount(copies, .id = "id") %>%
mutate(TotalPct = if_else(rating == "Good" & id == 1, NA_character_, TotalPct),
level = if_else(is.na(TotalPct), region, rating)) %>%
select(level, TotalPct)
Result
level TotalPct
1 Midwest <NA>
2 Wisconsin <NA>
3 Good 75%
4 Neutral 71%
5 Bad 65%
6 <NA> <NA>
7 Minnesota <NA>
8 Good 82%
9 Neutral 76%
10 Bad 53%
11 <NA> <NA>
12 New England <NA>
13 New Hampshire <NA>
14 Good 68%
15 Neutral 98%
16 Bad 45%
17 <NA> <NA>
18 Vermont <NA>
19 Good 79%
20 Neutral 93%
21 Bad 48%