I understand there have been similar posts, but I'm a little confused on how I can use pivot_longer
to transform my own data from wide to long format. The code below creates a mock dataset that's structured similar to my real data.
library(tidyverse)
## Dummy data.
# ID Variables.
part <- rep(rep(paste0("P", c(1:2)), each = 20, times = 2))
type <- rep(c("pre", "post"), each = 10, times = 4)
sp <- rep(c("slow", "mod"), each = 40)
# Values
var1_site1_L <- rep(c(1, NA), each = 5, times = 8)
var1_site1_R <- rep(c(1, NA), each = 5, times = 8)
var1_site1_ALL <- rep(1, times = 80)
var1_site1_ALL_M <- rep(c(1, rep(NA, times = 9)), times = 8)
var2_site2_L <- rep(c(1, NA), each = 5, times = 8)
var2_site2_R <- rep(c(1, NA), each = 5, times = 8)
var2_site2_ALL <- rep(1, times = 80)
var2_site2_ALL_M <- rep(c(1, rep(NA, times = 9)), times = 8)
dat <- data.frame(part, type, sp, var1_site1_L, var1_site1_R, var1_site1_ALL,
var1_site1_ALL_M, var2_site2_L, var2_site2_R, var2_site2_ALL,
var2_site2_ALL_M)
I want to be able to keep the variables part
, type
and sp
as ID variables, but add the unique column name separators as additional ID variables with the specific value in the final column. For example, I'd like the result to be something similar to (note this is only a very basic example and, of course, there will be many more observations, including those NA values in the value column):
par type sp var site side misc value
p1 pre slow var1 site1 L NA 1
p1 pre slow var1 site1 R NA 1
p1 pre slow var1 site1 ALL NA 1
p1 pre slow var1 site1 ALL M 1
I know this is a pretty unique data structure. I'm particularly stuck on how to deal with the fourth column name separator (M
) in some instances (cases where there is only a single value per ID variables).
I got up to the below code which I know needs a bit of work if I'm to achieve the result I'm after.
long <- dat %>%
pivot_longer(cols = c(1:3),
names_to = c("var", "site", "side", "misc"),
names_sep = "_")
Any help will be greatly appreciated!
CodePudding user response:
I don't think you can get there with pivot_longer, but try this.
library(stringr)
results <- data.frame()
for (x in 4:length(dat)){
names <- names(dat[,c(1:3,x)])
res <- dat %>%
mutate(id = 1:nrow(dat)) %>%
select(id, names) %>%
mutate(var = str_extract(names[4],"var\\d"),
site = str_extract(names[4],"site\\d"),
side = str_extract(names[4],"L|R|ALL"),
misc = str_extract(names[4],"[M]"),
misc = ifelse(is.na(misc), "NA", misc)) %>%
rename("value" = 5) %>%
select(id, part, type, sp, var, site, side, misc, value)
results <- rbind(results, res)
}
head(results %>% arrange(id) %>% select(-id))
part type sp var site side misc value
1 P1 pre slow var1 site1 L NA 1
2 P1 pre slow var1 site1 R NA 1
3 P1 pre slow var1 site1 ALL NA 1
4 P1 pre slow var1 site1 ALL M 1
5 P1 pre slow var2 site2 L NA 1
6 P1 pre slow var2 site2 R NA 1
CodePudding user response:
dat %>%
pivot_longer(starts_with('var')) %>%
separate(name, c('var', 'site', 'side', 'misc'), fill = 'right')
# A tibble: 640 x 8
part type sp var site side misc value
<chr> <chr> <chr> <chr> <chr> <chr> <chr> <dbl>
1 P1 pre slow var1 site1 L NA 1
2 P1 pre slow var1 site1 R NA 1
3 P1 pre slow var1 site1 ALL NA 1
4 P1 pre slow var1 site1 ALL M 1
5 P1 pre slow var2 site2 L NA 1
6 P1 pre slow var2 site2 R NA 1
7 P1 pre slow var2 site2 ALL NA 1
8 P1 pre slow var2 site2 ALL M 1
9 P1 pre slow var1 site1 L NA 1
10 P1 pre slow var1 site1 R NA 1
# ... with 630 more rows