I have a dataframe which looks like this example, just much larger:
Name date var1 var2 var3
Peter 2020-03-30 0.4 0.5 0.2
Ben 2020-10-14 0.6 0.4 0.1
Mary 2020-12-06 0.7 0.2 0.9
I want to create a new dataframe for each variable (i.e., var1, var2, var3), which should look like this, e.g., for var1:
date Peter Ben Mary
2020-03-30 0.4 NA NA
2020-10-14 NA 0.6 NA
2020-12-06 NA NA 0.7
I can do it with spread
for one variable at a time:
df_new <-tidyr::spread(df[,-c(2:3)], name, var1)
But I could not figure out how to loop it over all columns as I am new to R.
Thank you!
CodePudding user response:
We can do it this way:
The beginning is similar to user438383 solution.
But then we name each tibble in the list and save them to the global environment within the the pipe. For this we need massign
from collapse
package: thanks to @akrun How to save each named tibble in a list, as a separate tibble or dataframe in one run
library(tidyverse)
library(collapse)
df %>%
pivot_longer(cols = contains("var")) %>%
group_split(name) %>%
setNames(unique(df$Name)) %>%
map(. %>% pivot_wider(names_from = Name)) %>%
map(. %>% select(-name)) %>%
massign(names(.), ., .GlobalEnv)
Ben
Mary
Peter
A tibble: 3 x 4
date Peter Ben Mary
<chr> <dbl> <dbl> <dbl>
1 2020-03-30 0.5 NA NA
2 2020-10-14 NA 0.4 NA
3 2020-12-06 NA NA 0.2
> Mary
# A tibble: 3 x 4
date Peter Ben Mary
<chr> <dbl> <dbl> <dbl>
1 2020-03-30 0.2 NA NA
2 2020-10-14 NA 0.1 NA
3 2020-12-06 NA NA 0.9
> Peter
# A tibble: 3 x 4
date Peter Ben Mary
<chr> <dbl> <dbl> <dbl>
1 2020-03-30 0.4 NA NA
2 2020-10-14 NA 0.6 NA
3 2020-12-06 NA NA 0.7
CodePudding user response:
First we want to create a list of data frames and then pivot each one:
library(tidyverse)
res_list = dat %>%
pivot_longer(cols = contains("var")) %>%
split(., .$name) %>%
map(. %>% pivot_wider(names_from="Name"))
$var1
# A tibble: 3 × 5
date name Peter Ben Mary
<date> <chr> <dbl> <dbl> <dbl>
1 2020-03-30 var1 0.4 NA NA
2 2020-10-14 var1 NA 0.6 NA
3 2020-12-06 var1 NA NA 0.7
$var2
# A tibble: 3 × 5
date name Peter Ben Mary
<date> <chr> <dbl> <dbl> <dbl>
1 2020-03-30 var2 0.5 NA NA
2 2020-10-14 var2 NA 0.4 NA
3 2020-12-06 var2 NA NA 0.2
$var3
# A tibble: 3 × 5
date name Peter Ben Mary
<date> <chr> <dbl> <dbl> <dbl>
1 2020-03-30 var3 0.2 NA NA
2 2020-10-14 var3 NA 0.1 NA
3 2020-12-06 var3 NA NA 0.9
Then you can access them like
res_list["var1"]
# A tibble: 3 × 5
date name Peter Ben Mary
<date> <chr> <dbl> <dbl> <dbl>
1 2020-03-30 var1 0.4 NA NA
2 2020-10-14 var1 NA 0.6 NA
3 2020-12-06 var1 NA NA 0.7