Home > Mobile >  How to loop over the columns in a dataframe, apply spread, and create a new dataframe in R?
How to loop over the columns in a dataframe, apply spread, and create a new dataframe in R?

Time:08-22

I have a dataframe which looks like this example, just much larger:

Name  date         var1  var2  var3 
Peter 2020-03-30   0.4   0.5   0.2
Ben   2020-10-14   0.6   0.4   0.1
Mary  2020-12-06   0.7   0.2   0.9

I want to create a new dataframe for each variable (i.e., var1, var2, var3), which should look like this, e.g., for var1:

date         Peter    Ben    Mary
2020-03-30   0.4      NA     NA
2020-10-14   NA       0.6    NA
2020-12-06   NA       NA     0.7

I can do it with spread for one variable at a time:

df_new <-tidyr::spread(df[,-c(2:3)], name, var1)

But I could not figure out how to loop it over all columns as I am new to R.

Thank you!

CodePudding user response:

We can do it this way: The beginning is similar to user438383 solution. But then we name each tibble in the list and save them to the global environment within the the pipe. For this we need massign from collapse package: thanks to @akrun How to save each named tibble in a list, as a separate tibble or dataframe in one run


library(tidyverse)
library(collapse)
df %>% 
  pivot_longer(cols = contains("var")) %>%
  group_split(name) %>%
  setNames(unique(df$Name)) %>%
  map(. %>%  pivot_wider(names_from = Name)) %>%
  map(. %>%  select(-name)) %>% 
  massign(names(.), ., .GlobalEnv)

Ben
Mary
Peter
 A tibble: 3 x 4
  date       Peter   Ben  Mary
  <chr>      <dbl> <dbl> <dbl>
1 2020-03-30   0.5  NA    NA  
2 2020-10-14  NA     0.4  NA  
3 2020-12-06  NA    NA     0.2
> Mary
# A tibble: 3 x 4
  date       Peter   Ben  Mary
  <chr>      <dbl> <dbl> <dbl>
1 2020-03-30   0.2  NA    NA  
2 2020-10-14  NA     0.1  NA  
3 2020-12-06  NA    NA     0.9
> Peter
# A tibble: 3 x 4
  date       Peter   Ben  Mary
  <chr>      <dbl> <dbl> <dbl>
1 2020-03-30   0.4  NA    NA  
2 2020-10-14  NA     0.6  NA  
3 2020-12-06  NA    NA     0.7

CodePudding user response:

First we want to create a list of data frames and then pivot each one:

library(tidyverse)
res_list = dat %>% 
   pivot_longer(cols = contains("var")) %>% 
   split(., .$name) %>% 
   map(. %>% pivot_wider(names_from="Name"))
$var1
# A tibble: 3 × 5
  date       name  Peter   Ben  Mary
  <date>     <chr> <dbl> <dbl> <dbl>
1 2020-03-30 var1    0.4  NA    NA  
2 2020-10-14 var1   NA     0.6  NA  
3 2020-12-06 var1   NA    NA     0.7

$var2
# A tibble: 3 × 5
  date       name  Peter   Ben  Mary
  <date>     <chr> <dbl> <dbl> <dbl>
1 2020-03-30 var2    0.5  NA    NA  
2 2020-10-14 var2   NA     0.4  NA  
3 2020-12-06 var2   NA    NA     0.2

$var3
# A tibble: 3 × 5
  date       name  Peter   Ben  Mary
  <date>     <chr> <dbl> <dbl> <dbl>
1 2020-03-30 var3    0.2  NA    NA  
2 2020-10-14 var3   NA     0.1  NA  
3 2020-12-06 var3   NA    NA     0.9

Then you can access them like

res_list["var1"]

# A tibble: 3 × 5
  date       name  Peter   Ben  Mary
  <date>     <chr> <dbl> <dbl> <dbl>
1 2020-03-30 var1    0.4  NA    NA  
2 2020-10-14 var1   NA     0.6  NA  
3 2020-12-06 var1   NA    NA     0.7
  • Related