Loop in R to reshape dataframes

Time:01-27

I have multiple dataframes that I need to reshape. They all have the same length. They look like this; for example, the dataframe "ID111":

Year    MS...2  UL...3  JS...4  BF...5  FF...6  RF...7  CL...8  FL...9
1959    NA      NA      NA      NA      NA      NA      NA      NA
1960    NA      NA      NA      NA      NA      NA      NA      NA
1961    NA      NA      NA      NA      NA      NA      NA      NA
1962    NA      NA      NA      NA      NA      NA      NA      NA
1963    NA      NA      NA      NA      NA      NA      NA      NA
1964    NA      123     NA      NA      NA      NA      288     319

I used

colnames(ID111) <- c("Year", "MS", "UL", "JS", "BF", "FF", "RF", "CL", "FL")
ID111$Species <- "111"

ID111 %>%
  pivot_longer(
    cols = "MS":"FL",
    names_to = "Phase", 
    values_to = "DOY"
  )

to have it reshaped like this

Year    Species  Phase  DOY  
1959    111      MS     NA  
1959    111      UL     NA   
1959    111      JS     NA   
1959    111      BF     NA   
1959    111      FF     NA   
1959    111      RF     NA   
1959    111      CL     NA   
1959    111      FL     NA   
1960    111      MS     NA   

This works fine, but now I want to reshape multiple dataframes that way.

I tried a loop, but it doesn't work and I don't know why.

datasets <- list(ID111, ID112, ID123)

for(i in 1:length(datasets)){
  datasets[[i]]
  colnames(datasets) <- c("Year", "MS", "UL", "JS", "BF", "FF", "RF", "CL", "FL")
  datasets$Species <- "111"
  
  datasets %>%
    pivot_longer(
      cols = "MS":"FL",
      names_to = "Phase", 
      values_to = "DOY"
    )
}

Can anyone give me a hint on how to make this loop work?

CodePudding user response:

If the dataframes are stored in a list, we can loop over a named list and apply the reshape code to each element. map_dfr binds the results into a single dataframe, and its .id argument creates the 'Species' column from the names of the list. If needed, change the column position with relocate.

library(purrr)
library(dplyr)
library(tidyr)  # provides pivot_longer

# name the list so that .id can fill 'Species' from the names
names(datasets) <- c(111, 112, 123)
map_dfr(datasets, ~ .x %>%
      setNames(c("Year", "MS", "UL", "JS", "BF", "FF", "RF", "CL", "FL")) %>%
      pivot_longer(cols = "MS":"FL",
                   names_to = "Phase",
                   values_to = "DOY"),
    .id = 'Species') %>%
  relocate(Species, .after = 'Year')
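
As a minimal sketch of the same idea on toy data, the pipeline can also be written with map() followed by list_rbind(), which purrr 1.0.0 and later recommend over the superseded map_dfr. The helper make_toy and all values below are invented for illustration and are not taken from the question; the sketch assumes purrr >= 1.0.0 is installed.

```r
library(purrr)
library(dplyr)
library(tidyr)

phases <- c("MS", "UL", "JS", "BF", "FF", "RF", "CL", "FL")

# Hypothetical helper: builds a small dataframe with the question's layout
# (a Year column followed by eight phase columns); the values are invented.
make_toy <- function(offset) {
  df <- data.frame(Year = 1963:1964)
  for (j in seq_along(phases)) {
    df[[paste0(phases[j], "...", j + 1)]] <- c(NA, 100 + offset + j)
  }
  df
}

# Named list: the names are later used as the Species values
datasets <- list(`111` = make_toy(0), `112` = make_toy(10), `123` = make_toy(20))

long <- datasets %>%
  map(~ .x %>%
        setNames(c("Year", phases)) %>%
        pivot_longer(cols = -Year, names_to = "Phase", values_to = "DOY")) %>%
  list_rbind(names_to = "Species") %>%   # list names become the Species column
  relocate(Species, .after = Year)
```

Here cols = -Year selects every column except Year, which avoids depending on the order of the phase columns.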

In the for loop, there are multiple issues: 1) the column names and the 'Species' column are assigned to the list instead of to the extracted data.frame, 2) pivot_longer is applied directly to the list, and 3) the output is not stored for further analysis.

out <- vector('list', length(datasets))
names(datasets) <- c(111, 112, 123)
for(i in seq_along(datasets))
{
  tmp <- datasets[[i]] # create a temporary data by extracting the element
  colnames(tmp) <- c("Year", "MS", "UL", "JS", "BF", "FF", "RF", "CL", "FL")
  tmp$Species <- names(datasets)[i]

  out[[i]] <- tmp %>%
    pivot_longer(
      cols = "MS":"FL",
      names_to = "Phase",
      values_to = "DOY"
    )
}
bind_rows(out)
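
The body of the loop can also be wrapped in a function and applied to each list element together with its name. The following is only a sketch under assumptions: reshape_one is a hypothetical helper name, and the toy input uses invented values in the same nine-column layout as in the question.

```r
library(dplyr)
library(tidyr)

phases <- c("MS", "UL", "JS", "BF", "FF", "RF", "CL", "FL")

# Hypothetical helper (not from the answer): reshape one dataframe and tag it
reshape_one <- function(df, species) {
  colnames(df) <- c("Year", phases)
  df$Species <- species
  pivot_longer(df, cols = all_of(phases), names_to = "Phase", values_to = "DOY")
}

# Toy input with invented values: a Year column plus eight phase columns
toy <- as.data.frame(matrix(NA_real_, nrow = 2, ncol = 9))
toy[[1]] <- 1963:1964
toy[2, 3] <- 123  # third column becomes "UL" after renaming
datasets <- list(`111` = toy, `112` = toy)

# Map passes each dataframe and its list name to reshape_one
out <- Map(reshape_one, datasets, names(datasets))
result <- bind_rows(out)
```

Because each call works on its own copy of the dataframe, the original list stays unchanged and the reshaped pieces are collected in out before binding.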