Rearrange data into a wider format in a specific manner-CodePudding

These sample dataframes have data on the two segments of a piecewise regression line. In "df", segment_start is the beginning of the segment, and segment_end is the end of the segment. There is a data point for years 2000-2010. "value" is the slope of the segment. The first segment goes from 2000-2006 and the second from 2007-2010. What I'm trying to do is combine the rows for "obs" 1 into a single row, with all data points, 2000-2010, included, as shown in "df2". Is there a code to automate this process? Thank you for any ideas in advance.

#data I have:
df <- tibble("obs" = 1:1, 
             "segment"=c(1,2), 
             "segment_start"=c(2000,2006), 
             "segment_end"=c(2006, 2010),
             "value"=c(0.5, 1.5))

df

#data I want:
df2 <- tibble("obs"=1,
             "2000"=0.5,
             "2001"=0.5,
             "2002"=0.5,
             "2003"=0.5,
             "2004"=0.5,
             "2005"=0.5,
             "2006"=0.5,
             "2007"=1.5,
             "2008"=1.5,
             "2009"=1.5,
             "2010"=1.5)

df2

CodePudding user response：

Assuming you have non-overlapping regions such as

df <- tibble("obs" = 1:1, 
             "segment"=c(1,2), 
             "segment_start"=c(2000, 2007), 
             "segment_end"=c(2006, 2010),
             "value"=c(0.5, 1.5))

Then you can use rowise with mutate() to create a sequence for each row. Then ou use tidyr::unnest_longer to expand those into rows. Finally you use tidyr::pivot_wider to turn those rows into columns

df %>% 
  rowwise() %>% 
  mutate(year = list(segment_start:segment_end)) %>% 
  unnest_longer(year) %>% 
  pivot_wider(obs, names_from=year, values_from=value)

It's just important to be consistent about the start/ends and whether or not each interval end value is open or closed.

CodePudding user response：

With the original dataframe, the expected output can be done with complete fill pivot_wider.

library(tidyverse)

df %>% 
  complete(segment_start = full_seq(c(min(segment_start), max(segment_end)), 1)) %>% 
  fill(obs, segment, value) %>% 
  pivot_wider(-c(segment, segment_end), names_from = segment_start)

# A tibble: 1 x 12
    obs `2000` `2001` `2002` `2003` `2004` `2005` `2006` `2007` `2008` `2009` `2010`
  <int>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>
1     1    0.5    0.5    0.5    0.5    0.5    0.5    1.5    1.5    1.5    1.5    1.5