How to stack a data frame on itself and change a column value per copy in R

I wasn't sure how to word this question.

I have a data frame called p08 that shows how each state voted in the 2008 election. The indicator variable is named 'DemStatus', where 1 == voted Democrat and 0 == voted Republican.

I want to label each state as Republican or Democrat for all four years of each election cycle. For instance, Alabama voted Republican in 2008, so I want to label it 0 (Republican) for 2008, 2009, 2010, and 2011.

I accomplished this by copying the data frame once per year and giving each copy a new name. However, this is very tedious, since I have election data from the 1990s through 2020.

QUESTION: Is there a faster/simpler way to accomplish exactly what I have below?

p08=structure(list(STATE = c("Alabama", "Alaska", "Arizona", "Arkansas", 
"California", "Colorado", "Connecticut", "Delaware", "Dist. of Col.", 
"Florida", "Georgia", "Hawaii", "Idaho", "Illinois", "Indiana", 
"Iowa", "Kansas", "Kentucky", "Louisiana", "Maine", "Maryland", 
"Massachusetts", "Michigan", "Minnesota", "Mississippi", "Missouri", 
"Montana", "Nebraska", "Nevada", "New Hampshire", "New Jersey", 
"New Mexico", "New York", "North Carolina", "North Dakota", "Ohio", 
"Oklahoma", "Oregon", "Pennsylvania", "Rhode Island", "South Carolina", 
"South Dakota", "Tennessee", "Texas", "Utah", "Vermont", "Virginia", 
"Washington", "West Virginia", "Wisconsin", "Wyoming"), YEAR = c(2008, 
2008, 2008, 2008, 2008, 2008, 2008, 2008, 2008, 2008, 2008, 2008, 
2008, 2008, 2008, 2008, 2008, 2008, 2008, 2008, 2008, 2008, 2008, 
2008, 2008, 2008, 2008, 2008, 2008, 2008, 2008, 2008, 2008, 2008, 
2008, 2008, 2008, 2008, 2008, 2008, 2008, 2008, 2008, 2008, 2008, 
2008, 2008, 2008, 2008, 2008, 2008), DemStatus = c(0, 0, 0, 0, 
1, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1, 0, 
0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1, 1, 0, 0, 0, 0, 0, 1, 
1, 1, 0, 1, 0)), row.names = 10:60, class = "data.frame")

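library(dplyr)  # provides bind_rows()

# Copy the 2008 results once per following year and overwrite YEAR
party09=p08
party09$YEAR=2009

party10=p08
party10$YEAR=2010

party11=p08
party11$YEAR=2011

# Stack the four yearly copies into one data frame
party08_11 = bind_rows(p08,party09,party10,party11)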

CodePudding user response：

Another way is to create a sequence of years for each row. This works for any dataset, regardless of which or how many election years it contains, provided that each election year is always followed by exactly 3 years to fill in.

library(data.table)

p08 <- setDT(p08)[, .(STATE, YEAR = seq(YEAR, YEAR + 3L), DemStatus), by = 1:nrow(p08)][, nrow := NULL]

Output (showing first and last 5 rows):

         STATE YEAR DemStatus
  1:   Alabama 2008         0
  2:   Alabama 2009         0
  3:   Alabama 2010         0
  4:   Alabama 2011         0
  5:    Alaska 2008         0
 ---                         
200: Wisconsin 2011         1
201:   Wyoming 2008         0
202:   Wyoming 2009         0
203:   Wyoming 2010         0
204:   Wyoming 2011         0
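
If you would rather avoid extra packages, the same per-row expansion can be sketched in base R by repeating row indices. This is only a minimal sketch: the data frame `elections` and its values are made up for illustration (two hypothetical states, two election years), and it assumes every election year should be extended by exactly three following years.

```r
# Hypothetical toy data (not real results): two states, two election years.
elections <- data.frame(
  STATE = c("A", "B", "A", "B"),
  YEAR = c(2008, 2008, 2012, 2012),
  DemStatus = c(0, 1, 1, 1),
  stringsAsFactors = FALSE
)

# Repeat every row four times: 1, 1, 1, 1, 2, 2, 2, 2, ...
idx <- rep(seq_len(nrow(elections)), each = 4)
expanded <- elections[idx, ]

# Add the offsets 0, 1, 2, 3 within each block of four repeated rows.
expanded$YEAR <- expanded$YEAR + rep(0:3, times = nrow(elections))

# Reset the row names, which otherwise carry suffixes such as "1.1".
rownames(expanded) <- NULL
```

Because the offsets are added to whatever YEAR each row already holds, nothing in the code refers to 2008 specifically, so it should carry over to a data frame that stacks several elections.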

CodePudding user response：

Do you mean something like this?

library(dplyr)
p08 %>%
  left_join(tibble(YEAR=2008, YEAR1=2008:2011), by = "YEAR") %>%
  mutate(YEAR = YEAR1, YEAR1 = NULL)
# # A tibble: 204 x 3
#    STATE    YEAR DemStatus
#    <chr>   <int>     <dbl>
#  1 Alabama  2008         0
#  2 Alabama  2009         0
#  3 Alabama  2010         0
#  4 Alabama  2011         0
#  5 Alaska   2008         0
#  6 Alaska   2009         0
#  7 Alaska   2010         0
#  8 Alaska   2011         0
#  9 Arizona  2008         0
# 10 Arizona  2009         0
# # ... with 194 more rows

(I'm inferring dplyr based on your use of bind_rows.)

Alternatively, using tidyr::complete:

p08 %>%
  tidyr::complete(STATE, YEAR = 2008:2011) %>%
  group_by(STATE) %>%
  mutate(DemStatus = na.omit(DemStatus)[1]) %>%
  ungroup()
# # A tibble: 204 x 3
#    STATE    YEAR DemStatus
#    <chr>   <dbl>     <dbl>
#  1 Alabama  2008         0
#  2 Alabama  2009         0
#  3 Alabama  2010         0
#  4 Alabama  2011         0
#  5 Alaska   2008         0
#  6 Alaska   2009         0
#  7 Alaska   2010         0
#  8 Alaska   2011         0
#  9 Arizona  2008         0
# 10 Arizona  2009         0
# # ... with 194 more rows

CodePudding user response：

In base R, you can do the same with a cross join (merge with by = NULL returns every combination of rows):

merge(p08[-2], list(YEAR=2008:2011), by = NULL) # [-2] drops the second column (YEAR)

             STATE DemStatus YEAR
1          Alabama         0 2008
2           Alaska         0 2008
3          Arizona         0 2008
4         Arkansas         0 2008
5       California         1 2008
6         Colorado         1 2008
:            :             :  :

If you do not know the column index of YEAR, you can drop it by name instead:

merge(subset(p08, select = -YEAR), list(YEAR = 2008:2011), by =NULL)

Note that if you do not care about the column names, this can be as simple as

merge(p08, 2008:2011)

or, to also drop the original YEAR column,

merge(p08, 2008:2011)[-2]
             STATE DemStatus    y
1          Alabama         0 2008
2           Alaska         0 2008
3          Arizona         0 2008
4         Arkansas         0 2008
5       California         1 2008
6         Colorado         1 2008
7      Connecticut         1 2008
8         Delaware         1 2008
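
To extend the cross-join idea to a data frame that already contains several election years, one option is to join against year offsets rather than fixed years. The following is a minimal sketch under assumptions: `elections`, `offsets` and the `OFFSET` column are hypothetical names with made-up values, and each election year is assumed to need exactly three following years.

```r
# Hypothetical toy data (not real results): two states, two election years.
elections <- data.frame(
  STATE = c("A", "B", "A", "B"),
  YEAR = c(2008, 2008, 2012, 2012),
  DemStatus = c(0, 1, 1, 1),
  stringsAsFactors = FALSE
)

# Offsets to add to each election year: 0 keeps the election year itself.
offsets <- data.frame(OFFSET = 0:3)

# by = NULL makes merge() return the Cartesian product of the two inputs.
expanded <- merge(elections, offsets, by = NULL)

# Turn each (election year, offset) pair into a calendar year.
expanded$YEAR <- expanded$YEAR + expanded$OFFSET
expanded$OFFSET <- NULL

# Sort by state and year for readability, then reset the row names.
expanded <- expanded[order(expanded$STATE, expanded$YEAR), ]
rownames(expanded) <- NULL
```

Joining on offsets instead of on literal years such as 2008:2011 means the same call handles every election in the data at once, at the cost of one extra line to compute the final YEAR.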