R: How can I add rows based on values of a data frame?

Time:05-29

I have two data frames that I would like to merge. Data frame A contains daily energy and water consumption data for hotel rooms, and data frame B contains information about the people who stayed in those rooms. Before merging, I need to reshape data frame B so that it matches the structure of data frame A.

Data frame B currently looks like this:

   `Person ID` Apartment contract_start contract_end
   <chr>       <chr>     <date>         <date>
 1 hnd48       T217      2021-09-16     2021-09-18
 2 jFDJu       T217      2021-09-19     2021-09-21
 3 kqKcX       A705      2021-09-16     2021-09-19

To match the structure of data frame A, each day a person stayed in a room needs to be its own row. I would therefore like to add a new column 'dates' that lists every day of the stay, from 'contract_start' through 'contract_end' (both inclusive). The data frame would ideally look like this:

   `Person ID` Apartment dates
   <chr>       <chr>     <date>
 1 hnd48       T217      2021-09-16
 2 hnd48       T217      2021-09-17
 3 hnd48       T217      2021-09-18
 4 jFDJu       T217      2021-09-19
 5 jFDJu       T217      2021-09-20
 6 jFDJu       T217      2021-09-21
 7 kqKcX       A705      2021-09-16
 8 kqKcX       A705      2021-09-17
 9 kqKcX       A705      2021-09-18
10 kqKcX       A705      2021-09-19

How could I do this with code?

Best regards, Vincent

CodePudding user response:

library(tidyverse)

# Rebuild the example data and convert the contract columns to Date
df <- tribble(
  ~`Person ID`, ~Apartment, ~contract_start, ~contract_end,
  "hnd48", "T217", "2021-09-16", "2021-09-18",
  "jFDJu", "T217", "2021-09-19", "2021-09-21",
  "kqKcX", "A705", "2021-09-16", "2021-09-19"
) %>%
  mutate(across(c(contract_start, contract_end), as.Date))

df %>%
  rowwise() %>%
  # For each row, build the daily sequence and collapse it into one comma-separated string
  mutate(
    dates = paste0(
      as.character(
        seq(contract_start, contract_end, by = "days")
      ), collapse = ",")
  ) %>%
  select(-c(contract_start, contract_end)) %>%
  # Split the string back out into one row per day, then restore the Date type
  separate_rows(dates, sep = ",") %>%
  mutate(dates = as.Date(dates))
# A tibble: 10 x 3
   `Person ID` Apartment dates     
   <chr>       <chr>     <date>    
 1 hnd48       T217      2021-09-16
 2 hnd48       T217      2021-09-17
 3 hnd48       T217      2021-09-18
 4 jFDJu       T217      2021-09-19
 5 jFDJu       T217      2021-09-20
 6 jFDJu       T217      2021-09-21
 7 kqKcX       A705      2021-09-16
 8 kqKcX       A705      2021-09-17
 9 kqKcX       A705      2021-09-18
10 kqKcX       A705      2021-09-19
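
The same expansion can also be sketched without the round trip through a comma-separated string, by storing each row's sequence in a list column and unnesting it. This is an alternative to the code above, not a correction of it. It assumes the same `df` columns as in the question, and `result` is just an illustrative name.

```r
library(dplyr)
library(tidyr)

# Same example data as above, built directly with Date columns
df <- tibble(
  `Person ID` = c("hnd48", "jFDJu", "kqKcX"),
  Apartment = c("T217", "T217", "A705"),
  contract_start = as.Date(c("2021-09-16", "2021-09-19", "2021-09-16")),
  contract_end = as.Date(c("2021-09-18", "2021-09-21", "2021-09-19"))
)

result <- df %>%
  rowwise() %>%
  # One Date vector per row, wrapped in list() so it fits in a single cell
  mutate(dates = list(seq(contract_start, contract_end, by = "day"))) %>%
  ungroup() %>%
  select(-contract_start, -contract_end) %>%
  # Expand the list column into one row per day; the Date class is kept
  unnest(dates)

print(result)
```

Because the dates never pass through character strings, no `as.Date()` conversion is needed at the end, and `ungroup()` drops the row-wise grouping before any later join with data frame A.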