Home > Mobile >  Counting Number of People in a Hotel (R)
Counting Number of People in a Hotel (R)

Time:09-16

I am working with the R programming language. Suppose there is a hotel that has a list of customers with their check-in and check-out times (Note: The actual value of the dates is "POSIXct" and is written as "year-month-date".):

check_in_date <- c('2010-01-01', '2010-01-02' ,'2010-01-01', '2010-01-08', '2010-01-08', '2010-01-15', '2010-01-15', '2010-01-16', '2010-01-19', '2010-01-22')
check_out_date <- c('2010-01-07', '2010-01-04' ,'2010-01-09', '2010-01-21', '2010-01-11', '2010-01-22', 'still in hotel as of today', '2010-01-20', '2010-01-25', '2010-01-29')
Person = c("John", "Smith", "Alex", "Peter", "Will", "Matt", "Tim", "Kevin", "Tom", "Adam")

hotel <- data.frame(check_in_date, check_out_date, Person )

The data looks like something like this:

   check_in_date             check_out_date Person
1     2010-01-01                 2010-01-07   John
2     2010-01-02                 2010-01-04  Smith
3     2010-01-01                 2010-01-09   Alex
4     2010-01-08                 2010-01-21  Peter
5     2010-01-08                 2010-01-11   Will
6     2010-01-15                 2010-01-22   Matt
7     2010-01-15 still in hotel as of today    Tim
8     2010-01-16                 2010-01-20  Kevin
9     2010-01-19                 2010-01-25    Tom
10    2010-01-22                 2010-01-29   Adam

Question: I am trying to find out on any given day, how many people were still in the hotel. This would look something like this (just an example, does not correspond to the above data):

  day_of_the_year Number_of_people_currently_in_hotel
1      2010-01-01                                   1
2      2010-01-02                                   1
3      2010-01-03                                   2
4      2010-01-04                                   0
5      2010-01-05                                   5
6      2010-01-06                                   5
7      2010-01-07                                   2
8      2010-01-08                                   2
9      2010-01-09                                   8

I tried to solve this problem in 3 steps:

First Step: I generated a column containing every date from the start to the end (e.g. in this example, let's suppose that there are 31 days : from the start to the end of Jan-2010)

day_of_the_year = seq(as.Date("2010/1/1"), as.Date("2010/1/31"),by="day")

Second Step: I then determined how many people checked in to the hotel at each day:

library(dplyr)

#create some indicator variable 
hotel$event = 1

check_ins = hotel %>% group_by(check_in_date) %>%   summarise(n = n())

 check_in_date     n
  <chr>         <int>
1 2010-01-01        2
2 2010-01-02        1
3 2010-01-08        2
4 2010-01-15        2
5 2010-01-16        1
6 2010-01-19        1
7 2010-01-22        1

Third Step: I then repeated a similar step to determine how many people checked out of the hotel each day:

check_outs = hotel %>% group_by(check_out_date) %>%   summarise(n = n())

   check_out_date                 n
   <chr>                      <int>
 1 2010-01-04                     1
 2 2010-01-07                     1
 3 2010-01-09                     1
 4 2010-01-11                     1
 5 2010-01-20                     1
 6 2010-01-21                     1
 7 2010-01-22                     1
 8 2010-01-25                     1
 9 2010-01-29                     1
10 still in hotel as of today     1

Problem: Now, I am not sure how to combine the above 3 Steps in such a way so that we can find out how many people were staying at the hotel each day of the month. Can someone please show me how to do this?

Thanks!

Note: I found a "similar" question counting the number of people in the system in R , I am currently trying to see if I can adapt the methods used in this question for my problem.

CodePudding user response:

I think this might help, but for a total solution we need a reference date for those that did not check ou yet

library(tidyverse)

hotel %>% 
  mutate(
    across(.cols = ends_with("_date"),.fns = ymd),
    check_out_date = if_else(is.na(check_out_date), today(),check_out_date) 
    ) %>% 
  mutate(
    date = map2(
      .x = check_in_date,
      .y = check_out_date,
      .f = function(x,y)seq.Date(from = x,to = y,by = "1 day"))
  ) %>% 
  unnest() %>% 
  count(date)

# A tibble: 29 x 2
   date           n
   <date>     <int>
 1 2010-01-01     2
 2 2010-01-02     3
 3 2010-01-03     3
 4 2010-01-04     3
 5 2010-01-05     2
 6 2010-01-06     2
 7 2010-01-07     2
 8 2010-01-08     3
 9 2010-01-09     3
10 2010-01-10     2
# ... with 19 more rows

CodePudding user response:

I used hotel$check_in_date = as.Date(hotel$check_in_date) and hotel$check_out_date = as.Date(hotel$check_out_date) to convert the strings to dates. This function will then count the number of guests for a given date. Since you have a note in for guests that are currently checked in, I created a temporary data frame in the function to avoid overwriting the original data.

count_guests = function(date) {
  temp = hotel
  temp$check_out_date = ifelse(is.na(temp$check_out_date), as.Date(date), temp$check_out_date)
  counts = ifelse((temp$check_in_date <= date) &(temp$check_out_date >= date), 1, 0)
  return(sum(counts))
}

count_guests(as.Date("2010-01-02"))
[1] 3

count_guests(as.Date("2010-01-10"))
[1] 2

count_guests(as.Date("2010-01-21"))
[1] 4

EDIT: On second thought it looks like you want a new data frame. This can be done easily with apply().

guests = data.frame(day_of_the_year = seq(as.Date("2010/1/1"), as.Date("2010/1/31"),by="day"))
guests$num_checked_in = lapply(guests$day_of_the_year, FUN = count_guests)

 day_of_the_year num_checked_in
1       2010-01-01              2
2       2010-01-02              3
3       2010-01-03              3
4       2010-01-04              3
5       2010-01-05              2
...

CodePudding user response:

You can try using "lubridate" package which i believe is part of tidyverse. So if load tidyverse you don't have to load lubridate again.

Use ymd to convert character to date since year-month-day is the format of your date.

dt <- tibble(checkin = lubridate::ymd(check_in_date),
checkout = lubridate::ymd(check_out_date),
person = Person)

For anyone that has not checked out yet, assign them checkout date of today using today() function. Or if you know the date when this data was collected that may be another sensible date to assign here.

Create interval objects with start as checkin date and end as checkout date. Similarly create interval object for the date(s) you want to check. Here I am using 2010-01-07. Find overlap using int_overlap()

dt<- dt %>% mutate(
checkout = replace_na(checkout, today()),
stay_interval = lubridate::interval(start = checkin, end = checkout),
date_of_interest = lubridate::interval(ymd("2010-01-07"), ymd("2010-01-07")),
stay = lubridate::int_overlaps(date_of_interest, stay_interval)
)
dt %>% count(stay)

# A tibble: 2 x 2
  stay      n
  <lgl> <int>
1 FALSE     8
2 TRUE      2
  • Related