ggplot with year-month on the x-axis

Time:10-17

I have a dataset with information on where individuals work over time, where time is defined as year/month (stored as numeric YYYYMM values in my dataset). I use ggplot to visualise how long individuals stay in a given workplace and how they move between workplaces. I used position_dodge to make it visible when the same individual works in more than one place during the same month.

In the simple example below:

  • individual A works in place 1 from Jan/2012 (i.e., 201201) until Dec/2012
  • individual B works in place 1 from Jan/2012 until Jun/2012 and then switches to place 2 from Jul/2012 until Nov/2012
  • individual C works in place 1 from Jan/2012 until Apr/2012 and in place 2 from Feb/2012 until Jun/2012
  • individual D works in place 1 only during Jan/2012

My question is about how to treat time intervals. In my dataset, the period variable refers to the entire month. For instance, individual A actually works in workplace 1 from 01/01/2012 until 31/12/2012, and individual D works in workplace 1 from 01/01/2012 until 31/01/2012.
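A minimal base-R sketch of the interval semantics described above, assuming the period values are whole-number YYYYMM. `month_interval()` is a hypothetical helper name, not part of any package; it maps each month to its first and last calendar day.

```r
# Hypothetical helper: map numeric YYYYMM values to the calendar interval
# they stand for, from the first to the last day of that month.
month_interval <- function(yyyymm) {
  year  <- yyyymm %/% 100
  month <- yyyymm %% 100
  # First day of the month
  start <- as.Date(sprintf("%04d-%02d-01", year, month))
  # First day of the following month (December rolls over to January)...
  next_year  <- year + (month == 12)
  next_month <- month %% 12 + 1
  # ...minus one day gives the last day of the month, leap years included
  end <- as.Date(sprintf("%04d-%02d-01", next_year, next_month)) - 1
  data.frame(period = yyyymm, start = start, end = end)
}

# Individual D's single month becomes a full month, not a single point
month_interval(201201)
```

For D's single month (201201) this gives the interval 2012-01-01 to 2012-01-31, which has a positive length and can therefore be drawn.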

# individual A
a_id <- c(rep('A',12))
a_period <- c(seq(201201, 201212))
a_workplace <-c(rep(1,12))

# individual B
b_id <- c(rep('B',11))
b_period <- c(seq(201201,201206), seq(201207,201211))
b_workplace <-c(rep(1,6), rep(2,5))

# individual C
c_id <- c(rep('C',9))
c_period <- c(seq(201201,201204), seq(201202,201206))
c_workplace <-c(rep(1,4), rep(2,5))

# individual D
d_id <- c(rep('D',1))
d_period <- c(seq(201201,201201))
d_workplace <-c(rep(1,1))

# final data frame
id <- c(a_id, b_id, c_id, d_id)
period <- c(a_period, b_period, c_period, d_period)
workplace <- as.factor(c(a_workplace, b_workplace, c_workplace, d_workplace))
mydata <- data.frame(id, period, workplace)

library(ggplot2)

ggplot(mydata, aes(x = id, y = period, color = workplace)) +
  geom_line(position = position_dodge(width = 0.1), linewidth = 2) +  # use size = 2 on ggplot2 < 3.4.0
  scale_x_discrete(limits = rev) +
  scale_y_continuous(breaks = seq(201201, 201212, by = 1)) +
  coord_flip() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1),
        legend.position   = c(.8, .2),
        legend.direction  = "vertical",
        legend.background = element_rect(linetype = "solid", colour = "black"),
        panel.background  = element_rect(fill = "grey97")) +
  labs(y = "time", title = "Work affiliation")

The ggplot above treats each year/month as a single point in time, so, for instance, it shows no working history for individual D. How can I make each consecutive sequence at the individual-workplace level begin on the first day of its first month and end on the last day of its last month? I would also like to convert the year/month variable from numeric to date format to make manipulation easier.

PS: I highlight "each consecutive sequence" in the paragraph above because the same individual may work in a given place for a few months, leave for a while, and later return to the same workplace. In such cases, the two time intervals in which the individual works in that workplace should be treated separately.
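One way to separate the consecutive sequences mentioned in the PS, sketched with dplyr under the assumption that periods are numeric YYYYMM: map each period to a running month count (so Dec/2012 to Jan/2013 is a step of 1), then start a new run whenever the gap to the previous month is not 1. Individual `E` and its data are hypothetical, added only to show a gap.

```r
library(dplyr)

# Hypothetical example: individual E works in place 1 in Jan-Feb/2012,
# is away in Mar/2012 and returns to place 1 in Apr-May/2012
spells <- data.frame(
  id        = "E",
  period    = c(201201, 201202, 201204, 201205),
  workplace = factor(1)
)

runs <- spells %>%
  # Running month count, so that 201212 -> 201301 is a step of exactly 1
  mutate(month_index = (period %/% 100) * 12 + period %% 100) %>%
  arrange(id, workplace, month_index) %>%
  group_by(id, workplace) %>%
  # A new run starts whenever the gap to the previous month is not 1
  mutate(run = cumsum(c(1, diff(month_index) != 1))) %>%
  group_by(id, workplace, run) %>%
  summarise(first_month = first(period),
            last_month  = last(period),
            n_months    = n(),
            .groups = "drop")

runs
```

Here E gets two rows (201201 to 201202 and 201204 to 201205) instead of one merged spell. The same pipeline applied to the question's `mydata` returns one run per id-workplace pair, because none of those spells has a gap.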

Answer:

Here is a possible solution. The logic is as follows.

Data wrangling:

  1. split the integer period into year and month
  2. add 1 as the day and build a date from these columns (year, month, day) with make_date
  3. group by id and workplace
  4. summarise the start and end date of each group
  5. create a new column with unite that combines id and workplace

Plotting:

  1. Use geom_segment with a start and an end point. As far as I know, the drawback is that geom_segment has no jittering, which is why we created the id_workplace column beforehand: each id-workplace pair gets its own row on the y-axis.
  2. Using date_breaks, we place a break at every month and label it with month and year.

Note that D starts and ends in the same month, so no line is drawn for D.

library(tidyverse)
library(lubridate) # make_date function
library(scales)    # date_breaks function

mydata %>% 
  # split YYYYMM into year and month; convert = TRUE turns both into integers
  separate(period, into = c("year", "month"), sep = -2, convert = TRUE) %>% 
  mutate(day = 1, 
         period = make_date(year, month, day)) %>% 
  group_by(id, workplace) %>% 
  summarise(start = first(period), end = last(period), .groups = "drop") %>% 
  # one y-axis row per id-workplace pair, so no dodging is needed
  unite(id_workplace, c(id, workplace), remove = FALSE) %>% 
  ggplot() +
  geom_segment(aes(x = start, y = id_workplace, xend = end, yend = id_workplace,
                   col = workplace), linewidth = 2) +  # use size = 2 on ggplot2 < 3.4.0
  scale_x_date(breaks = date_breaks("month"), date_labels = "%b-%Y") +
  xlab('Time') +
  theme(axis.text.x = element_text(vjust = 0.5))
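In the answer's pipeline the end date is the first day of the last month, which is why D gets a zero-length segment. A sketch of one adjustment, assuming lubridate's `%m+%` operator and ggplot2 3.4.0 or later (for `linewidth`): move each end to the last day of its month. It rebuilds the question's example data compactly so it runs on its own, and it still groups by id and workplace only, so a spell interrupted by a gap would be merged into a single segment.

```r
library(dplyr)
library(lubridate)
library(ggplot2)

# The question's example data, rebuilt compactly
mydata <- data.frame(
  id        = rep(c("A", "B", "C", "D"), times = c(12, 11, 9, 1)),
  period    = c(201201:201212, 201201:201211, 201201:201204, 201202:201206, 201201),
  workplace = factor(c(rep(1, 12), rep(1, 6), rep(2, 5), rep(1, 4), rep(2, 5), 1))
)

spell_segments <- mydata %>%
  # YYYYMM -> first day of that month
  mutate(period = make_date(period %/% 100, period %% 100, 1)) %>%
  group_by(id, workplace) %>%
  summarise(start = min(period),
            # first day of the month after the last one, minus one day
            end   = (max(period) %m+% months(1)) - days(1),
            .groups = "drop") %>%
  mutate(id_workplace = paste(id, workplace, sep = "_"))

p <- ggplot(spell_segments) +
  geom_segment(aes(x = start, xend = end, y = id_workplace, yend = id_workplace,
                   colour = workplace),
               linewidth = 2) +
  scale_x_date(date_breaks = "1 month", date_labels = "%b-%Y") +
  xlab("Time")
# draw the plot with print(p)
```

With this change D's segment runs from 2012-01-01 to 2012-01-31, so it becomes visible, and A's ends on 2012-12-31 instead of 2012-12-01.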


Answer:

For the second question, regarding the conversion of the numeric values into a date type, here is an answer:

library(lubridate) # handling and conversion of date types
lubridate::ymd(paste0(201201, "01")) # ymd() needs a day, so append "01" to a YYYYMM value before parsing
as.Date("2012-01-01") # turns a character string into a Date, which is by the way the
# proper way to hand time-related data over to ggplot

That should do it for your code:

mydata$period <- lubridate::ymd(paste0(mydata$period, "01"))