I need to plot a time series graph but the data that I'm using is proving to be quite challenging.
Ideally, I'd like a graph that looks something like this:
But mine looks like this:
I have tried a series of different things, but none of them have worked.
Some of the code I have tried:
ggplot( aes(x=date, y=northEast)) +
geom_area(fill="#69b3a2", alpha=0.5) +
geom_line(color="#69b3a2") +
ylab("test") +
theme_ipsum()
ggplot(covidData2) +
geom_line(
mapping = aes(x = weekBeginning, y=northEast, group=northEast)
)
Any help would be greatly appreciated!
Answer:
You need to tidy your data before plotting it. If you look at your data frame, all of the "numeric" columns have been read in as character vectors: the sheet has a nested, two-row header, so the header text ends up in the first couple of rows of the data. You need to combine those two rows and turn them into column names, then convert the numeric columns to numbers. Finally, you need to parse the dates; otherwise ggplot treats the periods as character strings, i.e. as a discrete axis, and geom_line has nothing to connect:
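library(readxl)
library(lubridate)
library(ggplot2)
library(hrbrthemes)

# Read the sheet; the two header rows end up as the first rows of the data
wb <- read_xlsx(path.expand("~/covid.xlsx"), sheet = "Table 9")
df <- as.data.frame(wb)

# Blank the top-left cell so the first column is named just "Period",
# then fill each empty (merged) top-header cell with the region to its left
df[1, 1] <- ""
for(i in 2:length(df)) {
  if(is.na(df[1, i])) df[1, i] <- df[1, i - 1]
}

# Combine the two header rows into single names such as "North East Rate"
nms <- trimws(paste(df[1,], df[2,]))
df <- df[-c(1:2),]
names(df) <- nms

# Drop all-NA columns, convert the measures to numbers, drop the last 3 rows
df <- df[sapply(df, function(x) !all(is.na(x)))]
df[-1] <- lapply(df[-1], as.numeric)
df <- head(df, -3)

# Keep the first date (10 characters) of each period and parse it as day/month/year
df$Period <- dmy(substr(df$Period, 1, 10))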
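Since covid.xlsx isn't available here, the header-merging steps can be tried on a small hypothetical data frame that mimics the layout described above: a first row holding region names (blank under merged cells), a second row holding measure names, and character data below. The column contents, the "Count"/"Rate" measures and the "02/03/2020 to 08/03/2020" period format are all invented for illustration; the real sheet may differ, and the column-dropping and footer-removing steps are left out.

```r
library(lubridate)

# Hypothetical toy frame: row 1 = region (NA under a merged cell),
# row 2 = measure name, remaining rows = data, everything stored as character
raw <- data.frame(
  V1 = c(NA, "Period", "02/03/2020 to 08/03/2020", "09/03/2020 to 15/03/2020"),
  V2 = c("North East", "Count", "10", "25"),
  V3 = c(NA, "Rate", "0.4", "0.9"),
  stringsAsFactors = FALSE
)

df <- raw
df[1, 1] <- ""
# Forward-fill the top header row across merged cells
for (i in 2:length(df)) {
  if (is.na(df[1, i])) df[1, i] <- df[1, i - 1]
}
# Paste the two header rows into one name per column, then drop them
names(df) <- trimws(paste(df[1, ], df[2, ]))
df <- df[-c(1:2), ]

# Convert the measure columns to numbers and parse the first date of each period
df[-1] <- lapply(df[-1], as.numeric)
df$Period <- dmy(substr(df$Period, 1, 10))
str(df)
```

The resulting columns are named "Period", "North East Count" and "North East Rate", with Period stored as a Date and the other two as numbers.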
Now we can plot:
ggplot(df, aes(x = Period, y = `North East Rate`)) +
  geom_area(fill = "#69b3a2", alpha = 0.5) +
  geom_line(color = "#69b3a2") +
  ylab("Rate per 100,000") +
  xlab("") +
  theme_ipsum()
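To see why the original attempts drew no line: when x is character, ggplot treats it as discrete and puts each x value in its own group, so geom_line has only one point per group to join. The sketch below uses hypothetical weekly values to compare the groups ggplot assigns when the same x values are stored as character versus Date, inspecting the built layer data with ggplot_build(). The helper n_groups() is hypothetical, written only for this comparison.

```r
library(ggplot2)

# Hypothetical weekly rates; the same weeks stored as character and as Date
rates <- c(0.4, 0.9, 1.7, 2.2)
chr <- data.frame(
  week = c("02/03/2020", "09/03/2020", "16/03/2020", "23/03/2020"),
  rate = rates
)
dated <- data.frame(
  week = as.Date(c("2020-03-02", "2020-03-09", "2020-03-16", "2020-03-23")),
  rate = rates
)

p_chr <- ggplot(chr, aes(x = week, y = rate)) + geom_line()
p_date <- ggplot(dated, aes(x = week, y = rate)) + geom_line()

# Hypothetical helper: number of distinct groups in the first layer's built data
n_groups <- function(p) length(unique(ggplot_build(p)$data[[1]]$group))

n_groups(p_chr)   # 4: one group per character x value, so no segments are drawn
n_groups(p_date)  # 1: a single group, so all points are joined into one line
```

The `group = northEast` mapping in the question has the same effect, since it makes each distinct y value a separate group.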
Created on 2022-03-08 by the reprex package (v2.0.1)