I am going to be receiving an Excel Spreadsheet every month that's a summary of a bunch of different groups. I've created a mockup of what I'm going to be getting.
What I need to do is create a number of graphs (in R) using this data where (for example) the first graph is a line graph with the months for groups AB, CDE, and F. The second graph would be a bar chart for the same groups but for the quarters, etc. etc.
If this was the raw data, this wouldn't be as much of a challenge; unfortunately I won't (and can't) get the raw data, so have to make do with this.
I have no idea where to even start with this; should I transpose the data so I have Group | Month | Count? Or transpose it so that the Months are the rows and the groups the columns? I'm still learning R so really open to whatever suggestions you have :) Thanks Chris
CodePudding user response:
This is the kind of stuff that the tidyverse was invented to address.
If it were me, I would start by breaking it into two sets of data:
- Quarterly Set
- Monthly Set
library(tidyverse)
quarterly <- data %>% select(Q1, Q2, Q3, Q4) %>% t()
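This selects only the listed columns, transposes them, and saves the result to an object called quarterly, with a row for each quarter and the alphanumeric group labels in their own columns. Note that t() returns a matrix rather than a data frame, so convert it with as.data.frame() if you need one. From there you can add columns and group as you wish prior to plotting.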
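One thing to watch is that the transposed matrix does not carry the group labels by itself. A minimal sketch of how they could be put back, assuming the labels live in a column called Group (that column name and the numbers below are invented for illustration, since the real mock-up is not shown here):

```r
library(dplyr)
library(tibble)

# Hypothetical mock-up: one row per group, quarterly counts as columns.
data <- tibble(
  Group = c("AB", "CDE", "F"),
  Q1    = c(6, 9, 12),
  Q2    = c(7, 10, 13)
)

# Transposing gives a numeric matrix with the quarters as row names.
q_mat <- data %>% select(Q1, Q2) %>% t()

# Put the group labels back as column names.
colnames(q_mat) <- data$Group

# Convert to a data frame and turn the row names into a Quarter column.
quarterly <- as.data.frame(q_mat) %>% rownames_to_column("Quarter")
```

This leaves one row per quarter and one column per group, which is the shape described above.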
monthly <- data %>% select(-Q1, -Q2, -Q3, -Q4, -Total) %>% t()
This saves everything BUT the quarterly and Total values into the object monthly, also transposing it so you have a column of months and a column for each alpha label. That lets you do your summing and then plot.
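As an alternative to t(), the Group | Month | Count layout mentioned in the question tends to be the easiest shape to hand to ggplot2. A rough sketch, assuming a Group column plus one column per month; the column names, group labels, and values are all made up for illustration, and only three months are shown:

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Hypothetical mock-up of the monthly summary sheet.
data <- tibble(
  Group = c("AB", "CDE", "F", "GH"),
  Jan   = c(3, 12, 6, 1),
  Feb   = c(5, 15, 7, 2),
  Mar   = c(7, 18, 8, 3),
  Q1    = c(15, 45, 21, 6),
  Total = c(15, 45, 21, 6)
)

monthly_long <- data %>%
  select(-Q1, -Total) %>%                       # drop the aggregate columns
  pivot_longer(-Group, names_to = "Month", values_to = "Count") %>%
  filter(Group %in% c("AB", "CDE", "F")) %>%    # keep only the groups to plot
  mutate(Month = factor(Month, levels = month.abb))  # calendar order, not alphabetical

# One line per group across the months.
monthly_plot <- ggplot(monthly_long,
                       aes(x = Month, y = Count, colour = Group, group = Group)) +
  geom_line()
```

Turning Month into a factor with month.abb as its levels keeps the x axis in calendar order; without it ggplot2 would sort the months alphabetically.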
From there you should be able to use a good ggplot2 tutorial to get you going on the graphs. This is really more about thinking about data structure than programming.
It is always a good idea to break things down into conceptual steps:
- what do I have?
- what do I need to have?
- what do I have to do to get there?
Then work through it. Subset the relevant data, restructure it and then plot it.
Once you figure out what you need to do, you can build a function to apply each month when you get your aggregates, which reduces the burden of doing everything manually.
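A sketch of what such a function might look like, under the same assumptions as before (a Group column, month columns named Jan, Feb, ..., and quarter columns named Q1 to Q4). The function name reshape_summary and the mock data are hypothetical:

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Hypothetical helper: turns one month's summary sheet into long
# monthly and quarterly tables for the requested groups.
reshape_summary <- function(data, groups) {
  quarters <- c("Q1", "Q2", "Q3", "Q4")
  kept <- data %>% filter(Group %in% groups)

  monthly <- kept %>%
    select(Group, any_of(month.abb)) %>%   # whichever month columns are present
    pivot_longer(-Group, names_to = "Month", values_to = "Count") %>%
    mutate(Month = factor(Month, levels = month.abb))

  quarterly <- kept %>%
    select(Group, any_of(quarters)) %>%    # whichever quarter columns are present
    pivot_longer(-Group, names_to = "Quarter", values_to = "Count")

  list(monthly = monthly, quarterly = quarterly)
}

# Invented mock data standing in for one month's spreadsheet.
data <- tibble(
  Group = c("AB", "CDE", "F", "GH"),
  Jan = c(3, 12, 6, 1), Feb = c(5, 15, 7, 2),
  Mar = c(7, 18, 8, 3), Apr = c(4, 10, 5, 2),
  Q1 = c(15, 45, 21, 6), Q2 = c(4, 10, 5, 2),
  Total = c(19, 55, 26, 8)
)

tables <- reshape_summary(data, c("AB", "CDE", "F"))

# Bar chart of the quarters, one bar per group within each quarter.
quarterly_plot <- ggplot(tables$quarterly, aes(x = Quarter, y = Count, fill = Group)) +
  geom_col(position = "dodge")
```

Each month you would read the new file (for example with readxl::read_excel()) and pass the result to the function, so the reshaping and plotting code stays the same from one month to the next.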