Dplyr Summarise on Multiple Conditions

Time:01-22

I am working on some data visualisations of satellite populations. For each month and altitude band, I would like to display how many satellites were launched into each orbit.

I have a list of satellites (a snippet of fake data is below). I am trying to create a bubble plot with the launch date (grouped by month) on the X axis, the altitude (grouped into 100 km bins) on the Y axis, and the bubble size given by the count, as in the second table.

COSPAR_ID        LAUNCH_DATE  ALTITUDE
IRIDIUM 180      2019-01-01   1150
IRIDIUM 176      2019-02-01   1250
RISESAT          2019-04-06   1150
RAPIS-1          2019-03-01   1375
MICRODRAGON      2019-05-01   400
NEXUS (FO-99)    2019-04-01   459
ALE-1            2019-05-01   1000
IRIDIUM 167      2019-04-01   900
IRIDIUM GSAT-31  2019-0-01    666
IRIDIUM 188      2019-06-01   1000
IRIDIUM 111      2019-02-01   1250
IRIDIUM 123      2019-01-01   1150

LAUNCH_DATE  ALTITUDE   COUNT
Jan-19       0-500      10
Jan-19       500-1000   100
Jan-19       1000-1500  150
Feb-19       0-500      20
Feb-19       500-1000   90
Feb-19       1000-1500  150

So far, I am getting quite lost. I am using dplyr to summarise first by month and then to count the launches in each altitude band.

df <- df %>% 
  group_by(month = lubridate::floor_date(LAUNCH_DATE, 'month')) %>%
  summarize(sum = sum(count), 
            sumA = n(ALTITUDE < 100))

My next step would be to group the altitudes first and then summarise by date, but I am hitting a brick wall and am not sure where to go next. Can anyone point me in the right direction?

I am happy to add the original dataset of satellites; it is just quite a large file.

Answer:

You can use cut() to make your bins, defining your breaks and labels beforehand:

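library(dplyr)
library(lubridate)

# Assumes LAUNCH_DATE is a Date column and ALTITUDE is numeric (km).
# Breaks every 500 km, from 0 up to the first multiple of 500 at or above the maximum altitude
alti_breaks <- seq(0, by = 500, length.out = ceiling(max(df$ALTITUDE) / 500) + 1)
# Labels such as "0-500", "500-1000", ...
alti_labs <- paste(head(alti_breaks, -1), tail(alti_breaks, -1), sep = "-")

# Count rows per month and altitude bin. cut() bins are right-closed by default,
# so an altitude of exactly 1000 falls in "500-1000".
df <- df %>%
  count(
    LAUNCH_DATE = floor_date(LAUNCH_DATE, 'month'),
    ALTITUDE = cut(ALTITUDE, alti_breaks, alti_labs),
    name = "COUNT"
  )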

df
# A tibble: 9 × 3
  LAUNCH_DATE ALTITUDE  COUNT
  <date>      <fct>     <int>
1 2019-01-01  1000-1500     2
2 2019-02-01  1000-1500     2
3 2019-03-01  1000-1500     1
4 2019-04-01  0-500         1
5 2019-04-01  500-1000      1
6 2019-04-01  1000-1500     1
7 2019-05-01  0-500         1
8 2019-05-01  500-1000      2
9 2019-06-01  500-1000      1
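
For comparison with the attempt in the question, count() with expressions is roughly shorthand for group_by() followed by summarise() with n(). The sketch below spells out that longer form. It also shows one way to count rows that meet a condition: take sum() of a logical vector, since n() takes no arguments. The sats data frame and the 500 km threshold are made up for illustration and are not from the original dataset.

```r
library(dplyr)
library(lubridate)

# Hypothetical stand-in for the question's data (a few of its rows)
sats <- tibble(
  LAUNCH_DATE = as.Date(c("2019-01-01", "2019-01-01", "2019-04-06",
                          "2019-04-01", "2019-05-01")),
  ALTITUDE    = c(1150, 1150, 1150, 459, 400)
)

# Same 500 km bins and labels as in the answer
alti_breaks <- seq(0, 1500, by = 500)
alti_labs   <- paste(head(alti_breaks, -1), tail(alti_breaks, -1), sep = "-")

# Group by both conditions, then count the rows in each group with n()
by_month_alt <- sats %>%
  group_by(
    LAUNCH_DATE = floor_date(LAUNCH_DATE, "month"),
    ALTITUDE    = cut(ALTITUDE, alti_breaks, alti_labs)
  ) %>%
  summarise(COUNT = n(), .groups = "drop")

# Count rows per month that meet a condition: sum() over a logical vector
low_per_month <- sats %>%
  group_by(month = floor_date(LAUNCH_DATE, "month")) %>%
  summarise(launched = n(), below_500 = sum(ALTITUDE < 500))
```

Here by_month_alt has the same columns as the count() result, and low_per_month shows the pattern that `sumA = n(ALTITUDE < 100)` in the question was reaching for.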

And the bubble plot:

library(ggplot2)

ggplot(df, aes(LAUNCH_DATE, ALTITUDE)) +
  geom_point(aes(size = COUNT), color = blues9[[6]], show.legend = FALSE) +
  theme_minimal() +
  theme(panel.grid.minor.x = element_blank())
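
The question's target table labels months as "Jan-19". One possible way to get similar axis labels is scale_x_date() with a date_labels format string, sketched below on a small made-up counts table in the same shape as the count() result. The colour, the axis titles, and the use of scale_size_area() (which makes bubble area proportional to the count) are illustrative choices, not part of the original answer.

```r
library(ggplot2)

# Hypothetical summary table shaped like the count() result
counts <- data.frame(
  LAUNCH_DATE = as.Date(c("2019-01-01", "2019-02-01", "2019-04-01", "2019-04-01")),
  ALTITUDE    = factor(c("1000-1500", "1000-1500", "0-500", "500-1000"),
                       levels = c("0-500", "500-1000", "1000-1500")),
  COUNT       = c(2L, 2L, 1L, 1L)
)

p <- ggplot(counts, aes(LAUNCH_DATE, ALTITUDE)) +
  geom_point(aes(size = COUNT), color = "steelblue") +
  # One tick per month; "%b-%y" gives abbreviated month and two-digit year
  # (month names follow the current locale)
  scale_x_date(date_breaks = "1 month", date_labels = "%b-%y") +
  # Bubble area, rather than radius, scales with COUNT
  scale_size_area(max_size = 12) +
  labs(x = "Launch month", y = "Altitude (km)", size = "Launches") +
  theme_minimal()
```

Printing p (or calling print(p)) draws the plot. Because ALTITUDE is a factor with ordered levels, the bins appear on the Y axis from lowest to highest.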
