Line plot of data by grouping rows

Time:10-25

I want to draw a line plot of the data frame below by grouping the rows, so that I have one line for GDP, one line for agriculture and one line for services (ignoring countries for now). Does anyone know if this is possible using ggplot?

My final plot would have years on the x-axis and the value (e.g. GDP) on the y-axis.

economics_df

Series Name              Country        1997        1998        1999        2000
GDP (current US$)        Spain   5.90077E+11 6.19215E+11 6.34908E+11 5.98363E+11
GDP (current US$)        France  1.45288E+12 1.50311E+12 1.49315E+12 1.36564E+12
GDP (current US$)        Monaco  2840175545  2934498443  2906093757  2647885849
GDP (current US$)        Italy   1.24188E+12 1.27005E+12 1.25245E+12 1.14668E+12
GDP (current US$)        Croatia 24091170703 25792876644 23677307509 21839780971
Agriculture  (% of GDP)  Spain   4.302210034 4.150411966 3.817378211 3.745305634
Agriculture (% of GDP)   France  2.344255815 2.362459834 2.236261411 2.098357551
Agriculture (% of GDP)   Monaco  2.544255815 2.342459834 2.234261411 2.108357551
Agriculture (% of GDP)   Italy   2.861911574 2.768857277 2.722232363  2.56361412
Agriculture (% of GDP)   Croatia 5.228986538 5.306173593 5.393085168 4.961600952
Services (% of GDP)      Syria   45.65197856 44.15290647 45.68986146 41.94697681
Services(% of GDP)       Lebanon 60.61030928 58.32727829 59.05884148 61.52190623
Services (% of GDP       Israel  62.02333939 63.02788655 63.92563162 64.72521236
Services (% of GDP)      Egypt   48.15193682 48.28789144 47.55581925 46.52599236
Services (% of GDP)      Libya   44.15193682 44.28789144 45.55581925 45.55581445

CodePudding user response:

You need to get the data into the right shape. ggplot makes plotting very easy once the data is in long form, which is easy to do with dplyr and tidyr:

library(dplyr)
library(ggplot2)
library(tidyr)

econ_for_plot  <- economics_df  |>
    pivot_longer(-c(`Series Name`, Country), names_to = "year")  |>
    group_by(`Series Name`, year)  |>
    summarise(value = sum(value))

econ_for_plot
# # A tibble: 12 x 3
# # Groups:   Series Name [3]   
#    `Series Name` year    value
#    <chr>         <chr>   <dbl>
#  1 Agriculture   1997  1.73e 1
#  2 Agriculture   1998  1.69e 1
#  3 Agriculture   1999  1.64e 1
#  4 Agriculture   2000  1.55e 1
#  5 GDP           1997  3.31e12
#  6 GDP           1998  3.42e12
#  7 GDP           1999  3.41e12
#  8 GDP           2000  3.14e12
#  9 Services      1997  2.61e 2
# 10 Services      1998  2.58e 2
# 11 Services      1999  2.62e 2
# 12 Services      2000  2.60e 2

I have used sum() in the summarise() call, but you could replace it with mean() or any other function to aggregate the data. Once it is in this form you can plot it:


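# One line per series: years on the x-axis, aggregated value on the y-axis
ggplot(econ_for_plot,
    aes(
        x = year,
        y = value,
        color = `Series Name`,
        group = `Series Name`
    )
) +
    geom_point() +
    geom_line() +
    scale_y_log10() +  # log scale because the series differ by many orders of magnitude
    labs(
        title = "Sum of spending",
        y = "Sum of category (log scale)"
    ) +
    theme_bw()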
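
As mentioned, sum() is only one choice of aggregate. The following is a minimal sketch of the same pipeline with mean() instead, run on a small made-up data frame (toy_df and its values are hypothetical and only stand in for economics_df):

```r
library(dplyr)
library(tidyr)

# Hypothetical miniature of economics_df, for illustration only
toy_df <- data.frame(
    `Series Name` = c("GDP", "GDP", "Services", "Services"),
    Country = c("A", "B", "A", "B"),
    `1997` = c(100, 300, 40, 60),
    `1998` = c(200, 400, 50, 70),
    check.names = FALSE  # keep the column names "Series Name", "1997", "1998" as-is
)

toy_means <- toy_df |>
    # Wide to long: one row per series, country and year
    pivot_longer(-c(`Series Name`, Country), names_to = "year") |>
    group_by(`Series Name`, year) |>
    # Average across countries instead of summing; drop the grouping afterwards
    summarise(value = mean(value), .groups = "drop")

toy_means
```

For percentage series such as agriculture or services, an average across countries is probably easier to interpret than a sum, although an unweighted mean still ignores differences in country size.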


Input data

economics_df <- structure(list(`Series Name` = c(
    "GDP", "GDP", "GDP", "GDP",
    "GDP", "Agriculture", "Agriculture", "Agriculture", "Agriculture",
    "Agriculture", "Services", "Services", "Services", "Services",
    "Services"
), Country = c(
    "Spain", "France", "Monaco", "Italy",
    "Croatia", "Spain", "France", "Monaco", "Italy", "Croatia", "Syria",
    "Lebanon", "Israel", "Egypt", "Libya"
), `1997` = c(
    5.90077e+11,
    1.45288e+12, 2840175545, 1.24188e+12, 24091170703, 4.302210034,
    2.344255815, 2.544255815, 2.861911574, 5.228986538, 45.65197856,
    60.61030928, 62.02333939, 48.15193682, 44.15193682
), `1998` = c(
    6.19215e+11,
    1.50311e+12, 2934498443, 1.27005e+12, 25792876644, 4.150411966,
    2.362459834, 2.342459834, 2.768857277, 5.306173593, 44.15290647,
    58.32727829, 63.02788655, 48.28789144, 44.28789144
), `1999` = c(
    6.34908e+11,
    1.49315e+12, 2906093757, 1.25245e+12, 23677307509, 3.817378211,
    2.236261411, 2.234261411, 2.722232363, 5.393085168, 45.68986146,
    59.05884148, 63.92563162, 47.55581925, 45.55581925
), `2000` = c(
    5.98363e+11,
    1.36564e+12, 2647885849, 1.14668e+12, 21839780971, 3.745305634,
    2.098357551, 2.108357551, 2.56361412, 4.961600952, 41.94697681,
    61.52190623, 64.72521236, 46.52599236, 45.55581445
)), class = "data.frame", row.names = c(
    NA,
    -15L
))

Edit: I made the y-axis log scale because the range of values was large. Having now read the comments and looked at the data more closely, I realise that this plots absolute dollars and relative percentages on the same scale. So this post shows how to construct such a plot, although it does not really make sense to do so in this case.
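
One possible way around the mixed-units problem, which was not part of the original answer, is to give each series its own panel and its own y-axis with facet_wrap(). This is only a sketch on a small made-up data frame (toy_df and its values are hypothetical); with the real data, economics_df would take its place:

```r
library(dplyr)
library(ggplot2)
library(tidyr)

# Hypothetical miniature of economics_df, for illustration only
toy_df <- data.frame(
    `Series Name` = c("GDP", "GDP", "Services", "Services"),
    Country = c("A", "B", "A", "B"),
    `1997` = c(1e11, 3e11, 40, 60),
    `1998` = c(2e11, 4e11, 50, 70),
    check.names = FALSE  # keep the column names "Series Name", "1997", "1998" as-is
)

# Same reshaping and aggregation as in the answer
toy_long <- toy_df |>
    pivot_longer(-c(`Series Name`, Country), names_to = "year") |>
    group_by(`Series Name`, year) |>
    summarise(value = sum(value), .groups = "drop")

# One panel per series; scales = "free_y" lets dollars and percentages
# each use their own y-axis range, so no log scale is needed
p <- ggplot(toy_long, aes(x = year, y = value, group = `Series Name`)) +
    geom_point() +
    geom_line() +
    facet_wrap(vars(`Series Name`), scales = "free_y") +
    theme_bw()

p
```

The x-axis is still shared across panels, so the years line up, while each panel keeps units that can be read on their own.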
