How can I web scrape GitHub project contributor information in R?

Time:01-30

I would like to write a function that extracts some contributor data from a GitHub project's contributor page. For example: https://github.com/easystats/report/graphs/contributors

Using R, how can I extract, for example, each contributor's username, number of commits, number of additions and number of deletions?

Here is my attempt at web scraping using rvest (https://github.com/tidyverse/rvest):

library(rvest)

contribs <- read_html("https://github.com/easystats/report/graphs/contributors")

section <- contribs %>% html_elements("section")
section
#> {xml_nodeset (0)}

contribs$node
#> <pointer: 0x0000027d9b9e9f10>
contribs$doc
#> <pointer: 0x0000027d9e03d140>

Created on 2023-01-29 with reprex v2.0.2

But this does not give the expected result: html_elements("section") returns an empty node set.

I would, however, much prefer a solution that uses an existing R package or the GitHub API (for example through the gh package, https://github.com/r-lib/gh). Is that possible at all?

Answer:

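The contributors page draws its graphs with JavaScript, so the static HTML that rvest downloads does not contain the contributor data. The Network tab of the browser's developer tools shows that the page fetches this data as JSON from https://github.com/easystats/report/graphs/contributors-data, which can be requested directly: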

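library(tidyverse)
library(httr2)

# JSON endpoint that the contributors page itself requests
"https://github.com/easystats/report/graphs/contributors-data" %>%
  request() %>%
  req_headers("x-requested-with" = "XMLHttpRequest",
              accept = "application/json") %>%
  req_perform() %>%
  resp_body_json(simplifyVector = TRUE) %>%
  # Expand the nested author and weekly columns: one row per contributor-week
  unnest(everything()) %>%
  group_by(username = str_remove(path, "/")) %>%
  # a = additions, d = deletions, c = commits, summed over all weeks
  summarise(across(a:c, sum)) 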

# A tibble: 21 x 4
   username                a      d     c
   <chr>               <int>  <int> <int>
 1 DominiqueMakowski  203778 148154   325
 2 IndrajeetPatil      15082  10513   159
 3 LukasWallrich           1      1     1
 4 bwiernik             1371    156    11
 5 cgeger                  1      1     1
 6 drfeinberg            127     23     1
 7 dtoher                 26     26     1
 8 etiennebacher         127    162     7
 9 fkohrt                  1      1     1
10 grimmjulian             2      2     1
11 humanfactors           22     23     4
12 jdtrat                  1      1     1
13 m-macaskill            33     31     2
14 mattansb             1009    603    14
15 mutlusun              265      4     4
16 pkoaz                   3      2     1
17 rempsyc              3427   2938    14
18 strengejacke         5129  38164   223
19 vincentarelbundock      5      0     1
20 webbedfeet             85     85     2
21 wjschne                 2      2     1
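The contributors-data URL is not part of GitHub's documented API, so it may change without notice. For the route the question preferred (the GitHub API through the gh package), here is a minimal sketch, assuming the documented REST endpoint GET /repos/{owner}/{repo}/stats/contributors. That endpoint returns, per contributor, an author object, a total commit count and a weeks array whose w, a, d and c fields are the week start, additions, deletions and commits. The function names summarise_contributor_stats() and fetch_contributor_stats() are hypothetical. The retry loop assumes that gh returns an empty list when GitHub answers 202 Accepted, which it does while it is still computing the statistics.

```r
# Sketch only: per-contributor totals from the documented GitHub REST endpoint
# GET /repos/{owner}/{repo}/stats/contributors, fetched with the gh package.
# Function names are hypothetical.

# Turn the parsed JSON (a list of contributors, each with `author`, `total`
# and a `weeks` list of w/a/d/c entries) into one row per contributor.
summarise_contributor_stats <- function(stats) {
  if (length(stats) == 0) {
    return(data.frame(username = character(), commits = numeric(),
                      additions = numeric(), deletions = numeric(),
                      stringsAsFactors = FALSE))
  }
  rows <- lapply(stats, function(x) {
    # Sum one weekly field ("a" or "d") over all weeks of this contributor
    total_of <- function(field) {
      sum(vapply(x$weeks, function(w) as.numeric(w[[field]]), numeric(1)))
    }
    login <- x$author$login  # NULL if the author object is missing
    data.frame(
      username  = if (is.null(login)) NA_character_ else login,
      commits   = as.numeric(x$total),
      additions = total_of("a"),
      deletions = total_of("d"),
      stringsAsFactors = FALSE
    )
  })
  out <- do.call(rbind, rows)
  out <- out[order(-out$commits), , drop = FALSE]  # most commits first
  rownames(out) <- NULL
  out
}

# Fetch and summarise; gh is only needed when this function is called.
fetch_contributor_stats <- function(owner, repo, tries = 5, wait = 2) {
  for (i in seq_len(tries)) {
    stats <- gh::gh("GET /repos/{owner}/{repo}/stats/contributors",
                    owner = owner, repo = repo)
    # Assumption: while GitHub computes the statistics it replies 202 with an
    # empty body, which gh returns as an empty list, so wait and retry.
    if (length(stats) > 0) return(summarise_contributor_stats(stats))
    Sys.sleep(wait)
  }
  stop("GitHub is still computing the statistics; try again later.")
}
```

For example, fetch_contributor_stats("easystats", "report") should return one row per contributor with username, commits, additions and deletions. gh uses a token from the GITHUB_PAT environment variable if one is set, which raises the API rate limit. The aggregation is written in base R so that the parsing step can be tested offline on a hand-made list without any package beyond gh.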