How to scrape 'In more languages' table on Wikidata?

I'm trying to scrape the WHOLE 'In more languages' table on Wikidata pages, e.g.

However, all this means is that we need to look for a different URL that serves the full table. Chrome's developer tools show that the table is loaded from https://www.wikidata.org/wiki/Special:EntityData/Q3044.json, and that is the page we actually want to scrape. Downloading it with jsonlite doesn't give us the table directly, but we can reassemble it with a few dplyr functions. Here's a snippet of code that does that:


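library(dplyr)

# Download the entity's JSON and pull out the Q3044 record
wiki_data <- jsonlite::read_json("https://www.wikidata.org/wiki/Special:EntityData/Q3044.json")
table_data <- wiki_data$entities$Q3044

# One row per language for labels and descriptions
label_col <- bind_rows(table_data$labels) %>% rename(label = value)
desc_col <- bind_rows(table_data$descriptions) %>% rename(description = value)

# A language can have several aliases, so collapse them into one string
alias_col <- bind_rows(table_data$aliases) %>%
  rename(alias = value) %>%
  group_by(language) %>%
  summarise(alias = paste(alias, collapse = ", "))

# Join on language; languages without a description or alias get NA
full_table <- label_col %>%
  left_join(desc_col, by = "language") %>%
  left_join(alias_col, by = "language")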

with the first few rows of the output shown below:

> full_table
# A tibble: 157 x 4
   language label                         description                                        alias
   <chr>    <chr>                         <chr>                                              <chr>
 1 fr       Charlemagne                   empereur d'Occident et roi des Francs              Char~
 2 en       Charlemagne                   King of the Franks, King of Italy, and Holy Roman~ Karo~
 3 it       Carlo Magno                   re dei Franchi e dei Longobardi e primo imperator~ NA   
 4 ilo      Karlomagno                    Ari dagiti Pranko ken Lombardo ken Emperador ti N~ NA
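
To make the reassembly step easier to follow, here is a minimal offline sketch of the same idea in base R. It assumes the entity record has the shape the snippet above relies on: `labels` and `descriptions` hold one `language`/`value` pair per language, while `aliases` holds a list of such pairs per language. The helper name `terms_table` and the `mock` entity (including its placeholder descriptions and aliases) are made up for illustration and are not real Wikidata content.

```r
# Hypothetical helper: rebuild the label/description/alias table from one
# entity record shaped like wiki_data$entities$Q3044 (assumed structure).
terms_table <- function(entity) {
  # labels/descriptions: exactly one list(language=, value=) per language
  one_per_lang <- function(x, name) {
    out <- data.frame(
      language = vapply(x, function(e) e$language, character(1), USE.NAMES = FALSE),
      value = vapply(x, function(e) e$value, character(1), USE.NAMES = FALSE),
      stringsAsFactors = FALSE
    )
    names(out)[2] <- name
    out
  }
  labels <- one_per_lang(entity$labels, "label")
  descs <- one_per_lang(entity$descriptions, "description")

  # aliases: possibly several entries per language, collapsed into one string
  aliases <- data.frame(
    language = as.character(names(entity$aliases)),
    alias = vapply(
      entity$aliases,
      function(a) paste(vapply(a, function(e) e$value, character(1)), collapse = ", "),
      character(1),
      USE.NAMES = FALSE
    ),
    stringsAsFactors = FALSE
  )

  # Left joins on language: every labelled language is kept, gaps become NA
  out <- merge(labels, descs, by = "language", all.x = TRUE, sort = FALSE)
  merge(out, aliases, by = "language", all.x = TRUE, sort = FALSE)
}

# Made-up entity with placeholder values, used only to show the shape
mock <- list(
  labels = list(
    en = list(language = "en", value = "Charlemagne"),
    it = list(language = "it", value = "Carlo Magno")
  ),
  descriptions = list(
    en = list(language = "en", value = "placeholder description (en)"),
    it = list(language = "it", value = "placeholder description (it)")
  ),
  aliases = list(
    en = list(
      list(language = "en", value = "example alias A"),
      list(language = "en", value = "example alias B")
    )
  )
)

res <- terms_table(mock)
print(res)
```

With the real download, the same helper would be called as `terms_table(wiki_data$entities$Q3044)`. Row order after `merge` is not guaranteed, so look rows up by `language` rather than by position.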