I used a Wikitable API to download the table of Nobel laureates with the following code:
library(httr)
library(jsonlite)
# response_2: response object from the earlier API request (not shown)
json_2 <- content(response_2, "text")
json_new <- fromJSON(json_2)
wiki_nobel <- as.data.frame(json_new)
When I convert it into a data frame, I get the following output. I am unsure how to reshape it into rows and columns: the cell at [1,1] should be a column name, followed by the row values.
I've tried using
wiki_nobel <- json_new %>% as_tibble()
wiki_nobel <- bind_rows(as.data.frame(json_new))
But both give the same output.
Any help is appreciated. Thanks
CodePudding user response:
There are several Wikitable API services.
JSON from https://wikitable2json.vercel.app/ can be rectangled with just jsonlite::read_json():
api_req <- "https://wikitable2json.vercel.app/api/List_of_Nobel_laureates?table=0"
nobel_1 <- jsonlite::read_json(api_req, simplifyVector = TRUE)
tibble::as_tibble(nobel_1)
#> # A tibble: 122 × 7
#> Year Physics Chemi…¹ Physi…² Liter…³ Peace Econo…⁴
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 1901 Wilhelm Röntgen Jacobu… Emil A… Sully … Henr… —
#> 2 1902 Hendrik Lorentz;Pieter Zeeman Herman… Ronald… Theodo… Élie… —
#> 3 1903 Henri Becquerel;Pierre Curie;Mar… Svante… Niels … Bjørns… Rand… —
#> 4 1904 Lord Rayleigh Willia… Ivan P… Frédér… Inst… —
#> 5 1905 Philipp Lenard Adolf … Robert… Henryk… Bert… —
#> 6 1906 J. J. Thomson Henri … Camill… Giosuè… Theo… —
#> 7 1907 Albert Abraham Michelson Eduard… Charle… Rudyar… Erne… —
#> 8 1908 Gabriel Lippmann Ernest… Élie M… Rudolf… Klas… —
#> 9 1909 Karl Ferdinand Braun;Guglielmo M… Wilhel… Emil T… Selma … Augu… —
#> 10 1910 Johannes Diderik van der Waals Otto W… Albrec… Paul H… Inte… —
#> # … with 112 more rows, and abbreviated variable names ¹Chemistry,
#> # ²`Physiologyor Medicine`, ³Literature, ⁴Economics
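To see what simplifyVector changes, here is a small offline sketch. The JSON string is a made-up sample (an array of row objects), not the actual API response, which may be shaped differently. It uses jsonlite::fromJSON(), which accepts the same simplifyVector option as read_json().

```r
library(jsonlite)

# Made-up sample, NOT the actual API response: an array of row objects
txt <- '[{"Year":"1901","Physics":"Wilhelm Röntgen"},
         {"Year":"1904","Physics":"Lord Rayleigh"}]'

# simplifyVector = FALSE keeps the nested list structure (one list per row)
as_list <- fromJSON(txt, simplifyVector = FALSE)
str(as_list, max.level = 1)

# simplifyVector = TRUE collapses the array of records into a data frame
as_df <- fromJSON(txt, simplifyVector = TRUE)
str(as_df)
```

If the simplified result is already a data frame, tibble::as_tibble() only changes how it prints.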
The response from https://www.wikitable2json.com/ needs just a bit more work:
library(purrr)
nobel_2 <- jsonlite::read_json("https://www.wikitable2json.com/api/List_of_Nobel_laureates")
# response includes a single (nested) list
nobel_2 <- nobel_2[[1]]
# 1st list holds column names
col_names <- unlist(nobel_2[[1]])
# name all other lists, map_dfr turns named lists into single data frame
map_dfr(nobel_2[-1], ~ set_names(.x, col_names))
#> # A tibble: 123 × 7
#> Year Physics Chemi…¹ Physi…² Liter…³ Peace Econo…⁴
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 1901 Wilhelm Röntgen Jacobu… Emil A… Sully … Henr… —
#> 2 1902 Hendrik Lorentz;Pieter Zeeman Herman… Ronald… Theodo… Élie… —
#> 3 1903 Henri Becquerel;Pierre Curie;Mar… Svante… Niels … Bjørns… Rand… —
#> 4 1904 Lord Rayleigh Willia… Ivan P… Frédér… Inst… —
#> 5 1905 Philipp Lenard Adolf … Robert… Henryk… Bert… —
#> 6 1906 J. J. Thomson Henri … Camill… Giosuè… Theo… —
#> 7 1907 Albert Abraham Michelson Eduard… Charle… Rudyar… Erne… —
#> 8 1908 Gabriel Lippmann Ernest… Élie M… Rudolf… Klas… —
#> 9 1909 Karl Ferdinand Braun;Guglielmo M… Wilhel… Emil T… Selma … Augu… —
#> 10 1910 Johannes Diderik van der Waals Otto W… Albrec… Paul H… Inte… —
#> # … with 113 more rows, and abbreviated variable names ¹Chemistry,
#> # ²`Physiologyor Medicine`, ³Literature,
#> # ⁴`Economics(The Sveriges Riksbank Prize)[13][lower-alpha 1]`
Created on 2023-01-17 with reprex v2.0.2
The table from https://www.wikitable2json.com/ is one row longer (123 rows instead of 122) because it includes a footer row that repeats the column names.
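If the footer row gets in the way, it can be filtered out after rectangling. Below is a minimal base R sketch that runs offline. The `tables` object is a hypothetical stand-in for the parsed response, cut down to two columns, and it assumes the layout described above: a single nested list whose first row holds the column names and whose last row repeats them.

```r
# Hypothetical stand-in for the parsed response (not real API output):
# one table stored as a list of rows; header first, footer last
tables <- list(
  list(
    list("Year", "Physics"),
    list("1901", "Wilhelm Röntgen"),
    list("1902", "Hendrik Lorentz;Pieter Zeeman"),
    list("1904", "Lord Rayleigh"),
    list("Year", "Physics")
  )
)

rows <- tables[[1]]              # the response holds a single table
col_names <- unlist(rows[[1]])   # first row holds the column names

# Stack the remaining rows into a character matrix, then into a data frame
mat <- do.call(rbind, lapply(rows[-1], unlist))
nobel <- as.data.frame(mat, stringsAsFactors = FALSE)
names(nobel) <- col_names

# Drop any row whose first cell just repeats the first column name (the footer)
nobel <- nobel[nobel[[1]] != col_names[1], , drop = FALSE]
nobel
```

Comparing against the header value, rather than dropping the last row by position, keeps the code working if the service stops returning a footer.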
For guidelines on how to approach rectangling problems with the tidyverse, see https://tidyr.tidyverse.org/articles/rectangle.html