I have created a function that calls an API and parses the id, label, description, and score of each annotation. But I can't seem to get the data frame to display properly.
Here's the code:
library(httr)  # provides GET() and content()
get_wikidata_links <- function(input_text, minimum_score) {
#
# Function which takes a character vector of length 1 as input (i.e. all text
# needs to be combined into a single character) as well as a minimum certainty
# score, and returns a tibble with key information and links to Wikidata
#
# Input
# - input_text: Text input (character)
# - minimum_score: Minimum score that every returned entity needs to have
# (numeric)
#
# Output
# - top_wikidata_links: Table with the first four columns being 'id', 'label',
# 'description', 'score' (tibble)
#
base_url <- "https://opentapioca.org/api/annotate"
r <- GET(base_url, query = list(query = input_text))
data = content(r)$annotations
framed = list()
vec = list()
dummy = 0
for (i in 1:length(data)) {
data1 = data[[i]]$tags
for (j in 1:length(data1)) {
data2 = data1[[j]]
if (data2$score>minimum_score) {
vec[1] <- data2$id
vec[2] <- data2$label
vec[3] <- data2$desc
vec[4] <- data2$score
dummy <- dummy + 1
framed[[dummy]] <- vec
}
}
}
data_matrix <- do.call("rbind", framed)
top_wikidata_links <- as.data.frame(data_matrix, stringsAsFactors = FALSE)
colnames(top_wikidata_links) <- c("ID", "Label", "Description", "Score")
return(top_wikidata_links)
}
Now I test this function with a couple phrases:
# Test 1
text_example_1 <- c("Karl Popper worked at the LSE.")
get_wikidata_links(text_example_1, -0.5)
#
# Hint: The output should be a tibble similar to the one outlined below
#
# | id | label | description | score |
# | "Q81244" | "Karl Popper" | "Austrian-British philosopher of science" | 2.4568285 |
# | "Q174570" | "London School of Economics and Political Science" | "university in Westminster, UK" | "1.4685043" |
# | "Q171240" | "London Stock Exchange" | "stock exchange in the City of London" | "-0.4124461" |
# Test 2
text_example_2 <- c("Claude Shannon studied at the University of Michigan and at MIT.")
get_wikidata_links(text_example_2, 0)
For some reason the matrix data_matrix looks fine, but the conversion to a data frame does not display properly.
CodePudding user response:
I guess it's a bit easier to manage through some hoisting and unnesting. Inspired by https://tidyr.tidyverse.org/articles/rectangle.html :
library(httr)
library(tidyr)
library(dplyr)
get_wikidata_links <- function(input_text, minimum_score) {
base_url <- "https://opentapioca.org/api/annotate"
r <- GET(base_url, query = list(query = input_text))
tibble(link = content(r)$annotations) %>%
hoist(link, tags = "tags") %>%
unnest_longer(tags) %>%
hoist(tags, ID = "id", Label = "label", Description = "desc", Score = "score") %>%
select(ID:Score) %>%
filter(Score >= minimum_score)
}
text_example_1 <- c("Karl Popper worked at the LSE.")
get_wikidata_links(text_example_1, -0.5)
#> # A tibble: 3 × 4
#> ID Label Description Score
#> <chr> <chr> <chr> <dbl>
#> 1 Q81244 Karl Popper Austrian-Brit… 2.46
#> 2 Q174570 London School of Economics and Political Science university in… 1.47
#> 3 Q171240 London Stock Exchange stock exchang… -0.412
text_example_2 <- c("Claude Shannon studied at the University of Michigan and at MIT.")
get_wikidata_links(text_example_2, 0)
#> # A tibble: 3 × 4
#> ID Label Description Score
#> <chr> <chr> <chr> <dbl>
#> 1 Q92760 Claude Shannon American mathematician an… 1.96
#> 2 Q230492 University of Michigan public research universit… 1.29
#> 3 Q49108 Massachusetts Institute of Technology research university in Ca… 0.902
Created on 2023-01-19 with reprex v2.0.2
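As for why the original version misbehaves: each element of framed is a list, so do.call("rbind", framed) builds a matrix of mode "list", and as.data.frame() then produces list columns instead of character and numeric ones. A minimal base R sketch of that effect and one possible fix, binding one-row data frames instead of lists. The tags object below is made-up data standing in for the parsed API response, not real output:

```r
# Mock tags standing in for the parsed API response (illustrative values only)
tags <- list(
  list(id = "Q1", label = "First",  desc = "first entity",  score = 1.5),
  list(id = "Q2", label = "Second", desc = "second entity", score = -0.2)
)

# rbind() on plain lists gives a matrix whose cells are list elements
list_matrix <- do.call("rbind", tags)
typeof(list_matrix)  # "list"

# Converting that matrix keeps every column as a list column
bad <- as.data.frame(list_matrix, stringsAsFactors = FALSE)
class(bad$score)     # "list"

# Binding one-row data frames keeps atomic column types
rows <- lapply(tags, function(tag) {
  data.frame(ID = tag$id, Label = tag$label, Description = tag$desc,
             Score = tag$score, stringsAsFactors = FALSE)
})
good <- do.call("rbind", rows)
class(good$Score)    # "numeric"
```

With atomic columns, filtering on Score and printing behave as expected.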
The service itself seems a bit unstable or overloaded.
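If that instability shows up as intermittent failures, one option is to retry the request a few times and stop with a clear error when the final response is still an HTTP error. A rough sketch using httr::RETRY() and httr::stop_for_status(). The helper name fetch_annotations and the retry settings are assumptions for illustration, not anything the API documents:

```r
# Hypothetical helper: fetch annotations, retrying on transient failures
fetch_annotations <- function(input_text, times = 3) {
  base_url <- "https://opentapioca.org/api/annotate"
  # RETRY() repeats the GET request up to `times` times with a backoff pause
  r <- httr::RETRY("GET", base_url,
                   query = list(query = input_text),
                   times = times, pause_base = 1)
  # Turn a remaining HTTP error (4xx/5xx) into an R error
  httr::stop_for_status(r)
  httr::content(r)$annotations
}
```

The result could then be passed to tibble(link = ...) in place of content(r)$annotations.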