Matrix to Dataframe conversion fails in R

Time:01-19

I have created a function that calls an API and parses the id, label, description, and score of each annotation, but I can't get the resulting data frame to display properly.

Here's the code:

get_wikidata_links <- function(input_text, minimum_score) {
  
  #
  # Function which takes a character vector of length 1 as input (i.e. all text
  # needs to be combined into a single character) as well as a minimum certainty
  # score, and returns a tibble with key information and links to Wikidata
  #
  # Input
  #  - input_text: Text input (character)
  #  - minimum_score: Minimum score that every returned entity needs to have
  #                   (numeric)
  #
  # Output
  #  - top_wikidata_links: Table with the first four columns being 'id', 'label',
  #               'description', 'score' (tibble)
  #
  
  base_url <- "https://opentapioca.org/api/annotate"
  r <- GET(base_url, query = list(query = input_text))
  
  
  data = content(r)$annotations
  
  framed = list()
  vec = list()
  dummy = 0
  for (i in 1:length(data)) {
    
    data1 = data[[i]]$tags
    
    for (j in 1:length(data1)) {
      
      data2 = data1[[j]]

      if (data2$score>minimum_score) {
        
        vec[1] <- data2$id
        vec[2] <- data2$label
        vec[3] <- data2$desc
        vec[4] <- data2$score
        dummy <- dummy + 1
        framed[[dummy]] <- vec
      }
    }
  }
  
  data_matrix <- do.call("rbind", framed)
  top_wikidata_links <- as.data.frame(data_matrix, stringsAsFactors = FALSE)
  colnames(top_wikidata_links) <- c("ID", "Label", "Description", "Score")
  
  return(top_wikidata_links)
  
}

Now I test this function with a couple phrases:

# Test 1
text_example_1 <- c("Karl Popper worked at the LSE.")
get_wikidata_links(text_example_1, -0.5)
# 
# Hint: The output should be a tibble similar to the one outlined below
#
# | id | label | description | score |
# | "Q81244" | "Karl Popper" | "Austrian-British philosopher of science" | 2.4568285 |
# | "Q174570" | "London School of Economics and Political Science" | "university in Westminster, UK" | "1.4685043" |
# | "Q171240" | "London Stock Exchange" | "stock exchange in the City of London" | "-0.4124461" |

# Test 2
text_example_2 <- c("Claude Shannon studied at the University of Michigan and at MIT.")
get_wikidata_links(text_example_2, 0)

Now for some reason the matrix data_matrix works fine:

[Output image not preserved]

But the data frame conversion fails as such:

[Output image not preserved]

Answer:

I guess it's a bit easier to manage with some hoisting and unnesting. Inspired by https://tidyr.tidyverse.org/articles/rectangle.html :

library(httr)
library(tidyr)
library(dplyr)

get_wikidata_links <- function(input_text, minimum_score) {
  base_url <- "https://opentapioca.org/api/annotate"
  r <- GET(base_url, query = list(query = input_text))

  tibble(link = content(r)$annotations) %>% 
    hoist(link, tags = "tags") %>% 
    unnest_longer(tags) %>% 
    hoist(tags, ID = "id", Label = "label", Description = "desc", Score = "score") %>% 
    select(ID:Score) %>% 
    filter(Score >= minimum_score)
}

text_example_1 <- c("Karl Popper worked at the LSE.")
get_wikidata_links(text_example_1, -0.5)
#> # A tibble: 3 × 4
#>   ID      Label                                            Description     Score
#>   <chr>   <chr>                                            <chr>           <dbl>
#> 1 Q81244  Karl Popper                                      Austrian-Brit…  2.46 
#> 2 Q174570 London School of Economics and Political Science university in…  1.47 
#> 3 Q171240 London Stock Exchange                            stock exchang… -0.412

text_example_2 <- c("Claude Shannon studied at the University of Michigan and at MIT.")
get_wikidata_links(text_example_2, 0)
#> # A tibble: 3 × 4
#>   ID      Label                                 Description                Score
#>   <chr>   <chr>                                 <chr>                      <dbl>
#> 1 Q92760  Claude Shannon                        American mathematician an… 1.96 
#> 2 Q230492 University of Michigan                public research universit… 1.29 
#> 3 Q49108  Massachusetts Institute of Technology research university in Ca… 0.902

Created on 2023-01-19 with reprex v2.0.2
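
As for why the original as.data.frame() step misbehaves: I can't see the screenshots, but a likely cause is that `vec` is a list, so do.call("rbind", framed) returns a matrix of mode "list", and as.data.frame() then turns every column into a list column. Here is a minimal base R sketch of that effect and of one possible fix. The `tags` data below is made up to mimic the shape of the API response; it is not real output from the service.

```r
# Made-up stand-ins for the API's `tags` entries (an assumption, not real data)
tags <- list(
  list(id = "Q1", label = "Alpha", desc = "first mock entity",  score = 1.5),
  list(id = "Q2", label = "Beta",  desc = "second mock entity", score = -0.2)
)

# What the loop in the question effectively builds: one list per row
framed <- lapply(tags, function(tag) {
  list(tag$id, tag$label, tag$desc, tag$score)
})

# rbind() on lists gives a matrix whose cells are list elements
data_matrix <- do.call("rbind", framed)
mode(data_matrix)   # "list"

# as.data.frame() keeps that structure: every column is a list column
bad <- as.data.frame(data_matrix, stringsAsFactors = FALSE)
sapply(bad, class)

# One possible fix: build a one-row data frame per tag, then bind the rows
rows <- lapply(tags, function(tag) {
  data.frame(
    ID = tag$id, Label = tag$label, Description = tag$desc,
    Score = tag$score, stringsAsFactors = FALSE
  )
})
good <- do.call("rbind", rows)
sapply(good, class)
```

With this approach the text columns come out as character and Score as numeric. Note that a tag with a missing field (NULL) would need a fallback such as NA before it goes into data.frame(), otherwise that column is silently dropped for the row.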

The service itself seems a bit unstable or overloaded.
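
Given that instability, it may be worth retrying the request and failing loudly on an HTTP error rather than parsing an error body. A rough sketch using httr's RETRY() and stop_for_status(); the helper name `fetch_annotations` and the retry settings are my own choices, not part of the code above:

```r
# Hypothetical helper: retry the request a few times, then stop on HTTP errors
fetch_annotations <- function(input_text, times = 3) {
  r <- httr::RETRY(
    "GET", "https://opentapioca.org/api/annotate",
    query = list(query = input_text),
    times = times,     # maximum number of attempts
    pause_base = 1     # base for the exponential back-off between attempts
  )
  httr::stop_for_status(r)        # raise an R error on 4xx/5xx responses
  httr::content(r)$annotations    # same parsed list the functions above use
}
```

The result could then replace `content(r)$annotations` in either version of get_wikidata_links().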
