Using R to fetch a Pubmed abstract by using its title

Time:03-20

I have been trying for a while to fetch a PubMed abstract by using its title. For instance, if I enter the following title in the PubMed search box at https://pubmed.ncbi.nlm.nih.gov/ :

A Pituitary-Derived MEG3 Isoform Functions as a Growth Suppressor in Tumor Cells

I obtain a page showing the following abstract:

Abstract Human pituitary adenomas are the most common intracranial neoplasm. Typically monoclonal in origin, a somatic mutation is a prerequisite event in tumor development. To identify underlying pathogenetic mechanisms in tumor formation, we compared the difference in gene expression between normal human pituitary tissue and clinically nonfunctioning pituitary adenomas by cDNA-representational difference analysis. We cloned a cDNA, the expression of which was absent in these tumors, that represents a novel transcript from the previously described MEG3, a maternal imprinting gene with unknown function. It was expressed in normal human gonadotrophs, from which clinically nonfunctioning pituitary adenomas are derived. Additional investigation by Northern blot and RT-PCR demonstrated that this gene was also not expressed in functioning pituitary tumors as well as many human cancer cell lines. Moreover, ectopic expression of this gene inhibits growth in human cancer cells including HeLa, MCF-7, and H4. Genomic analysis revealed that MEG3 is located on chromosome 14q32.3, a site that has been predicted to contain a tumor suppressor gene involved in the pathogenesis of meningiomas. Taken together, our data suggest that MEG3 may represent a novel growth suppressor, which may play an important role in the development of human pituitary adenomas.

Is there a command in any R package that could do the same? I have been playing with tools like 'easyPubMed', 'rentrez', etc., but I was a little intimidated by their complexity. Thanks in advance.

CodePudding user response:

We can use rvest to get the abstract by submitting the search form.

library(rvest)
library(dplyr)

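# URL of the PubMed home page
url <- 'https://pubmed.ncbi.nlm.nih.gov/'
# session() replaces the deprecated html_session() in rvest >= 1.0.0
ncbi <- session(url)

# Grab the first form on the page (the search form)
search <- ncbi %>% html_element("form") %>% html_form()

# Fill in the search term
search <- search %>%
  html_form_set("term" = "A Pituitary-Derived MEG3 Isoform Functions as a Growth Suppressor in Tumor Cells")

# Submit the form and save the result as a new session
# (session_submit() replaces the deprecated submit_form())
result <- session_submit(ncbi, search)

# Extract the abstract text from the result page
abstract <- result %>% html_elements('.abstract-content') %>% html_text()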
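
The scraped text will probably still carry the page's indentation and line breaks, and the selector returns an empty vector when the search does not land on a single article page. A minimal base R sketch for tidying the result follows; clean_abstract is a hypothetical helper name, not part of rvest, and the raw string is a shortened stand-in for real output.

```r
# Hypothetical helper (not part of rvest): join the pieces, collapse runs of
# whitespace into single spaces, and trim both ends.
clean_abstract <- function(x) {
  if (length(x) == 0) return(NA_character_)  # no '.abstract-content' node was found
  x <- paste(x, collapse = " ")
  x <- gsub("[[:space:]]+", " ", x)
  trimws(x)
}

# Shortened stand-in for what html_text() might return
raw <- "\n      Human pituitary adenomas are the most\n   common intracranial neoplasm.\n  "
cleaned <- clean_abstract(raw)
print(cleaned)
```

With the code above, clean_abstract(abstract) would give one tidy string, or NA if no abstract was found.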

CodePudding user response:

I think the easyPubMed package is relatively easy to use, as the name implies. Here's a complete example.

First, build the query as a character value. Here it is the title from the post followed by the [ti] field tag, which restricts the search to article titles.

Then run the PubMed query with get_pubmed_ids and retrieve the records with fetch_pubmed_data.

Finally, table_articles_byAuth puts the results into a data.frame. Setting included_authors to "first" returns information on the first author of each record only, and max_chars limits the number of characters kept from the abstract.

library(easyPubMed)

my_query <- paste(
  'A Pituitary-Derived MEG3 Isoform Functions as a Growth Suppressor in Tumor Cells',
  '[ti]'
)

my_pubmed_ids <- get_pubmed_ids(my_query)
my_data <- fetch_pubmed_data(my_pubmed_ids, encoding = "ASCII")

df <- table_articles_byAuth(my_data,
                            included_authors = "first",
                            max_chars = 2000,
                            encoding = "ASCII")

The resulting columns of your data.frame will be the following:

names(df)

 [1] "pmid"      "doi"       "title"     "abstract"  "year"      "month"     "day"       "jabbrv"    "journal"   "keywords" 
[11] "lastname"  "firstname" "address"   "email" 
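
A title search may return more than one record, so the title and abstract columns listed above can be used to keep only the exact match. The sketch below assumes nothing beyond base R; pick_by_title is a hypothetical helper, and demo is a made-up two-row table standing in for the real df.

```r
# Hypothetical helper: keep only the rows whose title matches the requested one,
# ignoring case and punctuation (PubMed titles often end with a period).
pick_by_title <- function(df, title) {
  norm <- function(s) tolower(trimws(gsub("[^[:alnum:]]+", " ", s)))
  df[which(norm(df$title) == norm(title)), , drop = FALSE]
}

# Made-up table for illustration only; the real one comes from table_articles_byAuth()
demo <- data.frame(
  title = c("A Pituitary-Derived MEG3 Isoform Functions as a Growth Suppressor in Tumor Cells.",
            "Some other title"),
  abstract = c("abstract of the matching record", "abstract of another record"),
  stringsAsFactors = FALSE
)

hit <- pick_by_title(
  demo,
  "A Pituitary-Derived MEG3 Isoform Functions as a Growth Suppressor in Tumor Cells"
)
print(hit$abstract)
```

Applied to the real table, pick_by_title(df, title)$abstract would return just the abstract of the requested article.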

If you want to see all your abstracts, they will be in the abstract column of your data.frame:

df$abstract

[1] "Human pituitary adenomas are the most common intracranial neoplasm...
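
For comparison, the same lookup can be sketched with rentrez, the other package mentioned in the question. This is only a rough outline: title_query and fetch_abstract_by_title are hypothetical helper names, the rentrez package must be installed, and the fetch needs network access, so the actual call is left commented out. Note that the "abstract" text format also includes the citation and author lines, not just the abstract itself.

```r
# Hypothetical helper: append the PubMed [Title] field tag to restrict the search
title_query <- function(title) paste0(title, "[Title]")

# Hypothetical helper: search PubMed by title and fetch the first hit as plain text.
# Uses rentrez::entrez_search() and rentrez::entrez_fetch(); needs network access.
fetch_abstract_by_title <- function(title) {
  hits <- rentrez::entrez_search(db = "pubmed", term = title_query(title), retmax = 5)
  if (length(hits$ids) == 0) return(NA_character_)  # nothing matched the title
  rentrez::entrez_fetch(db = "pubmed", id = hits$ids[1],
                        rettype = "abstract", retmode = "text")
}

my_title <- "A Pituitary-Derived MEG3 Isoform Functions as a Growth Suppressor in Tumor Cells"
print(title_query(my_title))

# Network call, left commented out:
# cat(fetch_abstract_by_title(my_title))
```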