How do I get the data extracted from API to my database-CodePudding

I am currently working on a project to build a database on professor's research paper database. This is my first time building a database(never had experience with MYSQL) and I am learning as I am doing it.

I was able to use an api to get the data, for example:

{"authorId": "1773022", "url": "https://www.semanticscholar.org/author/1773022", "papers": [{"paperId": "1253d2704580a74e776ae211602cfde71532c057", "title": "Nonlinear Schrodinger Kernel for hardware acceleration of machine learning"}, {"paperId": "71f49f1e3ccb2e92d606db9b3db66c669a163bb6", "title": "Task-Driven Learning of Spatial Combinations of Visual Features"}, {"paperId": "bb35ae8a50de54c9ca29fbdf1ea2fbbb4e8c4662", "title": "Statistical Learning of Visual Feature Hierarchies"}]}

How would I use python to turn this into a table so I can use it to build my database?

I am trying to make a table where columns are: Paper ID|Title|

CodePudding user response：

Firstly download MySQL and run the below SQL on the MySQL database to create your MYSQL table

CREATE TABLE papers
(
paperId varchar(1024),
title varchar(4000)
);

Then below py code, connects to your MySQL database and inserts the records. Just modify the host,user,password,database to your database. Sorry don't have MySQL to test it but it should work. I used pandas to convert the API list to a dataframe so its easier to work with when inserting into the database table, hope it helps

import pandas as pd
import mysql.connector

mydb = mysql.connector.connect(
  host="localhost",
  user="yourusername",
  password="yourpassword",
  database="mydatabase"
)

mycursor = mydb.cursor()

data = {"authorId":"1773022","url":"https://www.semanticscholar.org/author/1773022","papers":[{"paperId": "1253d2704580a74e776ae211602cfde71532c057", "title": "Nonlinear Schrodinger Kernel for hardware acceleration of machine learning"},{"paperId": "71f49f1e3ccb2e92d606db9b3db66c669a163bb6", "title": "Task-Driven Learning of Spatial Combinations of Visual Features"},{"paperId": "bb35ae8a50de54c9ca29fbdf1ea2fbbb4e8c4662", "title": "Statistical Learning of Visual Feature Hierarchies"}]}
df = pd.DataFrame(data)
papersdf = pd.DataFrame(df['papers'].tolist())

for index, row in papersdf.iterrows():
    sql = print('INSERT INTO papers(paperId,title) VALUES("'   row['paperId']   '","'   row['title']   '")')
    mycursor.execute(sql)
    mydb.commit()
    print(mycursor.rowcount, "record inserted.")

CodePudding user response：

From https://www.w3schools.com/python/python_mysql_getstarted.asp

Install:

python -m pip install mysql-connector-python

Over view

Create a connection:

import mysql.connector

mydb = mysql.connector.connect(
  host="localhost",
  user="yourusername",
  password="yourpassword"
)

Create a cursor to interact with the connection you made, then create a Database:

mycursor = mydb.cursor()

mycursor.execute("CREATE DATABASE mydatabase")

mydb.close()

After the database has been created, you can start connecting with it like so:

mydb = mysql.connector.connect(
  host="localhost",
  user="yourusername",
  password="yourpassword",
  database="mydatabase"
)

Note that you don't have to actually close your connection and reopen it to connect to that database, yet I don't see the docs mentioning anything about interacting with that specific database after you've created it, so I'm going to close it after I create the database...

Create the table with the proper datatypes and constraints:

mycursor.execute("CREATE TABLE customers (id INT AUTO_INCREMENT PRIMARY KEY, name VARCHAR(255), address VARCHAR(255))")

Then insert into it and commit the entries:

sql = "INSERT INTO customers (name, address) VALUES (%s, %s)"
val = ("John", "Highway 21")
mycursor.execute(sql, val)

mydb.commit()

Close the connection:

mydb.close()

Your Specific Case

This is the data you have provided:

data = {
    "authorId": "1773022",
    "url": "https://www.semanticscholar.org/author/1773022",
    "papers": [
        {
            "paperId": "1253d2704580a74e776ae211602cfde71532c057",
            "title": "Nonlinear Schrodinger Kernel for hardware acceleration of machine learning"
        }, 
        {
            "paperId": "71f49f1e3ccb2e92d606db9b3db66c669a163bb6",
            "title": "Task-Driven Learning of Spatial Combinations of Visual Features"
        },
        {
            "paperId": "bb35ae8a50de54c9ca29fbdf1ea2fbbb4e8c4662",
            "title": "Statistical Learning of Visual Feature Hierarchies"
        }
    ]
}

You will want a table for:

Authors - id (auto increment pk), authorId (varchar), url (varchar),
Papers - id (auto increment pk), authorId (varchar fk), paperId, title (varchar)

Modify this as you please:

import mysql.connector

mydb = mysql.connector.connect(
  host="localhost",
  user="yourusername",
  password="yourpassword",
  database="mydatabase"
)

mycursor = mydb.cursor()

# create the tables if they don't exist
authors = False
papers = False
mycursor.execute("SHOW TABLES")

for e in mycursor.fetchall():
    if "authors" == e[0]:
        authors = True
    if "papers" == e[0]:
        papers = True

if not authors:
    mycursor.execute("CREATE TABLE authors (id INT AUTO_INCREMENT PRIMARY KEY, authorId VARCHAR(255), url VARCHAR(255))")

if not papers:
    mycursor.execute("CREATE TABLE papers (id INT AUTO_INCREMENT PRIMARY KEY, authorId VARCHAR(255), paperId VARCHAR(255), title VARCHAR(255))")

# insert into them - recall that "data" is what gets returned by your api. I am assuming 1 entry.
a = data["authorId"]
u = data["url"]
ps = data["papers"]

# I am going to check if the authorId already exists.
mycursor.execute("SELECT * FROM authors WHERE authorId = '"   a   "'")
as = mycursor.fetchall()
if len(as) == 0:
    sql = "INSERT INTO authors (authorId, url) VALUES (%s, %s)"
    val = (a, u)
    mycursor.execute(sql, val)
    mydb.commit()

# for each paper in papers
for p in ps:
    mycursor.execute("SELECT * FROM papers WHERE authorId = '"   a   "' AND paperId = '"   p["paperId"]   "' AND title = '"   p["title"]   "'")
    pc = mycursor.fetchall()
    if len(pc) == 0:
        sql = "INSERT INTO papers (authorId, paperId, title) VALUES (%s, %s, %s)"
        val = (a, p["paperId"], p["title"])
        mycursor.execute(sql, val)
        mydb.commit()

mydb.close()

Side note:

I have not tested this. I will be amazed myself if this works first try. If you come across any errors, I can do my best to trouble shoot.