How to get rid of the rest of the text after getting the results I want?

Time:11-12

import urllib.request
import json
from collections import Counter

def count_coauthors(author_id):
    coauthors_dict = {}

    url_str = ('https://api.semanticscholar.org/graph/v1/author/47490276?fields=name,papers.authors')
    respons = urllib.request.urlopen(url_str)
    text = respons.read().decode()

    for line in respons:
        print(line.decode().rstrip())

    data = json.loads(text)
    print(type(data))
    print(list(data.keys()))
    print(data["name"])
    print(data["authorId"])

    name = []
    for lines in data["papers"]:
        for authors in lines["authors"]:
            name.append(authors.get("name")) 
        print(name)

    count = dict()
    names = name
    for i in names:
        if i not in count:
            count[i] = 1
        else:
            count[i] += 1
    print(count) 

    c = Counter(count)
    top = c.most_common(10)
    print(top)

    return coauthors_dict

author_id = '47490276'
cc = count_coauthors(author_id)

top_coauthors = sorted(cc.items(), key=lambda item: item[1], reverse=True)
for co_author in top_coauthors[:10]:
    print(co_author)

This is how my code looks so far; there are no errors. I need to get rid of the rest of the text that is printed when I run it, so the output should look like this:

('Diego Calvanese', 47)
('D. Lanti', 28)
('Martín Rezk', 21)
('Elem Güzel Kalayci', 18)
('B. Cogrel', 17)
('E. Botoeva', 16)
('E. Kharlamov', 16)
('I. Horrocks', 12)
('S. Brandt', 11)
('V. Ryzhikov', 11)

I have tried using rstrip and split on my 'c' variable, but that doesn't work. I'm only allowed to import what I have already imported, and I must use the link that is included.

Tips on simplifying or improving the code are also appreciated!

("Extend the program below so that it prints the names of the top-10 coauthors together with the numbers of the coauthored publications")

CodePudding user response:

From what I understand, you are not sure where your correct output comes from. It does not come from the five lines at the end of the script.

Your result is printed by the print(top) call inside the function. This top variable is what you want to return from the function, because the coauthors_dict you currently return never has any data written to it.

You will also have to adjust your sorted(...) call slightly, since you will then have a list of tuples rather than a dictionary (so drop the .items() call), but you should then get the correct result.
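
A minimal sketch of that change, assuming the JSON has already been parsed into a dict shaped like the response in the question. The helper name top_coauthors and the sample data are made up for illustration, and no network request is made. Like the question's code, it does not exclude the queried author from the counts:

```python
from collections import Counter

def top_coauthors(data, n=10):
    """Return the n most frequent author names as a list of (name, count) tuples."""
    names = []
    for paper in data["papers"]:
        for author in paper["authors"]:
            names.append(author.get("name"))
    # most_common returns a list sorted by count, highest first
    return Counter(names).most_common(n)

# Hypothetical sample shaped like the parsed API response
sample = {
    "papers": [
        {"authors": [{"name": "A"}, {"name": "B"}]},
        {"authors": [{"name": "A"}, {"name": "C"}]},
        {"authors": [{"name": "A"}, {"name": "B"}]},
    ]
}

# The result is a list of tuples, so iterate over it directly (no .items())
for co_author in top_coauthors(sample):
    print(co_author)
```

Because the function returns the list instead of printing intermediate values, the only output is the loop at the end.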

CodePudding user response:

If I understand correctly, you want this function to return a count of each distinct co-author (excluding the author). You already have that in your count variable, but you never return it; the variable you do return is empty.

Instead consider:

import urllib.request
import json
from collections import Counter

def count_coauthors(author_id):
    url_str = (f'https://api.semanticscholar.org/graph/v1/author/{author_id}?fields=name,papers.authors')
    response = urllib.request.urlopen(url_str)
    text = response.read().decode()
    data = json.loads(text)
   
    names = [a.get("name") for l in data["papers"] for a in l["authors"] if a['authorId'] != author_id]
    
    #The statement above can be written long-hand like:
    #names=[]
    #for l in data["papers"]:
    #    for a in l["authors"]:
    #        if a['authorId'] != author_id:
    #            names.append(a.get("name"))

    return list(Counter(names).items())

author_id = '47490276'
cc = count_coauthors(author_id)

top_coauthors = sorted(cc, key=lambda item: item[1], reverse=True)
for co_author in top_coauthors[:10]:
    print(co_author)

('Diego Calvanese', 47)
('D. Lanti', 28)
('Martín Rezk', 21)
('Elem Güzel Kalayci', 18)
('B. Cogrel', 17)
('E. Botoeva', 16)
('E. Kharlamov', 16)
('I. Horrocks', 12)
('S. Brandt', 11)
('V. Ryzhikov', 11)

You might also consider moving the top-N logic into the function as an optional parameter:

import urllib.request
import json
from collections import Counter

def count_coauthors(author_id, top=0):
    url_str = (f'https://api.semanticscholar.org/graph/v1/author/{author_id}?fields=name,papers.authors')
    response = urllib.request.urlopen(url_str)
    text = response.read().decode()
    data = json.loads(text)

    names = [a.get("name") for l in data["papers"] for a in l["authors"] if a['authorId'] != author_id]    
    name_count = list(Counter(names).items())

    top = top if top!=0 else len(name_count)
    return sorted(name_count, key=lambda x: x[1], reverse=True)[:top]

author_id = '47490276'
for auth in count_coauthors(author_id, top=10):
    print(auth)
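
As a possible further simplification, Counter.most_common(n) already returns the n highest counts in sorted order, and most_common(None) returns all of them, so the explicit sorted(...) and slice could be dropped. The sketch below assumes data is the already-parsed response; the name count_from_data and the sample IDs are hypothetical, and top defaults to None here rather than 0:

```python
from collections import Counter

def count_from_data(data, author_id, top=None):
    """Count co-author names in parsed data, skipping the queried author."""
    names = [
        a.get("name")
        for paper in data["papers"]
        for a in paper["authors"]
        if a.get("authorId") != author_id
    ]
    # top=None returns every (name, count) pair, sorted by count descending
    return Counter(names).most_common(top)

# Hypothetical sample shaped like the parsed API response
sample_data = {
    "papers": [
        {"authors": [{"authorId": "1", "name": "Me"},
                     {"authorId": "2", "name": "X"}]},
        {"authors": [{"authorId": "1", "name": "Me"},
                     {"authorId": "2", "name": "X"},
                     {"authorId": "3", "name": "Y"}]},
    ]
}

for auth in count_from_data(sample_data, "1", top=10):
    print(auth)
```

Keeping the counting separate from the HTTP request also makes it possible to check the logic without calling the API.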