I've written the code below to get some citation data from an API and write it to a CSV. It works fine except that one of the columns returns a list of authors and it comes into the CSV like this:

[{'authorId': '83129125', 'name': 'June A. Sekera'}, {'authorId': '13328115', 'name': 'A. Lichtenberger'}]

How can I parse this so I get simply a comma-separated list of the authors in a single cell, ignoring the authorId?

import requests
import pandas as pd

# get data from the API
r = requests.get("https://api.semanticscholar.org/graph/v1/paper/b5bb17a53f75b48ab5e18c00fb048b783db6b1f4/citations?fields=title,authors,year,url")
# named payload rather than json, so the name does not shadow the json module
payload = r.json()
df = pd.DataFrame(payload['data'])
# new df from the column of dicts
split_df = pd.DataFrame(df['citingPaper'].tolist())
# display the resulting df
print(split_df)
split_df.to_csv("citations.csv", index=False)

CodePudding user response：

Something like the code below: clean up the authors of each entry before populating the DataFrame.

import requests
import pandas as pd

r = requests.get("https://api.semanticscholar.org/graph/v1/paper/b5bb17a53f75b48ab5e18c00fb048b783db6b1f4/citations?fields=title,authors,year,url")
data = []
if r.status_code == 200:
    entries = r.json()['data']
    for entry in entries:
        # replace the list of author dicts with a comma-separated string of names
        entry['citingPaper']['authors'] = ','.join(x['name'] for x in entry['citingPaper'].get('authors', []))
        data.append(entry['citingPaper'])
    df = pd.DataFrame(data)
    df.to_csv("citations.csv", index=False)

citations.csv

paperId,url,title,year,authors
1ebccd3d83ed2fd79bc57cf0d06a8e02ba16180f,https://www.semanticscholar.org/paper/1ebccd3d83ed2fd79bc57cf0d06a8e02ba16180f,"A comparative study on deformation mechanisms, microstructures and mechanical properties of wide thin-ribbed sections formed by sideways and forward extrusion",2021,"Wenbin Zhou,Junquan Yu,Xiaona Lu,Jianguo Lin,T. Dean"
46f209486a9e8f81c77dbbb39991f4045dbc8f7d,https://www.semanticscholar.org/paper/46f209486a9e8f81c77dbbb39991f4045dbc8f7d,The low-carbon steel industry-Interactions between the hydrogen direct reduction of steel and the electricity system,2021,A. Toktarova
5638200cc8b188a48f923b75dee9793c06c99b62,https://www.semanticscholar.org/paper/5638200cc8b188a48f923b75dee9793c06c99b62,Pore-scale assessment of subsurface carbon storage potential: implications for the UK Geoenergy Observatories project,2021,"R. Payton,M. Fellgett,B. Clark,D. Chiarella,A. Kingdon,S. Hier‐Majumder"
6055ea75b377b6776f468ccb9f21551614b5f61f,https://www.semanticscholar.org/paper/6055ea75b377b6776f468ccb9f21551614b5f61f,Can Nature-Based Solutions Deliver a Win-Win for Biodiversity and Climate Change Adaptation?,2021,"Isabel Key,Alison C. Smith,B. Turner,A. Chausson,C. Girardin,Megan MacGillivray,N. Seddon"
8734d34823bcfb362f05df2a48bad19cc026b1c1,https://www.semanticscholar.org/paper/8734d34823bcfb362f05df2a48bad19cc026b1c1,Trends in air travel inequality in the UK: From the few to the many?,2021,"M. Büchs,Giulio Mattioli"
020eecb5f6edf918b6ef1120d97276b8d0748dc7,https://www.semanticscholar.org/paper/020eecb5f6edf918b6ef1120d97276b8d0748dc7,"Decarbonising the critical sectors of aviation, shipping, road freight and industry to limit warming to 1.5–2°C",2020,"M. Sharmina,O. Edelenbosch,C. Wilson,R. Freeman,D. Gernaat,P. Gilbert,A. Larkin,E. Littleton,M. Traut,D. V. van Vuuren,N. Vaughan,F. R. Wood,C. Le Quéré"
1e77bf66cfe8f94463c73289e4940d0efcc2a5e4,https://www.semanticscholar.org/paper/1e77bf66cfe8f94463c73289e4940d0efcc2a5e4,Investments in climate-friendly materials to strengthen the recovery package JUNE 2020,2020,"F. Lettow,Olga Chiappinelli"
3916ee1df6b8e07f8798a90726407554e990847e,https://www.semanticscholar.org/paper/3916ee1df6b8e07f8798a90726407554e990847e,Pathways for Low-Carbon Transition of the Steel Industry—A Swedish Case Study,2020,"A. Toktarova,I. Karlsson,Johan Rootzén,L. Göransson,M. Odenberger,F. Johnsson"
55667e4e2e3c35d7c6fcf98021a083d4397a308d,https://www.semanticscholar.org/paper/55667e4e2e3c35d7c6fcf98021a083d4397a308d,Potentials for reducing climate impact from tourism transport behavior,2020,"Anneli Kamb,E. Lundberg,J. Larsson,Jonas Nilsson"
9fc21836d61fbc76e3923021cb88f94e3e8c5a41,https://www.semanticscholar.org/paper/9fc21836d61fbc76e3923021cb88f94e3e8c5a41,Decarbonization of construction supply chains-Achieving net-zero carbon emissions in the supply chains linked to the construction of buildings and transport infrastructure,2020,
b657008ea89807ccbb204ed1e2f4debe92ce252b,https://www.semanticscholar.org/paper/b657008ea89807ccbb204ed1e2f4debe92ce252b,Roadmap for Decarbonization of the Building and Construction Industry—A Supply Chain Analysis Including Primary Production of Steel and Cement,2020,"I. Karlsson,Johan Rootzén,A. Toktarova,M. Odenberger,F. Johnsson,L. Göransson"
c611929ef750ba845a9c87da862ad1f8c9711e64,https://www.semanticscholar.org/paper/c611929ef750ba845a9c87da862ad1f8c9711e64,"Assessing Carbon Capture: Public Policy, Science, and Societal Need",2020,"June A. Sekera,A. Lichtenberger"
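
As a rough sketch of the same cleanup step in isolation, the snippet below uses only the standard library and hard-coded sample records, so it runs without calling the API. The helper name join_author_names and the second record (a paper with no authors) are made up for illustration; the first record reuses the authors from the question. It also shows why the authors cell ends up quoted in the CSV: the writer quotes any field that contains a comma, so the names stay in a single cell.

```python
import csv
import io


def join_author_names(authors, sep=", "):
    """Return the 'name' values from a list of author dicts as one string."""
    # Assumption: a missing or empty author list should give an empty string.
    return sep.join(a["name"] for a in (authors or []) if a.get("name"))


# Records shaped like the API's 'citingPaper' objects.
papers = [
    {
        "title": "Assessing Carbon Capture: Public Policy, Science, and Societal Need",
        "year": 2020,
        "authors": [
            {"authorId": "83129125", "name": "June A. Sekera"},
            {"authorId": "13328115", "name": "A. Lichtenberger"},
        ],
    },
    # Hypothetical record with no authors, to show the empty case.
    {"title": "A paper with no listed authors", "year": 2020, "authors": []},
]

# Write to an in-memory buffer instead of a file so the result can be inspected.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["title", "year", "authors"])
writer.writeheader()
for paper in papers:
    # Copy the record, swapping the list of dicts for the joined string.
    writer.writerow(dict(paper, authors=join_author_names(paper.get("authors"))))

csv_text = buffer.getvalue()
print(csv_text)
```

Passing sep="," instead would reproduce the spacing used in the citations.csv output above.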

CodePudding user response：

The easiest way I can think of is to use apply(lambda x: ...) on the authors column: for each row, build a list of the "name" values from the dictionaries p in that row's list.

Add this underneath split_df = pd.DataFrame(...):

split_df["authors"] = split_df["authors"].apply(lambda x: [p["name"] for p in x])

split_df["authors"][0]
#Out: ['Wenbin Zhou', 'Junquan Yu', 'Xiaona Lu', 'Jianguo Lin', 'T. Dean']

Edit

To get an empty string "" when a paper has no authors:

split_df["authors"] = split_df["authors"].apply(lambda x: [p["name"] for p in x] if len(x) > 0 else "")
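
Note that the apply calls above leave a Python list in each cell, which a CSV would show with brackets and quotes. If a plain comma-separated string is wanted, as the question asks, one possible variation is to join the names inside the lambda. The sketch below uses a small hand-built stand-in for split_df (the titles are placeholders, and the authors are taken from the question) so it runs without the API call. An empty author list joins to "", so no separate empty check should be needed.

```python
import pandas as pd

# Small stand-in for split_df, so the example runs without calling the API.
split_df = pd.DataFrame(
    {
        "title": ["Paper with authors", "Paper without authors"],  # placeholder titles
        "authors": [
            [
                {"authorId": "83129125", "name": "June A. Sekera"},
                {"authorId": "13328115", "name": "A. Lichtenberger"},
            ],
            [],
        ],
    }
)

# Join the names into one string per row; an empty list joins to "".
split_df["authors"] = split_df["authors"].apply(
    lambda authors: ", ".join(p["name"] for p in authors)
)

print(split_df["authors"].tolist())
```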