How to add each element (sentence) of a list to a pandas column?

I am extracting information about molecules from Wikipedia. Each page summary contains several sentences, and I want them added cumulatively across columns, with one row per molecule, as follows:

Molecule	Sentence1	Sentence1 and sentence2	All_sentence
MgO	this is s1.	this is s1. this is s2.	all_sentence
CaO	this is s1.	this is s1. this is s2.	all_sentence

What I have so far:

import spacy
import pandas as pd
import wikipediaapi

wiki_wiki = wikipediaapi.Wikipedia('en')
chemical = input("Write the name of molecule: ")

# Fetch the summary paragraph of the Wikipedia page
page_py = wiki_wiki.page(chemical)
summary = page_py.summary

nlp = spacy.load('en_core_web_sm')

# Split the summary into sentences with spaCy
text_sentences = nlp(summary)
sent_list = []
for sentence in text_sentences.sents:
    sent_list.append(sentence.text)

# One row per sentence; the scalar 'Molecule' value is repeated on every row
df = pd.DataFrame(
    {'Molecule': chemical,
     'Description': sent_list})
print(df.head())

The output looks like:

Molecule	Description
MgO	All sentences are here
MgO

The Molecule value is repeated on every sentence row, which is not what I want. Please suggest a solution.

Answer:

It's not clear why you would want to repeat the earlier sentences in each column, but you can get the layout you want with pivot:

import spacy
import pandas as pd
import wikipediaapi

wiki_wiki = wikipediaapi.Wikipedia('en')
chemical = input("Write the name of molecule: ")

page_py = wiki_wiki.page(chemical)
summary = page_py.summary

nlp = spacy.load('en_core_web_sm')

# One list entry per sentence of the summary
sent_list = [sent.text for sent in nlp(summary).sents]
#cumul_sent_list = [' '.join(sent_list[:i]) for i in range(1, len(sent_list) + 1)] # uncomment to cumulate sentences in columns

df = pd.DataFrame(
    {'Molecule': chemical,
     'Description': sent_list}) # replace sent_list with cumul_sent_list if cumulating
# Label each row with the column it should end up in after the pivot
df["Sentences"] = pd.Series([f"Sentence{i + 1}" for i in range(len(df))]) # replace "Sentence{i + 1}" with "Sentence1-{i + 1}" if cumulating
# Turn the long table (one row per sentence) into a wide one (one row per molecule)
df = df.pivot(index="Molecule", columns="Sentences", values="Description")
print(df)

sent_list can be built with a list comprehension. Use cumul_sent_list instead if you want each column to also repeat the sentences that precede it.

Output:

Sentences                                          Sentence1  ...                                          Sentence9
Molecule                                                      ...                                                   
MgO        Magnesium oxide (MgO), or magnesia, is a white...  ...  According to evolutionary crystal structure pr...
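
The cumulative variant mentioned above can also be sketched without spaCy or Wikipedia, which makes the column logic easier to check. This is only a minimal illustration: the helper name cumulative_row and the placeholder sentences are hypothetical and not part of the original answer, and the column labels follow the "Sentence1-{i}" naming suggested in the code comments.

```python
def cumulative_row(molecule, sentences):
    """Return one row (a dict) whose columns hold growing prefixes of `sentences`."""
    row = {"Molecule": molecule}
    for i in range(1, len(sentences) + 1):
        # Column i holds sentences 1..i joined by a single space.
        row[f"Sentence1-{i}"] = " ".join(sentences[:i])
    return row


# Hypothetical placeholder sentences standing in for a Wikipedia summary.
example = cumulative_row("MgO", ["this is s1.", "this is s2.", "this is s3."])
print(example)
```

Passing a list of such dicts, one per molecule, to pd.DataFrame should give one row per molecule, with NaN in the columns that a shorter summary does not reach.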