How to add each element (sentence) of a list to a pandas column?

Time:11-19

I am extracting information about molecules from Wikipedia. Each summary contains several sentences, and I want the sentences added as columns, as follows:

Molecule  Sentence1    Sentence1 and sentence2  All_sentence
MgO       this is s1.  this is s1. this is s2.  all_sentence
CaO       this is s1.  this is s1. this is s2.  all_sentence

What I've achieved so far:

import spacy
import pandas as pd
import wikipediaapi
import csv


wiki_wiki = wikipediaapi.Wikipedia('en')
chemical = input("Write the name of molecule: ")

page_py = wiki_wiki.page(chemical)
sumary = page_py.summary[0:]

nlp = spacy.load('en_core_web_sm')

text_sentences = nlp(sumary)
sent_list = []
for sentence in text_sentences.sents:
    sent_list.append(sentence.text)


#print(sent_list)


df = pd.DataFrame(
    {'Molecule': chemical,
     'Description': sent_list})
print(df.head())

The output looks like:

Molecule Description
MgO All sentences are here
Mgo

The Molecule value is repeated on every row, one row per sentence, which is not what I want. Please suggest a solution.

CodePudding user response:

It's not clear why you would want to repeat the earlier sentences in each column, but you can get the layout you want with pivot:

import spacy
import pandas as pd
import wikipediaapi
import csv


wiki_wiki = wikipediaapi.Wikipedia('en')
chemical = input("Write the name of molecule: ")

page_py = wiki_wiki.page(chemical)
sumary = page_py.summary[0:]

nlp = spacy.load('en_core_web_sm')

sent_list = [sent.text for sent in nlp(sumary).sents]
#cumul_sent_list = [' '.join(sent_list[:i]) for i in range(1, len(sent_list) + 1)] # uncomment to cumulate sentences in columns

df = pd.DataFrame(
    {'Molecule': chemical,
     'Description': sent_list}) # replace sent_list with cumul_sent_list if cumulating
df["Sentences"] = pd.Series([f"Sentence{i + 1}" for i in range(len(df))]) # replace "Sentence{i + 1}" with "Sentence1-{i + 1}" if cumulating
df = df.pivot(index="Molecule", columns="Sentences", values="Description")
print(df)

sent_list can be created using a list comprehension. Create cumul_sent_list if you want each column to also repeat the sentences that precede it.

Output:

Sentences                                          Sentence1  ...                                          Sentence9
Molecule                                                      ...                                                   
MgO        Magnesium oxide (MgO), or magnesia, is a white...  ...  According to evolutionary crystal structure pr...
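
For the cumulative layout asked about in the question, here is a minimal sketch that runs without spaCy or Wikipedia access. It uses placeholder sentences in place of the real summaries, and it builds each row directly instead of going through pivot, which keeps the columns in sentence order. The helper name sentences_to_row and the sample data are assumptions made for illustration, not part of the original answer.

```python
import pandas as pd


def sentences_to_row(molecule, sentences, cumulative=False):
    """Return a one-row DataFrame with one column per sentence.

    With cumulative=True, column k holds sentences 1..k joined by spaces.
    (Hypothetical helper, named here for illustration only.)
    """
    count = len(sentences)
    if cumulative:
        values = [" ".join(sentences[:i]) for i in range(1, count + 1)]
        labels = [f"Sentence1-{i}" for i in range(1, count + 1)]
    else:
        values = list(sentences)
        labels = [f"Sentence{i}" for i in range(1, count + 1)]
    # One row, indexed by the molecule name
    index = pd.Index([molecule], name="Molecule")
    return pd.DataFrame([values], columns=labels, index=index)


# Placeholder sentences standing in for the spaCy sentence split
data = {
    "MgO": ["this is s1.", "this is s2.", "this is s3."],
    "CaO": ["this is s1.", "this is s2."],
}

# Stack one row per molecule; shorter summaries get NaN in the extra columns
table = pd.concat(
    [sentences_to_row(name, sents, cumulative=True) for name, sents in data.items()]
)
print(table)
```

Molecules with fewer sentences end up with missing values in the trailing columns, so a real table will likely need a fill value or a cap on the number of sentence columns.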