Capitalise the first letter of multiple sentences, lower-case all else

Time:12-02

Update: I am interested in multiple sentences in one string.

I have been following this handy tutorial, which offers variations on my requirements.

How can I capitalise just the first letter of multiple sentences?

A sentence is anything that ends in one of three characters: '.', '!' or '?'.
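One way to handle all three terminators without extra libraries is a regular expression. This is a minimal sketch, assuming a sentence boundary is '.', '!' or '?' followed by whitespace, so abbreviations such as "e.g." will be treated as sentence ends. `capitalise_sentences` is a hypothetical helper name:

```python
import re

# Start of the string, or a terminator followed by whitespace,
# then the first letter of the next sentence.
SENTENCE_START = re.compile(r'(^\s*|[.!?]\s+)([a-z])')

def capitalise_sentences(text: str) -> str:
    """Lower-case everything, then upper-case the first letter of each sentence."""
    lowered = text.lower()
    # Keep the matched prefix (whitespace/terminator) and upper-case the letter.
    return SENTENCE_START.sub(lambda m: m.group(1) + m.group(2).upper(), lowered)

print(capitalise_sentences("hELLO there. how ARE you? i AM fine!"))
# -> Hello there. How are you? I am fine!
```

Because the terminator must be followed by whitespace, decimals such as "2.5" are left alone. A sentence that starts with a digit or a quote mark is not capitalised, since only `[a-z]` is matched.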


Code:

PDF, page 3

from io import StringIO

from pdfminer.pdfparser import PDFParser
from pdfminer.pdfdocument import PDFDocument
from pdfminer.pdfinterp import PDFResourceManager, PDFPageInterpreter
from pdfminer.converter import TextConverter
from pdfminer.layout import LAParams
from pdfminer.pdfpage import PDFPage

def convert_pdf_to_string(file_path):
    """Extract the text of every page of a PDF into a single string."""
    output_string = StringIO()
    with open(file_path, 'rb') as in_file:
        parser = PDFParser(in_file)
        doc = PDFDocument(parser)
        rsrcmgr = PDFResourceManager()
        # TextConverter writes the extracted text into output_string.
        device = TextConverter(rsrcmgr, output_string, laparams=LAParams())
        interpreter = PDFPageInterpreter(rsrcmgr, device)
        for page in PDFPage.create_pages(doc):
            interpreter.process_page(page)
        device.close()

    return output_string.getvalue()

text = convert_pdf_to_string('GPIC_Sustainability_Report_2016-v9_(lr).pdf')
print(text)

text:

In 2012, Gulf Petrochemical InDuStRiEs Company becomes part of \nthe global transformation for a sustainable future by committing to \nthe United Nations Global Compact’s ten principles in the realms \nof Human Rights, Labour, Environment and Anti-Corruption. \n\nGPIC becomes an organizational stakeholder of Global Reporting \nInitiative ( GRI) in 2014.

Desired Text:

In 2012, Gulf Petrochemical Industries Company becomes part of the global transformation for a sustainable future by committing to the United Nations Global Compact’s ten principles in the realms of Human Rights, Labour, Environment and Anti-Corruption. GPIC becomes an organizational stakeholder of Global Reporting Initiative ( GRI) in 2014.
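Note that the desired text is not fully lower-cased: "Gulf", "United Nations", "GPIC" and "GRI" keep their capitals, and only the scrambled word "InDuStRiEs" changes. If that is the real goal, one hedged alternative is to rewrite only words that contain a lower-case letter directly followed by an upper-case one. This heuristic and the `fix_scrambled_case` name are assumptions, not part of the question:

```python
import re

# Hypothetical heuristic: a word is "scrambled" if a lower-case letter is
# immediately followed by an upper-case letter (e.g. "InDuStRiEs").
SCRAMBLED = re.compile(r'\b\w*[a-z][A-Z]\w*\b')

def fix_scrambled_case(text: str) -> str:
    # Only scrambled words are rewritten, so acronyms such as "GPIC" and
    # proper nouns such as "Gulf" keep their capitalisation.
    return SCRAMBLED.sub(lambda m: m.group(0).capitalize(), text)

print(fix_scrambled_case("Gulf Petrochemical InDuStRiEs Company (GPIC)"))
# -> Gulf Petrochemical Industries Company (GPIC)
```

Legitimate mixed-case words such as "iPhone" or "McDonald" would also be rewritten, so treat this as a starting point rather than a general rule.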

Update: these lines, which can be added anywhere after text is assigned, strip the newline and form-feed characters:

text = text.replace('\n', '')
text = text.replace('\x0c', '')
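Deleting '\n' outright only works here because each line break in the sample happens to follow a space; a break such as "Nations\nGlobal" would become "NationsGlobal". A sketch of a more forgiving alternative collapses every run of whitespace (including '\n' and the form feed '\x0c') into a single space. `normalise_whitespace` is a hypothetical name:

```python
def normalise_whitespace(text: str) -> str:
    # str.split() with no argument splits on any run of whitespace,
    # which includes '\n' and the form feed '\x0c'.
    return ' '.join(text.split())

raw = "committing to \nthe United Nations\nGlobal Compact. \n\nGPIC becomes\x0c"
print(normalise_whitespace(raw))
# -> committing to the United Nations Global Compact. GPIC becomes
```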

Please let me know if I should clarify anything else.

Answer:

s = 'This is An ExAmplE senTENCE.'
s.capitalize()
# returns 'This is an example sentence.'
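Bear in mind that `str.capitalize()` treats the whole string as one unit: it upper-cases only the very first character and lower-cases everything else, so later sentences lose their capital letters. For example:

```python
s = "hello there. HOW are you?"
# Only the first character of the whole string is upper-cased.
print(s.capitalize())
# -> Hello there. how are you?
```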

Try this:

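from nltk import tokenize

# sent_tokenize needs NLTK's Punkt sentence model; if it is missing, download it
# once with nltk.download('punkt') (recent NLTK releases use 'punkt_tab' instead).
paragraph = "Hello there. How are you?"
sentences = tokenize.sent_tokenize(paragraph)
capitalized = [s.capitalize() for s in sentences]
# Join with a space so the sentences don't run together.
new_paragraph = ' '.join(capitalized)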
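If installing NLTK and its tokenizer data is not an option, the same split, capitalise, join idea can be approximated with the standard `re` module. This is a rough stand-in, assuming sentences end in '.', '!' or '?' followed by whitespace; unlike `sent_tokenize`, it knows nothing about abbreviations. `capitalise_paragraph` is a hypothetical name:

```python
import re

def capitalise_paragraph(paragraph: str) -> str:
    # Split after '.', '!' or '?' when followed by whitespace.
    sentences = re.split(r'(?<=[.!?])\s+', paragraph.strip())
    # capitalize() upper-cases each sentence's first character, lower-cases the rest.
    return ' '.join(s.capitalize() for s in sentences)

print(capitalise_paragraph("hello there. HOW are you? fine!"))
# -> Hello there. How are you? Fine!
```

Because the pieces are re-joined with single spaces, any line breaks or double spaces between sentences are normalised away.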

Answer:

'.'.join(i[:len(i) - len(i.lstrip())] + i.lstrip().capitalize() for i in s.split('.'))

The line above works for many sentences, but it only splits on '.'; sentences ending in '!' or '?' are not handled.
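Splitting on '.' leaves a leading space on every piece after the first, and `capitalize()` then acts on that space rather than on the first letter. A short demonstration with an illustrative input:

```python
s = "hello there. how ARE you."
pieces = s.split('.')
print(pieces)  # ['hello there', ' how ARE you', '']

# capitalize() upper-cases the first character, which here is a space,
# so the second sentence stays lower-case.
print([p.capitalize() for p in pieces])  # ['Hello there', ' how are you', '']

# Keeping the leading whitespace and capitalising what follows fixes this.
fixed = '.'.join(p[:len(p) - len(p.lstrip())] + p.lstrip().capitalize() for p in pieces)
print(fixed)  # Hello there. How are you.
```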
