How to use a "for loop" in Python to extract the year and firm name (from an earnings call transcript)

Time:01-30

I have a .txt file in this format:

Thomson Reuters StreetEvents Event Transcript
E D I T E D   V E R S I O N

Q3 2003 ABM Industries Earnings Conference Call
SEPTEMBER 10, 2003 / 1:00PM GMT

================================================================================
Corporate Participants
================================================================================

My .txt file is saved as C:\sam\2003-Sep-10-ABM.N-140985434256-Transcript.txt.

I want to extract only the transcript year (2003) and the firm name (ABM Industries). I used the code below, but it prints every year that appears in the file.

Code:

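import re

# Raw strings keep the backslashes in the Windows path and in the regex literal
with open(r"C:\sam\2003-Sep-10-ABM.N-140985434256-Transcript.txt", "r") as f:
    content = f.read()

pattern = r"\d{4}"                    # any run of four digits
years = re.findall(pattern, content)  # returns every match in the whole file
for year in years:
    print(year)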

My Output: 2003 2003 2003 2003 2002 2003 2002 2003 2003 2002 2003 2002 2002 2003 2002 2002 2002 2002 2002 2003 2003 2003 2004 2003 2003 2003 2004 2019

Expected Output: 2003 ABM Industries
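The reason every year is printed is that re.findall returns all non-overlapping matches in the text, so each four-digit number in the transcript ends up in the list. A minimal sketch, using only the two title lines from the sample above, shows the difference from re.search, which stops at the first match:

```python
import re

content = """Q3 2003 ABM Industries Earnings Conference Call
SEPTEMBER 10, 2003 / 1:00PM GMT"""

# re.findall collects every four-digit run in the text
all_years = re.findall(r"\d{4}", content)
print(all_years)  # ['2003', '2003']

# re.search returns only the first match (or None if there is none)
first = re.search(r"\d{4}", content)
if first:
    print(first.group())  # 2003
```

This relies on the title line being the first place a four-digit number appears, which holds for the sample but is an assumption for other transcripts.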

Answer:

If I understand you correctly, this should work:

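import re

content = """Q3 2003 ABM Industries Earnings Conference Call
SEPTEMBER 10, 2003 / 1:00PM GMT"""
# Four digits, whitespace, a word, whitespace, another word (two-word firm names only)
pattern = r"\d{4}\s\w+\s\w+"
years = re.findall(pattern, content)[0]  # keep only the first match
print(years)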

Output: "2003 ABM Industries"
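The pattern above captures exactly two words after the year, so a firm name such as "Foo Bar Baz Corp" would be cut short. Since the question asks about a for loop, here is a sketch that loops over the lines and anchors on the text "Earnings Conference Call" instead. It assumes every transcript title follows the "Q<n> <year> <firm> Earnings Conference Call" layout seen in the sample; `TITLE_RE`, `extract_year_and_firm` and `extract_from_folder` are hypothetical names, and the `*-Transcript.txt` glob is an assumption based on the single file name given in the question.

```python
import re
from pathlib import Path

# Assumed title layout, taken from the sample transcript:
#   "Q3 2003 ABM Industries Earnings Conference Call"
# Group 1 = four-digit year, group 2 = text up to "Earnings Conference Call".
TITLE_RE = re.compile(r"^Q\d\s+(\d{4})\s+(.+?)\s+Earnings Conference Call\s*$")


def extract_year_and_firm(text):
    """Return (year, firm) from the first line that looks like a title, else None."""
    for line in text.splitlines():
        match = TITLE_RE.match(line.strip())
        if match:
            return match.group(1), match.group(2)
    return None  # no line matched the assumed title layout


def extract_from_folder(folder):
    """Hypothetical helper: apply extract_year_and_firm to each transcript file."""
    results = {}
    for path in sorted(Path(folder).glob("*-Transcript.txt")):
        text = path.read_text(encoding="utf-8", errors="replace")
        results[path.name] = extract_year_and_firm(text)
    return results


sample = """Thomson Reuters StreetEvents Event Transcript
E D I T E D   V E R S I O N

Q3 2003 ABM Industries Earnings Conference Call
SEPTEMBER 10, 2003 / 1:00PM GMT
"""

year, firm = extract_year_and_firm(sample)
print(year, firm)  # 2003 ABM Industries
```

For a whole directory, something like extract_from_folder(r"C:\sam") would return a dict keyed by file name. Titles that deviate from the assumed layout come back as None, which makes them easy to spot and handle with an adjusted pattern.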
