How to use a "for loop" in Python to extract the year and firm name (for earnings call transcripts)

I have a txt file of this type:

Thomson Reuters StreetEvents Event Transcript
E D I T E D   V E R S I O N

Q3 2003 ABM Industries Earnings Conference Call
SEPTEMBER 10, 2003 / 1:00PM GMT

================================================================================
Corporate Participants
================================================================================

My txt file is saved at C:\sam\2003-Sep-10-ABM.N-140985434256-Transcript.txt.

I want to extract only the transcript year (2003) and the firm name (ABM Industries). I used the code below, but it printed every four-digit number in the file.

Code:

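import re

# Read the whole transcript into one string.
with open("C:\\sam\\2003-Sep-10-ABM.N-140985434256-Transcript.txt", "r") as f:
    content = f.read()

# Matches every run of four digits anywhere in the file.
pattern = r"\d{4}"
years = re.findall(pattern, content)
for year in years:
    print(year)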

My Output: 2003 2003 2003 2003 2002 2003 2002 2003 2003 2002 2003 2002 2002 2003 2002 2002 2002 2002 2002 2003 2003 2003 2004 2003 2003 2003 2004 2019

Expected Output: 2003 ABM Industries

CodePudding user response:

If I understand you correctly, this should work:

import re
content = """Q3 2003 ABM Industries Earnings Conference Call
SEPTEMBER 10, 2003 / 1:00PM GMT"""
# Four digits followed by two whitespace-separated words:
# the year plus a two-word firm name.
pattern = r"\d{4}\s\w+\s\w+"
match = re.findall(pattern, content)[0]
print(match)

Output: 2003 ABM Industries
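
Matching a fixed number of words after the year only works when the firm name has exactly that many words. A possible alternative, sketched below, anchors on the title line instead and captures everything between the year and "Earnings Conference Call". It assumes every transcript has a title line shaped like "Q3 2003 ABM Industries Earnings Conference Call"; the names TITLE_RE, extract_year_and_firm and extract_from_folder are made up for this illustration, and the loop over a folder assumes the transcripts are *.txt files.

```python
import re
from pathlib import Path

# Assumed title format: "Q3 2003 ABM Industries Earnings Conference Call".
# Group 1 is the year, group 2 is the firm name (any number of words).
TITLE_RE = re.compile(
    r"^Q\d\s+(\d{4})\s+(.+?)\s+Earnings Conference Call", re.MULTILINE
)


def extract_year_and_firm(text):
    """Return (year, firm) from the first matching title line, or None."""
    match = TITLE_RE.search(text)
    if match is None:
        return None
    return match.group(1), match.group(2)


def extract_from_folder(folder):
    """Yield (file name, year, firm) for each *.txt transcript in folder."""
    for path in sorted(Path(folder).glob("*.txt")):
        result = extract_year_and_firm(path.read_text(errors="ignore"))
        if result is not None:
            yield path.name, result[0], result[1]


sample = """Thomson Reuters StreetEvents Event Transcript
E D I T E D   V E R S I O N

Q3 2003 ABM Industries Earnings Conference Call
SEPTEMBER 10, 2003 / 1:00PM GMT
"""
print(*extract_year_and_firm(sample))  # 2003 ABM Industries
```

To process many files, something like `for name, year, firm in extract_from_folder("C:\\sam"): print(name, year, firm)` should work. Files whose title line does not follow the assumed format are skipped rather than guessed at.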