I have a txt file of this type:
Thomson Reuters StreetEvents Event Transcript
E D I T E D V E R S I O N
Q3 2003 ABM Industries Earnings Conference Call
SEPTEMBER 10, 2003 / 1:00PM GMT
================================================================================
Corporate Participants
================================================================================
My txt file is saved at C:\sam\2003-Sep-10-ABM.N-140985434256-Transcript.txt.
I want to extract only the transcript year (2003) and the firm name (ABM Industries). I used the code below, but it printed every four-digit year in the file.
Code:
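import re
# Read the whole transcript; the with block closes the file afterwards
with open("C:\\sam\\2003-Sep-10-ABM.N-140985434256-Transcript.txt", 'r') as f:
    content = f.read()
pattern = r"\d{4}"  # matches every four-digit run anywhere in the text
years = re.findall(pattern, content)
for year in years:
    print(year)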
My Output: 2003 2003 2003 2003 2002 2003 2002 2003 2003 2002 2003 2002 2002 2003 2002 2002 2002 2002 2002 2003 2003 2003 2004 2003 2003 2003 2004 2019
Expected Output: 2003 ABM Industries
CodePudding user response:
If I understand you correctly, this should work:
import re
content = """Q3 2003 ABM Industries Earnings Conference Call
SEPTEMBER 10, 2003 / 1:00PM GMT"""
pattern = r"\d{4}\s\w+\s\w+"  # four digits, then two whitespace-separated words
years = re.findall(pattern, content)[0]
print(years)
Output: "2003 ABM Industries"
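Note that the pattern above takes exactly two words after the year, so a firm with a longer or shorter name would be cut off or over-matched. If every transcript has a title line shaped like "Q3 2003 ABM Industries Earnings Conference Call" (an assumption based only on the sample in the question), a rough sketch that anchors on that line and captures the year and firm separately could look like this. The names TITLE_RE and extract_year_and_firm are made up for illustration.

```python
import re

# Assumed title layout: "Q<n> <year> <firm name> Earnings Conference Call".
# This is inferred from the single sample in the question, not a documented format.
TITLE_RE = re.compile(
    r"^Q\d\s+(?P<year>\d{4})\s+(?P<firm>.+?)\s+Earnings Conference Call\s*$",
    re.MULTILINE,
)


def extract_year_and_firm(text):
    """Return (year, firm) from the first matching title line, or None."""
    match = TITLE_RE.search(text)
    if match is None:
        return None
    return match.group("year"), match.group("firm")


sample = """Thomson Reuters StreetEvents Event Transcript
E D I T E D V E R S I O N
Q3 2003 ABM Industries Earnings Conference Call
SEPTEMBER 10, 2003 / 1:00PM GMT"""

result = extract_year_and_firm(sample)
if result is not None:
    year, firm = result
    print(year, firm)
```

For the real file, you would pass the text read from disk instead of sample. Titles that do not end in "Earnings Conference Call" will return None, so the pattern would need adjusting for other event types.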