I have a string as follows:
paragraph = 'Below you’ll find KPIs (key performance indicators) and valuation metrics for 50 public SaaS and cloud companies. This includes historical share price performance and valuation multiples, an interactive regression chart, efficiency metrics (magic number, payback period, ARR / FTE, etc.), average ACV (annual contract value), and financial metrics including ARR, OpEx margins and cash flow margins. These metrics can be filtered by year-over-year ARR growth rates (filter located under the Valuation Metrics section header). Share prices and financial data are updated as of 06-May-2022 and will continue to be updated frequently.'
I am trying to write a function that retrieves the date as a string, '06-May-2022':
def get_date(inputString):
    # this will require a list with two elements, both integers
    boolean_list = [char.isdigit() for char in inputString]
    all_indexes = [i for i, x in enumerate(boolean_list) if x]
    all_indexes = all_indexes[2:]
    indexes = [all_indexes[0],all_indexes[-1]]
    index_one = int(indexes[0])
    index_two = int(indexes[1])
    date = inputString[index_one,index_two]
    return date

get_date(paragraph)
But when I run it, I get the error "TypeError: string indices must be integers".
When I run this:
type(indexes[0])
it returns "int" so I do not understand the error. Any help would be greatly appreciated. Thanks!
CodePudding user response:
This doesn't answer your question directly, but if you're looking to find the 'date' in a given string, you might want to look at Regular Expressions.
In Python, you might do something like...
import re
paragraph = """Below you’ll find KPIs (key performance indicators) and valuation metrics for 50 public SaaS and cloud companies. This includes historical share price performance and valuation multiples, an interactive regression chart, efficiency metrics (magic number, payback period, ARR / FTE, etc.), average ACV (annual contract value), and financial metrics including ARR, OpEx margins and cash flow margins. These metrics can be filtered by year-over-year ARR growth rates (filter located under the Valuation Metrics section header). Share prices and financial data are updated as of 06-May-2022 and will continue to be updated frequently."""
# two digits, a hyphen, one or more characters, a hyphen, four digits
exp = re.compile(r"[0-9][0-9]-.+-[0-9][0-9][0-9][0-9]")
dates = exp.search(paragraph)
if dates:
    date = dates[0]
    print(str(date))
That snippet would print '06-May-2022'.
You can see more about the expression that matched that string on Regex101.
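If you want something stricter than "anything between two hyphens", here is a minimal sketch. It assumes the dates always look like DD-Mon-YYYY with an English three-letter month abbreviation. The names `DATE_PATTERN`, `find_dates` and `sample` are made up for illustration, and the standard library's `datetime.strptime` is used only to reject matches that are not real calendar dates.

```python
import re
from datetime import datetime

# Assumed format: two digits, a three-letter month abbreviation, four digits.
DATE_PATTERN = re.compile(r"\b\d{2}-[A-Za-z]{3}-\d{4}\b")

def find_dates(text):
    """Return every DD-Mon-YYYY match in text that is also a valid calendar date."""
    found = []
    for candidate in DATE_PATTERN.findall(text):
        try:
            # %b is locale-dependent; this assumes English month abbreviations.
            datetime.strptime(candidate, "%d-%b-%Y")
        except ValueError:
            # Looks like a date but is not one (for example 99-Foo-2022).
            continue
        found.append(candidate)
    return found

sample = "Data are updated as of 06-May-2022 and will continue to be updated; 99-Foo-2022 is skipped."
print(find_dates(sample))
```

Returning a list rather than the first match means a paragraph with several dates is handled too; take the first element if you only ever expect one.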
In software engineering, there's a concept known as "don't reinvent the wheel": in other words, use existing tools such as regular expressions rather than designing complex functions to parse out date strings. Moreover, there are probably packages that would extract dates from a string without you having to write any regular expressions yourself.
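As for the TypeError itself: in the function from the question, `inputString[index_one,index_two]` uses a comma, which makes Python build the tuple `(index_one, index_two)` and try to use that as the index. Strings only accept integers or slices, hence the message, even though each index on its own is an int. A minimal sketch of the fix, assuming the intent was to slice from the first digit of the date to its last digit (`text` is just a shortened stand-in for your paragraph):

```python
def get_date(input_string):
    # Positions of every digit character in the string.
    digit_indexes = [i for i, char in enumerate(input_string) if char.isdigit()]
    # Skip the first two digits (the "50" earlier in the text), as the original function does.
    digit_indexes = digit_indexes[2:]
    start, end = digit_indexes[0], digit_indexes[-1]
    # A colon creates a slice; a comma would create a tuple and raise TypeError.
    # The + 1 is needed because the end of a slice is exclusive.
    return input_string[start:end + 1]

text = "Metrics for 50 companies, updated as of 06-May-2022 and will continue to be updated."
print(get_date(text))
```

This still relies on the text having exactly two digits before the date and none after it, so it is fragile compared with the regular expression approach.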
CodePudding user response:
You can use spaCy, a natural language processing library for Python:
import spacy

# Load English tokenizer, tagger, parser and NER
nlp = spacy.load("en_core_web_sm")
doc = nlp(paragraph)

# Find named entities, phrases and concepts
for entity in doc.ents:
    # the second condition selects only the dates that start with a digit
    if entity.label_ == 'DATE' and str(entity)[0].isdigit():
        print(entity)

# Output:
# 06-May-2022