Python Regex to find stock tickers (stock symbols)

Time:12-04

I am trying to create a regex that finds ticker symbols in bodies of text. However, I am struggling to get a single pattern to do everything I need.

Example:

This is a $test to show what I would LIKE to match. If $YOU look below you will FIND the list of simulated tickers ($STOck symbols) I would like to match.

So in this case I would like to match the following from the above:

  • test
  • LIKE
  • YOU
  • FIND
  • STOck

I am trying to get:

  • any word after a "$" sign (not including the $), case insensitive
  • any word that is ALL CAPS and 3 to 6 characters long

I've tried:

  • \b[A-Z]{3,6}\b but that matches pretty much every word
  • \$[^3-6\s]\S* but that includes the $ and also ignores any ALL CAPS without a dollar sign
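To see concretely where the two attempts fall short, here is a minimal sketch that runs them against the example text. The variable names are only illustrative, and the re.IGNORECASE run is an assumption about how the first pattern may have been tested, since the question does not say which flags were used.

```python
import re

s = ('This is a $test to show what I would LIKE to match. If $YOU look below '
     'you will FIND the list of simulated tickers ($STOck symbols) I would like to match.')

# Attempt 1, case-sensitive: only whole all-caps words of 3-6 letters.
# "$STOck" is missed because "STO" is followed by lowercase letters.
caps_only = re.findall(r'\b[A-Z]{3,6}\b', s)

# Attempt 1 with re.IGNORECASE (assumed, not stated in the question):
# every 3-6 letter word now qualifies, whatever its case.
caps_ignorecase = re.findall(r'\b[A-Z]{3,6}\b', s, flags=re.IGNORECASE)

# Attempt 2: [^3-6\s] is a character class meaning "one character that is
# not 3, 4, 5, 6 or whitespace". It is not a length limit, and the "$" is
# part of the match because it is consumed by the pattern.
dollar_words = re.findall(r'\$[^3-6\s]\S*', s)

print(caps_only)        # ['LIKE', 'YOU', 'FIND']
print(dollar_words)     # ['$test', '$YOU', '$STOck']
print(caps_ignorecase)  # a long list of ordinary words such as 'This', 'show'
```

Neither attempt alone covers both rules, which suggests combining a "$"-prefixed branch and an all-caps branch in one alternation.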

CodePudding user response:

Would you please try the following:

import re

s = 'This is a $test to show what I would LIKE to match. If $YOU look below you will FIND the list of simulated tickers ($STOck symbols) I would like to match.'

print(re.findall(r'(?<=\$)\w+|[A-Z]{3,6}', s))

Output:

['test', 'LIKE', 'YOU', 'FIND', 'STOck']

(?<=\$) is a lookbehind assertion which matches a leading dollar sign without including the match in the result.
(Precisely speaking, it matches the boundary just after the dollar sign rather than the character itself.)
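As a small illustration of the zero-width behaviour, the sketch below checks where a lookbehind match starts. It also shows one possible refinement: assuming the intended pattern is (?<=\$)\w+|[A-Z]{3,6}, the all-caps branch has no word boundaries, so a longer all-caps word gets split into chunks. The STRICT pattern and the find_tickers helper are hypothetical names added here, not part of the answer.

```python
import re

s = ('This is a $test to show what I would LIKE to match. If $YOU look below '
     'you will FIND the list of simulated tickers ($STOck symbols) I would like to match.')

# The lookbehind consumes nothing: the match starts right after the "$".
m = re.search(r'(?<=\$)\w+', 'Buy $YOU now')
print(m.group(), m.span())  # YOU (5, 8)

# Without word boundaries, a 9-letter all-caps word is cut into pieces.
original = r'(?<=\$)\w+|[A-Z]{3,6}'
print(re.findall(original, 'SIMULATED tickers'))  # ['SIMULA', 'TED']

# Hypothetical stricter variant: \b on both sides of the all-caps branch
# so that only whole words of 3-6 capital letters are accepted.
STRICT = re.compile(r'(?<=\$)\w+|\b[A-Z]{3,6}\b')

def find_tickers(text):
    """Return "$"-prefixed words (without the "$") and 3-6 letter all-caps words."""
    return STRICT.findall(text)

print(find_tickers('SIMULATED tickers'))  # []
print(find_tickers(s))  # ['test', 'LIKE', 'YOU', 'FIND', 'STOck']
```

Whether the boundaries are wanted depends on the data. Either version will still pick up ordinary capitalised words such as "FIND", so checking the results against a list of known symbols may be worthwhile.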
