Could anyone have a look at this regex?
PATTERN=r"([A-Z]{3}[\/I\\\s\-\|]?[A-Z]{3})\s*(BUY|SELL)+\s*(-\d+\.\d+|\d+\.\d+\+|\d+\.\d+)?\s*((SL|TP)?\s*(-\d+\.\d+|\d+\.\d+\+|\d+\.\d+)?\s*(?:(\(?))((-\d+|\d+\+|\d+)?\s*Pips)?(?:\))?)?\s*((SL|TP)?\s*(-\d+\.\d+|\d+\.\d+\+|\d+\.\d+)?\s*(?:(\(?))((-\d+|\d+\+|\d+)?\s*Pips)?(?:\))?)?\s*(Intra-Day Trade|SWING TRADE)?\s*"gm
Is there any way to match multiple SL or TP groups without having to repeat the whole
((SL|TP)?\s*(-\d+\.\d+|\d+\.\d+\+|\d+\.\d+)?\s*(?:(\(?))((-\d+|\d+\+|\d+)?\s*Pips)?(?:\))?)?
sub-pattern for each group?
I'm trying to capture both the SL and the TP groups. Example:
import re
PATTERN=r"([A-Z]{3}[\/I\\\s\-\|]?[A-Z]{3})\s*(BUY|SELL)+\s*(-\d+\.\d+|\d+\.\d+\+|\d+\.\d+)?\s*((SL|TP)?\s*(-\d+\.\d+|\d+\.\d+\+|\d+\.\d+)?\s*(?:(\(?))((-\d+|\d+\+|\d+)?\s*Pips)?(?:\))?)?\s*((SL|TP)?\s*(-\d+\.\d+|\d+\.\d+\+|\d+\.\d+)?\s*(?:(\(?))((-\d+|\d+\+|\d+)?\s*Pips)?(?:\))?)?\s*(Intra-Day Trade|SWING TRADE)?\s*"
text="""EURCAD SELL 1.36599 SL 1.37701 (110 Pips) TP 1.34017 (258 Pips) Intra-Day Trade """
match = re.match(PATTERN, text)
groups = list(match.groups())
for idx, x in enumerate(groups):
    print(f"Group number {idx}, content: {x}")
Output:
Group number 0, content: EURCAD
Group number 1, content: SELL
Group number 2, content: 1.36599
Group number 3, content: SL 1.37701 (110 Pips)
Group number 4, content: SL
Group number 5, content: 1.37701
Group number 6, content: (
Group number 7, content: 110 Pips
Group number 8, content: 110
Group number 9, content: TP 1.34017 (258 Pips)
Group number 10, content: TP
Group number 11, content: 1.34017
Group number 12, content: (
Group number 13, content: 258 Pips
Group number 14, content: 258
Group number 15, content: Intra-Day Trade
Code without the repeated sub-pattern (one group, repeated with +):
import re
PATTERN=r"([A-Z]{3}[\/I\\\s\-\|]?[A-Z]{3})\s*(BUY|SELL)+\s*(-\d+\.\d+|\d+\.\d+\+|\d+\.\d+)?\s*((SL|TP)?\s*(-\d+\.\d+|\d+\.\d+\+|\d+\.\d+)?\s*(?:(\(?))((-\d+|\d+\+|\d+)?\s*Pips)?(?:\))?)+\s*\s*(Intra-Day Trade|SWING TRADE)?\s*"
text="""EURCAD SELL 1.36599 SL 1.37701 (110 Pips) TP 1.34017 (258 Pips) Intra-Day Trade """
match = re.match(PATTERN, text)
groups = list(match.groups())
for idx, x in enumerate(groups):
    print(f"Group number {idx}, content: {x}")
Output of the code without the repeated sub-pattern:
Group number 0, content: EURCAD
Group number 1, content: SELL
Group number 2, content: 1.36599
Group number 3, content:
Group number 4, content: TP
Group number 5, content: 1.34017
Group number 6, content:
Group number 7, content: 258 Pips
Group number 8, content: 258
Group number 9, content: Intra-Day Trade
but I can't get both the SL and the TP values without spreading them across two separate groups. With the single repeated group (see the regex101 link), only the last repetition (TP) is captured.
I have to match:
EURCAD SELL 1.36599
SL 1.37701 (110 Pips)
TP 1.34017 (258 Pips)
Intra-Day Trade
but also (-):
EURCAD SELL 1.36599
SL 1.37701 (110 Pips)
Intra-Day Trade
(-)
EURCAD SELL 1.36599
TP 1.34017 (258 Pips)
Intra-Day Trade
(-)
EURCAD SELL 1.36599
(-)
EURCAD SELL 1.36599
SL 1.37701 (110 Pips)
TP 1.34017 (258 Pips)
(-)
EURCAD SELL 1.36599
SL 1.37701 (110 Pips)
TP 1.34017 (258 Pips)
SWING TRADE
(-)
EUR/CAD BUY 1.36599
SWING TRADE
(-)
EUR|CAD BUY 1.36599
SWING TRADE
(-)
EUR\CAD BUY 1.36599
SWING TRADE
(-)
EUR|CAD BUY 1.36599
SWING TRADE
(-)
EURICAD BUY 1.36599
Intra-Day Trade
Basically, these are human-entered strings, and I have to cover all the variants that could make up a valid "trading" signal.
You can see what I'm trying to match by reading the regex. I'm not really looking for help with the pattern itself, only with the group syntax: I couldn't get a correct match without repeating the sub-pattern.
This is the regex101 link for the pattern to test it out by yourself. Thank you in advance.
CodePudding user response:
If you are repeating capture groups, the capture group has the value of the last iteration.
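As a minimal sketch of that behaviour with the standard re module (the pattern and text here are simplified, hypothetical stand-ins for the ones in the question), a repeated capture group only reports its final iteration:

```python
import re

# Hypothetical, simplified pattern: the (SL|TP) and price groups sit inside
# a non-capture group that is repeated with `*`.
pattern = re.compile(r"(?:(SL|TP) (\d+\.\d+)\s*)*")
text = "SL 1.37701 TP 1.34017"

m = pattern.match(text)

# The whole text is consumed by two iterations of the repeated group...
whole = m.group(0)
# ...but each capture group only keeps the value from the last iteration.
last_label, last_price = m.group(1), m.group(2)

print(whole)                   # both SL and TP parts were matched
print(last_label, last_price)  # only the TP values are available
```

The SL values are matched but cannot be read back from the match object, which is why the repeated-group version in the question only shows TP.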
The pattern that you tried can be shortened to:
([A-Z]{3}[\/I\\\s|-]?[A-Z]{3})\s*(BUY|SELL)+\s*(-?\d+\.\d+|\d+\.\d+\+)?(?:\n(SL|TP)\s*(-?\d+\.\d+|\d+\.\d+\+)\s*(\((-?\d+|\d+\+)?\s*Pips\)))*\s*(Intra-Day Trade|SWING TRADE)?
Some notes about the shortened pattern:
- This part
(?:(\(?))
can be written as \(?
- This part
(-\d+\.\d+|\d+\.\d+\+|\d+\.\d+)
can be written as (-?\d+\.\d+|\d+\.\d+\+)
- The parentheses in the Pips part, like
(110 Pips)
are not optional
- For the pattern without the repeated sub-pattern, you repeated the group 1 or more times with all of its parts optional. Instead, you can optionally repeat a non-capture group that has a fixed format, so that none of the inner parts are optional
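The first two notes can be checked with a small sketch using the standard re module. The sample strings are made up for illustration. Both forms of the number pattern accept the same strings when the whole string must match, and the shortened parenthesis form simply drops one capture group:

```python
import re

# The two spellings of the price sub-pattern from the notes above.
long_form = re.compile(r"(-\d+\.\d+|\d+\.\d+\+|\d+\.\d+)")
short_form = re.compile(r"(-?\d+\.\d+|\d+\.\d+\+)")

# Hypothetical sample inputs, valid and invalid.
samples = ["1.36599", "-1.36599", "1.36599+", "1.", "abc", "-1.5+"]
agreement = {
    s: (bool(long_form.fullmatch(s)), bool(short_form.fullmatch(s)))
    for s in samples
}
for s, (a, b) in agreement.items():
    print(f"{s!r}: long={a} short={b}")

# (?:(\(?)) still contains one capture group; \(? contains none.
groups_before = re.compile(r"(?:(\(?))").groups
groups_after = re.compile(r"\(?").groups
print(groups_before, groups_after)

# Alternation order matters when the match is not anchored: the short form
# tries the variant without the trailing "+" first.
long_hit = long_form.search("1.5+").group(1)
short_hit = short_form.search("1.5+").group(1)
print(long_hit, short_hit)
```

If the trailing "+" has to end up inside the captured value, it may be safer to put the `\d+\.\d+\+` alternative first in the shortened form.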
If you want to repeat the capture groups and keep the value of every iteration, and you can use the PyPI regex module, you can use named capture groups together with capturesdict().
The pattern with the named capture groups:
(?P<word1>[A-Z]{3}[\/I\\\s|-]?[A-Z]{3})\s*(?P<word2>BUY|SELL)+\s*(?P<word3>-?\d+\.\d+|\d+\.\d+\+)?(?:\n(?P<word4>SL|TP)\s*(?P<word5>-?\d+\.\d+|\d+\.\d+\+)\s*(?P<word6>\((-?\d+|\d+\+)?\s*Pips\)))*\s*(?P<wor7>Intra-Day Trade|SWING TRADE)?
For example
import regex
pattern = r"(?P<word1>[A-Z]{3}[\/I\\\s|-]?[A-Z]{3})\s*(?P<word2>BUY|SELL)+\s*(?P<word3>-?\d+\.\d+|\d+\.\d+\+)?(?:\n(?P<word4>SL|TP)\s*(?P<word5>-?\d+\.\d+|\d+\.\d+\+)\s*(?P<word6>\((-?\d+|\d+\+)?\s*Pips\)))*\s*(?P<wor7>Intra-Day Trade|SWING TRADE)?"
s = ("EURCAD SELL 1.36599\n"
"SL 1.37701 (110 Pips)\n"
"TP 1.34017 (258 Pips)\n\n"
"Intra-Day Trade\n\n"
"EUR/CAD BUY 1.36599\n"
"SWING TRADE")
matches = regex.finditer(pattern, s, regex.M)
for matchNum, match in enumerate(matches, start=1):
    print(match.capturesdict())
Output
{'word1': ['EURCAD'], 'word2': ['SELL'], 'word3': ['1.36599'], 'word4': ['SL', 'TP'], 'word5': ['1.37701', '1.34017'], 'word6': ['(110 Pips)', '(258 Pips)'], 'wor7': ['Intra-Day Trade']}
{'word1': ['EUR/CAD'], 'word2': ['BUY'], 'word3': ['1.36599'], 'word4': [], 'word5': [], 'word6': [], 'wor7': ['SWING TRADE']}
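If installing the regex module is not an option, one possible alternative (a different technique from the one above, not a drop-in replacement) is a two-pass approach with the standard re module: match the header once, then collect every SL/TP line with findall. The sketch below assumes one signal per input string, and the names parse_signal, HEADER, LEVEL and KIND as well as the dictionary keys are hypothetical. The sub-patterns are simplified and leave out the trailing "+" variant.

```python
import re

# Header: currency pair, direction, optional entry price (simplified).
HEADER = re.compile(r"([A-Z]{3}[/I\\\s|-]?[A-Z]{3})\s*(BUY|SELL)\s*(-?\d+\.\d+)?")
# One SL/TP entry: label, price, pips inside mandatory parentheses.
LEVEL = re.compile(r"(SL|TP)\s*(-?\d+\.\d+)\s*\((-?\d+)\s*Pips\)")
# Optional trade kind.
KIND = re.compile(r"Intra-Day Trade|SWING TRADE")


def parse_signal(text):
    """Parse a single signal; return None if no header is found."""
    header = HEADER.search(text)
    if header is None:
        return None
    rest = text[header.end():]
    kind = KIND.search(rest)
    return {
        "pair": header.group(1),
        "side": header.group(2),
        "entry": header.group(3),
        # findall returns one (label, price, pips) tuple per SL/TP entry.
        "levels": LEVEL.findall(rest),
        "kind": kind.group(0) if kind else None,
    }


full = parse_signal(
    "EURCAD SELL 1.36599\n"
    "SL 1.37701 (110 Pips)\n"
    "TP 1.34017 (258 Pips)\n"
    "Intra-Day Trade"
)
short = parse_signal("EUR/CAD BUY 1.36599\nSWING TRADE")
print(full)
print(short)
```

Because each SL/TP entry is found by its own findall hit, every value is kept, at the cost of no longer validating the whole signal with a single pattern.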