I have strings like the ones below. The first few characters are a symbol, which can be one to a few letters but could contain unusual characters such as "/". The next six characters are always a date in YYMMDD format, where YY, MM and DD are always integers, left-padded with a 0 as shown. This is followed by a single character that is always 'C' or 'P', and finally a double.
AAPL220819C152.5
AAPL220819P195
AAPL220902P187.5
AAPL220819C155
AAPL220930C180
What regular expression would quickly parse these strings into their constituent parts (Symbol, Date, COP, Strike)?
So the expected output would be:
"AAPL220819C152.5" {Symbol = "AAPL", Date = 2022-08-19, COP = "C", Strike = 152.5 }
"AAPL220819P195" {Symbol = "AAPL", Date = 2022-08-19, COP = "P", Strike = 195.0}
I have seen similar posts here, but I don't understand regular expressions well enough to adapt them.
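The expected output maps naturally onto a small record type. The following is a minimal sketch, not a definitive implementation. The record `OptionSymbol` and the class `OptionSymbolParser` are hypothetical names introduced for illustration. The sketch assumes that the symbol may contain any characters (matched lazily, so the first valid split wins), that the strike is an unsigned decimal, and that the two-digit year is resolved by .NET's standard `yy` rule, so `22` becomes 2022.

```csharp
#nullable enable
using System;
using System.Globalization;
using System.Text.RegularExpressions;

// Hypothetical record mirroring the expected output in the question.
public record OptionSymbol(string Symbol, DateTime Date, string Cop, double Strike);

public static class OptionSymbolParser
{
    // Assumed layout: symbol (any characters, matched lazily), six-digit YYMMDD date,
    // a single C or P, then an unsigned decimal strike. Built once and reused.
    private static readonly Regex Pattern = new Regex(
        @"^(?<symbol>.+?)(?<date>\d{6})(?<cop>[CP])(?<strike>\d+(?:\.\d+)?)$",
        RegexOptions.Compiled | RegexOptions.CultureInvariant);

    // Returns null when the input does not fit the assumed layout.
    public static OptionSymbol? Parse(string s)
    {
        Match m = Pattern.Match(s);
        if (!m.Success)
        {
            return null;
        }

        // "yyMMdd" maps the two-digit year to a full year using the calendar's two-digit-year rule.
        DateTime date = DateTime.ParseExact(m.Groups["date"].Value, "yyMMdd", CultureInfo.InvariantCulture);
        // Invariant culture so "152.5" parses the same way regardless of the machine's locale.
        double strike = double.Parse(m.Groups["strike"].Value, CultureInfo.InvariantCulture);
        return new OptionSymbol(m.Groups["symbol"].Value, date, m.Groups["cop"].Value, strike);
    }
}

public static class Program
{
    public static void Main()
    {
        foreach (var s in new[] { "AAPL220819C152.5", "AAPL220819P195" })
        {
            Console.WriteLine($"\"{s}\" {OptionSymbolParser.Parse(s)}");
        }
    }
}
```

Returning `null` for a non-matching string is one possible design choice. A `TryParse`-style method with an `out` parameter would be the more conventional .NET shape if it is called in a hot loop.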
Answer:
Try this:
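using System;
using System.Text.RegularExpressions;

class Program
{
    static void Main(string[] args)
    {
        TestParsingRegex("AAPL220819C152.5", "AAPL220819P195", "AAPL220902P187.5", "AAPL220819C155", "AAPL220930C180");
    }

    private static void TestParsingRegex(params string[] strings)
    {
        // Groups: 1 = symbol, 2 = YYMMDD date, 3 = C or P, 4 = strike
        var regex = new Regex(@"([A-Z]+)(\d{6})([CP]{1})(.*)");
        foreach (var s in strings)
        {
            var match = regex.Match(s);
            // Group 0 is the whole match, followed by the four captured groups
            foreach (Group g in match.Groups)
            {
                Console.WriteLine(g.Value);
            }
        }
    }
}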
It should produce the following output:
AAPL220819C152.5
AAPL
220819
C
152.5
AAPL220819P195
AAPL
220819
P
195
AAPL220902P187.5
AAPL
220902
P
187.5
AAPL220819C155
AAPL
220819
C
155
AAPL220930C180
AAPL
220930
C
180
Notice that the first group (group 0) is always the entire matched string, so each input is printed before its four parts (groups 1 to 4).
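Because group 0 is the whole match, you can skip it and read the fields by index instead of printing every group. The sketch below reuses the answer's pattern and also checks `Match.Success` before reading groups. The helper name `Fields` is hypothetical.

```csharp
#nullable enable
using System;
using System.Text.RegularExpressions;

public static class Program
{
    // The answer's pattern; groups 1-4 are symbol, date, C/P and strike.
    private static readonly Regex Pattern = new Regex(@"([A-Z]+)(\d{6})([CP]{1})(.*)");

    // Hypothetical helper: returns the four captured fields (skipping group 0,
    // the whole match), or null if the input does not match at all.
    public static string[]? Fields(string s)
    {
        Match m = Pattern.Match(s);
        if (!m.Success)
        {
            return null;
        }
        return new[] { m.Groups[1].Value, m.Groups[2].Value, m.Groups[3].Value, m.Groups[4].Value };
    }

    public static void Main()
    {
        string[]? f = Fields("AAPL220819C152.5");
        if (f != null)
        {
            Console.WriteLine(string.Join(" | ", f)); // prints "AAPL | 220819 | C | 152.5"
        }
    }
}
```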
This regex uses capturing groups to split each string as follows:
([A-Z]+) one or more upper-case letters (the symbol), up to the next group
(\d{6}) exactly six digits (the YYMMDD date)
([CP]{1}) exactly one C or P character
(.*) everything else (the strike)
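The groups above accept a different set of inputs than the question describes. `[A-Z]+` cannot match a symbol containing "/", and `(.*)` accepts any trailing text as the strike. The sketch below compares the answer's pattern with a stricter variant that is anchored with `^`/`$`, matches the symbol lazily with `.+?`, and requires a numeric strike. The inputs `BRK/B220819C300` and `AAPL220819Cabc` are made-up examples. For speed, the stricter pattern is built once as a static instance with `RegexOptions.Compiled`. This sketch makes no measured performance claim.

```csharp
using System;
using System.Text.RegularExpressions;

public static class Program
{
    // The answer's pattern: letters-only symbol, unanchored, and "(.*)" accepts anything as the strike.
    public static readonly Regex Loose = new Regex(@"([A-Z]+)(\d{6})([CP]{1})(.*)");

    // Assumed stricter variant: anchored, symbol may contain any characters (matched lazily),
    // strike must be digits with an optional fractional part.
    public static readonly Regex Strict = new Regex(
        @"^(.+?)(\d{6})([CP])(\d+(?:\.\d+)?)$", RegexOptions.Compiled);

    public static void Main()
    {
        foreach (var s in new[] { "AAPL220819C152.5", "BRK/B220819C300", "AAPL220819Cabc" })
        {
            Match loose = Loose.Match(s);
            Match strict = Strict.Match(s);
            string looseSymbol = loose.Success ? loose.Groups[1].Value : "-";
            string strictSymbol = strict.Success ? strict.Groups[1].Value : "-";
            Console.WriteLine($"{s}: loose symbol = '{looseSymbol}', strict symbol = '{strictSymbol}'");
        }
    }
}
```

On `BRK/B220819C300`, the unanchored loose pattern skips past the "/" and reports the symbol as just `B`, while the strict pattern captures `BRK/B`. On `AAPL220819Cabc`, the loose pattern happily returns `abc` as the strike, and the strict pattern does not match at all.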