So I am making a WPF application where you insert PDF files and it converts them to text. After that, a few Regex functions are run on the text to give me only the important parts of the PDF.
The first problem I am running into is with numbers: if the number is, for example, 6.90, it comes out as 6.9. I have tried changing my Regex, but it doesn't make a difference.
The second problem is with dates: for example, with 09-06-2022 it just doesn't write anything. I have also tried changing the Regex, but the date still doesn't show up.
Does anyone know why this is?
This is a line from the PDF I use; I am trying to get only 6.90:
Date: 06-09-2022 € 5.70 € 1.20 € 6.90
This is the Regex I use to get only the amount:
(?<=Date\:?\s?\s?\s?\d{0,2}\-\d{0,2}\-\d{0,4}\s?\€\s\d{0,10}\.?\,?\d{0,2}\s?\€\s\d{0,10}\,?\.?\d{0,10}\s?\€\s)\d{0,10}\.\d{0,2}
This is the Regex I use to get only the date:
(?<=Date\:?\s?\s?\s?)\d{0,2}\-\d{0,2}\-\d{0,4}
There are a lot of "?" in it because I have to make it compatible with multiple different PDFs.
P.S. I didn't know how best to ask something like this; I am new to Stack Overflow.
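A quick way to narrow this down is to run both patterns from the question against the sample line. This is a minimal sketch, assuming a .NET console app and that the PDF-to-text step produces exactly the sample line shown above. On that input both patterns do match, giving "6.90" and "06-09-2022", which suggests the loss happens somewhere else. One possible cause of 6.90 becoming 6.9 is converting the match to a double before displaying it: double does not keep trailing zeros, while decimal keeps the scale it was parsed with. For the missing date, the real extracted text may differ from the sample (for example in whitespace or dash characters); this sketch cannot confirm that.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

// Assumption: the extracted PDF text is exactly the sample line from the question.
string line = "Date: 06-09-2022 € 5.70 € 1.20 € 6.90";

// The two patterns from the question, unchanged (verbatim strings keep the backslashes as written).
string amountPattern = @"(?<=Date\:?\s?\s?\s?\d{0,2}\-\d{0,2}\-\d{0,4}\s?\€\s\d{0,10}\.?\,?\d{0,2}\s?\€\s\d{0,10}\,?\.?\d{0,10}\s?\€\s)\d{0,10}\.\d{0,2}";
string datePattern = @"(?<=Date\:?\s?\s?\s?)\d{0,2}\-\d{0,2}\-\d{0,4}";

string amountText = Regex.Match(line, amountPattern).Value; // "6.90"
string dateText = Regex.Match(line, datePattern).Value;     // "06-09-2022"

// Round-tripping through double drops the trailing zero...
string viaDouble = double.Parse(amountText, CultureInfo.InvariantCulture)
    .ToString(CultureInfo.InvariantCulture);  // "6.9"
// ...while decimal keeps the two decimal places it was parsed with.
string viaDecimal = decimal.Parse(amountText, CultureInfo.InvariantCulture)
    .ToString(CultureInfo.InvariantCulture);  // "6.90"

Console.WriteLine($"{dateText} | {amountText} | {viaDouble} | {viaDecimal}");
```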
Answer:
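This is much easier without Regex:
using System;
using System.Globalization;
string input = "Date: 06-09-2022 € 5.70 € 1.20 € 6.90";
// Split on ':' and '€': ["Date", " 06-09-2022 ", " 5.70 ", " 1.20 ", " 6.90"]
string[] array = input.Split(new char[] {':', '€'});
// Explicit format and culture so the result does not depend on the PC's regional settings (assumes day-month-year)
DateTime date = DateTime.ParseExact(array[1].Trim(), "dd-MM-yyyy", CultureInfo.InvariantCulture);
decimal amount1 = decimal.Parse(array[2], CultureInfo.InvariantCulture);
decimal amount2 = decimal.Parse(array[3], CultureInfo.InvariantCulture);
decimal amount3 = decimal.Parse(array[4], CultureInfo.InvariantCulture); // 6.90 (decimal keeps the trailing zero)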
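Since the text comes from several different PDFs, a line will not always have exactly this shape, and Parse/ParseExact throw an exception when it doesn't. A hedged variant of the same Split idea uses the Try* methods instead; TryParseLine is a hypothetical helper name, not an existing API, and the sketch still assumes day-month-year dates and "." as the decimal separator.

```csharp
using System;
using System.Globalization;

// Hypothetical helper (not a library API): splits like the answer above but returns false
// instead of throwing. Assumes "dd-MM-yyyy" dates and '.' as the decimal separator.
static bool TryParseLine(string input, out DateTime date, out decimal[] amounts)
{
    date = default;
    amounts = Array.Empty<decimal>();

    string[] parts = input.Split(new[] { ':', '€' });
    // Expect "Date", the date itself, and at least one amount.
    if (parts.Length < 3 || parts[0].Trim() != "Date")
        return false;

    if (!DateTime.TryParseExact(parts[1].Trim(), "dd-MM-yyyy", CultureInfo.InvariantCulture,
                                DateTimeStyles.None, out date))
        return false;

    var values = new decimal[parts.Length - 2];
    for (int i = 2; i < parts.Length; i++)
    {
        // NumberStyles.Number tolerates the spaces left around each piece by Split.
        if (!decimal.TryParse(parts[i], NumberStyles.Number, CultureInfo.InvariantCulture, out values[i - 2]))
            return false;
    }

    amounts = values;
    return true;
}

if (TryParseLine("Date: 06-09-2022 € 5.70 € 1.20 € 6.90", out DateTime d, out decimal[] a))
{
    // "0.00" always prints two decimals; this prints "06-09-2022 6.90".
    Console.WriteLine(d.ToString("dd-MM-yyyy", CultureInfo.InvariantCulture) + " " +
                      a[a.Length - 1].ToString("0.00", CultureInfo.InvariantCulture));
}
```

Formatting with "0.00" when displaying the value is a second safeguard: even if a value ends up as 6.9 internally, it is shown as 6.90.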
Answer:
If you still want to use Regex, here is a much simpler pattern:
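Date\:\s{0,}(\d{1,2}-?\d{1,2}-?\d{2,4}).+?(\d+\.\d+).+?(\d+\.\d+).+?(\d+\.\d+)
Breakdown:
Date\:\s{0,}
matches "Date:" followed by 0 or more whitespace characters
(\d{1,2}-?\d{1,2}-?\d{2,4})
matches your date string, accepting 1 or 2 digits for the month and day, 2 to 4 digits for the year, and optional hyphens between them
.+?(\d+\.\d+)
lazily matches one or more characters of any kind, up to the first run of 1 or more digits followed by "." and 1 or more digits. This is repeated 3 times to capture the three currency values. The lazy "?" matters: with a greedy .+, a multi-digit amount such as 15.70 would be captured as 5.70.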
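From C#, the pattern might be used like this. It is a sketch, written with a lazy .+? between the groups; it reads the date from group 1 and the last amount from group 4, and keeps the amount as a string so 6.90 is not reformatted. Parsing to decimal (rather than double) also keeps the trailing zero if you need the number.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

// Lazy ".+?" between the groups so that an amount like 15.70 is captured whole.
var pattern = new Regex(@"Date\:\s{0,}(\d{1,2}-?\d{1,2}-?\d{2,4}).+?(\d+\.\d+).+?(\d+\.\d+).+?(\d+\.\d+)");

Match m = pattern.Match("Date: 06-09-2022 € 5.70 € 1.20 € 6.90");
if (m.Success)
{
    string date = m.Groups[1].Value;  // "06-09-2022"
    string total = m.Groups[4].Value; // "6.90", still text, so the trailing zero is kept
    // decimal preserves the two decimal places; double would print 6.9.
    decimal totalValue = decimal.Parse(total, CultureInfo.InvariantCulture);
    Console.WriteLine($"{date} {total} {totalValue.ToString(CultureInfo.InvariantCulture)}");
}
```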