I am trying to show the contents of a dictionary, which should produce this output:
'Watermeloenen': 466, 'Appels': 688, 'Sinaasappels': 803
So I have this method:
    def total_fruit_per_sort(self, file_content):
        file_contents = self.extractingText.extract_text_from_image(
            file_content)
        number_found = re.findall(self.total_amount_fruit_regex(), file_contents)
        fruit_dict = {}
        for n, f in number_found:
            # Add each amount to the running total for that fruit
            fruit_dict[f] = fruit_dict.get(f, 0) + int(n)
        return str({value: key for value, key in fruit_dict.items()}).replace("{", "").replace("}", "")
This is the regex:
    def total_amount_fruit_regex(self):
        return r"(\d*(?:\.\d+)*)\s*W+({self.fruit_list()})"
And the input string (file_contents) is this:
"[' \n\na)\n\n \n\nFactuur\nVerdi Import Schoolfruit\nFactuur nr. : 71201 Koopliedenweg 33\nDeb. nr. : 108636 2991 LN BARENDRECHT\nYour VAT nr. : NL851703884B01 Nederland\nFactuur datum : 10-12-21\nAantal Omschrijving Prijs Bedrag\nOrder number : 77553 Loading date : 09-12-21 Incoterm: : FOT\nYour ref. : SCHOOLFRUIT Delivery date :\nWK50\nD.C. Schoolfruit\n16 Watermeloenen Quetzali 16kg 4 IMPERIAL BR I € 7,70 € 123,20\n360 Watermeloenen Quetzali 16kg 4 IMPERIAL BR I € 7,70 € 2.772,00\n6 Watermeloenen Quetzali 16kg 4 IMPERIAL BR I € 7,/0 € 46,20\n75 Watermeloenen Quetzali 16kg 4 IMPERIAL BR I € 7,70 € 577,50\n9 Watermeloenen Quetzali 16kg 4 IMPERIAL BR I € 7,70 € 69,30\n688 Appels Royal Gala 13kg 60/65 Generica PL I € 5,07 € 3.488,16\n22 Sinaasappels Valencias 15kg 105 Elara ZAI € 6,25 € 137,50\n80 Sinaasappels Valencias 15kg 105 Elara ZAI € 6,25 € 500,00\n160 Sinaasappels Valencias 15kg 105 FVC ZAI € 6,25 € 1.000,00\n320 Sinaasappels Valencias 15kg 105 Generica ZAI € 6,25 € 2.000,00\n160 Sinaasappels Valencias 15kg 105 Noordhoek ZA I € 6,25 € 1.000,00\n61 Sinaasappels Valencias 15kg 105 Noordhoek ZA I € 6,25 € 381,25\nTotaal Colli Totaal Netto Btw Btw Bedrag Totaal Bedrag\n€ 12.095,11 1.088,56\nBetaling binnen 30 dagen\nAchterstand wordt gemeld bij de kredietverzekeringsmaatschappij\nVerDi Import BV ING Bank NV. Rotterdam IBAN number: NL17INGB0006959173 ~~\n\n \n\nKoopliedenweg 38, 2991 LN Barendrecht, The Netherlands SWIFT/BIC: INGBNL2A, VAT number: NL851703884B01 i\nTel, 31 (0}1 80 61 88 11, Fax 31 (0)1 8061 88 25 Chamber of Commerce Rotterdam no. 55424309 VerDi\n\nE-mail: [email protected], www.verdiimport.nl Dutch law shall apply. The Rotterdam District Court shall have exclusive jurisdiction.\n\nrut ard wegetables\n\x0c']"
And this is the fruit_list:
    self.list_fruit = ['Appels', 'Ananas', 'Peen Waspeen',
                       'Tomaten Cherry', 'Sinaasappels',
                       'Watermeloenen', 'Rettich', 'Peren', 'Peen',
                       'Mandarijnen', 'Meloenen', 'Grapefruit', 'Rettich']
But when I run the function total_fruit_per_sort, I get this error:
expected string or bytes-like object
Request Method: POST
Request URL: http://127.0.0.1:8000/
Django Version: 4.1.1
Exception Type: TypeError
Exception Value:
expected string or bytes-like object
Exception Location: C:\Python310\lib\re.py, line 240, in findall
Raised during: main.views.ReadingFile
Python Executable: C:\Python310\python.exe
But I already convert the dictionary to a string, so I don't know how to tackle this.
The stack trace points to this line:
    number_found = re.findall(
        self.total_amount_fruit_regex(), file_contents)
This is the output of print(file_contents):
[' \n\na)\n\n \n\nFactuur\nVerdi Import Schoolfruit\nFactuur nr. : 71201 Koopliedenweg 33\nDeb. nr. : 108636 2991 LN BARENDRECHT\nYour VAT nr. : NL851703884B01 Nederland\nFactuur datum : 10-12-21\nAantal Omschrijving Prijs Bedrag\nOrder number : 77553 Loading date : 09-12-21 Incoterm: : FOT\nYour ref. : SCHOOLFRUIT Delivery date :\nWK50\nD.C. Schoolfruit\n16 Watermeloenen Quetzali 16kg 4 IMPERIAL BR I € 7,70 € 123,20\n360 Watermeloenen Quetzali 16kg 4 IMPERIAL BR I € 7,70 € 2.772,00\n6 Watermeloenen Quetzali 16kg 4 IMPERIAL BR I € 7,/0 € 46,20\n75 Watermeloenen Quetzali 16kg 4 IMPERIAL BR I € 7,70 € 577,50\n9 Watermeloenen Quetzali 16kg 4 IMPERIAL BR I € 7,70 € 69,30\n688 Appels Royal Gala
13kg 60/65 Generica PL I € 5,07 € 3.488,16\n22 Sinaasappels Valencias 15kg 105 Elara ZAI € 6,25 € 137,50\n80 Sinaasappels Valencias 15kg 105 Elara ZAI € 6,25 € 500,00\n160 Sinaasappels Valencias 15kg 105 FVC ZAI € 6,25 € 1.000,00\n320 Sinaasappels Valencias 15kg 105 Generica ZAI € 6,25 € 2.000,00\n160 Sinaasappels Valencias 15kg 105 Noordhoek ZA I € 6,25 € 1.000,00\n61 Sinaasappels Valencias 15kg 105 Noordhoek ZA I € 6,25 € 381,25\nTotaal Colli Totaal Netto Btw Btw Bedrag Totaal Bedrag\n€ 12.095,11 1.088,56\nBetaling binnen 30 dagen\nAchterstand wordt
gemeld bij de kredietverzekeringsmaatschappij\nVerDi Import BV ING Bank NV. Rotterdam IBAN number: NL17INGB0006959173 ~~\n\n \n\nKoopliedenweg 38, 2991 LN Barendrecht, The Netherlands SWIFT/BIC: INGBNL2A, VAT number: NL851703884B01 i\nTel, 31 (0}1 80 61 88 11, Fax 31 (0)1 8061
88 25 Chamber of Commerce Rotterdam no. 55424309 VerDi\n\nE-mail: [email protected], www.verdiimport.nl Dutch law shall apply. The Rotterdam District Court shall have exclusive jurisdiction.\n\nrut ard wegetables\n\x0c']
CodePudding user response:
Check that the result of
    file_contents = self.extractingText.extract_text_from_image(file_content)
is actually a string or bytes-like object. You'll get this error from re.findall(...) when the second parameter is not a string or bytes-like object. For example: re.findall("somestring", None).
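A minimal sketch of that check, assuming the extraction step returns a list holding a single string (which is what the printed output in the question, with its surrounding [' ... '], suggests). The name ocr_result and the shortened sample text are made up for illustration:

```python
import re

# Hypothetical stand-in for the OCR result: a list with one string in it.
ocr_result = ["16 Watermeloenen Quetzali\n688 Appels Royal Gala"]

# Passing the list itself reproduces the TypeError from the question.
try:
    re.findall(r"\d+", ocr_result)
    raised = False
except TypeError:
    raised = True

# Passing the string inside the list works.
numbers = re.findall(r"\d+", ocr_result[0])
print(raised, numbers)
```

Printing type(file_contents) right before the re.findall call is a quick way to see which of the two cases applies.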
When I run your code but change the above to just file_contents = file_content and then print(total_fruit_per_sort(input_str)), I get an empty string, but no errors.
A second thing to note, which is probably why I get an empty string: your total_amount_fruit_regex string is a raw string (r prefix) but not an f-string, so the portion {self.fruit_list()} stays literal text and is not replaced by the interpolated values as you probably expect. You can fix this by prefixing the string with an f. A string can be both raw and formatted (rf"..."), or a plain f-string should work fine here, depending on how you want to deal with escaping certain characters.
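A small sketch of the difference, using a shortened, made-up fruit list and a simplified pattern rather than the exact regex from the question. Note that interpolating a Python list directly would insert its repr, so the names are joined with | first:

```python
import re

# Shortened, illustrative fruit list (not the full list from the question).
fruit_names = ["Appels", "Sinaasappels", "Watermeloenen"]
alternation = "|".join(re.escape(name) for name in fruit_names)

# Raw string only: the braces and the name inside them stay literal text.
not_interpolated = r"(\d+)\s+({alternation})"

# Raw f-string: the alternation is substituted, backslashes stay untouched.
interpolated = rf"(\d+)\s+({alternation})"

sample = "16 Watermeloenen Quetzali\n688 Appels Royal Gala"
matches = re.findall(interpolated, sample)
print(not_interpolated)
print(interpolated)
print(matches)
```

One thing to watch with rf"...": literal braces in the regex, such as a {2} quantifier, have to be doubled ({{2}}) so they are not treated as replacement fields.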
CodePudding user response:
Ah, OK. This fixed the issue:
    def total_amount_fruit_regex(self):
        # Build the alternation from the fruit list, escaping each name
        return r"(\d*(?:\.\d+)*)\s*(" + '|'.join(re.escape(word)
            for word in self.extractingText.list_fruit) + ')'

    # Pass the string inside the list, not the list itself
    number_found = re.findall(
        self.total_amount_fruit_regex(), self.extractingText.text_factuur_verdi[0])
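Putting both fixes together, here is a self-contained sketch outside the Django class. It assumes a simplified pattern ((\d+)\s+ in front of the fruit name), a shortened fruit list, and invoice rows cut down to quantity, fruit and variety. FRUIT_NAMES, format_totals and ocr_result are illustrative names, not part of the original code:

```python
import re

# Shortened, illustrative fruit list.
FRUIT_NAMES = ["Appels", "Sinaasappels", "Watermeloenen"]


def total_fruit_per_sort(text):
    """Sum the leading quantity of every row that names a known fruit."""
    alternation = "|".join(re.escape(name) for name in FRUIT_NAMES)
    pattern = rf"(\d+)\s+({alternation})"
    totals = {}
    for amount, fruit in re.findall(pattern, text):
        totals[fruit] = totals.get(fruit, 0) + int(amount)
    return totals


def format_totals(totals):
    """Render the totals as 'name': amount pairs without braces."""
    return ", ".join(f"'{fruit}': {amount}" for fruit, amount in totals.items())


# Hypothetical OCR result: a list holding one string, as in the question.
ocr_result = [
    "16 Watermeloenen Quetzali\n360 Watermeloenen Quetzali\n"
    "6 Watermeloenen Quetzali\n75 Watermeloenen Quetzali\n"
    "9 Watermeloenen Quetzali\n688 Appels Royal Gala\n"
    "22 Sinaasappels Valencias\n80 Sinaasappels Valencias\n"
    "160 Sinaasappels Valencias\n320 Sinaasappels Valencias\n"
    "160 Sinaasappels Valencias\n61 Sinaasappels Valencias\n"
]

totals = total_fruit_per_sort(ocr_result[0])
print(format_totals(totals))
# 'Watermeloenen': 466, 'Appels': 688, 'Sinaasappels': 803
```

Returning the dictionary and formatting it separately keeps the totals usable as numbers elsewhere. With the full fruit list, longer names such as 'Peen Waspeen' should stay ahead of 'Peen' in the alternation so the longer name is tried first.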