ValueError: too many values to unpack (expected 2) when I try to extract only 2 substrings

This is the code. The error occurs where the substrings are extracted, after the input has been validated against the regex pattern.

import re
from unicodedata import normalize

def name_and_img_identificator(input_text, text):
    input_text = re.sub(r"([^n\u0300-\u036f]|n(?!\u0303(?![\u0300-\u036f])))[\u0300-\u036f]+", r"\1", normalize("NFD", input_text), 0, re.I)
    input_text = normalize('NFC', input_text)  # -> NFC
    input_text_to_check = input_text.lower()  # Convert everything to lowercase

    
    #regex_patron_01 = r"\s*\¿?(?:dime los|dime las|dime unos|dime unas|dime|di|cuales son los|cuales son las|cuales son|cuales|que animes|que|top)\s*((?:\w+\s*)+)\s*(?:de series anime|de anime series|de animes|de anime|animes|anime)\s*(?:similares al|similares a|similar al|similar a|parecidos al|parecidos a|parecido al|parecido a)\s*(?:la serie de anime|series de anime|la serie anime|la serie|anime|)\s*(llamada|conocida como|cuyo nombre es|la cual se llama|)\s*((?:\w+\s*)+)\s*\??"

    #Regex in english
    regex_patron_01 = r "\ s * \ ¿? (?: tell me the | tell me some| tell me | say | which are the | which are the | which are | which | which animes | which | top) \ s * ((?: \ w   \ s *)  ) \ s * (?: anime series | anime series | anime | anime | anime | anime) \ s * (?: similar to | similar to | similar to | similar to | similar to | similar to | similar to | similar to) \ s * (?: the anime series | anime series | the anime series | the series | anime |) \ s * (called | known like | whose name is | which is called |) \ s * ((?: \ w   \ s *)  ) \ s * \ ?? "

    m = re.search(regex_patron_01, input_text_to_check, re.IGNORECASE)  # Check whether the regex matches, to decide whether to enter the block below

    if m:
        num, anime_name = m.groups()[2]

        num = num.strip()
        anime_name = anime_name.strip()
        print(num)
        print(anime_name)

    return text

input_text_str = input("ingrese: ")
text = ""

print(name_and_img_identificator(input_text_str, text))

It gives me this error, and honestly I don't know how to structure the regex pattern so that it extracts only those 2 values (substrings) from the input.

Traceback (most recent call last):
  File "serie_recommendarion_for_chatbot.py", line 154, in <module>
    print(serie_and_img_identificator(input_text_str, text))
  File "anime_recommendarion_for_chatbot.py", line 142, in name_and_img_identificator
    num, anime_name = m.groups()
ValueError: too many values to unpack (expected 2)

If I enter an input like this: 'Dame el top 8 de animes parecidos a Gundam' ('Give me the top 8 anime like Gundam')

I need it to extract:

num = '8'
anime_name = 'Gundam'

How should I fix my regex pattern in that case?

CodePudding user response：

You may be missing a colon: m.groups()[2] selects a single group, whereas a slice takes the first two values.

num, anime_name = m.groups()[:2]

That would explain the "too many values to unpack" error. Note that the slice yields the number and the name only if they are the first two capture groups in the pattern.

Alternatively, use two separate patterns for the number and the name. For simplicity, I only included a few of the phrases.

For the number:

(?<=(which are the|which|top)\s)[0-9]+(?=\s(anime series|anime))

For the name:

(?<=(like|called|which is called)\s)[A-Za-z]+

Implementing the patterns in Spanish is left to you.
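
One caveat if these patterns are used from Python: the standard re module only accepts fixed-width lookbehinds, so alternatives of different lengths such as (which are the|which|top) raise an error inside (?<=...). A minimal sketch of the same idea with plain capture groups instead, assuming English input; the function name extract_num_and_name and the phrase lists are illustrative and not part of the original code:

```python
import re

# Hypothetical helper: same idea as the two lookbehind patterns above,
# but written with capture groups, which Python's re module accepts.
NUM_PATTERN = re.compile(
    r"\b(?:which are the|which|top)\s+([0-9]+)(?=\s+(?:anime series|animes?))",
    re.IGNORECASE,
)
NAME_PATTERN = re.compile(
    r"\b(?:like|called|which is called)\s+([A-Za-z]+)",
    re.IGNORECASE,
)


def extract_num_and_name(text):
    """Return (num, anime_name); either item is None if its pattern fails."""
    num_match = NUM_PATTERN.search(text)
    name_match = NAME_PATTERN.search(text)
    num = num_match.group(1) if num_match else None
    name = name_match.group(1) if name_match else None
    return num, name


print(extract_num_and_name("Give me the top 8 anime like Gundam"))
```

Because each value has its own pattern, a failed match for one of them does not prevent the other from being extracted.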

CodePudding user response：

Try this out in a regex playground.

Not much has changed: the first capture group is still the number of animes, and the second group is the name of the anime itself. I just simplified the regex a bit by dropping some parts that are unnecessary for the demo. Most of it is unchanged from your version, which was already a pretty solid regex.

Regex: \b(\d+).*(?:called|that are like|known like|whose name is|which is called)\s*((?:\w+\s*)+)\s*\??

Test with your original question - which I translated roughly to English :-)

import re
from unicodedata import normalize


def name_and_img_identificator(input_text, text):
    input_text = re.sub(r"([^n\u0300-\u036f]|n(?!\u0303(?![\u0300-\u036f])))[\u0300-\u036f]+", r"\1",
                        normalize("NFD", input_text), 0, re.I)
    input_text = normalize('NFC', input_text)  # -> NFC
    input_text_to_check = input_text.lower()  # Convert everything to lowercase


    # Regex in english

    # original
    #   note: you have extra spaces here, which regex might not like.
    #   you can get rid of spaces and then it should hopefully be fine.
    # regex_patron_01 = r "\ s * \ ¿? (?: tell me the | tell me some| tell me | say | which are the | which are the | which are | which | which animes | which | top) \ s * ((?: \ w   \ s *)  ) \ s * (?: anime series | anime series | anime | anime | anime | anime) \ s * (?: similar to | similar to | similar to | similar to | similar to | similar to | similar to | similar to) \ s * (?: the anime series | anime series | the anime series | the series | anime |) \ s * (called | known like | whose name is | which is called |) \ s * ((?: \ w   \ s *)  ) \ s * \ ?? "

    # simplified
    regex_patron_01 = r'\b(\d+).*(?:called|that are like|known like|whose name is|which is called)\s*((?:\w+\s*)+)\s*\??'

    m = re.search(regex_patron_01, input_text_to_check,
                  re.IGNORECASE)  # Check whether the regex matches before entering the block below

    if m:
        num, anime_name = m.groups()[:2]

        num = num.strip()
        anime_name = anime_name.strip()
        print(num)
        print(anime_name)

    return text


#input_text_str = input("ingrese: ")
input_text_str = 'Tell me the top 8 animes that are like Gundam?'
text = ""

print(name_and_img_identificator(input_text_str, text))

CodePudding user response：

Errors in the regex pattern

You forgot to add ?: to make this group non-capturing. Change:

regex_patron_01 = r"...(llamada|conocida como|cuyo nombre es|la cual se llama|)..."

To:

regex_patron_01 = r"...(?:llamada|conocida como|cuyo nombre es|la cual se llama|)..."

To avoid capturing additional spaces or words, the group that captures num should be non-greedy, so that it does not swallow words like "de" and instead lets the subsequent patterns match them. Change:

regex_patron_01 = r"...((?:\w+\s*)+)..."

To:

regex_patron_01 = r"...((?:\w+?\s*?)+)..."

m.groups() already returns a tuple of the matched strings, so indexing it gives you a single string only. Unpacking that one string into two names is the root cause of your error. Change:

num, anime_name = m.groups()[2]

To:

num, anime_name = m.groups()
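
To see why the original line fails, it helps to look at what each expression returns. A small sketch with a deliberately simplified pattern (the pattern and the sample text are illustrative assumptions, not the original regex):

```python
import re

# Simplified stand-in with three capture groups, like the original pattern:
# (number) (optional "called" phrase) (name)
pattern = r"top\s+(\d+)\s+animes\s+(called|)\s*(\w+)"
m = re.search(pattern, "top 8 animes gundam")

print(m.groups())     # a tuple with one entry per capture group
print(m.groups()[2])  # indexing returns a single string, here the name

try:
    # Unpacking a 6-character string into 2 names fails.
    num, anime_name = m.groups()[2]
except ValueError as exc:
    print(exc)

# Selecting the wanted groups explicitly works regardless of group count.
num, anime_name = m.group(1, 3)
print(num, anime_name)
```

Calling m.group(1, 3) is an alternative to making the middle group non-capturing: it picks the two groups by number instead of relying on the pattern having exactly two.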

With the changes above, the extraction succeeds:

8
gundam
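
Putting the three corrections together, a trimmed-down sketch could look as follows. The alternation lists are shortened for readability, the accent stripping is omitted, and the function name extract_top_request is hypothetical:

```python
import re

# Shortened version of the Spanish pattern with the fixes applied:
# non-capturing optional groups and a lazy first capture group.
REGEX_PATRON_01 = (
    r"\s*¿?(?:dime|cuales son|que animes|top)\s*"
    r"((?:\w+?\s*?)+)\s*"
    r"(?:de animes|de anime|animes|anime)\s*"
    r"(?:similares a|similar a|parecidos a|parecido a)\s*"
    r"(?:la serie de anime|la serie|anime|)\s*"
    r"(?:llamada|conocida como|cuyo nombre es|)\s*"
    r"((?:\w+\s*)+)\s*\??"
)


def extract_top_request(input_text):
    """Return (num, anime_name), or None when the pattern does not match."""
    m = re.search(REGEX_PATRON_01, input_text.lower())
    if m is None:
        return None
    num, anime_name = m.groups()  # exactly two capture groups remain
    return num.strip(), anime_name.strip()


print(extract_top_request("Dame el top 8 de animes parecidos a Gundam"))
```

Returning None for a non-matching input keeps the caller from unpacking a failed match.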

Improvement

Your regex is overly complicated and contains many hard-coded words that would differ by language. My suggestion is to standardize the format of the string it accepts:

Any text here (num) any text here (anime_name)

Which is already the format of your input:

Dame el top 8 de animes parecidos a Gundam

So you can drop that long regex and replace it with the following, and the output stays the same:

regex_patron_01 = r"^.*?(\d+).*\s(.+)$"

Note that this requires (anime_name) to be a single word. To support multi-word names, we need a special character that marks the start of the anime name, such as a colon:

Dame el top 8 de animes parecidos a: Gundam X

Then the regex would be:

regex_patron_01 = r"^.*?(\d+).*:\s(.+)$"

Output

8
gundam x
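
Both simplified patterns can be checked side by side. A short sketch, where the helper name parse_request is an assumption made for illustration:

```python
import re

SINGLE_WORD = r"^.*?(\d+).*\s(.+)$"   # name is the last whitespace-separated word
WITH_COLON = r"^.*?(\d+).*:\s(.+)$"   # name is everything after the last colon plus whitespace


def parse_request(pattern, text):
    """Return (num, anime_name) or None if the text does not fit the format."""
    m = re.search(pattern, text.lower())
    return m.groups() if m else None


print(parse_request(SINGLE_WORD, "Dame el top 8 de animes parecidos a Gundam"))
print(parse_request(WITH_COLON, "Dame el top 8 de animes parecidos a: Gundam X"))
# Without the colon marker, only the last word of a multi-word name is captured.
print(parse_request(SINGLE_WORD, "Dame el top 8 de animes parecidos a Gundam X"))
```

The third call shows the limitation mentioned above: the pattern without a marker has no way to know where a multi-word name begins.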