How to create a comma-separated list from a numbered, newline-separated string inside a JSON object

I am using the OpenAI API and am getting JSON objects returned like so:

response = {'id': 'xyz',
 'object': 'text_completion',
 'created': 1673323957,
 'model': 'text-davinci-003',
 'choices': [{'text': '\n\n1. Dog Diet and Nutrition \n2. Dog Vaccination and Immunization \n3. Dog Parasites and Parasite Control \n4. Dog Dental Care and Hygiene \n5. Dog Grooming and Skin Care \n6. Dog Exercise and Training \n7. Dog First-Aid and Emergency Care \n8. Dog Joint Care and Arthritis \n9. Dog Allergies and Allergy Prevention \n10. Dog Senior Care and Health',
   'index': 0,
   'logprobs': None,
   'finish_reason': 'length'}],
 'usage': {'prompt_tokens': 16, 'completion_tokens': 100, 'total_tokens': 116}}

Within choices, and specifically for the text key, how do I turn the value into a comma-separated list, splitting at each numbered marker (for example \n2.) and dropping the leading \n\n1.? The API is also a bit finicky: sometimes the numbers are missing from the response and the items are separated only by \n\n or \n. Ideally the solution would be flexible enough to handle that case too, but if not, that's OK.

I want to pull out the choices into a new variable.

The new choices list should look like this:

new_choices =  ['Dog Diet and Nutrition', 'Dog Vaccination and Immunization', 'Dog Parasites and Parasite Control', 'Dog Dental Care and Hygiene', 'Dog Grooming and Skin Care', 'Dog Exercise and Training', 'Dog First-Aid and Emergency Care', 'Dog Joint Care and Arthritis', 'Dog Allergies and Allergy Prevention', 'Dog Senior Care and Health']

I have tried the code below. It gets me about halfway to the list, but it leaves the numbers behind and adds multiple commas in some places. I don't know where to go from here, especially how to strip the numbers and separate the items:

new_choices = [response.json()['choices'][0]['text'].replace('\n',',')]

result:

[',,1. Dog Diet and Nutrition ,2. Dog Vaccination and Immunization ,3. Dog Parasites and Parasite Control ,4. Dog Dental Care and Hygiene ,5. Dog Grooming and Skin Care ,6. Dog Exercise and Training ,7. Dog First-Aid and Emergency Care ,8. Dog Joint Care and Arthritis ,9. Dog Allergies and Allergy Prevention ,10. Dog Senior Care and Health']

CodePudding user response：

You can use the re module for this:

import re

response = {
    "id": "xyz",
    "object": "text_completion",
    "created": 1673323957,
    "model": "text-davinci-003",
    "choices": [
        {
            "text": "\n\n1. Dog Diet and Nutrition \n2. Dog Vaccination and Immunization \n3. Dog Parasites and Parasite Control \n4. Dog Dental Care and Hygiene \n5. Dog Grooming and Skin Care \n6. Dog Exercise and Training \n7. Dog First-Aid and Emergency Care \n8. Dog Joint Care and Arthritis \n9. Dog Allergies and Allergy Prevention \n10. Dog Senior Care and Health",
            "index": 0,
            "logprobs": None,
            "finish_reason": "length",
        }
    ],
    "usage": {"prompt_tokens": 16, "completion_tokens": 100, "total_tokens": 116},
}

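# Per line: skip an optional leading "N." marker, then capture the text without surrounding whitespace.
pat = re.compile(r"^(?:\d+\.)?\s*(.+?)\s*$", flags=re.M)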

for ch in response["choices"]:
    ch["text"] = pat.findall(ch["text"].strip())

print(response)

Prints:

{
    "id": "xyz",
    "object": "text_completion",
    "created": 1673323957,
    "model": "text-davinci-003",
    "choices": [
        {
            "text": [
                "Dog Diet and Nutrition",
                "Dog Vaccination and Immunization",
                "Dog Parasites and Parasite Control",
                "Dog Dental Care and Hygiene",
                "Dog Grooming and Skin Care",
                "Dog Exercise and Training",
                "Dog First-Aid and Emergency Care",
                "Dog Joint Care and Arthritis",
                "Dog Allergies and Allergy Prevention",
                "Dog Senior Care and Health",
            ],
            "index": 0,
            "logprobs": None,
            "finish_reason": "length",
        }
    ],
    "usage": {"prompt_tokens": 16, "completion_tokens": 100, "total_tokens": 116},
}
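
The answer above rewrites response["choices"] in place. Since the question asks for a separate new_choices variable, and for tolerance of responses where the numbers are missing, here is a minimal sketch of one way to do that. The helper name extract_items and the two sample strings are assumptions made for illustration; they are not part of the OpenAI API or of the original answer.

```python
import re

# Hypothetical helper pattern: optional leading "N." marker, then the item
# text up to its last non-whitespace character.
ITEM_RE = re.compile(r"^\s*(?:\d+\.\s*)?(.*\S)\s*$")


def extract_items(text):
    """Return the non-blank lines of text with any leading 'N.' marker removed."""
    items = []
    for line in text.splitlines():
        match = ITEM_RE.match(line)
        if match:  # blank or whitespace-only lines do not match and are skipped
            items.append(match.group(1))
    return items


# Sample inputs (shortened, made up for this sketch): one numbered, one not.
numbered = "\n\n1. Dog Diet and Nutrition \n2. Dog Vaccination and Immunization \n10. Dog Senior Care and Health"
unnumbered = "\n\nDog Diet and Nutrition\n\nDog Vaccination and Immunization"

new_choices = extract_items(numbered)
print(new_choices)
print(extract_items(unnumbered))
```

With the dictionary from the question, the call would be new_choices = extract_items(response["choices"][0]["text"]), which leaves response itself unchanged.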