Home > Back-end >  Split a string value and then create a new one using pandas
Split a string value and then create a new one using pandas

Time:03-23

I have a json file with the following structure

 [
        {
            "name": "Collection 1",
            "details": [
                {
                    "id": 302,
                    "description":"Book destined for kids",
                },
                {
                    "id": 304,
                    "description":"Book destined for Teen",
                },
                {
                    "id": 305,
                    "description":"Only for teen under the age of 13",
                },
            ]
        },

      {
            "name": "Collection 1",
            "details": [
                {
                    "id": 400,
                    "description":"books for adults to read",
                },
     
            ]
        },

    ]

I need to add a new key/value which value should be a substring of description Something like [teen,kids,adults]

Expected output:

[
    {
        "name": "Collection 1",
        "details": [
            {
                "id": 302,
                "description":"Book destined for kids",
                "age range":"kids"
            },
            {
                "id": 304,
                "description":"Book destined for Teen",
                "age range":"teen"
            },
            {
                "id": 305,
                "description":"Only for teen under the age of 13",
                "age range":"teen"
            },
        ]
    },

  {
        "name": "Collection 2",
        "details": [
            {
                "id": 400,
                "description":"books for adults to read",
                "age range":"adults"
            },
        ]
    },
]

Anyone know a way How to do it in effective way using pandas please(I need to keep the same structure)

CodePudding user response:

I would do it like this:

import json

age_keywords = ["kids", "adults", "teen"] #Extend if needed

#json string to json object
json_string = '[{"name": "Collection 1","details": [{"id": 302,"description":"Book destined for kids"},{"id": 304,"description":"Book destined for Teen"},{"id": 305,"description":"Only for teen under the age of 13"}]},{"name": "Collection 1","details": [{"id": 400,"description":"books for adults to read"}]}]'
json = json.loads(json_string)

#iterate details of every book
for collection in json:
    for detail in collection['details']:
        description = detail['description'] # get description
        for keyword in age_keywords: #iterate every keyword
            if keyword in description.lower(): #check if keyword is in description
                detail['age_range'] = keyword #if keyword in description --> add age_range to json
print(json)
#save json to file --> data.json = filename
with open('data.json', 'w') as f:
    json.dump(data, f)

But i didn't really understand where you want to use pandas?

This is my json output:

[
   {
      "name":"Collection 1",
      "details":[
         {
            "id":302,
            "description":"Book destined for kids",
            "age_range":"kids"
         },
         {
            "id":304,
            "description":"Book destined for Teen",
            "age_range":"teen"
         },
         {
            "id":305,
            "description":"Only for teen under the age of 13",
            "age_range":"teen"
         }
      ]
   },
   {
      "name":"Collection 1",
      "details":[
         {
            "id":400,
            "description":"books for adults to read",
            "age_range":"adults"
         }
      ]
   }
]

CodePudding user response:

Using pandas and regular expressions, you can parse this data, clean it up, and then apply a regular expression to extract the mention of your keywords:

out = (
    pd.json_normalize(data, 'details', 'name')
    .assign(
        description=lambda df: df['description'].str.lower(), 
        age_range=lambda df: df['description'].str.extract(r'(kid|teen|adult)')
    )
)

print(out)
    id                        description          name age_range
0  302             book destined for kids  Collection 1       kid
1  304             book destined for teen  Collection 1      teen
2  305  only for teen under the age of 13  Collection 1      teen
3  400           books for adults to read  Collection 1     adult
  • Related