Home > Enterprise >  Listception... Extracting data from a list within a list within a list
Listception... Extracting data from a list within a list within a list

Time:12-29

Hey stack overflow legends,

I'm trying to extract a dataframe from a nested list within a nested list, and it feels like I'm stuck in some kind of Inception-like dream within a dream. These lists are just nesting inside each other indefinitely, it's like a never-ending Russian doll situation.

I'm using the following Trademe API call (https://api.trademe.co.nz/v1/Categories/0153-.xml) and the data is structured somewhat like this:

{
  "Category": {
    "Name": "Fashion",
    "Subcategories": [
      {
        "Name": "Women's Clothing",
        "Number": "0153-0154-",
        "Path": "Fashion/Women's Clothing",
        "Subcategories": [
          {
            "Name": "Dresses",
            "Number": "0153-0154-0155-",
            "Path": "Fashion/Women's Clothing/Dresses",
            "Subcategories": []
          },
          {
            "Name": "Tops",
            "Number": "0153-0154-0156-",
            "Path": "Fashion/Women's Clothing/Tops",
            "Subcategories": []
          },
          ...
        ]
      },
      {
        "Name": "Men's Clothing",
        "Number": "0153-0157-",
        "Path": "Fashion/Men's Clothing",
        "Subcategories": [
          {
            "Name": "Shirts",
            "Number": "0153-0157-0158-",
            "Path": "Fashion/Men's Clothing/Shirts",
            "Subcategories": []
          },
          {
            "Name": "Pants",
            "Number": "0153-0157-0159-",
            "Path": "Fashion/Men's Clothing/Pants",
            "Subcategories": []
          },
          ...
        ]
      },
      ...
    ]
  }
}

I want to extract a dataframe with three columns: "Name", "Number", and "Path". The "Name" column should contain the names of all the subcategories, the "Number" column should contain their corresponding numbers, and the "Path" column should contain their full paths.

I've tried using lapply() and sapply(), but I just keep getting lost in the nested lists. Any help would be greatly appreciated!

Thanks in advance, [Your Name]

CodePudding user response:

In the provided example we could do:

library(jsonlite)
library(purrr)
library(dplyr)

jsontext |>
  fromJSON() |>
  pluck("Category", "Subcategories", "Subcategories") |>
  bind_rows() |>
  select(-Subcategories)

Output:

     Name          Number                            Path
1 Dresses 0153-0154-0155- Fashion/Womens Clothing/Dresses
2    Tops 0153-0154-0156-    Fashion/Womens Clothing/Tops
3  Shirts 0153-0157-0158-    Fashion/Mens Clothing/Shirts
4   Pants 0153-0157-0159-     Fashion/Mens Clothing/Pants

Update: If the subcategories is nested indefinitely (theoretically) I guess that the easiest would be to unlist it and make use of the name structure instead of doing recursions.

E.g.

jsonunlisted <- 
  jsontext |>
  fromJSON() |>
  pluck("Category", "Subcategories") |>
  unlist() 

cbind(Name = jsonunlisted[grep("Name", names(jsonunlisted))],
      Number = jsonunlisted[grep("Number", names(jsonunlisted))],
      Path = jsonunlisted[grep("Path", names(jsonunlisted))])

Output:

                                  Name              Number                 Path                               
Name1                             "Womens Clothing" "0153-0154-"           "Fashion/Womens Clothing"          
Name2                             "Mens Clothing"   "0153-0157-"           "Fashion/Mens Clothing"            
Subcategories.Name1               "Dresses"         "0153-0154-0155-"      "Fashion/Womens Clothing/Dresses"  
Subcategories.Name2               "Tops"            "0153-0154-0156-"      "Fashion/Womens Clothing/Tops"     
Subcategories.Name1               "Shirts"          "0153-0157-0158-"      "Fashion/Mens Clothing/Shirts"     
Subcategories.Name2               "Pants"           "0153-0157-0159-"      "Fashion/Mens Clothing/Pants"      
Subcategories.Subcategories.Name1 "More Shirts"     "More 0153-0157-0158-" "More Fashion/Mens Clothing/Shirts"
Subcategories.Subcategories.Name2 "More Pants"      "More 0153-0157-0159-" "More Fashion/Mens Clothing/Pants" 

Data:

jsontext <- '{
  "Category": {
    "Name": "Fashion",
    "Subcategories": [
      {
        "Name": "Womens Clothing",
        "Number": "0153-0154-",
        "Path": "Fashion/Womens Clothing",
        "Subcategories": [
          {
            "Name": "Dresses",
            "Number": "0153-0154-0155-",
            "Path": "Fashion/Womens Clothing/Dresses",
            "Subcategories": []
          },
          {
            "Name": "Tops",
            "Number": "0153-0154-0156-",
            "Path": "Fashion/Womens Clothing/Tops",
            "Subcategories": []
          }
        ]
      },
      {
        "Name": "Mens Clothing",
        "Number": "0153-0157-",
        "Path": "Fashion/Mens Clothing",
        "Subcategories": [
          {
            "Name": "Shirts",
            "Number": "0153-0157-0158-",
            "Path": "Fashion/Mens Clothing/Shirts",
            "Subcategories": []
          },
          {
            "Name": "Pants",
            "Number": "0153-0157-0159-",
            "Path": "Fashion/Mens Clothing/Pants",
            "Subcategories": [          {
            "Name": "More Shirts",
            "Number": "More 0153-0157-0158-",
            "Path": "More Fashion/Mens Clothing/Shirts",
            "Subcategories": []
          },
          {
            "Name": "More Pants",
            "Number": "More 0153-0157-0159-",
            "Path": "More Fashion/Mens Clothing/Pants",
            "Subcategories": []
          }]
          }
        ]
      }
    ]
  }
}'

CodePudding user response:

json_string being your JSON text:

library(jsonlite)
fromJSON(json_string, simplifyDataFrame = TRUE)$Category$Subcategories[, -4]

output:

             Name     Number                    Path
1 Womens Clothing 0153-0154- Fashion/Womens Clothing
2   Mens Clothing 0153-0157-   Fashion/Mens Clothing
  • Related