I will start off by stating that I have working code, but it is embarrassingly inefficient and clumsy. I was hoping that someone in the community might be able to show me a better way to unnest this heavily nested list.
As a background, it is transaction data on nfts that is heavily nested. I am just trying to get a data frame out, ultimately down to the daily level. I have managed to get the code working for the totalPriceUSD field, but as I mentioned, it is clumsy.
library(dplyr)
library(tidyr)
library(rlist)
library(jsonlite)
mydata <- fromJSON("https://api2.cryptoslam.io/api/nft-indexes/NFTGlobal")
#attempt at nested extraction
mydata <- rlist::list.flatten(mydata) %>% dplyr::bind_rows()
mydata <- select(mydata1, contains("totalPriceUSD"))
mydata <- select(mydata1, contains("daily"))
#change row name
rownames(mydata) <- "totalPriceUSD"
names(mydata) <- substring(names(mydata),24,33)
#change col names
names(mydata) <- format(as.Date(names(mydata), format = "%Y-%m-%d"))
mydata1 <- mydata %>%
gather(date, totalPriceUSD)
mydata <- as.data.frame(mydata)
mydata$date <- as.Date(mydata$date, format = "%Y-%m-%d")
As I said, it works, but it ain't pretty. Any suggestions on improving this?
Many thanks
CodePudding user response:
library(dplyr)
mydata <- jsonlite::fromJSON("https://api2.cryptoslam.io/api/nft-indexes/NFTGlobal")
monthly <- bind_rows(lapply(mydata, `[[`, "monthlySummary"), .id = "monthly_id")
daily <- bind_rows(lapply(mydata, function(z) bind_rows(z[["dailySummaries"]], .id = "daily_id")), .id = "monthly_id")
monthly
# # A tibble: 60 x 6
# monthly_id totalTransactions uniqueBuyers uniqueSellers totalPriceUSD isRollingHoursData
# <chr> <int> <int> <int> <dbl> <lgl>
# 1 2017-06 193 33 32 11570. FALSE
# 2 2017-07 613 61 57 89111. FALSE
# 3 2017-08 113 36 31 15133. FALSE
# 4 2017-09 63 22 19 5154. FALSE
# 5 2017-10 52 17 11 3041. FALSE
# 6 2017-11 7259 1077 508 72760. FALSE
# 7 2017-12 265412 53406 23137 18804813. FALSE
# 8 2018-01 30693 7682 4582 1360558. FALSE
# 9 2018-02 34177 4142 4364 2931369. FALSE
# 10 2018-03 29051 3752 2784 987256. FALSE
# # ... with 50 more rows
daily
# # A tibble: 1,750 x 7
# monthly_id daily_id totalTransactions uniqueBuyers uniqueSellers totalPriceUSD isRollingHoursData
# <chr> <chr> <int> <int> <int> <dbl> <lgl>
# 1 2017-06 2017-06-23T00:00:00 27 9 6 1456. FALSE
# 2 2017-06 2017-06-24T00:00:00 15 7 8 846. FALSE
# 3 2017-06 2017-06-25T00:00:00 15 7 5 594. FALSE
# 4 2017-06 2017-06-26T00:00:00 23 10 12 1076. FALSE
# 5 2017-06 2017-06-27T00:00:00 35 8 15 2091. FALSE
# 6 2017-06 2017-06-28T00:00:00 15 6 5 1431. FALSE
# 7 2017-06 2017-06-29T00:00:00 41 13 11 2302. FALSE
# 8 2017-06 2017-06-30T00:00:00 22 11 7 1775. FALSE
# 9 2017-07 2017-07-01T00:00:00 12 7 10 3727. FALSE
# 10 2017-07 2017-07-02T00:00:00 34 13 12 3117. FALSE
# # ... with 1,740 more rows
CodePudding user response:
An alternative to @r2evans answer using rrapply()
unnest_wider()
. This should generalize to arbitrary levels of nesting as well.
library(tidyr)
library(jsonlite)
library(rrapply)
mydata <- fromJSON("https://api2.cryptoslam.io/api/nft-indexes/NFTGlobal")
monthly <- rrapply(mydata, classes = "list", condition = \(x, .xname) .xname == "monthlySummary", how = "melt") |>
unnest_wider(value)
daily <- rrapply(mydata, classes = "list", condition = \(x, .xparents) "dailySummaries" %in% head(.xparents, -1), how = "melt") |>
unnest_wider(value)
monthly
#> # A tibble: 60 × 9
#> L1 L2 totalTransactio… uniqueBuyers uniqueSellers totalPriceUSD
#> <chr> <chr> <int> <int> <int> <dbl>
#> 1 2017-06 monthlySum… 193 33 32 11570.
#> 2 2017-07 monthlySum… 613 61 57 89111.
#> 3 2017-08 monthlySum… 113 36 31 15133.
#> 4 2017-09 monthlySum… 63 22 19 5154.
#> 5 2017-10 monthlySum… 52 17 11 3041.
#> 6 2017-11 monthlySum… 7259 1077 508 72760.
#> 7 2017-12 monthlySum… 265412 53406 23137 18804813.
#> 8 2018-01 monthlySum… 30693 7682 4582 1360558.
#> 9 2018-02 monthlySum… 34177 4142 4364 2931369.
#> 10 2018-03 monthlySum… 29051 3752 2784 987256.
#> # … with 50 more rows, and 3 more variables: isRollingHoursData <lgl>,
#> # productNames <lgl>, productNamesWithoutAnySale <lgl>
daily
#> # A tibble: 1,750 × 10
#> L1 L2 L3 totalTransactio… uniqueBuyers uniqueSellers totalPriceUSD
#> <chr> <chr> <chr> <int> <int> <int> <dbl>
#> 1 2017-06 dail… 2017… 27 9 6 1456.
#> 2 2017-06 dail… 2017… 15 7 8 846.
#> 3 2017-06 dail… 2017… 15 7 5 594.
#> 4 2017-06 dail… 2017… 23 10 12 1076.
#> 5 2017-06 dail… 2017… 35 8 15 2091.
#> 6 2017-06 dail… 2017… 15 6 5 1431.
#> 7 2017-06 dail… 2017… 41 13 11 2302.
#> 8 2017-06 dail… 2017… 22 11 7 1775.
#> 9 2017-07 dail… 2017… 12 7 10 3727.
#> 10 2017-07 dail… 2017… 34 13 12 3117.
#> # … with 1,740 more rows, and 3 more variables: isRollingHoursData <lgl>,
#> # productNames <lgl>, productNamesWithoutAnySale <lgl>