How to pull specific information from a data frame made from JSON in R?

Given the following code:

install.packages(c("httr", "jsonlite"))
library(httr)
library(jsonlite)

res1<-GET("https://rss.applemarketingtools.com/api/v2/us/music/most-played/100/songs.json")
res1
rawToChar(res1$content)

data1 = fromJSON(rawToChar(res1$content))

us100<-data1$feed$results

res2 <- GET("https://rss.applemarketingtools.com/api/v2/gb/music/most-played/100/songs.json")

data2<-fromJSON(rawToChar(res2$content))
uk100<-data2$feed$results

How can I get only the country songs from us100?

CodePudding user response：

library(tidyverse)
us100 %>%
   rowwise()%>%
   filter(any(grepl("Country", genres$name)))

Or even, without rowwise(), by testing each genre data frame with map_lgl():

us100 %>%
    filter(map_lgl(genres, ~any(grepl("Country", .x$name))))

CodePudding user response：

In the object returned by fromJSON(), a song can have more than one genre. The genres column therefore comes back as a nested list column of 100 data frames, where each genre data frame corresponds to one row of the songs data frame:

'data.frame':   100 obs. of  11 variables:
 $ artistName           : chr  "Jack Harlow" "Harry Styles" "Morgan Wallen" "Lil Baby" ...
 $ id                   : chr  "1618136917" "1615585008" "1618841244" "1618285316" ...
 $ name                 : chr  "First Class" "As It Was" "Don't Think Jesus" "In A Minute" ...
 $ releaseDate          : chr  "2022-04-08" "2022-03-31" "2022-04-15" "2022-04-08" ...
 $ kind                 : chr  "songs" "songs" "songs" "songs" ...
 $ artistId             : chr  "1047679432" "471260289" "829142092" "1276656483" ...
 $ artistUrl            : chr  "https://music.apple.com/us/artist/jack-harlow/1047679432" "https://music.apple.com/us/artist/harry-styles/471260289" "https://music.apple.com/us/artist/morgan-wallen/829142092" "https://music.apple.com/us/artist/lil-baby/1276656483" ...
 $ contentAdvisoryRating: chr  "Explict" NA NA "Explict" ...
 $ artworkUrl100        : chr  "https://is4-ssl.mzstatic.com/image/thumb/Music116/v4/e6/b8/83/e6b88344-8d5d-d353-06bc-3204d87071e6/075679759252"| __truncated__ "https://is2-ssl.mzstatic.com/image/thumb/Music126/v4/2a/19/fb/2a19fb85-2f70-9e44-f2a9-82abe679b88e/886449990061"| __truncated__ "https://is2-ssl.mzstatic.com/image/thumb/Music122/v4/cd/c6/67/cdc667b2-e0d9-dd6f-e2b6-45ebd037c821/22UMGIM38328"| __truncated__ "https://is4-ssl.mzstatic.com/image/thumb/Music126/v4/fc/1e/a5/fc1ea575-13ff-4a85-c158-962042ecdcd8/22UMGIM38834"| __truncated__ ...
 $ genres               :List of 100
  ..$ :'data.frame':    2 obs. of  3 variables:
  .. ..$ genreId: chr  "18" "34"
  .. ..$ name   : chr  "Hip-Hop/Rap" "Music"
  .. ..$ url    : chr  "https://itunes.apple.com/us/genre/id18" "https://itunes.apple.com/us/genre/id34"
  ..$ :'data.frame':    2 obs. of  3 variables:
  .. ..$ genreId: chr  "14" "34"
  .. ..$ name   : chr  "Pop" "Music"
  .. ..$ url    : chr  "https://itunes.apple.com/us/genre/id14" "https://itunes.apple.com/us/genre/id34"
  ..$ :'data.frame':    2 obs. of  3 variables:
  .. ..$ genreId: chr  "6" "34"
  .. ..$ name   : chr  "Country" "Music"
  .. ..$ url    : chr  "https://itunes.apple.com/us/genre/id6" "https://itunes.apple.com/us/genre/id34"
...
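
Because each element of genres is its own small data frame, a song can be tested one element at a time. The following is a minimal base R sketch of that idea, not part of the original answers. It assumes a toy three-song data frame (songs, is_country and country_songs are hypothetical names) whose values are copied from the str() output above, with the url column left out for brevity:

```r
# Toy data with the same shape as data1$feed$results: one row per song,
# plus a list column holding one genre data frame per song.
songs <- data.frame(
  id = c("1618136917", "1615585008", "1618841244"),
  name = c("First Class", "As It Was", "Don't Think Jesus"),
  stringsAsFactors = FALSE
)
songs$genres <- list(
  data.frame(genreId = c("18", "34"), name = c("Hip-Hop/Rap", "Music")),
  data.frame(genreId = c("14", "34"), name = c("Pop", "Music")),
  data.frame(genreId = c("6", "34"), name = c("Country", "Music"))
)

# One TRUE/FALSE per song: does its genre data frame list "Country"?
is_country <- vapply(
  songs$genres,
  function(g) "Country" %in% g$name,
  logical(1)
)

# Keep only the rows flagged as country
country_songs <- songs[is_country, ]
country_songs$name
```

Note that %in% matches the genre name exactly, whereas grepl("Country", ...) would also match any genre name that merely contains "Country".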

One option is to flatten this structure: merge the inner genre data with the outer songs data, but first reshape the genres to wide format so that there is one row per song id:

# RETRIEVE FROM RAW JSON
us_top_100_songs <- data1$feed$results

# ADD SONG ID AS NEW COLUMN IN GENRE DATA FRAME
us_top_100_songs$genres <- lapply(
  seq_along(us_top_100_songs$genres),
  function(i) 
    data.frame(
      us_top_100_songs$genres[[i]], id = us_top_100_songs$id[i]
    )
)

# RESHAPE GENRE DATA FRAME WIDE AT SONG ID LEVEL
# (genre = 1 IS AN INDICATOR; THE NATIVE PIPE |> REQUIRES R >= 4.1)
genres_df <- do.call(
  rbind.data.frame, us_top_100_songs$genres
) |> transform(
  genre = 1
) |> reshape(
  idvar = "id",
  v.names = "genre",
  timevar = "name",
  drop = c("genreId", "url"),
  direction = "wide",
  sep = "_"
)

# MERGE DATA FOR FLAT COLUMNS
us_top_100_songs <- merge(
  transform(us_top_100_songs, genres = NULL), genres_df, by = "id"
)

# SUBSET FOR COUNTRY SONGS
country_songs_in_top_100 <- subset(us_top_100_songs, genre_Country == 1)

Output

country_songs_in_top_100[1:6]
#            id    artistName                               name releaseDate  kind   artistId
# 4  1440111980 Morgan Wallen                    Whiskey Glasses  2016-01-01 songs  829142092
# 5  1440111985 Morgan Wallen                        Chasin' You  2018-04-27 songs  829142092
# 13 1540314622 Morgan Wallen                   Sand In My Boots  2021-01-08 songs  829142092
# 14 1540314624 Morgan Wallen                      Wasted On You  2021-01-08 songs  829142092
# 18 1563946213  Jordan Davis        Buy Dirt (feat. Luke Bryan)  2021-05-21 songs 1240921740
# 28 1582024384  Cody Johnson                     'Til You Can't  2021-06-11 songs  331459657
# 53 1608990811        ERNEST Flower Shops (feat. Morgan Wallen)  2021-12-31 songs 1450042443
# 98 1618841244 Morgan Wallen                  Don't Think Jesus  2022-04-15 songs  829142092
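
To see what the reshape step does in isolation, here is a small sketch on a hypothetical two-song table (the ids "a" and "b" and the names genres_long and genres_wide are made up for illustration). It shows how the long genre rows become one indicator column per genre, and why subset() on genre_Country == 1 keeps only the country songs:

```r
# Toy long-format genre table: one row per (song id, genre) pair,
# with genre = 1 as the indicator value to spread into columns.
genres_long <- data.frame(
  id = c("a", "a", "b", "b"),
  name = c("Country", "Music", "Pop", "Music"),
  genre = 1,
  stringsAsFactors = FALSE
)

# Wide format: one row per id, one genre_<name> column per distinct genre.
# A song that lacks a genre gets NA in that column.
genres_wide <- reshape(
  genres_long,
  idvar = "id",
  timevar = "name",
  v.names = "genre",
  direction = "wide",
  sep = "_"
)

# subset() treats NA conditions as FALSE, so only song "a" is kept
country_ids <- subset(genres_wide, genre_Country == 1)$id
country_ids
```

Songs without a given genre end up with NA rather than 0 in that column, which is why the filter compares against 1 instead of testing for a nonzero value.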