Determine the most popular song based on number of streams by a specific artist in R

Time:10-14

I have a music data set in R and need to determine the most popular song, by number of streams, for one specific artist. I have to create a new data.frame that contains only this artist's songs, save it, and sort it by number of streams.

The data is a list of songs with columns such as the number of streams, the song name, and the artist name. This is what I have so far; is there a simpler way to do it?

filter(music_data, artistName == "Billie Eilish")   
billie_songs <- data.frame(filter(music_data, artistName == "Billie Eilish"))   
billie_songs_ordered <- billie_songs[order(billie_songs$streams, decreasing = TRUE),] 
print(paste("Most Popular Song: ", head(billie_songs_ordered$trackName, 1)))

Thank you!

CodePudding user response:

The code you've added looks pretty good. Here are some comments:

filter(music_data, artistName == "Billie Eilish") 
# this prints its result when you run it, but the result is not
# assigned with `<-` or `=`, so it is not saved.
# it's good to run code like this in your console, but you don't need it
# in the script file.


billie_songs <- data.frame(filter(music_data, artistName == "Billie Eilish"))   
# here you repeat the code above, assigning it. This emphasizes that the 
# first line could be deleted. Also `data.frame()` is unnecessary. 
# Change to `billie_songs <- filter(music_data, artistName == "Billie Eilish")`

billie_songs_ordered <- billie_songs[order(billie_songs$streams, decreasing = TRUE),] 
# this is fine. This is a great way to order rows of data using base R.
# You used `dplyr` above with `filter`, the dplyr way would have you use
# `arrange(billie_songs, desc(streams))` instead

print(paste("Most Popular Song: ", head(billie_songs_ordered$trackName, 1)))
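# The `print()` is unnecessary at the console, where the value auto-prints,
# but this is good. Keep `print()` if you run the script with `source()`.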

If I were writing it, I would use dplyr functions throughout and chain them with the %>% pipe instead of saving the result of each step, like this:

music_data %>%
  filter(artistName == "Billie Eilish") %>%
  arrange(desc(streams)) %>%
  head(1) %>%
  pull(trackName) %>%
  paste("Most Popular Song:", .)

Or I might use the dplyr convenience function slice_max, which returns the row with the maximum value of a particular column:

music_data %>%
  filter(artistName == "Billie Eilish") %>% 
  slice_max(order_by = streams, n = 1) %>%
  pull(trackName) %>%
  paste("Most Popular Song:", .)
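The question also asks for the filtered, sorted data frame to be saved, and `slice_max()` keeps ties by default, so it can return more than one row. Here is a minimal sketch of both points. It assumes `dplyr` 1.0.0 or later is installed, and the toy `music_data` below (its track names and stream counts) is made up purely for illustration:

```r
library(dplyr)

# Toy stand-in for music_data: the track names and stream counts are made up.
music_data <- data.frame(
  artistName = c("Billie Eilish", "Billie Eilish", "Billie Eilish", "Other Artist"),
  trackName  = c("Track A", "Track B", "Track C", "Track D"),
  streams    = c(120, 300, 300, 999)
)

# Save the artist's songs, sorted by streams, as the question requires.
billie_songs <- music_data %>%
  filter(artistName == "Billie Eilish") %>%
  arrange(desc(streams))

# slice_max() keeps ties by default, so two tracks come back here.
top_with_ties <- billie_songs %>%
  slice_max(order_by = streams, n = 1)

# with_ties = FALSE limits the result to a single row.
top_single <- billie_songs %>%
  slice_max(order_by = streams, n = 1, with_ties = FALSE)

paste("Most Popular Song:", top_single$trackName)
```

Whether to keep ties depends on what you want to report: with the default, `paste()` would produce one string per tied track.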