I have a music dataset in R and need to find the most popular song, by number of streams, for one specific artist. I have to create a new data.frame that contains only this artist's songs, save it, and sort it by number of streams.
The data is a list of songs with columns such as the number of streams, the song name, the artist name, etc. I started like this; is there a simpler way to do it?
library(dplyr)
filter(music_data, artistName == "Billie Eilish")
billie_songs <- data.frame(filter(music_data, artistName == "Billie Eilish"))
billie_songs_ordered <- billie_songs[order(billie_songs$streams, decreasing = TRUE),]
print(paste("Most Popular Song: ", head(billie_songs_ordered$trackName, 1)))
Thank you!
Answer:
The code you've added looks pretty good. Here are some comments:
filter(music_data, artistName == "Billie Eilish")
# This prints its result when you run it, but the result is not
# assigned with `<-` or `=`, so it is not saved.
# It's fine to run code like this in the console to check it, but you
# don't need it in the script file.
billie_songs <- data.frame(filter(music_data, artistName == "Billie Eilish"))
# Here you repeat the code above, this time assigning the result, which
# shows that the first line can be deleted. Also, `data.frame()` is
# unnecessary because `filter()` already returns a data frame.
# Change to `billie_songs <- filter(music_data, artistName == "Billie Eilish")`
billie_songs_ordered <- billie_songs[order(billie_songs$streams, decreasing = TRUE),]
# This is fine, and a good way to order the rows of a data frame in base R.
# Since you used dplyr above for `filter()`, the dplyr way would be
# `arrange(billie_songs, desc(streams))` instead.
print(paste("Most Popular Song: ", head(billie_songs_ordered$trackName, 1)))
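# The `print()` is unnecessary at the console, where R prints top-level
# results automatically. Note that `paste()` already puts a space between
# its arguments, so the trailing space in "Most Popular Song: " produces a
# double space; drop it, as in `paste("Most Popular Song:", ...)`.
# Otherwise this is good.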
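For comparison, here is a sketch of the same steps in base R only, with the redundant lines removed and no package dependency. It builds a small hypothetical `music_data`: only the column names `artistName`, `trackName`, and `streams` come from the question; the songs and stream counts are invented for illustration. It uses base `subset()` in place of dplyr's `filter()`.

```r
# Hypothetical toy data standing in for the real music_data;
# only the column names come from the question.
music_data <- data.frame(
  artistName = c("Billie Eilish", "Other Artist", "Billie Eilish", "Billie Eilish"),
  trackName  = c("Song A", "Song B", "Song C", "Song D"),
  streams    = c(1500, 9000, 4200, 800)
)

# Keep only this artist's rows; subset() already returns a data.frame.
billie_songs <- subset(music_data, artistName == "Billie Eilish")

# Sort by streams, highest first.
billie_songs_ordered <- billie_songs[order(billie_songs$streams, decreasing = TRUE), ]

# The first row is now the most-streamed song. paste() adds the separating
# space itself, so the label has no trailing space.
top_song <- billie_songs_ordered$trackName[1]
paste("Most Popular Song:", top_song)
```

With this toy data the last line returns "Most Popular Song: Song C".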
If I were writing it, I would use dplyr functions throughout and, rather than saving the result at each step, use the `%>%` pipe to chain the commands together, like this:
library(dplyr)
music_data %>%
  filter(artistName == "Billie Eilish") %>%
  arrange(desc(streams)) %>%
  head(1) %>%
  pull(trackName) %>%
  paste("Most Popular Song:", .)  # `.` marks where the piped value goes
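To see what the pipe is doing, the sketch below compares the chain with the same calls written as nested function calls. Each `%>%` passes the result on its left as the first argument of the call on its right, except where `.` appears as an argument, as in the final `paste()`. The `music_data` here is a hypothetical toy data frame (only the column names come from the question).

```r
library(dplyr)

# Hypothetical toy data; only the column names come from the question.
music_data <- data.frame(
  artistName = c("Billie Eilish", "Other Artist", "Billie Eilish"),
  trackName  = c("Song A", "Song B", "Song C"),
  streams    = c(1500, 9000, 4200)
)

# Piped version: read top to bottom.
piped <- music_data %>%
  filter(artistName == "Billie Eilish") %>%
  arrange(desc(streams)) %>%
  head(1) %>%
  pull(trackName) %>%
  paste("Most Popular Song:", .)

# The same computation as nested calls: read from the inside out.
nested <- paste(
  "Most Popular Song:",
  pull(head(arrange(filter(music_data, artistName == "Billie Eilish"),
                    desc(streams)), 1), trackName)
)

identical(piped, nested)  # TRUE
```

Both produce "Most Popular Song: Song C" for this toy data; the piped form just avoids the deep nesting.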
Or I might use the dplyr convenience function `slice_max()`, which pulls the row with the maximum value of a particular column:
music_data %>%
  filter(artistName == "Billie Eilish") %>%
  slice_max(order_by = streams, n = 1) %>%
  pull(trackName) %>%
  paste("Most Popular Song:", .)
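One detail worth knowing about `slice_max()`: its `with_ties` argument defaults to `TRUE`, so `n = 1` can return more than one row when several songs share the top stream count, and the final `paste()` would then give one string per tied song. A small sketch with hypothetical tied data (song names and counts invented for illustration) shows the difference:

```r
library(dplyr)

# Hypothetical toy data in which two songs tie for the most streams.
music_data <- data.frame(
  artistName = c("Billie Eilish", "Billie Eilish", "Billie Eilish"),
  trackName  = c("Song A", "Song B", "Song C"),
  streams    = c(4200, 4200, 800)
)

billie_songs <- filter(music_data, artistName == "Billie Eilish")

# Default with_ties = TRUE: both tied rows are kept, so n = 1 gives 2 rows.
with_ties <- slice_max(billie_songs, order_by = streams, n = 1)

# with_ties = FALSE: exactly n rows are returned.
one_row <- slice_max(billie_songs, order_by = streams, n = 1, with_ties = FALSE)

nrow(with_ties)  # 2
nrow(one_row)    # 1
```

If you want to report every song tied for first place, keep the default; if you need a single answer, pass `with_ties = FALSE`.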