I have a CSV file with video game info. The columns are:
| Rank | Name | Platform | Year | Genre | Publisher | NA_Sales | EU_Sales | JP_Sales | Other_Sales | Global_Sales |
Note: Sales are in millions.
One row would be:
259, Asteroids, 2600, 1980, Shooter, Atari, 4, 0.26, 0, 0.05, 4.31
I am trying to filter for games that were released on different platforms in different years. For example, Mario Bros was released for DS and Wii in 1996 and 2000.
I have tried to write a function that uses two for loops to find games that have the same name, but I can't seem to get it right. I have also tried to group by Name, Year, Platform, and I get that wrong too.
I can't get this done and it's really frustrating. Any help would be welcome. Thank you in advance.
CodePudding user response:
dplyr
library(dplyr)
dat %>%
group_by(Name) %>%
filter(n_distinct(Platform, Year) > 1) %>%
ungroup()
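To see what the dplyr filter keeps, here is a minimal sketch on made-up data (the game names and values are hypothetical, not taken from the real CSV). It also shows a stricter variant, which is only an assumption about what "different platforms in different years" might mean: more than one platform and more than one year.

```r
library(dplyr)

# Hypothetical toy data, not from the real file.
dat <- data.frame(
  Name     = c("Game A", "Game A", "Game B", "Game C", "Game C", "Game D", "Game D"),
  Platform = c("DS", "Wii", "PS2", "PC", "PC", "PS2", "Xbox"),
  Year     = c(1996L, 2000L, 2001L, 1999L, 1999L, 2002L, 2002L),
  stringsAsFactors = FALSE
)

# Keep every row of a game that has more than one distinct (Platform, Year) pair.
res <- dat %>%
  group_by(Name) %>%
  filter(n_distinct(Platform, Year) > 1) %>%
  ungroup()

# Stricter reading (an assumption about the intent):
# more than one platform AND more than one year.
strict <- dat %>%
  group_by(Name) %>%
  filter(n_distinct(Platform) > 1, n_distinct(Year) > 1) %>%
  ungroup()

print(res)
print(strict)
```

With this data, res keeps Game A and Game D, while strict keeps only Game A, because Game D appears on two platforms but in a single year.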
data.table
library(data.table)
as.data.table(dat)[, .SD[uniqueN(interaction(Platform, Year)) > 1,], by = .(Name)]
base R
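ind <- ave(as.integer(interaction(dat$Platform, dat$Year)), dat$Name, FUN = function(z) length(unique(z)) > 1)
Because the first argument here is an integer vector (the codes of the Platform/Year combinations), ind comes back as integer 0/1, so use
dat[ind > 0,]
## or
dat[ind == 1L,]
If, however, you pass a character vector as the first argument (for example paste(dat$Platform, dat$Year)), then you'll need
dat[ind == "TRUE",]
This is because stats::ave's return value is always the same class as its first argument. Even if the inner FUNction produces logical or something else, it is always coerced. (Since ave uses `split<-`, which reassigns the updated x back into the original vector, the coercion happens by default, not necessarily by design.) For the same reason, do not pass the factor returned by interaction() directly: assigning TRUE/FALSE into a factor yields NA with an "invalid factor level" warning.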
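The coercion is easy to check in isolation. The following is a small base R sketch on made-up vectors (the values and the grouping vector g are hypothetical), showing what ave returns for an integer, a character, and a factor first argument:

```r
# Hypothetical grouping: rows 1-2 belong to one game, row 3 to another.
g <- c("g1", "g1", "g2")
more_than_one <- function(z) length(unique(z)) > 1

# Integer first argument: the logical result is stored as 1L / 0L.
r_int <- ave(c(1L, 2L, 1L), g, FUN = more_than_one)

# Character first argument: the logical result is stored as "TRUE" / "FALSE".
r_chr <- ave(c("DS", "Wii", "DS"), g, FUN = more_than_one)

# Factor first argument: TRUE / FALSE are not valid levels, so every entry
# becomes NA (R also emits an "invalid factor level" warning, silenced here).
r_fac <- suppressWarnings(ave(factor(c("DS", "Wii", "DS")), g, FUN = more_than_one))

print(r_int)
print(r_chr)
print(r_fac)
```

This is why the comparison used for subsetting (ind > 0 versus ind == "TRUE") has to match the class of whatever is passed as the first argument.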
Edited to include Year in the determination.