I have a CSV file with video game data. The columns are:

| Rank | Name | Platform | Year | Genre | Publisher | NA_Sales | EU_Sales | JP_Sales | Other_Sales | Global_Sales |

Note: Sales are in millions.

One row would be:

259, Asteroids, 2600, 1980, Shooter, Atari, 4, 0.26, 0, 0.05, 4.31

I am trying to filter for games that were released on different platforms in different years; for example, Mario Bros was released for DS and Wii in 1996 and 2000.

I have tried to create a function that uses two for loops to try and find games that have the same name, but I don't seem to get it right. I have also tried to group by Name, Year, Platform and I get it wrong too.

I can't get this done and it's really frustrating. Any help would be welcome. Thank you in advance.

Answer:

dplyr

library(dplyr)
dat %>%
  group_by(Name) %>%
  filter(n_distinct(Platform, Year) > 1) %>%
  ungroup()
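
Note that n_distinct(Platform, Year) > 1 keeps any Name with more than one distinct Platform/Year pair, so a game released on two platforms in the same year also passes. If the intent is strictly "more than one platform and more than one year", a stricter variant might look like the sketch below. The toy data frame is made up for illustration; only the column names come from the question.

```r
library(dplyr)

# Hypothetical toy data; only the column names follow the question.
dat <- data.frame(
  Name     = c("A", "A", "B", "B", "C"),
  Platform = c("DS", "Wii", "DS", "Wii", "PS2"),
  Year     = c(1996, 2000, 2001, 2001, 2003),
  stringsAsFactors = FALSE
)

# Keep names that appear on more than one platform AND in more than one year.
# Game "B" (two platforms, one year) passes the looser filter but not this one.
strict <- dat %>%
  group_by(Name) %>%
  filter(n_distinct(Platform) > 1, n_distinct(Year) > 1) %>%
  ungroup()
```

Which of the two conditions is right depends on how "different platforms in different years" should be read for your data.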

data.table

library(data.table)
as.data.table(dat)[, .SD[uniqueN(interaction(Platform, Year)) > 1,], by = .(Name)]

base R

ind <- ave(seq_len(nrow(dat)), dat$Name, FUN = function(i) nrow(unique(dat[i, c("Platform", "Year")])) > 1)

Since the first argument to ave is an integer vector of row indices here, ind is integer (0 or 1), so use

dat[ind > 0,]
## or
dat[ind == 1L,]

If, however, you pass a character vector as the first argument, then you'll need

dat[ind == "TRUE",]

This is because stats::ave's return value always has the same class as its first argument. Even if the inner FUNction returns logical or something else, the result is coerced. (ave uses `split<-`, which assigns the updated values back into the original vector, so the coercion is a side effect rather than necessarily a design choice.) For the same reason, do not pass a factor such as interaction(dat$Platform, dat$Year) as the first argument: TRUE and FALSE are not valid levels of that factor, so the result is all NA with warnings.
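
To see the coercion in the character case, here is a minimal base R sketch. The toy data and the helper name key are assumptions for illustration, not part of the question's data.

```r
# Hypothetical toy data; only the column names follow the question.
dat <- data.frame(
  Name     = c("A", "A", "B", "C", "C"),
  Platform = c("DS", "Wii", "DS", "PS2", "PS2"),
  Year     = c(1996, 2000, 2001, 2003, 2003),
  stringsAsFactors = FALSE
)

# Paste Platform and Year into one character key per row.
key <- paste(dat$Platform, dat$Year, sep = "_")

# ave() returns the class of its first argument, so with a character key
# the logical result of FUN comes back as the strings "TRUE" / "FALSE".
ind <- ave(key, dat$Name, FUN = function(z) length(unique(z)) > 1)

# Compare against the string, not the logical value.
res <- dat[ind == "TRUE", ]
```

With this toy data only the rows for game "A" have more than one distinct Platform/Year key within their Name group.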

Edited to include Year in the determination.