Home > Back-end >  Using other Column information to create a new column based on categories in R
Using other Column information to create a new column based on categories in R

Time:03-23

I have a dataset with information along these lines:

d <- data.frame( 
  ID = c(1,2,3,4),
  Baseball = c(1,0,0,0),
  Basketball = c(1,0,1,0),
  Lacross = c(0,1,0,0),
  Hockey = c(1,0,0,0),
  Football = c(0,0,1,0)
)

Where 1 indicated that the person does the sport and 0 indicated they do not.

Now I want to create a new column that takes into account the sport season and apprehends them together (in this, Spring= baseball, lacrosse, Fall = football, Winter= basketball, hockey)

What I would want is an output something like:

ID   Baseball   Basketball   Lacross   Hockey   Football   Season
1       1            1          0         1         0      Spring, Winter
2       0            0          1         0         0      Spring
3       0            1          0         0         1      Winter, Fall
4       0            0          0         0         0      NA

Where I do not want duplicates if they play multiple sports in the same season like "Spring, Winter, Winter". Preferably if they could all be in the same order (all Spring, Winters are in that order), that would be great, but I can also edit that later.

CodePudding user response:

Create a named vector and then loop over the rows, get the column names where the value is 1, use the named vector to match and paste the unique values

nm1 <- setNames( c("Spring", "Winter", "Spring", "Winter", "Fall"), names(d)[-1])
d$Season <- apply(d[-1], 1, \(x) toString(unique(nm1[names(x)][as.logical(x)])))
d$Season[d$Season == ""] <- NA

-output

> d
  ID Baseball Basketball Lacross Hockey Football         Season
1  1        1          1       0      1        0 Spring, Winter
2  2        0          0       1      0        0         Spring
3  3        0          1       0      0        1   Winter, Fall
4  4        0          0       0      0        0           <NA>

CodePudding user response:

Here is another approach. Create a look up table called seasons, and merge it on a long version of d

library(data.table)
seasons = data.table(
  season = c("Spring", "Spring", "Fall", "Winter", "Winter"),
  sport = c("Baseball", "Lacross", "Football", "Basketball", "Hockey")
)
merge(
  setDT(d),
  melt(
    d, id="ID", variable.name = "sport")[value==1] %>% 
    .[seasons, on=.(sport)] %>% 
    .[,.(list(unique(season))), by=ID], 
  all.x = T
)

Output:

      ID Baseball Basketball Lacross Hockey Football            V1
   <num>    <num>      <num>   <num>  <num>    <num>        <list>
1:     1        1          1       0      1        0 Spring,Winter
2:     2        0          0       1      0        0        Spring
3:     3        0          1       0      0        1   Fall,Winter
4:     4        0          0       0      0        0              
  • Related