How do I expand factors in a column into multiple columns - essentially shortening the dataset by ha

Time:03-09

How do I split the "strike" column into two columns, "kick_type" and "punch_type", and the "damage" column into "kick_damage" and "punch_damage"?

I have been at this for 3 hours now and I can't figure out how to split this. Note that I used pivot_longer to get to this stage from an untidy format in which each strike was its own column, so I've done other steps before this one, but I can't figure this step out.

Reproducible code:

trial <- data.frame(fighter=c("Saenchai","Saenchai","Saenchai","Saenchai","Buakaw","Buakaw","Buakaw","Buakaw"), 
strike=rep(c("roundhouse_kick","side_kick","lefthook_punch","uppercut_punch"), times=2),
damage=c(0.7,0.8,0.6,0.3,0.9,0.5,0.7,0.1))

It should look like this, but I don't know how to get there:

fighter   kick_type         kick_damage   punch_type      punch_damage
Saenchai  roundhouse_kick   0.7           lefthook_punch  0.6
Saenchai  side_kick         0.8           uppercut_punch  0.3

CodePudding user response:

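I'm sure there are better ways, but here's one that needs very little regex: pivot wider, then pivot the kick columns and the punch columns back to long format separately. Note that the final filter (first and last row per fighter) only pairs the strikes correctly because each fighter has exactly two kicks and two punches.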

library(tidyverse)

trial %>% 
  pivot_wider(names_from = "strike", values_from = "damage") %>% 
  pivot_longer(ends_with('kick'), names_pattern = '(.*)_kick', names_to = "kick_type", values_to = "kick_damage") %>% 
  pivot_longer(ends_with('punch'), names_pattern = '(.*)_punch', names_to = "punch_type", values_to = "punch_damage") %>% 
  group_by(fighter) %>% 
  filter(row_number() == 1 | row_number() == n())

# A tibble: 4 x 5
# Groups:   fighter [2]
  fighter  kick_type  kick_damage punch_type punch_damage
  <chr>    <chr>            <dbl> <chr>             <dbl>
1 Saenchai roundhouse         0.7 lefthook            0.6
2 Saenchai side               0.8 uppercut            0.3
3 Buakaw   roundhouse         0.9 lefthook            0.7
4 Buakaw   side               0.5 uppercut            0.1

A simpler way is to use separate, number the strikes within each fighter and move, and then pivot wider:

trial %>%
  separate(strike, into = c("type", "move")) %>% 
  group_by(fighter, move) %>% 
  mutate(n = row_number()) %>% 
  pivot_wider(id_cols = c(fighter, n), names_from = move, values_from = c(type, damage))

# A tibble: 4 x 6
# Groups:   fighter [2]
  fighter      n type_kick  type_punch damage_kick damage_punch
  <chr>    <int> <chr>      <chr>            <dbl>        <dbl>
1 Saenchai     1 roundhouse lefthook           0.7          0.6
2 Saenchai     2 side       uppercut           0.8          0.3
3 Buakaw       1 roundhouse lefthook           0.9          0.7
4 Buakaw       2 side       uppercut           0.5          0.1
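
For comparison, the same idea (split the strike name, number each strike within fighter and move, then line kicks up with punches) can be sketched in base R. This is only an illustration under the assumptions of the example data: every strike name ends in "_kick" or "_punch", and the helper names kicks, punches and paired are made up here.

```r
trial <- data.frame(
  fighter = rep(c("Saenchai", "Buakaw"), each = 4),
  strike  = rep(c("roundhouse_kick", "side_kick",
                  "lefthook_punch", "uppercut_punch"), times = 2),
  damage  = c(0.7, 0.8, 0.6, 0.3, 0.9, 0.5, 0.7, 0.1)
)

# Split e.g. "roundhouse_kick" into type = "roundhouse" and move = "kick"
trial$type <- sub("_[^_]*$", "", trial$strike)
trial$move <- sub("^.*_", "", trial$strike)

# Number the strikes within each fighter/move pair (1st kick, 2nd kick, ...)
trial$n <- ave(seq_along(trial$strike), trial$fighter, trial$move,
               FUN = seq_along)

# One table per move, with the target column names
kicks   <- trial[trial$move == "kick",  c("fighter", "n", "type", "damage")]
punches <- trial[trial$move == "punch", c("fighter", "n", "type", "damage")]
names(kicks)[3:4]   <- c("kick_type", "kick_damage")
names(punches)[3:4] <- c("punch_type", "punch_damage")

# Pair the n-th kick with the n-th punch of the same fighter
paired <- merge(kicks, punches, by = c("fighter", "n"))
paired
```

If a fighter could have a different number of kicks and punches, passing all = TRUE to merge would keep the unmatched strikes and fill the missing side with NA.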

CodePudding user response:

Solution with data.table (inspired by @Maël's answer)

library(data.table)

# data.table
trial <- data.table::data.table(fighter=c("Saenchai","Saenchai","Saenchai","Saenchai","Buakaw","Buakaw","Buakaw","Buakaw"), 
                                strike=rep(c("roundhouse_kick","side_kick","lefthook_punch","uppercut_punch"), times=2),
                                damage=c(0.7,0.8,0.6,0.3,0.9,0.5,0.7,0.1))

# pivot_wider() equivalent
trial <-  dcast(trial, fighter~strike, value.var="damage")

# pivot_longer() equivalent, punch
trial <- data.table::melt(data          = trial,
                            id.vars       = c("fighter",
                                              "roundhouse_kick","side_kick"),
                            measure.vars  = c("lefthook_punch",
                                              "uppercut_punch"),
                            value.name    = "punch_damage",
                            variable.name = "punch_type")

# pivot_longer() equivalent, kick
trial <- data.table::melt(data          = trial,
                            id.vars       = c("fighter", "punch_damage","punch_type"),
                            measure.vars  = c("roundhouse_kick","side_kick"),
                            value.name    = "kick_damage",
                            variable.name = "kick_type")

# Select first and last row by fighter
trial <- trial[
  j = .SD[unique(c(1,.N))],
  by = c("fighter")
]
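
A shorter data.table variant is also possible by following the separate-and-number idea from the tidyverse answer instead of the dcast/melt round trip. The sketch below is not part of the answer above; it assumes every strike name has the form "<type>_<move>", and the names dt and wide are made up for the illustration.

```r
library(data.table)

dt <- data.table(
  fighter = rep(c("Saenchai", "Buakaw"), each = 4),
  strike  = rep(c("roundhouse_kick", "side_kick",
                  "lefthook_punch", "uppercut_punch"), times = 2),
  damage  = c(0.7, 0.8, 0.6, 0.3, 0.9, 0.5, 0.7, 0.1)
)

# Split "roundhouse_kick" into type = "roundhouse" and move = "kick"
dt[, c("type", "move") := tstrsplit(strike, "_", fixed = TRUE)]

# Number the strikes within each fighter/move pair
dt[, n := rowid(fighter, move)]

# Spread type and damage by move; yields type_kick, type_punch,
# damage_kick and damage_punch
wide <- dcast(dt, fighter + n ~ move, value.var = c("type", "damage"))

# Rename to the column names asked for in the question
setnames(wide,
         c("type_kick", "type_punch", "damage_kick", "damage_punch"),
         c("kick_type", "punch_type", "kick_damage", "punch_damage"))
wide
```

Unlike the first-and-last-row selection, this pairs the n-th kick with the n-th punch, so it does not depend on each fighter having exactly two strikes of each kind.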

CodePudding user response:

Another tidyverse approach here.

library(tidyverse)

trial %>% 
  separate(strike, sep = "_", into = c("type", "attack")) %>% 
  pivot_wider(id_cols = fighter, names_from = attack, names_glue = "{attack}_{.value}", 
              values_from = c("type", "damage"), values_fn = list) %>% 
  unnest(cols = !fighter) %>% 
  select(fighter, kick_type, kick_damage, punch_type, punch_damage)

# A tibble: 4 × 5
  fighter  kick_type  kick_damage punch_type punch_damage
  <chr>    <chr>            <dbl> <chr>             <dbl>
1 Saenchai roundhouse         0.7 lefthook            0.6
2 Saenchai side               0.8 uppercut            0.3
3 Buakaw   roundhouse         0.9 lefthook            0.7
4 Buakaw   side               0.5 uppercut            0.1