If I have a df and I want to separate player
by sep="_"
. Is it a smart way that I can do this without manually check how many col I will need? For my example, it is easy to tell I need to separate player
into 3 new cols. What is I have 100 rows. Is it a way to tell how many cols I need to separate player
into? or is it a way to split player
without knowing this info?
df <- data.frame(player=c('John_Wall', 'Dirk_Nowitzki', 'Steve_Nash_try'),
points=c(22, 29, 18),
assists=c(8, 4, 15))
## current method:
df %>% separate(player, c('v1', 'v2', 'v3'), sep ="_")
CodePudding user response:
We may use str_count
to find the count of _
, get the max
count, convert to seq
uence, paste
(str_c
) with 'v' to create the column names. Add remove = FALSE
, if the original column needs to kept
library(tidyr)
library(dplyr)
library(stringr)
df %>%
separate(player, str_c('v',
seq_len(max(str_count(.$player, '_') 1))), sep = "_", fill = "right")
-output
v1 v2 v3 points assists
1 John Wall <NA> 22 8
2 Dirk Nowitzki <NA> 29 4
3 Steve Nash try 18 15
Or instead of using separate
, try with cSplit
, which does automatically detect and create those columns
library(splitstackshape)
cSplit(df, 'player', sep = '_')
points assists player_1 player_2 player_3
<num> <num> <char> <char> <char>
1: 22 8 John Wall <NA>
2: 29 4 Dirk Nowitzki <NA>
3: 18 15 Steve Nash try
If we want to use str_extract_all
, it returns a list
column, which can be converted to columns with unnest_wider
library(tidyr)
df %>%
mutate(v = str_extract_all(player, "[^_] ")) %>%
unnest_wider(v, names_sep = "_")
# A tibble: 3 × 6
player points assists v_1 v_2 v_3
<chr> <dbl> <dbl> <chr> <chr> <chr>
1 John_Wall 22 8 John Wall <NA>
2 Dirk_Nowitzki 29 4 Dirk Nowitzki <NA>
3 Steve_Nash_try 18 15 Steve Nash try
Or using base R
cbind(df, read.table(text = df$player, sep = "_", fill = TRUE,
header = FALSE, na.strings = ""))
player points assists V1 V2 V3
1 John_Wall 22 8 John Wall <NA>
2 Dirk_Nowitzki 29 4 Dirk Nowitzki <NA>
3 Steve_Nash_try 18 15 Steve Nash try
CodePudding user response:
Without packages you could adapt the `length<-`
by maximum lengths
.
strsplit(df$player, '_') |>
{\(.) t(sapply(., `length<-`, max(lengths(.))))} () |>
cbind(df[setdiff(names(df), 'player')])
# 1 2 3 points assists
# 1 John Wall <NA> 22 8
# 2 Dirk Nowitzki <NA> 29 4
# 3 Steve Nash try 18 15