game_ID <- c("201600768", "201600842", "201693456", "201700848", "201804567")
I have a column in my dataset that includes many numbers like the ones above. I would like to extract the first 4 digits from each number (they are the year the game occurred) and put them in a new column.
Any suggestions for going about this?
CodePudding user response:
If they are always in the first four positions, you can use substr in base R to extract them by position:
game_ID <- c("201600768", "201600842", "201693456", "201700848", "201804567")
substr(game_ID, 1, 4)
Output:
# [1] "2016" "2016" "2016" "2017" "2018"
If your data are a column in a larger data frame, such as:
df <- data.frame(var1 = LETTERS[1:5],
                 var2 = 1:5,
                 game_ID = c("201600768", "201600842", "201693456", "201700848", "201804567"))
You can simply do this:
df$year <- substr(df$game_ID, 1, 4)
Output:
#   var1 var2   game_ID year
# 1    A    1 201600768 2016
# 2    B    2 201600842 2016
# 3    C    3 201693456 2016
# 4    D    4 201700848 2017
# 5    E    5 201804567 2018
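Note that substr returns character strings, so the year column above is character, not numeric. If you need the year as a number (for sorting or filtering, say), wrapping the call in as.integer is one option. A minimal sketch, assuming the first four characters are always digits; the small data frame here is only for illustration:

```r
# Small example data frame (illustrative only)
df <- data.frame(game_ID = c("201600768", "201600842", "201693456",
                             "201700848", "201804567"))

# as.character() guards against game_ID being stored as a number or factor;
# as.integer() turns the extracted text into an integer year
df$year <- as.integer(substr(as.character(df$game_ID), 1, 4))

# The integer column can then be compared numerically
recent <- df[df$year >= 2017, ]
```

With the sample IDs, recent should contain only the 2017 and 2018 games.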
CodePudding user response:
You can also use str_extract from the stringr package to extract the first 4 digits:
library(dplyr)
library(stringr)
as.data.frame(game_ID) %>%
  mutate(new = str_extract(game_ID, "\\d{4}"))
Output:
    game_ID  new
1 201600768 2016
2 201600842 2016
3 201693456 2016
4 201700848 2017
5 201804567 2018
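One caveat: the pattern "\\d{4}" matches the first run of four digits anywhere in the string, not necessarily at the start. If some IDs might begin with other characters, anchoring the pattern with ^ is one way to make sure only a leading year is picked up. A rough sketch; the second ID below is a made-up example, not from the question:

```r
library(stringr)

# Hypothetical IDs: the second one does not start with digits
ids <- c("201600768", "AB20170084")

# Unanchored: matches the first four consecutive digits anywhere in the string
unanchored <- str_extract(ids, "\\d{4}")

# Anchored: matches only when the string starts with four digits, otherwise NA
anchored <- str_extract(ids, "^\\d{4}")
```

Here unanchored returns "2017" for the second ID, while anchored returns NA for it, which makes malformed IDs easier to spot.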