How do I extract every first 4 digits from a numeric vector?

Time:05-13

game_ID <- c("201600768", "201600842", "201693456", "201700848", "201804567")

I have a column in my dataset that contains many IDs like the ones above. I would like to extract the first 4 digits of each ID (they are the year the game occurred) and store them in a new column.

Any suggestions for going about this?

CodePudding user response:

If the year always occupies the first four characters, you can use substr() from base R to extract characters 1 through 4 (R strings are 1-indexed; a start of 0 is silently treated as 1):

game_ID <- c("201600768", "201600842", "201693456", "201700848", "201804567")
substr(game_ID, 1, 4)

Output:

# [1] "2016" "2016" "2016" "2017" "2018"

If your data are a column in a larger data frame, such as:

df <- data.frame(var1 = LETTERS[1:5],
                 var2 = 1:5,
                 game_ID = c("201600768", "201600842", "201693456", "201700848", "201804567"))

You can add the year as a new column:

df$year <- substr(df$game_ID, 1, 4)

Output:

#   var1 var2   game_ID year
# 1    A    1 201600768 2016
# 2    B    2 201600842 2016
# 3    C    3 201693456 2016
# 4    D    4 201700848 2017
# 5    E    5 201804567 2018
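
The question title mentions a numeric vector, while the example stores the IDs as strings. If the column really is numeric, the same idea still applies. The following is a minimal sketch, assuming a hypothetical numeric copy of the IDs named game_ID_num and assuming every ID has exactly nine digits (the integer-division variant depends on that):

```r
# Hypothetical numeric version of the IDs (the question stores them as strings)
game_ID_num <- c(201600768, 201600842, 201693456, 201700848, 201804567)

# Option 1: convert to text, take characters 1-4, convert the result to integer
year_chr <- as.integer(substr(as.character(game_ID_num), 1, 4))

# Option 2: integer division drops the last five digits;
# this is only valid when every ID has exactly nine digits
year_div <- as.integer(game_ID_num %/% 1e5)

# Both approaches should agree for these IDs
identical(year_chr, year_div)
```

Converting the result with as.integer() is optional, but it lets you sort or filter on the year as a number rather than as text.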

CodePudding user response:

Alternatively, use str_extract() from the stringr package to pull out the first run of four consecutive digits in each ID:

library(dplyr)
library(stringr)
as.data.frame(game_ID) %>%
  mutate(new = str_extract(game_ID, "\\d{4}"))

Output:

#     game_ID  new
# 1 201600768 2016
# 2 201600842 2016
# 3 201693456 2016
# 4 201700848 2017
# 5 201804567 2018
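
Note that the pattern "\\d{4}" matches the first four consecutive digits anywhere in the string, not necessarily at the start. If some IDs might be malformed, it can help to check that each one actually begins with four digits before extracting. Below is a rough base R sketch; the vector ids and its malformed second entry are made up for illustration:

```r
# Hypothetical IDs; the second one is deliberately malformed (starts with a letter)
ids <- c("201600768", "G20170084", "2018")

# TRUE only where the string begins with four digits
starts_with_year <- grepl("^\\d{4}", ids)

# Extract the year where the check passes, otherwise record NA
year <- ifelse(starts_with_year, substr(ids, 1, 4), NA_character_)
year
```

Anchoring the pattern with ^ means a stray prefix produces NA instead of a plausible-looking but wrong year.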