Home > Mobile >  Dropping the last two numbers from every entry in a column of data.table
Dropping the last two numbers from every entry in a column of data.table

Time:11-26

Preface: I am a beginner to R that is eager to learn. Please don't mistake the simplicity of the question (if it is a simple answer) for lack of research or effort!

Here is a look at the data I am working with:

         year state age  POP
     1:   90  1001   0  239
     2:   90  1001   0  203
     3:   90  1001   1  821
     4:   90  1001   1  769
     5:   90  1001   2 1089

The state column contains the FIPS codes for all states. For the purpose of merging, I need the state column to match my another dataset. To achieve this task, all I have to do is omit the last two numbers for each FIPS code such that the table looks like this:

         year state age  POP
     1:   90  10     0  239
     2:   90  10     0  203
     3:   90  10     1  821
     4:   90  10     1  769
     5:   90  10     2 1089

I can't figure out how to accomplish this task on a numeric column. Substr() makes this easy on a character column.

CodePudding user response:

In case your number is not always 4 digits long, to omit the last two you can make use of the vectorized behavior of substr()

x <- rownames(mtcars)[1:5]
x
#> [1] "Mazda RX4"         "Mazda RX4 Wag"     "Datsun 710"       
#> [4] "Hornet 4 Drive"    "Hornet Sportabout"
substr(x, 1, nchar(x)-2)
#> [1] "Mazda R"         "Mazda RX4 W"     "Datsun 7"        "Hornet 4 Dri"   
#> [5] "Hornet Sportabo"

# dummy code for inside a data.table
dt[, x_new := substr(x, 1, nchar(x)-2)]

CodePudding user response:

Just for generalizing this in the instance when you might have a very large numeric column, and need to substr it correctly. (Which is probably a good argument for storing/importing it as a character column to start with, but it's an imperfect world...)

x <- c(10000000000, 1000000000, 100000000, 10000000, 1000000,100000,10000,1000,100)

substr(x, 1, nchar(x)-2 )
#[1] "1e " "1e " "1e " "1e " "1e " "1e " "100" "10"  "1" 

as.character(x)
#[1] "1e 10" "1e 09" "1e 08" "1e 07" "1e 06" "1e 05" "10000" "1000" 
#[9] "100" 

xsf <- sprintf("%.0f", x)
substr(xsf, 1, nchar(xsf)-2)
#[1] "100000000" "10000000"  "1000000"   "100000"    "10000"    
#[6] "1000"      "100"       "10"        "1"   

cbind(x, xsf, xsfsub=substr(xsf, 1, nchar(xsf)-2) )
#      x       xsf           xsfsub     
# [1,] "1e 10" "10000000000" "100000000"
# [2,] "1e 09" "1000000000"  "10000000" 
# [3,] "1e 08" "100000000"   "1000000"  
# [4,] "1e 07" "10000000"    "100000"   
# [5,] "1e 06" "1000000"     "10000"    
# [6,] "1e 05" "100000"      "1000"     
# [7,] "10000" "10000"       "100"      
# [8,] "1000"  "1000"        "10"       
# [9,] "100"   "100"         "1"  
  • Related