Calculation of a ratio of columns to create a model

Time:07-16

I have a set of data where I am trying to model the rate of TB cases per unit population. Am I correct in thinking that finding the rate of TB per unit of population is as simple as doing:

rate <- TBdata$TB / TBdata$Population

My data frame is called TBdata, with the following variables:

head(TBdata)
  Indigenous Illiteracy Urbanisation Density Poverty Poor_Sanitation Unemployment Timeliness  Year    TB Population Region   lon    lat    
1      0.335       6.35         84.1   0.714    31.3            15.3         5.41       59.2  2012   323     559543  11001 -60.7 -12.1  0.000577
2      6.45        8.49         71.4   0.743    48.6            29.4         5.92       58.1  2012    15      73193  11002 -64.0  -9.43

CodePudding user response:

Yes. R is vectorized, which means arithmetic on whole columns is applied element by element.
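As a minimal sketch of what "vectorized" means here, assume two small vectors that reuse the TB and Population values from the first two rows of the question. The names `tb`, `pop`, and `per_100k` are made up for illustration, and scaling to a rate per 100,000 people is an assumption, not something the question asks for.

```r
# Values taken from the first two rows of the question's data.
tb  <- c(323, 15)
pop <- c(559543, 73193)

# Element-wise division: tb[1] / pop[1], then tb[2] / pop[2].
rate <- tb / pop

# A length-one value is recycled across the whole vector,
# so this rescales every element at once (assumed reporting unit).
per_100k <- rate * 1e5

round(per_100k, 1)
# [1] 57.7 20.5
```

The same expression works unchanged on data frame columns, because each column is itself a vector.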

In many programming languages we need a for loop for this kind of calculation,

r <- numeric(nrow(TBdata))  # preallocate one slot per row
for (i in seq_len(nrow(TBdata))) {
  r[i] <- TBdata[i, 'TB'] / TBdata[i, 'Population']
}
r
# [1] 0.0005772568 0.0002049376

whereas in R we simply do:

TBdata$TB / TBdata$Population
# [1] 0.0005772568 0.0002049376

This isn't magic, of course. The division is handed to compiled C code under the hood, which ultimately runs a for loop too; writing that loop in R itself would be much slower.


Data:

TBdata <- structure(list(Indigenous = c(0.335, 6.45), Illiteracy = c(6.35, 
    8.49), Urbanisation = c(84.1, 71.4), Density = c(0.714, 0.743), 
    Poverty = c(31.3, 48.6), Poor_Sanitation = c(15.3, 29.4), 
    Unemployment = c(5.41, 5.92), Timeliness = c(59.2, 58.1), 
    Year = c(2012L, 2012L), TB = c(323L, 15L), Population = c(559543L, 
    73193L), Region = 11001:11002, lon = c(-60.7, -64), lat = c(-12.1, 
    -9.43)), class = "data.frame", row.names = c(NA, -2L))
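
The speed difference between an explicit R loop and the vectorized division can be checked with a rough timing sketch. The data below is synthetic and made up for this example (it is not the question's data), and `loop_rate` is a hypothetical helper. Exact timings depend on the machine and R version, so no numbers are shown.

```r
# Synthetic data, invented for this sketch only.
set.seed(1)
n   <- 1e5
pop <- sample(1e4:1e6, n, replace = TRUE)
tb  <- rpois(n, lambda = pop * 5e-4)

# Explicit R-level loop, preallocated to the full length.
loop_rate <- function(tb, pop) {
  r <- numeric(length(tb))
  for (i in seq_along(tb)) {
    r[i] <- tb[i] / pop[i]
  }
  r
}

t_loop <- system.time(r_loop <- loop_rate(tb, pop))
t_vec  <- system.time(r_vec  <- tb / pop)

# Both approaches return the same values; only the elapsed time differs.
all.equal(r_loop, r_vec)
t_loop["elapsed"]
t_vec["elapsed"]
```

On small data frames like the two-row example the difference is negligible; the vectorized form is mainly preferable because it is shorter and harder to get wrong.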