I have a data set in which I am trying to model the rate of TB cases per unit of population. Am I correct in thinking that finding the rate of TB per unit of population is as simple as this?
rate <- TBdata$TB / TBdata$Population
My data frame is called TBdata and has the following variables:
head(TBdata)
  Indigenous Illiteracy Urbanisation Density Poverty Poor_Sanitation Unemployment Timeliness Year  TB Population Region   lon   lat
1      0.335       6.35         84.1   0.714    31.3            15.3         5.41       59.2 2012 323     559543  11001 -60.7 -12.1 0.000577
2       6.45       8.49         71.4   0.743    48.6            29.4         5.92       58.1 2012  15      73193  11002 -64.0 -9.43
CodePudding user response:
Yes. R is vectorized, which means arithmetic operators such as / work element-wise on whole vectors.
In many programming languages we would need a for loop for this kind of calculation:
# Preallocate one slot per row, then fill it element by element
r <- numeric(nrow(TBdata))
for (i in seq_len(nrow(TBdata))) {
  r[i] <- TBdata[i, 'TB'] / TBdata[i, 'Population']
}
r
# [1] 0.0005772568 0.0002049376
whereas in R we simply do:
TBdata$TB / TBdata$Population
# [1] 0.0005772568 0.0002049376
This isn't magic, of course. The vectorized division is still a loop, but it runs in R's compiled C code under the hood; writing the same loop in R itself is much slower on large data.
Data:
TBdata <- structure(list(Indigenous = c(0.335, 6.45), Illiteracy = c(6.35,
8.49), Urbanisation = c(84.1, 71.4), Density = c(0.714, 0.743),
Poverty = c(31.3, 48.6), Poor_Sanitation = c(15.3, 29.4),
Unemployment = c(5.41, 5.92), Timeliness = c(59.2, 58.1),
Year = c(2012L, 2012L), TB = c(323L, 15L), Population = c(559543L,
73193L), Region = 11001:11002, lon = c(-60.7, -64), lat = c(-12.1,
-9.43)), class = "data.frame", row.names = c(NA, -2L))
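Raw rates like these are tiny fractions, so it may be easier to read them on a larger scale. Below is a minimal sketch, assuming you want to keep the rate as a column and also express it per 100,000 people. The per-100,000 scale and the names tb, rate and rate_per_100k are illustrative assumptions, not something from the original post; the TB and Population values are the two rows shown in the question.

```r
# Two rows copied from the question's head() output (TB and Population only).
tb <- data.frame(
  TB         = c(323L, 15L),
  Population = c(559543L, 73193L)
)

# Vectorized division: one rate per row, stored as a new column.
tb$rate <- tb$TB / tb$Population

# Assumption: report cases per 100,000 people (scale chosen for illustration).
tb$rate_per_100k <- tb$rate * 1e5

print(round(tb$rate_per_100k, 1))
# [1] 57.7 20.5
```

Because the division is vectorized, the same two assignments work unchanged on the full data frame, however many rows it has.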