I am new to programming in R and have started practicing by manipulating a data frame. I am trying to create a new column in my data frame, named 'Division.esperanza.media' (for example), that contains the average life expectancy of each state's division. Those averages are already stored in a variable called 'esperanza.media', which I show below.
How can I put the mean value of each division into the row of every state? For instance, imagine that Alabama and Arkansas both have the value 'South Atlantic' in Division: I want the new column to take the South Atlantic mean from 'esperanza.media' for both of them, and the same for every other division in the data frame.
Here is the data frame with all the modifications; the final data frame is the one called 'state.df.abb':
```r
state.df = as.data.frame(state.x77)
# Set Area (column 8) to NA for states whose name ends in "a"
estados.a.pos = endsWith(rownames(state.df), "a")
state.df[estados.a.pos, 8] <- NA
# Append a row with the column means, ignoring those NAs
state.df = rbind(state.df, Medias = colMeans(state.df, na.rm = TRUE))
# Use the state abbreviations as row names for the 50 states
rownames(state.df)[1:50] <- state.abb
# Division of each state, plus NA for the Medias row
division = factor(c(as.character(state.division), NA))
state.df.abb = cbind(state.df, Division = division)
```
The mean life expectancy of each division is stored in the variable 'esperanza.media':
```r
esperanza.media = tapply(state.df$`Life Exp`, division, mean, na.rm = TRUE)
```
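Inspecting esperanza.media shows why it cannot simply be bound as a column. The sketch below assumes base R with the built-in datasets (state.x77, state.abb, state.division) and rebuilds the question's objects in condensed form. tapply() returns one value per division, nine in total, and drops the NA group of the Medias row. The names come out in alphabetical order because factor() sorts its levels, which is not the order of levels(state.division), so matching by position would misalign the values.

```r
# Condensed version of the question's setup (same steps as above)
state.df <- as.data.frame(state.x77)
state.df[endsWith(rownames(state.df), "a"), 8] <- NA
state.df <- rbind(state.df, Medias = colMeans(state.df, na.rm = TRUE))
rownames(state.df)[1:50] <- state.abb
division <- factor(c(as.character(state.division), NA))
esperanza.media <- tapply(state.df$`Life Exp`, division, mean, na.rm = TRUE)

# One mean per division; the NA group (the Medias row) is dropped by tapply()
print(length(esperanza.media))   # 9, while the data frame has 51 rows
print(names(esperanza.media))    # alphabetical order
print(levels(state.division))    # a different order
```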
I have tried to do it with the cbind function:
```r
state.df.abb = cbind(state.df, Division.esperanza.media = esperanza.media)
```
But I don't know whether I have to transform the variable 'esperanza.media' first, or what the problem is.
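The cbind() attempt fails because esperanza.media has nine values while the data frame has 51 rows; 51 is not a multiple of 9, so data.frame() will not recycle it, and even with matching lengths the values would not line up with the states. A minimal fix, sketched below under the same condensed setup (base R and the built-in datasets assumed), is to index esperanza.media by each row's division name. Character indexing looks values up by name, so the order of the divisions does not matter, and the NA division of the Medias row yields NA.

```r
# Condensed version of the question's setup
state.df <- as.data.frame(state.x77)
state.df[endsWith(rownames(state.df), "a"), 8] <- NA
state.df <- rbind(state.df, Medias = colMeans(state.df, na.rm = TRUE))
rownames(state.df)[1:50] <- state.abb
division <- factor(c(as.character(state.division), NA))
esperanza.media <- tapply(state.df$`Life Exp`, division, mean, na.rm = TRUE)
state.df.abb <- cbind(state.df, Division = division)

# Look up each row's division mean by name; as.numeric() drops the
# array and name attributes so the new column is a plain numeric vector
state.df.abb$Division.esperanza.media <-
  as.numeric(esperanza.media[as.character(state.df.abb$Division)])

head(state.df.abb[, c("Division", "Division.esperanza.media")])
```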
The output should be something like this:
```
   Population Income Illiteracy Life Exp Murder HS Grad  Frost     Area           Division Division.esperanza.media
AL    3615.00 3624.0       2.10  69.0500 15.100  41.300  20.00       NA East South Central                 69.33750
AK     365.00 6315.0       1.50  69.3100 11.300  66.700 152.00       NA            Pacific                 71.69400
AZ    2212.00 4530.0       1.80  70.5500  7.800  58.100  15.00       NA           Mountain                 70.94750
AR    2110.00 3378.0       1.90  70.6600 10.100  39.900  65.00 51945.00 West South Central                 70.43500
CA   21198.00 5114.0       1.10  71.7100 10.300  62.600  20.00       NA            Pacific                 71.69400
```
If anyone can help me, I would be enormously grateful.
Answer:
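This gives the desired result: for each state, look up its division among the names of esperanza.media and take the matching value. Keep in mind that the value for the Medias row is computed from the nine-element esperanza.media, so it is the unweighted mean of the division means. Because divisions contain different numbers of states, it generally differs from the mean over all 50 states.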
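```r
# For each state, find the position of its division in names(esperanza.media)
# and take that division's mean; the Medias row gets mean(esperanza.media)
cbind(state.df,
      Division = c(as.character(state.division), NA),
      Division.esperanza.media = c(esperanza.media[sapply(state.division,
                                     function(x) which(names(esperanza.media) == x))],
                                   mean(esperanza.media)))
```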
```
   Population Income Illiteracy Life Exp Murder HS Grad Frost   Area
AL       3615   3624        2.1    69.05   15.1    41.3    20     NA
AK        365   6315        1.5    69.31   11.3    66.7   152     NA
AZ       2212   4530        1.8    70.55    7.8    58.1    15     NA
AR       2110   3378        1.9    70.66   10.1    39.9    65  51945
CA      21198   5114        1.1    71.71   10.3    62.6    20     NA
CO       2541   4884        0.7    72.06    6.8    63.9   166 103766
             Division Division.esperanza.media
AL East South Central                  69.3375
AK            Pacific                  71.6940
AZ           Mountain                  70.9475
AR West South Central                  70.4350
CA            Pacific                  71.6940
CO           Mountain                  70.9475
...
```
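For comparison, base R's ave() does the grouping and the lookup in one step: it returns a vector as long as its input in which each element is replaced by the FUN value of its group. The sketch below is an alternative to the sapply()/which() lookup, not the answer's own method. It assumes base R with the built-in datasets, assumes the Medias row should get NA in the new column, and uses per.state as a hypothetical helper name.

```r
# Condensed version of the question's setup
state.df <- as.data.frame(state.x77)
state.df[endsWith(rownames(state.df), "a"), 8] <- NA
state.df <- rbind(state.df, Medias = colMeans(state.df, na.rm = TRUE))
rownames(state.df)[1:50] <- state.abb

# ave() replaces each of the 50 state values by the mean of its division
# (per.state is a hypothetical helper name)
per.state <- ave(state.df$`Life Exp`[1:50], state.division, FUN = mean)

# Assumption: the Medias row has no division, so it gets NA here
state.df$Division <- c(as.character(state.division), NA)
state.df$Division.esperanza.media <- c(per.state, NA)

head(state.df[, c("Division", "Division.esperanza.media")])
```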
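To pass TEST, use the version below instead. It averages the 50 per-state values rather than the nine division means, which gives the overall mean life expectancy of the 50 states (the same value as the Life Exp entry of the Medias row):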
```r
# Look up each state's division mean once
por.estado = esperanza.media[sapply(state.division,
                                    function(x) which(names(esperanza.media) == x))]
cbind(state.df,
      Division = c(as.character(state.division), NA),
      # Medias row: mean of the 50 per-state values = overall mean Life Exp
      Division.esperanza.media = c(por.estado, mean(por.estado)))
```
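The difference between the two versions is a matter of weighting. Averaging the 50 per-state values counts each division once per state it contains, so the result is the sum over divisions of (number of states times division mean), divided by 50; that sum is just the total of all 50 Life Exp values, so the result equals the plain mean of the states. mean(esperanza.media) instead gives all nine divisions equal weight. The short check below assumes base R with the built-in datasets; the variable names are illustrative.

```r
life <- state.x77[, "Life Exp"]                        # 50 state values
div.means <- tapply(life, state.division, mean)        # 9 division means
per.state <- div.means[as.character(state.division)]   # 50 looked-up values

weighted <- mean(per.state)     # what the second version puts in the Medias row
unweighted <- mean(div.means)   # what the first version puts there

# The weighted version reproduces the overall mean of the 50 states
print(all.equal(weighted, mean(life)))   # TRUE
print(unweighted - weighted)             # generally not zero
```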