Add a column in a dataframe with the content of a variable that contains a tapply

Time:11-18

I am new to programming in R and have started practicing by modifying a data frame. I am trying to create a new column named 'Division.esperanza.media' (for example) in my data frame that contains the average life expectancy by region. These averages are already stored in a variable called 'esperanza.media', which I show below.

How can I assign the mean value of each region to every state in that region? For instance, suppose Alabama and Arkansas both have 'South Atlantic' as their division: I want both of them to receive the South Atlantic mean stored in 'esperanza.media', and likewise for the rest of the regions in the data frame.

Here is the code that builds the data frame with all modifications; the final data frame is the one called 'state.df.abb':

state.df = as.data.frame(state.x77)
estados.a.pos = endsWith(rownames(state.df), "a")
estados.a.mask = !estados.a.pos
state.df[!estados.a.mask,8] <- NA
state.df = rbind(state.df,Medias=colMeans(state.df,na.rm = TRUE))
rownames(state.df)[1:50] <- paste(state.abb)
state.df.abb = state.df
division = factor(c(as.character(state.division), NA))
state.df.abb = cbind(state.df,Division=division)

The mean life expectancy per division is stored in the variable 'esperanza.media':

esperanza.media = tapply(state.df$`Life Exp`, division, mean, na.rm = TRUE)

I have tried to do something like this with the cbind function :

state.df.abb = cbind(state.df,Division.esperanza.media=esperanza.media)

But I don't know if I have to transform the variable 'esperanza.media' or what the problem is.

The output should be something like this:

       Population Income Illiteracy Life Exp Murder HS Grad  Frost      Area           Division   Division.esperanza.media
AL        3615.00 3624.0       2.10  69.0500 15.100  41.300  20.00        NA East South Central               69.33750
AK         365.00 6315.0       1.50  69.3100 11.300  66.700 152.00        NA            Pacific               71.69400
AZ        2212.00 4530.0       1.80  70.5500  7.800  58.100  15.00        NA           Mountain              70.94750
AR        2110.00 3378.0       1.90  70.6600 10.100  39.900  65.00  51945.00 West South Central               70.43500
CA       21198.00 5114.0       1.10  71.7100 10.300  62.600  20.00        NA            Pacific               71.69400

If anyone can help me, I would be enormously grateful.

Answer:

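This gives you the desired result. It looks up each state's division among the names of esperanza.media and takes the matching mean. Keep in mind that the value for the last row (Medias) is the mean of the nine division means in esperanza.media, not the mean of the 50 per-state values in the new column. Because the divisions contain different numbers of states, the two values can differ slightly.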

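cbind( state.df, 
   # the extra "Medias" row has no division, so pad with a real NA (not the string "NA")
   Division=c(as.character(state.division), NA), 
   # division mean for each state, then the mean of the division means for the last row
   Division.esperanza.media=c(esperanza.media[sapply( state.division, 
     function(x) which(names(esperanza.media) == x) )],mean(esperanza.media)) )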

   Population Income Illiteracy Life Exp Murder HS Grad Frost   Area
AL       3615   3624        2.1    69.05   15.1    41.3    20     NA
AK        365   6315        1.5    69.31   11.3    66.7   152     NA
AZ       2212   4530        1.8    70.55    7.8    58.1    15     NA
AR       2110   3378        1.9    70.66   10.1    39.9    65  51945
CA      21198   5114        1.1    71.71   10.3    62.6    20     NA
CO       2541   4884        0.7    72.06    6.8    63.9   166 103766
             Division Division.esperanza.media
AL East South Central                  69.3375
AK            Pacific                  71.6940
AZ           Mountain                  70.9475
AR West South Central                  70.4350
CA            Pacific                  71.6940
CO           Mountain                  70.9475
...etc
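
The same lookup can also be written without sapply. The following is only a sketch, not the original answer's method: it uses match() to find each row's division among the names of esperanza.media. The names pos and por.fila are made up for illustration, and the setup omits the question's Area and row-name changes, which do not affect the life expectancy values.

```r
# Rebuild the relevant objects from the question
# (state.x77 and state.division ship with base R).
state.df <- as.data.frame(state.x77)
state.df <- rbind(state.df, Medias = colMeans(state.df, na.rm = TRUE))
division <- factor(c(as.character(state.division), NA))
esperanza.media <- tapply(state.df$`Life Exp`, division, mean, na.rm = TRUE)

# Position of each row's division among the names of esperanza.media.
# The "Medias" row has no division, so its position (and its value) is NA.
pos <- match(as.character(division), names(esperanza.media))
por.fila <- as.vector(esperanza.media)[pos]

state.df.abb <- cbind(state.df, Division = division,
                      Division.esperanza.media = por.fila)
head(state.df.abb[, c("Life Exp", "Division", "Division.esperanza.media")])
```

In this sketch the Medias row is left as NA in the new column; it can be filled afterwards with whichever overall mean is wanted.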

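To pass TEST, the last row has to hold the mean of the per-state values in the new column instead, so use:

cbind( state.df, 
   # the extra "Medias" row has no division, so pad with a real NA (not the string "NA")
   Division=c(as.character(state.division), NA), 
   Division.esperanza.media=c(esperanza.media[sapply( state.division, 
     function(x) which(names(esperanza.media) == x) )],
     # last row: mean of the 50 per-state values rather than of the 9 division means
     mean(c(esperanza.media[sapply( state.division, 
     function(x) which(names(esperanza.media) == x) )]))) )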
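
To see why the two variants can put different values in the last row, here is a small sketch using base R's ave(), which was not part of the original answer. It works on the 50 states only, and the names vida and media.division are made up for illustration.

```r
vida <- state.x77[, "Life Exp"]   # life expectancy of the 50 states

# ave() returns one value per state: the mean of that state's division.
media.division <- ave(vida, state.division)   # FUN defaults to mean

# Two candidate values for the extra "Medias" row:
mean(tapply(vida, state.division, mean))  # unweighted mean of the 9 division means
mean(media.division)                      # mean of the 50 per-state values
mean(vida)                                # equals the previous line
```

The mean of the per-state values weights each division by its number of states, so it equals the plain mean of the Life Exp column. The mean of the nine division means gives every division the same weight.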