Table percentage of one variable controlled for other variables-CodePudding

I have a big data set containing information about the number of years children lived at home, education status of their mother and survey year. I'm trying to create a table that shows the percentage of people living with their parents for 15 years during each survey year, controlled for their mothers education (using casmin-index).

The table should look something like this:

	% 15 years in survey year 2000	% 15 years in survey year 2005	% 15 years in survey year 2010
casmin_mother1	...
casmin_mother2
casmin_mother3

Example data would be:

df <- data.frame(years = c(15,14,15,15,12,15,11,4,1,15), 
             casmin_mother= c(3,3,2,1,2,1,3,2,1,2), 
             sample_year = c(2000, 2005, 2000,2005, 2000, 2005, 2000, 2005, 2000, 2005))

I asked this question here before in different form and got the following answer which sadly didn't work (which I only found out yesterday due to holidays, so posting on the old question would be awkward I think?):

df %>% 
  group_by(sample_year) %>% 
  mutate(fifteen= (sample_year == 15)/n()) %>% 
  group_by(sample_year, casmin_mother) %>% 
  summarise(fifteen_prop = sum(fifteen))

This would create the table in a different form which is no problem, but the percentages are off. I think it might have to do with n() using the complete column and not the column sorted by sample year, but I really couldn't figure it out.

I also tried various aggregate, filter and if_any ideas, but they all failed in (at least) one regard.

Also I have a weight variable for each row, so if that could be included that would be a nice bonus, but I think I could figure that out once I have the "base table".

I'm sorry for the (sort of) repost and hope anyone can help me with this.

Thanks a lot!

CodePudding user response：

Is this what you mean?

> aggregate(df, years ~ sample_year   casmin_mother, FUN = function(x) 100 * sum (x == 15) / length(x))
  sample_year casmin_mother years
1        2000             1     0
2        2005             1   100
3        2000             2    50
4        2005             2    50
5        2000             3    50
6        2005             3     0

To get to the structure you showed, you can use the reshape2 package.

> ds <- aggregate(df, years ~ sample_year   casmin_mother, FUN = function(x) 100  * sum(x == 15) / length(x))

> dcast(ds, casmin_mother ~ sample_year, value.var = 'years')
  casmin_mother 2000 2005
1             1    0  100
2             2   50   50
3             3   50    0