I have a data frame containing user data:
age = c(45, 21, 32, 33, 46)
gender = c('female', 'female', 'male', 'male', 'female')
income = c('low', 'low', 'medium', 'high', 'low')
education = c('high', 'high', 'high', 'medium', 'medium')
df = data.frame(age, gender, income, education)
From this I would like to obtain a legible list with a count and share of total for every attribute, which I would then append to a table / CSV file. The file should be readable for further use rather than be a functioning data frame. For one attribute, that would be something like this:
library(dplyr)
nusers = nrow(df)
stat = count(df, gender)
stat['sot'] = stat['n'] / nusers
write.table(stat, 'stat.csv', sep = ';', row.names = FALSE, append = TRUE)
With the following result desired for multiple attributes:
gender,n,sot
female,10,0.526315789
male,9,0.473684211
income,Freq,sot
low,4,0.210526316
medium,10,0.526315789
high,5,0.263157895
education,Freq,sot
low,8,0.421052632
medium,1,0.052631579
high,10,0.526315789
My (not very proficient) attempts to put this into a loop failed. How would I best go about this?
CodePudding user response:
You can use sink() for this:
library(dplyr)
# Count and share of total for each attribute
n_gen <- df %>% group_by(gender) %>% summarise(Freq = n(), sot = n() / nrow(df))
n_inc <- df %>% group_by(income) %>% summarise(Freq = n(), sot = n() / nrow(df))
n_edu <- df %>% group_by(education) %>% summarise(Freq = n(), sot = n() / nrow(df))
# Divert console output to the file; write.csv() without a file argument prints to the console
sink('export.csv')
write.csv(n_gen, row.names = FALSE)
write.csv(n_inc, row.names = FALSE)
write.csv(n_edu, row.names = FALSE)
sink()  # stop diverting output
You could shorten it and write it as a for loop. Depending on how many columns you have in df, that might be preferable.
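A minimal sketch of what such a loop could look like, using only base R. The vector `vars` (the columns to summarise) is an assumption, as is keeping the file name `export.csv`; the data is the example from the question.

```r
# Example data from the question
df <- data.frame(
  age = c(45, 21, 32, 33, 46),
  gender = c('female', 'female', 'male', 'male', 'female'),
  income = c('low', 'low', 'medium', 'high', 'low'),
  education = c('high', 'high', 'high', 'medium', 'medium')
)

# Assumption: the attributes to summarise (age is left out on purpose)
vars <- c('gender', 'income', 'education')

sink('export.csv')  # divert console output to the file
for (v in vars) {
  # table() counts each distinct value; as.data.frame() gives columns Var1 and Freq
  tab <- as.data.frame(table(df[[v]]), stringsAsFactors = FALSE)
  names(tab) <- c(v, 'Freq')
  tab$sot <- tab$Freq / nrow(df)  # share of total
  write.csv(tab, row.names = FALSE)  # no file argument, so this goes to the sink
}
sink()  # restore normal console output
```

Each pass writes its own header line followed by the counts, so the file ends up as a stack of small tables, one per attribute.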
CodePudding user response:
You should use 'count_()' instead of 'count()'. It is the same function, but it takes the column name as a string instead of a bare variable name, so it works inside a loop.
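library(dplyr)
vars = c('gender', 'income', 'education')  # column names to summarise
for (i in vars) {
  stat = count_(df, i)
  # one output file per attribute
  write.csv(stat, row.names = TRUE, file = paste0('Title_', i, '.txt'))
}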
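As far as I know, the underscore verbs such as count_() are deprecated in recent dplyr versions, so here is a hedged sketch of the same loop with count() and the `.data` pronoun. It writes everything to a single semicolon-separated file and adds the share of total. The vector `vars`, the file name `stat.csv` and the choice of an open file connection (instead of `append = TRUE`) are assumptions for illustration.

```r
library(dplyr)

# Example data from the question
df <- data.frame(
  age = c(45, 21, 32, 33, 46),
  gender = c('female', 'female', 'male', 'male', 'female'),
  income = c('low', 'low', 'medium', 'high', 'low'),
  education = c('high', 'high', 'high', 'medium', 'medium')
)

# Assumption: the attributes to summarise
vars <- c('gender', 'income', 'education')

# An open connection keeps its position, so successive writes land one after another
con <- file('stat.csv', open = 'w')
for (v in vars) {
  stat <- count(df, .data[[v]])    # .data[[v]] looks the column up by its string name
  names(stat) <- c(v, 'n')         # make sure the first column carries the attribute name
  stat$sot <- stat$n / nrow(df)    # share of total
  write.table(stat, con, sep = ';', row.names = FALSE, quote = FALSE)
}
close(con)
```

Writing through a connection avoids the warning that write.table() raises when column names are appended to an existing file with `append = TRUE`.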