I have a list object that contains several tables, each with a year column followed by the frequencies of particular words. The tables may have slightly different dimensions depending on the range of years and the words used.
Year | word1 | word2 | word3 |
---|---|---|---|
2009 | 1 | 5 | 4 |
2010 | 2 | 3 | 5 |
I would like to sum every row (not including the year) and then divide each value in that row by the row sum, producing a table like this:
Year | word1 | word2 | word3 |
---|---|---|---|
2009 | 0.1 | 0.5 | 0.4 |
2010 | 0.2 | 0.3 | 0.5 |
Is there a way to do this for every table in a list object? TIA
CodePudding user response:
Does this work:
cbind(df[1], t(apply(df[-1], 1, function(x) x/sum(x))))
Year word1 word2 word3
1 2009 0.1 0.5 0.4
2 2010 0.2 0.3 0.5
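In case the t() in that one-liner looks odd: apply() over rows hands back its results as columns, so the output has to be transposed to get years back into rows. A minimal sketch of that, using a small matrix I made up from the question's numbers (the names m, raw and props are just for illustration):

```r
# Word counts from the question as a plain matrix (years in rows)
m <- matrix(c(1, 2, 5, 3, 4, 5), nrow = 2,
            dimnames = list(c("2009", "2010"),
                            c("word1", "word2", "word3")))

# apply() with MARGIN = 1 returns one COLUMN per input row
raw <- apply(m, 1, function(x) x / sum(x))
dim(raw)    # words are now in rows, years in columns

# t() restores the original orientation: years in rows
props <- t(raw)
dim(props)

# For a matrix, base R's prop.table() should give the same row proportions
all.equal(props, prop.table(m, margin = 1))
```

As far as I can tell prop.table(m, margin = 1) is the shorter route when the data is already a matrix; the apply/t version is shown only to explain the transpose.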
If you have a list of such data frames:
mylist <- list(df, df)
mylist
[[1]]
Year word1 word2 word3
1 2009 1 5 4
2 2010 2 3 5
[[2]]
Year word1 word2 word3
1 2009 1 5 4
2 2010 2 3 5
lapply(mylist, function(y) cbind(y[1], t(apply(y[-1], 1, function(x) x/sum(x)))))
[[1]]
Year word1 word2 word3
1 2009 0.1 0.5 0.4
2 2010 0.2 0.3 0.5
[[2]]
Year word1 word2 word3
1 2009 0.1 0.5 0.4
2 2010 0.2 0.3 0.5
Data used:
df
Year word1 word2 word3
1 2009 1 5 4
2 2010 2 3 5
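Since the printout above cannot be pasted straight into R, here is one way the same data could be rebuilt. The values are copied from the printed table; storing them as plain numeric columns is my assumption.

```r
# Rebuild the example table (values copied from the printout above)
df <- data.frame(Year  = c(2009, 2010),
                 word1 = c(1, 2),
                 word2 = c(5, 3),
                 word3 = c(4, 5))

# The list used in the lapply() example holds two copies of the same table
mylist <- list(df, df)

# Inspect the structure: 2 rows, 4 columns
str(df)
```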
CodePudding user response:
For a single data.frame, you can use the following function:
doit <- function(df) {
  cbind(df[1], sweep(df[-1], 1, rowSums(df[-1]), "/"))
}
e.g.
df <- data.frame(Year = 1:3, Word1 = c(1,2,3), Word2 = c(3,2,1), Word3 = c(6,6,6))
doit(df)
# Year Word1 Word2 Word3
#1 1 0.1 0.3 0.6
#2 2 0.2 0.2 0.6
#3 3 0.3 0.1 0.6
If you have multiple data.frames in a list, just wrap everything with lapply, like lapply(dfList, doit).
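The question mentions that the tables can differ in years and words, so here is a rough sketch of the lapply() call on two tables of different shapes. The second table (with word4 and the years 2011 to 2013) is invented purely for illustration, and the sketch assumes the first column is always the year.

```r
# doit() as defined in the answer above
doit <- function(df) {
  cbind(df[1], sweep(df[-1], 1, rowSums(df[-1]), "/"))
}

# Two tables with different years and different word columns
# (table "b" is hypothetical, made up for this example)
dfList <- list(
  a = data.frame(Year = c(2009, 2010),
                 word1 = c(1, 2), word2 = c(5, 3), word3 = c(4, 5)),
  b = data.frame(Year = 2011:2013,
                 word1 = c(2, 1, 0), word4 = c(2, 3, 5))
)

# Apply the same function to every table; each keeps its own dimensions
out <- lapply(dfList, doit)

# Sanity check: the word proportions in every row should sum to 1
sapply(out, function(d) all.equal(unname(rowSums(d[-1])), rep(1, nrow(d))))
```

One thing to watch for: a row whose word counts are all zero would be divided by zero and come out as NaN, so such rows may need separate handling.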