I'm trying to retrieve the top values of rowSums when applied to a data frame that is similar to the following one:
set.seed(101)
df = data.frame(descriptions = LETTERS,
col1 = round(runif(26 , min = 2, max = 250)),
col2 = round(runif(26 , min = 2, max = 250)),
col3 = round(runif(26 , min = 2, max = 250)))
The desired output is to get a data frame (lets say "top_descriptions" table ) consisting of a column with a range of values from the greater rowSums value to the minor one and a second column of the "descriptions" values.
desired output:
top_descriptions
descriptions | row_sums |
---|---|
G | 553 |
Y | 507 |
B | 430 |
A | 412 |
R | 356 |
C | 321 |
D | 305 |
F | 243 |
J | 222 |
O | 212 |
N | 202 |
E | 195 |
T | 187 |
X | 167 |
W | 132 |
L | 111 |
- the values shown on the table are random values only for showing the table data structure that I want.
CodePudding user response:
What I believe you want by "column with a range of values from the greater rowSums value to the minor one" and the example desired output is to sort your data from highest to lowest based on the sum of the columns - to do this you could try:
df$rowsums <- rowSums(df[,2:4])
df <- df[rev(order(df$rowsums)),]
# or
# df <- df[order(df$rowsums, decreasing = TRUE),]
Output:
# descriptions col1 col2 col3 rowsums
#22 V 239 229 211 679
#14 N 233 231 179 643
#12 L 177 204 248 629
#17 Q 205 191 221 617
#10 J 137 229 203 569
#4 D 165 166 196 527
#... .. ... ... ... ...
#1 A 94 20 171 285
#8 H 85 42 148 275
#25 Y 231 22 14 267
#24 X 166 29 71 266
#19 S 104 100 60 264
#15 O 115 110 7 232
If you only wanted the rowsums
and descriptions
, just change the above code to:
df[rev(order(df$rowsums)),c(1,4)]
Note that you edited your question to update the seed but did not update the table, so the data wont match.
CodePudding user response:
- Using
dplyr
rowwise
function
library(dplyr)
df |>
rowwise() |> mutate(row_sums = sum(c_across(starts_with("col")))) |>
arrange(-row_sums) |>
select(descriptions , row_sums)
- output
# A tibble: 26 × 2
# Rowwise:
descriptions row_sums
<chr> <dbl>
1 V 679
2 N 643
3 L 629
4 Q 617
5 J 569
6 D 527
7 I 492
8 W 454
9 K 426
10 M 402
# … with 16 more rows