Home > Software design >  How to retrieve the top values of rowsums with the associated value of another column?
How to retrieve the top values of rowsums with the associated value of another column?

Time:08-08

I'm trying to retrieve the top values of rowSums when applied to a data frame that is similar to the following one:

set.seed(101)

df = data.frame(descriptions = LETTERS, 
           col1 = round(runif(26 , min = 2, max = 250)),
           col2 = round(runif(26 , min = 2, max = 250)),
           col3 = round(runif(26 , min = 2, max = 250)))

The desired output is to get a data frame (lets say "top_descriptions" table ) consisting of a column with a range of values from the greater rowSums value to the minor one and a second column of the "descriptions" values.

desired output:

top_descriptions
descriptions row_sums
G 553
Y 507
B 430
A 412
R 356
C 321
D 305
F 243
J 222
O 212
N 202
E 195
T 187
X 167
W 132
L 111
  • the values shown on the table are random values only for showing the table data structure that I want.

CodePudding user response:

What I believe you want by "column with a range of values from the greater rowSums value to the minor one" and the example desired output is to sort your data from highest to lowest based on the sum of the columns - to do this you could try:

df$rowsums <- rowSums(df[,2:4])
df <- df[rev(order(df$rowsums)),]
# or
# df <- df[order(df$rowsums, decreasing = TRUE),]

Output:

#  descriptions col1 col2 col3 rowsums
#22            V  239  229  211     679
#14            N  233  231  179     643
#12            L  177  204  248     629
#17            Q  205  191  221     617
#10            J  137  229  203     569
#4             D  165  166  196     527
#...           .. ...  ...  ...     ...
#1             A   94   20  171     285
#8             H   85   42  148     275
#25            Y  231   22   14     267
#24            X  166   29   71     266
#19            S  104  100   60     264
#15            O  115  110    7     232

If you only wanted the rowsums and descriptions, just change the above code to:

df[rev(order(df$rowsums)),c(1,4)]

Note that you edited your question to update the seed but did not update the table, so the data wont match.

CodePudding user response:

  • Using dplyr rowwise function
library(dplyr)

df |>
   rowwise() |> mutate(row_sums = sum(c_across(starts_with("col")))) |>
   arrange(-row_sums) |>
   select(descriptions , row_sums)
  • output
# A tibble: 26 × 2
# Rowwise: 
   descriptions row_sums
   <chr>           <dbl>
 1 V                 679
 2 N                 643
 3 L                 629
 4 Q                 617
 5 J                 569
 6 D                 527
 7 I                 492
 8 W                 454
 9 K                 426
10 M                 402
# … with 16 more rows
  • Related