Home > database >  How to calculate the average value in one column for the 10 maximum values in another column?
How to calculate the average value in one column for the 10 maximum values in another column?

Time:11-02

I have a dataset and the task:"Average number of major credit cards held for people with top 10 income".

dput(head(creditcard))
 structure(list(card = structure(c(2L, 2L, 2L, 2L, 2L, 2L), levels = c("no","yes"), class = "factor"), reports = c(0L, 0L, 0L, 0L, 0L, 0L), age = c(37.66667, 33.25, 33.66667, 30.5, 32.16667, 23.25), income = c(4.52, 2.42, 4.5, 2.54, 9.7867, 2.5), share = c(0.03326991, 0.005216942, 0.004155556, 0.06521378, 0.06705059, 0.0444384), expenditure = c(124.9833, 9.854167, 15, 137.8692, 546.5033, 91.99667), owner = structure(c(2L, 1L, 2L, 1L, 2L, 1L), levels = c("no", "yes"), class = "factor"), selfemp = structure(c(1L, 1L, 1L, 1L, 1L, 1L), levels = c("no", "yes"), class = "factor"), 
 dependents = c(3L, 3L, 4L, 0L, 2L, 0L), days = c(54L, 34L,58L, 25L, 64L, 54L), majorcards = c(1L, 1L, 1L, 1L, 1L, 1L), active = c(12L, 13L, 5L, 7L, 5L, 1L), income_fam = c(1.13, 0.605, 0.9, 2.54, 3.26223333333333, 2.5)), row.names = c("1","2", "3", "4", "5", "6"), class = "data.frame")

enter image description here

I tried to do the task like this

round(mean(creditcard[order(creditcard$income, decreasing = TRUE),]$majorcards[1:10]))

But my solution turned out to be inoptimal and I do not understand how it can be corrected

CodePudding user response:

You can get the 10 observations with the highest income using slice_max, then creating a new dataset with the mean of majorcards

library(dplyr)
creditcard %>%
  slice_max(income, n = 10) %>%
  summarise(mean(majorcards))

CodePudding user response:

If your dataset is one row per person, then you can do this:

library(dplyr)

creditcard %>%
  arrange(desc(income)) %>%
  slice_head(n=10) %>%
  summarize(mean_cards = mean(majorcards,na.rm=T))

CodePudding user response:

Maybe something like:

mean(creditcard$majorcards[which(creditcard$income%in%sort(creditcard$income, decreasing = TRUE)[1:10])])

CodePudding user response:

Using base R

with(creditcard, mean(head(majorcards[order(-income)], 10)))

Or in data.table

library(data.table)
setDT(creditcard)[order(-income), mean(head(majorcards, 10))]
  • Related