How to calculate the average value in one column for the 10 maximum values in another column?-CodePudding

I have a dataset and the task:"Average number of major credit cards held for people with top 10 income".

dput(head(creditcard))
 structure(list(card = structure(c(2L, 2L, 2L, 2L, 2L, 2L), levels = c("no","yes"), class = "factor"), reports = c(0L, 0L, 0L, 0L, 0L, 0L), age = c(37.66667, 33.25, 33.66667, 30.5, 32.16667, 23.25), income = c(4.52, 2.42, 4.5, 2.54, 9.7867, 2.5), share = c(0.03326991, 0.005216942, 0.004155556, 0.06521378, 0.06705059, 0.0444384), expenditure = c(124.9833, 9.854167, 15, 137.8692, 546.5033, 91.99667), owner = structure(c(2L, 1L, 2L, 1L, 2L, 1L), levels = c("no", "yes"), class = "factor"), selfemp = structure(c(1L, 1L, 1L, 1L, 1L, 1L), levels = c("no", "yes"), class = "factor"), 
 dependents = c(3L, 3L, 4L, 0L, 2L, 0L), days = c(54L, 34L,58L, 25L, 64L, 54L), majorcards = c(1L, 1L, 1L, 1L, 1L, 1L), active = c(12L, 13L, 5L, 7L, 5L, 1L), income_fam = c(1.13, 0.605, 0.9, 2.54, 3.26223333333333, 2.5)), row.names = c("1","2", "3", "4", "5", "6"), class = "data.frame")

I tried to do the task like this

round(mean(creditcard[order(creditcard$income, decreasing = TRUE),]$majorcards[1:10]))

But my solution turned out to be inoptimal and I do not understand how it can be corrected

CodePudding user response：

You can get the 10 observations with the highest income using slice_max, then creating a new dataset with the mean of majorcards

library(dplyr)
creditcard %>%
  slice_max(income, n = 10) %>%
  summarise(mean(majorcards))

CodePudding user response：

If your dataset is one row per person, then you can do this:

library(dplyr)

creditcard %>%
  arrange(desc(income)) %>%
  slice_head(n=10) %>%
  summarize(mean_cards = mean(majorcards,na.rm=T))

CodePudding user response：

Maybe something like:

mean(creditcard$majorcards[which(creditcard$income%in%sort(creditcard$income, decreasing = TRUE)[1:10])])

CodePudding user response：

Using base R

with(creditcard, mean(head(majorcards[order(-income)], 10)))

Or in data.table

library(data.table)
setDT(creditcard)[order(-income), mean(head(majorcards, 10))]