Home > OS >  Way to calculate the top x% of users from a database
Way to calculate the top x% of users from a database

Time:07-10

I was wondering if there was an actually good way to aggregate/calculate percentiles based on all data in a database. For example, data like:

{
'name':'John'
'rank':5
},
{
'name':'John'
'rank':4
},
{
'name':'John'
'rank':5
},
{
'name':'James'
'rank':3
},
{
'name':'Froggy'
'rank':5
},

How would I go about finding a way to calculate the percentile that users achieve a specific rarity compared to others. Example: James falls in the top 10% of users who own rank 5.

CodePudding user response:

emphasized textYou can do something like:

db.collection.aggregate([
  {
    $group: {
      _id: "$name",
      rankFiveCount: {$sum: {$cond: [{$eq: ["$rank", 5]}, 1, 0]}}}
  },
  {
    $setWindowFields: {
      sortBy: {rankFiveCount: -1},
      output: {
        rank: {$rank: {}},
        totalCount: {$count: {}}
      }
    }
  },
  {$match: {_id: wantedUser}},
  {
    $project: {
      name: "$_id",
      _id: 0,
      percentile: {
        $round: {$multiply: [{$divide: ["$rank", "$totalCount"]}, 100]}
      }
    }
  }
])

See how it works on the playground example

CodePudding user response:

I think the formula for that would be (no_of_people_with_rank_5/total_people)*100. And honestly, it also depends about the logic in database, is it that a person with rank 5 holds all other ranks or are the ranks all separate? For this answer I will assume all ranks are separate.

import json

database = json.load(open("Yourdatabasefile.json", "r"))
# Skip above, if you already have the data

def find_userbase_percentiles(database: dict) -> dict[float]:
    # You would ideally want to call this function once.
    frequencies = {}
    for x in database:
        frequencies.setdefault(x['rank'], 0)
        frequencies[x['rank']]  = 1
    # Now we have frequencies
    for k, v in frequencies.items():
        frequencies[k] = (v/len(database))*100  # Our percentile formula
    return frequencies

def find_user_qualifications(user: str, rank: int, frequencies: dict) -> str:
    # Modify this function to your requirements
    return f"{user} falls in the top {int(frequencies[rank])}% of users who own rank {rank}"

if __name__ == "__main__":
    freq = find_userbase_percentile(database)
    print(find_user_qualifications("James", 5, freq))
  • Related