I was wondering if there was an actually good way to aggregate/calculate percentiles based on all data in a database. For example, data like:
{
  'name': 'John',
  'rank': 5
},
{
  'name': 'John',
  'rank': 4
},
{
  'name': 'John',
  'rank': 5
},
{
  'name': 'James',
  'rank': 3
},
{
  'name': 'Froggy',
  'rank': 5
},
How would I go about calculating the percentile a user falls into for a specific rank, compared to other users? Example: James falls in the top 10% of users who own rank 5.
CodePudding user response:
You can do something like:
db.collection.aggregate([
  {
    $group: {
      _id: "$name",
      rankFiveCount: {$sum: {$cond: [{$eq: ["$rank", 5]}, 1, 0]}}
    }
  },
  {
    $setWindowFields: {
      sortBy: {rankFiveCount: -1},
      output: {
        rank: {$rank: {}},
        totalCount: {$count: {}}
      }
    }
  },
  // wantedUser is the name you are querying, e.g. "James"
  {$match: {_id: wantedUser}},
  {
    $project: {
      name: "$_id",
      _id: 0,
      percentile: {
        $round: {$multiply: [{$divide: ["$rank", "$totalCount"]}, 100]}
      }
    }
  }
])
See how it works on the playground example.
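To see what the pipeline computes without a running MongoDB instance, here is a rough plain-Python sketch of the same logic, using the sample documents from the question. The stage boundaries are marked in comments; `"John"` as the queried name is just an example, standing in for `wantedUser`:

```python
# Sample documents from the question
docs = [
    {"name": "John", "rank": 5},
    {"name": "John", "rank": 4},
    {"name": "John", "rank": 5},
    {"name": "James", "rank": 3},
    {"name": "Froggy", "rank": 5},
]

# $group: count rank-5 documents per user
counts = {}
for d in docs:
    counts.setdefault(d["name"], 0)
    if d["rank"] == 5:
        counts[d["name"]] += 1

# $setWindowFields with sortBy {rankFiveCount: -1}:
# a user's rank is 1 + the number of users with a strictly
# higher rank-5 count ($rank gives ties the same position)
def rank_of(name):
    return 1 + sum(1 for c in counts.values() if c > counts[name])

total = len(counts)  # totalCount

# $project: percentile = round(rank / totalCount * 100)
percentile = round(rank_of("John") / total * 100)
print(percentile)
```

With this data, John holds two rank-5 documents and ranks first of three users, so his percentile comes out as round(1/3 * 100) = 33, i.e. he is in the top 33% of rank-5 owners.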
CodePudding user response:
I think the formula for that would be (no_of_people_with_rank_5 / total_people) * 100. Honestly, it also depends on the logic in the database: does a person with rank 5 also hold all lower ranks, or are the ranks separate? For this answer I will assume all ranks are separate.
import json

database = json.load(open("Yourdatabasefile.json", "r"))
# Skip the above if you already have the data

def find_userbase_percentiles(database: list) -> dict[int, float]:
    # You would ideally want to call this function once.
    frequencies = {}
    for x in database:
        frequencies.setdefault(x['rank'], 0)
        frequencies[x['rank']] += 1  # count how many entries hold each rank
    # Now we have frequencies; convert counts to percentages
    for k, v in frequencies.items():
        frequencies[k] = (v / len(database)) * 100  # our percentile formula
    return frequencies

def find_user_qualifications(user: str, rank: int, frequencies: dict) -> str:
    # Modify this function to your requirements
    return f"{user} falls in the top {int(frequencies[rank])}% of users who own rank {rank}"

if __name__ == "__main__":
    freq = find_userbase_percentiles(database)
    print(find_user_qualifications("James", 5, freq))