Home > Blockchain >  Django ORM Group By Field by another field
Django ORM Group By Field by another field

Time:06-04

I have SessionLog model

class SessionLog(models.Model):
    user = models.ForeignKey(settings.AUTH_USER_MODEL, on_delete=models.CASCADE)
    ip_address = models.CharField(max_length=20, null=True)

I want to get list of ip addresses for duplicated users only. So, if different users logged in with same IP address, I want to get list of such IP addresses.

Also, I want to get users list who logged in same ip.

So expected result would be

[
  {
    "ip_address": "170.221.23.56",
    "user_id": [21, 23, 45]
  },
  {
    "ip_address": "170.221.23.58",
    "user_id": [25, 23, 45]
  },
]

How can I implement with django ORM?

CodePudding user response:

If you are using PostgreSQL you can do the following:

from django.db import models
from django.contrib.postgres.fields import ArrayField


class Array(models.Subquery):
    template = 'ARRAY(%(subquery)s)'
    output_field = ArrayField(base_field=models.TextField())

user_ids = SessionLog.objects.filter(
    ip_address=models.OuterRef("ip_address")
).values_list("user_id", flat=True)

sessions = SessionLog.objects.distict("ip_address").order_by("ip_address").annotate(
    user_ids=Array(user_ids)
)

for session in sessions:
    print(session.ip_address, session.user_ids)

If you are using another database you could do a query using prefetch_related() (Note that this will result in two queries instead of one, but that should mostly not be a real disadvantage).

from django.db.models import Count

users_in_log = User.objects.prefetch_related("session_log_set")
users_in_log = users_in_log.annotate(session_count=Count("session_log_set"))
users_in_log = users_in_log.filter(session_count__gt=0).distinct()

for user in users_in_log:
    print(user.pk, [s.ip_address for s in user.session_log_set.all()]

CodePudding user response:

You can first convert the querysets into lists:

ip_list = list(SessionLog.objects.values_list('ip_address'))
user_list = list(SessionLog.objects.values_list('user'))

then create another list to identify unique ip addresses:

unique_addresses = list(set(ip_list))

finally use enumerate to get the index of the ip addresses and then use those indices to get the corresponding values of the users like so:

Dict = {} 
users = [] 
for n in unique_addresses: 
    indices = [i for i, x in enumerate(ip_list) if x == n] 
        for p in indices: 
            users.append(user_list[p]) 
    Dict[n] = status 
    status = [] 
print(Dict)

The output format is a bit untidy but you can access the users of a specific ip address using Dict['ip_address']

Another way would be to loop over each entry from the database and perform the sorting on the fly but that would involve multiple requests which may impact performance

  • Related