How to return all dicts with matching values from within a list of dicts

I looked around for similar questions since this seems pretty basic, but was unable to find anything. If there is already something out there, sorry for making a new question!

I am struggling to think of a solution to my problem:

I have a list of dicts:

[{'name':'Josh', 'age':'39','Date of Birth':'1983-02-22','Time of Birth':'11:25:03'},
{'name':'Tyrell', 'age':'24', 'Date of Birth':'1998-01-27','Time of Birth':'01:23:54'},
{'name':'Jannell', 'age':'39', 'Date of Birth':'1983-02-27','Time of Birth':'11:21:34'},
{'name':'David', 'age':'24', 'Date of Birth':'1998-01-20','Time of Birth':'01:27:24'},
{'name':'Matthew', 'age':'24','Date of Birth':'1998-03-31','Time of Birth':'01:26:41'},
{'name':'Tylan', 'age':'24','Date of Birth':'1998-01-22','Time of Birth':'01:23:16'}
]

From that list I'd like to extract the name values of all dicts that share the exact same age, have dates of birth within 10 days of each other, and have times of birth within 10 minutes of each other. So from the above:

for age 39: [Josh, Jannell], for age 24: [Tyrell, David], and [] for any other age.

I definitely think I could figure it out on my own if I were shown how to successfully extract any one of these cases.

My attempt at solution

My current attempt looks like this:

# dicts = the list of dicts from the question
ages = [d['age'] for d in dicts]
ages = list(set(ages))

groupedlist = []
for age in ages:
    sameagelist = []
    # compare against the current age, not the whole list of ages
    for dict_ in [x for x in dicts if x['age'] == age]:
        sameagelist.append(dict_)
    groupedlist.append(sameagelist)

print(groupedlist)

This is proving pretty cumbersome, though: I now have a list of lists of dicts, which makes the next step, filtering on Time of Birth and Date of Birth, look even more involved.

I'm stumped, but I can feel that the answer will be quite simple. Thanks to anyone who provides that nudge that will push me over the edge!

CodePudding user response：

If I'm not mistaken, according to the conditions you set ("share the exact same age, a date of birth within 10 days from each other and time of birth within 10 minutes from each other") and the data you provided, 'Tyrell', 'David' and 'Tylan' should be in the same group.

There might be cases, though, where Tyrell is born 9 days before David and 9 days after Tylan, meaning that the pair Tylan and David does not meet the requirement.
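
That situation can be checked directly. The sketch below uses three made-up dates (not the ones from the question), placed 9 days apart as described; the helper name `within_days` is hypothetical, and it assumes "within 10 days" means a difference of at most 10 days.

```python
from datetime import date


def within_days(a, b, limit=10):
    """Return True if the two dates are at most `limit` days apart."""
    return abs((a - b).days) <= limit


# Hypothetical dates: the middle one is 9 days away from each of the others
tylan = date(1998, 1, 9)
tyrell = date(1998, 1, 18)
david = date(1998, 1, 27)

print(within_days(tylan, tyrell))  # True
print(within_days(tyrell, david))  # True
print(within_days(tylan, david))   # False, they are 18 days apart
```

So the "within 10 days" relation is not transitive, which is why a single flat group per age may not be enough.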

An idea could be to have a group for each person. The following code outputs:

[['Josh', 'Jannell'], ['Tyrell', 'David', 'Tylan'], ['David', 'Tylan']]

where the first name of each sublist is the "focus/primary" person of the group. This means that, in the group ['Tyrell', 'David', 'Tylan'], David and Tylan are both within Tyrell's boundaries. To know whether David and Tylan are within each other's boundaries, one of them needs to be the focus, hence the second group.

To make the computation easier I used:

pandas (library to work with data in table-like structure: https://pandas.pydata.org/docs/)
datetime (module to facilitate date/time operations: https://docs.python.org/3/library/datetime.html)

import pandas as pd 
import datetime

dicts = [{'name':'Josh', 'age':'39','Date of Birth':'1983-02-22','Time of Birth':'11:25:03'},
{'name':'Tyrell', 'age':'24', 'Date of Birth':'1998-01-27','Time of Birth':'01:23:54'},
{'name':'Jannell', 'age':'39', 'Date of Birth':'1983-02-27','Time of Birth':'11:21:34'},
{'name':'David', 'age':'24', 'Date of Birth':'1998-01-20','Time of Birth':'01:27:24'},
{'name':'Matthew', 'age':'24','Date of Birth':'1998-03-31','Time of Birth':'01:26:41'},
{'name':'Tylan', 'age':'24','Date of Birth':'1998-01-22','Time of Birth':'01:23:16'}
]

#create dataframe
df = pd.DataFrame(dicts)  # DataFrame.append was removed in pandas 2.0; the constructor accepts a list of dicts

#convert strings to datetime formats for easy date calculations
df["Date of Birth"] = pd.to_datetime(df["Date of Birth"], format="%Y-%m-%d")
df["Time of Birth"] = pd.to_datetime(df["Time of Birth"], format="%H:%M:%S") #every row gets the same placeholder date; ignore it, we only need the time

# function that checks conditions
# row: [name, age, date, time]
def check_birth(row1, row2): #returns True if all conditions are met
    delta_days = abs(row1["Date of Birth"] - row2["Date of Birth"])
    #abs() is needed here too, otherwise any negative difference passes the check
    delta_minutes = abs(row1["Time of Birth"] - row2["Time of Birth"])

    #no need to check age since it is done in the Date of Birth check
    return delta_days < datetime.timedelta(days=10) and delta_minutes < datetime.timedelta(minutes=10)

groups = [] #keep track of groups

#for each member check if other members meet the condition
for i in range(df.shape[0]):
    track = [df.iloc[i, 0]]
    for j in range(i + 1, df.shape[0]):  #loop starting at i + 1 to avoid duplicate groups
        if check_birth(df.iloc[i, :], df.iloc[j, :]):
            track.append(df.iloc[j, 0])
    if len(track) > 1:  #exclude groups of one member
        groups.append(track)

print(groups)

CodePudding user response：

To group by age, you can create a dictionary of lists and use the age as the key.

from collections import defaultdict

grouped_by_age = defaultdict(list)

for item in dicts:
    grouped_by_age[item['age']].append(item['name'])

print(grouped_by_age)
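
If the later date and time checks need the full records rather than just the names, the same pattern can store the whole dicts, and `itertools.combinations` then yields every pair inside one age group. A minimal sketch, using a trimmed-down copy of the question's data; `records_by_age` and `pairs_by_age` are names made up for illustration.

```python
from collections import defaultdict
from itertools import combinations

# Trimmed copy of the question's data (only the fields needed here)
dicts = [
    {'name': 'Josh', 'age': '39'},
    {'name': 'Tyrell', 'age': '24'},
    {'name': 'Jannell', 'age': '39'},
    {'name': 'David', 'age': '24'},
]

# Group the full records by age instead of only the names
records_by_age = defaultdict(list)
for item in dicts:
    records_by_age[item['age']].append(item)

# Every unordered pair of people within the same age group
pairs_by_age = {
    age: [(a['name'], b['name']) for a, b in combinations(group, 2)]
    for age, group in records_by_age.items()
}

print(pairs_by_age)
# {'39': [('Josh', 'Jannell')], '24': [('Tyrell', 'David')]}
```

Each pair could then be passed to whatever date and time check you settle on.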

CodePudding user response：

I am not sure whether you were also asking for a complete solution, but here it is; the explanation is in the code comments. It should also account for cases such as ['Alan', 'Betty'], ['Betty', 'Cooper']:

# importing all the necessary modules
import operator
import itertools
import datetime


data = [
    {'name': 'Josh', 'age': '39', 'Date of Birth': '1983-02-22', 'Time of Birth': '11:25:03'},
    {'name': 'Tyrell', 'age': '24', 'Date of Birth': '1998-01-27', 'Time of Birth': '01:23:54'},
    {'name': 'Jannell', 'age': '39', 'Date of Birth': '1983-02-27', 'Time of Birth': '11:21:34'},
    {'name': 'David', 'age': '24', 'Date of Birth': '1998-01-20', 'Time of Birth': '01:27:24'},
    {'name': 'Matthew', 'age': '24', 'Date of Birth': '1998-03-31', 'Time of Birth': '01:26:41'},
    {'name': 'Tylan', 'age': '24', 'Date of Birth': '1998-01-22', 'Time of Birth': '01:23:16'}
]

# creating a key for sorting, basically it will first sort by age, then by date, then by time
key = operator.itemgetter('age', 'Date of Birth', 'Time of Birth')
data = sorted(data, key=key)


# a convenience function to get a person's date and time of birth as a datetime object
# for time manipulations such as subtraction
def get_datetime(p):
    iso_format = f'{p["Date of Birth"]}T{p["Time of Birth"]}'
    t = datetime.datetime.fromisoformat(iso_format)
    return t


# going over the grouped list by age
for age, group in itertools.groupby(data, key=operator.itemgetter('age')):
    print(f'Age: {age}')
    # convert generator to a list to not exhaust it
    group = list(group)
    previous_match = [None]
    # going over the group while also keeping the current index for later use
    for index, person in enumerate(group):
        # creating a list of people that match the conditions of days and minutes
        # and adding the current person as the first item there
        match = [person['name']]
        time1 = get_datetime(person)
        # going over the group starting from the next person to check if they
        # match that condition of days and minutes
        for other_person in itertools.islice(group, index + 1, None):
            time2 = get_datetime(other_person)
            # subtracting time of both people
            delta = time2 - time1
            # checking if they are in the ten day range and if they are in the ten minute range
            if delta.days <= 10 and (delta.seconds <= 10 * 60 or 24 * 3600 - delta.seconds <= 10 * 60):
                # if they match the conditions of days and minutes append them to the match
                match.append(other_person['name'])
        # check if any other person got matched and check if any new person has appeared
        # this is to check for that case of [Alan, Betty], [Betty, Cooper]
        if len(match) > 1 and match[-1] != previous_match[-1]:
            previous_match = match
            print(match)

The modules used above (operator, itertools and datetime) are all built-in, i.e. part of the Python standard library.
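
The `24 * 3600 - delta.seconds` term in the condition above covers birth times that fall on either side of midnight. Whether such pairs should count as "within 10 minutes" is an assumption about what the question wants. A standalone sketch of the idea, with a hypothetical helper `seconds_apart` and one made-up pair of times:

```python
from datetime import datetime


def seconds_apart(t1, t2):
    """Smallest gap in seconds between two 'HH:MM:SS' clock times,
    allowing the gap to wrap around midnight."""
    a = datetime.strptime(t1, '%H:%M:%S')
    b = datetime.strptime(t2, '%H:%M:%S')
    gap = abs((a - b).total_seconds())
    return min(gap, 24 * 3600 - gap)


# Tyrell and David from the question: 3 minutes 30 seconds apart
print(seconds_apart('01:23:54', '01:27:24'))  # 210.0

# Made-up times on either side of midnight: 5 minutes apart
print(seconds_apart('23:58:00', '00:03:00'))  # 300.0
```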

CodePudding user response：

After making a set of ages, you can group people of the same age. Then, you need to iterate on each member of each group and find other people in the same group that match your condition:

from datetime import datetime
dicts = [{'name':'Josh', 'age':'39','Date of Birth':'1983-02-22','Time of Birth':'11:25:03'},
{'name':'Tyrell', 'age':'24', 'Date of Birth':'1998-01-27','Time of Birth':'01:23:54'},
{'name':'Jannell', 'age':'39', 'Date of Birth':'1983-02-27','Time of Birth':'11:21:34'},
{'name':'David', 'age':'24', 'Date of Birth':'1998-01-20','Time of Birth':'01:27:24'},
{'name':'Matthew', 'age':'24','Date of Birth':'1998-03-31','Time of Birth':'01:26:41'},
{'name':'Tylan', 'age':'24','Date of Birth':'1998-01-22','Time of Birth':'01:23:16'}
]
ages = set([d['age'] for d in dicts])
grouped_list = [[each_person for each_person in dicts if each_person['age'] == each_age] for each_age in ages]
grouped_people = []
for each_group in grouped_list:
    for each_person in each_group:
        new_group_people = [each_one['name'] for each_one in each_group if abs(datetime.strptime(each_one['Date of Birth'], '%Y-%m-%d') - datetime.strptime(each_person['Date of Birth'], '%Y-%m-%d')).days <= 10 and abs(datetime.strptime(each_one['Time of Birth'], '%H:%M:%S') - datetime.strptime(each_person['Time of Birth'], '%H:%M:%S')).seconds <= 10*60]
        if len(new_group_people) > 1 and new_group_people not in grouped_people:
            grouped_people.append(new_group_people)

You can also expand the list comprehension into an explicit loop, if that is easier to follow:

ages = set([d['age'] for d in dicts])
grouped_list = [[each_person for each_person in dicts if each_person['age'] == each_age] for each_age in ages]
grouped_people = []
for each_group in grouped_list:
    for each_person in each_group:
        # same condition as the one-line comprehension, written as an explicit loop
        new_group_people = []
        for each_one in each_group:
            if abs((datetime.strptime(each_one['Date of Birth'], '%Y-%m-%d') - datetime.strptime(each_person['Date of Birth'], '%Y-%m-%d')).days) <= 10 and abs(datetime.strptime(each_one['Time of Birth'], '%H:%M:%S') - datetime.strptime(each_person['Time of Birth'], '%H:%M:%S')).seconds <= 10*60:
                new_group_people.append(each_one['name'])
        if len(new_group_people) > 1 and new_group_people not in grouped_people:
            grouped_people.append(new_group_people)
print(grouped_people)

The output:

[['Tyrell', 'David', 'Tylan'], ['Josh', 'Jannell']]
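
Because `ages` is a set, the order of the groups in that output may differ between runs. If a stable order matters (an assumption, since the question does not ask for one), one option is to collect the ages in first-seen order with `dict.fromkeys`. A minimal sketch on a trimmed copy of the data:

```python
# Trimmed copy of the question's data (only the fields needed here)
dicts = [
    {'name': 'Josh', 'age': '39'},
    {'name': 'Tyrell', 'age': '24'},
    {'name': 'Jannell', 'age': '39'},
    {'name': 'David', 'age': '24'},
]

# dict keys are unique and keep insertion order, unlike a set
ages = list(dict.fromkeys(d['age'] for d in dicts))
print(ages)  # ['39', '24']
```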