I am working with a lot of JSON data. Two in particular look like this:
users = [
{ "id": 1, "name": "Greg Harris", "roles": ["mega-user"] },
{ "id": 2, "name": "Sarah Smith", "roles": ["charger", "rider"] },
{ "id": 7, "name": "Jack Snow", "roles": ["rider"] },
{ "id": 11, "name": "NA", "roles": [] },
{ "id": 18, "name": "Tiffany Denson", "roles": ["beta tester"] },
]
And this:
users2 = [
{
'id': 1,
'name': 'Employee #1',
'customer_id': 1,
'activated_on': datetime.date(2018, 11, 4),
'deactivated_on': datetime.date(2019, 1, 10)
},
{
'id': 2,
'name': 'Employee #2',
'customer_id': 1,
'activated_on': datetime.date(2018, 12, 4),
'deactivated_on': None
}
]
I need to know how to effectively iterate through them and perform some calculations.
For the first JSON, how do I iterate over the list of dictionaries and pull out only the 'name' of users who have 'rider' in their 'roles' list, using Python?
For the second JSON, assuming that as the basic structure, I want to calculate a daily charge for people with active subscriptions for each day of the month. I want to identify which users were active on a given day and then multiply the number of active users for that day by the daily rate to get the total for the day. The subscription is $4/month, so for a single day in a 31-day month it would look something like this:
2019-01-01 2 active users * $0.129032258 = $0.258064516 (subtotal: $0.258064516)
And calculate a total for the entire month.
The users2 may also be empty so I need to handle this case.
For the first one I tried something like this:
for d in users:
    if 'rider' in d['roles']:
        print(d['name'])
Seems to work but not sure if there is a better way to go about it. For the second part I am truly lost on how to go about it.
Please help Thanks
CodePudding user response:
For the first file your solution is fine and doesn't need changes.
Alternatively, you can write it as a list comprehension (but you don't have to):
selected = [person['name'] for person in users if 'rider' in person['roles']]
for name in selected:
    print(name)
Full working code
users = [
{ "id": 1, "name": "Greg Harris", "roles": ["mega-user"] },
{ "id": 2, "name": "Sarah Smith", "roles": ["charger", "rider"] },
{ "id": 7, "name": "Jack Snow", "roles": ["rider"] },
{ "id": 11, "name": "NA", "roles": [] },
{ "id": 18, "name": "Tiffany Denson", "roles": ["beta tester"] },
]
#selected = []
#for person in users:
#    if 'rider' in person['roles']:
#        selected.append(person['name'])
selected = [person['name'] for person in users if 'rider' in person['roles']]
#print(selected)
for name in selected:
    print(name)
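If you need the same filter in several places, it could be wrapped in a small function. This is only a sketch: the name `names_with_role` is made up for illustration, and using `.get('roles', [])` assumes that some records might be missing the 'roles' key, which the sample data does not actually show.

```python
users = [
    {"id": 1, "name": "Greg Harris", "roles": ["mega-user"]},
    {"id": 2, "name": "Sarah Smith", "roles": ["charger", "rider"]},
    {"id": 7, "name": "Jack Snow", "roles": ["rider"]},
    {"id": 11, "name": "NA", "roles": []},
    {"id": 18, "name": "Tiffany Denson", "roles": ["beta tester"]},
]


def names_with_role(users, role):
    """Return the names of all users whose 'roles' list contains `role`.

    Hypothetical helper: a record without a 'roles' key is treated
    as having no roles instead of raising KeyError.
    """
    return [person['name'] for person in users if role in person.get('roles', [])]


for name in names_with_role(users, 'rider'):
    print(name)
```

An empty `users` list simply gives an empty result, so no special case is needed.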
The same with pandas
import pandas as pd
users = [
{ "id": 1, "name": "Greg Harris", "roles": ["mega-user"] },
{ "id": 2, "name": "Sarah Smith", "roles": ["charger", "rider"] },
{ "id": 7, "name": "Jack Snow", "roles": ["rider"] },
{ "id": 11, "name": "NA", "roles": [] },
{ "id": 18, "name": "Tiffany Denson", "roles": ["beta tester"] },
]
df = pd.DataFrame(users)
print('\n--- dataframe ---\n')
print(df)
mask = df['roles'].apply(lambda x: 'rider' in x)
print('\n--- mask ---\n')
print(mask)
selected = df[ mask ]
print('\n--- selected ---\n')
print(selected['name'])
Result:
--- dataframe ---

   id            name             roles
0   1     Greg Harris       [mega-user]
1   2     Sarah Smith  [charger, rider]
2   7       Jack Snow           [rider]
3  11              NA                []
4  18  Tiffany Denson     [beta tester]

--- mask ---

0    False
1     True
2     True
3    False
4    False
Name: roles, dtype: bool

--- selected ---

1    Sarah Smith
2      Jack Snow
Name: name, dtype: object
The second file may need a nested for-loop because it has to run over all the days of the month and, for every day, check all users.
import datetime
users2 = [
{
'id': 1,
'name': 'Employee #1',
'customer_id': 1,
'activated_on': datetime.date(2018, 11, 4),
'deactivated_on': datetime.date(2019, 1, 10)
},
{
'id': 2,
'name': 'Employee #2',
'customer_id': 1,
'activated_on': datetime.date(2018, 12, 4),
'deactivated_on': None
}
]
one_day = datetime.timedelta(days=1)

# start on the day before the month begins (2018-12-31);
# a fixed date keeps the result reproducible, unlike datetime.date.today()
date = datetime.date(2018, 12, 31)
print(date)

price = 0.129032258  # $4 / 31 days
subtotal = 0

for x in range(31):
    count = 0          # number of active persons on this date
    date += one_day    # move to the next date

    # check every person
    for person in users2:
        if (person['activated_on'] < date) and (person['deactivated_on'] is None or person['deactivated_on'] > date):
            count += 1

    # display result for one date
    total = count * price
    subtotal += total
    print(f'{date} | {count:2} active users * ${price:.2f} = {total:.2f} (subtotal: {subtotal:.2f})')
Result:
2018-12-31
2019-01-01 | 2 active users * $0.13 = 0.26 (subtotal: 0.26)
2019-01-02 | 2 active users * $0.13 = 0.26 (subtotal: 0.52)
2019-01-03 | 2 active users * $0.13 = 0.26 (subtotal: 0.77)
2019-01-04 | 2 active users * $0.13 = 0.26 (subtotal: 1.03)
2019-01-05 | 2 active users * $0.13 = 0.26 (subtotal: 1.29)
2019-01-06 | 2 active users * $0.13 = 0.26 (subtotal: 1.55)
2019-01-07 | 2 active users * $0.13 = 0.26 (subtotal: 1.81)
2019-01-08 | 2 active users * $0.13 = 0.26 (subtotal: 2.06)
2019-01-09 | 2 active users * $0.13 = 0.26 (subtotal: 2.32)
2019-01-10 | 1 active users * $0.13 = 0.13 (subtotal: 2.45)
2019-01-11 | 1 active users * $0.13 = 0.13 (subtotal: 2.58)
2019-01-12 | 1 active users * $0.13 = 0.13 (subtotal: 2.71)
2019-01-13 | 1 active users * $0.13 = 0.13 (subtotal: 2.84)
2019-01-14 | 1 active users * $0.13 = 0.13 (subtotal: 2.97)
2019-01-15 | 1 active users * $0.13 = 0.13 (subtotal: 3.10)
2019-01-16 | 1 active users * $0.13 = 0.13 (subtotal: 3.23)
2019-01-17 | 1 active users * $0.13 = 0.13 (subtotal: 3.35)
2019-01-18 | 1 active users * $0.13 = 0.13 (subtotal: 3.48)
2019-01-19 | 1 active users * $0.13 = 0.13 (subtotal: 3.61)
2019-01-20 | 1 active users * $0.13 = 0.13 (subtotal: 3.74)
2019-01-21 | 1 active users * $0.13 = 0.13 (subtotal: 3.87)
2019-01-22 | 1 active users * $0.13 = 0.13 (subtotal: 4.00)
2019-01-23 | 1 active users * $0.13 = 0.13 (subtotal: 4.13)
2019-01-24 | 1 active users * $0.13 = 0.13 (subtotal: 4.26)
2019-01-25 | 1 active users * $0.13 = 0.13 (subtotal: 4.39)
2019-01-26 | 1 active users * $0.13 = 0.13 (subtotal: 4.52)
2019-01-27 | 1 active users * $0.13 = 0.13 (subtotal: 4.65)
2019-01-28 | 1 active users * $0.13 = 0.13 (subtotal: 4.77)
2019-01-29 | 1 active users * $0.13 = 0.13 (subtotal: 4.90)
2019-01-30 | 1 active users * $0.13 = 0.13 (subtotal: 5.03)
2019-01-31 | 1 active users * $0.13 = 0.13 (subtotal: 5.16)
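The loop above hard-codes 31 days and the daily price. As a rough sketch of how it could be generalized, the version below takes the year and month as arguments and derives the daily rate from the real length of the month with `calendar.monthrange`. The function names `count_active` and `monthly_total` are made up for illustration, and the strict `<` / `>` comparisons are copied from the loop above; whether the activation and deactivation days themselves should be billed is an assumption you may want to change.

```python
import calendar
import datetime

users2 = [
    {'id': 1, 'name': 'Employee #1', 'customer_id': 1,
     'activated_on': datetime.date(2018, 11, 4),
     'deactivated_on': datetime.date(2019, 1, 10)},
    {'id': 2, 'name': 'Employee #2', 'customer_id': 1,
     'activated_on': datetime.date(2018, 12, 4),
     'deactivated_on': None},
]


def count_active(users, day):
    """Count users active on `day` (same strict comparisons as the loop above)."""
    return sum(
        1
        for person in users
        if person['activated_on'] < day
        and (person['deactivated_on'] is None or person['deactivated_on'] > day)
    )


def monthly_total(users, year, month, monthly_price=4.0):
    """Return (rows, total) for one calendar month.

    rows is a list of (date, active_count, day_total) tuples.
    An empty `users` list gives a count of 0 for every day and a total of 0.
    """
    days_in_month = calendar.monthrange(year, month)[1]
    daily_rate = monthly_price / days_in_month
    rows = []
    total = 0.0
    for day_number in range(1, days_in_month + 1):
        day = datetime.date(year, month, day_number)
        count = count_active(users, day)
        day_total = count * daily_rate
        total += day_total
        rows.append((day, count, day_total))
    return rows, total


rows, total = monthly_total(users2, 2019, 1)
for day, count, day_total in rows:
    print(f'{day} | {count:2} active users = {day_total:.2f}')
print(f'total: {total:.2f}')
```

Because the inner count is just a sum over the list, the empty `users2` case from the question needs no special handling: `monthly_total([], 2019, 1)` returns 31 rows with a count of 0 and a total of 0.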
The same with pandas, but using date_range to generate the days:
import pandas as pd
import datetime
users2 = [
{
'id': 1,
'name': 'Employee #1',
'customer_id': 1,
'activated_on': datetime.date(2018, 11, 4),
'deactivated_on': datetime.date(2019, 1, 10)
},
{
'id': 2,
'name': 'Employee #2',
'customer_id': 1,
'activated_on': datetime.date(2018, 12, 4),
'deactivated_on': None
}
]
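df = pd.DataFrame(users2)

# convert the date columns to datetime64 so they can be compared with the
# Timestamp values produced by date_range; None becomes NaT
df['activated_on'] = pd.to_datetime(df['activated_on'])
df['deactivated_on'] = pd.to_datetime(df['deactivated_on'])

print('\n--- dataframe ---\n')
print(df)
print()

price = 0.129032258  # $4 / 31 days
subtotal = 0

for date in pd.date_range('2019-01-01', periods=31):
    #print('\n===== date:', date, '=====\n')

    mask1 = (df['activated_on'] < date)        # activated before this date
    mask2 = (df['deactivated_on'].isnull())    # never deactivated
    mask3 = (df['deactivated_on'] > date)      # deactivated after this date
    #print(mask1)
    #print(mask2)
    #print(mask3)

    mask = mask1 & (mask2 | mask3)
    #print('\n--- mask ---\n')
    #print(mask)

    selected = df[ mask ]
    count = len(selected)
    total = count * price
    subtotal += total
    print(f'{date.date()} | {count:2} active users * ${price:.2f} = {total:.2f} (subtotal: {subtotal:.2f})')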