Most effective way to get data from iterating JSON using Python?

I am working with a lot of JSON data. Two of the datasets in particular look like this:

users = [
  { "id": 1, "name": "Greg Harris", "roles": ["mega-user"] },
  { "id": 2, "name": "Sarah Smith", "roles": ["charger", "rider"] },
  { "id": 7, "name": "Jack Snow", "roles": ["rider"] },
  { "id": 11, "name": "NA", "roles": [] },
  { "id": 18, "name": "Tiffany Denson", "roles": ["beta tester"] },
]

And this:

users2 = [
  {
    'id': 1,
    'name': 'Employee #1',
    'customer_id': 1,
    'activated_on': datetime.date(2018, 11, 4),
    'deactivated_on': datetime.date(2019, 1, 10)
  },
  {
    'id': 2,
    'name': 'Employee #2',
    'customer_id': 1,
    'activated_on': datetime.date(2018, 12, 4),
    'deactivated_on': None
  }
]
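Strictly speaking, `users2` is a Python literal rather than JSON: JSON has no date type, so `datetime.date(...)` values cannot appear in it. If the data really arrives as JSON text, the dates would most likely be strings. A minimal sketch, assuming ISO `"YYYY-MM-DD"` strings and `null` for "not deactivated" (both assumptions about the feed), that parses the text and converts the date fields back to `datetime.date`:

```python
import datetime
import json

# Hypothetical JSON text; the ISO date format and null for "still active" are assumptions.
raw = """
[
  {"id": 1, "name": "Employee #1", "customer_id": 1,
   "activated_on": "2018-11-04", "deactivated_on": "2019-01-10"},
  {"id": 2, "name": "Employee #2", "customer_id": 1,
   "activated_on": "2018-12-04", "deactivated_on": null}
]
"""

def parse_date(value):
    """Convert an ISO date string to datetime.date; keep None (JSON null) as None."""
    return None if value is None else datetime.date.fromisoformat(value)

users2 = json.loads(raw)
for user in users2:
    user['activated_on'] = parse_date(user['activated_on'])
    user['deactivated_on'] = parse_date(user['deactivated_on'])

print(users2[0]['activated_on'])  # 2018-11-04
```

After this conversion the records have the same shape as the `users2` literal above, so the rest of the code works unchanged.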

I need to know how to effectively iterate through them and perform some calculations.

For the first list, how do I iterate over the dictionaries and pull out only the 'name' of users whose 'roles' list contains 'rider'?

For the second list, assuming that is the basic structure, I want to calculate a daily charge for people with active subscriptions for each day of the month. I want to identify which users were active on each day and then multiply the number of active users that day by the daily rate to get the total for the day. The subscription is $4/month, so for a 31-day month one day would look something like this:

2019-01-01  2 active users * $0.129032258 = $0.258064516  (subtotal: $0.258064516)

I also need a total for the entire month.
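The daily rate in the example is the monthly price divided by the number of days in the month ($4 / 31 for January). Instead of hard-coding 0.129032258, it can be derived with `calendar.monthrange`, which also gives the right value for 28-, 29- and 30-day months. A small sketch; `daily_rate` is a hypothetical helper name:

```python
import calendar

def daily_rate(monthly_price, year, month):
    """Spread the monthly price evenly over the days of the given month."""
    # monthrange returns (weekday of the 1st, number of days in the month)
    days_in_month = calendar.monthrange(year, month)[1]
    return monthly_price / days_in_month

rate = daily_rate(4, 2019, 1)   # 4 / 31, about 0.129
print(rate)
print(2 * rate)                 # one day with 2 active users, about 0.258
```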

users2 may also be empty, so I need to handle that case.

For the first one I tried something like this:

for d in users:
    if 'rider' in d['roles']:
        print(d['name'])

It seems to work, but I'm not sure whether there is a better way to go about it. For the second part I am truly lost.

Please help, thanks.

Answer:

For the first list your solution is fine and doesn't need changes.

Optionally, you can write it as a list comprehension (but you don't have to):

selected = [person['name'] for person in users if 'rider' in person['roles']]

for name in selected:
    print(name)

Full working code

users = [
  { "id": 1, "name": "Greg Harris", "roles": ["mega-user"] },
  { "id": 2, "name": "Sarah Smith", "roles": ["charger", "rider"] },
  { "id": 7, "name": "Jack Snow", "roles": ["rider"] },
  { "id": 11, "name": "NA", "roles": [] },
  { "id": 18, "name": "Tiffany Denson", "roles": ["beta tester"] },
]

# Equivalent explicit loop:
# selected = []
# for person in users:
#     if 'rider' in person['roles']:
#         selected.append(person['name'])

# Same result as a list comprehension
selected = [person['name'] for person in users if 'rider' in person['roles']]

#print(selected)

for name in selected:
    print(name)
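If the same filter is needed for different roles, the comprehension can be wrapped in a function. The sketch below (`names_with_any_role` is a hypothetical name) also accepts several roles and returns users that have at least one of them; `user.get('roles', [])` guards against records without a `'roles'` key, which is an assumption about messier real data:

```python
users = [
  { "id": 1, "name": "Greg Harris", "roles": ["mega-user"] },
  { "id": 2, "name": "Sarah Smith", "roles": ["charger", "rider"] },
  { "id": 7, "name": "Jack Snow", "roles": ["rider"] },
  { "id": 11, "name": "NA", "roles": [] },
  { "id": 18, "name": "Tiffany Denson", "roles": ["beta tester"] },
]

def names_with_any_role(users, *wanted):
    """Return names of users that have at least one of the wanted roles."""
    wanted = set(wanted)
    # isdisjoint() is False as soon as the user's roles share an element with `wanted`
    return [user['name'] for user in users
            if not wanted.isdisjoint(user.get('roles', []))]

print(names_with_any_role(users, 'rider'))                 # ['Sarah Smith', 'Jack Snow']
print(names_with_any_role(users, 'rider', 'beta tester'))  # ['Sarah Smith', 'Jack Snow', 'Tiffany Denson']
```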

The same with pandas

import pandas as pd

users = [
  { "id": 1, "name": "Greg Harris", "roles": ["mega-user"] },
  { "id": 2, "name": "Sarah Smith", "roles": ["charger", "rider"] },
  { "id": 7, "name": "Jack Snow", "roles": ["rider"] },
  { "id": 11, "name": "NA", "roles": [] },
  { "id": 18, "name": "Tiffany Denson", "roles": ["beta tester"] },
]

df = pd.DataFrame(users)
print('\n--- dataframe ---\n')
print(df)

mask = df['roles'].apply(lambda x: 'rider' in x)
print('\n--- mask ---\n')
print(mask)

selected = df[ mask ]
print('\n--- selected ---\n')
print(selected['name'])

Result:

--- dataframe ---

   id            name             roles
0   1     Greg Harris       [mega-user]
1   2     Sarah Smith  [charger, rider]
2   7       Jack Snow           [rider]
3  11              NA                []
4  18  Tiffany Denson     [beta tester]

--- mask ---

0    False
1     True
2     True
3    False
4    False
Name: roles, dtype: bool

--- selected ---

1    Sarah Smith
2      Jack Snow
Name: name, dtype: object
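When you need names for many different roles, scanning the whole list once per role repeats work. An alternative (not part of the original answer) is to build a role → names index once with `collections.defaultdict` and then use plain dictionary lookups:

```python
from collections import defaultdict

users = [
  { "id": 1, "name": "Greg Harris", "roles": ["mega-user"] },
  { "id": 2, "name": "Sarah Smith", "roles": ["charger", "rider"] },
  { "id": 7, "name": "Jack Snow", "roles": ["rider"] },
  { "id": 11, "name": "NA", "roles": [] },
  { "id": 18, "name": "Tiffany Denson", "roles": ["beta tester"] },
]

# Build the index once: each role maps to the names of the users that have it.
names_by_role = defaultdict(list)
for user in users:
    for role in user['roles']:
        names_by_role[role].append(user['name'])

print(names_by_role['rider'])          # ['Sarah Smith', 'Jack Snow']
# .get() avoids inserting an empty entry for a role nobody has
print(names_by_role.get('admin', []))  # []
```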

The second list needs a nested for-loop: the outer loop runs over the days of the month and, for every day, the inner loop checks all users.

import datetime

users2 = [
  {
    'id': 1,
    'name': 'Employee #1',
    'customer_id': 1,
    'activated_on': datetime.date(2018, 11, 4),
    'deactivated_on': datetime.date(2019, 1, 10)
  },
  {
    'id': 2,
    'name': 'Employee #2',
    'customer_id': 1,
    'activated_on': datetime.date(2018, 12, 4),
    'deactivated_on': None
  }
]

# --- start from a fixed date (better for tests); the loop adds one day before each check ---
date = datetime.date(2018, 12, 31)
one_day = datetime.timedelta(days=1)
print(date)

price = 0.129032258  # $4 / 31 days
subtotal = 0

for _ in range(31):

    count = 0         # count active persons
    date += one_day   # get next date

    # check every person (an empty users2 simply gives count = 0)

    for person in users2:
        if (person['activated_on'] < date) and (person['deactivated_on'] is None or person['deactivated_on'] > date):
            count += 1

    # display result for one date

    total = count * price
    subtotal += total

    print(f'{date} | {count:2} active users * ${price:.2f} = {total:.2f} (subtotal: {subtotal:.2f})')

Result:

2018-12-31
2019-01-01 |  2 active users * $0.13 = 0.26 (subtotal: 0.26)
2019-01-02 |  2 active users * $0.13 = 0.26 (subtotal: 0.52)
2019-01-03 |  2 active users * $0.13 = 0.26 (subtotal: 0.77)
2019-01-04 |  2 active users * $0.13 = 0.26 (subtotal: 1.03)
2019-01-05 |  2 active users * $0.13 = 0.26 (subtotal: 1.29)
2019-01-06 |  2 active users * $0.13 = 0.26 (subtotal: 1.55)
2019-01-07 |  2 active users * $0.13 = 0.26 (subtotal: 1.81)
2019-01-08 |  2 active users * $0.13 = 0.26 (subtotal: 2.06)
2019-01-09 |  2 active users * $0.13 = 0.26 (subtotal: 2.32)
2019-01-10 |  1 active users * $0.13 = 0.13 (subtotal: 2.45)
2019-01-11 |  1 active users * $0.13 = 0.13 (subtotal: 2.58)
2019-01-12 |  1 active users * $0.13 = 0.13 (subtotal: 2.71)
2019-01-13 |  1 active users * $0.13 = 0.13 (subtotal: 2.84)
2019-01-14 |  1 active users * $0.13 = 0.13 (subtotal: 2.97)
2019-01-15 |  1 active users * $0.13 = 0.13 (subtotal: 3.10)
2019-01-16 |  1 active users * $0.13 = 0.13 (subtotal: 3.23)
2019-01-17 |  1 active users * $0.13 = 0.13 (subtotal: 3.35)
2019-01-18 |  1 active users * $0.13 = 0.13 (subtotal: 3.48)
2019-01-19 |  1 active users * $0.13 = 0.13 (subtotal: 3.61)
2019-01-20 |  1 active users * $0.13 = 0.13 (subtotal: 3.74)
2019-01-21 |  1 active users * $0.13 = 0.13 (subtotal: 3.87)
2019-01-22 |  1 active users * $0.13 = 0.13 (subtotal: 4.00)
2019-01-23 |  1 active users * $0.13 = 0.13 (subtotal: 4.13)
2019-01-24 |  1 active users * $0.13 = 0.13 (subtotal: 4.26)
2019-01-25 |  1 active users * $0.13 = 0.13 (subtotal: 4.39)
2019-01-26 |  1 active users * $0.13 = 0.13 (subtotal: 4.52)
2019-01-27 |  1 active users * $0.13 = 0.13 (subtotal: 4.65)
2019-01-28 |  1 active users * $0.13 = 0.13 (subtotal: 4.77)
2019-01-29 |  1 active users * $0.13 = 0.13 (subtotal: 4.90)
2019-01-30 |  1 active users * $0.13 = 0.13 (subtotal: 5.03)
2019-01-31 |  1 active users * $0.13 = 0.13 (subtotal: 5.16)
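The same loop can be packaged as a function that works for any month and returns the monthly total, which also covers the empty-`users2` case (with no users every day counts 0, so the total is 0). This sketch uses `fractions.Fraction` for the daily rate so the running subtotal is exact instead of accumulating float rounding; that choice and the names `is_active` and `daily_rows` are assumptions for illustration. The active-day rule is the same as in the loop above: strictly after `activated_on` and strictly before `deactivated_on`.

```python
import calendar
import datetime
from fractions import Fraction

users2 = [
    {'id': 1, 'name': 'Employee #1', 'customer_id': 1,
     'activated_on': datetime.date(2018, 11, 4),
     'deactivated_on': datetime.date(2019, 1, 10)},
    {'id': 2, 'name': 'Employee #2', 'customer_id': 1,
     'activated_on': datetime.date(2018, 12, 4),
     'deactivated_on': None},
]

def is_active(user, day):
    """Same rule as the loop above: activated before `day`, not yet deactivated on `day`."""
    deactivated = user['deactivated_on']
    return user['activated_on'] < day and (deactivated is None or deactivated > day)

def daily_rows(users, year, month, monthly_price=4):
    """Return (day, active_count, day_total, running_subtotal) for each day of the month."""
    days_in_month = calendar.monthrange(year, month)[1]
    rate = Fraction(monthly_price) / days_in_month  # exact daily rate, e.g. 4/31
    rows = []
    subtotal = Fraction(0)
    for day_number in range(1, days_in_month + 1):
        day = datetime.date(year, month, day_number)
        count = sum(1 for user in users if is_active(user, day))  # 0 when users is empty
        day_total = count * rate
        subtotal += day_total
        rows.append((day, count, day_total, subtotal))
    return rows

rows = daily_rows(users2, 2019, 1)
for day, count, day_total, subtotal in rows:
    print(f'{day} | {count:2} active users = ${float(day_total):.9f} '
          f'(subtotal: ${float(subtotal):.9f})')
print(f'Total for the month: ${float(rows[-1][3]):.2f}')  # $5.16
```

Converting to `float` only for display keeps the arithmetic exact; if you need currency rounding rules, `decimal.Decimal` would be the usual alternative.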

The same with pandas, using pd.date_range to generate the days:

import pandas as pd
import datetime

users2 = [
  {
    'id': 1,
    'name': 'Employee #1',
    'customer_id': 1,
    'activated_on': datetime.date(2018, 11, 4),
    'deactivated_on': datetime.date(2019, 1, 10)
  },
  {
    'id': 2,
    'name': 'Employee #2',
    'customer_id': 1,
    'activated_on': datetime.date(2018, 12, 4),
    'deactivated_on': None
  }
]

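# explicit columns so that an empty users2 still has the columns used below
df = pd.DataFrame(users2, columns=['id', 'name', 'customer_id', 'activated_on', 'deactivated_on'])

# convert datetime.date objects to datetime64 (None becomes NaT) so they
# can be compared with the Timestamps produced by pd.date_range
df['activated_on'] = pd.to_datetime(df['activated_on'])
df['deactivated_on'] = pd.to_datetime(df['deactivated_on'])

print('\n--- dataframe ---\n')
print(df)
print()

price = 0.129032258  # $4 / 31 days
subtotal = 0

for date in pd.date_range('2019-01-01', periods=31):

    mask1 = (df['activated_on'] < date)        # activated before this day
    mask2 = (df['deactivated_on'].isnull())    # never deactivated
    mask3 = (df['deactivated_on'] > date)      # deactivated after this day

    mask = mask1 & (mask2 | mask3)

    selected = df[ mask ]

    count = len(selected)

    total = count * price
    subtotal += total

    print(f'{date.date()} | {count:2} active users * ${price:.2f} = {total:.2f} (subtotal: {subtotal:.2f})')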
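Both versions compare every user with every day, i.e. users × days checks. Because "active" is an interval (strictly after `activated_on`, strictly before `deactivated_on`), an alternative, not from the original answer, is to clip each user's interval to the month and count the active days directly, then multiply by the daily rate. This gives only the monthly total, not the per-day lines. A sketch using the same boundary rule as the loops above; `active_days_in_month` is a hypothetical name:

```python
import calendar
import datetime

ONE_DAY = datetime.timedelta(days=1)

users2 = [
    {'id': 1, 'name': 'Employee #1', 'customer_id': 1,
     'activated_on': datetime.date(2018, 11, 4),
     'deactivated_on': datetime.date(2019, 1, 10)},
    {'id': 2, 'name': 'Employee #2', 'customer_id': 1,
     'activated_on': datetime.date(2018, 12, 4),
     'deactivated_on': None},
]

def active_days_in_month(user, year, month):
    """Count days in the month with activated_on < day < deactivated_on."""
    first = datetime.date(year, month, 1)
    last = datetime.date(year, month, calendar.monthrange(year, month)[1])
    start = max(first, user['activated_on'] + ONE_DAY)  # first day strictly after activation
    end = last
    if user['deactivated_on'] is not None:
        end = min(last, user['deactivated_on'] - ONE_DAY)  # last day strictly before deactivation
    return max(0, (end - start).days + 1)  # 0 when the interval misses the month

user_days = sum(active_days_in_month(u, 2019, 1) for u in users2)  # 0 for an empty list
price_per_day = 4 / calendar.monthrange(2019, 1)[1]
print(user_days)                              # 40 (9 days + 31 days)
print(f'${user_days * price_per_day:.2f}')    # $5.16
```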