Home > Enterprise >  Pandas, Python - Assembling a Data Frame with multiple lists from loop
Pandas, Python - Assembling a Data Frame with multiple lists from loop

Time:03-11

Using loop to collect target data into lists from JSON file. These lists are organized as columns and their values are organized; thus, no manipulation/reorganization is required. Only attaching them horizontally.

#Selecting Data into List
i=1
target = f'{pathway}\calls_{i}.json'
with open(target,'r') as f: #Reading JSON file
    data = json.load(f)

    specsA=('PreviousDraws',['DrawNumber'])
    draw=(glom(data,specsA)) #list type; glom is a package to access nested data in JSON file.
    print(draw)

    for j in range(0,5):
        specsB=('PreviousDraws',['WinningNumbers'],[f'{j}'],['Number'])
        number=(glom(data,specsB)) #list type; glom is a package to access nested data in JSON file.
        print(number)

    #Now assembling lists into a table using pandas

The resulting lists from the code above are as followed below:

#This is from variable draw
[10346, 10345, 10344, 10343, 10342, 10341, 10340, 10339, 10338, 10337, 10336, 10335, 10334, 10333, 10332, 10331, 10330, 10329, 10328, 10327]

#This is from variable number 
['22', '9', '4', '1', '1', '14', '5', '3', '2', '8', '2', '1', '4', '9', '4', '4', '3', '13', '7', '14']
['28', '18', '16', '2', '3', '17', '16', '13', '11', '9', '8', '2', '9', '19', '7', '13', '7', '23', '21', '17']
['33', '24', '21', '4', '9', '20', '27', '19', '23', '19', '19', '7', '19', '30', '19', '27', '19', '32', '26', '21']
['35', '30', '28', '11', '21', '23', '33', '26', '35', '37', '27', '12', '20', '31', '22', '34', '22', '36', '27', '25']
['36', '32', '33', '19', '29', '38', '35', '27', '37', '38', '32', '30', '22', '36', '33', '39', '36', '38', '30', '27']

Expected Data Frame table after assembly:

Draw  |  Number[0]  |  Number[1]  |  Number[2] ...
10346 |  22         |   28        |
10345 |  9          |   18        |
10344 |  4          |   16        |
10343 |  1          |   2         |
10342 |  1          |   3         |

My attempt at assembling the table: Organize as dictionary with Series, below:

dct = {'DrawNumbers':pd.Series(draw),
        'Index1':pd.Series(number),
        'Index2':pd.Series(number),
        'Index3':pd.Series(number),
        'Index4':pd.Series(number),
        'Index5':pd.Series(number)
        }

df = pd.DataFrame(dct)
print(df)

Actual result - incorrect due to last list's value being repeated in table's row. So far, only Index5 column is correct, while all index columns are incorrectly represented with index 5's values.

    DrawNumbers Index1 Index2 Index3 Index4 Index5
0         10346     36     36     36     36     36
1         10345     32     32     32     32     32
2         10344     33     33     33     33     33
3         10343     19     19     19     19     19
4         10342     29     29     29     29     29
5         10341     38     38     38     38     38
6         10340     35     35     35     35     35
7         10339     27     27     27     27     27
8         10338     37     37     37     37     37
9         10337     38     38     38     38     38
...        ...      ...    ...    ...    ...    ...

Also had tried to change the data type of the number from string to int, but having repeated errors attempted that. Either way, I am stuck and would like to request for assistance.

CodePudding user response:

The problem is that you are overwriting the number variable in the loop, so is no longer available at the end of each iteration, I add a solution adding the column Index in each iteration.

# create an empty dataframe
df = pd.DataFrame()

#Selecting Data into List
i=1
target = f'{pathway}\calls_{i}.json'
with open(target,'r') as f: #Reading JSON file
    data = json.load(f)

    specsA=('PreviousDraws',['DrawNumber'])
    draw=(glom(data,specsA)) #list type; glom is a package to access nested data in JSON file.
    print(draw)

    # insert the draw to the dataframe
    df['DrawNumbers'] = draw

    for j in range(0,5):
        specsB=('PreviousDraws',['WinningNumbers'],[f'{j}'],['Number'])
        number=(glom(data,specsB)) #list type; glom is a package to access nested data in JSON file.
        print(number)
        # insert each number to the dataframe
        df[f'Index{j}'] = number

CodePudding user response:

Assuming that number is a nested list:

number = list(map(list, zip(*number))) # this transposes the nested list so that each list within the list now corresponds to one row of the desired df
pd.DataFrame(data=number, index=draw)

This will output the df in the desired format. Of course you can go ahead and label the columns as you like, etc.

  • Related