Why does having a list inside of a list give me a list indices error?

Hey, so I am new to data structures and wanted to try out a 2D array in Pandas, but I am getting an error in the following code:

import pandas as pd
from datetime import datetime as dt

ls = [[dt(2021, 9, 4).strftime("%d-%m-%Y"), "WTM", 62, 100, 64, 100, 86, 100, 212, 300], [dt(2021, 9, 5).strftime("%d-%m-%Y"), "WTA", 48, 60, 39, 60, 31, 60, 118, 180]
    [dt(2021, 10, 23).strftime("%d-%m-%Y"), "WTM", 7, 100, 27, 100, 47, 100, 81, 300]]

data = pd.DataFrame(ls, columns=['Exam Date', 'Exam Type', 'Maths', 'Max Marks', 'Chemistry', 'Max Marks', 'Physics', 'Max Marks', 'Total', 'Max Marks'])

The error I received is:

TypeError: list indices must be integers or slices, not tuple

So what did I do wrong? Thanks.

CodePudding user response：

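You've missed a comma after the second inner list. Without it, Python reads the third list as an index applied to the second one (i.e. [...][...]), and its comma-separated contents become a tuple used as that index. That is why the message says "list indices must be integers or slices, not tuple".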
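
To see the mechanism in isolation, here is a minimal sketch with made-up numbers (the names inner and message are only for illustration). Indexing a list with comma-separated values is what the missing comma turns the third row into:

```python
# With the comma, this is simply a list of three lists.
rows_ok = [[1, 2], [3, 4], [5, 6]]

# Without the comma, "[3, 4] [5, 6]" means "index the list [3, 4]
# with the tuple (5, 6)". The same operation, written via a variable:
inner = [3, 4]
try:
    inner[5, 6]
except TypeError as exc:
    message = str(exc)

print(message)
```

Newer Python versions may also show a SyntaxWarning hinting at a missed comma when the two list literals sit directly next to each other.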

CodePudding user response：

Let's start with a detail: dt is also the name of the datetime accessor in Pandas (Series.dt). Nothing actually breaks, but it is "cleaner" to import datetime under another name, e.g.:

from datetime import datetime as dtm

Then the only change to your source data is that I added a comma after the second row:

ls = [
    [dtm(2021,  9,  4).strftime("%d-%m-%Y"), "WTM", 62, 100, 64, 100, 86, 100, 212, 300],
    [dtm(2021,  9,  5).strftime("%d-%m-%Y"), "WTA", 48,  60, 39,  60, 31,  60, 118, 180],
    [dtm(2021, 10, 23).strftime("%d-%m-%Y"), "WTM",  7, 100, 27, 100, 47, 100,  81, 300]]

Then I ran your DataFrame-creation code unchanged and got the proper result.

One more remark: you should avoid repeating column names.

With the columns named the way you did, running data[['Max Marks']] returns all 4 Max Marks columns, so you cannot pick out a single one of them by name.
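
A small sketch of the difference, using a cut-down version of the data. The names "Maths Max" and "Chemistry Max" are just one possible naming I made up, not something from the question:

```python
import pandas as pd

rows = [["WTM", 62, 100, 64, 100],
        ["WTA", 48, 60, 39, 60]]

# Repeated label: the selection returns every column called 'Max Marks'.
dup = pd.DataFrame(rows, columns=["Exam Type", "Maths", "Max Marks",
                                  "Chemistry", "Max Marks"])
dup_selection = dup[["Max Marks"]]
print(dup_selection.shape)   # two columns come back, not one

# Unique (hypothetical) labels: each column can be addressed on its own.
uniq = pd.DataFrame(rows, columns=["Exam Type", "Maths", "Maths Max",
                                   "Chemistry", "Chemistry Max"])
maths_max = uniq["Maths Max"]
print(maths_max.tolist())
```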

Yet another hint is to use the Pandas-native Timestamp type instead of datetime formatted as a string. You can define your source data as:

ls = [
    [pd.Timestamp(2021,  9,  4), "WTM", 62, 100, 64, 100, 86, 100, 212, 300],
    [pd.Timestamp(2021,  9,  5), "WTA", 48,  60, 39,  60, 31,  60, 118, 180],
    [pd.Timestamp(2021, 10, 23), "WTM",  7, 100, 27, 100, 47, 100,  81, 300]]

Then the printout of the DataFrame looks almost the same, except that Exam Date is printed in yyyy-mm-dd format. The real gain is that you can use date methods to select and/or group the source data.
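
As a rough illustration of what "date methods" can mean here, the sketch below keeps only the Exam Date and Total columns (a simplification of mine) and uses the .dt accessor to filter and group by month:

```python
import pandas as pd

data = pd.DataFrame({
    "Exam Date": [pd.Timestamp(2021, 9, 4),
                  pd.Timestamp(2021, 9, 5),
                  pd.Timestamp(2021, 10, 23)],
    "Total": [212, 118, 81],
})

# Select only the exams taken in September.
september = data[data["Exam Date"].dt.month == 9]

# Average total per calendar month (the index is the month number).
monthly_mean = data.groupby(data["Exam Date"].dt.month)["Total"].mean()

print(september)
print(monthly_mean)
```

With dates stored as "%d-%m-%Y" strings, neither operation would work without parsing the strings back into dates first.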

Consider also using a MultiIndex on columns, creating the DataFrame as:

data = pd.DataFrame(ls, columns=pd.MultiIndex.from_tuples([
    ('Exam Date', ''),       ('Exam Type', ''),
    ('Maths',     'Result'), ('Maths',     'Max'),
    ('Chemistry', 'Result'), ('Chemistry', 'Max'),
    ('Physics',   'Result'), ('Physics',   'Max'),
    ('Total',     'Result'), ('Total',     'Max')]))

Then the DataFrame has "structural" column names. The first level is the subject name and the second is either the actual result or the Max result:

   Exam Date Exam Type  Maths      Chemistry      Physics       Total     
                       Result  Max    Result  Max  Result  Max Result  Max
0 2021-09-04       WTM     62  100        64  100      86  100    212  300
1 2021-09-05       WTA     48   60        39   60      31   60    118  180
2 2021-10-23       WTM      7  100        27  100      47  100     81  300
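
To show what the two-level columns buy you, here is a sketch on a reduced version of the frame (only Exam Type, Maths and Chemistry, to keep it short). The variable names are mine:

```python
import pandas as pd

ls = [["WTM", 62, 100, 64, 100],
      ["WTA", 48, 60, 39, 60],
      ["WTM", 7, 100, 27, 100]]

data = pd.DataFrame(ls, columns=pd.MultiIndex.from_tuples([
    ("Exam Type", ""),
    ("Maths", "Result"), ("Maths", "Max"),
    ("Chemistry", "Result"), ("Chemistry", "Max")]))

# Everything about one subject: a sub-frame with 'Result' and 'Max'.
maths = data["Maths"]

# One specific column, addressed by a (subject, kind) tuple.
maths_pct = 100 * data[("Maths", "Result")] / data[("Maths", "Max")]

# All 'Max' columns at once, taken as a cross-section of the second level.
max_marks = data.xs("Max", axis=1, level=1)

print(maths)
print(maths_pct.tolist())
print(max_marks)
```

This also avoids the repeated 'Max Marks' label, because every column now has a unique (subject, kind) pair.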