Hey, I am new to data structures and wanted to try out a 2D array in pandas, but I am getting an error in the following code:
import pandas as pd
from datetime import datetime as dt
ls = [[dt(2021, 9, 4).strftime("%d-%m-%Y"), "WTM", 62, 100, 64, 100, 86, 100, 212, 300], [dt(2021, 9, 5).strftime("%d-%m-%Y"), "WTA", 48, 60, 39, 60, 31, 60, 118, 180]
[dt(2021, 10, 23).strftime("%d-%m-%Y"), "WTM", 7, 100, 27, 100, 47, 100, 81, 300]]
data = pd.DataFrame(ls, columns=['Exam Date', 'Exam Type', 'Maths', 'Max Marks', 'Chemistry', 'Max Marks', 'Physics', 'Max Marks', 'Total', 'Max Marks'])
The error I received is:
TypeError: list indices must be integers or slices, not tuple
So what did I do wrong? Thanks.
CodePudding user response:
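You've missed a comma (,) after the second inner list. Without it, Python reads the third pair of square brackets as an index applied to the second list, and since the values inside are separated by commas, that index is a tuple.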
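To see why a missing comma gives this particular message, here is a minimal sketch. The variable `row` and its values are just stand-ins for the second inner list from the question; indexing it with several comma-separated values reproduces the same error.

```python
# Stand-in for the second inner list in the question (values are illustrative).
row = [48, 60, 39]

try:
    # Same shape as `[...] [7, 100, 27]` without a comma in between:
    # the bracketed values form a tuple that is used as the list index.
    row[7, 100, 27]
    message = None
except TypeError as exc:
    message = str(exc)

print(message)
```

Adding the comma turns the third pair of brackets back into a separate list element, so no indexing happens.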
CodePudding user response:
Let's start with a detail: dt is also the name of the datetime accessor in pandas (Series.dt). So it is "cleaner" to import datetime under another name, e.g.:
from datetime import datetime as dtm
Then the only change to your source data is that I added a comma after the second row:
ls = [
[dtm(2021, 9, 4).strftime("%d-%m-%Y"), "WTM", 62, 100, 64, 100, 86, 100, 212, 300],
[dtm(2021, 9, 5).strftime("%d-%m-%Y"), "WTA", 48, 60, 39, 60, 31, 60, 118, 180],
[dtm(2021, 10, 23).strftime("%d-%m-%Y"), "WTM", 7, 100, 27, 100, 47, 100, 81, 300]]
Then I ran your code and got the proper result.
One more remark: you should not use repeated column names.
With the columns named just as you did, running data[['Max Marks']]
returns all 4 Max Marks columns.
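A shortened sketch of the effect, using only two subjects. The unique names "Maths Max" and "Chemistry Max" are my own suggestion, not something required by pandas:

```python
import pandas as pd

rows = [["WTM", 62, 100, 64, 100],
        ["WTA", 48, 60, 39, 60]]

# Repeated name: selecting 'Max Marks' returns every column with that label.
dup = pd.DataFrame(rows, columns=["Exam Type", "Maths", "Max Marks",
                                  "Chemistry", "Max Marks"])
selected = dup[["Max Marks"]]
print(selected.shape)  # two rows, two 'Max Marks' columns

# Unique (hypothetical) names: each selection is unambiguous.
unique = pd.DataFrame(rows, columns=["Exam Type", "Maths", "Maths Max",
                                     "Chemistry", "Chemistry Max"])
maths_max = unique["Maths Max"]
print(maths_max.tolist())
```

With unique names, unique["Maths Max"] is a single Series, so there is no doubt which maximum you are working with.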
Yet another hint is to use the native pandas Timestamp type instead of date strings built with datetime. You can define your source data as:
ls = [
[pd.Timestamp(2021, 9, 4), "WTM", 62, 100, 64, 100, 86, 100, 212, 300],
[pd.Timestamp(2021, 9, 5), "WTA", 48, 60, 39, 60, 31, 60, 118, 180],
[pd.Timestamp(2021, 10, 23), "WTM", 7, 100, 27, 100, 47, 100, 81, 300]]
Then the printout of the DataFrame looks almost the same, except that Exam Date is printed in yyyy-mm-dd format. The real gain is that you can use date methods to select and/or group the source data.
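As a rough illustration of selecting and grouping by date, here is a sketch on a cut-down frame that keeps only the date, the exam type and the total. The names `exams`, `september` and `per_month` are made up for this example:

```python
import pandas as pd

# Cut-down version of the source data: date, exam type and total only.
exams = pd.DataFrame({
    "Exam Date": [pd.Timestamp(2021, 9, 4),
                  pd.Timestamp(2021, 9, 5),
                  pd.Timestamp(2021, 10, 23)],
    "Exam Type": ["WTM", "WTA", "WTM"],
    "Total": [212, 118, 81],
})

# Select: rows for exams taken in September, via the .dt accessor.
september = exams[exams["Exam Date"].dt.month == 9]

# Group: sum of Total per calendar month.
per_month = exams.groupby(exams["Exam Date"].dt.month)["Total"].sum()

print(september)
print(per_month)
```

None of this works if Exam Date holds strings produced by strftime, because the .dt accessor requires a datetime-like column.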
Consider also using a MultiIndex on columns, creating the DataFrame as:
data = pd.DataFrame(ls, columns=pd.MultiIndex.from_tuples([
('Exam Date', ''), ('Exam Type', ''),
('Maths', 'Result'), ('Maths', 'Max'),
('Chemistry', 'Result'), ('Chemistry', 'Max'),
('Physics', 'Result'), ('Physics', 'Max'),
('Total', 'Result'), ('Total', 'Max')]))
Then the DataFrame has "structural" column names. The first level is the subject name and the second level is either the actual result or the maximum:
   Exam Date Exam Type  Maths      Chemistry      Physics       Total
                       Result  Max    Result  Max  Result  Max Result  Max
0 2021-09-04       WTM     62  100        64  100      86  100    212  300
1 2021-09-05       WTA     48   60        39   60      31   60    118  180
2 2021-10-23       WTM      7  100        27  100      47  100     81  300
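To give an idea of what the two-level columns buy you, here is a sketch on a reduced frame (two subjects, two rows). The name `marks` and the reduced column list are only for illustration:

```python
import pandas as pd

cols = pd.MultiIndex.from_tuples([
    ("Exam Type", ""),
    ("Maths", "Result"), ("Maths", "Max"),
    ("Physics", "Result"), ("Physics", "Max")])
marks = pd.DataFrame([["WTM", 62, 100, 86, 100],
                      ["WTA", 48, 60, 31, 60]], columns=cols)

# All columns of one subject: a DataFrame with columns Result and Max.
maths = marks["Maths"]

# A single column, addressed by both levels: a Series.
maths_result = marks[("Maths", "Result")]

# One second-level label across all subjects: columns Maths and Physics.
all_max = marks.xs("Max", axis=1, level=1)

print(maths)
print(all_max)
```

Compared with repeated 'Max Marks' names, each maximum is now tied to its subject and can be picked out on its own.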