Why is my pandas doing math incorrectly? Two positives are multiplying to equal a negative

Time:10-05

This... is an odd situation I'm dealing with.

So I built a function to pore through a pdf and collect info so I can build a df and do some math on it.

Everything's great. I got all the information and I built the df, but the issue is that it's not doing the math right. I'll show you the function I made although, as you can imagine, it wouldn't work since you don't have the pdf.

import pandas as pd
import PyPDF2

def pdf_open(person):
    role_list = []
    role_index = [0, 3, 6, 9, 12, 15, 18, 21, 24]
    self_list = []
    self_index = [1, 4, 7, 10, 13, 16, 19, 22, 25]
    obs_list = []
    obs_index = [2, 5, 8, 11, 14, 17, 20, 23, 26]
    with open(person, 'rb') as pdf_file:
        pdfReader = PyPDF2.PdfFileReader(pdf_file)
        pageObj = pdfReader.getPage(4)
        report = pageObj.extractText()
        report = report.replace('Resource Investigator','Resource-Investigator')
        report = report.replace('Completer Finisher', 'Completer-Finisher')
        report = report.replace('Monitor Evaluator','Monitor-Evaluator')
        report_list = report.split('(Percentile) (Percentile)\n')
        report_list = report_list[1][27:]
        report_list = report_list.replace('\n', ' ')
        report_list = report_list.split(' ')
        role_list = [report_list[i] for i in role_index]
        self_list = [report_list[i] for i in self_index]
        obs_list = [report_list[i] for i in obs_index]
        data = [role_list]
        col_list = ['Role', 'Self-Perception Percentile', 'Observed Percentile']
        df = pd.DataFrame(columns=col_list)
        df['Role'] = role_list
        df['Self-Perception Percentile'] = self_list
        df['Observed Percentile'] = obs_list
        df['Self-Perception Percentile'] = df['Self-Perception Percentile'].astype('int8')
        df['Observed Percentile'] = df['Observed Percentile'].astype('int8')
        df['Self-Perception Percentile'] = abs(df['Self-Perception Percentile'])
        df['Observed Percentile'] = abs(df['Observed Percentile'].astype('int8'))
        df['Weighted List'] = (df['Observed Percentile']*2)
        print(df)

Here's the result I have.

Picture with new column

It doesn't take a mathematician to know that multiplying two positives doesn't give a negative number. The math for most of the rows is correct, but you can see the issue.

I even did little things such as getting the absolute value to ensure all values are positive.

Is there anything I should be doing that I'm not?

CodePudding user response:

Arithmetic overflow. The value 96*2 = 192 does not fit into a signed int8, which only holds -128 to 127, so it wraps around to -64. You either need uint8 (0 to 255) or a larger signed type such as int16.
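
A minimal sketch of the wrap-around and one possible fix. The sample values are made up for illustration (only 96 comes from the case above), and `observed`, `wrapped` and `widened` are hypothetical names, not part of the original function.

```python
import pandas as pd

# Hypothetical percentile values; 96 is the value mentioned in the answer.
observed = pd.Series([10, 50, 96], dtype="int8")

# Multiplying by a plain Python int keeps the int8 dtype,
# so 192 does not fit and silently wraps around.
wrapped = observed * 2

# Widening the column first leaves room for the doubled values.
widened = observed.astype("int16") * 2

print(wrapped.tolist())  # [20, 100, -64]
print(widened.tolist())  # [20, 100, 192]
```

In the function from the question, this would presumably mean changing the two astype('int8') calls to astype('int16') (or simply astype(int)) before computing 'Weighted List'.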
