IndexError: too many indices for array: array is 1-dimensional, but 2 were indexed. Works for first

I know this error message already has posts about it, but I can't see what's wrong with my code. The block works fine for the first two passes of the loop and then fails. I've even tried removing the data for the first two passes to rule out a problem specific to the third, but no luck. When I print the unsorted temporary list, it is just an empty array on the third pass.
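An empty list on the third pass is what you would expect if the loop counter drifts. n starts at d[0][0] and grows by repeated additions of 0.1, and binary floating point cannot represent 0.1 exactly, so after a few additions the exact test row[0] == n can stop matching anything. The sketch below uses hypothetical column-0 values starting at 0.1 (the real data is only on Pastebin) to reproduce that chain, from the failed comparison to the IndexError, and shows that a tolerance-based comparison (np.isclose, swapped in here for illustration) finds every group:

```python
import numpy as np

# Hypothetical column-0 values forming three groups: 0.1, 0.2 and 0.3.
col0 = np.array([0.1, 0.1, 0.2, 0.2, 0.3, 0.3])

n = col0[0]
counts = []
for _ in range(3):
    # Exact equality, like the question's "if row[0] == n" test.
    tmp = [x for x in col0 if x == n]
    counts.append(len(tmp))
    n = n + 0.1  # rounding error accumulates: 0.1 + 0.1 + 0.1 != 0.3

print(counts)  # [2, 2, 0]

# tmp from the third pass is empty, so np.array(tmp) is 1-D with shape (0,)
# and two-index access raises the error from the title.
arr = np.array(tmp)
try:
    arr[:, 2]
    raised = False
except IndexError as exc:
    raised = True
    print("IndexError:", exc)

# Comparing with a tolerance instead of == finds all three groups.
n = col0[0]
tolerant = []
for _ in range(3):
    tolerant.append(int(np.sum(np.isclose(col0, n))))
    n = n + 0.1

print(tolerant)  # [2, 2, 2]
```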

Sorry for the wall of comments in my code, but I'd rather have each line commented than cause confusion over what I'm trying to accomplish.

TL;DR: I'm trying to find and remove outliers from a list of data, evaluated separately within each group of entries that share the same value in column 0.
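Stated as a formula, the test the script below is meant to apply within each group g (the rows sharing one column-0 value) is the following, where x_{i,0} and x_{i,1} are the two CSV columns, t = 1 as set in the script, and σ_g is the population standard deviation, which is what np.std returns by default:

```latex
d_i = \operatorname{round}(x_{i,0} - x_{i,1},\ 1), \qquad
\mu_g = \frac{1}{N_g} \sum_{i \in g} d_i, \qquad
\sigma_g = \sqrt{\frac{1}{N_g} \sum_{i \in g} (d_i - \mu_g)^2}

z_i = \frac{d_i - \mu_g}{\sigma_g}, \qquad \text{row } i \text{ is an outlier if } |z_i| > t
```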

Pastebin with data

import csv
import datetime

import numpy as np

#Declare unsorted data array
d_us = []
#Declare temporary array for use in loop
tmp = []
#Declare sorted data array
d = []
#Declare Sum variable
tot = 0
#Declare Mean variable
m = 0
#declare sorted final array
sort = []
#Declare number of STDs
t = 1
#Declare Standard Deviation variable
std = 0
#Declare z-score variable
z_score = 0

#Timestamp for output files
nts = datetime.datetime.now().timestamp()

#Create output file
with open(f"calib_temp-{nts}.csv", 'w') as ctw:
    pass
    
#Read data from CSV
with open("test.csv", 'r', newline='') as drh:
    fr_rh = csv.reader(drh, delimiter=',')
    for row in fr_rh:
        #append data to unsorted array
        d_us.append([float(row[0]),float(row[1])])

#Sort rows by the first column (np.sort would sort the values inside each row instead)
d_us = np.array(d_us)
d = d_us[np.argsort(d_us[:, 0])]

#Calculate the range of the data
l = round((d[-1][0] - d[0][0]) * 10)

#Declare the starting value
s = d[0][0]
#Declare the ending value
e = d[-1][0]
#Set the while loop counter
n = d[0][0]

#Iterate through data
while n <= e:   
    #Create array with difference column
    for row in d:
        if row[0] == n:
            diff = round(row[0] - row[1], 1)
            tmp.append([row[0],row[1],diff])    
    #Convert to numpy array
    tmp = np.array(tmp)
    #Sort numpy array
    sort = tmp[np.argsort(tmp[:,2])]
    #Calculate sum of differences
    for row in tmp:
        tot = tot + row[2]
    #Calculate mean
    m = tot / len(tmp)
    #Calculate Standard Deviation
    std = np.std(tmp[:,2])
    #Calculate outliers and write to output file
    for y in tmp:
        z_score = (y[2] - m)/std
        if np.abs(z_score) > t:
            with open(f"calib_temp-{nts}.csv", 'a', newline='') as ct:
                c = csv.writer(ct, delimiter = ',')
                c.writerow([y[0],y[1]])
    #Reset Variables
    tot = 0
    m = 0
    n = n + 0.1
    tmp = []
    std = 0
    z_score = 0
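When sorting the rows by column 0, note that np.sort on a 2-D input sorts along the last axis, so it reorders the two values inside each row instead of reordering the rows. Indexing with np.argsort of column 0 keeps each pair intact. A sketch with hypothetical rows:

```python
import numpy as np

# Hypothetical [column 0, column 1] rows, deliberately out of order.
rows = [[0.2, 5.0], [0.1, 0.05], [0.3, 0.1]]

# np.sort works along the last axis: it sorts the values *within* each row.
within_rows = np.sort(rows)

# argsort on column 0 yields a row order; indexing with it keeps pairs together.
arr = np.array(rows)
by_first_col = arr[np.argsort(arr[:, 0], kind="stable")]

print(within_rows.tolist())   # [[0.2, 5.0], [0.05, 0.1], [0.1, 0.3]]
print(by_first_col.tolist())  # [[0.1, 0.05], [0.2, 5.0], [0.3, 0.1]]
```

kind="stable" keeps rows with equal column-0 values in their original file order, which is convenient when grouping but not required for correctness here.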

Answer:

Do this before the loop:

#Create output file
ct = open(f"calib_temp-{nts}.csv", 'w', newline='')
c = csv.writer(ct, delimiter = ',')
#Call ct.close() once the loop has finished

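Then change the loop to the following. I have moved your resets to the top of the loop, so each variable is initialized in one place instead of twice. The if tmp: check is what avoids the exception: when no row matches n, tmp is still an empty list, np.array([]) is one-dimensional, and tmp[:,2] then raises the IndexError.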

#Iterate through data
while n <= e:   
    tot = 0
    m = 0
    tmp = []
    std = 0
    z_score = 0

    #Create array with difference column
    for row in d:
        if row[0] == n:
            diff = round(row[0] - row[1], 1)
            tmp.append([row[0],row[1],diff])    
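    #Skip values of n that matched no rows: an empty tmp would become a 1-D array
    if tmp:
        #Convert to numpy array
        tmp = np.array(tmp)
        #Sort numpy array by the difference column
        sort = tmp[np.argsort(tmp[:,2])]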
        #Calculate sum of differences
        for row in tmp:
            tot = tot + row[2]
        #Calculate mean
        m = tot / len(tmp)
        #Calculate Standard Deviation
        std = np.std(tmp[:,2])
        #Calculate outliers and write to output file
        for y in tmp:
            z_score = (y[2] - m)/std
            if np.abs(z_score) > t:
                c.writerow([y[0],y[1]])
    #Advance to the next value of column 0
    n = n + 0.1
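The if tmp: check stops the crash, but a value of n that has drifted through repeated additions of 0.1 can still skip a group silently. One way to avoid the counter altogether is to group on the distinct values actually present in column 0 with np.unique. This is a minimal sketch, assuming the input is a two-column numeric array like the one read from test.csv; the function name flag_outliers and the sample rows are hypothetical:

```python
import numpy as np


def flag_outliers(data, t=1.0):
    """Return the rows of a 2-column array whose difference column has |z| > t.

    Hypothetical helper: column 0 is the group key, column 1 the measured value.
    Groups are the distinct values of column 0, so no floating-point counter is used.
    """
    data = np.asarray(data, dtype=float)
    # Difference column rounded to one decimal, as in the question's loop.
    diff = np.round(data[:, 0] - data[:, 1], 1)
    flagged = []
    for key in np.unique(data[:, 0]):
        # Exact match is safe here because key was taken from the data itself.
        mask = data[:, 0] == key
        group = diff[mask]
        std = group.std()  # population std (ddof=0), the np.std default
        if np.isclose(std, 0.0):
            # No spread (e.g. a single row): z-scores are undefined, flag nothing.
            continue
        z = (group - group.mean()) / std
        flagged.append(data[mask][np.abs(z) > t])
    return np.vstack(flagged) if flagged else np.empty((0, 2))


if __name__ == "__main__":
    sample = [[1.0, 0.9], [1.0, 0.8], [1.0, 0.9], [1.0, 0.0], [2.0, 2.0]]
    print(flag_outliers(sample))  # [[1. 0.]]
```

To run it on the question's files, data = np.loadtxt("test.csv", delimiter=",", ndmin=2) reads the two columns, and np.savetxt(f"calib_temp-{nts}.csv", flag_outliers(data), delimiter=",", fmt="%g") writes the flagged rows. The fmt value is a choice here, since savetxt otherwise writes numbers in scientific notation.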