How to calculate min and max of a column for particular rows?-CodePudding

I have a csv file as following:

    0  2  1  1    464  385  171    0:44:4   
    1  1  2  26    254  444  525    0:56:2   
    2  3  1  90    525  785  522    0:52:8   
    3  8  2  3    525  233  555    0:52:8  
    4  7  1  10    525  433  522    1:52:8   
    5  9  2  55     525  555  522    1:52:8   
    6  6  3  3   392  111  232    1:43:4   
    7  1  4  23    322  191  112    1:43:4   
    8  1  3  30    322  191  112    1:43:4   
    9  1  5   2   322  191  112    1:43:4   
    10  1  3  22   322  191  112    1:43:4   
    11  1  4  44   322  191  112    1:43:4   
    12  1  5  1   322  191  112    1:43:4   
    12  1  4  3    322  191  112    1:43:4   
    12  1  6  33    322  191  112    1:43:4   
    12  1  6  1    322  191  112    1:43:4 
    12  1  5  3    322  191  112    1:43:4   
    12  1  6  33    322  191  112    1:43:4   
     

    .
    .

Third column has numbers between 1 to 6. I want to read information of columns #4 and #5 for all the rows that have number 1 to 6 in the third columns and find the maximum and minmum amount for each row that has number 1 to 6 seprately. For example output like this:

Mix for row with 1: 1  
Max for row with 1: 90

Min for row with 2:  3
Max for row with 2:  55

and so on

I can plot the figure using following code. How to get summary statistics by group? What I'm looking for is to get multiple statistics for the same group like mean, min, max, number of each group in one call, is that doable?

import matplotlib.pyplot as plt
import csv
x= []
y= []

with open('mydata.csv','r') as csvfile:
    ap = csv.reader(csvfile, delimiter=',')
    for row in ap:
        x.append(int(row[2]))
        y.append(int(row[7]))

plt.scatter(x, y, color = 'g',s = 4, marker='o')
plt.show()

CodePudding user response：

One easy way would be to use Pandas with read_csv(), .groupby() and .agg():

import pandas as pd

df = pd.read_csv("mydata.csv", header=None)

def min_max_avg(col):
    return (col.min()   col.max()) / 2
    
result = df[[2, 3, 4]].groupby(2).agg(["min", "max", "mean", min_max_avg])

Result:

    3                               4                             
  min max       mean min_max_avg  min  max        mean min_max_avg
2                                                                 
1   1  90  33.666667        45.5  464  525  504.666667       494.5
2   3  55  28.000000        29.0  254  525  434.666667       389.5
3   3  30  18.333333        16.5  322  392  345.333333       357.0
4   3  44  23.333333        23.5  322  322  322.000000       322.0
5   1   3   2.000000         2.0  322  322  322.000000       322.0
6   1  33  22.333333        17.0  322  322  322.000000       322.0

If you don't like that you could do it with pure Python, it's only a little bit more work:

import csv

data = {}
with open("mydata.csv", "r") as file:
    for row in csv.reader(file):
        dct = data.setdefault(row[2], {})
        for col in (3, 4):
            dct.setdefault(col, []).append(row[col])

min_str = "Min for group {} - column {}: {}"
max_str = "Max for group {} - column {}: {}"
for row in data:
    for col in (3, 4):
        print(min_str.format(row, col, min(data[row][col])))
        print(max_str.format(row, col, max(data[row][col])))

Result:

Min for group 1 - column 3: 1
Max for group 1 - column 3: 90
Min for group 1 - column 4: 464
Max for group 1 - column 4: 525
Min for group 2 - column 3: 26
Max for group 2 - column 3: 55
Min for group 2 - column 4: 254
Max for group 2 - column 4: 525
Min for group 3 - column 3: 22
Max for group 3 - column 3: 30
Min for group 3 - column 4: 322
Max for group 3 - column 4: 392
...

mydata.csv:

0,2,1,1,464,385,171,0:44:4
1,1,2,26,254,444,525,0:56:2
2,3,1,90,525,785,522,0:52:8
3,8,2,3,525,233,555,0:52:8
4,7,1,10,525,433,522,1:52:8
5,9,2,55,525,555,522,1:52:8
6,6,3,3,392,111,232,1:43:4
7,1,4,23,322,191,112,1:43:4
8,1,3,30,322,191,112,1:43:4
9,1,5,2,322,191,112,1:43:4
10,1,3,22,322,191,112,1:43:4
11,1,4,44,322,191,112,1:43:4
12,1,5,1,322,191,112,1:43:4
12,1,4,3,322,191,112,1:43:4
12,1,6,33,322,191,112,1:43:4
12,1,6,1,322,191,112,1:43:4
12,1,5,3,322,191,112,1:43:4
12,1,6,33,322,191,112,1:43:4