Home > Enterprise >  CSV Calculation without any libraries on Python
CSV Calculation without any libraries on Python

Time:11-15

I have csv example like this :

ID,TASK1,TASK2,QUIZ1,QUIZ2
11061,50,75,50,78
11062,70,80,60,50
11063,60,75,77,79
11064,52,85,50,80
11065,70,85,50,80

How do i get a new row of NO and get a max and min score on TASK1,TASK2,QUIZ1,QUIZ2 and overwrite it? i restricted to use any lib on Python and my Expect output like this :

NO,ID,TASK1,TASK2,QUIZ1,QUIZ2
1,11061,50,75,50,78
2,11062,70,80,60,50
3,11063,60,75,77,79
4,11064,52,85,50,80
5,11065,70,85,50,80
MAX, ,70,85,77,80
MIN, ,50,75,50,50

CodePudding user response:

See below - zero imports

TASK1 = 1
TASK2 = 2
QUIZ1 = 3
QUIZ2 = 4

MIN = 0
MAX = 1
values = [[None,None],[None,None],[None,None],[None,None]]
lines = []
with open('in.txt') as f:
  for idx,line in enumerate(f):
    if idx > 0:
      lines.append(line)
      parts = line.split(',')
      for i in [TASK1,TASK2,QUIZ1,QUIZ2]:
        if values[i-1][MIN] is None or values[i-1][MIN] > parts[i]:
          values[i-1][MIN] = parts[i].strip()    
        if values[i-1][MAX] is None or values[i-1][MAX] < parts[i]:
          values[i-1][MAX] = parts[i].strip()
with open('out.txt','w') as f: 
  f.write('NO,ID,TASK1,TASK2,QUIZ1,QUIZ2\n') 
  for idx,line in enumerate(lines,1):
    f.write(f'{idx},{line}') 
  f.write('\n')  
  f.write('MAX, ,'   ','.join(str(x[MAX]) for x in values)) 
  f.write('\n')  
  f.write('MIN, ,'   ','.join(str(x[MIN]) for x in values)) 

out.txt

NO,ID,TASK1,TASK2,QUIZ1,QUIZ2
1,11061,50,75,50,78
2,11062,70,80,60,50
3,11063,60,75,77,79
4,11064,52,85,50,80
5,11065,70,85,50,80
MAX, ,70,85,77,80
MIN, ,50,75,50,50

CodePudding user response:

Assuming your input is already in a csv files, you could use the following code:

# This reads in your file
def load_csv(p):
    with open(p, "r") as f:
        return [l.strip().split(",") for l in f]

# This isolates the numeric columns you are interested in
def to_numeric(lines):
    return [list(map(int,line[1:])) for i, line in enumerate(lines) if i > 0]

# This applies a function to the columns results of to_numeric
def apply_function(numeric_lists, func=max):
    return [func(x) for x in zip(*numeric_lists)] 

# This applies the defined function, printing your result in the desired format
def main(p):
    lines = load_csv(p)
    numeric_lists = to_numeric(lines)
    mins = apply_function(numeric_lists, func=min)
    maxs = apply_function(numeric_lists)
    for i, l in enumerate(lines):
        print(("No," if i == 0 else str(i))   ",".join(l))
    print(f'MAX, ,{",".join(map(str,maxs))}')
    print(f'MIX, ,{",".join(map(str,mins))}')

# Simply call
main(YOUR_FILE_PATH)

CodePudding user response:

You can open file in r mode which will allow you to both read and write to the file without reopening.

Short (and kinda complicated for understanding) version of code (explanation below):

with open(r"filename.csv", "r ") as f:
    next(f)  # skip first line
    [*columns] = zip(*((int(i) for i in line.split(",")[1:]) for line in f))
    print("MAX", "", *map(max, columns), sep=",", file=f)
    print("MIN", "", *map(min, columns), sep=",", file=f)

Now let's explain how does it work. To get maximum/minimum of each column we have to group all values of column into a one container. To get rows we should split every line by coma and convert each value to integer. As we don't need value of first column we can skip it. All this work done by a single line of code:

[*columns] = zip(*((int(i) for i in line.split(",")[1:]) for line in f))

Same we can write in more explicit way:

rows = []
for line in f:
    row = []
    for column in line.split(",")[1:]:  # skip first column
        row.append(int(column))  # convert string to int
    rows.append(row)
columns = list(zip(*rows))

[*columns] = iterable is the same as columns = list(iterable).

After we got list of tuples containing values of each column we can apply max() and min() to each tuple to get maximum and minimum values. We can do it either using generator expression:

(max(column) for column in columns)
(min(column) for column in columns)

Or using map():

map(max, columns)
map(min, columns)

To write result to file we need to convert values to strings and join them back into a single string using str.join():

max_row = ["MAX", ""]  # empty string needed to get double comas
min_row = ["MIN", ""]
for column in columns:
    max_row.append(str(max(column)))
    min_row.append(str(min(column)))
f.write("\n"   ",".join(max_row))
f.write("\n"   ",".join(min_row))

Alternatively, we can use print(). We can unpack our results to print, pass "," to sep argument which will make python join all arguments using coma and pass our file object to file argument to make it write to file instead of console.

print("MAX", "", *map(max, columns), sep=",", file=f)
print("MIN", "", *map(min, columns), sep=",", file=f)

So here is the boring version of code from the top of answer:

with open(r"filename.csv", "r ") as f:
    next(f)  # skip first line
    rows = []
    for line in f:
        row = []
        for column in line.split(",")[1:]:  # skip first column
            row.append(int(column))  # convert string to int
        rows.append(row)
    max_row = ["MAX", ""]  # empty string needed to get double comas
    min_row = ["MIN", ""]
    for column in zip(*rows):
        max_row.append(str(max(column)))
        min_row.append(str(min(column)))
    f.write("\n"   ",".join(max_row))
    f.write("\n"   ",".join(min_row))

Upd. If you want to write result to separate file and add index column, you need to open also a destination file for writing, rewrite each line from source file to destination (adding it's index) and append minimums and maximums to the end of file.

Modified short version of code (it gets even more complicated):

with open(r"input.csv") as inp_f, \
        open(r"output.csv", "w") as out_f:
    out_f.write("NO,"   next(inp_f)) 
    [*columns] = zip(*(out_f.write(f"{index},{line.rstrip()}\n") and
                       (int(i) for i in line.split(",")[1:])
                       for index, line in enumerate(inp_f)))
    print("MAX", "", *map(max, columns), sep=",", file=out_f)
    print("MIN", "", *map(min, columns), sep=",", file=out_f)

Modified boring version of code:

with open(r"input.csv") as inp_f, \
        open(r"output.csv", "w") as out_f:
    out_f.write("NO,"   next(inp_f))
    rows = []
    for index, line in enumerate(inp_f, 1):
        out_f.write(str(index)   ","   line)
        row = []
        for column in line.split(",")[1:]:  # skip first column
            row.append(int(column))  # convert string to int
        rows.append(row)
    max_row = ["MAX", ""]  # empty string needed to get double comas
    min_row = ["MIN", ""]
    for column in zip(*rows):
        max_row.append(str(max(column)))
        min_row.append(str(min(column)))
    out_f.write("\n"   ",".join(max_row))
    out_f.write("\n"   ",".join(min_row))

Upd. I have made some tests (code). Tests results (lower is better):

olvin1: 0.687356842
olvin2: 0.5448804249999999
nikeros: 0.7540002289999999
balderman: 0.6384034139999999

So "boring" method from this answer demonstrates best performance.

  • Related