I have csv example like this :
ID,TASK1,TASK2,QUIZ1,QUIZ2
11061,50,75,50,78
11062,70,80,60,50
11063,60,75,77,79
11064,52,85,50,80
11065,70,85,50,80
How do i get a new row of NO
and get a max and min score on TASK1
,TASK2
,QUIZ1
,QUIZ2
and overwrite it? i restricted to use any lib on Python and my Expect output like this :
NO,ID,TASK1,TASK2,QUIZ1,QUIZ2
1,11061,50,75,50,78
2,11062,70,80,60,50
3,11063,60,75,77,79
4,11064,52,85,50,80
5,11065,70,85,50,80
MAX, ,70,85,77,80
MIN, ,50,75,50,50
CodePudding user response:
See below - zero imports
TASK1 = 1
TASK2 = 2
QUIZ1 = 3
QUIZ2 = 4
MIN = 0
MAX = 1
values = [[None,None],[None,None],[None,None],[None,None]]
lines = []
with open('in.txt') as f:
for idx,line in enumerate(f):
if idx > 0:
lines.append(line)
parts = line.split(',')
for i in [TASK1,TASK2,QUIZ1,QUIZ2]:
if values[i-1][MIN] is None or values[i-1][MIN] > parts[i]:
values[i-1][MIN] = parts[i].strip()
if values[i-1][MAX] is None or values[i-1][MAX] < parts[i]:
values[i-1][MAX] = parts[i].strip()
with open('out.txt','w') as f:
f.write('NO,ID,TASK1,TASK2,QUIZ1,QUIZ2\n')
for idx,line in enumerate(lines,1):
f.write(f'{idx},{line}')
f.write('\n')
f.write('MAX, ,' ','.join(str(x[MAX]) for x in values))
f.write('\n')
f.write('MIN, ,' ','.join(str(x[MIN]) for x in values))
out.txt
NO,ID,TASK1,TASK2,QUIZ1,QUIZ2
1,11061,50,75,50,78
2,11062,70,80,60,50
3,11063,60,75,77,79
4,11064,52,85,50,80
5,11065,70,85,50,80
MAX, ,70,85,77,80
MIN, ,50,75,50,50
CodePudding user response:
Assuming your input is already in a csv files, you could use the following code:
# This reads in your file
def load_csv(p):
with open(p, "r") as f:
return [l.strip().split(",") for l in f]
# This isolates the numeric columns you are interested in
def to_numeric(lines):
return [list(map(int,line[1:])) for i, line in enumerate(lines) if i > 0]
# This applies a function to the columns results of to_numeric
def apply_function(numeric_lists, func=max):
return [func(x) for x in zip(*numeric_lists)]
# This applies the defined function, printing your result in the desired format
def main(p):
lines = load_csv(p)
numeric_lists = to_numeric(lines)
mins = apply_function(numeric_lists, func=min)
maxs = apply_function(numeric_lists)
for i, l in enumerate(lines):
print(("No," if i == 0 else str(i)) ",".join(l))
print(f'MAX, ,{",".join(map(str,maxs))}')
print(f'MIX, ,{",".join(map(str,mins))}')
# Simply call
main(YOUR_FILE_PATH)
CodePudding user response:
You can open file in r
mode which will allow you to both read and write to the file without reopening.
Short (and kinda complicated for understanding) version of code (explanation below):
with open(r"filename.csv", "r ") as f:
next(f) # skip first line
[*columns] = zip(*((int(i) for i in line.split(",")[1:]) for line in f))
print("MAX", "", *map(max, columns), sep=",", file=f)
print("MIN", "", *map(min, columns), sep=",", file=f)
Now let's explain how does it work. To get maximum/minimum of each column we have to group all values of column into a one container. To get rows we should split every line by coma and convert each value to integer. As we don't need value of first column we can skip it. All this work done by a single line of code:
[*columns] = zip(*((int(i) for i in line.split(",")[1:]) for line in f))
Same we can write in more explicit way:
rows = []
for line in f:
row = []
for column in line.split(",")[1:]: # skip first column
row.append(int(column)) # convert string to int
rows.append(row)
columns = list(zip(*rows))
[*columns] = iterable
is the same as columns = list(iterable)
.
After we got list of tuples containing values of each column we can apply max()
and min()
to each tuple to get maximum and minimum values. We can do it either using generator expression:
(max(column) for column in columns)
(min(column) for column in columns)
Or using map()
:
map(max, columns)
map(min, columns)
To write result to file we need to convert values to strings and join them back into a single string using str.join()
:
max_row = ["MAX", ""] # empty string needed to get double comas
min_row = ["MIN", ""]
for column in columns:
max_row.append(str(max(column)))
min_row.append(str(min(column)))
f.write("\n" ",".join(max_row))
f.write("\n" ",".join(min_row))
Alternatively, we can use print()
. We can unpack our results to print, pass ","
to sep
argument which will make python join all arguments using coma and pass our file object to file
argument to make it write to file instead of console.
print("MAX", "", *map(max, columns), sep=",", file=f)
print("MIN", "", *map(min, columns), sep=",", file=f)
So here is the boring version of code from the top of answer:
with open(r"filename.csv", "r ") as f:
next(f) # skip first line
rows = []
for line in f:
row = []
for column in line.split(",")[1:]: # skip first column
row.append(int(column)) # convert string to int
rows.append(row)
max_row = ["MAX", ""] # empty string needed to get double comas
min_row = ["MIN", ""]
for column in zip(*rows):
max_row.append(str(max(column)))
min_row.append(str(min(column)))
f.write("\n" ",".join(max_row))
f.write("\n" ",".join(min_row))
Upd. If you want to write result to separate file and add index column, you need to open also a destination file for writing, rewrite each line from source file to destination (adding it's index) and append minimums and maximums to the end of file.
Modified short version of code (it gets even more complicated):
with open(r"input.csv") as inp_f, \
open(r"output.csv", "w") as out_f:
out_f.write("NO," next(inp_f))
[*columns] = zip(*(out_f.write(f"{index},{line.rstrip()}\n") and
(int(i) for i in line.split(",")[1:])
for index, line in enumerate(inp_f)))
print("MAX", "", *map(max, columns), sep=",", file=out_f)
print("MIN", "", *map(min, columns), sep=",", file=out_f)
Modified boring version of code:
with open(r"input.csv") as inp_f, \
open(r"output.csv", "w") as out_f:
out_f.write("NO," next(inp_f))
rows = []
for index, line in enumerate(inp_f, 1):
out_f.write(str(index) "," line)
row = []
for column in line.split(",")[1:]: # skip first column
row.append(int(column)) # convert string to int
rows.append(row)
max_row = ["MAX", ""] # empty string needed to get double comas
min_row = ["MIN", ""]
for column in zip(*rows):
max_row.append(str(max(column)))
min_row.append(str(min(column)))
out_f.write("\n" ",".join(max_row))
out_f.write("\n" ",".join(min_row))
Upd. I have made some tests (code). Tests results (lower is better):
olvin1: 0.687356842
olvin2: 0.5448804249999999
nikeros: 0.7540002289999999
balderman: 0.6384034139999999
So "boring" method from this answer demonstrates best performance.