I am completely new to working with CSV files. I have seen answers that read a file line by line, but I could not find one that reads groups of lines. I have a CSV file in the format shown below, where the first line of each group contains "starting", a month, and a day. From the second line on, I have data only in the last two columns, say A and B.
starting 6 3
34.75 15
34.75 15.25
32.5 14.2
starting 7 27
12.75 14.75
13 15
starting 7 28
29 33
29 33.25
What I want to do is retrieve the data below each "starting" line and store the groups separately, as a list of 3 arrays with 2 columns each. The purpose is to be able to plot the array for each "starting" line independently.
Here is the code I managed to write after several searches; please kindly help me correct where I am going wrong.
import numpy as np

# input file
f = open('./latlon.dat', 'r')
lines = f.readlines()  # Read file and close
f.close()

i = 0
tr = []
while (i < (len(lines) - 1)):
    line = lines[i]
    i = i + 1
    linesplit = line.strip().split('\t')
    if linesplit[0] == 'start':
        latlog = int(linesplit[1])
        latlogarray = np.genfromtxt(lines[i:(i + latlog)])
        for k in range(i - 1, i + latlog):
            i = i + latlogarray
            tr.append(k)
print(tr)
Thank you in anticipation.
CodePudding user response:
It might be easier to work with pandas in this case:
import pandas as pd
df = pd.read_csv('./latlon.dat', sep = ' ') # sep is whatever delimiter you want
df.plot() # will plot the three variables together
CodePudding user response:
You can accomplish this using Python's CSV module.
Reading your problem, you have a group which is delimited by a line with "starting". The data for the group starts after that line. Finally, you want all groups.
If that's true, then you want to read line by line, and when you read a line with "starting" in the first column, save the last group (if it exists) and start a new group:
#!/usr/bin/env python3
import csv
import pprint

groups = []
group = None

with open('latlon.dat', newline='') as f:
    # Cells are expected to be separated by '|'; data rows keep their
    # two values in columns 1 and 2 (column 0 is empty on those rows)
    reader = csv.reader(f, delimiter='|')
    for row in reader:
        # Normalize all cells at once
        row = [cell.strip() for cell in row]

        # Deal with "blank" lines
        if len(row) == 0 or len(row) == 1:
            continue

        # Starting new group...
        if row[0] == 'starting':
            # Save last group, if it exists
            if group:
                groups.append(group)
            # Reset group
            group = []
            # Don't do anything else with this row
            continue

        # A "data row"
        group.append([float(row[1]), float(row[2])])

# Save the final group, which has no following "starting" line to trigger it
if group:
    groups.append(group)

pprint.pprint(groups)
When I run that against your sample, I get:
[
[[34.75, 15.0], [34.75, 15.25], [32.5, 14.2]],
[[12.75, 14.75], [13.0, 15.0]],
[[29.0, 33.0], [29.0, 33.25]]
]
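If the file is plain whitespace-separated, as the sample in the question appears to be, the same grouping idea can be sketched without the csv module by splitting each line on whitespace. This is only a minimal sketch under that assumption: `read_groups` and `sample` are hypothetical names that do not appear in the original posts, and the sample lines are copied from the question rather than read from `latlon.dat`.

```python
def read_groups(lines):
    """Split whitespace-separated lines into groups, one per 'starting' line."""
    groups = []
    group = None
    for line in lines:
        fields = line.split()  # splits on any run of spaces or tabs
        if not fields:
            continue  # skip blank lines
        if fields[0] == 'starting':
            # Open a new group; the rows that follow belong to it
            group = []
            groups.append(group)
            continue
        if group is None:
            continue  # data before the first 'starting' line is ignored here
        # A data row: assumed to hold exactly the two values A and B
        group.append([float(fields[0]), float(fields[1])])
    return groups


# Sample lines copied from the question (assumed whitespace-separated)
sample = """\
starting 6 3
34.75 15
34.75 15.25
32.5 14.2
starting 7 27
12.75 14.75
13 15
starting 7 28
29 33
29 33.25
""".splitlines()

groups = read_groups(sample)
for g in groups:
    print(g)
```

To read the real file, an open file object can be passed instead of `sample`, since iterating over it also yields lines. Each group can then be turned into a two-column array with `np.array(g)` and plotted on its own, which is what the question asks for. Unlike the csv version above, this sketch keeps an empty list for a "starting" line that has no data rows.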