Group line categories based on similar text from .dat file into a list

Time:12-22

I am completely new to working with CSV files. I have seen answers that read a file line by line, but I could not find one that reads groups of lines. I have a file in the format shown below, where each group begins with a line containing starting, the month, and the day. The lines that follow hold data only in the last two columns, say A and B.

starting  6      3
          34.75  15
          34.75  15.25
          32.5   14.2
starting  7      27
          12.75  14.75
          13     15   
starting  7     28
          29    33
          29    33.25

What I want to do is retrieve the data below each starting line and store the groups separately, as a list of 3 arrays with 2 columns each. The purpose is to be able to plot each group independently.

Here is the code I managed to write after several searches; please help me correct what I am getting wrong.

import numpy as np

#input file
f=open('./latlon.dat','r') 

lines = f.readlines()    # Read file and close
f.close()

i = 0
tr = []

while (i < (len(lines)-1) ):
    line = lines[i]
    i = i + 1
    linesplit = line.strip().split('\t')
    if linesplit[0] == 'start' :
        latlog = int(linesplit[1])
        latlogarray = np.genfromtxt(lines[i:(i + latlog)])
        for k in range (i-1,i + latlog):
            i = i + latlogarray
            tr.append(k)
print(tr)

Thank you in anticipation.

CodePudding user response:

It might be easier to work with pandas in this case:

import pandas as pd

df = pd.read_csv('./latlon.dat', sep = ' ') # sep is whatever delimiter you want
df.plot() # will plot the three variables together

CodePudding user response:

You can accomplish this using Python's CSV module.

Reading your problem, you have a group which is delimited by a line with starting. The data for the group starts after that line. Finally, you want all groups.

If that's true, then you want to read line-by-line, and when you read a line with starting in the first column, save the last group (if it exists) and start a new group:

#!/usr/bin/env python3
import csv
import pprint

groups = []
group = None
with open('latlon.dat', newline='') as f:
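    # The delimiter must match the file's column separator
    # (use '\t' instead if the file is tab-delimited)
    reader = csv.reader(f, delimiter='|')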

    for row in reader:
        # Normalize all cells at once
        row = [cell.strip() for cell in row]

        # Deal with "blank" lines
        if len(row) == 0 or len(row) == 1:
            continue

        # Starting new group...
        if row[0] == 'starting':
            # Save last group, if it exists
            if group:
                groups.append(group)
            # Reset group
            group = []
            # Don't do anything else with this row
            continue

        # A "data row"
        group.append([float(row[1]), float(row[2])])


# Save the final group, if one was started
if group:
    groups.append(group)
pprint.pprint(groups)

When I run that against your sample, I get:

[
 [[34.75, 15.0], [34.75, 15.25], [32.5, 14.2]],
 [[12.75, 14.75], [13.0, 15.0]],
 [[29.0, 33.0], [29.0, 33.25]]
]
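
If the file is aligned with runs of spaces, as the sample in the question appears to be, rather than separated by a single delimiter character, the same grouping idea can be sketched with plain str.split(). This is only a minimal sketch under that assumption; read_groups and its marker parameter are hypothetical names, not part of any library, and it assumes the last two fields of every data line are columns A and B.

```python
def read_groups(lines, marker="starting"):
    """Split whitespace-separated lines into groups of [A, B] float pairs.

    Hypothetical helper: a line whose first field equals `marker`
    opens a new group; every other non-blank line is a data row.
    """
    groups = []
    for line in lines:
        fields = line.split()  # splits on any run of spaces or tabs
        if not fields:
            continue  # skip blank lines
        if fields[0] == marker:
            groups.append([])  # header line opens a new group
        elif groups and len(fields) >= 2:
            # Data row: take the last two fields as columns A and B
            groups[-1].append([float(fields[-2]), float(fields[-1])])
    return groups


# The sample from the question, embedded so the sketch is self-contained
sample = """\
starting  6      3
          34.75  15
          34.75  15.25
          32.5   14.2
starting  7      27
          12.75  14.75
          13     15
starting  7     28
          29    33
          29    33.25
"""

groups = read_groups(sample.splitlines())
for number, group in enumerate(groups, start=1):
    print(number, group)
```

To read from the file instead, pass the open file object, for example read_groups(open('latlon.dat')), since iterating over a file yields its lines. Each inner list can then be converted with np.array(group) to get a 2-column array that can be plotted on its own.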