Matching values and creating a pandas dataframe

The code below looks for lines in the csv file that contain the keyword $$n: or $$n[<characters>]:. A line with one of these keywords is a note, and each note should be matched with the text above it (a text is any line without the $$n: or $$n[<characters>]: keyword). I want to create a pandas DataFrame that shows the matched texts and notes. If a text has no note attached to it, its note should just be None. See the expected output for the result I want.

csv file:

yes hello there

move on to the next command if the previous command was successful.

$$n:describes the '&&' character in the RUN command.

k 

$$n[t(a1), mfc(a1,expand,rr)]: description
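
Both note forms share the `$$n` prefix, an optional bracketed part, and a closing colon, so a single anchored regular expression can recognise them. The following is a minimal sketch rather than code from the post: the names `NOTE_PREFIX` and `split_note` are made up for illustration, and it assumes a note keyword always sits at the start of its line.

```python
import re

# Hypothetical names for illustration. The pattern reads: "$$n" at the start
# of the line, then an optional "[...]" group, then the colon ending the prefix.
NOTE_PREFIX = re.compile(r'^\$\$n(?:\[.*?\])?:')

def split_note(line):
    """Return the note body if the line is a note, otherwise None."""
    stripped = line.strip()
    match = NOTE_PREFIX.match(stripped)
    if match is None:
        return None
    # Keep everything after the prefix, including any later colons.
    return stripped[match.end():].strip()

print(split_note("$$n:describes the '&&' character in the RUN command."))
print(split_note("$$n[t(a1), mfc(a1,expand,rr)]: description"))
print(split_note("yes hello there"))
```

Because the pattern stops at the first colon after the prefix, a note body that itself contains a colon is kept intact.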

code:

import pandas as pd
import numpy as np 
import logging 

def _ReadCsv(filename, READ_MODE):
    """
    Read CSV file from remote path.

    Args:
      filename(str): filename to read.
      READ_MODE(str): mode passed to open(), e.g. 'r'.
    Returns:
      The contents of CSV file.
    Raises:
      ValueError: Unable to read file
    """
    data = None
    try:
        with open(filename, READ_MODE) as fobj:
            data = fobj.readlines()
            
    except IOError:
        logging.exception('')
    if not data:
        raise ValueError('No data available')
    
    return data 

#Return all the lines from txt file 
file1 = _ReadCsv('hello.txt', 'r')
#Keep a record of the Texts and Notes of the txt file 
text_note = {'Text': np.array([]), 'Note':np.array([])}

for count,line in enumerate(file1):
    """
    Get rid of empty lines
    Match each text with a note 
    If note is empty then give it a value of null 
    """
    if line.strip():
        if line.startswith('$$n:'):
            text_note['Note'] = np.append(text_note['Note'], [line.strip()])
        else:
            text_note['Text'] = np.append(text_note['Text'], [line.strip()])

Expected Output:

 ---- --------------------------------------------------- ------------------------------------------------ 
|    | Text                                              | Note                                           |
|---- --------------------------------------------------- ------------------------------------------------|
|  0 | yes hello there                                   | None                                           |
|  1 | move on to the next command if the previous co... | describes the && character in the RUN command. |
|  2 | k                                                 | description                                    |
 ---- --------------------------------------------------- ------------------------------------------------

CodePudding user response:

This should work:

import re
import pandas as pd

with open('test.csv') as f:
    # Strip each line and drop blank or whitespace-only lines
    lines = [s.strip() for s in f if s.strip()]

texts = []
notes = []
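# Match the note prefix only at the start of a line: "$$n:" or "$$n[...]:"
note_prefix = re.compile(r'^\$\$n(?:\[.*?\])?:')

for line in lines:
    if note_prefix.match(line):
        # Remove only the prefix, so colons inside the note body are kept
        notes.append(note_prefix.sub('', line).strip())
    else:
        texts.append(line)
        # The previous text had no note, so pad its slot with None
        if len(texts) - len(notes) > 1:
            notes.append(None)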

if len(notes) != len(texts):
    notes.append(None)

df = pd.DataFrame({
    'Text': texts,
    'Note': notes
})

Output:

                                                Text                                              Note
0                                    yes hello there                                              None
1  move on to the next command if the previous co...  describes the '&&' character in the RUN command.
2                                                  k                                       description
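
The length-difference bookkeeping above works for the sample file, but the two lists can get out of step if two notes follow each other or the file starts with a note. As an alternative sketch, assuming a note always belongs to the nearest text line above it, the pairs can be built row by row. `pair_texts_with_notes` and `NOTE_PREFIX` are hypothetical names, and both joining several notes with a space and ignoring a note that has no text above it are assumptions the question does not specify.

```python
import re

# Hypothetical names; same prefix rule as the keywords in the question.
NOTE_PREFIX = re.compile(r'^\$\$n(?:\[.*?\])?:')

def pair_texts_with_notes(lines):
    """Pair each text line with the note line(s) that directly follow it."""
    rows = []  # each row is [text, note]; note stays None when absent
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        match = NOTE_PREFIX.match(line)
        if match is None:
            rows.append([line, None])
        elif rows:
            note = line[match.end():].strip()
            previous = rows[-1][1]
            # Assumption: extra notes for the same text are joined with a space.
            rows[-1][1] = note if previous is None else previous + ' ' + note
        # Assumption: a note with no text above it is ignored.
    return rows

sample = [
    "yes hello there\n",
    "\n",
    "move on to the next command if the previous command was successful.\n",
    "$$n:describes the '&&' character in the RUN command.\n",
    "k \n",
    "$$n[t(a1), mfc(a1,expand,rr)]: description\n",
]
for text, note in pair_texts_with_notes(sample):
    print(repr(text), '->', repr(note))
```

The rows can then be handed to pandas, for example `pd.DataFrame(rows, columns=['Text', 'Note'])`, which gives the same two columns as the expected output.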