The code below looks for lines in the csv file that contain the keywords $$n:, $$n[]: or $$n[<characters>]:. A line with one of these keywords is a note, and each note should be matched with the text (a line without any of these keywords) directly above it.
I want to create a pandas DataFrame that shows the matched texts and notes. If a text does not have a note attached to it, its note should just be None. See the expected output for the wanted result.
csv file:
yes hello there
move on to the next command if the previous command was successful.
$$n:describes the '&&' character in the RUN command.
k
$$n[t(a1), mfc(a1,expand,rr)]: description
code:
import pandas as pd
import numpy as np
import logging


def _ReadCsv(filename, READ_MODE):
    """
    Read CSV file from remote path.
    Args:
        filename(str): filename to read.
    Returns:
        The contents of CSV file.
    Raises:
        ValueError: Unable to read file
    """
    data = None
    try:
        with open(filename, READ_MODE) as fobj:
            data = fobj.readlines()
    except IOError:
        logging.exception('')
    if not data:
        raise ValueError('No data available')
    return data


# Return all the lines from txt file
file1 = _ReadCsv('hello.txt', 'r')

# Keep a record of the Texts and Notes of the txt file
text_note = {'Text': np.array([]), 'Note': np.array([])}

# Get rid of empty lines
# Match each text with a note
# If note is empty then give it a value of null
for count, line in enumerate(file1):
    if line.strip():
        if line.startswith('$$n:'):
            text_note['Note'] = np.append(text_note['Note'], [line.strip()])
        else:
            text_note['Text'] = np.append(text_note['Text'], [line.strip()])
Expected Output:
---- --------------------------------------------------- ------------------------------------------------
| | Text | Note |
|---- --------------------------------------------------- ------------------------------------------------|
| 0 | yes hello there | None |
| 1 | move on to the next command if the previous co... | describes the && character in the RUN command. |
| 2 | k | description |
---- --------------------------------------------------- ------------------------------------------------
CodePudding user response:
This should work:
import re
import pandas as pd

with open('test.csv') as f:
    lines = [s.strip() for s in f.read().split('\n') if s]

texts = []
notes = []
for i, line in enumerate(lines):
    if re.search(r'\$\$.*\:', line):
        notes.append(re.sub(r'\$\$.*\:', '', line).strip())
    else:
        texts.append(line)
        if len(texts) - len(notes) > 1:
            notes.append(None)

if len(notes) != len(texts):
    notes.append(None)

df = pd.DataFrame({
    'Text': texts,
    'Note': notes
})
Output:
Text Note
0 yes hello there None
1 move on to the next command if the previous co... describes the '&&' character in the RUN command.
2 k description
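One caveat with the pattern above: \$\$.*\: is greedy and not anchored, so a note whose own text contains a colon loses everything up to the last colon, and any line with $$ followed later by a colon is treated as a note. Below is a minimal sketch of a stricter variant. It assumes the keyword always sits at the start of the line and has the form $$n, an optional [...] group, then a colon. The names NOTE_PREFIX and pair_texts_with_notes are made up for this illustration, and ignoring a note that has no free text above it is my assumption, not something the question specifies.

```python
import re

# Assumed keyword grammar: "$$n", an optional "[...]" group, then ":".
# Anchored at the start so colons inside the note text are left alone.
NOTE_PREFIX = re.compile(r'^\$\$n(?:\[[^\]]*\])?:')


def pair_texts_with_notes(lines):
    """Return (text, note) tuples; note is None when no note follows the text."""
    pairs = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip empty lines
        match = NOTE_PREFIX.match(line)
        if match is None:
            # Plain text line: starts a new row with no note yet.
            pairs.append((line, None))
        elif pairs and pairs[-1][1] is None:
            # Note line: attach it to the text directly above it.
            pairs[-1] = (pairs[-1][0], line[match.end():].strip())
        # Assumption: a note with no un-noted text above it is ignored.
    return pairs


sample = [
    "yes hello there",
    "move on to the next command if the previous command was successful.",
    "$$n:describes the '&&' character in the RUN command.",
    "k",
    "$$n[t(a1), mfc(a1,expand,rr)]: description",
]
pairs = pair_texts_with_notes(sample)
```

The resulting list can be turned into the same frame with pd.DataFrame(pairs, columns=['Text', 'Note']). Keeping each text and its note together in one tuple avoids having to keep two separate lists the same length.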