hi I have the following txt file
December
line: 285 - event ID: 67511
line: 296 - event ID: 67512
November
line: 305 - event ID: 67515
line: 300 - event ID: 67517
I want to transform it into the following data frame
df1 = pd.DataFrame(
{
"index": ["December", "December", "November", "November"],
"index1": ["285", "296", "305", "300"],
"eventid": ["67511", "67512", "64515", "64517"]})
index index1 eventid
0 December 285 67511
1 December 296 67512
2 November 305 64515
3 November 300 64517
any ideas?
CodePudding user response:
I have used pattern matching to achieve what you need:
import re
import pandas as pd
res = []
month_pattern = re.compile("^\w $")
line_pattern = re.compile("\d ")
current_month = ""
with open("FILE_PATH_TO_YOUR_DATA", "r") as f:
for line in f:
m = month_pattern.findall(line)
if len(m) > 0:
current_month = m[0]
m = line_pattern.findall(line)
if len(m) > 0:
res.append([current_month] m)
df = pd.DataFrame(res, columns = ["index", "index1", "eventid"])
print(df)
OUTPUT
index index1 eventid
0 December 285 67511
1 December 296 67512
2 November 305 67515
3 November 300 67517