I'm trying to split text and select data based on the second column:
Attribute1|Number|7
Attribute2|Text||"sample text"
Attribute3|Columns|4||"data1"|"data2"|"data3"|"data4"
If the second field says Number, it should pick the data in the third field. If it says Text, it should pick the data in the fourth field. If it says Columns, it should create as many entries as the third field indicates.
The final data should be in a data frame like this:
Col_1 Col_2
Attribute1_value 7
Attribute2_value "sample text"
Attribute3_value_0 data1
Attribute3_value_1 data2
Attribute3_value_2 data3
Attribute3_value_3 data4
CodePudding user response:
You can store your split lines in a dictionary and make a Series out of it:
import pandas as pd

output_dict = {}
with open("file.txt", "r") as f:
    for line in f:
        fields = line.strip("\n").split('|')
        if fields[1] == "Number":
            # The value is in the third field
            output_dict[fields[0]] = fields[2]
        elif fields[1] == "Text":
            # The value is in the fourth field
            output_dict[fields[0]] = fields[3]
        elif fields[1] == "Columns":
            # The third field is the count; the values start at the fifth field
            output_dict[fields[0]] = fields[4:4 + int(fields[2])]
#print(output_dict)
series = pd.Series(output_dict)
# explode() turns the list stored for "Columns" into one row per element
print(series.explode())
Output:
Attribute1 7
Attribute2 "sample text"
Attribute3 "data1"
Attribute3 "data2"
Attribute3 "data3"
Attribute3 "data4"
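If you want the exact layout from the question (a data frame with Col_1 and Col_2, and names like Attribute3_value_0), one possible variation is to collect (name, value) pairs instead of a dictionary. This is only a sketch: the helper name parse_rows and the inline SAMPLE string are made up for illustration, and it assumes the quotes should be stripped from the Columns values but kept for Text, since that is how the desired output in the question looks.

```python
import pandas as pd

# Inline copy of the sample lines from the question (stands in for file.txt)
SAMPLE = '''Attribute1|Number|7
Attribute2|Text||"sample text"
Attribute3|Columns|4||"data1"|"data2"|"data3"|"data4"
'''

def parse_rows(lines):
    """Return (name, value) pairs following the rules in the question."""
    rows = []
    for line in lines:
        fields = line.rstrip("\n").split("|")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        name, kind = fields[0], fields[1]
        if kind == "Number":
            rows.append((f"{name}_value", fields[2]))
        elif kind == "Text":
            rows.append((f"{name}_value", fields[3]))
        elif kind == "Columns":
            count = int(fields[2])
            # One row per value, numbered from 0; quotes stripped (assumption)
            for i, value in enumerate(fields[4:4 + count]):
                rows.append((f"{name}_value_{i}", value.strip('"')))
    return rows

rows = parse_rows(SAMPLE.splitlines())
df = pd.DataFrame(rows, columns=["Col_1", "Col_2"])
print(df)
```

To read from a real file, pass the open file object to parse_rows instead of SAMPLE.splitlines(). All values stay strings here; convert the Number rows with int() if you need numeric data.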