How can I convert the text data below into a dataframe? Also, is there a way to use the explode function on certain columns only, say data3 and data4, ignoring the first two data points (data1 and data2)?
Attribute1,data1,data2
Attribute2,data1,data2,data3,data4
Attribute3,data1,data2,data3
Attribute4,data1,data2,data3,data4,data5,data6
Output of text to dataframe should be like:
Attribute1|data1|data2
Attribute2|data1|data2|data3|data4
Attribute3|data1|data2|data3
Attribute4|data1|data2|data3|data4|data5|data6
Output of dataframe explode should be like:
Attribute2|data3
Attribute2|data4
Attribute3|data3
Attribute4|data3
Attribute4|data4
Attribute4|data5
Attribute4|data6
CodePudding user response:
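import pandas as pd
# The file contains no ';', so each line is read whole into column 0
df = pd.read_csv('test.txt', header=None, sep=';')
# Split each line on commas; shorter rows are padded with missing values
df = df[0].str.split(',', expand=True)
df.set_index(0, inplace=True)
# One value per row; dropna() covers pandas versions where stack() keeps missing values
df = df.stack().dropna().droplevel(1)
print(df)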
output:
0
Attribute1 data1
Attribute1 data2
Attribute2 data1
Attribute2 data2
Attribute2 data3
Attribute2 data4
Attribute3 data1
Attribute3 data2
Attribute3 data3
Attribute4 data1
Attribute4 data2
Attribute4 data3
Attribute4 data4
Attribute4 data5
Attribute4 data6
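
The stacked result above still contains data1 and data2 for every attribute. To get the output requested in the question, one option is to slice off the first two values of each row before exploding. The following is a minimal sketch, not a definitive solution: the sample text is inlined instead of being read from test.txt, and the names raw, rows, skip, wide, tail and exploded are made up for illustration.

```python
import pandas as pd

# Sample data from the question, inlined so the sketch runs without test.txt
raw = """Attribute1,data1,data2
Attribute2,data1,data2,data3,data4
Attribute3,data1,data2,data3
Attribute4,data1,data2,data3,data4,data5,data6
"""

# Split every non-empty line on commas; rows have different lengths
rows = [line.split(",") for line in raw.splitlines() if line.strip()]

# Wide dataframe: shorter rows are padded with missing values
wide = pd.DataFrame(rows)

# Number of leading data points to ignore (data1 and data2)
skip = 2

# Keep the attribute name plus only the values after the skipped ones
tail = pd.DataFrame({
    "attribute": [r[0] for r in rows],
    "value": [r[1 + skip:] for r in rows],
})

# explode() emits one row per list element; an empty list becomes NaN,
# so rows such as Attribute1 are removed with dropna()
exploded = tail.explode("value").dropna(subset=["value"]).reset_index(drop=True)

print(exploded.to_csv(sep="|", header=False, index=False))
```

Changing skip controls how many leading values are ignored. Slicing the lists before calling explode avoids having to filter rows by value afterwards.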