My csv files are in the following format:
NAME: John
AGE: 19
HEIGHT: 178
COURSE; SEMESTER; GRADE; RESULT
MATH;1;10;PASS
BIOLOGY;2;5;FAIL
So the column headers come after a few metadata rows. I could simply skip those rows when reading, but I would like them to become columns so that I can merge all the files into a single dataframe. The first file should end up like this:
NAME; AGE; HEIGHT; COURSE; SEMESTER; GRADE; RESULT
John; 19; 178;MATH;1;10;PASS
John; 19; 178; BIOLOGY;2;5;FAIL
CodePudding user response:
Well, this was good practice for me, so thank you :)
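import pandas as pd
from io import StringIO

d1 = '''
NAME: John
AGE: 19
HEIGHT: 178
COURSE; SEMESTER; GRADE; RESULT
MATH;1;10;PASS
BIOLOGY;2;5;FAIL
'''
# metadata block: the first three "KEY: value" lines (the leading blank line is ignored)
df1 = pd.read_csv(StringIO(d1), sep=':', nrows=3, header=None, skipinitialspace=True)
# course table: skip the leading blank line plus the three metadata lines
df2 = pd.read_csv(StringIO(d1), sep=';', skiprows=4, skipinitialspace=True)
df1 = df1.T #transpose
df1.columns = df1.iloc[0] #make index[0] new header
df1 = df1.drop([0]).reset_index(drop=True) #remove old index[0] which is now a duplicate, and realign with row 0 of df2
df3 = pd.concat([df1, df2], axis=1).ffill() #concat and forward-fill NaN, so every course row gets the metadata
df3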
index | NAME | AGE | HEIGHT | COURSE | SEMESTER | GRADE | RESULT |
---|---|---|---|---|---|---|---|
0 | John | 19 | 178 | MATH | 1 | 10 | PASS |
1 | John | 19 | 178 | BIOLOGY | 2 | 5 | FAIL |
Note: maybe you should try to scrape/clean/fill your CSV files better at the source? ;)
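
Since the question's goal is to merge many such files, here is a minimal sketch of one way to wrap the idea in a function and concatenate the results. It swaps the transpose step for plain string splitting of the metadata lines. The function name `read_student_csv`, the constant `N_META`, and the second student's data are made up for illustration, and it assumes every file starts with exactly three `KEY: value` lines.

```python
from io import StringIO

import pandas as pd

N_META = 3  # assumption: every file starts with exactly three "KEY: value" lines


def read_student_csv(text: str) -> pd.DataFrame:
    """Flatten one file: the metadata lines become constant columns."""
    lines = text.strip().splitlines()
    # Split each "KEY: value" line once, on the first colon.
    meta = {}
    for line in lines[:N_META]:
        key, value = line.split(":", 1)
        meta[key.strip()] = value.strip()
    # The remainder is an ordinary ';'-separated table with its own header row.
    table = pd.read_csv(
        StringIO("\n".join(lines[N_META:])), sep=";", skipinitialspace=True
    )
    # Put the metadata columns first; a scalar is repeated on every row.
    for pos, (key, value) in enumerate(meta.items()):
        table.insert(pos, key, value)
    return table


# Two in-memory "files" in the question's format (the second one is invented).
file_a = """NAME: John
AGE: 19
HEIGHT: 178
COURSE; SEMESTER; GRADE; RESULT
MATH;1;10;PASS
BIOLOGY;2;5;FAIL
"""
file_b = """NAME: Mary
AGE: 20
HEIGHT: 165
COURSE; SEMESTER; GRADE; RESULT
MATH;1;8;PASS
"""

merged = pd.concat(
    [read_student_csv(text) for text in (file_a, file_b)], ignore_index=True
)
print(merged)
```

For files on disk, the same function could be fed the text of each file (for example via `pathlib.Path(p).read_text()`). Note that the metadata values stay strings in this sketch, so `AGE` and `HEIGHT` would need something like `pd.to_numeric` if numeric columns are wanted.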