Skip rows, but take information when reading csv in python

Time:06-03

My csv files are in the following format:

NAME: John
AGE: 19
HEIGHT: 178
COURSE; SEMESTER; GRADE; RESULT
MATH;1;10;PASS
BIOLOGY;2;5;FAIL

So the header row comes after a few metadata rows. I can skip those rows when reading without problems, but I would like them to become columns so I can merge all the files into a single dataframe. The first file should come out like this:

NAME; AGE; HEIGHT; COURSE; SEMESTER; GRADE; RESULT
John; 19; 178; MATH; 1; 10; PASS
John; 19; 178; BIOLOGY; 2; 5; FAIL

Answer:

Well, this was good practice for me, so thank you :)

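import pandas as pd
from io import StringIO

d1 = '''NAME: John
AGE: 19
HEIGHT: 178
COURSE; SEMESTER; GRADE; RESULT
MATH;1;10;PASS
BIOLOGY;2;5;FAIL
'''
# first 3 lines are "KEY: value" pairs; skipinitialspace drops the space after the separator
df1 = pd.read_csv(StringIO(d1), sep=':', nrows=3, header=None, skipinitialspace=True)
# skip those 3 lines so the next line becomes the table header
df2 = pd.read_csv(StringIO(d1), sep=';', skiprows=3, skipinitialspace=True)

df1 = df1.T #transpose
df1.columns = df1.iloc[0] #make index[0] new header
df1 = df1.drop([0]) #remove old index[0] which is now a duplicate

df3 = df1.merge(df2, how="cross") #repeat the single metadata row for every table row (pandas >= 1.2)
df3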
   NAME  AGE  HEIGHT   COURSE  SEMESTER  GRADE  RESULT
0  John   19     178     MATH         1     10    PASS
1  John   19     178  BIOLOGY         2      5    FAIL
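Since the goal in the question is to merge all the files into one dataframe, the same idea can be wrapped into a per-file helper and applied to every file. The sketch below is one possible way to do that, not the answer's own code: `parse_student_text`, `read_all` and any glob pattern you pass are hypothetical, and it assumes each file has `KEY: value` lines followed by a `;`-separated table whose header is the first line containing `;`. Rather than a fixed `nrows=3`, it collects metadata lines up to that header, so files with more or fewer metadata rows should still parse.

```python
import glob
from io import StringIO

import pandas as pd


def parse_student_text(text):
    """Hypothetical helper: turn one file's text into a flat DataFrame.

    Assumes 'KEY: value' lines first, then a ';'-separated table.
    """
    lines = text.splitlines()

    # Assumption: the table header is the first line containing ';'
    header_idx = next((i for i, line in enumerate(lines) if ";" in line), None)
    if header_idx is None:
        raise ValueError("no ';'-separated table found")

    # Collect 'KEY: value' pairs from the lines above the header
    meta = {}
    for line in lines[:header_idx]:
        if ":" in line:
            key, value = line.split(":", 1)
            meta[key.strip()] = value.strip()

    # Parse the table; skipinitialspace strips the space after each ';'
    table = pd.read_csv(StringIO("\n".join(lines[header_idx:])),
                        sep=";", skipinitialspace=True)

    # Insert each metadata value as a leading column (the scalar repeats per row)
    for pos, (key, value) in enumerate(meta.items()):
        table.insert(pos, key, value)
    return table


def read_all(pattern):
    """Hypothetical helper: parse every file matching `pattern` and stack them."""
    frames = []
    for path in sorted(glob.glob(pattern)):
        with open(path, encoding="utf-8") as f:
            frames.append(parse_student_text(f.read()))
    # pd.concat raises ValueError if no file matched the pattern
    return pd.concat(frames, ignore_index=True)
```

Usage would look like `df = read_all("grades/*.csv")`, where the directory name is only a placeholder. Metadata values such as `AGE` stay strings in this sketch; convert them with `pd.to_numeric` if you need numbers.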

Note: maybe you could scrape/clean/fill your CSV files more thoroughly before reading them? ;)
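In the spirit of that note, one common cleanup step: values read from the `KEY: value` rows arrive as strings such as `'19'`, and reading the table with `sep=';'` alone leaves headers like `' SEMESTER'` with a leading space. Below is a minimal sketch of a hypothetical `clean_types` helper that strips that whitespace and converts chosen columns with `pd.to_numeric`. The default list of numeric columns is an assumption based on the question's sample, and `errors="coerce"` turns unparseable entries into `NaN` instead of raising.

```python
import pandas as pd


def clean_types(df, numeric_cols=("AGE", "HEIGHT", "SEMESTER", "GRADE")):
    """Hypothetical helper: strip stray whitespace, then convert numeric columns."""
    out = df.copy()
    # Strip spaces from column labels, e.g. ' SEMESTER' -> 'SEMESTER'
    out.columns = [str(c).strip() for c in out.columns]
    # Strip whitespace inside text cells; non-string values are left as they are
    for col in out.columns:
        if not pd.api.types.is_numeric_dtype(out[col]):
            out[col] = out[col].map(lambda v: v.strip() if isinstance(v, str) else v)
    # Convert the chosen columns (if present); unparseable values become NaN
    for col in numeric_cols:
        if col in out.columns:
            out[col] = pd.to_numeric(out[col], errors="coerce")
    return out
```

For example, `clean_types(df3)` on the merged frame from the answer would give integer `AGE` and `HEIGHT` columns. Working on a copy keeps the original frame untouched.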
