I have an original dataframe called df
as seen below:
0 1 2 3 4 5 6 7
0 Table 3.1a NaN NaN NaN NaN NaN NaN NaN
1 BLANK NaN NaN NaN NaN NaN NaN NaN
2 BLANK FY2015 FY2016 FY2017 FY2018 FY2019 FY2020 FY2021
3 BLANK NaN NaN NaN NaN NaN (Revised) (Budgeted)
4 Operating Revenue 64823 68964 75816 73738 74274 64608 76636
I am attempting to set the index of the dataframe to the first column and set index 2 row as the header, as well as remove empty rows with BLANK and NaN.
However, when I run the following code:
df = df.set_index(df.columns[0])
new_header = df.iloc[2]
df = df.drop([df.index[0], df.index[1], df.index[2], df.index[3]])
df.columns = new_header
the resulting dataframe is:
BLANK FY2015 FY2016 FY2017 FY2018 FY2019 FY2020 FY2021
0
Operating Revenue 64823 68964 75816 73738 74274 64608 76636
Tax Revenue 55647 58699 66363 66203 67645 58564 69964
Fees and Charges 8674 9760 9075 7106 6275 5595 6185
Others 502 506 378 430 354 449 487
Total Expenditure 67447 71045 73556 77824 75337 94056 102338
What can I do to remove the BLANK from the 1st item in the header row ("BLANK") and remove the 0 below the "BLANK" cell? The dataframe I want to achieve at the end is:
FY2015 FY2016 FY2017 FY2018 FY2019 FY2020 FY2021
Operating Revenue 64823 68964 75816 73738 74274 64608 76636
Tax Revenue 55647 58699 66363 66203 67645 58564 69964
Fees and Charges 8674 9760 9075 7106 6275 5595 6185
Others 502 506 378 430 354 449 487
Total Expenditure 67447 71045 73556 77824 75337 94056 102338
CodePudding user response:
Starting from your original DataFrame, you can try with:
df = df.drop(df.index[[0,1,3]])
df = df.rename(columns=df.iloc[0])
df = df.drop(df.index[0])
df = df.set_index(df.columns[0])
df.index.name = None
Also, pay attention that in Tax Revenue
you have values probably in the wrong positions, since the first number is in the index columns.