Home > Software design >  Remove content from the first cell when setting index
Remove content from the first cell when setting index

Time:10-24

I have an original dataframe called df as seen below:

    0                 1      2      3      4      5      6        7
0   Table 3.1a        NaN    NaN    NaN    NaN    NaN    NaN      NaN
1   BLANK             NaN    NaN    NaN    NaN    NaN    NaN      NaN
2   BLANK             FY2015 FY2016 FY2017 FY2018 FY2019 FY2020   FY2021
3   BLANK             NaN    NaN    NaN    NaN    NaN   (Revised) (Budgeted)
4   Operating Revenue 64823  68964  75816  73738  74274 64608     76636

I am attempting to set the index of the dataframe to the first column and set index 2 row as the header, as well as remove empty rows with BLANK and NaN.

However, when I run the following code:

df = df.set_index(df.columns[0])
new_header = df.iloc[2]
df = df.drop([df.index[0], df.index[1], df.index[2], df.index[3]])
df.columns = new_header

the resulting dataframe is:

BLANK               FY2015  FY2016  FY2017  FY2018  FY2019  FY2020  FY2021
0                           
Operating Revenue   64823   68964   75816   73738   74274   64608   76636
Tax Revenue 55647   58699   66363   66203   67645   58564   69964
Fees and Charges    8674    9760    9075    7106    6275    5595    6185
Others              502     506     378     430     354     449     487
Total Expenditure   67447   71045   73556   77824   75337   94056   102338

What can I do to remove the BLANK from the 1st item in the header row ("BLANK") and remove the 0 below the "BLANK" cell? The dataframe I want to achieve at the end is:

                    FY2015  FY2016  FY2017  FY2018  FY2019  FY2020  FY2021                      
Operating Revenue   64823   68964   75816   73738   74274   64608   76636
Tax Revenue 55647   58699   66363   66203   67645   58564   69964
Fees and Charges    8674    9760    9075    7106    6275    5595    6185
Others              502     506     378     430     354     449     487
Total Expenditure   67447   71045   73556   77824   75337   94056   102338

CodePudding user response:

Starting from your original DataFrame, you can try with:

df = df.drop(df.index[[0,1,3]])
df = df.rename(columns=df.iloc[0])
df = df.drop(df.index[0])
df = df.set_index(df.columns[0])
df.index.name = None

Also, pay attention that in Tax Revenue you have values probably in the wrong positions, since the first number is in the index columns.

  • Related