Extra row in dataframe creation and how to remove it

Time:01-02

When I check a DataFrame for null values in my dataset, here is what I get:

df.isnull().sum()

    BAD           0
    LOAN          0
    MORTDUE     518
    VALUE       112
    REASON      252
    JOB         279
    YOJ         515
    DEROG       708
    DELINQ      580
    CLAGE       308
    NINQ        510
    CLNO        222
    DEBTINC    1267
    dtype: int64

Next, when I create a DataFrame from this result, I get the following:

df2 = pd.DataFrame(df.isnull().sum())

df2.set_index(0)

df2.index.name = None

                0
    BAD         0
    LOAN        0
    MORTDUE   518
    VALUE     112
    REASON    252
    JOB       279
    YOJ       515
    DEROG     708
    DELINQ    580
    CLAGE     308
    NINQ      510
    CLNO      222
    DEBTINC  1267

Why does that extra row appear in the output, and how can I remove it? In a simple test with a regular DataFrame, I was able to remove the extra row using set_index(0) and df.index.name = None, but that does not work on the created DataFrame df2.

Answer:

The "extra row" in your output is actually the header line showing the column name(s). df.isnull().sum() returns an unnamed Series, so pd.DataFrame() gives its single column the default label 0. If you want something more descriptive than that default, pass a column name when you create the DataFrame:

  df2 = pd.DataFrame(df.isnull().sum(), columns=["Null_Counts"])
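To see where the 0 comes from, here is a minimal sketch on a small made-up DataFrame (the columns and values are hypothetical stand-ins for the question's data). It checks that df.isnull().sum() produces a Series with no name, that wrapping it in pd.DataFrame() yields the column label 0, and shows Series.to_frame(name=...) as another way to attach a descriptive label during the conversion.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the question's dataset (made-up values)
df = pd.DataFrame({
    "BAD": [1, 0, 1, 0],
    "MORTDUE": [25860.0, np.nan, 13500.0, np.nan],
    "REASON": ["HomeImp", None, "DebtCon", "HomeImp"],
})

null_counts = df.isnull().sum()   # Series indexed by column name, with no name
print(null_counts.name)           # None

df2 = pd.DataFrame(null_counts)   # unnamed Series -> single column labelled 0
print(list(df2.columns))          # [0]

# Alternative: give the column a descriptive label while converting
df3 = null_counts.to_frame(name="Null_Counts")
print(df3)
```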

You can see the same effect by comparing these two variants:

print(pd.DataFrame([0,1,2,3,4,5]))
   0
0  0
1  1
2  2
3  3
4  4
5  5

vs

print(pd.DataFrame([0,1,2,3,4,5], columns=["My_Column"]))
   My_Column
0          0
1          1
2          2
3          3
4          4
5          5
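A quick check confirms that the top line of each printout is the column label rather than a data row: both frames have exactly six rows, and only the label differs. This is a small sketch using the same two constructor calls as above.

```python
import pandas as pd

df_default = pd.DataFrame([0, 1, 2, 3, 4, 5])
df_named = pd.DataFrame([0, 1, 2, 3, 4, 5], columns=["My_Column"])

# The header line is not counted as data: both frames hold 6 rows
print(len(df_default), len(df_named))      # 6 6
print(list(df_default.columns))            # [0]
print(list(df_named.columns))              # ['My_Column']
print(df_named["My_Column"].tolist())      # [0, 1, 2, 3, 4, 5]
```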

If you simply don't want the header row in your output (which seems to be the intent of your question), you can iterate over the index values and the count values and build whatever output format you like:

import pandas as pd

df1 = pd.DataFrame([0,1,2,3,4,5], columns=["My_Column"])
for idx, val in zip(df1.index.values, df1["My_Column"].values):
    print("{}\t{}".format(idx, val))

Output:

0   0
1   1
2   2
3   3
4   4
5   5
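The same idea works directly on the null-count Series, without creating a DataFrame at all. The sketch below uses a hypothetical small dataset in place of the question's df; Series.items() yields (column name, count) pairs, and Series.to_string() prints the index and values with no header line when the Series and its index are unnamed.

```python
import numpy as np
import pandas as pd

# Hypothetical small dataset standing in for the question's df
df = pd.DataFrame({
    "BAD": [1, 0, 1, 0],
    "MORTDUE": [25860.0, np.nan, 13500.0, np.nan],
    "REASON": ["HomeImp", None, "DebtCon", "HomeImp"],
})

null_counts = df.isnull().sum()  # Series: column name -> number of nulls

# Build "name<TAB>count" lines straight from the Series
lines = [f"{col}\t{count}" for col, count in null_counts.items()]
print("\n".join(lines))

# Series.to_string() prints index and values; no name or dtype footer by default
text = null_counts.to_string()
print(text)
```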

You can also use the DataFrame method to_csv() with header=False if you just want to print or save CSV output without the header row:

print(df1.to_csv(header=False))

0,0
1,1
2,2
3,3
4,4
5,5

You can also pass sep="\t" to to_csv() if you prefer tab-delimited output instead of comma-delimited.
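A minimal sketch combining both options on the example frame: with no path argument, to_csv() returns the text as a string, so it can be printed or inspected directly.

```python
import pandas as pd

df1 = pd.DataFrame([0, 1, 2, 3, 4, 5], columns=["My_Column"])

# header=False omits the "My_Column" line; sep="\t" gives tab-delimited output.
# Without a file path, to_csv() returns the CSV text as a string.
tsv_text = df1.to_csv(header=False, sep="\t")
print(tsv_text)
```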
