When I want to see a df finding the null values in a dataset here is what I get.
df.isnull().sum()
BAD 0
LOAN 0
MORTDUE 518
VALUE 112
REASON 252
JOB 279
YOJ 515
DEROG 708
DELINQ 580
CLAGE 308
NINQ 510
CLNO 222
DEBTINC 1267
dtype: int64
Next when I create a dataframe using this df, I get it as below.
df2 = pd.DataFrame(df.isnull().sum())
df2.set_index(0)
df2.index.name = None
0
BAD 0
LOAN 0
MORTDUE 518
VALUE 112
REASON 252
JOB 279
YOJ 515
DEROG 708
DELINQ 580
CLAGE 308
NINQ 510
CLNO 222
DEBTINC 1267
Why is that extra row coming in the output, and how can I remove it?. I saw a normal test, with df and i am able to use it (using set_index(0), and df.index.name = None and was able to remove the extra row. But that does not work on the created dataframe df2.
CodePudding user response:
As you may already know, that extra zero appearing as an "extra row" in your output is actually the header for the column name(s). When you create the DataFrame, try passing a column name if you want something more descriptive than the default "0" for column name:
df2 = pd.DataFrame(df.isnull().sum(), columns=["Null_Counts"])
Same as the difference you would get from these two variants:
print(pd.DataFrame([0,1,2,3,4,5]))
0
0 0
1 1
2 2
3 3
4 4
5 5
vs
print(pd.DataFrame([0,1,2,3,4,5], columns=["My_Column"]))
My_Column
0 0
1 1
2 2
3 3
4 4
5 5
And, if you just don't want the header row to show up in your output, which seems to be the intent of your question, then you could do something like this to just use the index values and the count values to create whatever output format you want:
df1 = pd.DataFrame([0,1,2,3,4,5], columns=["My_Column"])
for tpl in zip(df1.index.values, df1["My_Column"].values):
print("{}\t{}".format(tpl[0], tpl[1]))
Output:
0 0
1 1
2 2
3 3
4 4
5 5
And you can also use the DataFrame function to_csv() and pass header=False if you just want to print or save the CSV output somewhere without the header row:
print(df1.to_csv(header=False))
0,0
1,1
2,2
3,3
4,4
5,5
And you can also pass sep="\t" to the to_csv function call if you prefer tab- instead of comma-delimited output.