I have a very simple issue...
I am working with a CSV file. For some reason, when I open it, one of the columns comes out as a float, which is not how it looks in the original file. It also gives me 500 NaN rows, which is also inconsistent with the CSV file. I drop the NAs, convert to int and it all seems good, until I assign it back and it goes back to float. First time for me. (Well, I have a lot of first times, but...)
Thanks in advance!
Cheers!
df['ID'] #returns a float.
Returns -
0 1.0
1 2.0
2 3.0
3 4.0
4 5.0
df['ID'].dropna().astype(int)
Returns -
0 1
1 2
2 3
3 4
4 5
df['ID'] = df['ID'].dropna().astype(int)
Returns -
0 1.0
1 2.0
2 3.0
3 4.0
4 5.0
CodePudding user response:
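You are assigning a Series to a column, and pandas aligns it on the index, so the rows you dropped come back as NaN.
NaN in pandas is a float, so the whole column is converted back to float. Drop the rows from the DataFrame itself first. Try this: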
df.dropna(subset=["ID"], inplace=True)
df["ID"] = df["ID"].astype(int)
print(df)
ID
0 1
1 2
2 3
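To see why the reassignment in the question goes back to float, here is a minimal sketch. The data is made up for illustration (the real file is not shown in the question), and the names ints, realigned_dtype and fixed_dtype are just placeholders.

```python
import numpy as np
import pandas as pd

# Made-up data: two rows have no ID
df = pd.DataFrame({"ID": [1.0, 2.0, np.nan, 4.0, np.nan]})

# The right-hand side is an int Series with only 3 rows (index 0, 1, 3)
ints = df["ID"].dropna().astype(int)

# Assignment aligns on the index: rows 2 and 4 are filled with NaN again,
# which forces the column back to float
df["ID"] = ints
realigned_dtype = df["ID"].dtype

# Dropping the rows from the DataFrame itself leaves nothing to fill with NaN
df = df.dropna(subset=["ID"]).copy()
df["ID"] = df["ID"].astype(int)
fixed_dtype = df["ID"].dtype

print(realigned_dtype, fixed_dtype)
```

The first dtype is float, the second is an integer type, which matches what the question describes.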
CodePudding user response:
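Try assigning to a temporary Series and replacing the column in the original df. The assignment still aligns on the index, so keep only the rows that survived dropna(), otherwise the NaN rows come back and the column is float again.
df_temp = df['ID'].dropna().astype(int)
df = df.loc[df_temp.index].copy()
df['ID'] = df_temp
print(df)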
CodePudding user response:
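Note that df['ID'].dropna() returns a Series, not a DataFrame, and it keeps the original index. If you only need the ID column, ids = df['ID'].dropna().astype(int).reset_index(drop=True)
should solve the problem. Assigning the result to df instead would replace the whole DataFrame with that Series.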
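On the 500 NaN rows: one possible cause (an assumption, since the file is not shown) is lines in the CSV that contain only delimiters, which pandas reads as rows of NaN. The sketch below uses a made-up CSV string to show two options: dropping fully empty rows before casting, or keeping every row and using the nullable "Int64" dtype, which can hold missing values without turning into float.

```python
import io
import pandas as pd

# Hypothetical CSV: the last two lines contain only a delimiter
csv_text = "ID,name\n1,a\n2,b\n3,c\n,\n,\n"

df = pd.read_csv(io.StringIO(csv_text))
print(df["ID"].dtype)  # float64, because the NaN rows force a float column

# Option 1: drop rows where every column is missing, then cast
cleaned = df.dropna(how="all").copy()
cleaned["ID"] = cleaned["ID"].astype(int)

# Option 2: keep all rows and use the nullable integer dtype
nullable = df["ID"].astype("Int64")

print(cleaned)
print(nullable)
```

Option 1 changes the number of rows, option 2 does not, so which one fits depends on whether the empty rows matter to you.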