Find row with nan value and delete it

Time:10-27

I have a dataframe with three columns: id, horstid, and date. The date column contains one NaN value. The code below works with pandas, but I want to do the same thing with NumPy.

First I want to convert my dataframe to a NumPy array. Then I want to find all rows where the date is NaN and print them. Finally, I want to remove all of these rows. How can I do this in NumPy?

This is my dataframe

   id  horstid        date
0   1       11  2008-09-24
1   2       22         NaN
2   3       33  2008-09-18
3   4       33  2008-10-24

This is my code. It works fine, but it uses pandas.

import numpy as np
import pandas as pd

d = {'id': [1, 2, 3, 4], 'horstid': [11, 22, 33, 33], 'date': ['2008-09-24', np.nan, '2008-09-18', '2008-10-24']}
df = pd.DataFrame(data=d)
df['date'].isna()

[OUT]
0    False
1     True
2    False
3    False

df.drop(df.index[df['date'].isna()])

[OUT]
   id  horstid        date
0   1       11  2008-09-24
2   3       33  2008-09-18
3   4       33  2008-10-24

What I want is the same result without pandas, using only NumPy. This is what I tried:

npArray = df.to_numpy()
date = npArray[:, 2].astype(np.datetime64)
[OUT]
ValueError: Cannot create a NumPy datetime other than NaT with generic units
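
One possible way around this error, sketched here under the assumption that the date column holds only 'YYYY-MM-DD' strings plus the float NaN: replace the missing entries with the string 'NaT', convert with an explicit unit (`datetime64[D]`), and let `np.isnat` locate the missing rows. The names `rows`, `date_strings`, `missing`, and `cleaned` are illustrative and not from the original post.

```python
import numpy as np

# Same data as the dataframe above, built directly as an object array
rows = np.array([[1, 11, '2008-09-24'],
                 [2, 22, np.nan],
                 [3, 33, '2008-09-18'],
                 [4, 33, '2008-10-24']], dtype=object)

# Anything that is not a string (here: the float NaN) becomes 'NaT',
# which NumPy parses as "not a time"
date_strings = [x if isinstance(x, str) else 'NaT' for x in rows[:, 2]]

# An explicit unit ([D] = days) avoids the generic-unit dtype
dates = np.array(date_strings, dtype='datetime64[D]')

# Boolean mask that is True where the date is missing
missing = np.isnat(dates)

print(rows[missing])   # rows whose date is missing
cleaned = rows[~missing]
print(cleaned)         # rows with a valid date
```

The mask can be reused on any other array that has the same number of rows.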

CodePudding user response:

Here's a solution based on NumPy and pure Python:

df = pd.DataFrame.from_dict(dict(horstid=[11, 22, 33, 33], id=[1, 2, 3, 4], date=['2008-09-24', np.nan, '2008-09-18', '2008-10-24']))

a = df.values

# NaN is a float, so keep only the rows whose date entry is not a float
index = list(map(lambda x: type(x) != type(1.), a[:, 2]))

print(a[index,:])

[OUT]
[[11 1 '2008-09-24']
 [33 3 '2008-09-18']
 [33 4 '2008-10-24']]
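
The type check above keeps every entry that is not a float, so it would also drop a row whose third column held a legitimate float. A slightly stricter variant, given here only as a sketch, tests for NaN explicitly and also prints the NaN rows before removing them, as the question asked. The helper `is_nan` and the names `nan_mask` and `kept` are made up for illustration.

```python
import math
import numpy as np

# Same column order as in the answer: horstid, id, date
a = np.array([[11, 1, '2008-09-24'],
              [22, 2, np.nan],
              [33, 3, '2008-09-18'],
              [33, 4, '2008-10-24']], dtype=object)

def is_nan(value):
    """Return True only for a float NaN, False for strings and other values."""
    return isinstance(value, float) and math.isnan(value)

# Boolean mask over the date column
nan_mask = np.array([is_nan(v) for v in a[:, 2]])

print(a[nan_mask])   # rows where the date is NaN
kept = a[~nan_mask]
print(kept)          # remaining rows
```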