I have a dataframe with three columns: id, horstid, and date. The date column contains one NaN value. The code below does what I want with pandas, but I want to do the same thing with numpy.
First I want to convert my dataframe to a numpy array. Then I want to find all rows where date is NaN and print them. After that I want to remove all of these rows. How can I do this in numpy?
This is my dataframe:
   id  horstid        date
0   1       11  2008-09-24
1   2       22         NaN
2   3       33  2008-09-18
3   4       33  2008-10-24
This is my code. It works fine, but it uses pandas.
import numpy as np
import pandas as pd
d = {'id': [1, 2, 3, 4], 'horstid': [11, 22, 33, 33], 'date': ['2008-09-24', np.nan, '2008-09-18', '2008-10-24']}
df = pd.DataFrame(data=d)
df['date'].isna()
[OUT]
0 False
1 True
2 False
3 False
df.drop(df.index[df['date'].isna()])
[OUT]
   id  horstid        date
0   1       11  2008-09-24
2   3       33  2008-09-18
3   4       33  2008-10-24
What I want is the above code without pandas, using numpy instead. This is what I tried:
npArray = df.to_numpy()
date = npArray[:, 2].astype(np.datetime64)
[OUT]
ValueError: Cannot create a NumPy datetime other than NaT with generic units
Answer:
Here's a solution based on NumPy and pure Python:
df = pd.DataFrame.from_dict(dict(horstid=[11, 22, 33, 33], id=[1, 2, 3, 4], date=['2008-09-24', np.nan, '2008-09-18', '2008-10-24']))
a = df.values
# np.nan is a float while valid dates are strings, so keep only the non-float entries
index = [not isinstance(x, float) for x in a[:, 2]]
print(a[index, :])
[[11 1 '2008-09-24']
[33 3 '2008-09-18']
[33 4 '2008-10-24']]
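If you would rather not check types, one possible alternative relies on the fact that NaN is the only value that does not compare equal to itself. The sketch below is an illustration under assumptions, not a definitive solution: it builds the object array by hand instead of going through pandas, the names arr, nan_mask, nan_rows, clean and dates are my own, and the column order follows the question's dataframe. It prints the rows with a missing date, drops them, and only then converts the remaining strings to datetime64 with the explicit unit [D].

```python
import numpy as np

# Same data as the question's dataframe, as an object array: id, horstid, date
arr = np.array([
    [1, 11, '2008-09-24'],
    [2, 22, np.nan],
    [3, 33, '2008-09-18'],
    [4, 33, '2008-10-24'],
], dtype=object)

date_col = arr[:, 2]

# NaN != NaN is True, while every date string equals itself
nan_mask = np.array([x != x for x in date_col], dtype=bool)

# Rows where the date is missing
nan_rows = arr[nan_mask]
print(nan_rows)

# Rows with a valid date
clean = arr[~nan_mask]
print(clean)

# Convert the remaining date strings; the unit [D] is given explicitly
dates = clean[:, 2].astype(str).astype('datetime64[D]')
print(dates)
```

The NaN row is removed before the conversion because the object column mixes a float with strings. If you prefer to keep a datetime64 column with a missing value, you could store np.datetime64('NaT') in place of NaN and build the mask with np.isnat instead.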