I have a very large DataFrame. I wrote a for loop to process it, but it is taking forever. Is there an alternative?
index | ids | year |
---|---|---|
0 | 1890 | 2001 |
1 | 2678 | NaN |
2 | 4780 | NaN |
3 | 9844 | 1999 |
The idea is to get an array of the ids of people who have NaN values in the 'year' column. To do that, I replaced NaN with 0 and wrote this for loop:
df_nan = []
for i in range(0, len(df.index)):
    for j in range(0, len(df.columns)):
        if ((int(df.values[i,j])) == 0):
            df_nan.append(df.values[i,0])
The for loop works, because I tried it on a smaller DataFrame, but I can't use it on the main one because it takes so long.
CodePudding user response:
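You can use boolean filtering instead of a loop: isnull() builds a mask of the rows where 'year' is NaN, and indexing with that mask keeps only those rows. This works on the original NaN values, so there is no need to convert them to 0 first.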
import numpy as np
import pandas as pd

df = pd.DataFrame({'ids': [1890, 2678, 4780, 9844], 'year': [2001, np.nan, np.nan, 1999]})
nan_rows = df[df['year'].isnull()]  # keep only rows where 'year' is NaN
ids = nan_rows['ids'].values
print(ids) # outputs: [2678 4780]
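
If it helps, the same filtering can be written in a single step with .loc, which selects the rows and the column together. The sketch below assumes the sample data from the question. The names nan_ids, df_filled and zero_ids are just illustrative. It also shows the equivalent comparison for the case where the NaN values were already replaced with 0.

```python
import numpy as np
import pandas as pd

# Sample data mirroring the table in the question.
df = pd.DataFrame({
    "ids": [1890, 2678, 4780, 9844],
    "year": [2001, np.nan, np.nan, 1999],
})

# Select 'ids' where 'year' is missing, in one vectorized step.
nan_ids = df.loc[df["year"].isna(), "ids"].to_numpy()

# If the NaNs were already replaced with 0 (as in the question),
# compare against 0 instead of testing for NaN.
df_filled = df.fillna({"year": 0})
zero_ids = df_filled.loc[df_filled["year"] == 0, "ids"].to_numpy()

print(nan_ids)
print(zero_ids)
```

Both versions avoid the Python-level loop over every cell, which is usually what makes the original approach slow on a large DataFrame. Testing for NaN directly is likely the safer choice, since a real year value of 0 would otherwise be picked up by mistake.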