Iterating through a DataFrame: an alternative to a for loop

I have a very large dataframe. I wrote a for loop over it, but it is taking forever, and I am wondering if there is an alternative.

index	ids	year
0	1890	2001
1	2678	NaN
2	4780	NaN
3	9844	1999

The idea is to get an array of the ids of people who have NaN in the 'year' column. What I did was turn the NaN values into 0 and write this for loop:

df_nan = []
for i in range(0, len(df.index)):
    for j in range(0, len(df.columns)):
        if ((int(df.values[i,j])) == 0):
            df_nan.append(df.values[i,0])

The for loop works, because I tried it on a smaller dataframe, but I can't use it on the main one because it takes so long.

CodePudding user response:

You can use boolean filtering instead of a loop. There is no need to replace NaN with 0 first.

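import numpy as np
import pandas as pd

# pd.np is no longer available in recent pandas versions, so take NaN from numpy directly
df = pd.DataFrame({'ids': [1890, 2678, 4780, 9844], 'year': [2001, np.nan, np.nan, 1999]})
nan_rows = df[df['year'].isnull()]  # boolean mask keeps only rows where 'year' is NaN
ids = nan_rows['ids'].values
print(ids) # outputs: [2678 4780]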
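
If it helps, the same filter can be written in one step with .loc. The second variant below is only a guess at what the nested loop was after: it flags a row when any column other than 'ids' is missing. The variable names (ids_year_nan, mask_any, ids_any_nan) are made up for this sketch, and the sample frame is just the four rows from the question.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "ids": [1890, 2678, 4780, 9844],
    "year": [2001, np.nan, np.nan, 1999],
})

# Select 'ids' where 'year' is missing, in a single vectorized step
ids_year_nan = df.loc[df["year"].isna(), "ids"].to_numpy()

# Variant (assumption): flag rows where ANY column other than 'ids' is missing,
# which is closer to what the nested loop over all columns checks
mask_any = df.drop(columns="ids").isna().any(axis=1)
ids_any_nan = df.loc[mask_any, "ids"].to_numpy()

print(ids_year_nan)
print(ids_any_nan)
```

Both versions build a boolean mask over whole columns at once, so pandas does the work in compiled code instead of visiting each cell from Python, which is usually where the time goes on a large dataframe.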