Iterating through a DataFrame: alternative to a for loop

Time:12-30

I have a very large DataFrame. I wrote a for loop over it, but it is taking forever, and I am wondering if there is any alternative?

index ids year
0 1890 2001
1 2678 NaN
2 4780 NaN
3 9844 1999

The idea is to get an array of the ids of people who have NaN in the 'year' column. So I replaced NaN with 0 and wrote this for loop:

df_nan = []
for i in range(0, len(df.index)):
    for j in range(0, len(df.columns)):
        if ((int(df.values[i,j])) == 0):
            df_nan.append(df.values[i,0])

The for loop works, because I tried it on a smaller DataFrame, but I can't use it on the main one because it takes so long.

CodePudding user response:

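You can use boolean filtering instead of a loop: build a mask with isnull() and use it to select the matching rows. There is no need to replace NaN with 0 first.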

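import numpy as np
import pandas as pd

# pd.np is not available in current pandas versions; use numpy directly
df = pd.DataFrame({'ids': [1890, 2678, 4780, 9844], 'year': [2001, np.nan, np.nan, 1999]})
nan_rows = df[df['year'].isnull()]  # keep only the rows where 'year' is NaN
ids = nan_rows['ids'].values
print(ids) # outputs: [2678 4780]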
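
The loop in the question scans every column, not just 'year', so here is a rough sketch of both variants in one step with .loc. It reuses the sample data from the question; the names ids_year_nan and ids_any_nan are just illustrative, and the second variant assumes you want the ids of rows with a NaN in any column.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"ids": [1890, 2678, 4780, 9844],
                   "year": [2001, np.nan, np.nan, 1999]})

# ids of rows where 'year' is missing, selected in a single step
ids_year_nan = df.loc[df["year"].isna(), "ids"].to_numpy()

# ids of rows with NaN in any column (closer to the original loop,
# which checked every column of each row)
ids_any_nan = df.loc[df.isna().any(axis=1), "ids"].to_numpy()

print(ids_year_nan)
print(ids_any_nan)
```

Both versions work on whole columns at once instead of visiting each cell in Python, which is why they should scale much better than the nested loop on a large DataFrame.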