Home > Mobile >  Row and Column number of empty value present in dataframe in Python
Row and Column number of empty value present in dataframe in Python

Time:10-19

I have a data frame - the table below

row_id name email age state
1 Sta [email protected] 29 AK
2 NaN [email protected] NaN NB
3 Elle NaN 32 CA
4 Elle NaN 44 NaN

I want to have the row id and column name wherever my df is NaN/Empty in the below format

row_id col_id value
2 Name NaN
2 age NaN
3 name NaN
4 name NaN
4 state NaN
import numpy as np
tmp = pd.DataFrame(columns= ['row_id', 'col_id','Value','Gap'], dtype=object) #Creating new DF

for name, values in proc_df.iteritems():           #For loop by items
    for i,v in enumerate(values.tolist()):    #For loop on values
        if np.isnan(v):
            tmp.loc[len(tmp)] = [proc_df.at[i,'row_id'], name, v, len([v])]

tmp

but I am getting the below error

TypeError: ufunc 'isnan' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''

CodePudding user response:

Here, my basic idea is to read DF by columns and if its founds None value create new list and append into new df the list such as [2, A, NaN, 1].

Note : Dont know your logic for GAP column, so am just counting the None value in that specific cell.

Code:

import pandas as pd
tmp = pd.DataFrame(columns= ['row_id', 'Col_id','Value']) #Creating new DF

for name, values in df.iteritems():           #For loop by items
    for i,v in enumerate(values.tolist()):    #For loop on values
        if pd.isna(v):
            tmp.loc[len(tmp)] = [df.at[i,'row_id'], name, v]

tmp

Output:

    row_id  Col_id  Value
0   2       name    None
1   3       email   None
2   4       email   None
3   2       age     None
4   4       sate    None

CodePudding user response:

you can use pd.isnull and np.argwhere

I referred to this answer to get indices of null values.

This method:

  • requires less manual iteration on your part
  • does not require additional dependencies (pandas is a numpy wrapper)
import pandas as pd
import numpy as np

output_d = { # save outputs here
    "row_id":[],
    "col_id":[],
    "value":[],
}

# use .values - np.argwhere only works on numpy arrays
null_array = pd.isnull(data_df.values) 

# get an iterable of [row,col] arrays
row_col_ls = np.argwhere(null_array)

# iterate through indices and save values
for row_col_tuple in row_col_ls:

    # row_col_tuple[0] - the row
    output_d["row_id"].append(temp_df["row_id"].iloc[row_col_tuple[0]])

    # row_col_tuple[1] - the col
    output_d["col_id"].append(temp_df.columns[row_col_tuple[1]])
    
    # save the nan value from original df
    output_d["value"].append(temp_df.iloc[row_col_tuple[0],row_col_tuple[1]]) 

output_df = pd.DataFrame(output_d)

# output_df
#    row_id col_id  value
# 0       2   name    NaN
# 1       2    age    NaN
# 2       3  email    NaN
# 3       4  email    NaN
# 4       4  state    NaN
  • Related