I have a data frame like the table below:
row_id | name | email | age | state
---|---|---|---|---
1 | Sta | [email protected] | 29 | AK
2 | NaN | [email protected] | NaN | NB
3 | Elle | NaN | 32 | CA
4 | Elle | NaN | 44 | NaN
I want the row id and column name wherever my df is NaN/empty, in the format below:
row_id | col_id | value
---|---|---
2 | name | NaN
2 | age | NaN
3 | email | NaN
4 | email | NaN
4 | state | NaN
import pandas as pd
import numpy as np

tmp = pd.DataFrame(columns=['row_id', 'col_id', 'Value', 'Gap'], dtype=object)  # creating new DF
for name, values in proc_df.iteritems():  # loop over columns
    for i, v in enumerate(values.tolist()):  # loop over values
        if np.isnan(v):
            tmp.loc[len(tmp)] = [proc_df.at[i, 'row_id'], name, v, len([v])]
tmp
but I am getting the below error
TypeError: ufunc 'isnan' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''
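For context, this error happens because `np.isnan` only accepts numeric input: on an object-dtype column that mixes strings with NaN it raises exactly this `TypeError`, while `pd.isna` handles mixed types. A minimal sketch (the column values here are illustrative):

```python
import numpy as np
import pandas as pd

# object-dtype Series: strings mixed with NaN, like the 'name' column
col = pd.Series(["Sta", np.nan, "Elle"])

try:
    np.isnan(col.values)  # fails: 'isnan' is not defined for object arrays
except TypeError as e:
    print("np.isnan raised:", e)

# pd.isna works element-wise on any dtype
print(pd.isna(col).tolist())  # [False, True, False]
```

This is why the answers below switch from `np.isnan` to `pd.isna`/`pd.isnull`.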
CodePudding user response:
My basic idea here is to read the DF column by column; whenever a NaN/None value is found, build a list such as [2, 'name', NaN] and append it as a new row of a new df.
Note: I don't know your logic for the Gap column, so I have left it out here.
Code:
import pandas as pd

tmp = pd.DataFrame(columns=['row_id', 'col_id', 'value'])  # creating new DF
for name, values in df.items():  # loop over columns (iteritems was removed in pandas 2.0)
    for i, v in enumerate(values.tolist()):  # loop over values
        if pd.isna(v):  # pd.isna handles object dtype, unlike np.isnan
            tmp.loc[len(tmp)] = [df.at[i, 'row_id'], name, v]
tmp
Output:
row_id col_id value
0 2 name None
1 3 email None
2 4 email None
3 2 age None
4 4 state None
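For reference, the same result table can also be built without explicit Python loops by reshaping to long format with `melt` and filtering on `isna`. A sketch, assuming `df` holds the example data with a `row_id` column:

```python
import numpy as np
import pandas as pd

# reconstruct the example data from the question
df = pd.DataFrame({
    "row_id": [1, 2, 3, 4],
    "name": ["Sta", np.nan, "Elle", "Elle"],
    "email": ["[email protected]", "[email protected]", np.nan, np.nan],
    "age": [29, np.nan, 32, 44],
    "state": ["AK", "NB", "CA", np.nan],
})

# melt to long format: one (row_id, col_id, value) triple per cell
long = df.melt(id_vars="row_id", var_name="col_id", value_name="value")

# keep only the NaN cells
out = long[long["value"].isna()].reset_index(drop=True)
print(out)
```

This scales better than appending to a DataFrame row by row, since `tmp.loc[len(tmp)] = ...` copies data on every insert.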
CodePudding user response:
You can use pd.isnull and np.argwhere.
I referred to this answer to get the indices of null values.
This method:
- requires less manual iteration on your part
- does not require any dependency beyond numpy, which pandas is built on
import pandas as pd
import numpy as np

output_d = {  # save outputs here
    "row_id": [],
    "col_id": [],
    "value": [],
}

# use .values - np.argwhere only works on numpy arrays
null_array = pd.isnull(data_df.values)

# get an iterable of [row, col] arrays
row_col_ls = np.argwhere(null_array)

# iterate through indices and save values
for row_col_tuple in row_col_ls:
    # row_col_tuple[0] - the row
    output_d["row_id"].append(data_df["row_id"].iloc[row_col_tuple[0]])
    # row_col_tuple[1] - the col
    output_d["col_id"].append(data_df.columns[row_col_tuple[1]])
    # save the nan value from the original df
    output_d["value"].append(data_df.iloc[row_col_tuple[0], row_col_tuple[1]])

output_df = pd.DataFrame(output_d)
# output_df
# row_id col_id value
# 0 2 name NaN
# 1 2 age NaN
# 2 3 email NaN
# 3 4 email NaN
# 4 4 state NaN