Home > Software engineering >  Creating new column based on NaNs in multi-index data frame (Python)
Creating new column based on NaNs in multi-index data frame (Python)

Time:05-05

I have the following multi index data frame and I want to create a new column that tells me wether a company still exists or not. ID and Year are part of the multi index.

 id  Year  Profit/Loss Total Sales  
 0   2008  300.        2000.        
 0   2009  400.        2000.       
 0   2010  500.        2000.       
 0   2011  NaN         NaN       
 0   2012  NaN         NaN   
 1   2008  300.        2000.       
 1   2009  300.        2000.  

I would need a function that checks if both columns (Total Sales and Profit/Loss) are NaN in a specific year and if they are return a 0 in the solvency column. If one of them or both have values, then it should return a 1.

Desired output:

 id  Year  Profit/Loss Total Sales  Solvency
 0   2008  300.        2000.        1
 0   2009  400.        2000.        1
 0   2010  500.        2000.        1
 0   2011  NaN         NaN          0
 0   2012  NaN         NaN          0
 1   2008  300.        2000.        1
 1   2009  300.        2000.        1

CodePudding user response:

You can use notna to identify the non-NaN columns, aggregate to boolean if any is True per row and convert to integer with astype:

df['Solvency'] = df[['Profit/Loss', 'Total Sales']].notna().any(1).astype(int)

output:

   id  Year  Profit/Loss  Total Sales  Solvency
0   0  2008        300.0       2000.0         1
1   0  2009        400.0       2000.0         1
2   0  2010        500.0       2000.0         1
3   0  2011          NaN          NaN         0
4   0  2012          NaN          NaN         0
5   1  2008        300.0       2000.0         1
6   1  2009        300.0       2000.0         1
  • Related