Home > Enterprise >  fill na according to certain conditionals of other columns
fill na according to certain conditionals of other columns

Time:09-02

I have the following table:

State Municipality Date of Project Poverty (year)
Aguascalientes Aguascalientes 2003
Jalisco Chapala 2004
Jalisco Chapala 2010
Jalisco Chapala 2015

In it I have to fill the column poverty according to the columns before it. The following table is the one which I am going to use in order to fill the information of the first table:

State Municipality 2010 2015
Aguascalientes Aguascalientes 65.8 78.5
Jalisco Chapala 45 .3 43.1
Durango Durango 12.4 13.4
Guanajuato Leon 15.2 13.2

What I need to do is with the information that have in 2010 and 2015 I have to fill the poverty column according to the year of Date of Project, State and Municipality. Additionally, if the year (of Date of Project) is between 2000:2012 I have to fill it with the 2010 column. On the other hand, if the year is between 2013:2020, the 2015 column should be the one I need to fill with.

CodePudding user response:

You can first left-merge two dataframes (to align them), and then use np.select:

result = df1.merge(df2, on=['State', 'Municipality'], how='left')

condlist = [result['Date of Project'].between(2000,2012),
            result['Date of Project'].between(2013,2020)]
choicelist = [result['2010'], result['2015']]
result['Poverty (year)'] = np.select(condlist, choicelist)

result.drop(columns=['2010', '2015'], inplace=True)

Result:

            State    Municipality  Date of Project Poverty (year)
0  Aguascalientes  Aguascalientes             2003           65.8
1         Jalisco         Chapala             2004           45.3
2         Jalisco         Chapala             2010           45.3
3         Jalisco         Chapala             2015           43.1

CodePudding user response:

import pandas as pd

df1 = pd.DataFrame({'State': ['Aguascalientes', 'Jalisco', 'Jalisco', 'Jalisco'],
                    'Municipality': ['Aguascalientes', 'Chapala', 'Chapala', 'Chapala'],
                    'Date of Project': ['2003', '2004', '2010', '2015'],
                    'Poverty': [None, None, None, None]})

df2 = pd.DataFrame({'State': ['Aguascalientes', 'Jalisco', 'Durango', 'Guanajuato'],
                    'Municipality': ['Aguascalientes', 'Chapala', 'Durango', 'Leon'],
                    '2010': [65.8, 45.3, 12.4, 15.2],
                    '2015': [78.5, 43.1, 13.4, 13.2]})

df1['Date of Project'] = pd.to_datetime(df1['Date of Project'])

df1['Poverty'] = df1.apply(lambda x: df2[(df2['State'] == x['State']) & (df2['Municipality'] == x['Municipality'])]['2010'].values[0] if x['Date of Project'].year <= 2012 else df2[(df2['State'] == x['State']) & (df2['Municipality'] == x['Municipality'])]['2015'].values[0], axis=1)

print(df1)
>>> print(df1)
            State    Municipality Date of Project  Poverty
0  Aguascalientes  Aguascalientes      2003-01-01     65.8
1         Jalisco         Chapala      2004-01-01     45.3
2         Jalisco         Chapala      2010-01-01     45.3
3         Jalisco         Chapala      2015-01-01     43.1
  • Related