I have the following table:
State | Municipality | Date of Project | Poverty (year) |
---|---|---|---|
Aguascalientes | Aguascalientes | 2003 | |
Jalisco | Chapala | 2004 | |
Jalisco | Chapala | 2010 | |
Jalisco | Chapala | 2015 |
In it I have to fill the column poverty
according to the columns before it.
The following table is the one which I am going to use in order to fill the information of the first table:
State | Municipality | 2010 | 2015 |
---|---|---|---|
Aguascalientes | Aguascalientes | 65.8 | 78.5 |
Jalisco | Chapala | 45 .3 | 43.1 |
Durango | Durango | 12.4 | 13.4 |
Guanajuato | Leon | 15.2 | 13.2 |
What I need to do is with the information that have in 2010
and 2015
I have to fill the poverty
column according to the year of Date of Project
, State
and Municipality
. Additionally, if the year (of Date of Project
) is between 2000:2012 I have to fill it with the 2010
column. On the other hand, if the year is between 2013:2020, the 2015
column should be the one I need to fill with.
CodePudding user response:
You can first left-merge two dataframes (to align them), and then use np.select
:
result = df1.merge(df2, on=['State', 'Municipality'], how='left')
condlist = [result['Date of Project'].between(2000,2012),
result['Date of Project'].between(2013,2020)]
choicelist = [result['2010'], result['2015']]
result['Poverty (year)'] = np.select(condlist, choicelist)
result.drop(columns=['2010', '2015'], inplace=True)
Result:
State Municipality Date of Project Poverty (year)
0 Aguascalientes Aguascalientes 2003 65.8
1 Jalisco Chapala 2004 45.3
2 Jalisco Chapala 2010 45.3
3 Jalisco Chapala 2015 43.1
CodePudding user response:
import pandas as pd
df1 = pd.DataFrame({'State': ['Aguascalientes', 'Jalisco', 'Jalisco', 'Jalisco'],
'Municipality': ['Aguascalientes', 'Chapala', 'Chapala', 'Chapala'],
'Date of Project': ['2003', '2004', '2010', '2015'],
'Poverty': [None, None, None, None]})
df2 = pd.DataFrame({'State': ['Aguascalientes', 'Jalisco', 'Durango', 'Guanajuato'],
'Municipality': ['Aguascalientes', 'Chapala', 'Durango', 'Leon'],
'2010': [65.8, 45.3, 12.4, 15.2],
'2015': [78.5, 43.1, 13.4, 13.2]})
df1['Date of Project'] = pd.to_datetime(df1['Date of Project'])
df1['Poverty'] = df1.apply(lambda x: df2[(df2['State'] == x['State']) & (df2['Municipality'] == x['Municipality'])]['2010'].values[0] if x['Date of Project'].year <= 2012 else df2[(df2['State'] == x['State']) & (df2['Municipality'] == x['Municipality'])]['2015'].values[0], axis=1)
print(df1)
>>> print(df1)
State Municipality Date of Project Poverty
0 Aguascalientes Aguascalientes 2003-01-01 65.8
1 Jalisco Chapala 2004-01-01 45.3
2 Jalisco Chapala 2010-01-01 45.3
3 Jalisco Chapala 2015-01-01 43.1