Say I have two tables: Table A:
state value
0 A 100
Table B:
state 100 200
0 A 1 4
1 B 2 5
2 C 3 6
I want to create a new field for Table A called "Factor" that returns the respective value from Table B:
state value factor
0 A 100 1
How would I do this in Python/Pandas?
In Excel, I would do: INDEX('Table B'!B2:C4, MATCH('Table A'!A2, 'Table B'!A:A, 0), MATCH('Table A'!B2, 'Table B'!B1:C1, 0))
CodePudding user response:
You should try to use loc
for this. Please try the following example:
Assuming Table A (dfA) and Table B (dfB) are both pandas dataframes:
>>> A = {
'state' : ['A'],
'value' : [100],
}
>>> B = {
'state' : ['A','B','C'],
'100' : [1,2,3],
'200' : [4,5,6]
}
>>> dfA = pd.DataFrame(A)
>>> dfB = pd.DataFrame(B)
which gives you your tables as pandas dataframes
>>> dfA
state value
0 A 100
>>> dfB
state 100 200
0 A 1 4
1 B 2 5
2 C 3 6
Then we search a specific column in the dataframe for a specific value and return the value of a different column from that same row.
extracted_value = list(dfB.loc[dfB['state'] == 'A', '100'])[0]
We create a column and set that value for that column dfA['factor'] = extracted_value
and we now have that extracted value in the appropriate column
>>> dfA
state value factor
0 A 100 1
Now I'm sure you want to do this in a loop to address a particular list of values and I hope this helps you start. Additionally, using dfA['factor'] = extracted_value
is going to set every value in that column to extracted_value and you might want to have different values per row. To accomplish that you're going to want to use .loc again to add values based on the row index.
CodePudding user response:
- Pivot
dfA
from wide to long format usingmelt
. Ensure the column names get converted to a numeric type. merge
the longform data todfA
melted = dfB.melt(id_vars=['state'], var_name='value', value_name='factor')
melted['value'] = melted['value'].astype(int)
dfA = dfA.merge(melted, on=['state', 'value'])
Result:
state value factor
0 A 100 1
This maybe feels like overkill for this example, but could be helpful for larger lookups.