Factor Dataframe:
Code | Min | Max | Factor |
---|---|---|---|
A | 0 | 100 | 1 |
A | 101 | 300 | 2 |
A | 301 | 800 | 3 |
A | 800 | NaN | 4 |
B | 0 | 100 | 2 |
B | 101 | 300 | 4 |
B | 301 | 800 | 6 |
B | 800 | NaN | 8 |
The table above is used to return a factor based on two values, the 'Code' and a specific number. A specific value is provided, such as 5, 302, 253, 8000 and if that value falls between the min and max, the factor is returned.
For example:
- A, 355 = 3
- B, 9000 = 8
I have a separate table with several thousand lines with values which need to have the factor assigned.
Dataframe:
Item | Code | Value |
---|---|---|
001 | A | 1 |
002 | A | 322 |
003 | B | 21 |
004 | A | 342 |
005 | A | 32 |
006 | B | 7666 |
007 | B | 10000 |
008 | B | 86 |
How do you use the pandas tool set to return the desired factor and append the relevant factor as a new column?
CodePudding user response:
Use left join in DataFrame.merge
with replace missing value to np.inf
and then filter in Series.between
and possible missing values in Min
, if no match in boolean indexing
:
print (df1)
Code Val
0 A 355
1 B 9000
2 C 5
print (df2)
Code Min Max Factor
0 A 0 100.0 1
1 A 101 300.0 2
2 A 301 800.0 3
3 A 800 NaN 4
4 B 0 100.0 2
5 B 101 300.0 4
6 B 301 800.0 6
7 B 800 NaN 8
df = df1.merge(df2.fillna({'Max':np.inf}), how='left', on='Code')
df3 = (df[df['Val'].between(df['Min'], df['Max']) | df['Min'].isna()]
.drop(['Min','Max'], axis=1))
print (df3)
Code Val Factor
2 A 355 3.0
7 B 9000 8.0
8 C 5 NaN