Home > database >  Returning a value from dataframe using columns as a range
Returning a value from dataframe using columns as a range

Time:12-08

Factor Dataframe:

Code Min Max Factor
A 0 100 1
A 101 300 2
A 301 800 3
A 800 NaN 4
B 0 100 2
B 101 300 4
B 301 800 6
B 800 NaN 8

The table above is used to return a factor based on two values, the 'Code' and a specific number. A specific value is provided, such as 5, 302, 253, 8000 and if that value falls between the min and max, the factor is returned.

For example:

  • A, 355 = 3
  • B, 9000 = 8

I have a separate table with several thousand lines with values which need to have the factor assigned.

Dataframe:

Item Code Value
001 A 1
002 A 322
003 B 21
004 A 342
005 A 32
006 B 7666
007 B 10000
008 B 86

How do you use the pandas tool set to return the desired factor and append the relevant factor as a new column?

CodePudding user response:

Use left join in DataFrame.merge with replace missing value to np.inf and then filter in Series.between and possible missing values in Min, if no match in boolean indexing:

print (df1)
  Code   Val
0    A   355
1    B  9000
2    C     5

print (df2)
  Code  Min    Max  Factor
0    A    0  100.0       1
1    A  101  300.0       2
2    A  301  800.0       3
3    A  800    NaN       4
4    B    0  100.0       2
5    B  101  300.0       4
6    B  301  800.0       6
7    B  800    NaN       8

df = df1.merge(df2.fillna({'Max':np.inf}), how='left', on='Code')

df3 = (df[df['Val'].between(df['Min'], df['Max']) | df['Min'].isna()]
             .drop(['Min','Max'], axis=1))
print (df3)
  Code   Val  Factor
2    A   355     3.0
7    B  9000     8.0
8    C     5     NaN
  • Related