Suppose I have the following table of 10,000 items; this is a sample of the data.
Item Rank | Item Cost |
---|---|
1 | 1000 |
2 | 800 |
3 | 900 |
4 | 500 |
5 | 400 |
6 | 200 |
Of course, the variation differs along the list: some items are higher ranked but cheaper than others, and some are more expensive but lower ranked than others.
I can't find the term, keyword, or concept for finding the HIGHEST ranked item that has the LOWEST price, that is, the optimal balance of price to rank, so that when I choose from the list I save the most money while getting the highest rank possible.
The only thing I can think of right now is grouping every 10 ranks and sorting each group ascending by price, but there has to be a more "statistical" way.
I'm trying to implement this in Python. Thank you!
I'd appreciate any help.
Edit: One idea is to find the biggest variance where the index in the price set < the index in the rank set?
CodePudding user response:
I understood your question a little differently so I'll provide an alternative answer.
Since you haven't worked out what you want to calculate yet, this seems to be not so much a question about Python as about the underlying math/statistics. I think the way to approach this is to find an equation that gives you a measure of value based on rank and cost, and then implement that in Python using basic arithmetic operators.
The equation should reward low ranks and penalise high costs, so it might look something like:
value = cost - A * ( 1 / rank )
where A is some integer determined based on your use case and the data.
I would suggest playing around in Excel to get your equation before implementing it in Python. I tried this, and for A = 100 you get some realistic-looking values, but the item in rank 1 with price 1000 still comes out on top, which suggests that too much weight is assigned to rank relative to cost (in other words, there is not enough penalty for high cost). Adjusting A to 500 makes the item with rank 3 come out on top, which is the one I would have picked out instinctively looking at the data, so this may be closer to the value you want.
Item rank | Item cost | cost - 100/rank | cost - 500/rank |
---|---|---|---|
1 | 1000 | 900.00 | 500.00 |
2 | 800 | 750.00 | 550.00 |
3 | 900 | 866.67 | 733.33 |
4 | 500 | 475.00 | 375.00 |
5 | 400 | 380.00 | 300.00 |
6 | 200 | 183.33 | 116.67 |
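The table values can be reproduced in Python instead of Excel. This is only a minimal sketch of the formula as written above, value = cost - A * ( 1 / rank ); the function names `value` and `best_item` are made up for illustration, and "comes out on top" is assumed to mean the highest score.

```python
# Sample data from the question: (rank, cost) pairs.
items = [(1, 1000), (2, 800), (3, 900), (4, 500), (5, 400), (6, 200)]


def value(rank, cost, a):
    """Score from the formula above: cost - A * (1 / rank)."""
    return cost - a * (1 / rank)


def best_item(items, a):
    """Return the (rank, cost) pair with the highest score for weight `a`."""
    return max(items, key=lambda item: value(item[0], item[1], a))


# Print the scores and the top item for both values of A tried above.
for a in (100, 500):
    scores = [round(value(rank, cost, a), 2) for rank, cost in items]
    print(f"A = {a}: scores = {scores}, top item = {best_item(items, a)}")
```

Trying other values of A in the loop is a quick way to see how the top item shifts before settling on an equation.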
CodePudding user response:
Your goal is a little unclear to me, but it sounds like you are looking for the highest ranked item that you can get for less than the next lower ranked item? So in the example this would be item rank #2, as it is both higher rank and lower cost than item rank #3.
Assuming this is the intended goal, then what you are looking for is the last element in a strictly decreasing sequence, starting with the cost of item rank 1, and progressing in rank order to the lowest ranked item. The first item encountered that is lower in cost than the next item signifies the end of the strictly decreasing sequence, and would be the target value. Since the searched sequence always starts with the highest ranked item, the returned value should always be the highest rank that is cheaper than the next lower rank.
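# Ranks and costs are stored as ints: comparing the strings '800' and '1000'
# would be lexicographic and give the wrong answer.
items = [
    (1, 1000),
    (2, 800),
    (3, 900),
    (4, 500),
    (5, 400),
    (6, 200)
]

def getCheapestHighestRank(items):
    # Walk down the ranks and stop at the first item whose cost is not
    # lower than the previous item's cost.
    for i in range(1, len(items)):
        if items[i][1] >= items[i-1][1]:
            return items[i-1]
    # Costs decrease all the way down, so the last item ends the sequence.
    return items[-1]

print(getCheapestHighestRank(items))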
--- Code to illustrate valuation based on maximizing savings per rank sacrificed, as per comments ---
Effectively, this approach looks for the item that offers the highest compensation per rank lost compared with choosing the highest-ranked option. If you run the code, the output will show that item rank 2 provides a savings of 200 per rank lost, whereas item rank 100 only provides about 10 in savings per rank lost. This approach makes sense if rank between items is an absolute measure, rather than a % difference. If ranks reflect % of quality, then rank #2 could mean that the item is half as good as rank #1, and you only receive 200 in savings for 1/2 the quality. The question of how to value the ranked items is interesting, but ultimately I think more context about what "best value" means might be needed to give more concrete answers.
items = [
    (1, 1000),
    (2, 800),
    (3, 900),
    (4, 500),
    (5, 400),
    (6, 200),
    (100, 2)
]

# The highest-ranked item is the baseline every other item is compared against.
maxRankItem = items[0]
for i in range(1, len(items)):
    print("")
    # Savings versus the baseline, and the number of ranks given up.
    print((maxRankItem[1] - items[i][1]), "/", (items[i][0] - maxRankItem[0]))
    print("saving: ", (maxRankItem[1] - items[i][1]) / (items[i][0] - maxRankItem[0]), "for each rank lost")
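To go from the printed ratios to a single pick, one option is to take the item with the largest savings per rank lost. This is only a sketch of that idea: the helper names `savings_per_rank_lost` and `best_tradeoff` are made up for illustration, and it assumes the list has at least two items, starts with the highest-ranked one, and has no repeated ranks.

```python
# Same sample data as above: (rank, cost) pairs, highest-ranked item first.
items = [(1, 1000), (2, 800), (3, 900), (4, 500), (5, 400), (6, 200), (100, 2)]


def savings_per_rank_lost(top, other):
    """Savings relative to the top item, divided by the number of ranks given up."""
    return (top[1] - other[1]) / (other[0] - top[0])


def best_tradeoff(items):
    """Return the item (other than the top one) with the highest savings per rank lost."""
    top = items[0]
    return max(items[1:], key=lambda item: savings_per_rank_lost(top, item))


print(best_tradeoff(items))
```

Whether this ratio is the right measure still depends on what "best value" means for the data, as noted above.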