ISSUE
I have a FOR loop that builds a list of lists, where each entry holds an input and its associated output. I can't figure out how to iterate over the outputs once the list is created and return the corresponding input. I was able to solve the problem by converting the list into a DataFrame and using .loc[], but I'm stubborn and want to produce the same result without converting to a DataFrame. I also do not want to convert it into a dictionary; I have already solved that case as well.
I have included the list that is produced as well as the converted DataFrame that works. In this case best_tree_size should be 100, since its output (MAE) was the minimum.
CURRENT CODE THAT WORKS
import pandas as pd

# get_mae, train_X, val_X, train_y and val_y are defined earlier (not shown)
candidate_max_leaf_nodes = [5, 25, 50, 100, 250, 500]
# list placeholder for loop calculation
leaf_list = []
# Write loop to find the ideal tree size from candidate_max_leaf_nodes
for max_leaf_nodes in candidate_max_leaf_nodes:
    # each iteration builds a 2-item list [leaf, MAE] and appends it to leaf_list
    leaf_list.append([max_leaf_nodes, get_mae(max_leaf_nodes, train_X, val_X, train_y, val_y)])
# convert the list of lists into a DataFrame
scores = pd.DataFrame(leaf_list, columns=['Leaf', 'MAE'])
# Store the best value of max_leaf_nodes (it will be either 5, 25, 50, 100, 250 or 500)
# idxmin() finds the row index of the minimum MAE
# .loc uses that index to return the corresponding Leaf value
best_tree_size = scores.loc[scores.MAE.idxmin(), 'Leaf']
# clear list placeholder (if needed)
leaf_list.clear()
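If the list is only needed to pick the best size, a running minimum can also be kept inside the loop, so neither a DataFrame nor a lookup afterwards is required. This is a minimal sketch, assuming a stand-in for get_mae: the hypothetical fake_get_mae below just returns the MAE values listed under PRODUCED leaf_list instead of fitting a model.

```python
# Hypothetical stand-in for get_mae(): returns the MAE values from the
# produced leaf_list instead of training and scoring a model.
FAKE_MAE = {
    5: 35044.51299744237,
    25: 29016.41319191076,
    50: 27405.930473214907,
    100: 27282.50803885739,
    250: 27893.822225701646,
    500: 29454.18598068598,
}


def fake_get_mae(max_leaf_nodes):
    return FAKE_MAE[max_leaf_nodes]


candidate_max_leaf_nodes = [5, 25, 50, 100, 250, 500]

best_tree_size = None
best_mae = float("inf")
for max_leaf_nodes in candidate_max_leaf_nodes:
    mae = fake_get_mae(max_leaf_nodes)
    # keep the input whose output is the smallest seen so far;
    # strict < keeps the first one on ties, like idxmin()
    if mae < best_mae:
        best_tree_size, best_mae = max_leaf_nodes, mae

print(best_tree_size)  # 100
```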
PRODUCED leaf_list
[[5, 35044.51299744237],
[25, 29016.41319191076],
[50, 27405.930473214907],
[100, 27282.50803885739],
[250, 27893.822225701646],
[500, 29454.18598068598]]
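Without pandas, the two steps of the working code map directly onto plain Python: idxmin() becomes min() over the row positions, keyed by each row's MAE, and .loc[] becomes ordinary list indexing. A minimal sketch using the leaf_list above (best_index is an illustrative name):

```python
leaf_list = [
    [5, 35044.51299744237],
    [25, 29016.41319191076],
    [50, 27405.930473214907],
    [100, 27282.50803885739],
    [250, 27893.822225701646],
    [500, 29454.18598068598],
]

# Equivalent of scores.MAE.idxmin(): position of the smallest MAE
# (min() returns the first position on ties, as idxmin() does).
best_index = min(range(len(leaf_list)), key=lambda i: leaf_list[i][1])

# Equivalent of .loc[..., 'Leaf']: read the input stored at that position.
best_tree_size = leaf_list[best_index][0]

print(best_index, best_tree_size)  # 3 100
```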
CONVERTED scores DATAFRAME (table not preserved; it holds the rows of leaf_list above in columns Leaf and MAE)
ANSWER
So you have a list of [leaf, MAE] pairs and you want the entry with the minimum MAE? You can do it like this:
from operator import itemgetter

scores = [
    [5, 35044.51299744237],
    [25, 29016.41319191076],
    [50, 27405.930473214907],
    [100, 27282.50803885739],
    [250, 27893.822225701646],
    [500, 29454.18598068598]
]
best_leaf = min(scores, key=itemgetter(1))
# best_leaf will be equal to [100, 27282.50803885739]
best_tree_size = best_leaf[0]  # 100
The key here is itemgetter(1), which returns a callable that, when passed a tuple or a list, returns the element at index 1 (here, the MAE). We pass that callable as the key argument to min(), so that the entries are compared by their MAE values.
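A short sketch of what itemgetter(1) does on its own, the equivalent lambda, and a one-line unpacking of the winning row into its input and output (mae_of and best_mae are illustrative names):

```python
from operator import itemgetter

scores = [
    [5, 35044.51299744237],
    [25, 29016.41319191076],
    [50, 27405.930473214907],
    [100, 27282.50803885739],
    [250, 27893.822225701646],
    [500, 29454.18598068598],
]

# itemgetter(1) builds a callable: mae_of(row) returns row[1]
mae_of = itemgetter(1)
assert mae_of([100, 27282.50803885739]) == 27282.50803885739

# min() calls mae_of on every row and compares the returned MAE values
best_leaf = min(scores, key=mae_of)

# the same selection written with a lambda instead of itemgetter
assert min(scores, key=lambda row: row[1]) == best_leaf

# unpack the winning row into its input and its output
best_tree_size, best_mae = best_leaf
print(best_tree_size)  # 100
```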
ANSWER
NumPy style:
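import numpy as np

leaf_list = [
    [5, 35044.51299744237],
    [25, 29016.41319191076],
    [50, 27405.930473214907],
    [100, 27282.50803885739],
    [250, 27893.822225701646],
    [500, 29454.18598068598]
]
# to numpy (2-D array, one row per [leaf, MAE] pair)
leaf_list = np.array(leaf_list)
# reduce dimension
flatten = leaf_list.flatten()
# every second item is an output (MAE); search only those, so an input that
# happens to equal the minimum output cannot match. The position of each
# minimum among the outputs is also its row index in leaf_list.
outputs = flatten[1::2]
index = np.where(outputs == outputs.min())[0]
# output list
out_list = leaf_list[index]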
Output:
array([[ 100. , 27282.50803886]])
It also returns every row when several outputs tie for the minimum:
leaf_list = [[14, 6],
[25, 55],
[5, 6]]
#... same code
Output:
array([[14, 6],
[ 5, 6]])
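For readers who want to stay in plain Python, as the question asks, the same tie-aware selection takes two passes over the list: one to find the smallest output, one to keep every row that has it. A minimal sketch using the tie example above (min_mae and best_sizes are illustrative names). Exact equality is safe here because min_mae is taken from the list itself.

```python
leaf_list = [[14, 6],
             [25, 55],
             [5, 6]]

# smallest output across all rows
min_mae = min(mae for _, mae in leaf_list)

# every row whose output equals that minimum (ties included)
out_list = [row for row in leaf_list if row[1] == min_mae]
print(out_list)  # [[14, 6], [5, 6]]

# just the corresponding inputs
best_sizes = [leaf for leaf, _ in out_list]
print(best_sizes)  # [14, 5]
```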