How Do I Return a Different Value When Iterating Over a List of Lists

Time:05-12

ISSUE

I have a for loop that creates a list of lists, where each entry consists of an input and its associated output. I can't figure out how to iterate over the outputs after the list is created and return the corresponding input. I was able to solve my problem by converting the list into a dataframe and using .loc[], but I'm stubborn and want to produce the same result without the conversion to a dataframe. I also do not want to convert the list into a dictionary; I have already solved that case as well.

I have included the list that is produced as well as the converted dataframe that works. In this case best_tree_size should be 100, as its output (the MAE) was the minimum result.

CURRENT CODE THAT WORKS

    candidate_max_leaf_nodes = [5, 25, 50, 100, 250, 500]

    #list placeholder for loop calculation
    leaf_list = []

    #Write loop to find the ideal tree size from candidate_max_leaf_nodes
    for max_leaf_nodes in candidate_max_leaf_nodes:
        #each iteration produces a 2-item list [leaf, MAE], which is appended to leaf_list
        leaf_list.append([max_leaf_nodes, get_mae(max_leaf_nodes, train_X, val_X, train_y, val_y)])

    #convert the list of lists into a dataframe
    scores = pd.DataFrame(leaf_list, columns =['Leaf', 'MAE'])

    #Store the best value of max_leaf_nodes (it will be either 5, 25, 50, 100, 250 or 500)
    #idxmin() is finding the min value of MAE and returning the dataframe index
    #.loc is utilizing the index from idxmin() and returning the corresponding value from Leaf that caused it
    best_tree_size = scores.loc[scores.MAE.idxmin(), 'Leaf']

    #clear list placeholder (if needed)
    leaf_list.clear()

PRODUCED leaf_list

[[5, 35044.51299744237],
 [25, 29016.41319191076],
 [50, 27405.930473214907],
 [100, 27282.50803885739],
 [250, 27893.822225701646],
 [500, 29454.18598068598]]

CONVERTED scores DATAFRAME

scores

CodePudding user response:

So you have a list of [leaf, MAE] and you want to get the item from that list with the minimum MAE? You can do it like this:

scores = [
[5, 35044.51299744237],
[25, 29016.41319191076],
[50, 27405.930473214907],
[100, 27282.50803885739],
[250, 27893.822225701646],
[500, 29454.18598068598]
]

from operator import itemgetter
best_leaf = min(scores, key=itemgetter(1))

# best_leaf will be equal to [100, 27282.50803885739]
# best_leaf[0] is the leaf count (100), best_leaf[1] is its MAE

The key here is itemgetter(1), which returns a callable that, when passed a tuple or a list, returns the element at index 1 (here, the MAE). We pass it as the key argument to min(), so that the entries are compared by their MAE value rather than by their first element.
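Since the question asks only for the leaf count (100) and not the whole pair, the result of min() can be unpacked directly. A minimal sketch, assuming the [leaf, MAE] layout from the question; the name best_mae is only illustrative:

```python
from operator import itemgetter

# Same data as the leaf_list shown in the question.
leaf_list = [
    [5, 35044.51299744237],
    [25, 29016.41319191076],
    [50, 27405.930473214907],
    [100, 27282.50803885739],
    [250, 27893.822225701646],
    [500, 29454.18598068598],
]

# min() returns the whole [leaf, MAE] pair with the smallest MAE;
# unpacking splits that pair into its input and its output.
best_tree_size, best_mae = min(leaf_list, key=itemgetter(1))

print(best_tree_size)  # 100
```

If several entries share the same minimum MAE, min() returns the first one it encounters.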

CodePudding user response:

Numpy style:

leaf_list = [
[5, 35044.51299744237],
[25, 29016.41319191076],
[50, 27405.930473214907],
[100, 27282.50803885739],
[250, 27893.822225701646],
[500, 29454.18598068598]
]
import numpy as np

# to numpy
leaf_list = np.array(leaf_list)
# take the second column (the outputs) only, so that an input value
# that happens to equal the minimum output is never matched
outputs = leaf_list[:, 1]
# row indices of every entry whose output equals the minimum
index = np.where(outputs == outputs.min())[0]
# output list
out_list = leaf_list[index]

Output:

array([[  100.        , 27282.50803886]])

This also works when several entries share the same minimum output:

leaf_list = [[14,  6],  
             [25,  55],   
             [5,   6]]

#... same code

Output:

array([[14,  6],
       [ 5,  6]])
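The same tie handling can be sketched without NumPy, in case the goal is to stay with plain lists. This is an illustrative alternative, not part of the answer above; the names lowest and best_inputs are assumptions:

```python
# Same tie example as above: two entries share the minimum output 6.
leaf_list = [[14, 6],
             [25, 55],
             [5, 6]]

# First pass: find the smallest output (the second item of each pair).
lowest = min(output for _, output in leaf_list)

# Second pass: collect every input whose output equals that minimum.
best_inputs = [inp for inp, output in leaf_list if output == lowest]

print(best_inputs)  # [14, 5]
```

This makes two passes over the list, which is fine for a handful of candidates like the ones in the question.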