Iterate over a text file and store the lowest value in a dictionary

I have a very large text file (Summary_post_docking.txt) and I want to filter it to find the lowest scores. This is what I came up with:

class Ranker:
    def __init__(self):
        self.results = {}
        with open('HTS_post_docking/Summary_post_docking.txt', 'r') as summary:
            for line in summary:
                score = float(line.split()[2])
                frag_name = str(line.split()[0].split('/')[9]).split('_')[0]
                if 0 >= score >= -200:
                    self.results[frag_name] = score
                    old = self.results[frag_name]
                if frag_name in self.results.keys():
                    new = float(line.split()[2])
                    if new < old:
                        self.results[frag_name] = new

        print(self.results)

Unfortunately, all this does is keep the last value it reads; it doesn't overwrite the stored value only when the new one is lower.

[str(line.split()[0].split('/')[9]).split('_')[0]] is the name of the molecule, while float(line.split()[2]) is the score associated with it.

I want the script to store the name of the molecule as the key and the score as the value. For every line, every time it finds a lower score for the same key, I want it to update the value, so that each key ends up with the smallest score found.

EDIT:

I'm including a few lines from the txt file:

/scratch/ludovico3/spike/stalk/vs_docking_smiles/HTS_postdock/1_600/HTS_post_docking/Z385446130_pose1       SCORE_sum: -70.13763978228677   avg_score: -0.7 SD_score: 0.44  avg_GBSA: -5.92 SD_GBSA: 2.96   avg_RMSD: 9.75  SD_RMSD: 3.49
/scratch/ludovico3/spike/stalk/vs_docking_smiles/HTS_postdock/1_600/HTS_post_docking/Z385446130_pose2       SCORE_sum: -18.39638945104759   avg_score: -0.18    SD_score: 0.26  avg_GBSA: -5.2  SD_GBSA: 4.57   avg_RMSD: 34.57 SD_RMSD: 9.29
/scratch/ludovico3/spike/stalk/vs_docking_smiles/HTS_postdock/1_600/HTS_post_docking/Z385446130_pose3       SCORE_sum: -206.23402454507794  avg_score: -2.06    SD_score: 1.15  avg_GBSA: -6.8  SD_GBSA: 1.66   avg_RMSD: 4.05  SD_RMSD: 1.73
/scratch/ludovico3/spike/stalk/vs_docking_smiles/HTS_postdock/1_600/HTS_post_docking/Z385446130_pose4       SCORE_sum: -27.56483931516906   avg_score: -0.28    SD_score: 0.64  avg_GBSA: -2.2  SD_GBSA: 3.13   avg_RMSD: 15.43 SD_RMSD: 6.74

I have updated the code as suggested! The script needs to update the value associated with the key to the lowest score it finds.

CodePudding user response：

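Your old variable may be unset when it is read, and it is not tracked per molecule: it is only assigned inside the range check, so it always carries the score that was stored last, whichever molecule that belonged to. Compare against the score already stored in the dictionary for the current molecule instead: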

class Ranker:
    def __init__(self):
        self.results = {}
        with open('HTS_post_docking/Summary_post_docking.txt', 'r') as summary:
            for line in summary:
                molecule_score = float(line.split()[2])
                molecule_name = str(line.split()[0].split('/')[9]).split('_')[0]
                # Store the first score seen for a molecule, then replace it only with a lower one
                if molecule_name not in self.results:
                    self.results[molecule_name] = molecule_score
                elif self.results[molecule_name] > molecule_score:
                    self.results[molecule_name] = molecule_score
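
For checking the logic without the full file, here is a minimal sketch of the same per-molecule comparison as a standalone function, run on the four sample lines from the question (trailing columns trimmed). The function name lowest_scores is made up for illustration, and taking the last path component instead of index 9 is my assumption, so that the parsing does not depend on how deep the directory is. It applies no score range filter.

```python
def lowest_scores(lines):
    """Return {molecule_name: lowest SCORE_sum} for the given summary lines."""
    results = {}
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        # Last path component, e.g. "Z385446130_pose3" -> "Z385446130"
        name = fields[0].rsplit('/', 1)[-1].split('_')[0]
        score = float(fields[2])
        # Compare against this molecule's own stored score;
        # a molecule not seen yet compares against +infinity
        if score < results.get(name, float('inf')):
            results[name] = score
    return results


# Sample lines from the question, with the trailing columns trimmed
prefix = "/scratch/ludovico3/spike/stalk/vs_docking_smiles/HTS_postdock/1_600/HTS_post_docking/"
sample = [
    prefix + "Z385446130_pose1  SCORE_sum: -70.13763978228677  avg_score: -0.7",
    prefix + "Z385446130_pose2  SCORE_sum: -18.39638945104759  avg_score: -0.18",
    prefix + "Z385446130_pose3  SCORE_sum: -206.23402454507794  avg_score: -2.06",
    prefix + "Z385446130_pose4  SCORE_sum: -27.56483931516906  avg_score: -0.28",
]

print(lowest_scores(sample))
```

Of the four poses, pose3 has the lowest score, so that is the one kept for Z385446130.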

CodePudding user response：

Solved!

class Ranker:
    def __init__(self):
        self.results = {}
        with open('HTS_post_docking/Summary_post_docking.txt', 'r') as summary:
            for line in summary:
                self.set_score(line)

        self.sorted = dict(sorted(self.results.items(), key=lambda item: item[1]))
        print(self.sorted)

    def set_score(self, line):
        new_score = float(line.split()[2])
        frag_name = str(line.split()[0].split('/')[9]).split('_')[0]

        if not (0 >= new_score >= -250):
            return

        if frag_name in self.results.keys():
            old_score = self.results[frag_name]
            if new_score > old_score:
                return

        self.results[frag_name] = new_score
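
One thing worth noticing is how much the range bounds affect the result. The sketch below is only an illustration, not the code above: the function name rank and the parameters low and high are hypothetical, and it reads lines from a list instead of the file. It applies the same filter, keeps the minimum per molecule, and sorts ascending, using the sample lines from the question with the trailing columns trimmed.

```python
def rank(lines, low=-250.0, high=0.0):
    """Keep the lowest in-range score per molecule, sorted from lowest to highest."""
    best = {}
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        # Assumption: the molecule name is the last path component up to the first "_"
        name = fields[0].rsplit('/', 1)[-1].split('_')[0]
        score = float(fields[2])
        if not (low <= score <= high):
            continue  # out-of-range scores are ignored entirely
        # min() against the stored score; a new molecule compares against itself
        best[name] = min(score, best.get(name, score))
    return dict(sorted(best.items(), key=lambda item: item[1]))


# Sample lines from the question, with the trailing columns trimmed
prefix = "/scratch/ludovico3/spike/stalk/vs_docking_smiles/HTS_postdock/1_600/HTS_post_docking/"
sample = [
    prefix + "Z385446130_pose1  SCORE_sum: -70.13763978228677  avg_score: -0.7",
    prefix + "Z385446130_pose2  SCORE_sum: -18.39638945104759  avg_score: -0.18",
    prefix + "Z385446130_pose3  SCORE_sum: -206.23402454507794  avg_score: -2.06",
    prefix + "Z385446130_pose4  SCORE_sum: -27.56483931516906  avg_score: -0.28",
]

print(rank(sample))              # lower bound -250, as in the solution above
print(rank(sample, low=-200.0))  # lower bound -200, as in the original question
```

With the -250 bound, pose3 (about -206.23) is inside the range and wins; with the -200 bound from the question it is filtered out, and pose1 (about -70.14) becomes the lowest remaining score.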