Trouble matching strings using Python and CSV

In short, I'm trying to write a Python script that reads symptoms from a medical task given on the command line, compares them to the symptoms in dataset.csv, and reports what the patient from the task is likely suffering from.

The problem is that it doesn't seem to read dataset.csv and just gives me:

The patient is likely suffering from d.

The dataset.csv looks like this:

Asthma, Wheezing, coughing, chest tightness, and shortness of breath
Atelectasis, Shortness of breath, chest pain or discomfort, and a cough
Atypical pneumonia, Fever, chills, chest pain or discomfort, and shortness of breath
Basal cell carcinoma, Flat, pale, or yellowish patch of skin
Bell's palsy, Facial droop or weakness, numbness, pain around the jaw
Biliary colic, Pain in the upper abdomen that may spread to the shoulder or back
Bladder cancer, Blood in the urine, pain or burning with urination, and frequent urination
Brain abscess, Headache, fever, confusion, drowsiness, seizures, and weakness

and my script is the following:

#!/usr/bin/env python3

import argparse
import csv

# Parse the command line arguments
parser = argparse.ArgumentParser()
parser.add_argument('-t', '--task', help='The symptoms to search for in the dataset')
parser.add_argument('-d', '--dataset', help='The dataset to search in')
args = parser.parse_args()

# Get the task symptoms
task_symptoms = args.task.split(', ')

# Initialize a dictionary to store disease counts
disease_counts = {}

# Open the dataset
try:
    # Open the dataset
    with open(args.dataset, 'r') as csv_file:
        csv_reader = csv.reader('dataset.csv')

# Iterate through each row
    for row in csv_reader:
        
        # Get the disease and symptoms
        disease = row[0].strip()
        symptoms = row[1:]
        
        # Initialize the count
        count = 0
        
        # Iterate through each symptom in the task
        for task_symptom in task_symptoms:
            
            # Iterate through each symptom in the dataset
            for symptom in symptoms:

                # If the symptom matches a symptom in the task
                if task_symptom == symptom:
                    
                    # Increment the count
                    count += 1

        # Store the disease name and count in the dictionary
        disease_counts[disease] = count
# Get the maximum count
    max_count = max(disease_counts.values())

    # Get the most probable disease from the counts
    most_probable_disease = [k for k, v in disease_counts.items() if v == max_count][0]

    print(f'The patient is likely suffering from {most_probable_disease}.')

except FileNotFoundError:
    print("Error: Could not open the file.")

What am I doing wrong?

An example of what I expect is (depending on the symptom):

The patient is likely suffering from Asthma

It has been 3 weeks but I can't figure it out.

Thank you for helping me

CodePudding user response：

I believe the problem is the format of the csv file.

Asthma, Wheezing, coughing, chest tightness, and shortness of breath

Because there is a space after each comma, this line in the csv file will produce these fields:

row[0] = "Asthma"
row[1] = " Wheezing"
row[2] = " coughing"
row[3] = " chest tightness"
row[4] = " and shortness of breath"

See how all of the fields after the first begin with a space? The string " coughing" does not match the string "coughing".
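A quick way to see this for yourself is the small sketch below. It feeds the sample line to csv.reader from an in-memory string instead of dataset.csv, and the variable names are only for illustration:

```python
import csv
import io

# One line from the question's dataset, held in memory for the demo.
line = "Asthma, Wheezing, coughing, chest tightness, and shortness of breath\n"

# csv.reader accepts any iterable of lines; StringIO stands in for the file.
row = next(csv.reader(io.StringIO(line)))

print(row)
print(row[2] == "coughing")          # False: the field is " coughing"
print(row[2].strip() == "coughing")  # True once the space is removed
```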

CodePudding user response：

By default, csv.reader() splits each value on the comma alone. Your CSV contains additional spaces, which will be included in the values. You could test this by using a CSV file without them, such as:

Asthma,Wheezing,coughing,chest tightness,and shortness of breath

You can, however, pass the skipinitialspace=True parameter to csv.reader(). This ensures that no symptom starts with a space character.

For example:

csv_reader = csv.reader(csv_file, skipinitialspace=True)
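To see the effect, here is a small sketch that parses the sample line from an in-memory string. io.StringIO is an assumption standing in for the opened dataset.csv; csv.reader needs the open file object or another iterable of lines, not the filename itself:

```python
import csv
import io

# One line from the question's dataset, held in memory for the demo.
line = "Asthma, Wheezing, coughing, chest tightness, and shortness of breath\n"

# skipinitialspace=True drops the spaces that directly follow each comma.
row = next(csv.reader(io.StringIO(line), skipinitialspace=True))

print(row)
# ['Asthma', 'Wheezing', 'coughing', 'chest tightness', 'and shortness of breath']
```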

Alternatively you could ensure there are no additional spaces by using .strip() for each symptom:

if task_symptom == symptom.strip():

You probably also want to ensure that your comparison is case insensitive by converting both arguments to lowercase:

if task_symptom.lower() == symptom.strip().lower():
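
Putting the suggestions together, here is a minimal sketch of the matching logic, not a definitive rewrite of the script. It also addresses what looks like the cause of the "suffering from d." output: csv.reader expects an iterable of lines such as the open file, and when it is given the string 'dataset.csv' it iterates over the characters of that string, so the first "disease" it sees is d. The names normalize and most_likely_disease are hypothetical helpers introduced for illustration.

```python
import csv

# Pitfall: a filename string is iterated character by character.
rows_from_string = list(csv.reader('dataset.csv'))
print(rows_from_string[:3])  # [['d'], ['a'], ['t']]


def normalize(text):
    # Trim surrounding whitespace and ignore case before comparing.
    return text.strip().lower()


def most_likely_disease(task, dataset_path):
    """Return the disease whose listed symptoms match the task most often.

    Hypothetical helper for illustration; returns None if nothing matches.
    """
    task_symptoms = {normalize(s) for s in task.split(',') if s.strip()}
    best_disease, best_count = None, 0

    with open(dataset_path, newline='') as csv_file:
        # Pass the open file object, not the filename string.
        csv_reader = csv.reader(csv_file, skipinitialspace=True)
        for row in csv_reader:
            if not row:
                continue  # skip blank lines
            disease = row[0].strip()
            symptoms = {normalize(s) for s in row[1:]}
            # Count how many task symptoms appear in this row.
            count = len(task_symptoms & symptoms)
            if count > best_count:
                best_disease, best_count = disease, count

    return best_disease
```

With the sample dataset, most_likely_disease('wheezing, coughing', 'dataset.csv') should pick Asthma. Entries such as "and shortness of breath" still carry the leading "and", so they would need extra cleanup before they can match "shortness of breath".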