To summarize, I'm trying to make a Python script that reads the symptoms of a medical task given on the command line, compares them to the symptoms listed in dataset.csv, and then says what the patient from the task is likely suffering from.
The problem is that it doesn't seem to read dataset.csv and just gives me:
The patient is likely suffering from d.
dataset.csv looks like this:
Asthma, Wheezing, coughing, chest tightness, and shortness of breath
Atelectasis, Shortness of breath, chest pain or discomfort, and a cough
Atypical pneumonia, Fever, chills, chest pain or discomfort, and shortness of breath
Basal cell carcinoma, Flat, pale, or yellowish patch of skin
Bell's palsy, Facial droop or weakness, numbness, pain around the jaw
Biliary colic, Pain in the upper abdomen that may spread to the shoulder or back
Bladder cancer, Blood in the urine, pain or burning with urination, and frequent urination
Brain abscess, Headache, fever, confusion, drowsiness, seizures, and weakness
And my script is the following:
#!/usr/bin/env python3
import argparse
import csv
# Parse the command line arguments
parser = argparse.ArgumentParser()
parser.add_argument('-t', '--task', help='The symptoms to search for in the dataset')
parser.add_argument('-d', '--dataset', help='The dataset to search in')
args = parser.parse_args()
# Get the task symptoms
task_symptoms = args.task.split(', ')
# Initialize a dictionary to store disease counts
disease_counts = {}
# Open the dataset
try:
    # Open the dataset
    with open(args.dataset, 'r') as csv_file:
        csv_reader = csv.reader('dataset.csv')
        # Iterate through each row
        for row in csv_reader:
            # Get the disease and symptoms
            disease = row[0].strip()
            symptoms = row[1:]
            # Initialize the count
            count = 0
            # Iterate through each symptom in the task
            for task_symptom in task_symptoms:
                # Iterate through each symptom in the dataset
                for symptom in symptoms:
                    # If the symptom matches a symptom in the task
                    if task_symptom == symptom:
                        # Increment the count
                        count = 1
            # Store the disease name and count in the dictionary
            disease_counts[disease] = count
    # Get the maximum count
    max_count = max(disease_counts.values())
    # Get the most probable disease from the counts
    most_probable_disease = [k for k, v in disease_counts.items() if v == max_count][0]
    print(f'The patient is likely suffering from {most_probable_disease}.')
except FileNotFoundError:
    print("Error: Could not open the file.")
What am I doing wrong?
An example of what I expect is (depending on the symptoms):
The patient is likely suffering from Asthma
It has been 3 weeks but I can't figure it out.
Thank you for helping me
Answer:
I believe the problem is the format of the csv file.
Asthma, Wheezing, coughing, chest tightness, and shortness of breath
Because there is a space after each comma, this line in the csv file will produce these fields:
row[0] = "Asthma"
row[1] = " Wheezing"
row[2] = " coughing"
row[3] = " chest tightness"
row[4] = " and shortness of breath"
See how all of the fields after the first begin with a space? The string " coughing" does not match the string "coughing".
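You can confirm this quickly with the minimal sketch below. It feeds the sample line to csv.reader() through io.StringIO, which stands in for the real file here (an assumption made only for the demo), and then prints the parsed row.

```python
import csv
import io

# One line from the question's dataset.csv, wrapped in StringIO so it
# behaves like an open file (stand-in for the real dataset, demo only).
line = "Asthma, Wheezing, coughing, chest tightness, and shortness of breath\n"
row = next(csv.reader(io.StringIO(line)))

print(row)
# ['Asthma', ' Wheezing', ' coughing', ' chest tightness', ' and shortness of breath']

# The leading space is why an exact comparison fails:
print(" coughing" == "coughing")  # False
print("coughing" in row)          # False
print(" coughing" in row)         # True
```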
Answer:
By default, when you read a CSV file with csv.reader(), each value is split on the comma alone. Your CSV contains additional spaces, and they are included in the values. You could test this with a CSV file that has no spaces after the commas, such as:
Asthma,Wheezing,coughing,chest tightness,and shortness of breath
You can, however, pass the skipinitialspace=True parameter to csv.reader(). It tells the reader to ignore spaces immediately following each comma, so no symptom starts with a space character.
For example:
csv_reader = csv.reader(csv_file, skipinitialspace=True)
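There is a second problem in the question's script. csv.reader() expects an iterable of lines, such as the open csv_file object, and not a file name. When it receives the string 'dataset.csv', it iterates over the characters of that string, so every row is a single character and the symptom list is always empty. Every count is then 0, and the first key, 'd', wins. That matches the reported output "The patient is likely suffering from d."

The sketch below shows both behaviours. It uses io.StringIO as a stand-in for the opened file, which is an assumption made only so the example runs without dataset.csv on disk.

```python
import csv
import io

# Passing a file *name* to csv.reader() does not open the file: a str is an
# iterable of characters, so each character is treated as a separate line.
rows_from_name = list(csv.reader('dataset.csv'))
print(rows_from_name[:3])  # [['d'], ['a'], ['t']]

# Passing a file-like object (StringIO standing in for the open csv_file)
# yields real rows; skipinitialspace=True drops the space after each comma.
sample = io.StringIO(
    "Asthma, Wheezing, coughing, chest tightness, and shortness of breath\n"
    "Brain abscess, Headache, fever, confusion, drowsiness, seizures, and weakness\n"
)
rows_from_file = list(csv.reader(sample, skipinitialspace=True))
print(rows_from_file[0])
# ['Asthma', 'Wheezing', 'coughing', 'chest tightness', 'and shortness of breath']
```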
Alternatively, you could remove the additional spaces by calling .strip() on each symptom:
if task_symptom == symptom.strip():
You probably also want your comparison to be case-insensitive, which you can get by converting both strings to lowercase:
if task_symptom.lower() == symptom.strip().lower():
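The sketch below puts the pieces together: it passes the file object, uses skipinitialspace, and compares stripped, lowercased strings. The function names normalize() and most_probable_disease() are hypothetical. Dropping a leading "and " or "or " is an extra assumption that the answers above do not include. It is there because the last symptom in each row ("and shortness of breath") would otherwise never match a task symptom like "shortness of breath". Set intersection replaces the nested loops. It also sidesteps the question's `count = 1`, which sets the count instead of incrementing it.

```python
import csv
import io


def normalize(symptom):
    """Hypothetical helper: trim, lowercase, and drop a leading 'and '/'or '
    so that 'and shortness of breath' compares equal to 'shortness of breath'."""
    text = symptom.strip().lower()
    for prefix in ("and ", "or "):
        if text.startswith(prefix):
            text = text[len(prefix):]
    return text


def most_probable_disease(csv_file, task):
    """Return the disease whose symptoms overlap most with the task, or None."""
    task_symptoms = {normalize(s) for s in task.split(",") if s.strip()}
    disease_counts = {}
    # Pass the open file object, not its name; skip the space after commas.
    for row in csv.reader(csv_file, skipinitialspace=True):
        if not row:
            continue  # ignore blank lines
        disease = row[0].strip()
        symptoms = {normalize(s) for s in row[1:]}
        # Number of task symptoms that appear in this disease's symptom list.
        disease_counts[disease] = len(task_symptoms & symptoms)
    if not disease_counts or max(disease_counts.values()) == 0:
        return None
    # On a tie, max() keeps the first disease encountered, like the original [0].
    return max(disease_counts, key=disease_counts.get)


sample = io.StringIO(
    "Asthma, Wheezing, coughing, chest tightness, and shortness of breath\n"
    "Atelectasis, Shortness of breath, chest pain or discomfort, and a cough\n"
    "Brain abscess, Headache, fever, confusion, drowsiness, seizures, and weakness\n"
)
print(most_probable_disease(sample, "Wheezing, Coughing, shortness of breath"))
# Asthma
```

In the question's script, this could be called as most_probable_disease(csv_file, args.task) inside the existing `with open(args.dataset, 'r') as csv_file:` block. A None result means that no symptom matched.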