Python, pandas, data analysis

Time:02-05

def section_articles():

    Biology = (df2["Section"]=="Biology").sum()
    Chemistry = (df2["Section"]=="Chemistry").sum()
    Computer_Science = (df2["Section"]=="Computer Science").sum()
    Earth_Environment = (df2["Section"]=="Earth & Environment").sum()
    Mathematics = (df2["Section"]=="Mathematics").sum()
    Physics = (df2["Section"]=="Physics").sum()
    Statistics = (df2["Section"]=="Statistics").sum()

    return() 
print ("Biology",Biology)
print ("Chemistry",Chemistry)
print ("Computer_Science",Computer_Science)
print ("Earth_Environment",Earth_Environment)
print ("Mathematics",Mathematics)
print ("Physics",Physics)
print ("Statistics",Statistics)

section_articles()

I am expecting the number of articles in each section, but I am getting the error "Biology is not defined". Can someone help me, please?

CodePudding user response:

The issue is that the variables Biology, Chemistry, etc. are local variables defined inside the section_articles function, so they are not accessible outside of it. The function also returns an empty tuple, so nothing comes back to the caller. To use the values, return them from the function and assign the function's output to a variable:

def section_articles():
    Biology = (df2["Section"]=="Biology").sum()
    Chemistry = (df2["Section"]=="Chemistry").sum()
    Computer_Science = (df2["Section"]=="Computer Science").sum()
    Earth_Environment = (df2["Section"]=="Earth & Environment").sum()
    Mathematics = (df2["Section"]=="Mathematics").sum()
    Physics = (df2["Section"]=="Physics").sum()
    Statistics = (df2["Section"]=="Statistics").sum()

    return (Biology, Chemistry, Computer_Science, Earth_Environment, Mathematics, Physics, Statistics)

section_counts = section_articles()

print ("Biology",section_counts[0])
print ("Chemistry",section_counts[1])
print ("Computer_Science",section_counts[2])
print ("Earth_Environment",section_counts[3])
print ("Mathematics",section_counts[4])
print ("Physics",section_counts[5])
print ("Statistics",section_counts[6])
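To make the scoping rule concrete, here is a minimal sketch. The small df2 below is made-up stand-in data (the asker's real DataFrame is not shown), and the function names count_biology_local and count_biology are hypothetical. It reproduces the NameError and then shows the returned value being used instead:

```python
import pandas as pd

# Made-up stand-in for the asker's df2; the real data is not shown in the thread.
df2 = pd.DataFrame({"Section": ["Biology", "Physics", "Biology"]})

def count_biology_local():
    # Biology is a local name: it exists only while this function runs.
    Biology = (df2["Section"] == "Biology").sum()
    return ()  # an empty tuple, as in the original code

result = count_biology_local()

try:
    print("Biology", Biology)  # Biology was never defined at module level
    error_message = None
except NameError as exc:
    error_message = str(exc)
    print("NameError:", error_message)

def count_biology():
    # Returning the value hands it back to the caller.
    return (df2["Section"] == "Biology").sum()

biology_count = count_biology()
print("Biology", biology_count)
```

The first function computes the count and then discards it; only the second one gives the caller something to print.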

A tidier version stores each section's count in a dictionary and then loops over the dictionary to print the values:

def section_articles():
    sections = {"Biology": (df2["Section"]=="Biology").sum(),
                "Chemistry": (df2["Section"]=="Chemistry").sum(),
                "Computer Science": (df2["Section"]=="Computer Science").sum(),
                "Earth & Environment": (df2["Section"]=="Earth & Environment").sum(),
                "Mathematics": (df2["Section"]=="Mathematics").sum(),
                "Physics": (df2["Section"]=="Physics").sum(),
                "Statistics": (df2["Section"]=="Statistics").sum()}
    return sections

section_counts = section_articles()

for section, count in section_counts.items():
    print(f"{section}: {count}")
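As an alternative to comparing the column against each name by hand, pandas can count every distinct value in one pass with Series.value_counts. The sketch below is not from the answers above; it assumes a made-up df2, passes the DataFrame in as a parameter instead of relying on a global, and uses a hypothetical ALL_SECTIONS list so that sections with no articles still show up with a count of 0:

```python
import pandas as pd

# Made-up sample data; the asker's real df2 is not shown in the thread.
df2 = pd.DataFrame(
    {"Section": ["Biology", "Physics", "Biology", "Statistics", "Physics", "Biology"]}
)

# Hypothetical list of the sections we want reported, in display order.
ALL_SECTIONS = ["Biology", "Chemistry", "Computer Science",
                "Earth & Environment", "Mathematics", "Physics", "Statistics"]

def section_articles(df):
    # value_counts() tallies each distinct value in the column;
    # reindex() adds the sections that never occur, filled with 0.
    counts = df["Section"].value_counts().reindex(ALL_SECTIONS, fill_value=0)
    return counts.to_dict()

section_counts = section_articles(df2)

for section, count in section_counts.items():
    print(f"{section}: {count}")
```

Without the reindex step, value_counts only lists sections that actually occur in the column, which may or may not be what you want.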

CodePudding user response:

Your function returns an empty tuple (), and the variables it defines are local, so they do not exist outside it.

One way to fix the error and reduce visible noise is to build and return a dictionary, then loop over it:

def section_articles():
    list_of_sections = ["Biology", "Chemistry", "Computer Science",
                        "Earth & Environment", "Mathematics", "Physics", "Statistics"]
    # Count the rows of each section; the list defined above is list_of_sections.
    return {k: (df2["Section"] == k).sum() for k in list_of_sections}

for k, v in section_articles().items():
    print(k, v)

Another variant:

list_of_sections = ["Biology", "Chemistry", "Computer Science",
                    "Earth & Environment", "Mathematics", "Physics", "Statistics"]

def section_articles(section):
    # Compare the "Section" column against the section name passed in.
    return (df2["Section"] == section).sum()


for section in list_of_sections:
    print(section, section_articles(section))