I have two functions that read a CSV file and run the following checks:
- number of rows in that csv
- number of rows that have a null value in the 'ID' column
I am trying to create a dataframe that looks like this:
Checks | Summary | Findings |
---|---|---|
Check #1 | Number of records on file | function #1 results (Number of records on file: 10) |
Check #2 | Number of records missing an ID | function #2 results (Number of records missing an ID: 2) |
function 1 looks like this:
def function1():
    with open('data.csv') as file:
        record_number = len(list(file))
    print("Number of records on file:", record_number)
function1()
and outputs "Number of records on file: 10"
function 2 looks like this:
import pandas as pd
def function2():
    df = pd.read_csv('data.csv', low_memory=False)
    missing_id = df["IDs"].isna().sum()
    print("Number of records missing an ID:", missing_id)
function2()
and outputs "Number of records missing an ID: 2"
I first build a dictionary and then create the dataframe from it:
table = {
    'Checks' : ['Check #1', 'Check #2'],
    'Summary' : ['Number of records on file', 'Number of records missing an ID'],
    'Findings' : [function1, function2]
}
df = pd.DataFrame(table)
df
However, this is what the dataframe looks like:
Checks | Summary | Findings |
---|---|---|
Check #1 | Number of records on file | <function function1 at 0x7efd2d76a730> |
Check #2 | Number of records missing an ID | <function function2 at 0x7efd25cd0b70> |
Is there any way to make it so that my Findings column outputs the actual results as seen above?
CodePudding user response:
The reason is that you're putting the function objects themselves into the list, not the results of calling them:
function1 != function1()
So for your case you need:
table = {
    'Checks' : ['Check #1', 'Check #2'],
    'Summary' : ['Number of records on file', 'Number of records missing an ID'],
    'Findings' : [function1(), function2()]
}
df = pd.DataFrame(table)
df
Edit: Oh damn, I also missed what the other user commented. You definitely need to return a value from your functions as well, because a function that only prints returns None :)
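To make the difference concrete, here is a minimal sketch with two toy functions. The names `prints_count` and `returns_count` and the hard-coded value 10 are made up for illustration; they are not the functions from the question.

```python
def prints_count():
    # Only prints; Python implicitly returns None here
    print("Number of records on file:", 10)


def returns_count():
    # Hands the value back to the caller
    return 10


reference = returns_count   # no parentheses: this is the function object itself
printed = prints_count()    # called, but the result is None
returned = returns_count()  # called, and the result is the value

print(callable(reference))  # True
print(printed is None)      # True
print(returned)             # 10
```

So putting `reference` in the list gives you the `<function ...>` text, putting `printed` there gives you None, and only `returned` gives you the number.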
CodePudding user response:
You need to change your functions so they return values instead of printing them, that is:
def function1():
    with open('data.csv') as file:
        record_number = len(list(file))
    return record_number
and
def function2():
    df = pd.read_csv('data.csv', low_memory=False)
    return df["IDs"].isna().sum()
and call these functions like so
table = {
    'Checks' : ['Check #1', 'Check #2'],
    'Summary' : ['Number of records on file', 'Number of records missing an ID'],
    'Findings' : [function1(), function2()]
}
df = pd.DataFrame(table)
df
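If it helps, here is a self-contained sketch of the same approach that runs without the original file. The sample CSV contents, the temporary file, and the `path` parameter are assumptions added for illustration; the question's functions read 'data.csv' directly. Note that `len(list(file))` counts every line, so the header row is included in the count.

```python
import os
import tempfile

import pandas as pd


def function1(path):
    # Count every line in the file (this includes the header row)
    with open(path) as file:
        return len(list(file))


def function2(path):
    # Count rows whose "IDs" value is missing
    df = pd.read_csv(path, low_memory=False)
    return int(df["IDs"].isna().sum())


# Hypothetical sample data: one header line and three records, one without an ID
sample = "IDs,name\n1,a\n,b\n3,c\n"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data.csv")
    with open(path, "w") as f:
        f.write(sample)
    # Call the functions (with parentheses) so the list holds their results
    findings = [function1(path), function2(path)]

table = {
    'Checks': ['Check #1', 'Check #2'],
    'Summary': ['Number of records on file', 'Number of records missing an ID'],
    'Findings': findings,
}
df = pd.DataFrame(table)
print(df)
```

With this sample the Findings column holds 4 and 1. The 4 is the header line plus three records, so subtract one (or use `len(df)` on the parsed dataframe) if you only want data rows.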
CodePudding user response:
For the expected output, add a return with f-strings to both functions, and call the functions with parentheses when building the DataFrame:
import pandas as pd
def function1():
    with open('data.csv') as file:
        record_number = len(list(file))
    return f"function #1 results (Number of records on file: {record_number})"
def function2():
    df = pd.read_csv('data.csv', low_memory=False)
    missing_id = df["IDs"].isna().sum()
    return f"function #2 results (Number of records missing an ID: {missing_id})"
table = {
    'Checks' : ['Check #1', 'Check #2'],
    'Summary' : ['Number of records on file', 'Number of records missing an ID'],
    'Findings' : [function1(), function2()]
}
df = pd.DataFrame(table)
Solution with one function:
def function():
    with open('data.csv') as file:
        record_number = len(list(file))
    # Read the file with pandas as well, so that df is defined inside the function
    df = pd.read_csv('data.csv', low_memory=False)
    missing_id = df["IDs"].isna().sum()
    return [f"function #1 results (Number of records on file: {record_number})",
            f"function #2 results (Number of records missing an ID: {missing_id})"]
table = {
    'Checks' : ['Check #1', 'Check #2'],
    'Summary' : ['Number of records on file', 'Number of records missing an ID'],
    'Findings' : function()
}
df = pd.DataFrame(table)