I am trying to write a program that reads in a CSV with multiple columns and takes two command-line arguments. I want to use the first and second arguments as the start and stop indices to slice the second column (index 1) of the loaded CSV and find the mean of that slice, then print that mean as a float with two decimal places. This is what I have tried:
import sys
import pandas as pd
def main():
    df = pd.read_csv("*filepath*")
    x = int(sys.argv[1])
    y = int(sys.argv[2])
    result = pd.DataFrame(df.iloc[x:y:1].mean(axis=0))
    print("{:.2f}".format(result))
main()
Here is the .csv that I am reading in
Here is what I am passing in my terminal:
python3 presidents.py 1 10
But when I run it in my terminal, I keep getting this error:
TypeError: unsupported format string passed to DataFrame.__format__
as well as a FutureWarning:
FutureWarning: Dropping of nuisance columns in DataFrame reductions (with 'numeric_only=None') is deprecated; in a future version this will raise TypeError. Select only valid columns before calling the reduction.
  result = pd.DataFrame(df.iloc[x:y:1].mean(axis=0))
CodePudding user response:
You have two issues to resolve.
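First, you are getting the error because when you ask a DataFrame for the mean(), the result is a pandas.Series representing the mean of each column. Neither that Series nor the DataFrame you wrap it in supports the {:.2f} format spec, which only works on a single number. So if you want a scalar (single) value, you need to specify the column of interest.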
Second, your DataFrame likely has string values in one or more columns, which is why you are getting the warning that it is dropping "nuisance columns".
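One way to quiet that warning is to restrict the reduction to numeric columns explicitly. A minimal sketch on a made-up two-column frame (the column names c1 and c2 are placeholders, not your real CSV headers):

```python
import pandas as pd

# Made-up toy frame: one numeric column, one string column.
df = pd.DataFrame({"c1": [1, 2, 3], "c2": list("abc")})

# Option 1: tell the reduction to skip non-numeric columns.
means_flag = df.mean(numeric_only=True)

# Option 2: select the numeric columns first, then reduce.
means_selected = df.select_dtypes(include="number").mean()

print(means_flag)
print(means_selected)
```

Either way the result is still a Series with one entry per numeric column, so you would still need to pick out a single column to get a scalar.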
If you are a newer coder, I'd recommend, as an exercise, ditching pandas: read your .csv manually and calculate the mean yourself. It's not too hard to read a .csv with a context manager and keep track of the indices of interest from your arguments... :)
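A rough sketch of that exercise using only the standard library. The function name mean_of_column_slice and the sample data are made up for illustration, and it assumes the file has one header row and that the chosen column holds numbers:

```python
import csv
import io

def mean_of_column_slice(lines, column, start, stop):
    """Mean of data rows start..stop-1 (0-based, header excluded) in one column."""
    reader = csv.reader(lines)
    next(reader)  # skip the header row
    values = []
    for i, row in enumerate(reader):
        if i >= stop:
            break  # past the end of the slice, nothing more to collect
        if i >= start:
            values.append(float(row[column]))
    if not values:
        raise ValueError("slice selects no rows")
    return sum(values) / len(values)

# Made-up sample data standing in for the real file.
sample = "name,height\nA,180\nB,170\nC,190\nD,160\n"
with io.StringIO(sample) as f:
    print(f"{mean_of_column_slice(f, 1, 1, 3):.2f}")  # 180.00
```

For a real file you would swap the io.StringIO for open(path, newline="") and take start and stop from int(sys.argv[1]) and int(sys.argv[2]).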
Example:
In [20]: import pandas as pd
In [21]: a = {'c1': [1,2,3], 'c2': list('abc')}
In [22]: df = pd.DataFrame(a)
In [23]: df
Out[23]:
   c1 c2
0   1  a
1   2  b
2   3  c
In [24]: df.mean()
<ipython-input-24-c61f0c8f89b5>:1: FutureWarning: Dropping of nuisance columns in DataFrame reductions (with 'numeric_only=None') is deprecated; in a future version this will raise TypeError. Select only valid columns before calling the reduction.
  df.mean()
Out[24]:
c1    2.0
dtype: float64
In [25]: type(_)
Out[25]: pandas.core.series.Series
In [26]: df['c1'].mean()
Out[26]: 2.0
In [27]: df['c1'].iloc[0:1].mean()
Out[27]: 1.0
In [28]: df['c1'].iloc[0:2].mean()
Out[28]: 1.5
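Applied to the original goal (slice the column at index 1, then format the mean), a sketch might look like the following. The helper name column_slice_mean and the toy data are assumptions for illustration; the key point is that selecting a single column with iloc gives a Series, so mean() returns a scalar that {:.2f} can format:

```python
import pandas as pd

def column_slice_mean(df, start, stop, column=1):
    """Mean of rows start..stop-1 of the column at positional index `column`."""
    # iloc[rows, single_int_column] returns a Series, so .mean() is a scalar.
    return df.iloc[start:stop, column].mean()

# Made-up data standing in for the CSV from the question.
df = pd.DataFrame({"name": list("abcd"), "height": [180, 170, 190, 160]})

result = column_slice_mean(df, 1, 3)
print(f"{result:.2f}")  # 180.00
```

In the actual script, df would come from pd.read_csv with your file path, and start and stop from int(sys.argv[1]) and int(sys.argv[2]).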
CodePudding user response:
Thanks all. I ended up doing something a bit different, but it met my needs. Posting my answer here:
import sys
import pandas as pd
import numpy as np
data = pd.read_csv("president_heights.csv")  # use the complete file path to actually run
def main():
    height = np.array(data["height(cm)"])
    x = int(sys.argv[1])
    y = int(sys.argv[2])
    height_slice = height[x:y]  # rows x up to (but not including) y
    print(f'The average height of presidents number {x} to {y} is {height_slice.mean():.2f}')
main()