I have a dataframe containing a set of voltages measured at different points in time.
Edit: thanks for your help! This df was created from a Google Sheets file where everything was a string. Today was my first time using pandas, and I learned about:
- setting axis
- filter
- astype
- select_dtypes and head().to_dict()
  Measurements  Release 5  Release 6  Release 7
0           V0         48         48         50
1           V1         49         51         53
2           V2         50         52         54
3           V3         51         53         55
All voltages are measured at the same point for each new firmware release. I want to calculate the mean for each of these 4 points, but I can't seem to make it work, even though the documentation seems simple enough:
print(df.mean())
Release 5 12123762.75
Release 6 12128813.25
Release 7 12633863.75
dtype: float64
I'm not sure where it gets those numbers. I tried using df.loc to get a row and then take its mean:
print(df.loc['V1'].mean())
ValueError: No axis named V1 for object type DataFrame
And then
print(df.iloc[1].mean())
TypeError: Could not convert V1495153 to numeric
CodePudding user response:
Your solution works well for me; maybe you need the numeric_only=True parameter:
print(df.mean(numeric_only=True))
Release 5 49.5
Release 6 51.0
Release 7 53.0
dtype: float64
For recent pandas versions, use DataFrame.select_dtypes:
df.select_dtypes('number').mean()
If you need the mean per row:
df.set_index('Measurements').select_dtypes('number').mean(axis=1)
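Setting Measurements as the index also makes label lookups such as df.loc['V1'] from the question work. A minimal sketch, assuming the release columns are already numeric (the frame below is rebuilt by hand from the question's table, and the names indexed and v1_mean are only illustrative):

```python
import pandas as pd

# Hand-built copy of the question's table, with numeric release columns (assumption).
df = pd.DataFrame({
    "Measurements": ["V0", "V1", "V2", "V3"],
    "Release 5": [48, 49, 50, 51],
    "Release 6": [48, 51, 52, 53],
    "Release 7": [50, 53, 54, 55],
})

# With Measurements as the index, rows can be selected by label.
indexed = df.set_index("Measurements")
v1_mean = indexed.loc["V1"].mean()
print(v1_mean)  # 51.0
```

Without set_index, the row labels are still 0..3, so 'V1' is not a valid label for .loc.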
EDIT: To convert the columns to numeric before calling mean, use:
df.drop('Measurements', axis=1).astype('float').mean(axis=1)
Or, if astype('float') fails because of non-numeric values:
(df.drop('Measurements', axis=1)
   .apply(pd.to_numeric, errors='coerce')
   .mean(axis=1))
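To see what errors='coerce' does in practice, here is a small sketch with an invented bad cell ('n/a' is a made-up placeholder, not something from the question's data). The unparseable value becomes NaN, and mean skips NaN by default:

```python
import pandas as pd

# String data as it might arrive from a spreadsheet; "n/a" is a hypothetical bad cell.
raw = pd.DataFrame({
    "Measurements": ["V0", "V1"],
    "Release 5": ["48", "n/a"],
    "Release 6": ["48", "51"],
    "Release 7": ["50", "53"],
})

# Values that cannot be parsed as numbers are turned into NaN instead of raising.
cleaned = raw.drop("Measurements", axis=1).apply(pd.to_numeric, errors="coerce")

# mean() ignores NaN by default, so V1 is averaged over its two valid readings.
row_means = cleaned.mean(axis=1)
print(row_means)
```

Whether silently skipping a bad reading is acceptable depends on the data; checking cleaned.isna().sum() first shows how many values were dropped.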
CodePudding user response:
Update
All your columns have the object dtype:
>>> df.mean()
Release 5 12123762.75
Release 6 12128813.25
Release 7 12633863.75
dtype: float64
>>> df.dtypes
Measurements object
Release 5 object
Release 6 object
Release 7 object
dtype: object
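The odd numbers are consistent with the strings in each column being concatenated and the result then divided by the row count. A plain-Python sketch of that arithmetic for Release 5 (this only reproduces the number; that pandas does exactly this internally is my assumption, and newer versions may raise an error instead):

```python
# The four Release 5 readings, as the strings they were loaded as.
release_5 = ["48", "49", "50", "51"]

# "Summing" strings concatenates them instead of adding them.
concatenated = "".join(release_5)  # "48495051"

# Dividing the concatenated number by the count reproduces the reported "mean".
apparent_mean = int(concatenated) / len(release_5)
print(apparent_mean)  # 12123762.75
```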
Convert your columns to numeric first:
df['mean'] = df.filter(like='Release').astype(float).mean(axis=1)
print(df)
# Output
  Measurements Release 5 Release 6 Release 7       mean
0           V0        48        48        50  48.666667
1           V1        49        51        53  51.000000
2           V2        50        52        54  52.000000
3           V3        51        53        55  53.000000
Old answer
You can also set the Measurements column as the index of your dataframe. This makes sense if only numeric columns remain afterwards:
out = df.set_index('Measurements').mean(axis=1)
print(out)
# Output
Measurements
V0 48.666667
V1 51.000000
V2 52.000000
V3 53.000000
dtype: float64
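Putting both ideas together, here is a self-contained sketch that starts from all-string data (as in the question), sets the index, converts to float, and computes both the per-row and per-column means. The variable names are only illustrative:

```python
import pandas as pd

# All values are strings, mimicking the Google Sheets import described in the question.
raw = pd.DataFrame({
    "Measurements": ["V0", "V1", "V2", "V3"],
    "Release 5": ["48", "49", "50", "51"],
    "Release 6": ["48", "51", "52", "53"],
    "Release 7": ["50", "53", "54", "55"],
})

# Move the labels into the index, then convert the remaining columns to float.
numeric = raw.set_index("Measurements").astype(float)

row_means = numeric.mean(axis=1)  # one mean per measurement point (V0..V3)
col_means = numeric.mean()        # one mean per release

print(row_means)
print(col_means)
```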