I'm having a problem with my code:
import pandas as pd
data = pd.read_table('household_power_consumption.txt', sep=';',
                     low_memory=False, header=0, index_col=False,
                     parse_dates=[0])
df = pd.DataFrame(data, dtype=None)
col = df["Global_active_power"]
max_value=col.max()
print(max_value)
This is an image of the dataset: [image]
As you can see, the column "Global_active_power" is fully populated with data. However, the max value returned is a question mark ("?").
I have tried several approaches, but the result stays the same. Can somebody help me with this?
You can get the data from https://archive.ics.uci.edu/ml/datasets/individual+household+electric+power+consumption
CodePudding user response:
Probably not all of the rows in the column Global_active_power hold numerical values. Missing values in this dataset appear to be recorded as "?". Because of that, the entire column is read as strings rather than numbers, and max() compares the values as strings. In that ordering "?" sorts after every digit, so it is returned as the maximum.
Check example:
df = pd.DataFrame({"x": [1,2,3], "y": ["32.32", "?","fef"], "z": ["32.32", "456", "?"]})
df
# output
   x      y      z
0  1  32.32  32.32
1  2      ?    456
2  3    fef      ?
df.x.max()
# output
3
df.y.max()
# output
'fef'
df.z.max()
# output
'?'
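If the column is not a numerical data type, max() returns the greatest value in string (lexicographic) order, not the greatest number. "?" sorts after all digits and "f" sorts after "?", which is why the results above are 'fef' and '?'.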
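To make the string ordering concrete, here is a minimal sketch in plain Python (no pandas needed). It assumes the column holds only strings, in which case a pandas object column should order its values the same way. The lists are made-up sample values, not rows from the real dataset.

```python
# Strings are compared character by character using Unicode code points,
# so "?" (code point 63) sorts after every digit ("0" to "9" are 48 to 57).
values = ["32.32", "456", "?"]
print(ord("?"), ord("9"))
print(max(values))

# Moving "?" to the front does not change the result: position is irrelevant.
reordered = ["?", "32.32", "456"]
print(max(reordered))

# A letter such as "f" (code point 102) sorts after "?".
with_letters = ["32.32", "?", "fef"]
print(max(with_letters))
```

So the "?" is returned because of how it compares to the other strings, not because of where it sits in the column.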
CodePudding user response:
The data in the column, as imported, is of type str, so .max() isn't meaningful in the sense you intend. The values appear to be floating-point numbers, so you need to convert the column to type float64, but first replace all the ? values with NaN (so the type conversion doesn't fail). That is, try:
col = df['Global_active_power'].apply(lambda x: x if x != '?' else 'NaN').astype('float64')
Working example:
import pandas as pd
# Read the semicolon-separated file
data = pd.read_table('household_power_consumption.txt', sep=';',
                     low_memory=False, header=0, index_col=False,
                     parse_dates=[0])
df = pd.DataFrame(data, dtype=None)
# Replace the '?' placeholders with 'NaN', then convert the column to float64
col = (df['Global_active_power']
       .apply(lambda x: x if x != '?' else 'NaN')
       .astype('float64'))
print(col.max())
>>> 11.122
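As a possible alternative to the apply/astype approach, pandas can mark the "?" entries as missing either while reading the file (na_values) or afterwards (pd.to_numeric with errors="coerce"). The sketch below uses a hypothetical three-row sample in place of the real file, so the values and the reduced column set are assumptions for illustration only.

```python
import io
import pandas as pd

# Hypothetical sample in the same ';'-separated layout as the real file.
sample = io.StringIO(
    "Date;Time;Global_active_power\n"
    "16/12/2006;17:24:00;4.216\n"
    "16/12/2006;17:25:00;?\n"
    "16/12/2006;17:26:00;5.360\n"
)

# Option 1: treat "?" as missing while reading, so the column parses as float64.
df = pd.read_csv(sample, sep=";", na_values="?")
max_from_reader = df["Global_active_power"].max()  # NaN is skipped by default

# Option 2: convert an already-loaded string column; unparseable entries become NaN.
raw = pd.Series(["4.216", "?", "5.360"])
max_from_series = pd.to_numeric(raw, errors="coerce").max()

print(max_from_reader, max_from_series)
```

Option 1 avoids a second pass over the column. Option 2 may be handier when the data is already loaded.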