I have a pandas dataframe that works just fine. I am trying to figure out how to tell if a column with a label that I know if correct does not contain all the same values.
The code
below errors out for some reason when I want to see if the column contains -1 in each cell
# column = "TheColumnLabelThatIsCorrect"
# df = "my correct dataframe"
# I get an () takes 1 or 2 arguments but 3 is passed in error
if (not df.loc(column, estimate.eq(-1).all())):
I just learned about .eq() and .all() and hopefully I am using them correctly.
CodePudding user response:
It's a syntax issue - see docs for .loc
/indexing. Specifically, you want to be using []
instead of ()
You can do something like
if not df[column].eq(-1).all():
...
If you want to use .loc
specifically, you'd do something similar:
if not df.loc[:, column].eq(-1).all():
...
Also, note you don't need to use .eq()
, you can just do (df[column] == -1).all())
if you prefer.
CodePudding user response:
You could drop duplicates and if you get only one record it means all records are the same.
import pandas as pd
df = pd.DataFrame({'col': [1, 1, 1, 1]})
len(df['col'].drop_duplicates()) == 1
> True
CodePudding user response:
Question not as clear. Lets try the following though
Contains only -1 in each cell
df['estimate'].eq(-1).all()
Contains -1 in any cell
df['estimate'].eq(-1).any()
Filter out -1 and all columns
df.loc[df['estimate'].eq(-1),:]
CodePudding user response:
df['column'].value_counts()
gives you a list of all unique values and their counts in a column. As for checking if all the values are a specific number, you can do that by dropping duplicates and checking the length to be 1.
len(set(df['column'])) == 1