Figuring out if an entire column in a pandas DataFrame is the same value or not

I have a pandas DataFrame that works just fine. I am trying to figure out how to tell whether a column, whose label I know is correct, does not contain all the same values.

The code below errors out for some reason when I want to see if the column contains -1 in each cell:

# column = "TheColumnLabelThatIsCorrect"
# df = "my correct dataframe"

# I get an () takes 1 or 2 arguments but 3 is passed in error    
if (not df.loc(column, estimate.eq(-1).all())):

I just learned about .eq() and .all() and hopefully I am using them correctly.

CodePudding user response：

It's a syntax issue - see the docs for .loc/indexing. Specifically, .loc is an indexer rather than a method, so you want to be using [] instead of ().

You can do something like

if not df[column].eq(-1).all():
    ...

If you want to use .loc specifically, you'd do something similar:

if not df.loc[:, column].eq(-1).all():
    ...

Also, note you don't need to use .eq(); you can just do (df[column] == -1).all() if you prefer.
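
To make the difference concrete, here is a minimal sketch on made-up data. The frame contents and the "estimate" label are assumptions for illustration only, and the TypeError check reflects how the .loc indexer behaves in the pandas versions I am aware of.

```python
import pandas as pd

# Hypothetical data; "estimate" stands in for the real column label.
df = pd.DataFrame({"estimate": [-1, -1, -1], "other": [1, 2, 3]})
column = "estimate"

# Calling .loc with parentheses treats the indexer like a function,
# which fails once more than one argument is passed.
try:
    df.loc(column, df[column].eq(-1).all())
    call_failed = False
except TypeError:
    call_failed = True

# These three expressions compute the same boolean.
via_brackets = df[column].eq(-1).all()
via_loc = df.loc[:, column].eq(-1).all()
via_operator = (df[column] == -1).all()

if not via_brackets:
    print("column has values other than -1")
else:
    print("every cell is -1")
```

On this sample every cell is -1, so the else branch runs.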

CodePudding user response：

You could drop duplicates: if only one record remains, all records are the same.

import pandas as pd
df = pd.DataFrame({'col': [1, 1, 1, 1]})
len(df['col'].drop_duplicates()) == 1
> True
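
One thing to watch with this approach is missing data. A small sketch, assuming a float column that contains a NaN (the sample values are made up): drop_duplicates keeps NaN as a value of its own, while Series.nunique skips NaN unless you pass dropna=False.

```python
import numpy as np
import pandas as pd

# Hypothetical column: two identical values plus one missing value.
s = pd.Series([1.0, 1.0, np.nan])

# drop_duplicates keeps NaN as a separate entry, so two entries remain.
n_after_drop = len(s.drop_duplicates())

# nunique ignores NaN by default and counts it only with dropna=False.
n_unique = s.nunique()
n_unique_with_nan = s.nunique(dropna=False)

print(n_after_drop, n_unique, n_unique_with_nan)
```

Which of these you want depends on whether a missing value should count as "different" in your data.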

CodePudding user response：

The question is not entirely clear, but let's try the following.

Contains only -1 in each cell

df['estimate'].eq(-1).all()

Contains -1 in any cell

df['estimate'].eq(-1).any()

Select the rows where estimate is -1, keeping all columns

df.loc[df['estimate'].eq(-1),:]
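
Here is how those three expressions behave on a small made-up frame. The data and the "label" column are assumptions for illustration, and the last line uses .ne() to show the opposite selection in case you wanted to drop the -1 rows instead.

```python
import pandas as pd

# Hypothetical frame with a mix of -1 and other values.
df = pd.DataFrame({"estimate": [-1, 3, -1], "label": ["a", "b", "c"]})

all_minus_one = df["estimate"].eq(-1).all()   # is every cell -1?
any_minus_one = df["estimate"].eq(-1).any()   # is at least one cell -1?

# Rows where estimate is -1, all columns kept.
rows_minus_one = df.loc[df["estimate"].eq(-1), :]

# The opposite selection: rows where estimate is not -1.
rows_without = df.loc[df["estimate"].ne(-1), :]

print(all_minus_one, any_minus_one)
print(rows_minus_one)
print(rows_without)
```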

CodePudding user response：

df['column'].value_counts() gives you a Series of all unique values and their counts in a column. As for checking whether all the values are the same, you can do that by building a set of the values and checking that its length is 1. To confirm they all equal a specific number, also compare one of the values against that number.

len(set(df['column'])) == 1
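
A short sketch of both ideas on made-up data (the values are an assumption for illustration). The set check only says the values are identical, so comparing the first element confirms which value it is.

```python
import pandas as pd

# Hypothetical column where every cell is -1.
df = pd.DataFrame({"column": [-1, -1, -1, -1]})

# Unique values and how often each occurs.
counts = df["column"].value_counts()

# True when the column holds exactly one distinct value.
all_same = len(set(df["column"])) == 1

# The set check does not say which value it is, so compare one element too.
all_minus_one = all_same and df["column"].iloc[0] == -1

print(counts)
print(all_same, all_minus_one)
```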