How do I check how many types of observation are in a column in pandas?

I have a column that should contain only "a" or "b".

How do I check whether there are any other values in that column?

PS: I think R uses this:

table(df$column_name)

How can I get similar output in pandas?

CodePudding user response：

I think you can use groupby() followed by size():

import pandas as pd

data = [
    {"colA": "John", "colB": "a"},
    {"colA": "Jane", "colB": "b"},
    {"colA": "Bob", "colB": "c"},
    {"colA": "Rob", "colB": "a"},
    {"colA": "Hobb", "colB": "b"},
    {"colA": "Greg", "colB": "b"},
    {"colA": "Jennie", "colB": "a"},
    {"colA": "Joe", "colB": "a"},
    {"colA": "Howard", "colB": "x"},
    {"colA": "Dave", "colB": "a"},
]

dataframe = pd.DataFrame(data)

print(dataframe.groupby("colB").size())

Output:

colB
a    5
b    3
c    1
x    1
dtype: int64

CodePudding user response：

Assuming there are no NaN values in your column:

df["your column name"].value_counts()  # gives the unique values and how many times each occurs in the column

df["your column name"].nunique()  # gives only the number of unique values

To check whether your column has NaN values:

df["your column name"].isna().sum()
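If the column may contain NaN, a minimal sketch like the one below counts the missing values alongside the others. Both value_counts() and nunique() skip NaN by default, and passing dropna=False includes it. The column name "colB" and the sample data are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; "colB" is an assumed column name.
df = pd.DataFrame({"colB": ["a", "b", np.nan, "a", "c", np.nan]})

# By default value_counts() skips NaN; dropna=False keeps it as its own row.
counts = df["colB"].value_counts(dropna=False)
print(counts)

# nunique() likewise ignores NaN unless dropna=False is passed.
n_unique_without_nan = df["colB"].nunique()
n_unique_with_nan = df["colB"].nunique(dropna=False)
print(n_unique_without_nan, n_unique_with_nan)
```

This is probably the closest match to R's table(df$column_name, useNA = "ifany"), since missing entries show up in the same summary as the regular values.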

Hope this helps.

CodePudding user response：

You can use:

df['column_name'].isin(['a', 'b']).all()

It will output True if all values are either a or b.

If you want to see which values are incorrect:

df[~df['column_name'].isin(['a', 'b'])]

To do both you can save the mask in a variable:

m = df['column_name'].isin(['a', 'b'])
print(m.all())

df[~m]
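Putting this together on data similar to the sample in the first answer, a sketch along these lines reports both whether the column is clean and which unexpected values appear. The `allowed` list and the variable names are illustrative assumptions, not part of the original question.

```python
import pandas as pd

# Small sample frame, shortened from the first answer's data.
df = pd.DataFrame({
    "colA": ["John", "Jane", "Bob", "Rob", "Howard"],
    "colB": ["a", "b", "c", "a", "x"],
})

allowed = ["a", "b"]  # assumed set of valid values

# Boolean mask: True where the value is one of the allowed ones.
m = df["colB"].isin(allowed)

all_valid = bool(m.all())                      # is every row valid?
bad_rows = df[~m]                              # rows with unexpected values
bad_counts = bad_rows["colB"].value_counts()   # how often each bad value occurs

print(all_valid)
print(bad_rows)
print(bad_counts)
```

Note that isin() returns False for NaN, so rows with missing values will also appear among the unexpected ones.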