I have a column that should only contain either "a" or "b".
How do I check whether there are any other values in that column?
PS: I think in R you would use this:
table(df$column_name)
How can I get similar output in pandas?
CodePudding user response:
I think you can use groupby() followed by size():
import pandas as pd
data = [
{"colA": "John", "colB": "a"},
{"colA": "Jane", "colB": "b"},
{"colA": "Bob", "colB": "c"},
{"colA": "Rob", "colB": "a"},
{"colA": "Hobb", "colB": "b"},
{"colA": "Greg", "colB": "b"},
{"colA": "Jennie", "colB": "a"},
{"colA": "Joe", "colB": "a"},
{"colA": "Howard", "colB": "x"},
{"colA": "Dave", "colB": "a"},
]
dataframe = pd.DataFrame(data)
print(dataframe.groupby("colB").size())
Output:
colB
a 5
b 3
c 1
x 1
dtype: int64
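To go from the counts to just the unexpected values, one option is to filter the result by its index. This is only a sketch: the data below is a shortened stand-in for the frame above, and the allowed list ["a", "b"] comes from the question.

```python
import pandas as pd

# Stand-in for the frame above: only the colB values are reproduced here.
dataframe = pd.DataFrame(
    {"colB": ["a", "b", "c", "a", "b", "b", "a", "a", "x", "a"]}
)

# Count occurrences of each value, as in the answer above.
counts = dataframe.groupby("colB").size()

# Keep only the entries whose label is not in the allowed set.
unexpected = counts[~counts.index.isin(["a", "b"])]
print(unexpected)
```

If unexpected is empty, the column contains nothing besides "a" and "b" (missing values aside, since groupby() drops NaN keys by default).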
CodePudding user response:
Assuming there are no NaN values in your column (both calls below skip NaN by default):
df["your column name"].value_counts()  # gives the unique values and how many times each occurs in the column
or
df["your column name"].nunique()  # gives only the number of unique values
To check whether your column has NaN values:
df["your column name"].isna().sum()
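If the column might contain NaN, value_counts() can also report missing values as their own row, which is closer to what table(..., useNA = "ifany") does in R. A small sketch with made-up data (the column name "colB" and the values are assumptions for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with one missing value.
df = pd.DataFrame({"colB": ["a", "b", "c", "a", np.nan, "b", "a"]})

# dropna=False keeps NaN as a separate entry in the counts.
counts = df["colB"].value_counts(dropna=False)
print(counts)

# Number of missing entries, for comparison.
n_missing = df["colB"].isna().sum()
print(n_missing)
```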
Hope this helps.
CodePudding user response:
You can use:
df['column_name'].isin(['a', 'b']).all()
It will output True if all values are either a or b.
If you want to see which values are incorrect:
df[~df['column_name'].isin(['a', 'b'])]
To do both you can save the mask in a variable:
m = df['column_name'].isin(['a', 'b'])
print(m.all())
df[~m]
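Here is the same idea as a self-contained sketch you can run. The sample data is made up, and the second mask (m_or_na) is an optional variation for the case where missing values should not count as errors, since isin() returns False for NaN/None.

```python
import pandas as pd

# Hypothetical sample data: one bad value ("c") and one missing value.
df = pd.DataFrame({"column_name": ["a", "b", "c", "a", None, "b"]})
allowed = ["a", "b"]

# True where the value is one of the allowed ones.
m = df["column_name"].isin(allowed)
all_valid = bool(m.all())
print(all_valid)

# Rows with anything other than "a" or "b" (missing values included).
invalid_rows = df[~m]
print(invalid_rows)

# Variation: treat missing values as acceptable.
m_or_na = m | df["column_name"].isna()
invalid_non_missing = df[~m_or_na]
print(invalid_non_missing)
```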