Assume the table below
Index | Col1 | Col2 | Col3 |
---|---|---|---|
0 | 10.5 | 2.5 | nan |
1 | s | ||
2 | 2.9 | 3.2 | a |
3 | #VAL | nan | 2 |
4 | 3 | 5.6 | 4 |
Now what I'm trying to get is a summary dataframe which will give me a count of different datatypes/conditions as mentioned above
Index | Col1 | Col2 | Col3 |
---|---|---|---|
Integer/Float | 3 | 3 | 2 |
Blank | 1 | 0 | 1 |
Nan | 0 | 1 | 1 |
Text | 1 | 1 | 1 |
I come from Excel so in Excel conditioning it would be pretty much simple
Integer/Float formula: I would use ISNUMBER and create an array of True and False values and sum the true ones
Blank: I would simply use COUNTIF(Column, "")
Text: Similar to ISNUMBER I would use ISTEXT above.
I have tried searching this on Stack Overflow however the best I've gotten is
pd.DataFrame(df["Col1"].apply(type).value_counts())
This does not however give me the exact output.
I also wanted to check if it was possible to filter out the values basis the above condition and get the fitting cells. e.g. df[Col1==ISTEXT]
CodePudding user response:
Use custom funstion for count each type separately:
def f(x):
a = pd.to_numeric(x, errors='coerce').notna().sum()
b = x.eq('').sum()
c = x.isna().sum()
d = len(x) - (a b c)
return pd.Series([a,b,c,d], ['Integer/Float','Blank','Nan','Text'])
df = df.apply(f)
print (df)
Col1 Col2 Col3
Integer/Float 3 3 2
Blank 1 0 1
Nan 0 1 1
Text 1 1 1