Home > database >  Get the count of Text, Numeric/Float, Blank and Nan values for each column in a Dataframe and extrac
Get the count of Text, Numeric/Float, Blank and Nan values for each column in a Dataframe and extrac

Time:11-25

Assume the table below

Index Col1 Col2 Col3
0 10.5 2.5 nan
1 s
2 2.9 3.2 a
3 #VAL nan 2
4 3 5.6 4

Now what I'm trying to get is a summary dataframe which will give me a count of different datatypes/conditions as mentioned above

Index Col1 Col2 Col3
Integer/Float 3 3 2
Blank 1 0 1
Nan 0 1 1
Text 1 1 1

I come from Excel so in Excel conditioning it would be pretty much simple

Integer/Float formula: I would use ISNUMBER and create an array of True and False values and sum the true ones
Blank: I would simply use COUNTIF(Column, "")
Text: Similar to ISNUMBER I would use ISTEXT above.

I have tried searching this on Stack Overflow however the best I've gotten is

pd.DataFrame(df["Col1"].apply(type).value_counts())

This does not however give me the exact output.

I also wanted to check if it was possible to filter out the values basis the above condition and get the fitting cells. e.g. df[Col1==ISTEXT]

CodePudding user response:

Use custom funstion for count each type separately:

def f(x):
    a = pd.to_numeric(x, errors='coerce').notna().sum()
    b = x.eq('').sum()
    c = x.isna().sum()
    d = len(x) - (a   b   c)
    return pd.Series([a,b,c,d], ['Integer/Float','Blank','Nan','Text'])

df = df.apply(f)
print (df)
               Col1  Col2  Col3
Integer/Float     3     3     2
Blank             1     0     1
Nan               0     1     1
Text              1     1     1
  • Related