Home > OS >  Pandas: Using iloc to count all occurrences of pattern
Pandas: Using iloc to count all occurrences of pattern

Time:09-22

The inputs are a bunch of spreadsheets where the text 'N/A' is used for missing values. I need to document how many occurrences in each column using solely index number as I won't know the name of the columns. The first few columns will always be the same but the last few columns change depending on the type of spreadsheet.

df = pd.DataFrame([[1000, 'Jerry', 'BR1','BR1','N/A'], 
                    [1001, 'N/A', 'N/A', 'BR1','N/A'], 
                    ['N/A', 'N/A', 'BR3', 'BR2','N/A'],
                    [1003, 'Perry','BR4','BR1','N/A']],
                   columns=['ID', 'Name', 'Branch', 'Member of','Status'])

I need to be able to do something like:

df.iloc[:, 10:].contains('N/A').count()

And also it's contrary:

~df.iloc[:, 10:].contains('N/A').count()

It should output something like value_counts

Name    2
Branch   1
Status   4

...etc.

CodePudding user response:

This is an option: It creates a dictionary with value_counts for each column. You can make a list as well zipping the column with the values.

summary = {}

for i in df:
    summary[i] = df[i].value_counts()

For example the Output of Branch column will be:

summary['Branch']
BR1    1
N/A    1
BR3    1
BR4    1
Name: Branch, dtype: int64

CodePudding user response:

You can try this:

import pandas as pd

df = pd.DataFrame([[1000, 'Jerry', 'BR1','BR1','N/A'], 
                    [1001, 'N/A', 'N/A', 'BR1','N/A'], 
                    ['N/A', 'N/A', 'BR3', 'BR2','N/A'],
                    [1003, 'Perry','BR4','BR1','N/A']],
                   columns=['ID', 'Name', 'Branch', 'Member of','Status'])

df.applymap(lambda x: x == 'N/A').sum()

It gives:

ID           1
Name         2
Branch       1
Member of    0
Status       4
dtype: int64
  • Related