The inputs are a bunch of spreadsheets where the text 'N/A' is used for missing values. I need to count how many occurrences of 'N/A' there are in each column, referring to columns solely by index number, since I won't know the column names. The first few columns will always be the same, but the last few columns change depending on the type of spreadsheet.
import pandas as pd

df = pd.DataFrame([[1000, 'Jerry', 'BR1', 'BR1', 'N/A'],
                   [1001, 'N/A', 'N/A', 'BR1', 'N/A'],
                   ['N/A', 'N/A', 'BR3', 'BR2', 'N/A'],
                   [1003, 'Perry', 'BR4', 'BR1', 'N/A']],
                  columns=['ID', 'Name', 'Branch', 'Member of', 'Status'])
I need to be able to do something like:
df.iloc[:, 10:].contains('N/A').count()
And also its complement:
~df.iloc[:, 10:].contains('N/A').count()
It should output something like value_counts
Name 2
Branch 1
Status 4
...etc.
CodePudding user response:
This is one option: it builds a dictionary with the value_counts for each column. You could also build a list by zipping the column names with the values.
summary = {}
for i in df:
    summary[i] = df[i].value_counts()
For example, the output for the Branch column will be:
summary['Branch']
BR1 1
N/A 1
BR3 1
BR4 1
Name: Branch, dtype: int64
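Since the asker only wants the 'N/A' count (not the full value_counts) and can only refer to columns by position, the dictionary above can be adapted into a sketch like this, keyed by column index instead of name (`na_counts` is just an illustrative name, not from the original):

```python
import pandas as pd

df = pd.DataFrame([[1000, 'Jerry', 'BR1', 'BR1', 'N/A'],
                   [1001, 'N/A', 'N/A', 'BR1', 'N/A'],
                   ['N/A', 'N/A', 'BR3', 'BR2', 'N/A'],
                   [1003, 'Perry', 'BR4', 'BR1', 'N/A']],
                  columns=['ID', 'Name', 'Branch', 'Member of', 'Status'])

# Map each column position to its 'N/A' count;
# df.iloc[:, i] selects the i-th column regardless of its name.
na_counts = {i: df.iloc[:, i].eq('N/A').sum() for i in range(df.shape[1])}
# na_counts -> {0: 1, 1: 2, 2: 1, 3: 0, 4: 4}
```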
CodePudding user response:
You can try this:
import pandas as pd
df = pd.DataFrame([[1000, 'Jerry', 'BR1','BR1','N/A'],
[1001, 'N/A', 'N/A', 'BR1','N/A'],
['N/A', 'N/A', 'BR3', 'BR2','N/A'],
[1003, 'Perry','BR4','BR1','N/A']],
columns=['ID', 'Name', 'Branch', 'Member of','Status'])
# elementwise test for 'N/A'; True counts as 1 when summed per column
df.applymap(lambda x: x == 'N/A').sum()
It gives:
ID 1
Name 2
Branch 1
Member of 0
Status 4
dtype: int64