Home > Back-end >  Get single value count across the whole dataset
Get single value count across the whole dataset

Time:12-12

I'm new at pandas and surveying the internet for my learning I've used count(), value_counts() to count values column wise but now I'm stuck in a problem. I've a car crash reporting dataset that has it's empty value replaced with "Not Reported" so I wanted to count the number of cells that have this value across the whole data set and show it column wise.Is there any way for me to achieve such out come?

The dataset has values like this

| Location     | Severity     | Time         | Outcome      | Substance Used | Traffic Signal |
| --------     | --------     | ----------   | -----------  | -------------- | -------------- |
| New York     | Level 1      | Not Reported | Casualty     | Alcohol        | Red            |
| Texas        | Not Reported |  7:00:00     | Minor Injury | Not Reported   | Green          |
| Not Reported | Level 4      | Not Reported | Not Reported | Smoking        | Yellow         |  

The output required is this.

| Column         | Value        | Count |
| -------------- | ------------ | ----- |
| Location       | Not Reported | 1     |
| Severity       | Not Reported | 1     |
| Time           | Not Reported | 2     |
| Outcome        | Not Reported | 1     |
| Substance Used | Not Reported | 1     |
| Traffic Signal | Not Reported | 0     |

CodePudding user response:

You can use:

(df.where(df.eq('Not Reported')).stack(dropna=False)
   .groupby(level=1).agg(Value='first', Count='count')
   .reset_index()
)

Output:

            index         Value  Count
0        Location  Not Reported      1
1         Outcome  Not Reported      1
2        Severity  Not Reported      1
3  Substance Used  Not Reported      1
4            Time  Not Reported      2
5  Traffic Signal          None      0

CodePudding user response:

You can count Not Reported by compare all values by Not Reported with sum, no groupby necessary:

s = df.eq('Not Reported').sum()
print (s)
Location          1
Severity          1
Time              2
Outcome           1
Substance Used    1
Traffic Signal    0
dtype: int64

Your expected ouput is possible get in DataFrame constructor:

df1 = pd.DataFrame({'Column': s.index, 'Value':'Not Reported', 'Count': s.to_numpy()})
print (df1)
           Column         Value  Count
0        Location  Not Reported      1
1        Severity  Not Reported      1
2            Time  Not Reported      2
3         Outcome  Not Reported      1
4  Substance Used  Not Reported      1
5  Traffic Signal  Not Reported      0
  • Related