Home > Blockchain >  Customised data summary table
Customised data summary table

Time:10-21

I am attempting to create a data summary table that has three columns. One of which shows the attribute's name, one of which shows the most frequently occurring attribute-value, and the other shows the count of that most frequently occurring value.

I have the following data:

data = {"Sex":["M", "M", "F", "F", "F", "F", "M", "F", "F",  "M"],
    "Product": ["X", "Y", "Y", "Z","Y", "Y", "Y", "X", "Z", "Y"],
    "Answer":["Yes", "Yes", "Yes", "No",  "No", "Yes", "Yes", "No",  "Yes", "Yes"]}

df = pd.DataFrame(data)

I then do the following:

stat = dict([(column, [])
    for column in df
        if df[column].dtype == object])
for column in df:
    stat[column] = df[column].value_counts()[
        df[column].value_counts() == df[column].value_counts().max()]
print(stat)

I get the following output

{'Sex': F    6
Name: Sex, dtype: int64, 'Product': Y    6
Name: Product, dtype: int64, 'Answer': Yes    7
Name: Answer, dtype: int64}

How can I create a table that looks something like this?

Attribute Most Frequently Occurring Value Count
Sex F 6
Product Y 6
Answer Yes 7

Could I do it using a library like tabulate or plotly?

CodePudding user response:

If you're looping through a DataFrame, you're probably doing it wrong.

# Use `mode` to get the most frequent of each column.
most_frequent = df.mode().squeeze().rename('Most Frequently Occurring Value')

# Get the counts of those with a mask.
counts = df[df.eq(most_frequent)].count().rename('Count')

# Combine the results with `concat`
out = pd.concat([most_frequent, counts], axis=1)
print(out)

Output:

        Most Frequently Occurring Value  Count
Sex                                   F      6
Product                               Y      6
Answer                              Yes      7

Make it pretty:

out.index.name = 'Attribute'
print(out.to_markdown())

# Output:

| Attribute   | Most Frequently Occurring Value   |   Count |
|:------------|:----------------------------------|--------:|
| Sex         | F                                 |       6 |
| Product     | Y                                 |       6 |
| Answer      | Yes                               |       7 |

Which looks like this in markdown:

Attribute Most Frequently Occurring Value Count
Sex F 6
Product Y 6
Answer Yes 7

CodePudding user response:

For better understanding I am gonna chop your question into pieces.

First, to get The attribute and count, columns

Groupby level = 0, axis=1 and select the most frequent value either using value_counts indexed, or mode

df2 = df.groupby(level=0,axis=1).apply(lambda x : x.value_counts()[0])

Unfortunately, for you pandas doesn't like group by axis = 1. So I cant do this with a single line of code using aggreagate, so for the Most Frequently Occurring Value column. Well a simple

df.mode(axis=0) # Will return your wanted series

Nonetheless good question congrats

  • Related