Find the index of elements classified in each bin of a histogram

I want to create a histogram from one column of my data and then find the indices of the elements that fall into each bin.

The sample table is as follows:

name	n_7	n_6
a	20	11
b	14	50
c	18	21
d	11	4

I want to make a histogram based on column n_7 and then find which names are included in each bin. For example, if the first bin contains 280 elements of the n_7 column, I want to know the names of those elements from the name column.

The following code creates the histogram:

import numpy as np
counts, binEdges = np.histogram(df2.n_7, bins=6)

CodePudding user response：

To find the names of the elements in each bin of the histogram, you can use the pandas.cut function to create bins for the n_7 column and add a new column to the DataFrame that indicates the bin to which each element belongs. Here's an example of how you could do this:

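import pandas as pd

# Assign each row to one of 6 equal-width bins of the n_7 column
df2['n_7_bin'] = pd.cut(df2['n_7'], bins=6)

# Group the DataFrame by the n_7_bin column and use the `apply` method to get a list of the names for each bin
binned_names = df2.groupby('n_7_bin')['name'].apply(list)

print(binned_names)

Note that pd.cut closes its intervals on the right by default, while np.histogram closes them on the left (except the last bin), so a value lying exactly on an interior edge can end up in a different bin.

Or, if you prefer NumPy, reuse the edges returned by np.histogram:

import numpy as np

# np.digitize returns a bin number for each row, not a row position,
# so it cannot be passed to df2.iloc directly.
# Using only the interior edges gives bin numbers 0..5 that match np.histogram,
# with the maximum value counted in the last bin.
bin_ids = np.digitize(df2['n_7'], bins=binEdges[1:-1])

# Use a boolean mask to select the names in each bin
for i in range(len(counts)):
    print(i, df2.loc[bin_ids == i, 'name'].tolist())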
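
As a self-contained check, here is a minimal sketch of the NumPy route on the four-row sample table from the question. The choice of 3 bins and the names bin_edges, bin_ids, names_per_bin and index_per_bin are assumptions made for illustration only; with the real data you would keep bins=6.

```python
import numpy as np
import pandas as pd

# Sample table from the question
df2 = pd.DataFrame({
    "name": ["a", "b", "c", "d"],
    "n_7": [20, 14, 18, 11],
    "n_6": [11, 50, 21, 4],
})

# 3 bins is an arbitrary choice that suits this tiny table
counts, bin_edges = np.histogram(df2["n_7"], bins=3)

# Digitize against the interior edges only, so bin numbers run from
# 0 to len(counts) - 1 and the maximum value lands in the last bin
bin_ids = np.digitize(df2["n_7"], bins=bin_edges[1:-1])

# Names and row index labels that fall into each bin
names_per_bin = {
    i: df2.loc[bin_ids == i, "name"].tolist() for i in range(len(counts))
}
index_per_bin = {
    i: df2.index[bin_ids == i].tolist() for i in range(len(counts))
}

print(names_per_bin)
print(index_per_bin)
```

The number of names collected for each bin should equal the corresponding entry of counts, which is a quick way to confirm that the bin assignment agrees with the histogram.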