I want to create a histogram from one column of my data and then find the indices of the elements that fall into each bin.
The sample table is as follows:
name | n_7 | n_6 |
---|---|---|
a | 20 | 11 |
b | 14 | 50 |
c | 18 | 21 |
d | 11 | 4 |
I want to make a histogram of column `n_7` and then find which names fall into each bin of the histogram. For example, in my full data the first bin contains 280 elements of `n_7`, and I want to know the names of those elements from the `name` column.
The following code creates the histogram:

```python
import numpy as np

counts, binEdges = np.histogram(df2.n_7, bins=6)
```
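As a quick check of what this call returns, here is a minimal sketch that rebuilds the sample table as `df2` (assuming these four rows are the whole table) and runs the same `np.histogram` call. With `bins=6`, the range from 11 to 20 is split into six bins of width 1.5. Every bin is half-open, [left, right), except the last one, which also includes 20, so the value 14, which sits exactly on an edge, is counted in the third bin.

```python
import numpy as np
import pandas as pd

# The sample table from the question (only these four rows are assumed)
df2 = pd.DataFrame({
    "name": ["a", "b", "c", "d"],
    "n_7": [20, 14, 18, 11],
    "n_6": [11, 50, 21, 4],
})

counts, binEdges = np.histogram(df2.n_7, bins=6)

# Seven edges for six bins: 11, 12.5, 14, 15.5, 17, 18.5, 20
print(binEdges)
# Elements per bin: 11 -> bin 0, 14 -> bin 2, 18 -> bin 4, 20 -> bin 5
print(counts)
```

`counts` only tells you how many elements are in each bin; the edges in `binEdges` are what you need to map each row back to its bin.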
Answer:
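To find the names of the elements in each bin, you can use `pd.cut` to label every value of the `n_7` column with its bin, store that label in a new column, and then group the `name` column by it. Here's an example:

```python
import pandas as pd

# Label each value of n_7 with one of 6 equal-width bins
df2['n_7_bin'] = pd.cut(df2['n_7'], bins=6)

# Group by the bin label and collect the names that fall into each bin
binned_names = df2.groupby('n_7_bin')['name'].apply(list)
print(binned_names)
```

Note that `pd.cut` intervals are closed on the right, while `np.histogram` bins are closed on the left (except the last one), so a value lying exactly on an inner bin edge can be placed in a different bin than the one `counts` reports.

Or, if you prefer NumPy, reuse `binEdges` from `np.histogram`. `np.digitize` returns a bin number for each value, not a row position, so use it as a grouping key instead of passing it to `df2.iloc`:

```python
import numpy as np

counts, binEdges = np.histogram(df2['n_7'], bins=6)

# Digitize against the inner edges only, so bin numbers run from 0 to 5 and
# the maximum value lands in the last bin, matching np.histogram
bin_idx = np.digitize(df2['n_7'], bins=binEdges[1:-1])

# Group the names by bin number (bins with no elements do not appear)
binned_names = df2.groupby(bin_idx)['name'].apply(list)
print(binned_names)
```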
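Since the question also asks for the index of the elements in each bin, here is a minimal sketch that returns, for every bin including empty ones, the DataFrame index labels and the matching names. It rebuilds the sample table as `df2` and assumes the same 6 bins; `index_per_bin` and `names_per_bin` are hypothetical names chosen for illustration. Because it digitizes against the inner edges returned by `np.histogram`, the number of rows found in each bin should equal the corresponding entry of `counts`.

```python
import numpy as np
import pandas as pd

# Sample table from the question
df2 = pd.DataFrame({
    "name": ["a", "b", "c", "d"],
    "n_7": [20, 14, 18, 11],
    "n_6": [11, 50, 21, 4],
})

counts, binEdges = np.histogram(df2["n_7"], bins=6)

# Bin number (0..5) for each row, using the same edge convention as np.histogram
bin_idx = np.digitize(df2["n_7"], bins=binEdges[1:-1])

# For every bin, including empty ones, collect the DataFrame index labels
index_per_bin = {b: df2.index[bin_idx == b].tolist() for b in range(len(counts))}

# Look up the names for those index labels
names_per_bin = {b: df2.loc[idx, "name"].tolist() for b, idx in index_per_bin.items()}

for b in range(len(counts)):
    print(b, index_per_bin[b], names_per_bin[b])
```

Working with index labels rather than positions means the result stays correct even if `df2` has been filtered or has a non-default index.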