Home > Enterprise >  Query DataFrame, count values and return features and counts as DataFrame [duplicate]
Query DataFrame, count values and return features and counts as DataFrame [duplicate]

Time:09-23

I have a pandas DataFrame and need to do the following: query the DateFrame, count the number of times the values in a column appear and return another DataFrame, where the first column is the feature name and second column is the number of counts. I can return the result from a series to a DataFrame, but it only gives one column.

See below.

df_1 = pd.DataFrame({'id': ['001', '002', '003', '004', '005', '006', '007', '008'],
                     'color_value': ['blue', 'red', 'yellow', 'orange',
                     'blue','red', 'blue', 'orange']})

df_1=

id   color_value
001  blue
002  red
003  yellow
004  orange
005  blue
006  red
007  blue
008  orange

num_counts = df_1['color_value'].value_counts()

num_counts =
blue      3
orange    2
red       2
yellow    1
Name: color_value, dtype: int64

# convert from a series to DataFrame
num_counts.to_frame()

        color_value
blue    3
orange  2
red     2
yellow  1

I need to convert the above 1-column DataFrame into a 2-column DataFrame, where the column 1 title = feature and column 2 title = counts. I cannot do this manually because the actual matrix is extremely large. The goal is to have something I can plot counts for to see the distribution of the counts for all features.

CodePudding user response:

Just use reset_index:

>>> df_1['color_value'].value_counts().reset_index()
    index  color_value
0    blue            3
1  orange            2
2     red            2
3  yellow            1
>>> 

If you care about column names:

>>> df_1['color_value'].value_counts().rename_axis('color_value').reset_index(name='count')
  color_value  count
0        blue      3
1      orange      2
2         red      2
3      yellow      1
>>> 
  • Related