I have a pandas DataFrame and need to do the following: query the DateFrame, count the number of times the values in a column appear and return another DataFrame, where the first column is the feature name and second column is the number of counts. I can return the result from a series to a DataFrame, but it only gives one column.
See below.
df_1 = pd.DataFrame({'id': ['001', '002', '003', '004', '005', '006', '007', '008'],
'color_value': ['blue', 'red', 'yellow', 'orange',
'blue','red', 'blue', 'orange']})
df_1=
id color_value
001 blue
002 red
003 yellow
004 orange
005 blue
006 red
007 blue
008 orange
num_counts = df_1['color_value'].value_counts()
num_counts =
blue 3
orange 2
red 2
yellow 1
Name: color_value, dtype: int64
# convert from a series to DataFrame
num_counts.to_frame()
color_value
blue 3
orange 2
red 2
yellow 1
I need to convert the above 1-column DataFrame into a 2-column DataFrame, where the column 1 title = feature and column 2 title = counts. I cannot do this manually because the actual matrix is extremely large. The goal is to have something I can plot counts for to see the distribution of the counts for all features.
CodePudding user response:
Just use reset_index
:
>>> df_1['color_value'].value_counts().reset_index()
index color_value
0 blue 3
1 orange 2
2 red 2
3 yellow 1
>>>
If you care about column names:
>>> df_1['color_value'].value_counts().rename_axis('color_value').reset_index(name='count')
color_value count
0 blue 3
1 orange 2
2 red 2
3 yellow 1
>>>