Assign counts from .count() to a dataframe column names - pandas python

Time:10-24

Hoping someone can help me here. I believe I am close to the solution.

I have a dataframe on which I am calling .count() to return a series containing every column name of the dataframe together with its non-NaN value count.

Example dataframe:

feature_1 feature_2
1 1
2 NaN
3 2
4 NaN
5 3

Example result for .count() here would output a series that looks like:

feature_1 5
feature_2 3

I am now trying to get this data into a dataframe with the column names "Feature" and "Count". The expected output looks like this:

Feature Count
feature_1 5
feature_2 3

I am using .to_frame() to push the series to a dataframe in order to add column names. Full code:

df = data.count()
df = df.to_frame()
df.columns = ['Feature', 'Count']

However, I am receiving this error message: "ValueError: Length mismatch: Expected axis has 1 elements, new values have 2 elements". It is as though the actual column names (Feature) are not recognised as a column with values.

How can I get it to recognise both the Feature and Count columns so that I can assign column names to them?

CodePudding user response:

Use Series.reset_index instead of Series.to_frame to get a two-column DataFrame. The first column comes from the index of the Series and the second from its values:

df = data.count().reset_index()
df.columns = ['Feature', 'Count']
print (df)
     Feature  Count
0  feature_1      5
1  feature_2      3
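For anyone who wants to run this end to end, here is a minimal self-contained sketch. It assumes the question's example table as the contents of data; the values are copied from the example above, nothing else is implied about the real dataset.

```python
import numpy as np
import pandas as pd

# Sample frame rebuilt from the example in the question (an assumption
# about what `data` holds; the real dataframe is not shown).
data = pd.DataFrame({
    "feature_1": [1, 2, 3, 4, 5],
    "feature_2": [1, np.nan, 2, np.nan, 3],
})

# count() returns a Series: column names are the index, counts are the values.
counts = data.count()

# reset_index() moves the index into a regular column, giving two columns.
df = counts.reset_index()
df.columns = ["Feature", "Count"]
print(df)
```

Because reset_index() turns the index into a column, the frame has two columns at the point where the names are assigned, so the length mismatch no longer occurs.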

Another solution is to name the index with Series.rename_axis and pass the name parameter to reset_index, or to set both column names afterwards with DataFrame.set_axis:

df = data.count().rename_axis('Feature').reset_index(name='Count')
#alternative
df = data.count().reset_index().set_axis(['Feature', 'Count'], axis=1)
print (df)
     Feature  Count
0  feature_1      5
1  feature_2      3
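As a quick check that the two variants give the same result, here is a small sketch. The sample data is again taken from the question's example, and the names df_a and df_b are only illustrative.

```python
import numpy as np
import pandas as pd

# Sample frame rebuilt from the question's example (assumed contents of `data`).
data = pd.DataFrame({
    "feature_1": [1, 2, 3, 4, 5],
    "feature_2": [1, np.nan, 2, np.nan, 3],
})

# Variant A: name the index first, then name the values column on reset.
df_a = data.count().rename_axis("Feature").reset_index(name="Count")

# Variant B: reset the index, then replace both column labels at once.
df_b = data.count().reset_index().set_axis(["Feature", "Count"], axis=1)

print(df_a)
print(df_b)
```

Variant A avoids the intermediate default column labels entirely, while variant B is convenient when the labels are already available as a list.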

CodePudding user response:

This happens because your new dataframe has only one column. The original column names become the index of the series returned by count(), and to_frame() keeps them as the dataframe index instead of turning them into a column. To assign a two-element list to df.columns, you have to reset the index first:

df = data.count()
df = df.to_frame().reset_index()
df.columns = ['Feature', 'Count']
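To see the one-column frame for yourself, the shapes can be inspected before and after the reset. This is only a sketch using the question's example values as the assumed contents of data; the variable names frame and fixed are illustrative.

```python
import numpy as np
import pandas as pd

# Sample frame rebuilt from the question's example (assumed contents of `data`).
data = pd.DataFrame({
    "feature_1": [1, 2, 3, 4, 5],
    "feature_2": [1, np.nan, 2, np.nan, 3],
})

# to_frame() alone: the feature names stay in the index, so there is
# only a single column of counts.
frame = data.count().to_frame()
print(frame.shape)        # (rows, columns) before resetting the index
print(list(frame.index))  # the feature names live here

# reset_index() promotes the index to a column, so two names now fit.
fixed = frame.reset_index()
fixed.columns = ["Feature", "Count"]
print(fixed.shape)
```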