Home > front end >  Value_counts() for each of the columns in one dataframe
Value_counts() for each of the columns in one dataframe

Time:06-29

Let's suppose I have a dataset like this:

ID Department Level
1 HR senior
2 IT junior
3 IT middle
4 IT middle
5 IT senior
6 HR middle
7 Design middle
8 Design middle

How do i transform this dataset using value_counts() towards every column, so that end result looks like this:

Department Department_Count Level Level_Count
HR 2 middle 1
senior 1
IT 4 junior 1
middle 2
senior 1
Design 2 middle 2

CodePudding user response:

You can do this by creating 2 groupby dataframes, one for each of the counts, and merging them together.

Department counts:

dept = df.groupby('Department', as_index=False).count()[['Department', 'ID']]
dept = dept.rename(columns = {'ID':'Department_Count'})

  Department  ID
0     Design   2
1         HR   2
2         IT   4

Level counts:

level = df.groupby(['Department', 'Level'], as_index=False).count()
level = level.rename(columns = {'ID':'Level_Count'})

  Department   Level  Level_Count
0     Design  middle            2
1         HR  middle            1
2         HR  senior            1
3         IT  junior            1
4         IT  middle            2
5         IT  senior            1

Then merge the two together on Department

df_out = dept.merge(level, on='Department')

  Department  ID   Level  Level_Count
0     Design   2  middle            2
1         HR   2  middle            1
2         HR   2  senior            1
3         IT   4  junior            1
4         IT   4  middle            2
5         IT   4  senior            1

To get the Nans in the Department and ID columns as requested, you could use .loc to find the duplicated rows in those columns, and replace with Nan (will need to import numpy as np):

df_out.loc[df_out[['Department', 'ID']].duplicated(), ['Department', 'ID']] = np.nan

  Department   ID   Level  Level_Count
0     Design  2.0  middle            2
1         HR  2.0  middle            1
2        NaN  NaN  senior            1
3         IT  4.0  junior            1
4        NaN  NaN  middle            2
5        NaN  NaN  senior            1
  • Related