I have a pandas DataFrame that looks like this:
Column1 Column2 Column3
0 DS 4.5 Hard
1 ML 2.5 Medium
2 CS 4 Hard
I want to check whether any column contains duplicate values. If it does, I need to store that column's unique values in another df and replace each value in the original column with its index in that df.
In this case, the output would be two dfs like the ones below:
df1:
Column1 Column2 Column3
0 DS 4.5 0
1 ML 2.5 1
2 CS 4 0
df2:
Column1
0 Hard
1 Medium
CodePudding user response:
My approach would be to first create a dataframe containing only the unique values of Column3:
# df is the original dataframe from the question
df1 = pd.DataFrame(df['Column3'].unique())
df1.columns = ['Column3']
Looks like this:
Column3
0 Hard
1 Medium
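For anyone who wants to run this step on its own, here is a minimal sketch. It assumes the sample frame from the question is built by hand (the question does not show how df was created), so the constructor call below is an illustration rather than the asker's actual code.

```python
import pandas as pd

# Assumed reconstruction of the sample data shown in the question.
df = pd.DataFrame({
    "Column1": ["DS", "ML", "CS"],
    "Column2": [4.5, 2.5, 4.0],
    "Column3": ["Hard", "Medium", "Hard"],
})

# unique() keeps values in order of first appearance,
# so the default integer index doubles as the code for each value.
df1 = pd.DataFrame({"Column3": df["Column3"].unique()})
print(df1)
```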
Then we can replace the Column3 values with their indices by using the pandas replace() method:
# map each unique value to its index in df1, restricted to Column3 only
df2 = df.replace({'Column3': dict(zip(df1['Column3'], df1.index))})
Output:
Column1 Column2 Column3
0 DS 4.5 0
1 ML 2.5 1
2 CS 4.0 0
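The question asks about any column with duplicates, not just Column3. One possible way to generalize is sketched below using pd.factorize, which returns integer codes together with the unique values in order of first appearance. This is an alternative technique, not the approach described above, and the names encoded and lookups are hypothetical; the sample frame is again an assumed reconstruction of the question's data.

```python
import pandas as pd

# Assumed reconstruction of the sample data shown in the question.
df = pd.DataFrame({
    "Column1": ["DS", "ML", "CS"],
    "Column2": [4.5, 2.5, 4.0],
    "Column3": ["Hard", "Medium", "Hard"],
})

encoded = df.copy()   # hypothetical name: frame with values swapped for codes
lookups = {}          # hypothetical name: column name -> frame of unique values

for col in df.columns:
    # Only touch columns in which at least one value repeats.
    if df[col].duplicated().any():
        codes, uniques = pd.factorize(df[col])
        encoded[col] = codes
        lookups[col] = pd.DataFrame({col: uniques})

print(encoded)
print(lookups["Column3"])
```

Because the codes follow first appearance, each code is also the row index of that value in the matching lookup frame, which is what the question's expected output shows.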