Consider a dataframe with 2 columns for easiness. The first column is label
which has same values for some of the observations in dataset.
Sample dataset:
import pandas as pd
data = [('A', 28),
('B', 32),
('B', 32),
('C', 25),
('D', 25),
('D', 40),
('E', 32) ]
data_df = pd.DataFrame(data, columns = ['label', 'num'])
For column label
, I want to find rows with similar values. And convert its value into value_counter
like below:
label num
A 28
B_1 32
B_2 32
C 25
D_1 25
D_2 40
E 32
I tried to use pandas groupby but I don't know which transform
I have to use.
Thank you for your help.
CodePudding user response:
You can use:
s = data_df.groupby('label').cumcount() 1
data_df['label'] = np.where(data_df.duplicated(subset='label', keep=False),
data_df['label'] '_' s.astype(str), data_df['label'])
OUTPUT
label num
0 A 28
1 B_1 32
2 B_2 32
3 C 25
4 D_1 25
5 D_2 40
6 E 32
CodePudding user response:
You can create an empty dictionary
, which you can append with your label and it's count (keys
and values
respectively). Then depending on whether the label is new, or it exists, you can increment it's value or return it intact.
The last step would be to use this new list
as a new label column:
labels = data_df['label'].tolist()
new_labels = []
label_c = {}
# iterate through your labels list
for val in labels:
if val not in label_c: # if label not the new label list
label_c[val] = 0 # add it to dictionary
new_labels.append(val) # add it to the output as is
else: # if it's not new
label_c[val] = 1 # increment its count
new_labels.append(f"{val}_{label_c[val]}") # add it to the output along with its count
data_df['label'] = new_labels
prints back:
>>> print(data_df)
label num
0 A 28
1 B 32
2 B_1 32
3 C 25
4 D 25
5 D_1 40
6 E 32