I have a column, house_id, that contains repeated house IDs.
I want to group identical house IDs under a family_label
and give each group an ordinal label.
My original data looks like this:
import pandas as pd
df_original = pd.DataFrame({'house_id':['112', '119', '913', '514', '112', '119', '119']})
house_id
112
119
913
514
112
119
119
My target result looks like the dataframe below:
df_result = pd.DataFrame({'house_id':['112', '119', '913', '514', '112', '119', '119'], 'family_label':['family1', 'family2', 'family3', 'family4', 'family1', 'family2', 'family2']})
house_id family_label
112 family1
119 family2
913 family3
514 family4
112 family1
119 family2
119 family2
So far, this is what I have achieved.
I used this code:
df_original['label'] = df_original.groupby(df_original.house_id).grouper.group_info[0] + 1
It generates the output below:
house_id label
112 1
119 2
913 3
514 4
112 1
119 2
119 2
I want to know if my approach is correct and I want to add the word 'family' before each number.
CodePudding user response:
You can use a list comprehension and prepend the string "family" to each number, like this:
df_original['label'] = ["family" + str(x) for x in (df_original.groupby(df_original.house_id).grouper.group_info[0] + 1)]
Outputting:
house_id label
0 112 family1
1 119 family2
2 913 family4
3 514 family3
4 112 family1
5 119 family2
6 119 family2
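As a side note, grouper.group_info reaches into groupby internals and may not be available in every pandas version. A minimal sketch of the same idea using the public pd.factorize function is below; sort=True is an assumption made here to reproduce the sorted numbering shown in the output above.

```python
import pandas as pd

df_original = pd.DataFrame({'house_id': ['112', '119', '913', '514', '112', '119', '119']})

# factorize returns integer codes (0-based) and the unique values;
# sort=True numbers the unique house_ids in sorted order
codes, uniques = pd.factorize(df_original['house_id'], sort=True)

# shift codes to start at 1 and prepend the word "family"
df_original['label'] = ['family' + str(code + 1) for code in codes]
print(df_original)
```

This should give the same labels as the list comprehension above without touching the grouper attribute.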
CodePudding user response:
Use GroupBy.ngroup:
df_original['label'] = "family" + (df_original.groupby('house_id').ngroup() + 1).astype(str)
print(df_original)
house_id label
0 112 family1
1 119 family2
2 913 family4
3 514 family3
4 112 family1
5 119 family2
6 119 family2
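Note that groupby sorts the group keys by default, so in the output above 913 becomes family4 and 514 becomes family3, whereas the target in the question numbers the houses in order of first appearance. A minimal sketch, assuming first-appearance order is what is wanted, passes sort=False to groupby; the column name family_label is taken from the target dataframe in the question.

```python
import pandas as pd

df_original = pd.DataFrame({'house_id': ['112', '119', '913', '514', '112', '119', '119']})

# sort=False keeps groups in the order they first appear,
# so ngroup() numbers them 0, 1, 2, ... by first occurrence
order = df_original.groupby('house_id', sort=False).ngroup() + 1

# prepend "family" to the 1-based group number
df_original['family_label'] = 'family' + order.astype(str)
print(df_original)
```

With this variant, 913 is labelled family3 and 514 is labelled family4, which matches df_result.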