I combined two Csv files i now have an Data-Frame age column that contains data in the form:
51-55
41-45
41 45
46-50
36-40
46 50
26-30
21 25
36 40
31 35
26 30
21-25
56 or older
31-35
56-60
61 or older
20 or younger
i will like to make them be in the this range 15-25,26-35,36-45........
CodePudding user response:
Unifying the column entries across the two Pandas dataframes before merging them into one is the least inconvenient option. Moreover, it guarantees a clean merge.
# minimum reproducible example
df_1 = pd.DataFrame({'age' : ['21-25', '31-35', '56 or older'], 'data 1': [1,2,4]})
df_2 = pd.DataFrame({'age' : ['21 25', '31 35', '56-60'], 'data 2': [3,5,2]})
# delete '-'
df_1['age'] = df_1['age'].str.replace('-', ' ')
df_2['age'] = df_2['age'].str.replace('-', ' ')
# further manual replacement
replDict = {'56 or older' : '56 60'}
df_1.replace(replDict, inplace=True)
df_2.replace(replDict, inplace=True)
# merge
pd.merge(left=df_1, right=df_2, on='age')
This, for example, yields