Home > Net >  My pandas dataframe age column contains whitespace and string
My pandas dataframe age column contains whitespace and string

Time:07-07

I combined two Csv files i now have an Data-Frame age column that contains data in the form: 51-55
41-45
41 45
46-50
36-40
46 50
26-30
21 25
36 40
31 35
26 30
21-25
56 or older
31-35
56-60
61 or older
20 or younger

i will like to make them be in the this range 15-25,26-35,36-45........

CodePudding user response:

Unifying the column entries across the two Pandas dataframes before merging them into one is the least inconvenient option. Moreover, it guarantees a clean merge.

# minimum reproducible example
df_1 = pd.DataFrame({'age' : ['21-25', '31-35', '56 or older'], 'data 1': [1,2,4]})
df_2 = pd.DataFrame({'age' : ['21 25', '31 35', '56-60'], 'data 2': [3,5,2]})

# delete '-'
df_1['age'] = df_1['age'].str.replace('-', ' ')
df_2['age'] = df_2['age'].str.replace('-', ' ')

# further manual replacement
replDict = {'56 or older' : '56 60'}
df_1.replace(replDict, inplace=True)
df_2.replace(replDict, inplace=True)

# merge
pd.merge(left=df_1, right=df_2, on='age')

This, for example, yields

enter image description here

  • Related