Home > Blockchain >  replace part of an int or string in a pandas dataframe column upon condition
replace part of an int or string in a pandas dataframe column upon condition

Time:04-14

I have a pandas dataframe with a column representing dates but saved in int format. For several dates I have a 13th and a 14th month. I would like to replace these 13th and 14th months by the 12th month. And then, eventually transform it into date_time format.

Original_date
20190101
20191301
20191401

New_date
20190101
20191201
20191201

I tried by replacing the format into string then replace only based on the index of the months in the string [4:6], but it didn't work out:

df.original_date.astype(str)
for string in df['original_date']:
    if string[4:6]=="13" or string[4:6]=="14":
        string.replace(string, string[:4]  "12"   string[6:])
print(df['original_date'])

CodePudding user response:

You can use .str.replace with regex

df['New_date'] = df['Original_date'].astype(str).str.replace('(\d{4})(13|14)(\d{2})', r'\g<1>12\3', regex=True)
print(df)

   Original_date  New_date
0       20190101  20190101
1       20191301  20191201
2       20191401  20191201

CodePudding user response:

Why not just write a regular expression?

s = pd.Series('''20190101
20191301
20191401'''.split('\n')).astype(str)
s.str.replace('(?<=\d{4})(13|14)(?=01)', '12', regex=True)

Yielding:

0    20190101
1    20191201
2    20191201
dtype: object

(Nb you will need to reassign the output back to a column to persist it in memory.)

CodePudding user response:

You can write the replace and logic in a seperate function, which also gives you the option to adapt it easily if you also need to change the year or month. apply lets you use that function on each row of the DataFrame.

import pandas as pd

def split_and_replace(x):
    year = x[0:4]
    month = x[4:6]
    day = x[6:8]
    if month in ('13', '14'):
        month = '12'
    else:
        pass
    
    return year   month   day
    

df = pd.DataFrame(
    data={
        'Original_date': ['20190101', '20191301', '20191401']    
    }
)

res = df.Original_date.apply(lambda x: split_and_replace(x))

print(res)
  • Related