I have a pandas DataFrame that I read from an Excel file. The file is in Spanish, and the months in the date column look like this:
"11/ene/2021, 22/feb/, 04/mar/2021, 15/abr/2021, 13/may/2021, 16/jun/2021, 14/jul/2021, ...."
That is, every date is in d/m/y format: the day comes first, then the month as a three-character string, then the year.
Because the months are in Spanish, I can't parse the column directly with datetime, so I think I could use a regex to find the month and replace it with the month number stored in a dictionary.
The regex I think could work is:
r'\d+\/[a-z]{3}\/\d+'
Is there any way to do that in Pandas?
CodePudding user response:
The answer by @Wiktor is exactly right. However, I think the following implementation is much easier to read:
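months = ['ene','feb','mar','abr','may','jun','jul','ago','sep','oct','nov','dic']
# Values must be strings so that only the matched abbreviation is substituted
d = {m: f'{i:02d}' for i, m in enumerate(months, start=1)}
Then you can simply use pandas.to_datetime:
pd.to_datetime(df['dates'].replace(d, regex=True), format='%d/%m/%Y')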
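A self-contained sketch of this dictionary approach, for anyone who wants to run it end to end. The sample data, the column name `dates`, and the names `month_map` and `numeric` are assumptions for illustration. The month numbers are kept as strings so that the regex replace substitutes only the matched abbreviation, and an explicit `format` avoids any day/month ambiguity.

```python
import pandas as pd

# Spanish month abbreviations in calendar order.
months = ['ene', 'feb', 'mar', 'abr', 'may', 'jun',
          'jul', 'ago', 'sep', 'oct', 'nov', 'dic']

# Map each abbreviation to a zero-padded month number, stored as a string.
month_map = {name: f'{i:02d}' for i, name in enumerate(months, start=1)}

# Hypothetical sample data; the column name 'dates' is an assumption.
df = pd.DataFrame({'dates': ['11/ene/2021', '22/feb/2021',
                             '03/ago/2021', '29/dic/2021']})

# Swap the abbreviation inside each string for its number,
# then parse the result as day/month/year.
numeric = df['dates'].replace(month_map, regex=True)
df['parsed'] = pd.to_datetime(numeric, format='%d/%m/%Y')

print(df)
```

Passing `format='%d/%m/%Y'` is a deliberate choice here: without it, a value such as `11/01/2021` could be read as November 1st instead of January 11th.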
CodePudding user response:
You can use
import pandas as pd
months = {'ene': '01', 'feb': '02', 'mar':'03', 'abr':'04', 'may': '05', 'jun': '06','jul': '07','ago': '08','sep': '09','oct': '10','nov': '11','dic': '12'}
df = pd.DataFrame({'dates': ['11/ene/2021', '22/feb/2021', '04/mar/2021', '15/abr/2021', '13/may/2021', '16/jun/2021', '14/jul/2021']})
# => df
# dates
# 0 11/ene/2021
# 1 22/feb/2021
# 2 04/mar/2021
# 3 15/abr/2021
# 4 13/may/2021
# 5 16/jun/2021
# 6 14/jul/2021
# Pass an explicit day-first format so that 11/01/2021 is read as 11 January, not 1 November
pd.to_datetime(df['dates'].str.replace(rf'\b(?:{"|".join([x for x in months])})\b', lambda x: months[x.group()], regex=True), format='%d/%m/%Y')
# => 0   2021-01-11
#    1   2021-02-22
#    2   2021-03-04
#    3   2021-04-15
#    4   2021-05-13
#    5   2021-06-16
#    6   2021-07-14
#    Name: dates, dtype: datetime64[ns]
Here, rf'\b(?:{"|".join([x for x in months])})\b' creates a regex like \b(?:ene|feb|mar|abr|may|jun|jul|ago|sep|oct|nov|dic)\b, which matches the 3-letter Spanish month abbreviations. Once a match is found in a string, the matched text is used as a key into the months dictionary to obtain the corresponding value. The result is passed to the pd.to_datetime function.
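To see what the pattern and the callback do in isolation, here is a small sketch that uses only the standard-library `re` module on a single string. The helper name `to_numeric_date` is hypothetical and not part of the answer above.

```python
import re
from datetime import datetime

months = {'ene': '01', 'feb': '02', 'mar': '03', 'abr': '04',
          'may': '05', 'jun': '06', 'jul': '07', 'ago': '08',
          'sep': '09', 'oct': '10', 'nov': '11', 'dic': '12'}

# Build \b(?:ene|feb|...|dic)\b from the dictionary keys.
pattern = re.compile(rf'\b(?:{"|".join(months)})\b')


def to_numeric_date(text):
    """Replace a Spanish month abbreviation in `text` with its two-digit number."""
    # The callback receives a match object; the matched text is the dictionary key.
    return pattern.sub(lambda m: months[m.group()], text)


numeric = to_numeric_date('04/mar/2021')
print(numeric)        # 04/03/2021

parsed = datetime.strptime(numeric, '%d/%m/%Y')
print(parsed.date())  # 2021-03-04
```

Strings that contain no known abbreviation pass through unchanged, so any parsing error surfaces later at the `strptime` (or `pd.to_datetime`) step.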
CodePudding user response:
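Instead of implementing this behaviour ourselves, we can use setlocale to set LC_TIME to Spanish (es_ES). to_datetime will then work as expected with a format string, because strftime() and strptime() behavior depends on the locale. The locale has to be installed on the system, and its exact name can vary by platform (for example, es_ES.UTF-8):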
import locale
import pandas as pd
# Use Spanish Locale
locale.setlocale(locale.LC_TIME, 'es_ES')
df = pd.DataFrame({
'dates': ['11/ene/2021', '22/feb/2021', '04/mar/2021', '15/abr/2021',
'13/may/2021', '16/jun/2021', '14/jul/2021', '03/ago/2021',
'07/sep/2021', '27/oct/2021', '17/nov/2021', '29/dic/2021']
})
# Convert to DateTime using pattern
# 2-digit day %d, locale month abbr %b, 4-digit year %Y
df['converted_dates'] = pd.to_datetime(df['dates'], format='%d/%b/%Y')
print(df)
df:
          dates converted_dates
0   11/ene/2021      2021-01-11
1   22/feb/2021      2021-02-22
2   04/mar/2021      2021-03-04
3   15/abr/2021      2021-04-15
4   13/may/2021      2021-05-13
5   16/jun/2021      2021-06-16
6   14/jul/2021      2021-07-14
7   03/ago/2021      2021-08-03
8   07/sep/2021      2021-09-07
9   27/oct/2021      2021-10-27
10  17/nov/2021      2021-11-17
11  29/dic/2021      2021-12-29
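One caveat with this approach is that `locale.setlocale` changes a process-wide setting and raises `locale.Error` when the requested locale is not installed. A minimal sketch of one way to limit the change to a block of code, using the standard-library `strptime` for brevity. The helper name `temporary_time_locale` and the list of candidate locale names are assumptions, not part of the answer above.

```python
import locale
from contextlib import contextmanager
from datetime import datetime


@contextmanager
def temporary_time_locale(*candidates):
    """Switch LC_TIME to the first available candidate and restore it afterwards."""
    previous = locale.setlocale(locale.LC_TIME)  # no second argument: query only
    try:
        for name in candidates:
            try:
                locale.setlocale(locale.LC_TIME, name)
                break
            except locale.Error:
                continue
        else:
            raise locale.Error(f'none of these locales is available: {candidates}')
        yield
    finally:
        # Always put the original setting back, even if parsing fails.
        locale.setlocale(locale.LC_TIME, previous)


before = locale.setlocale(locale.LC_TIME)
try:
    # Candidate names are guesses; they differ between Linux, macOS and Windows.
    with temporary_time_locale('es_ES.UTF-8', 'es_ES', 'Spanish_Spain'):
        parsed = datetime.strptime('03/ago/2021', '%d/%b/%Y')
except (locale.Error, ValueError):
    # No Spanish locale installed, or its abbreviations differ from the data.
    parsed = None

print(parsed)
```

The same context manager could wrap a `pd.to_datetime(..., format='%d/%b/%Y')` call. When `parsed` comes back as `None`, falling back to a dictionary-based replacement is one option.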
CodePudding user response:
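You could use something like this, with one entry per month:
month = {'ene':'1','feb':'2','mar':'3','abr':'4','may':'5','jun':'6','jul':'7','ago':'8','sep':'9','oct':'10','nov':'11','dic':'12'}
df['dates'] = df['dates'].replace(month, regex=True)
If the dates are the column labels rather than values in a column, the same replacement works on the labels:
df.columns = df.columns.to_series().replace(month, regex=True)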