Modify Spanish month in date column

Time:10-07

I have a pandas DataFrame that I read from an Excel file. The file is in Spanish, and the months in the date column are written as follows:

"11/ene/2021, 22/feb/, 04/mar/2021, 15/abr/2021, 13/may/2021, 16/jun/2021, 14/jul/2021, ...."

I.e., the day comes first, then the month as a three-letter string, then the year (d/m/y).

Because the months are in Spanish, I can't parse the dates with datetime, so I think I could use a regex to find the month and replace it with the corresponding month number stored in a dictionary.

The regex I think could work is:

r'\d+\/[a-z]{3}\/\d+'

Is there any way to do that in Pandas?

CodePudding user response:

The answer by @Wiktor is exactly right. However, I think the following implementation is much easier to read:

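import pandas as pd

months = ['ene','feb','mar','abr','may','jun','jul','ago','sep','oct','nov','dic']
# Map each abbreviation to a zero-padded month number, kept as a string for the regex substitution
d = {m: f'{i:02d}' for i, m in enumerate(months, start=1)}

Then you can simply use pandas.to_datetime:

# Pass an explicit day-first format so that 11/01/2021 is read as 11 January
pd.to_datetime(df['dates'].replace(d, regex=True), format='%d/%m/%Y')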

CodePudding user response:

You can use

import pandas as pd
months = {'ene': '01', 'feb': '02', 'mar':'03', 'abr':'04', 'may': '05', 'jun': '06','jul': '07','ago': '08','sep': '09','oct': '10','nov': '11','dic': '12'}
df = pd.DataFrame({'dates': ['11/ene/2021', '22/feb/2021', '04/mar/2021', '15/abr/2021', '13/may/2021', '16/jun/2021', '14/jul/2021']})
# => df
#           dates
#  0  11/ene/2021
#  1  22/feb/2021
#  2  04/mar/2021
#  3  15/abr/2021
#  4  13/may/2021
#  5  16/jun/2021
#  6  14/jul/2021
# Replace each month abbreviation with its number, then parse with an explicit day-first format
pd.to_datetime(df['dates'].str.replace(rf'\b(?:{"|".join([x for x in months])})\b', lambda x: months[x.group()], regex=True), format='%d/%m/%Y')
# => 0   2021-01-11
#    1   2021-02-22
#    2   2021-03-04
#    3   2021-04-15
#    4   2021-05-13
#    5   2021-06-16
#    6   2021-07-14
#    Name: dates, dtype: datetime64[ns]

Here, rf'\b(?:{"|".join([x for x in months])})\b' creates a regex like \b(?:ene|feb|mar|abr|may|jun|jul|ago|sep|oct|nov|dic)\b that matches the three-letter Spanish month abbreviations. Each match is used as a key into the months dictionary to obtain the corresponding month number, and the result is passed to the pd.to_datetime function.
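
To see the substitution step on its own, here is a minimal sketch that uses only the standard library (re and datetime) on plain strings instead of a pandas column. The names MONTHS, MONTH_RE, to_numeric_date and parse_spanish_date are made up for this illustration and are not part of the answer above.

```python
import re
from datetime import datetime

# Spanish month abbreviations mapped to zero-padded month numbers
MONTHS = {'ene': '01', 'feb': '02', 'mar': '03', 'abr': '04',
          'may': '05', 'jun': '06', 'jul': '07', 'ago': '08',
          'sep': '09', 'oct': '10', 'nov': '11', 'dic': '12'}

# Builds \b(?:ene|feb|...|dic)\b from the dictionary keys
MONTH_RE = re.compile(rf'\b(?:{"|".join(MONTHS)})\b')


def to_numeric_date(text):
    """Replace a Spanish month abbreviation with its two-digit number."""
    return MONTH_RE.sub(lambda m: MONTHS[m.group()], text)


def parse_spanish_date(text):
    """Parse a d/mon/yyyy string, reading the first field as the day."""
    return datetime.strptime(to_numeric_date(text), '%d/%m/%Y')


print(to_numeric_date('11/ene/2021'))     # 11/01/2021
print(parse_spanish_date('04/mar/2021'))  # 2021-03-04 00:00:00
```

The explicit '%d/%m/%Y' format matters: once the month is numeric, a string such as 04/03/2021 is ambiguous, and stating the format keeps it from being read month-first.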

CodePudding user response:

Instead of implementing this behaviour ourselves, we can use setlocale to set LC_TIME to Spanish (es_ES). to_datetime will then work as expected with a format string, because the behaviour of strftime() and strptime() depends on the locale:

import locale

import pandas as pd

# Use Spanish Locale
locale.setlocale(locale.LC_TIME, 'es_ES')

df = pd.DataFrame({
    'dates': ['11/ene/2021', '22/feb/2021', '04/mar/2021', '15/abr/2021',
              '13/may/2021', '16/jun/2021', '14/jul/2021', '03/ago/2021',
              '07/sep/2021', '27/oct/2021', '17/nov/2021', '29/dic/2021']
})

# Convert to DateTime using pattern
# 2-digit day %d, locale month abbr %b, 4-digit year %Y
df['converted_dates'] = pd.to_datetime(df['dates'], format='%d/%b/%Y')

print(df)

df:

          dates converted_dates
0   11/ene/2021      2021-01-11
1   22/feb/2021      2021-02-22
2   04/mar/2021      2021-03-04
3   15/abr/2021      2021-04-15
4   13/may/2021      2021-05-13
5   16/jun/2021      2021-06-16
6   14/jul/2021      2021-07-14
7   03/ago/2021      2021-08-03
8   07/sep/2021      2021-09-07
9   27/oct/2021      2021-10-27
10  17/nov/2021      2021-11-17
11  29/dic/2021      2021-12-29
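
One thing to keep in mind is that locale.setlocale changes the setting for the whole process, and it raises locale.Error if the requested locale is not installed (the exact name can also differ between systems, e.g. 'es_ES.UTF-8'). The following is only a sketch of one way to limit the change to a single block; temporary_lc_time and parse_with_locale are hypothetical helper names, and the usage line uses the 'C' locale with an English abbreviation purely because that locale is always available.

```python
import locale
from contextlib import contextmanager
from datetime import datetime


@contextmanager
def temporary_lc_time(name):
    """Hypothetical helper: set LC_TIME to `name`, then restore the old value."""
    previous = locale.setlocale(locale.LC_TIME)  # query the current setting
    try:
        # Raises locale.Error if `name` is not available on this system
        locale.setlocale(locale.LC_TIME, name)
        yield
    finally:
        locale.setlocale(locale.LC_TIME, previous)


def parse_with_locale(text, name, fmt='%d/%b/%Y'):
    """Parse `text` with strptime while LC_TIME is temporarily set to `name`."""
    with temporary_lc_time(name):
        return datetime.strptime(text, fmt)


# With the always-available 'C' locale, %b expects English abbreviations
print(parse_with_locale('11/Jan/2021', 'C'))  # 2021-01-11 00:00:00
```

On a machine where the Spanish locale is installed, the same with block could wrap the pd.to_datetime(df['dates'], format='%d/%b/%Y') call from the answer above, assuming the locale's month abbreviations match the ones in the data.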

CodePudding user response:

You could use something like this:

month = {'ene':'1','feb':'2','mar':'3','abr':'4'}
df.columns = df.columns.to_series().replace(month, regex=True)