I have a df with column 'Month'. The notation in that column is for example 'Apr 2021'. This is the case for all months, so Jan, Feb, Mar etc. And for a lot of years. I need to change, for example 'Apr 2021' to 04-2021 (first as a str and afterwards as datetime). The function I made works fine on the column as a series. BUTTTT..... as you can see underneath: the series is fine, but the column in the Dataframe shows None for all rows in column 'Month'. What is wrong here/what am I doing wrong? All code below. Thanks!
df_all.Month (first rows)
0 Feb 2021
1 Mar 2021
2 Mar 2021
3 Apr 2021
4 Apr 2021
5 May 2021
6 May 2021
7 May 2021
8 Jun 2021
9 Jun 2021
10 Jun 2021
Function to change values:
def test(x):
x.strip()
deel = x.split()
if deel[0] == 'Jan':
deel[0] = '01-'
x = deel[0] deel[1]
elif deel[0] == 'Feb':
deel[0] = '02-'
x = deel[0] deel[1]
elif deel[0] == 'Mar':
deel[0] = '03-'
x = deel[0] deel[1]
elif deel[0] == 'Apr':
deel[0] = '04-'
x = deel[0] deel[1]
elif deel[0] == 'May':
deel[0] = '05-'
x = deel[0] deel[1]
elif deel[0] == 'Jun':
deel[0] = '06-'
x = deel[0] deel[1]
elif deel[0] == 'Jul':
deel[0] = '07-'
x = deel[0] deel[1]
elif deel[0] == 'Aug':
deel[0] = '08-'
x = deel[0] deel[1]
elif deel[0] == 'Sep':
deel[0] = '09-'
x = deel[0] deel[1]
elif deel[0] == 'Oct':
deel[0] = '10-'
x = deel[0] deel[1]
elif deel[0] == 'Nov':
deel[0] = '1-'
x = deel[0] deel[1]
elif deel[0] == 'Dec':
deel[0] = '12-'
x = deel[0] deel[1]
else:
print('nope')
This gives the changed Series, so this works (but as said Not in the df as shown beneath series)
df_all['Month'] = df_all['Month'].map(test)
df_all
02-2021
03-2021
03-2021
04-2021
04-2021
05-2021
05-2021
05-2021
06-2021
06-2021
06-2021
07-2021
07-2021
07-2021
08-2021
08-2021
08-2021
09-2021
09-2021
09-2021
10-2021
10-2021
10-2021
1-2021
1-2021
1-2021
12-2021
12-2021
12-2021
01-2022
01-2022
01-2022
02-2022
02-2022
02-2022
(first rows of the df:
Month Access Type Users Accessing Apps
0 None Analyzer 7
1 None Analyzer 77
2 None Professional 3
3 None Analyzer 114
4 None Professional 3
5 None Analyzer 104
Adding a brand new column gives the same result: series is OK, but the new column only contains None values...
Month Access Type Users Accessing Apps Maand
0 Feb 2021 Analyzer 7 None
1 Mar 2021 Analyzer 77 None
2 Mar 2021 Professional 3 None
3 Apr 2021 Analyzer 114 None
4 Apr 2021 Professional 3 None
Maybe this helps you:
df_all['Maand'] = df_all['Month'].map(test)
print(df_all['Maand'])
type(df_all.Maand)
02-2021
03-2021
03-2021
04-2021
04-2021
05-2021
05-2021
05-2021
06-2021
06-2021
06-2021
07-2021
07-2021
07-2021
08-2021
08-2021
08-2021
09-2021
09-2021
09-2021
10-2021
10-2021
10-2021
1-2021
1-2021
1-2021
12-2021
12-2021
12-2021
01-2022
01-2022
01-2022
02-2022
02-2022
02-2022
0 None
1 None
2 None
3 None
4 None
5 None
6 None
7 None
8 None
9 None
10 None
11 None
12 None
13 None
14 None
15 None
16 None
17 None
18 None
19 None
20 None
21 None
22 None
23 None
24 None
25 None
26 None
27 None
28 None
29 None
30 None
31 None
32 None
33 None
34 None
Name: Maand, dtype: object
pandas.core.series.Series
CodePudding user response:
I believe you should have a different approach to deal with your problem. A good solution involves working only with datetime, this way you can easily navigate your date, using proper functions adding efficiency to your code.
The code below does the formating job, without the need of using a function with a series of conditional statements:
df_all = pd.DataFrame({'Month': ['Feb 2021','Mar 2021','Mar 2021','Apr 2021','Apr 2021','May 2021','May 2021','May 2021','Jun 2021','Jun 2021','Jun 2021']})
df_all
Month
0 Feb 2021
1 Mar 2021
2 Mar 2021
3 Apr 2021
4 Apr 2021
5 May 2021
6 May 2021
7 May 2021
8 Jun 2021
9 Jun 2021
10 Jun 2021
My suggestion: use this instead of your function.
df_all['Maand'] = pd.to_datetime(df_all['Month'],format='%b %Y').dt.to_period('M')
Output:
Month Maand
0 Feb 2021 2021-02
1 Mar 2021 2021-03
2 Mar 2021 2021-03
3 Apr 2021 2021-04
4 Apr 2021 2021-04
5 May 2021 2021-05
6 May 2021 2021-05
7 May 2021 2021-05
8 Jun 2021 2021-06
9 Jun 2021 2021-06
10 Jun 2021 2021-06
df_all.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 11 entries, 0 to 10
Data columns (total 2 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Month 11 non-null object
1 Maand 11 non-null period[M]
dtypes: object(1), period[M](1)
memory usage: 304.0 bytes
CodePudding user response:
There is an easier way to do this by using pandas' pd.to_datetime() function.
df_all['Month'] = pd.to_datetime(df_all['Month'], format= "%b %Y")