I have two questions actually. I have a dataframe like the one below. I need to split it into years/months, same as a fixed width delimiter in Excel. Pandas str.split() can't do this based on the documentation, it needs a delimiting character.
Initial df:
Year/Period PurchDoc
0 FY19P01 162
1 FY19P02 148
2 FY19P03 133
3 FY19P04 157
4 FY19P05 152
5 FY19P06 176
6 FY19P07 123
7 FY19P08 143
8 FY19P09 161
9 FY19P10 177
10 FY19P11 152
11 FY19P12 175
12 FY20P01 203
13 FY20P02 157
14 FY20P03 206
15 FY20P04 247
16 FY20P05 182
17 FY20P06 141
18 FY20P07 205
19 FY20P08 194
Expected result:
Year Period PurchDoc
0 FY19 P01 162
1 FY19 P02 148
2 FY19 P03 133
3 FY19 P04 157
4 FY19 P05 152
5 FY19 P06 176
6 FY19 P07 123
7 FY19 P08 143
8 FY19 P09 161
9 FY19 P10 177
10 FY19 P11 152
11 FY19 P12 175
12 FY20 P01 203
13 FY20 P02 157
14 FY20 P03 206
15 FY20 P04 247
16 FY20 P05 182
17 FY20 P06 141
18 FY20 P07 205
19 FY20 P08 194
Second, I need to transpose the period and PurchDoc columns so it looks like this (well, as ints and no NaNs, but I can fix that):
Unnamed: 0 P01 P02 P03 P04 P05 P06 P07 P08 P09 P10 P11 P12
0 FY19 162 148 133 157 152.0 176.0 123.0 143.0 161.0 177.0 152.0 175.0
1 FY20 203 157 206 247 182.0 141.0 205.0 194.0 113.0 44.0 26.0 17.0
2 FY21 41 53 42 40 52.0 54.0 57.0 46.0 90.0 103.0 63.0 86.0
3 FY22 114 96 87 92 NaN NaN NaN NaN NaN NaN NaN NaN
Couldn't find anything remotely useful googling unfortunately, so I don't have any failed code to show.
CodePudding user response:
df[["Year", "Period"]] = df.apply(lambda x: (x["Year/Period"][:4], x["Year/Period"][4:]), result_type="expand", axis=1)
Then:
pd.pivot_table(df, columns="Period", index="Year", values="PurchDoc", aggfunc="sum")
CodePudding user response:
df['Year'] = df['Year/Period'].str.slice(stop=4)
df['Period'] = df['Year/Period'].str.slice(start=4)
df.drop('Year/Period', axis=1, inplace=True)
df = df.pivot(values = 'PurchDoc', index = 'Year', columns = 'Period')
print(df)
output:
Period P01 P02 P03 P04 P05 P06 P07 P08 P09 P10 P11 P12
Year
FY19 162.0 148.0 133.0 157.0 152.0 176.0 123.0 143.0 161.0 177.0 152.0 175.0
FY20 203.0 157.0 206.0 247.0 182.0 141.0 205.0 194.0 NaN NaN NaN NaN