Home > database >  Python/Pandas/Excel Creating a 2D array from 3 columns
Python/Pandas/Excel Creating a 2D array from 3 columns

Time:11-17

I have two questions actually. I have a dataframe like the one below. I need to split it into years/months, same as a fixed width delimiter in Excel. Pandas str.split() can't do this based on the documentation, it needs a delimiting character.

Initial df:

    Year/Period PurchDoc
0   FY19P01 162
1   FY19P02 148
2   FY19P03 133
3   FY19P04 157
4   FY19P05 152
5   FY19P06 176
6   FY19P07 123
7   FY19P08 143
8   FY19P09 161
9   FY19P10 177
10  FY19P11 152
11  FY19P12 175
12  FY20P01 203
13  FY20P02 157
14  FY20P03 206
15  FY20P04 247
16  FY20P05 182
17  FY20P06 141
18  FY20P07 205
19  FY20P08 194

Expected result:

Year    Period  PurchDoc
0   FY19    P01 162
1   FY19    P02 148
2   FY19    P03 133
3   FY19    P04 157
4   FY19    P05 152
5   FY19    P06 176
6   FY19    P07 123
7   FY19    P08 143
8   FY19    P09 161
9   FY19    P10 177
10  FY19    P11 152
11  FY19    P12 175
12  FY20    P01 203
13  FY20    P02 157
14  FY20    P03 206
15  FY20    P04 247
16  FY20    P05 182
17  FY20    P06 141
18  FY20    P07 205
19  FY20    P08 194

Second, I need to transpose the period and PurchDoc columns so it looks like this (well, as ints and no NaNs, but I can fix that):

    Unnamed: 0  P01 P02 P03 P04 P05 P06 P07 P08 P09 P10 P11 P12
0   FY19    162 148 133 157 152.0   176.0   123.0   143.0   161.0   177.0   152.0   175.0
1   FY20    203 157 206 247 182.0   141.0   205.0   194.0   113.0   44.0    26.0    17.0
2   FY21    41  53  42  40  52.0    54.0    57.0    46.0    90.0    103.0   63.0    86.0
3   FY22    114 96  87  92  NaN NaN NaN NaN NaN NaN NaN NaN

Couldn't find anything remotely useful googling unfortunately, so I don't have any failed code to show.

CodePudding user response:

df[["Year", "Period"]] = df.apply(lambda x: (x["Year/Period"][:4], x["Year/Period"][4:]), result_type="expand", axis=1)

Then:

pd.pivot_table(df, columns="Period", index="Year", values="PurchDoc", aggfunc="sum")

CodePudding user response:

df['Year'] = df['Year/Period'].str.slice(stop=4)
df['Period'] = df['Year/Period'].str.slice(start=4)
df.drop('Year/Period', axis=1, inplace=True)
df = df.pivot(values = 'PurchDoc', index = 'Year', columns = 'Period')
print(df)

output:

Period    P01    P02    P03    P04    P05    P06    P07    P08    P09    P10    P11    P12
Year
FY19    162.0  148.0  133.0  157.0  152.0  176.0  123.0  143.0  161.0  177.0  152.0  175.0
FY20    203.0  157.0  206.0  247.0  182.0  141.0  205.0  194.0    NaN    NaN    NaN    NaN
  • Related