I have the following dataframe with consecutive rows that needs to be transformed in columns
Input dataframe looks like the given below DF..
data5 = {
'R_T_D_L_X':["Route for v 0:",
"0 0 C Load(0)",
"31193m",
"Load of the route: 3",
"",
"Route for v 1:",
"0 0 C 0 Load(0)",
" 0m",
"Load of the route: 0",
"",
"Route for vehicle 2:",
"0 0 0 Load(10)",
"13406m",
"Load 10",""]
}
df5 = pd.DataFrame.from_dict(data5)
input dataframe output:
df5
Out[52]:
R_T_D_L_X
0 Route for v 0:
1 0 0 C Load(0)
2 31193m
3 Load of the route: 3
4
5 Route for v 1:
6 0 0 C 0 Load(0)
7 0m
8 Load of the route: 0
9
10 Route for vehicle 2:
11 0 0 0 Load(10)
12 13406m
13 Load 10
14
Output and expected dataframe as below,
data6 = {
'R':["Route for v 0:","Route for v 1:","Route for v 2:"],
'T':["0 0 C Load(0)","0 0 C 0 Load(0)","0 0 0 Load(10)"],
"D":["31193m"," 0m","13406m",],
'L':["Load : 3","Load: 0","Load 10"],
'X':["","",""]
}
df6 = pd.DataFrame.from_dict(data6)
Results as expected:
df6
Out[54]:
R T D L X
0 Route for v 0: 0 0 C Load(0) 31193m Load : 3
1 Route for v 1: 0 0 C 0 Load(0) 0m Load: 0
2 Route for v 2: 0 0 0 Load(10) 13406m Load 10
Thanking you..
CodePudding user response:
Use reshape
with DataFrame
constructor for new DataFrame:
df = pd.DataFrame(df5['R_T_D_L_X'].to_numpy().reshape(-1, 5), columns=['R','T','D','L','X'])
print (df)
R T D L X
0 Route for v 0: 0 0 C Load(0) 31193m Load of the route: 3
1 Route for v 1: 0 0 C 0 Load(0) 0m Load of the route: 0
2 Route for vehicle 2: 0 0 0 Load(10) 13406m Load 10
EDIT: If some values missing in the end of Series use:
data5 = {
'R_T_D_L_X':["Route for v 0:",
"0 0 C Load(0)",
"31193m",
"Load of the route: 3",
"",
"Route for v 1:",
"0 0 C 0 Load(0)",
" 0m",
"Load of the route: 0",
"",
"Route for vehicle 2:",
"0 0 0 Load(10)",
"13406m"]
}
df5 = pd.DataFrame.from_dict(data5)
arr = np.arange(len(df5))
df = df5.assign(a1=arr // 5, a2 = arr % 5).pivot('a1','a2','R_T_D_L_X')
print (df)
a2 0 1 2 3 4
a1
0 Route for v 0: 0 0 C Load(0) 31193m Load of the route: 3
1 Route for v 1: 0 0 C 0 Load(0) 0m Load of the route: 0
2 Route for vehicle 2: 0 0 0 Load(10) 13406m NaN NaN
CodePudding user response:
Try this with some fancy dataframe reshaping.
import pandas as pd
data5 = {
'R_T_D_L_X':["Route for v 0:",
"0 0 C Load(0)",
"31193m",
"Load of the route: 3",
"",
"Route for v 1:",
"0 0 C 0 Load(0)",
" 0m",
"Load of the route: 0",
"",
"Route for vehicle 2:",
"0 0 0 Load(10)",
"13406m",
"Load 10",""]
}
df5 = pd.DataFrame.from_dict(data5)
grp = df5['R_T_D_L_X'].str.startswith('Route').cumsum()
df5.set_index([grp, df5.groupby(grp).cumcount()])\
.unstack()\
.droplevel(0, axis=1)\
.set_axis(df5.columns[0].split('_'), axis=1)\
.rename_axis(None)
Output:
R T D L X
1 Route for v 0: 0 0 C Load(0) 31193m Load of the route: 3
2 Route for v 1: 0 0 C 0 Load(0) 0m Load of the route: 0
3 Route for vehicle 2: 0 0 0 Load(10) 13406m Load 10
CodePudding user response:
df_app = pd.DataFrame()
df_temp = pd.DataFrame(columns =['R','T','D','L','X'])
iloc_num = 0
for idx, row in df5.iterrows():
print(idx % 5)
if idx % 5 == 0:
df_temp.loc[iloc_num,'R']=row.values
if idx % 5 == 1:
df_temp.loc[iloc_num,'T']=row.values
if idx % 5 == 2:
df_temp.loc[iloc_num,'D']=row.values
if idx % 5 == 3:
df_temp.loc[iloc_num,'L']=row.values
if idx % 5 == 4:
df_temp.loc[iloc_num,'X']=row.values
iloc_num = iloc_num 1
df_app.append(df_temp, ignore_index = True)
I was expecting the answer in df_app, But I have managed to get the answer in df_temp dataframe as below
df_temp
Out[167]:
R T D L X
0 [Route for v 0:] [0 0 C Load(0)] [31193m] [Load of the route: 3] []
1 [Route for v 1:] [0 0 C 0 Load(0)] [ 0m] [Load of the route: 0] []
2 [Route for vehicle 2:] [0 0 0 Load(10)] [13406m] [Load 10] []