Home > Blockchain >  N consecutive rows to form a dataframe columns in python pandas
N consecutive rows to form a dataframe columns in python pandas

Time:05-24

I have the following dataframe with consecutive rows that needs to be transformed in columns

Input dataframe looks like the given below DF..

data5 = {
'R_T_D_L_X':["Route for v 0:",
 "0 0 C Load(0)",
"31193m",
"Load of the route: 3",
"",
"Route for v 1:",
 "0 0 C 0 Load(0)",
" 0m",
"Load of the route: 0",
"",
"Route for vehicle 2:",
"0 0 0 Load(10)",
"13406m",
"Load  10",""]
}

df5 = pd.DataFrame.from_dict(data5)

input dataframe output:

    df5
Out[52]: 
               R_T_D_L_X
0         Route for v 0:
1          0 0 C Load(0)
2                 31193m
3   Load of the route: 3
4                       
5         Route for v 1:
6        0 0 C 0 Load(0)
7                     0m
8   Load of the route: 0
9                       
10  Route for vehicle 2:
11        0 0 0 Load(10)
12                13406m
13              Load  10
14                  

Output and expected dataframe as below,

data6 = {
'R':["Route for v 0:","Route for v 1:","Route for v 2:"],
'T':["0 0 C Load(0)","0 0 C 0 Load(0)","0 0 0 Load(10)"],
"D":["31193m"," 0m","13406m",],
'L':["Load : 3","Load: 0","Load  10"],
'X':["","",""]
}

df6 = pd.DataFrame.from_dict(data6)

Results as expected:

    df6
Out[54]: 
                R                T       D         L X
0  Route for v 0:    0 0 C Load(0)  31193m  Load : 3  
1  Route for v 1:  0 0 C 0 Load(0)      0m   Load: 0  
2  Route for v 2:   0 0 0 Load(10)  13406m  Load  10  

Thanking you..

CodePudding user response:

Use reshape with DataFrame constructor for new DataFrame:

df = pd.DataFrame(df5['R_T_D_L_X'].to_numpy().reshape(-1, 5), columns=['R','T','D','L','X'])
print (df)
                      R                T       D                     L X
0        Route for v 0:    0 0 C Load(0)  31193m  Load of the route: 3  
1        Route for v 1:  0 0 C 0 Load(0)      0m  Load of the route: 0  
2  Route for vehicle 2:   0 0 0 Load(10)  13406m              Load  10  

EDIT: If some values missing in the end of Series use:

data5 = {
'R_T_D_L_X':["Route for v 0:",
 "0 0 C Load(0)",
"31193m",
"Load of the route: 3",
"",
"Route for v 1:",
 "0 0 C 0 Load(0)",
" 0m",
"Load of the route: 0",
"",
"Route for vehicle 2:",
"0 0 0 Load(10)",
"13406m"]
}

df5 = pd.DataFrame.from_dict(data5)

arr = np.arange(len(df5))

df = df5.assign(a1=arr // 5, a2 = arr % 5).pivot('a1','a2','R_T_D_L_X')
print (df)
a2                     0                1       2                     3    4
a1                                                                          
0         Route for v 0:    0 0 C Load(0)  31193m  Load of the route: 3     
1         Route for v 1:  0 0 C 0 Load(0)      0m  Load of the route: 0     
2   Route for vehicle 2:   0 0 0 Load(10)  13406m                   NaN  NaN

CodePudding user response:

Try this with some fancy dataframe reshaping.

import pandas as pd

data5 = {
'R_T_D_L_X':["Route for v 0:",
 "0 0 C Load(0)",
"31193m",
"Load of the route: 3",
"",
"Route for v 1:",
 "0 0 C 0 Load(0)",
" 0m",
"Load of the route: 0",
"",
"Route for vehicle 2:",
"0 0 0 Load(10)",
"13406m",
"Load  10",""]
}

df5 = pd.DataFrame.from_dict(data5)

grp = df5['R_T_D_L_X'].str.startswith('Route').cumsum()

df5.set_index([grp, df5.groupby(grp).cumcount()])\
   .unstack()\
   .droplevel(0, axis=1)\
   .set_axis(df5.columns[0].split('_'), axis=1)\
   .rename_axis(None)

Output:

                      R                T       D                     L X
1        Route for v 0:    0 0 C Load(0)  31193m  Load of the route: 3  
2        Route for v 1:  0 0 C 0 Load(0)      0m  Load of the route: 0  
3  Route for vehicle 2:   0 0 0 Load(10)  13406m              Load  10  

CodePudding user response:

df_app = pd.DataFrame()
df_temp = pd.DataFrame(columns =['R','T','D','L','X'])
iloc_num = 0

for idx, row in df5.iterrows():

    
    print(idx % 5)
    

    
    if idx % 5 == 0:
        df_temp.loc[iloc_num,'R']=row.values
    
    if idx % 5 == 1:
        df_temp.loc[iloc_num,'T']=row.values
    if idx % 5 == 2:
        df_temp.loc[iloc_num,'D']=row.values
    if idx % 5 == 3:
        df_temp.loc[iloc_num,'L']=row.values
    if idx % 5 == 4:
        df_temp.loc[iloc_num,'X']=row.values
        iloc_num = iloc_num   1
        df_app.append(df_temp, ignore_index = True)

I was expecting the answer in df_app, But I have managed to get the answer in df_temp dataframe as below

df_temp
Out[167]: 
                        R                  T         D                       L   X
0        [Route for v 0:]    [0 0 C Load(0)]  [31193m]  [Load of the route: 3]  []
1        [Route for v 1:]  [0 0 C 0 Load(0)]     [ 0m]  [Load of the route: 0]  []
2  [Route for vehicle 2:]   [0 0 0 Load(10)]  [13406m]              [Load  10]  []
  • Related