develop excel database based on column filters using pandas python-CodePudding

I am working on a raw excel file to develop an organized format database (.xlsx) format. The demo input file is given as: input file

FromTo  B#      Bname   Id      Mend
1 to 2  123     bus1    1       F
1 to 3  234     bus2    1       F
5 to 6  321     bus3    2       F
9 to 10 322     bus5    2       F
1 to 2  326     bus6    1       F
1 to 2  457     bus7    1       F
5 to 6  656     bus8    1       F
9 to 10 780     bus9    2       F
1 to 3  875     bus10   2       F
1 to 3  564     bus11   2       F

The required output is in the following format: output format

Essentially, I want to automate the filter method on column 'FromTo' (based on cell value) of the input and put the information of other columns as it is, as depicted in the output format image.

For output, I am able to get the columns B to E as required in the correct order and format. For this, I used the following logic using pandas

import pandas as pd
df = pd.read_excel('st2_trial.xlsx')
#create an empty dataframe 
df_1 = pd.DataFrame()

ai = ['1 to 2','1 to 3','5 to 6', '9 to 10'] #all entries from input Col 'FromTo'
    
for i in range(len(ai)):
    filter_ai = (df['FromTo'] == (ai[i]))
    df_ai = (df.loc[filter_ai])
    df_1 = pd.concat([df_1,df_ai])
    print(df_1)

Getting the following output from this code:

FromTo   B#  Bname  Id Mend
1 to 2  123   bus1   1    F
1 to 2  326   bus6   1    F
1 to 2  457   bus7   1    F
1 to 3  234   bus2   1    F
1 to 3  875  bus10   2    F
1 to 3  564  bus11   2    F
1 to 3  893  bus12   1    F
5 to 6  321   bus3   2    F
5 to 6  656   bus8   1    F
5 to 6  212  bus13   2    F
9 to 10  322   bus5   2    F
9 to 10  780   bus9   2    F

However, clearly, the first column is not the way I want! I am looking to aviod redundunt entries of '1 to 2', '1 to 3', etc. in the first column.

I believe this can be achieved by proper loops in place for the first output column. Any help with the same will be highly appreciated!

PS: I have something in mind to work around this:

-create empty dataframe
-list of all unique entries of column 'FromTo'
-take first element of the list put in first column of output
-Then go over my logic to get other required information as it is in loop

This way, I think, it would avoid the redundant entries in first column of output.

CodePudding user response：

You could try something like this to get your expected output:

df_group = df_1.groupby('FromTo')
print(df_group)

CodePudding user response：

The above question seems similar if not exactly to How to print a groupby object . However I will post my answer here if it helps.


import pandas as pd
df = pd.read_excel('st2_trial.xlsx')

df_group = df.groupby('FromTo').apply(lambda a: a.drop('FromTo', axis = 1)[:].reset_index(drop = True))

print(df_group)

OUTPUT:
            B#  Bname  Id Mend
FromTo                        
1 to 2  0  123   bus1   1    F
        1  326   bus6   1    F
        2  457   bus7   1    F
1 to 3  0  234   bus2   1    F
        1  875  bus10   2    F
        2  564  bus11   2    F
5 to 6  0  321   bus3   2    F
        1  656   bus8   1    F
9 to 10 0  322   bus5   2    F
        1  780   bus9   2    F