I am working on a raw excel file to develop an organized format database (.xlsx) format. The demo input file is given as: input file
FromTo B# Bname Id Mend
1 to 2 123 bus1 1 F
1 to 3 234 bus2 1 F
5 to 6 321 bus3 2 F
9 to 10 322 bus5 2 F
1 to 2 326 bus6 1 F
1 to 2 457 bus7 1 F
5 to 6 656 bus8 1 F
9 to 10 780 bus9 2 F
1 to 3 875 bus10 2 F
1 to 3 564 bus11 2 F
The required output is in the following format: output format
Essentially, I want to automate the filter method on column 'FromTo' (based on cell value) of the input and put the information of other columns as it is, as depicted in the output format image.
For output, I am able to get the columns B to E as required in the correct order and format. For this, I used the following logic using pandas
import pandas as pd
df = pd.read_excel('st2_trial.xlsx')
#create an empty dataframe
df_1 = pd.DataFrame()
ai = ['1 to 2','1 to 3','5 to 6', '9 to 10'] #all entries from input Col 'FromTo'
for i in range(len(ai)):
filter_ai = (df['FromTo'] == (ai[i]))
df_ai = (df.loc[filter_ai])
df_1 = pd.concat([df_1,df_ai])
print(df_1)
Getting the following output from this code:
FromTo B# Bname Id Mend
1 to 2 123 bus1 1 F
1 to 2 326 bus6 1 F
1 to 2 457 bus7 1 F
1 to 3 234 bus2 1 F
1 to 3 875 bus10 2 F
1 to 3 564 bus11 2 F
1 to 3 893 bus12 1 F
5 to 6 321 bus3 2 F
5 to 6 656 bus8 1 F
5 to 6 212 bus13 2 F
9 to 10 322 bus5 2 F
9 to 10 780 bus9 2 F
However, clearly, the first column is not the way I want! I am looking to aviod redundunt entries of '1 to 2', '1 to 3', etc. in the first column.
I believe this can be achieved by proper loops in place for the first output column. Any help with the same will be highly appreciated!
PS: I have something in mind to work around this:
-create empty dataframe
-list of all unique entries of column 'FromTo'
-take first element of the list put in first column of output
-Then go over my logic to get other required information as it is in loop
This way, I think, it would avoid the redundant entries in first column of output.
CodePudding user response:
You could try something like this to get your expected output:
df_group = df_1.groupby('FromTo')
print(df_group)
CodePudding user response:
The above question seems similar if not exactly to How to print a groupby object . However I will post my answer here if it helps.
import pandas as pd
df = pd.read_excel('st2_trial.xlsx')
df_group = df.groupby('FromTo').apply(lambda a: a.drop('FromTo', axis = 1)[:].reset_index(drop = True))
print(df_group)
OUTPUT:
B# Bname Id Mend
FromTo
1 to 2 0 123 bus1 1 F
1 326 bus6 1 F
2 457 bus7 1 F
1 to 3 0 234 bus2 1 F
1 875 bus10 2 F
2 564 bus11 2 F
5 to 6 0 321 bus3 2 F
1 656 bus8 1 F
9 to 10 0 322 bus5 2 F
1 780 bus9 2 F