i have multiple excel files that contains similar information that i want to bring in as list into python with pandas. Every file contains a table that has the info that i want but this table in some files starts at row 5 or in other in row 10 or 23 and go on (because before this table there are some titles that change from one file to others) so this cannot be a constant BUT the headers of the table are the same, can i tell pandas "take all data under the row with "specific name"" ? or have i to tell every time the right index?
Thanks! Have a good work!
Edit: To make it more clear this is how pandas read my dataframe (based on excel file)
So as you can see in row 2 (in this example) there is the raw with "Draw, Back#,Horse,Rider...) and this is my "keyrow" so i want that my df starts under this row so i can use all below datas to make my folders but as i said this row is the same with all excels but in every excels is in different row.
CodePudding user response:
Since the multiple Excel files have the same column headers, you could create a filter to "take all data under the row with "specific name"".
This can be done using df['col_name']=='specific name'
which will return an array of True
/False
values.
filter = df['col_name']=='specific name'
After that, apply the filter (array of True
/False
values) to the dataframe, which will keep those values that are True
only
df.loc[filter]
For example
df = pd.DataFrame({'col_name': ['nothing', 'specific name', 'specific name', 'blah', 'blah', 'blah'],
'col_2': ['blah blah', 'blah', 'blah blah', 'blah', 'blah', 'blah'] })
col_name col_2
0 nothing blah blah
1 specific name blah
2 specific name blah blah
3 blah blah
4 blah blah
5 blah blah
filter = df['col_name']=='specific name'
print(df.loc[filter])
col_name col_2
1 specific name blah
2 specific name blah blah
CodePudding user response:
With the index, you can use the below code. If you have excel values with row 4 header and from row 5 the rest data
col_name = excel_df.iloc[3:4,:].values.tolist()
df = excel_df.iloc[ 4:,: ]
df.columns = sum(col_name, [])
df