I ve got not standart excel table with the help of openpyxl. I've done some part on the way to convert it to pandas dataframe. But now I'm stucked with problem. I want to select just range of columns rows and get data from them. Like take cells from 4 to 12 row, and column from j to x. I hope you understand me. Sorry for my english.
CodePudding user response:
You can try something like that:
df = pd.read_excel('data.xlsx', skiprows=4, usecols=['J:X'], nrows=9)
If the number of rows is not fixed, you can use your second column as delimiter.
df = pd.read_excel('data.xlsx', skiprows=4, usecols=['J:X'])
df = df[df.iloc[:, 1].notna()]
CodePudding user response:
you could skip the rows as you read the excel file to a Dataframe and initially drop the first 4 rows and then manipulate the Dataframe as follows.
- first line is reading the file by skipping the first 4 rows
- second line is dropping a range of rows from the dataframe (startRow and endRow being the integer values of the row index)
- third line is dropping 2 columns from the dataframe
df = pd.read_excel('fileName.xlsx', skiprows=4)
df.drop([startRow, endRow], inplace=True)
df.drop(['column1', 'column2'], axis=1)