panda df not showing all rows after loading from MS SQL-CodePudding

I'm using Pandas with latest sqlalchemy (1.4.36) to query a MS SQL DB, using the following Python 3.10.3 [Win] snippet:

import pandas as pd                                             # 
from sqlalchemy import create_engine, event
from sqlalchemy.engine.url import URL
# ...
def get_table_columns():
    SQLA = 'SELECT TABLE_NAME,COLUMN_NAME FROM INFORMATION_SCHEMA.COLUMNS WHERE TABLE_NAME LIKE \'pa%\' ORDER BY TABLE_NAME;' 

    # Use pandas for getting named table & columns
    conn_str = set_db_info()
    conn_url = URL.create("mssql pyodbc", query={"odbc_connect": conn_str})
    engine = create_engine(conn_url)

    df = pd.read_sql(SQLA, engine)

    # Permanently changes the pandas settings
    pd.set_option('display.max_rows', None)
    pd.set_option('display.max_columns', None)
    print(df)
    return df

However, this only prints the first 292 rows, and not all of the entire 2351 rows. Using REPL, I can check this with:

>>> z = get_table_columns()
>>> z
                TABLE_NAME        COLUMN_NAME
0                    paacc           accesscd
... # <-- I added these
292                paapepi        piapeheadat

>>> z.count()
TABLE_NAME     2351
COLUMN_NAME    2351
dtype: int64

>>> z.shape[0]
2351

>>> z.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2351 entries, 0 to 2350
Data columns (total 2 columns):
 #   Column       Non-Null Count  Dtype
---  ------       --------------  -----
 0   TABLE_NAME   2351 non-null   object
 1   COLUMN_NAME  2351 non-null   object
dtypes: object(2)
memory usage: 36.9  KB

Q: What is going on, and why can't I print/show all the rows?

CodePudding user response：

To display all the rows in pandas, you should set the display option to None or 1 extra from the dataframe size as you have done in your code:

pd.set_option('display.max_rows', None)
pandas.set_option('display.max_rows', z.shape[0] 1)

Given that this is not the problem, it may be that the IDE or program that you use automatically crops this information from the view (For example Ipython crops every big output).

Other thing to try is to force the print of the dataframe instead of just is return value:

>>> print(z)

To inspect everything I would recommend you to pass that into a csv/excel file to do it better.

CodePudding user response：

The problem was that pandas returns a packed dataframe (DF). So when printing you only get a partial (at best) or broken (at worst) strings from the data.

For some reason this is always on by default and the results varies widely as to how much of the string/data that is shown. The solution is to use the unpacking operator (*) before/when trying to print the df, like this:

print(*df)

(This is also know as the splat operator for Ruby enthusiasts.)

References & tutorials: