Home > front end >  Pandas Pivot is not producing desired output
Pandas Pivot is not producing desired output

Time:05-15

I my data looks like below. I am trying to pivot the dataframe such that SCHEMA NAME AND TABLE NAME are in columns and Row Count, Table Type, date created and Date altered are in 1st column. I am referencing this post to achieve my desired output. Unfortunately, cannot find the solution that I need.Below is my code and desired output. The code gets SCHEMA_NAME and "TABLE_NAME" in columns which is correct, only the index and value part of the code is not producing desired output

Thanks in advance for your time and efforts!

Dataframe:

{'TABLE_SCHEMA': 0    TPCDS_SF100TCL
 1    TPCDS_SF100TCL
 2    TPCDS_SF100TCL
 3    TPCDS_SF100TCL
 4    TPCDS_SF100TCL
 5     TPCDS_SF10TCL
 6     TPCDS_SF10TCL
 7     TPCDS_SF10TCL
 8     TPCDS_SF10TCL
 9     TPCDS_SF10TCL
 Name: TABLE_SCHEMA, dtype: object,
 'TABLE_TYPE': 0    BASE TABLE
 1    BASE TABLE
 2    BASE TABLE
 3    BASE TABLE
 4    BASE TABLE
 5    BASE TABLE
 6    BASE TABLE
 7    BASE TABLE
 8    BASE TABLE
 9    BASE TABLE
 Name: TABLE_TYPE, dtype: object,
 'TABLE_NAME': 0              CALL_CENTER
 1             CATALOG_PAGE
 2                 CUSTOMER
 3         CUSTOMER_ADDRESS
 4    CUSTOMER_DEMOGRAPHICS
 5              CALL_CENTER
 6             CATALOG_PAGE
 7                 CUSTOMER
 8    CUSTOMER_DEMOGRAPHICS
 9         CUSTOMER_ADDRESS
 Name: TABLE_NAME, dtype: object,
 'ROW_COUNT': 0           60
 1        50000
 2    100000000
 3     50000000
 4      1920800
 5           54
 6        40000
 7     65000000
 8      1920800
 9     32500000
 Name: ROW_COUNT, dtype: object,
 'TABLE_CREATED_DATE': 0    2022-03-02
 1    2022-03-02
 2    2022-03-02
 3    2022-03-02
 4    2022-03-02
 5    2022-03-02
 6    2022-03-02
 7    2022-03-02
 8    2022-03-02
 9    2022-03-02
 Name: TABLE_CREATED_DATE, dtype: object,
 'LAST_ALTERED_DATE': 0    2022-05-06
 1    2022-03-02
 2    2022-03-02
 3    2022-03-02
 4    2022-03-02
 5    2022-03-02
 6    2022-03-02
 7    2022-03-02
 8    2022-03-02
 9    2022-03-02
 Name: LAST_ALTERED_DATE, dtype: object}

Python Code:

pd.pivot(df, columns = ["TABLE_SCHEMA","TABLE_NAME"],index=['ROW_COUNT','TABLE_TYPE','TABLE_CREATED_DATE','LAST_ALTERED_DATE'],
        values=['ROW_COUNT','TABLE_TYPE','TABLE_CREATED_DATE','LAST_ALTERED_DATE'])

Desired outuput (Below output is for 1 schema I need both schemas in single table)

TABLE_SCHEMA                                 TPCDS_SF100TCL 
TABLE_NAME  CALL_CENTER CATALOG_PAGE CUSTOMER   CUSTOMER_ADDRESS CUSTOMER_DEMOGRAPHICS
ROW_COUNT   
TABLE_TYPE  
TABLE_CREATED_DATE  
LAST_ALTERED_DATE                                                                                                                        

CodePudding user response:

try this:

data_df.set_index(['TABLE_SCHEMA', 'TABLE_NAME'], drop=True).T

data_df is the original dataframe that you provided

  • Related