I have dataframe df1 ike this:
Database/Schema Name entity Name temp
0 schema1 ITEM_DESC1 Item_Nbr INT(1,2) NOT NULL
1 schema1 ITEM_DESC1 Old_Nbr INT(3,2) NOT NULL
2 schema1 ITEM_DESC1 Order_Dept_Nbr SMALLINT(6,2) NULL
3 schema1 ITEM_DESC1 Acct_Dept_Nbr SMALLINT(9,2) NULL
4 schema2 ITEM_DESC2 Primary_Desc CHAR(20,2) NULL
5 schema2 ITEM_DESC2 Secondary_Desc CHAR(20,4) NULL
6 schema2 ITEM_DESC2 Color_Desc CHAR(6,5) NULL
7 schema2 ITEM_DESC2 Size_Desc CHAR(6,3) NULL
In this dataframe I have two different schemas and tables(ITEM_DESC1
and ITEM_DESC2
). I want to build create table
statement out of this.
So, from the above dataframe I need a new dataframe which will have 2 rows (since 2 different tables in df1 ) and the value would be
df2:
ddl_statement
0 create table schema1.ITEM_DESC1 (Item_Nbr INT(1,2) NOT NULL,Old_Nbr INT(3,2) NOT NULL,Order_Dept_Nbr SMALLINT(6,2) NULL,Acct_Dept_Nbr SMALLINT(9,2) NULL)
1 create table schema2.ITEM_DESC2 (Primary_Desc CHAR(20,2) NULL,Secondary_Desc CHAR(20,4) NULL,Color_Desc CHAR(6,5) NULL,Size_Desc CHAR(6,3) NULL)
How can I achieve this with out using loop?
CodePudding user response:
Use groupby
and f-strings:
df2 = df.groupby(['Database/Schema Name', 'entity Name'])['temp'] \
.apply(lambda x: f"create table {x.name[0]}.{x.name[1]} ({', '.join(x)})") \
.reset_index(drop=True).to_frame('ddl_statement')
Output:
>>> df2
ddl_statement
0 create table schema1.ITEM_DESC1 (Item_Nbr INT(1,2) NOT NULL, Old_Nbr INT(3,2) NOT NULL, Order_Dept_Nbr SMALLINT(6,2) NULL, Acct_Dept_Nbr SMALLINT(9,2) NULL)
1 create table schema2.ITEM_DESC2 (Primary_Desc CHAR(20,2) NULL, Secondary_Desc CHAR(20,4) NULL, Color_Desc CHAR(6,5) NULL, Size_Desc CHAR(6,3) NULL)