This is a sample script for some basic testing that later should be implemented in PowerQuery/Python. I'm not a Python developer so please be gentle.
import pandas as pd
data = [['Sql "syn-eu2-prd-edw-001 database windows net", "syndpeu2prdedw1", Query="SELECT ta FROM rdv_60_137.Account"', "x1"], ["b1", "x2"]]
df = pd.DataFrame(data, columns=["Expression","Key"])
ser = df['Expression']
serpd = pd.Series(ser)
print (serpd)
x = serpd.str.extractall(r'"(.*?)"')
print (x)
z = pd.merge(df, x, how="inner", left_index=True, right_index=True)
print (z)
The goal would be to have a new DataFrame where I have all records from the first set, but the 1st record would have 3 instances as the Regex function created 3 rows.
Kr, Harry
CodePudding user response:
Remove second level of MultiIndex by MultiIndex.droplevel
and add original rows by DataFrame.join
:
z = (df.join(df['Expression'].str.extractall(r'"(.*?)"')[0].droplevel(1).rename('New'))
.reset_index(drop=True))
print (z)
Expression Key \
0 Sql "syn-eu2-prd-edw-001 database windows net"... x1
1 Sql "syn-eu2-prd-edw-001 database windows net"... x1
2 Sql "syn-eu2-prd-edw-001 database windows net"... x1
3 b1 x2
New
0 syn-eu2-prd-edw-001 database windows net
1 syndpeu2prdedw1
2 SELECT ta FROM rdv_60_137.Account
3 NaN