Home > Software engineering >  Python multiple pivot from the same column
Python multiple pivot from the same column

Time:10-12

I have a dataframe with just one column with content like:

view: meta_record_extract
dimension: e_filter
type: string
hidden: yes
sql: "SELECT * FROM files"
dimension: category
type: string
...

What I tried to produce would be a dataframe with columns and the data like this:

____________________________________________________________________    
view                    | dimension |label | type  | hidden | sql      |
     meta_record_extract| e_filter  | NaN  | string| yes    |"SELECT * FROM files" 
NaN                     | category  | NaN  | string ...

Given that splitting the string data like

df.header[0].split(': ')[0]

was giving me label with [0] or value with [1] I tried this:

df.pivot_table(df, columns = df.header.str.split(': ')[0], values = df.header.str.split(': ')[1])

but it did not work giving the error.

Can anyone help me to achieve the result I need?

CodePudding user response:

Use str.findall() map, as follows:

str.findall() helps you extract the keyword and value pairs into a list. We then map the list of keyword-value pairs into a dict for pd.Dataframe to turn the dict into a dataframe.

(Assuming the column label of your column is Col1):

df_extract = df['Col1'].str.findall(r'(\w ):\s*(.*)')

df_result = pd.DataFrame(map(dict, df_extract))

Result:

print(df_result)



                  view dimension    type hidden                    sql
0  meta_record_extract       NaN     NaN    NaN                    NaN
1                  NaN  e_filter     NaN    NaN                    NaN
2                  NaN       NaN  string    NaN                    NaN
3                  NaN       NaN     NaN    yes                    NaN
4                  NaN       NaN     NaN    NaN  "SELECT * FROM files"
5                  NaN  category     NaN    NaN                    NaN
6                  NaN       NaN  string    NaN                    NaN

Update

To compress the rows to minimize the NaN's, we can further use .apply() with .dropna(), as follows:

df_compressed = df_result.apply(lambda x:  pd.Series(x.dropna().values))

Result:

print(df_compressed)


                  view dimension    type hidden                    sql
0  meta_record_extract  e_filter  string    yes  "SELECT * FROM files"
1                  NaN  category  string    NaN                    NaN
  • Related