Home > Back-end >  Dataframe conversion using python
Dataframe conversion using python

Time:09-21

we have the following dataframe .

enter image description here

import pandas as pd
import numpy as np
a1 =["school.bgs.id","school.bgs.title","school.bgs.city","school.bgs.bgs1.id","school.sggs.id","school.sggs.title","school.sggs.city","school.sggs.srt.title","school.sggs.state"]
a2=[55,"BGS","pune",34,np.nan,np.nan,np.nan,np.nan,np.nan]
a3=[np.nan,np.nan,np.nan,np.nan,230,"SGGS","Nanded","SRT","maharashtra"]
df =pd.DataFrame(list(zip(a1,a2,a3)),columns=['data',0,1])

and expected output:

enter image description here

Kindly suggest better solution for the same

CodePudding user response:

This will do what you've asked:

df['new_row'] = df.data.str.split('.').str[:-1].str.join('.')
df['new_col'] = df.data.str.split('.').str[-1]
df['new_val'] = df[0].where(df[0].notna(), df[1])
df = df.pivot('new_row','new_col','new_val')[['id','title','city','state']].rename_axis(None, axis='columns').rename_axis(None, axis='index')

Output:

                  id title    city        state
school.bgs        55   BGS    pune          NaN
school.bgs.bgs1   34   NaN     NaN          NaN
school.sggs      230  SGGS  Nanded  maharashtra
school.sggs.srt  NaN   SRT     NaN          NaN

Explanation:

  • take the final dot-separated token within the data column as new_col and the corresponding prefix from data as new_row
  • take the value of column 0 if non-null, else that of column 1, as new_val
  • use pivot() to create the desired output, with columns reordered using [['id','title','city','state']] and axis names removed using rename_axis().

CodePudding user response:

using pivot looks quite suitable, so this solution is very similar with this:

df[['data','col']] = df['data'].str.rsplit('.',1,expand=True)
df = df.assign(val=df[0].combine_first(df[1])).pivot('data','col','val')

and the result is:

col                city   id        state title
data                                           
school.bgs         pune   55          NaN   BGS
school.bgs.bgs1     NaN   34          NaN   NaN
school.sggs      Nanded  230  maharashtra  SGGS
school.sggs.srt     NaN  NaN          NaN   SRT

CodePudding user response:

Another rather similar way is as the following:

import pandas as pd

df[['data', 'cols']] = df['data'].str.extract(r'(?P<data>[\w.] )\.(?P<cols>\w $)')

df = (pd.pivot(df, index = 'data', columns='cols', values=[0, 1])
      .stack(level=0)
      .droplevel(1))

cols               city   id        state title
data                                           
school.bgs         pune   55          NaN   BGS
school.bgs.bgs1     NaN   34          NaN   NaN
school.sggs      Nanded  230  maharashtra  SGGS
school.sggs.srt     NaN  NaN          NaN   SRT
  • Related