split row to create header in pandas

Time:06-02

I have a data frame df like:

age=14  gender=male     loc=NY  key=0012328434    Unnamed: 4
age=45  gender=female   loc=CS  key=834734hh43    pre="axe"
age=23  gender=female   loc=CA  key=545df35fdf    NaN
..
..
age=65  gender=male     loc=LA  key=dfdf545dfg    pre="cold"

I need this df to have a proper header, with the redundant name= prefixes (and the quotes) removed from the values, like desired_df:

age     gender          loc     key             pre
14      male            NY      0012328434      NaN
45      female          CS      834734hh43      axe
23      female          CA      545df35fdf      NaN
..
..
65      male            LA      dfdf545dfg      cold

What I tried:

df1 = df.str.split()
df_out = pd.DataFrame(df1.str[1::2].tolist(), columns=df1[0][0::2])

But this fails, since I have no column name to select a Series from to begin with (.str works on a Series, not on a whole DataFrame). Any help would be really appreciated.

CodePudding user response:

(Untested!)

headers = ['age', 'gender', 'loc', 'key', 'pre']

df.columns = headers
for name in df.columns:
    df[name] = df[name].str.removeprefix(f'{name}=')
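A minimal sketch of the same idea that can be run as-is, under a few assumptions that are not in the question: the frame was read so that the first record is an ordinary row (for example with header=None), the name `raw` and the sample values are only illustrative, and an anchored regex replace is swapped in for removeprefix (which needs pandas 1.4 or later). It also strips the double quotes around the pre values.

```python
import numpy as np
import pandas as pd

# Illustrative sample, assuming the first record was NOT consumed as the header.
raw = pd.DataFrame([
    ['age=14', 'gender=male',   'loc=NY', 'key=0012328434', np.nan],
    ['age=45', 'gender=female', 'loc=CS', 'key=834734hh43', 'pre="axe"'],
    ['age=23', 'gender=female', 'loc=CA', 'key=545df35fdf', np.nan],
    ['age=65', 'gender=male',   'loc=LA', 'key=dfdf545dfg', 'pre="cold"'],
])

headers = ['age', 'gender', 'loc', 'key', 'pre']

clean = raw.copy()
clean.columns = headers
for name in headers:
    # Remove the leading "name=" and any surrounding double quotes.
    # Missing cells (NaN) pass through the .str methods unchanged.
    clean[name] = (clean[name]
                   .str.replace(f'^{name}=', '', regex=True)
                   .str.strip('"'))

print(clean)
```

This relies on every column holding values for exactly one name, in a fixed order; if the key=value pairs can shift between columns, a per-cell parse is the safer route.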

CodePudding user response:

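# Give string 'NaN' cells a dummy "key:value" form so that every cell splits into a pair
df[df == 'NaN'] = 'NaN:NaN'

# Split each cell on ':' or '=', build one dict per row, then drop the dummy column
(pd.DataFrame(df.fillna('NaN:NaN')
   .apply(lambda x: dict(list(x.str.replace('"', '')
                               .str.split('[:=]'))), axis=1).to_list())
   .drop('NaN', axis=1))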

  age  gender loc         key Unnamed   pre
0  14    male  NY  0012328434       4   NaN
1  45  female  CS  834734hh43     NaN   axe
2  23  female  CA  545df35fdf     NaN   NaN
3  65    male  LA  dfdf545dfg     NaN  cold

Data

import pandas as pd
from numpy import nan

d = {0: {0: 'age=14', 1: 'age=45', 2: 'age=23', 3: 'age=65'},
 1: {0: 'gender=male',
  1: 'gender=female',
  2: 'gender=female',
  3: 'gender=male'},
 2: {0: 'loc=NY', 1: 'loc=CS', 2: 'loc=CA', 3: 'loc=LA'},
 3: {0: 'key=0012328434',
  1: 'key=834734hh43',
  2: 'key=545df35fdf',
  3: 'key=dfdf545dfg'},
 4: {0: 'Unnamed:4', 1: 'pre="axe"', 2: nan, 3: 'pre="cold"'}}

df = pd.DataFrame(d)
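
If the leftover Unnamed column is not wanted, the same sample data can also be parsed cell by cell in plain Python. This is only a sketch under assumptions: `parse_row` and `wanted` are hypothetical names introduced here, each cell is split on its first '=' only, and any cell without an '=' (such as 'Unnamed:4') is treated as missing.

```python
import numpy as np
import pandas as pd

# Same sample data as above, rebuilt so the snippet runs on its own.
df = pd.DataFrame({
    0: ['age=14', 'age=45', 'age=23', 'age=65'],
    1: ['gender=male', 'gender=female', 'gender=female', 'gender=male'],
    2: ['loc=NY', 'loc=CS', 'loc=CA', 'loc=LA'],
    3: ['key=0012328434', 'key=834734hh43', 'key=545df35fdf', 'key=dfdf545dfg'],
    4: ['Unnamed:4', 'pre="axe"', np.nan, 'pre="cold"'],
})

def parse_row(row):
    """Build a {name: value} dict from the key=value cells of one row."""
    record = {}
    for cell in row.dropna():
        name, sep, value = str(cell).partition('=')
        if sep:  # skip cells that contain no '=' at all
            record[name] = value.strip('"')
    return record

# Hypothetical list of the columns we want, in the desired order.
wanted = ['age', 'gender', 'loc', 'key', 'pre']

records = [parse_row(row) for _, row in df.iterrows()]
out = pd.DataFrame(records, columns=wanted)  # absent names become NaN

print(out)
```

Because the column name is taken from each cell rather than from its position, this does not depend on the pairs always appearing in the same column.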