I have a DataFrame df like:
age=14 gender=male loc=NY key=0012328434 Unnamed: 4
age=45 gender=female loc=CS key=834734hh43 pre="axe"
age=23 gender=female loc=CA key=545df35fdf NaN
..
..
age=65 gender=male loc=LA key=dfdf545dfg pre="cold"
I need this df to have a proper header and to drop the redundant "name=" prefixes from the values, like desired_df:
age gender loc key pre
14 male NY 0012328434 NaN
45 female CS 834734hh43 axe
23 female CA 545df35fdf NaN
..
..
65 male LA dfdf545dfg cold
What I tried:
df1 = df.str.split()
df_out = pd.DataFrame(df1.str[1::2].tolist(), columns=df1[0][0::2])
This fails, presumably because df has no column names to begin with (and a DataFrame, unlike a Series, has no .str accessor, so df.str raises an AttributeError). Any help would be appreciated.
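The split idea itself does work on a single column (a Series), which is where .str lives. A minimal sketch, assuming one column of "name=value" strings taken from the sample above; splitting on the first "=" only yields the header name and the values separately:

```python
import pandas as pd

# one column of the question's frame
col = pd.Series(['age=14', 'age=45', 'age=23', 'age=65'])

# split each cell on the first '=' only: column 0 holds the name, column 1 the value
pairs = col.str.split('=', n=1, expand=True)
header = pairs[0].iloc[0]      # 'age'
values = pairs[1].tolist()     # ['14', '45', '23', '65']
print(header, values)

# a whole DataFrame has no .str accessor, which is one reason df.str.split() fails
print(hasattr(pd.DataFrame({'a': ['x=1']}), 'str'))  # False
```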
Answer:
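(Untested!)
headers = ['age', 'gender', 'loc', 'key', 'pre']
df.columns = headers
for name in df.columns:
    col = df[name]
    # cells that do not start with "name=" (e.g. 'Unnamed:4') become NaN
    col = col.where(col.str.startswith(f'{name}=', na=False))
    # Series.str.removeprefix needs pandas >= 1.4; also strip the quotes around pre values
    df[name] = col.str.removeprefix(f'{name}=').str.strip('"')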
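If you would rather not hard-code the header list, one possible variation (a sketch, not part of the answer above) is to infer each column's name as the most frequent text before "=" in that column. The helper infer_header is hypothetical, and slicing with .str[...] is used instead of removeprefix so it also runs on pandas older than 1.4:

```python
import pandas as pd
from numpy import nan

# the question's sample data (column 4 holds an 'Unnamed:4' artifact in row 0)
df = pd.DataFrame({0: ['age=14', 'age=45', 'age=23', 'age=65'],
                   1: ['gender=male', 'gender=female', 'gender=female', 'gender=male'],
                   2: ['loc=NY', 'loc=CS', 'loc=CA', 'loc=LA'],
                   3: ['key=0012328434', 'key=834734hh43', 'key=545df35fdf', 'key=dfdf545dfg'],
                   4: ['Unnamed:4', 'pre="axe"', nan, 'pre="cold"']})


def infer_header(col: pd.Series) -> str:
    """Hypothetical helper: most frequent text before '=' in a column; cells without '=' are ignored."""
    names = col.dropna().str.extract(r'^([^=]+)=', expand=False).dropna()
    return names.mode().iloc[0]


headers = [infer_header(df[c]) for c in df.columns]
out = df.copy()
out.columns = headers
for name in headers:
    col = out[name]
    # cells that do not start with "name=" become NaN
    kept = col.where(col.str.startswith(f'{name}=', na=False))
    # drop the "name=" prefix by slicing, then strip surrounding quotes
    out[name] = kept.str[len(name) + 1:].str.strip('"')
print(out)
```

This assumes every column has at least one well-formed "name=value" cell and that no two columns share a name.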
Answer:
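import pandas as pd
# give literal 'NaN' strings and missing cells a dummy "NaN:NaN" pair
df[df == 'NaN'] = 'NaN:NaN'
out = (pd.DataFrame(df.fillna('NaN:NaN')
                    # per row: drop quotes, split every cell on ':' or '=', build a dict
                    .apply(lambda x: dict(list(x.str.replace('"', '')
                                                 .str.split('[:=]'))), axis=1)
                    .to_list())
       .drop('NaN', axis=1))  # drop the dummy column produced by the placeholders
print(out)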
age gender loc key Unnamed pre
0 14 male NY 0012328434 4 NaN
1 45 female CS 834734hh43 NaN axe
2 23 female CA 545df35fdf NaN NaN
3 65 male LA dfdf545dfg NaN cold
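To see why the dummy 'NaN' column appears and has to be dropped, here is the per-row step in isolation. This is a sketch using row 2 of the sample after the fillna('NaN:NaN') step; a split pattern longer than one character is treated as a regular expression by Series.str.split:

```python
import pandas as pd

# row 2 of the sample after fillna('NaN:NaN')
row = pd.Series(['age=23', 'gender=female', 'loc=CA', 'key=545df35fdf', 'NaN:NaN'])

# remove quotes, then split each cell on ':' or '=' into a [name, value] pair
pairs = row.str.replace('"', '', regex=False).str.split('[:=]')
record = dict(list(pairs))
print(record)
# {'age': '23', 'gender': 'female', 'loc': 'CA', 'key': '545df35fdf', 'NaN': 'NaN'}
```

Every placeholder cell contributes a 'NaN' key, so the DataFrame built from these dicts gets a 'NaN' column, which the answer removes with .drop('NaN', axis=1).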
Data used above:
import pandas as pd
from numpy import nan

d = {0: {0: 'age=14', 1: 'age=45', 2: 'age=23', 3: 'age=65'},
     1: {0: 'gender=male',
         1: 'gender=female',
         2: 'gender=female',
         3: 'gender=male'},
     2: {0: 'loc=NY', 1: 'loc=CS', 2: 'loc=CA', 3: 'loc=LA'},
     3: {0: 'key=0012328434',
         1: 'key=834734hh43',
         2: 'key=545df35fdf',
         3: 'key=dfdf545dfg'},
     4: {0: 'Unnamed:4', 1: 'pre="axe"', 2: nan, 3: 'pre="cold"'}}
df = pd.DataFrame(d)
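Another option, offered here as a sketch rather than as either answerer's method, is to reshape to long format: one row per (original row, cell), split each cell on its first "=", and pivot the names back into columns. It needs no hard-coded header list and naturally discards cells such as 'Unnamed:4' that contain no "=":

```python
import pandas as pd
from numpy import nan

# the question's sample data
df = pd.DataFrame({0: ['age=14', 'age=45', 'age=23', 'age=65'],
                   1: ['gender=male', 'gender=female', 'gender=female', 'gender=male'],
                   2: ['loc=NY', 'loc=CS', 'loc=CA', 'loc=LA'],
                   3: ['key=0012328434', 'key=834734hh43', 'key=545df35fdf', 'key=dfdf545dfg'],
                   4: ['Unnamed:4', 'pre="axe"', nan, 'pre="cold"']})

# long format: one row per (original row number, cell); drop empty cells
long = (df.reset_index()
          .melt(id_vars='index', value_name='cell')
          .dropna(subset=['cell']))

# split on the first '=' only; a cell without '=' gets no value and is dropped
parts = long['cell'].str.split('=', n=1, expand=True)
long = (long.assign(name=parts[0], value=parts[1].str.strip('"'))
            .dropna(subset=['value']))

# pivot the names back into columns, keeping their first-seen order
order = long['name'].drop_duplicates().tolist()
out = (long.pivot(index='index', columns='name', values='value')
           .reindex(columns=order)
           .rename_axis(index=None, columns=None))
print(out)
```

Rows that lack a given field (here rows 0 and 2 have no pre) simply come out as NaN after the pivot. The pivot assumes each name appears at most once per row.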