I need to rearrange a Pandas DataFrame whose values are not in the right columns. I would like to move each value into a column named after the prefix it carries, and push the 'unknown' columns, with their values, to the end of the dataframe.
I have the following dataframe:
The output I am looking for: the 'known' values get a header, while the unknown columns (5, 6) go to the end. The rule: if no cell in a column contains '|', the column name is not changed.
Any suggestions I could try would be really helpful.
CodePudding user response:
Try this:
import pandas as pd

rename_dict = {}  # maps old column names to new ones

df = pd.DataFrame({'1': ['name | Steve', 'name | John'],
                   '2': [None, None],
                   '3': [None, 'age | 50']})

for col in df.columns:
    vals = df[col].values  # look at the values in each column
    vals = [x for x in vals if isinstance(x, str)]  # drop nulls and non-string values
    vals = [x for x in vals if '|' in x]  # keep only values containing '|'
    if len(vals) > 0:
        new_col_name = vals[0].split('|')[0].strip()  # prefix before '|', without the trailing space
        rename_dict[col] = new_col_name  # record the rename for this column

df.rename(columns=rename_dict, inplace=True)  # apply the new column names
df
           name     2       age
0  name | Steve  None      None
1   name | John  None  age | 50
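The snippet above only renames the columns; the cells still carry their prefix and the unnamed column stays in the middle. A minimal sketch of one way to finish the job follows, assuming (as the answer above does) that each column holds at most one distinct prefix. The helper `split_cell` and the names `known`, `unknown` and `result` are made up for illustration and are not from the original post.

```python
import pandas as pd

# Same sample frame as in the answer above.
df = pd.DataFrame({'1': ['name | Steve', 'name | John'],
                   '2': [None, None],
                   '3': [None, 'age | 50']})


def split_cell(cell):
    """Return (prefix, value) for 'prefix | value' strings, else (None, cell)."""
    if isinstance(cell, str) and '|' in cell:
        prefix, value = cell.split('|', 1)
        return prefix.strip(), value.strip()
    return None, cell


known, unknown = {}, {}
for col in df.columns:
    pairs = [split_cell(c) for c in df[col]]
    prefixes = [p for p, _ in pairs if p is not None]
    if prefixes:
        # Assumption: one prefix per column, so the first one names the column.
        known[prefixes[0]] = [v for _, v in pairs]
    else:
        # No '|' anywhere in this column: keep its original name.
        unknown[col] = df[col].tolist()

# Dicts keep insertion order, so known columns come first and unknown ones last.
result = pd.DataFrame({**known, **unknown}, index=df.index)
print(result)
```

If two columns happened to share a prefix, the later one would overwrite the earlier one in this sketch, so that case would need extra handling.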
CodePudding user response:
This looks a bit tricky and is not exactly what you asked for, but it might give you an idea of how to solve your task:
import pandas as pd

df = pd.DataFrame([['email | [email protected]', 'name | name1', 'surname | surname1', '', '', ''],
                   ['email | [email protected]', '', 'name | name2', 'occupation | student', 'surname | surname2', 'abc | 123']])
# For each row, split every non-empty cell into a (prefix, value) pair and build a Series keyed by prefix
df.apply(lambda x: pd.Series(dict(tuple(i.split(' | ')) for i in x.tolist() if i)), axis=1)
>>> out
'''
   abc               email   name occupation   surname
0  NaN  [email protected]  name1        NaN  surname1
1  123  [email protected]  name2    student  surname2