Given the following df:
order by this | column1 | column2 | column n |
---|---|---|---|
abc | abc | def | ghi |
def | def | ghi | jkl |
ghi | ghi | jkl | nan |
jkl | mno | nan | nan |
mno | nan | nan | nan |
--> One column with a complete set of unique values
--> Several columns with a subset of values. These columns have a different length and are thus 'stacked' to the top.
What I want is to use the values in the first column to reorder the values of the other columns, ie match them against first column.
Expected outcome:
order by this | column1 | column2 | column n |
---|---|---|---|
abc | abc | nan | nan |
def | def | def | nan |
ghi | ghi | ghi | ghi |
jkl | nan | jkl | jkl |
mno | mno | nan | nan |
I guess I can iterrows through 'order by this' and then use loc and something to fill with nan's? Is there a better way?
CodePudding user response:
As what you want is equivalent to checking whether each value from the "order" column is present in the other columns, you could simply do:
df.apply(lambda s: df['order'].where(df['order'].isin(s)))
output:
order column1 column2 column_n
0 abc abc NaN NaN
1 def def def NaN
2 ghi ghi ghi ghi
3 jkl NaN jkl jkl
4 mno mno NaN NaN
CodePudding user response:
Maybe I'm not getting it, but shouldn't a simple .sort_values () do that? The fact that some NaN are present seems not relevant...
https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.sort_values.html