Sorry for the clumsy wording; I can't come up with a concise way to ask this question.
I have a DataFrame (variable df) like the one below:
df
ID | A | B | C |
---|---|---|---|
1 | m | nan | nan |
2 | n | nan | nan |
3 | b | nan | nan |
1 | nan | t | nan |
2 | nan | e | nan |
3 | nan | r | nan |
1 | nan | nan | y |
2 | nan | nan | u |
3 | nan | nan | i |
The desired output is:
ID | A | B | C |
---|---|---|---|
1 | m | t | y |
2 | n | e | u |
3 | b | r | i |
I solved this by running the following lines:
import pandas as pd
new_df = pd.DataFrame()
for column in df.columns:
    new_df = pd.concat([new_df, df[column].dropna()], join='outer', axis=1)
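The loop above only produces three aligned rows because pd.concat with axis=1 matches rows by index label, and dropna() keeps each value's original label. The sketch below assumes ID is the DataFrame's index (the table shows it as a column, but with a default 0..8 index the pieces would not line up). It rebuilds the sample data with a hypothetical helper, build_sample, and concatenates all the non-null pieces in a single pd.concat call instead of growing new_df on each iteration.

```python
import numpy as np
import pandas as pd


def build_sample(id_as_index: bool = True) -> pd.DataFrame:
    """Rebuild the question's sample frame (hypothetical helper)."""
    nan = np.nan
    df = pd.DataFrame({
        "ID": [1, 2, 3, 1, 2, 3, 1, 2, 3],
        "A": ["m", "n", "b", nan, nan, nan, nan, nan, nan],
        "B": [nan, nan, nan, "t", "e", "r", nan, nan, nan],
        "C": [nan, nan, nan, nan, nan, nan, "y", "u", "i"],
    })
    return df.set_index("ID") if id_as_index else df


df = build_sample()
# One concat over all the non-null pieces; each piece keeps its ID labels,
# so axis=1 lines them up row by row.
new_df = pd.concat([df[col].dropna() for col in df.columns], axis=1)
print(new_df)

# With ID as an ordinary column the labels are 0..8, so nothing lines up.
staggered = pd.concat(
    [s.dropna() for _, s in build_sample(id_as_index=False).items()], axis=1
)
print(staggered.shape)  # (9, 4)
```

Passing the whole list to one pd.concat call avoids copying the growing frame on every iteration, which matters once there are many columns.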
And then I figured this would be faster:
empty_dict = {}
for column in df.columns:
    empty_dict[column] = df[column].dropna()
new_df = pd.DataFrame.from_dict(empty_dict)
However, dropna could be a problem if, for example, a value is missing in the rows that should supply a column's values. E.g. if df.loc[2,'A'] = nan, that key in the dictionary will hold only 2 values, causing a misalignment with the rest of the columns. I'm not convinced by this approach.
I have a feeling pandas must have a built-in function that does this better than either of my two solutions. Is there one? If not, is there a better way to solve this?
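A quick check of that concern, assuming ID is the DataFrame's index: the dictionary does end up with a shorter Series for A, but building a DataFrame from a dict of Series aligns them on their index labels, so the gap shows up as NaN for ID 2 rather than shifting 'b' up a row. The sample data is rebuilt by hand.

```python
import numpy as np
import pandas as pd

nan = np.nan
# Sample frame from the question, with ID as the index (an assumption;
# the dict-based version only aligns correctly in that case).
df = pd.DataFrame(
    {
        "A": ["m", "n", "b", nan, nan, nan, nan, nan, nan],
        "B": [nan, nan, nan, "t", "e", "r", nan, nan, nan],
        "C": [nan, nan, nan, nan, nan, nan, "y", "u", "i"],
    },
    index=pd.Index([1, 2, 3, 1, 2, 3, 1, 2, 3], name="ID"),
)
# The scenario from the question: the "A" value for ID 2 is missing.
df.loc[2, "A"] = nan

columns = {col: df[col].dropna() for col in df.columns}
print({col: s.index.tolist() for col, s in columns.items()})
# {'A': [1, 3], 'B': [1, 2, 3], 'C': [1, 2, 3]}

# DataFrame construction aligns the Series on their labels,
# so the gap becomes NaN for ID 2 instead of shifting "b" up.
new_df = pd.DataFrame.from_dict(columns)
print(new_df)
```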
Answer:
Looks like you only need groupby().first():
df.groupby('ID', as_index=False).first()
Output:
ID A B C
0 1 m t y
1 2 n e u
2 3 b r i
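groupby().first() works here because, by default, first() takes the first non-null value of each column within each group, so the NaN padding is skipped. A minimal sketch, assuming ID is an ordinary column (as in the call above) and reusing the question's scenario where A is missing for ID 2: that cell simply stays NaN and the other rows are unaffected.

```python
import numpy as np
import pandas as pd

nan = np.nan
# The question's sample data, with ID as an ordinary column.
df = pd.DataFrame({
    "ID": [1, 2, 3, 1, 2, 3, 1, 2, 3],
    "A": ["m", nan, "b", nan, nan, nan, nan, nan, nan],  # A missing for ID 2
    "B": [nan, nan, nan, "t", "e", "r", nan, nan, nan],
    "C": [nan, nan, nan, nan, nan, nan, "y", "u", "i"],
})

# first() skips nulls, so each cell gets the one non-null value of its group;
# a group with no value at all (ID 2 in column A) stays NaN.
result = df.groupby("ID", as_index=False).first()
print(result)
```

One caveat of this design: if an ID had two non-null values in the same column, first() would keep the earlier one without any warning.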
Answer:
Use stack() followed by unstack(), as suggested by @QuangHoang, if ID is the index:
>>> df.stack().unstack()
A B C
ID
1 m t y
2 n e u
3 b r i
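stack() turns the frame into a Series with one entry per (ID, column) pair, and unstack() moves the column labels back out, which only works if each pair ends up with a single value. Whether stack() itself drops NaN depends on the pandas version: the older implementation drops them by default, while the newer one (opt-in via future_stack=True since pandas 2.1) keeps them, which would leave duplicate pairs for unstack() to reject. The sketch therefore adds an explicit dropna() as a hedge. It assumes ID is the index and appends reset_index() to turn ID back into a regular column.

```python
import numpy as np
import pandas as pd

nan = np.nan
# Question's sample data with ID as the index (assumed by this answer).
df = pd.DataFrame(
    {
        "A": ["m", "n", "b", nan, nan, nan, nan, nan, nan],
        "B": [nan, nan, nan, "t", "e", "r", nan, nan, nan],
        "C": [nan, nan, nan, nan, nan, nan, "y", "u", "i"],
    },
    index=pd.Index([1, 2, 3, 1, 2, 3, 1, 2, 3], name="ID"),
)

# stack(): one entry per (ID, column) pair, as a Series with a MultiIndex.
long = df.stack()
# Explicit dropna() so the pairs are unique whether or not this pandas
# version's stack() already discarded the NaNs.
long = long.dropna()
# unstack(): move the column labels back out; reset_index() makes ID a column.
wide = long.unstack().reset_index()
print(wide)
```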
Alternatively, you can use melt and pivot:
>>> df.melt(id_vars='ID').dropna().pivot(index='ID', columns='variable', values='value') \
...     .rename_axis(columns=None).reset_index()
ID A B C
0 1 m t y
1 2 n e u
2 3 b r i
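The two reshaping styles differ when the data is not as clean as the sample. pivot() raises a ValueError if an ID has more than one non-null value in the same column, whereas groupby().first() quietly keeps the earlier one. The sketch below checks that both give the same table on the sample data, then adds a hypothetical conflicting row (a second A value, 'x', for ID 1) to show the difference; melt_pivot is a made-up helper wrapping the melt/pivot chain with keyword arguments.

```python
import numpy as np
import pandas as pd

nan = np.nan
COLUMNS = ["ID", "A", "B", "C"]
# The question's sample rows, with ID as an ordinary column.
rows = [
    (1, "m", nan, nan), (2, "n", nan, nan), (3, "b", nan, nan),
    (1, nan, "t", nan), (2, nan, "e", nan), (3, nan, "r", nan),
    (1, nan, nan, "y"), (2, nan, nan, "u"), (3, nan, nan, "i"),
]
df = pd.DataFrame(rows, columns=COLUMNS)
# Hypothetical extra row: a second, conflicting "A" value for ID 1.
dup = pd.DataFrame(rows + [(1, "x", nan, nan)], columns=COLUMNS)


def melt_pivot(frame: pd.DataFrame) -> pd.DataFrame:
    """The melt/pivot chain, using keyword arguments for pivot()."""
    return (
        frame.melt(id_vars="ID")
        .dropna()
        .pivot(index="ID", columns="variable", values="value")
        .rename_axis(columns=None)
        .reset_index()
    )


# On the clean sample, both approaches produce the same values.
clean_pivot = melt_pivot(df)
clean_first = df.groupby("ID", as_index=False).first()
print(clean_pivot.values.tolist() == clean_first.values.tolist())

# With the conflicting row, pivot() refuses to reshape...
try:
    melt_pivot(dup)
    pivot_raised = False
except ValueError as err:
    pivot_raised = True
    print("pivot() raised:", err)

# ...while first() keeps the earlier value, "m".
dup_first = dup.groupby("ID", as_index=False).first()
print(dup_first)
```

Which behavior is preferable depends on whether such duplicates should be treated as an error.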