Pandas: Same indices for each column. Is there a better way to solve this?

Sorry for the lousy title. I couldn't come up with a concise way to ask this question.

I have a dataframe (variable df) such as the below:

df

ID	A	B	C
1	m	nan	nan
2	n	nan	nan
3	b	nan	nan
1	nan	t	nan
2	nan	e	nan
3	nan	r	nan
1	nan	nan	y
2	nan	nan	u
3	nan	nan	i

The desired output is:

ID	A	B	C
1	m	t	y
2	n	e	u
3	b	r	i
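For anyone who wants to reproduce this, here is a minimal sketch of the sample frame. It assumes ID is an ordinary column and that the missing entries are np.nan; neither detail is stated explicitly in the question.

```python
import numpy as np
import pandas as pd

# Sample data from the question: each ID appears three times,
# and each occurrence fills only one of the columns A, B, C.
df = pd.DataFrame({
    "ID": [1, 2, 3] * 3,
    "A": ["m", "n", "b"] + [np.nan] * 6,
    "B": [np.nan] * 3 + ["t", "e", "r"] + [np.nan] * 3,
    "C": [np.nan] * 6 + ["y", "u", "i"],
})
print(df)
```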

I solved this by running the following lines:

new_df = pd.DataFrame()
for column in df.columns:
    new_df = pd.concat([new_df, df[column].dropna()], join='outer', axis=1)

And then I figured this would be faster:

empty_dict = {}
for column in df.columns:
    empty_dict[column] = df[column].dropna()
new_df = pd.DataFrame.from_dict(empty_dict)

However, dropna could cause a problem if a value is missing in the rows that are supposed to supply a column's values. For example, if df.loc[2,'A'] = nan, then that key in the dictionary will hold only 2 values, which I worry would misalign it with the rest of the columns. So I'm not convinced by either solution.
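One way to check that concern is the small experiment below. It is only a sketch and assumes ID is the index (which the loops above appear to need); the gap at ID 2 is hypothetical. Under that assumption, pandas aligns the per-column Series by index label when building the frame, so the shorter column ends up with a NaN at ID 2 rather than shifting its values.

```python
import numpy as np
import pandas as pd

# Assumption: ID is the index, not a column.
df = pd.DataFrame(
    {
        "A": ["m", "n", "b"] + [np.nan] * 6,
        "B": [np.nan] * 3 + ["t", "e", "r"] + [np.nan] * 3,
        "C": [np.nan] * 6 + ["y", "u", "i"],
    },
    index=pd.Index([1, 2, 3] * 3, name="ID"),
)

# Hypothetical gap: blank out column A for every row labelled ID 2.
df.loc[2, "A"] = np.nan

# Same idea as the dictionary approach: one NaN-free Series per column.
pieces = {col: df[col].dropna() for col in df.columns}

# Building a DataFrame from a dict of Series aligns on the index labels.
new_df = pd.DataFrame(pieces)
print(new_df)
```

Here the A Series has only two values (for ID 1 and 3), but they stay attached to their own labels in the result.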

I have the feeling pandas must have a built-in function that does a better job than either of my two solutions. Is there one? If not, is there a better way of solving this?

CodePudding user response：

Looks like you only need groupby().first():

df.groupby('ID', as_index=False).first()

Output:

   ID  A  B  C
0   1  m  t  y
1   2  n  e  u
2   3  b  r  i
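A runnable sketch of this approach follows, including the missing-value case raised in the question. The sample frame and the gap at ID 2 are assumptions made for illustration. groupby().first() takes the first non-null entry per column within each group, so a group with no value in a column is simply left missing.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "ID": [1, 2, 3] * 3,
    "A": ["m", "n", "b"] + [np.nan] * 6,
    "B": [np.nan] * 3 + ["t", "e", "r"] + [np.nan] * 3,
    "C": [np.nan] * 6 + ["y", "u", "i"],
})

# Hypothetical gap: ID 2 has no value at all in column A.
df.loc[df["ID"] == 2, "A"] = np.nan

# first() skips nulls, so each group yields its first non-null value
# per column; as_index=False keeps ID as a regular column.
result = df.groupby("ID", as_index=False).first()
print(result)
```

In the result, ID 2 keeps a missing value in A while its B and C values are still filled in on the same row.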

CodePudding user response：

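Use stack and unstack, as suggested by @QuangHoang, if ID is the index. This relies on stack dropping NaN values, which is its default in pandas 1.x and 2.x. Append .reset_index() if you want ID back as a column:

>>> df.stack().unstack()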
    A  B  C
ID         
1   m  t  y
2   n  e  u
3   b  r  i

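If ID is a column, you can use melt and pivot instead (pivot takes keyword arguments; positional ones are no longer accepted in pandas 2.0 and later):

>>> df.melt('ID').dropna().pivot(index='ID', columns='variable', values='value') \
      .rename_axis(columns=None).reset_index()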

   ID  A  B  C
0   1  m  t  y
1   2  n  e  u
2   3  b  r  i
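The melt and pivot route can be sketched step by step as below. The sample frame is an assumption based on the question, and the intermediate names long_df and wide_df are made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "ID": [1, 2, 3] * 3,
    "A": ["m", "n", "b"] + [np.nan] * 6,
    "B": [np.nan] * 3 + ["t", "e", "r"] + [np.nan] * 3,
    "C": [np.nan] * 6 + ["y", "u", "i"],
})

# Long format: one row per (ID, column name, value); drop the NaN rows.
long_df = df.melt(id_vars="ID").dropna()

# Back to wide format: one row per ID, one column per original column.
wide_df = (
    long_df.pivot(index="ID", columns="variable", values="value")
    .rename_axis(columns=None)  # remove the leftover "variable" axis label
    .reset_index()              # turn ID back into a regular column
)
print(wide_df)
```

Note that pivot raises an error if an ID has more than one non-null value in the same column, whereas groupby().first() would silently keep the first one.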
