Pandas: Same indices for each column. Is there a better way to solve this?

Time:11-12

Sorry for the vague title; I couldn't come up with a concise way to ask this question.

I have a DataFrame (variable df) like the one below:

df
ID    A    B    C
 1    m  nan  nan
 2    n  nan  nan
 3    b  nan  nan
 1  nan    t  nan
 2  nan    e  nan
 3  nan    r  nan
 1  nan  nan    y
 2  nan  nan    u
 3  nan  nan    i

The desired output is:

ID  A  B  C
 1  m  t  y
 2  n  e  u
 3  b  r  i

I solved this by running the following lines:

new_df = pd.DataFrame()
for column in df.columns:
    new_df = pd.concat([new_df, df[column].dropna()], join='outer', axis=1)

And then I figured this would be faster:

empty_dict = {}
for column in df.columns:
    empty_dict[column] = df[column].dropna()
new_df = pd.DataFrame.from_dict(empty_dict)

However, dropna could be a problem if one of the values that should be kept is itself missing. E.g. if df.loc[2,'A'] = nan, then that key in the dictionary will hold only 2 values, which I suspect would misalign it with the rest of the columns. So I'm not convinced this approach is robust.
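A small sketch of that missing-value case may help here. It assumes ID is the index (which the loop over df.columns suggests) and that the value of A for ID 2 is missing; the frame below is rebuilt by hand for illustration. Because pandas aligns Series on their index labels when building a DataFrame from a dict, the shorter column should come back with a NaN at ID 2 rather than shifting its values.

```python
import numpy as np
import pandas as pd

# Hand-built copy of the example, with ID as the index and the
# value of A for ID 2 deliberately missing (an assumption for this sketch).
df = pd.DataFrame(
    {
        "A": ["m", np.nan, "b"] + [np.nan] * 6,
        "B": [np.nan] * 3 + ["t", "e", "r"] + [np.nan] * 3,
        "C": [np.nan] * 6 + ["y", "u", "i"],
    },
    index=pd.Index([1, 2, 3] * 3, name="ID"),
)

# Same idea as the dictionary approach: keep only the non-null values per column.
pieces = {column: df[column].dropna() for column in df.columns}

# The Series are aligned on their index labels, not on their positions,
# so column A has a NaN at ID 2 while 'b' stays on ID 3.
new_df = pd.DataFrame(pieces)
print(new_df)
```

Note that this only holds while each ID has at most one non-null value per column; duplicated labels within one column would need a different approach.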

I have the feeling pandas must have a built-in function that does a better job than either of my two solutions. Is there one? If not, is there a better way of solving this?

CodePudding user response:

Looks like you only need groupby().first():

df.groupby('ID', as_index=False).first()

Output:

   ID  A  B  C
0   1  m  t  y
1   2  n  e  u
2   3  b  r  i
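For anyone who wants to try this, here is a self-contained sketch. GroupBy.first() returns the first non-null value of each column within each group, so it also copes with the missing-value case raised in the question. The frame is rebuilt by hand, and the missing A value for ID 2 is an assumption added for illustration.

```python
import numpy as np
import pandas as pd

# Hand-built copy of the example with ID as a regular column.
# The value of A for ID 2 is deliberately missing (illustrative assumption).
df = pd.DataFrame({
    "ID": [1, 2, 3] * 3,
    "A": ["m", np.nan, "b"] + [np.nan] * 6,
    "B": [np.nan] * 3 + ["t", "e", "r"] + [np.nan] * 3,
    "C": [np.nan] * 6 + ["y", "u", "i"],
})

# One row per ID; each column holds the first non-null value found for that ID.
# A group with no non-null value in a column simply stays missing.
out = df.groupby("ID", as_index=False).first()
print(out)
```

If an ID could carry several non-null values in the same column, first() silently keeps the earliest one, which may or may not be what you want.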

CodePudding user response:

Use stack() followed by unstack(), as suggested by @QuangHoang, if ID is the index. This relies on stack() dropping the NaN entries, which is its default in pandas 1.x and 2.x:

>>> df.stack().unstack()
    A  B  C
ID         
1   m  t  y
2   n  e  u
3   b  r  i

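If ID is a regular column, you can use melt and pivot (recent pandas versions require keyword arguments for pivot):

>>> df.melt('ID').dropna().pivot(index='ID', columns='variable', values='value') \
      .rename_axis(columns=None).reset_index()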

   ID  A  B  C
0   1  m  t  y
1   2  n  e  u
2   3  b  r  i
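A runnable version of the melt and pivot approach, as a sketch: the frame is rebuilt by hand from the question, and 'variable' and 'value' are simply the default column names that melt produces.

```python
import numpy as np
import pandas as pd

# Hand-built copy of the example with ID as a regular column.
df = pd.DataFrame({
    "ID": [1, 2, 3] * 3,
    "A": ["m", "n", "b"] + [np.nan] * 6,
    "B": [np.nan] * 3 + ["t", "e", "r"] + [np.nan] * 3,
    "C": [np.nan] * 6 + ["y", "u", "i"],
})

out = (
    df.melt("ID")            # long format: columns ID, variable, value
      .dropna()              # keep only the cells that hold a value
      .pivot(index="ID", columns="variable", values="value")
      .rename_axis(columns=None)  # drop the leftover 'variable' axis name
      .reset_index()         # turn ID back into a regular column
)
print(out)
```

Unlike groupby().first(), pivot raises a ValueError when an ID has more than one non-null value in the same column, which can be useful if such duplicates should not pass silently.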