Home > front end >  How to reframe the dataframe based on column and row values?
How to reframe the dataframe based on column and row values?

Time:09-28

I have a dataframe as follows:

data = {
    'Title': ['001C', '001C', '004C', '001C', '004C', '004C', '007C', '010C'],
    'Items': ['A', 'B', 'D', 'A', 'A', 'K', 'L', 'M']
}
df = pd.DataFrame(data)

df

    Title   Items
0   001C    A
1   001C    B
2   004C    D
3   001C    A
4   004C    A
5   004C    K
6   007C    L
7   010C    M

I want to get the Items under each Title without any redundancy. The expected output is

    001C    004C    007C    010C
0   A       D       L       M
1   B       A                
2           K                       

CodePudding user response:

You can drop_duplicates, assign a helper column with increasing indexes per Item per group, and pivot:

(df.drop_duplicates(subset=['Title', 'Items'])
   .assign(index=df.groupby('Title').cumcount())
   .pivot(index='index', columns='Title', values='Items')
   .rename_axis(index=None, columns=None)
  #.fillna('') # uncomment if you want empty strings in place of NaNs 
)

output:

      001C 004C 007C 010C                 
0        A    D    L    M
1        B    A  NaN  NaN
2      NaN    K  NaN  NaN

CodePudding user response:

You can also use .drop_duplicates() .pivot(). Then, relocate the non-NaN values of each column to the top by .dropna(), as follows:

(df.drop_duplicates()
   .pivot(columns='Title', values='Items')
   .apply(lambda x: pd.Series(x.dropna().values))
   .rename_axis(columns=None)
)

Result:

  001C 004C 007C 010C
0    A    D    L    M
1    B    A  NaN  NaN
2  NaN    K  NaN  NaN
  • Related