Home > OS >  Converting rows to wide columns based on duplicated ids in another column in pandas
Converting rows to wide columns based on duplicated ids in another column in pandas

Time:12-23

My question is similar to this, this, and this questions.

But still cannot resolve it.

I have a dataframe with duplicated ids

ID  Publication_type
1   Journal          
1   Clinical study   
1   Guideline        
2   Journal          
2   Letter           

I want to make it wide, but I do not know how many publication type will I have - maybe 2, maybe 20. Thus, I do not know how many columns wide will I need. The max size of wide columns for publication_type must be not be more than the number of types for each id.

Expected output

 ID Publication_type1 Publication_type2 Publication_type 3    etc
 1  Journal           Clinical Study    Guideline
 2  Journal           Letter            NaN

For now I do not need to put the same publication type into the same column. I do not need all articles in the same column. Thanks!

CodePudding user response:

You can group by ID, aggregate via list, and then create a new DataFrame from the results:

col = 'Publication_type'
new_df = pd.DataFrame(df.groupby('ID')[col].agg(lambda x: x.tolist()).tolist()).replace({None: np.nan})
new_df.columns = [f'{col}{i}' for i in new_df.columns   1]
new_df['ID'] = df['ID'].drop_duplicates().reset_index(drop=True)

Output:

>>> df
  Publication_type1 Publication_type2 Publication_type3  ID
0           Journal    Clinical-study         Guideline   1
1           Journal            Letter               NaN   2
  • Related