Home > Software engineering >  Python: List to Pandas DataFrame without duplicate values in column only
Python: List to Pandas DataFrame without duplicate values in column only

Time:10-23

I have a List in python 3.x as follows:

print_hr()
ATMITMData('Bank Nifty')
print_hr()
print(bn_strikes)
print(tabulate(pd.DataFrame(bn_strikes), headers='keys', tablefmt='psql', showindex=False))

I get the list and the df like this:

List:
{'ATM': 40800, 'ITM': [40700, 40600, 40500], 'OTM': [40900, 41000, 41100]}

DF:
 ------- ------- ------- 
|   ATM |   ITM |   OTM |
|------- ------- -------|
| 40800 | 40700 | 40900 |
| 40800 | 40600 | 41000 |
| 40800 | 40500 | 41100 |
 ------- ------- ------- 

My expected output is:

 ------- ------- ------- 
|   ATM |   ITM |   OTM |
|------- ------- -------|
| 40800 | 40700 | 40900 |
|       | 40600 | 41000 |
|       | 40500 | 41100 |
 ------- ------- ------- 

This is what I tried:

print(tabulate(pd.DataFrame(bn_strikes).drop_duplicates(subset=["ATM"]), 
               headers='keys', tablefmt='psql', showindex=False))

And the output does not match what I want:

 ------- ------- ------- 
|   ATM |   ITM |   OTM |
|------- ------- -------|
| 40800 | 40700 | 40900 |
 ------- ------- ------- 

CodePudding user response:

i have two solutions. First:

import pandas as pd
df=pd.DataFrame(bn_strikes)
df.loc[df.duplicated(['ATM']), 'ATM'] = "" # you can fill with np.nan instead of ("")
print(df)
'''
    ATM     ITM     OTM
0   40800   40700   40900
1           40600   41000
2           40500   41100

'''

second(taller):

df=pd.json_normalize(bn_strikes)
df1=df[['ITM']].explode('ITM')
df2=df[['OTM']].explode('OTM')
final=pd.concat([df[['ATM']],df1,df2],axis=0).fillna(0) #fillna it with a number not in the dataset
for col in final:
    final[col] = final[col].sort_values(ignore_index=False,ascending=False)
import numpy as np
final=final.replace(0,np.nan).dropna(how='all').fillna("")
print(final)
'''
    ATM     ITM     OTM
0   40800   40700   41100
0           40600   41000
0           40500   40900

'''
  • Related