Insert values into specific columns of a pandas dataframe

Time:03-28

I use Python and pandas to analyze a large data set. I have several arrays of different lengths, and I need to insert their values into specific columns. If a column has no value for a given row, the cell should read 'Not defined'. Each input looks like one row of the dataframe, with values at different positions.
Examples of input data:

# Example 1
{'Water Solubility': 'Insoluble ', 'Melting Point': '135-138 °C', 'logP': '4.68'}

# Example 2
{'Melting Point': '71 °C (whole mAb)', 'Hydrophobicity': '-0.529', 'Isoelectric Point': '7.89', 'Molecular Weight': '51234.9', 'Molecular Formula': 'C2224H3475N621O698S36'}

# Example 3
{'Water Solubility': '1E 006 mg/L (at 25 °C)', 'Melting Point': '204-205 °C', 'logP': '1.1', 'pKa': '6.78'}

I have tried adding 'Not defined' to the arrays myself, but I couldn't find the right approach.

CodePudding user response:

I think the best way is to just create a dataframe for each dict, and then concat the dataframes.

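import pandas as pd

d_1 = {'Water Solubility': 'Insoluble ', 'Melting Point': '135-138 °C', 'logP': '4.68'}
d_2 = {'Melting Point': '71 °C (whole mAb)', 'Hydrophobicity': '-0.529', 'Isoelectric Point': '7.89', 'Molecular Weight': '51234.9', 'Molecular Formula': 'C2224H3475N621O698S36'}

# One single-row dataframe per dict; the dict keys become the columns
df_1 = pd.DataFrame([d_1])
df_2 = pd.DataFrame([d_2])

# concat aligns columns by name; keys missing from a dict come out as NaN,
# which is then replaced with the placeholder the question asks for
final_df = pd.concat([df_1, df_2], ignore_index=True).fillna('Not defined')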

You can wrap this in a function that takes a list of dicts and returns the final dataframe.
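A minimal sketch of such a function, following the per-dict concat approach described above. The name dicts_to_frame and the fill_value parameter are made up for illustration; they are not part of pandas.

```python
import pandas as pd

def dicts_to_frame(records, fill_value="Not defined"):
    """Build one dataframe from a list of dicts whose keys may differ."""
    # One single-row frame per dict
    frames = [pd.DataFrame([rec]) for rec in records]
    # concat aligns on column names; keys absent from a dict become NaN
    combined = pd.concat(frames, ignore_index=True)
    # Replace the NaN gaps with the placeholder
    return combined.fillna(fill_value)

d_1 = {'Water Solubility': 'Insoluble ', 'Melting Point': '135-138 °C', 'logP': '4.68'}
d_2 = {'Melting Point': '71 °C (whole mAb)', 'Hydrophobicity': '-0.529', 'Isoelectric Point': '7.89', 'Molecular Weight': '51234.9', 'Molecular Formula': 'C2224H3475N621O698S36'}

final_df = dicts_to_frame([d_1, d_2])
print(final_df.to_string())
```

Note that pd.concat raises a ValueError on an empty list, so the sketch assumes at least one dict is passed in.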

CodePudding user response:

This should do what you're asking:

import pandas as pd

# Example 1
ex1 = {'Water Solubility': 'Insoluble ', 'Melting Point': '135-138 °C', 'logP': '4.68'}

# Example 2
ex2 = {'Melting Point': '71 °C (whole mAb)', 'Hydrophobicity': '-0.529', 'Isoelectric Point': '7.89', 'Molecular Weight': '51234.9', 'Molecular Formula': 'C2224H3475N621O698S36'}

# Example 3
ex3 = {'Water Solubility': '1E 006 mg/L (at 25 °C)', 'Melting Point': '204-205 °C', 'logP': '1.1', 'pKa': '6.78'}


df = pd.DataFrame({
    'Boiling Point':[162-165, 'Not defined'],  # unquoted, 162-165 is evaluated as -3; write '162-165' to keep the range
    'Hydrophobicity':[-0.5227, -0.427],
    'Isoelectric Point':[9.02, 12.02],
    'Melting Point':[1000.0, 'Not defined'],
    'Molecular Formula':['C1970H3848N50O947S4', 'Not defined'],
    'Molecular Weight':[9.23, 7.13],
    'Radioactivity':['Practically insoluble', 'Not defined'],
    'Water Solubility':[1.23, 2.87],
    'caco2 Permeability':['63.6±55.0', 901],
    'logP':[14, 14],
    'logS':[0.618, 0.238],
    'pKa':['Not defined', 'Not defined']
})

df = pd.concat([df, pd.DataFrame([ex1, ex2, ex3])], ignore_index=True)
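# Keys missing from the appended dicts come out of concat as NaN; cast to object
# so that numeric columns can hold the string placeholder, then fill the gaps
df = df.astype(object).fillna('Not defined')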
print(df.to_string())

Output:

  Boiling Point Hydrophobicity Isoelectric Point      Melting Point      Molecular Formula Molecular Weight          Radioactivity        Water Solubility caco2 Permeability         logP         logS          pKa
0            -3        -0.5227              9.02             1000.0    C1970H3848N50O947S4             9.23  Practically insoluble                    1.23          63.6±55.0           14        0.618  Not defined
1   Not defined         -0.427             12.02        Not defined            Not defined             7.13            Not defined                    2.87                901           14        0.238  Not defined
2   Not defined    Not defined       Not defined         135-138 °C            Not defined      Not defined            Not defined              Insoluble         Not defined         4.68  Not defined  Not defined
3   Not defined         -0.529              7.89  71 °C (whole mAb)  C2224H3475N621O698S36          51234.9            Not defined             Not defined        Not defined  Not defined  Not defined  Not defined
4   Not defined    Not defined       Not defined         204-205 °C            Not defined      Not defined            Not defined  1E 006 mg/L (at 25 °C)        Not defined          1.1  Not defined         6.78
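If the target columns are known up front and there is no existing dataframe to append to, a variant of the same idea is to build the frame straight from the list of dicts and then reindex it to the fixed column list. This is only a sketch: the columns list below is copied from the dataframe in this answer, and df_new is an illustrative name.

```python
import pandas as pd

ex1 = {'Water Solubility': 'Insoluble ', 'Melting Point': '135-138 °C', 'logP': '4.68'}
ex2 = {'Melting Point': '71 °C (whole mAb)', 'Hydrophobicity': '-0.529', 'Isoelectric Point': '7.89', 'Molecular Weight': '51234.9', 'Molecular Formula': 'C2224H3475N621O698S36'}
ex3 = {'Water Solubility': '1E 006 mg/L (at 25 °C)', 'Melting Point': '204-205 °C', 'logP': '1.1', 'pKa': '6.78'}

# Fixed target layout (assumed here to match the dataframe above)
columns = ['Boiling Point', 'Hydrophobicity', 'Isoelectric Point', 'Melting Point',
           'Molecular Formula', 'Molecular Weight', 'Radioactivity', 'Water Solubility',
           'caco2 Permeability', 'logP', 'logS', 'pKa']

df_new = (
    pd.DataFrame([ex1, ex2, ex3])   # union of the dict keys, NaN where a key is absent
    .fillna('Not defined')          # fill gaps in columns that appear in some dict
    .reindex(columns=columns, fill_value='Not defined')  # add columns no dict has, in fixed order
)
print(df_new.to_string())
```

Filling before reindexing matters here: the fill_value of reindex only applies to the columns it newly creates, not to NaN values already present.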