How to split a value, assign a boolean to each part, and convert it into a dict in pandas

I am trying to split each value on the delimiter | and assign True to every resulting item.

ID   Condition
1    Null
2    NP
3    NP|KH
4    KH|PR|MM

output
ID   Condition
1    null
2    {"NP"=True}
3    {"NP"=True,"KH"=True}
4    {"KH"=True,"PR"=True,"MM"=True}

I am trying this code, but I am missing something:

for v in df.Condition:
    if not pd.isna(v):
        if not "|" in v:
            v={v:True}
        else:
            key= v.split("|")
            d=[]
            for i in range(0,len(key)):
                d.append({key[i]:True})

But this saves the result as a list of single-key dicts: [{"NP"=True},{"KH"=True}]

Can anyone please help me get the output in the right format?

CodePudding user response：

If my assumptions in the comment are correct:

mask = df.Condition.notnull()
result = df.loc[mask, 'Condition']\
           .str.split("|")\
           .apply(lambda cond: {term: True for term in cond})
#1                            {'NP': True}
#2                {'NP': True, 'KH': True}
#3    {'KH': True, 'PR': True, 'MM': True}

You can put the results back into the original dataframe:

df.loc[mask, 'Condition'] = result
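
For reference, here is a self-contained sketch of the approach above, run end to end. The sample frame is rebuilt by hand from the question, with numpy.nan standing in for the Null row; that substitution is an assumption about how the data is loaded.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question; np.nan stands in for "Null" (assumption)
df = pd.DataFrame({
    "ID": [1, 2, 3, 4],
    "Condition": [np.nan, "NP", "NP|KH", "KH|PR|MM"],
})

# Only rows that actually hold a value are converted
mask = df["Condition"].notna()

# Split on "|" and map every resulting term to True
result = (
    df.loc[mask, "Condition"]
      .str.split("|")
      .apply(lambda terms: {term: True for term in terms})
)

# Write the dicts back; rows outside the mask keep their NaN
df.loc[mask, "Condition"] = result

print(df)
```

Because the mask excludes missing rows before the split, the lambda only ever sees lists of strings and needs no NaN check of its own.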

CodePudding user response：

Why have you defined d as a list? That is what is wrong with your code. Define d as a dictionary, as follows, and run your code again. You will get the answer in the right format.

import pandas as pd

df = pd.read_csv("t.csv")
for v in df.Condition:
    if not pd.isna(v):
        if "|" not in v:
            # Single term: wrap it in a one-key dict
            v = {v: True}
            print(v)
        else:
            key = v.split("|")
            # Use a dict, not a list, so all terms end up in one mapping
            d = {}
            for k in key:
                d[k] = True
            print(d)

The second problem is in the data frame itself. pandas reads Null as the string "Null", not as a missing value, so pd.isna will not catch it and the result will be wrong. Either leave that cell blank in the file you are reading or, if you are creating the df manually, set it to numpy.nan.
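
To illustrate the point about Null, here is a small sketch that reads the question's data from an in-memory CSV. The inline CSV text is a hypothetical stand-in for the real file, and passing na_values is an assumption on my part: it is one way to tell read_csv to treat the literal text Null as missing, as an alternative to blanking the cells in the file.

```python
import io

import pandas as pd

# Inline stand-in for the CSV file (hypothetical contents based on the question)
csv_text = "ID,Condition\n1,Null\n2,NP\n3,NP|KH\n4,KH|PR|MM\n"

# Read with default settings, then check how the first row was parsed
raw = pd.read_csv(io.StringIO(csv_text))
print(repr(raw.loc[0, "Condition"]), pd.isna(raw.loc[0, "Condition"]))

# Declare "Null" as a missing-value marker so pd.isna() catches it
clean = pd.read_csv(io.StringIO(csv_text), na_values=["Null"])
print(clean["Condition"].isna().tolist())
```

Comparing the two printed lines shows whether the first row reaches the loop as a string or as a missing value.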

CodePudding user response：

I think something like this will work for you:

import numpy
import pandas


# Create some dummy data
df = pandas.DataFrame({'Condition': [numpy.nan, 'NP', 'NP|KH', 'KH|PR|MM']})

# Build one dict per row; NaN rows do not split into a list, so they yield {}
df.assign(Condition=df.apply(lambda row: {
    item: True
    for items in row.str.split('|')
    if isinstance(items, list)
    for item in items
}, axis=1))

Note that this results in an empty dict instead of a null for null items.

                              Condition
0                                    {}
1                          {'NP': True}
2              {'NP': True, 'KH': True}
3  {'KH': True, 'PR': True, 'MM': True}

If it is important to have NaNs instead of empty dicts, you could store the result above back in df and follow it with:

df.assign(Condition=df['Condition'].apply(lambda d: d if d else numpy.nan))
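
As an alternative sketch that avoids the empty-dict detour, the conversion can be done in one pass with a small helper that returns NaN for missing values. The helper name to_flags is hypothetical and not part of pandas.

```python
import numpy as np
import pandas as pd


def to_flags(value):
    """Turn 'A|B' into {'A': True, 'B': True}; pass missing values through as NaN."""
    if pd.isna(value):
        return np.nan
    return dict.fromkeys(value.split("|"), True)


# Same dummy data as above
df = pd.DataFrame({"Condition": [np.nan, "NP", "NP|KH", "KH|PR|MM"]})

# Series.map calls the helper on each cell, including the NaN one
df["Condition"] = df["Condition"].map(to_flags)
print(df)
```

Working on the column as a Series, rather than row by row with axis=1, keeps the helper a plain function of one string.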

CodePudding user response：

Use split and a findall regular expression:

import re

txt = """1    Null
2    NP
3    NP|KH
4    KH|PR|MM"""

for line in txt.split("\n"):
    # Group 1: the ID and trailing whitespace; group 2: the pipe-delimited condition
    matches = re.findall(r'([0-9]+\s+)([A-Za-z|]+)', line)
    for match in matches:
        terms = match[1].split("|")
        if terms == ["Null"]:
            # Leave the Null placeholder unchanged
            output = "Null"
        else:
            # Append "=True" to every term, including the last one
            output = ",".join(term + "=True" for term in terms)
        print(output)

output:

Null
NP=True
NP=True,KH=True
KH=True,PR=True,MM=True
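
If actual dict objects are wanted rather than a formatted string, the same text can be parsed without a regular expression. This is a minimal sketch, assuming every line has exactly an ID followed by whitespace and a condition; the name parsed and the choice of None for the Null row are illustrative.

```python
txt = """1    Null
2    NP
3    NP|KH
4    KH|PR|MM"""

parsed = {}
for line in txt.splitlines():
    # Split once on whitespace: the ID comes first, the condition second
    row_id, condition = line.split(None, 1)
    if condition == "Null":
        # Represent the Null placeholder as None (assumption)
        parsed[int(row_id)] = None
    else:
        parsed[int(row_id)] = {term: True for term in condition.split("|")}

print(parsed)
```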