How to turn string into dictionary with conditionals?-CodePudding

I have a dataframe (very large, millions of rows). Here how it looks:

id     value
a1     0:0,1:10,2:0,3:0,4:7
b4     0:5,1:0,2:0,3:0,4:1
c5     0:0,1:3,2:2,3:0,4:0
k2     0:0,1:2,2:0,3:4,4:0

I want to turn those strings into dictionary, but only those key value pairs, where there is no 0. So desired result is:

id          value
a1       {1:10, 4:7}
b4       {4:1}
c5       {1:3, 2:2}
k2       {1:2}

How to do that? when I try to use dict() function but it brings KeyError: 0:

df["value"] = dict(df["value"])

So I have problems with turning it into dictionary in the first place

I also have tried this:

df["value"] = json.loads(df["value"])

but it brings same error

CodePudding user response：

This could do the trick, simply using list comprehensions:

import pandas as pd

dt = pd.DataFrame({"id":["a1", "b4", "c5", "k2"], 
                   "value":["0:0,1:10,2:0,3:0,4:7","0:5,1:0,2:0,3:0,4:1","0:0,1:3,2:2,3:0,4:0","0:0,1:2,2:0,3:4,4:0"]})


def to_dict1(s):
    return [dict([map(int, y.split(":")) for y in x.split(",") if "0" not in y.split(":")]) for x in s]

            
dt["dict"] = to_dict1(dt["value"])

Another way to obtain the same result would be using regular expressions (the pattern (?!0{1})(\d) matches any number but a single 0):

import re

def to_dict2(s):
    return [dict([map(int, y) for y in re.findall("(?!0{1})(\d):(?!0{1})(\d )", x)]) for x in s]

In terms of performance, to_dict1 is almost 20% faster, according to my tests.

CodePudding user response：

This code will make a result you want. I made a sample input as you provided, and printed an expected result at the end.

import pandas as pd

df = pd.DataFrame(
    {
        'id': ['a1', 'b4', 'c5', 'k2'],
        'value': ['0:0,1:10,2:0,3:0,4:7', '0:5,1:0,2:0,3:0,4:1', '0:0,1:3,2:2,3:0,4:0', '0:0,1:2,2:0,3:4,4:0']
    }
)

value = []  # temporal value to save only key, value pairs without 0
for i, row in df.iterrows():
    pairs = row['value'].split(',')
    d = dict()
    for pair in pairs:
        k, v = pair.split(':')
        k = int(k)
        v = int(v)
        if (k != 0) and (v != 0):
            d[k] = v
    value.append(d)

df['value'] = pd.Series(value)

print(df)

#   id          value
#0  a1  {1: 10, 4: 7}
#1  b4         {4: 1}
#2  c5   {1: 3, 2: 2}
#3  k2   {1: 2, 3: 4}

CodePudding user response：

def make_dict(row):
   """ Requires string list of shape 
       ["0":"0", "1":"10", ...]"""
   return {key: val for key, val 
           in map(lambda x: map(int, x.split(":")), row) 
           if key != 0 and val != 0}

df["value"] = df.value.str.split(",").apply(make_dict)

CodePudding user response：

This is how I would do it:

def string_to_dict(s):
    d = {}
    pairs = s.split(',') # get each key pair
    for pair in pairs:
        key, value = pair.split(':') # split key from value
        if int(value): # skip the pairs with zero value
            d[key] = value
    return d

df['value'] = df['value'].apply(string_to_dict)

CodePudding user response：

use a dictionary comprehension to exclude key or value items equal to zero

txt="""id     value
a1     0:0,1:10,2:0,3:0,4:7
b4     0:5,1:0,2:0,3:0,4:1
c5     0:0,1:3,2:2,3:0,4:0
k2     0:0,1:2,2:0,3:4,4:0 """

df = pd.DataFrame({"id":["a1", "b4", "c5", "k2"], 
               "value":["0:0,1:10,2:0,3:0,4:7","0:5,1:0,2:0,3:0,4:1","0:0,1:3,2:2,3:0,4:0","0:0,1:2,2:0,3:4,4:0"]})

for key,row in df.iterrows():
    results=[]
    {results.append({int(k),int(v)}) if int(k)!=0 and int(v)!=0 else None for k,v in (x.split(':') for x in row['value'].split(','))}
    df.loc[key,'value']=results

 print(df)

output:

   id              value
0  a1  [{1, 10}, {4, 7}]
1  b4           [{1, 4}]
2  c5      [{1, 3}, {2}]
3  k2   [{1, 2}, {3, 4}]