My data looks like this:
{569328: '[ 596005 4321416 5802640 6031690 6043910 8600475 8642629 9203255 9345445 10177065 10455451 13428248 22139349 22591458 24627241 24750476 26261826 26405611 27079105 27096884]',
574660: '[ 5956195 11260528 22181831 22437920 22642946 23278096 23407037 23458128 24244657 24355363 25014714 25115774 25156886 27047688 27089078 27398716]',
1187498: '[ 5855196 7755392 11183886 22894980 24648618 27185399]',
1226468: '[ 3573464 6279285 6294985 6542463 6981930 7427770 10325811 14970234 16878329 17935009 21811002 22329817 23543436 23907898 24456108 25283772]',
1236571: '[ 2777078 2826073 5944733 10484188 11052747 14682645 15688752 22333410 22614097 22646501 22783765 22978728 23231683 24259740 24605606 24839432 25492752 27009992 27044704]'}
As you can see, the values of the dict, which come from a column of my pandas DataFrame, are strings. However, I would like to convert them into proper lists of integers. My result should look like this:
{569328: [596005, 4321416,5802640,6031690,6043910,8600475,8642629,9203255,9345445, 10177065,10455451,13428248,22139349,22591458,24627241,24750476,26261826,26405611, 27079105,27096884],
574660: [5956195,11260528,22181831,22437920,22642946,23278096,23407037,23458128, 24244657,24355363,25014714,25115774,25156886,27047688,27089078,27398716],
...}
Thank you :)
CodePudding user response:
Use a dict comprehension with strip and split, then convert the items to integers. Because a value may be an empty list rather than a string, add an if-else expression:
d = {569328: '[ 596005 4321416 5802640 ]',
574660: '[ 5956195 ]',
1187498: []}
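# Strip the brackets, split on whitespace, and convert each token to int;
# empty values (such as the [] above) become an empty list.
d = {k: list(map(int, v.strip('[]').split())) if v else [] for k, v in d.items()}
print(d)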
{569328: [596005, 4321416, 5802640], 574660: [5956195], 1187498: []}
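The same strip-and-split idea can be wrapped in a small helper, which makes the handling of non-string values explicit. This is only a sketch: the name parse_int_list is hypothetical, and it assumes that any value that is not a string is already list-like.

```python
def parse_int_list(value):
    """Turn '[ 1 2 3]' into [1, 2, 3]; pass existing lists through."""
    if isinstance(value, str):
        # strip() removes outer whitespace, strip('[]') removes the brackets,
        # and split() with no argument splits on any run of whitespace.
        return [int(token) for token in value.strip().strip('[]').split()]
    # Assumption: anything that is not a string is already list-like.
    return list(value)


d = {569328: '[ 596005 4321416 5802640 ]',
     574660: '[ 5956195 ]',
     1187498: []}
result = {k: parse_int_list(v) for k, v in d.items()}
print(result)
```

If the strings live in a pandas column, the same helper could presumably be passed to the column's apply method.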
CodePudding user response:
You can use a regular expression that matches runs of digits and then convert the resulting list of digit strings into a list of integers:
import re
dictionary = {569328: '[ 596005 4321416 5802640 6031690 6043910 8600475 8642629 9203255 9345445 10177065 10455451 13428248 22139349 22591458 24627241 24750476 26261826 26405611 27079105 27096884]',
574660: '[ 5956195 11260528 22181831 22437920 22642946 23278096 23407037 23458128 24244657 24355363 25014714 25115774 25156886 27047688 27089078 27398716]',
1187498: '[ 5855196 7755392 11183886 22894980 24648618 27185399]',
1226468: '[ 3573464 6279285 6294985 6542463 6981930 7427770 10325811 14970234 16878329 17935009 21811002 22329817 23543436 23907898 24456108 25283772]',
1236571: '[ 2777078 2826073 5944733 10484188 11052747 14682645 15688752 22333410 22614097 22646501 22783765 22978728 23231683 24259740 24605606 24839432 25492752 27009992 27044704]'}
for key in dictionary:
    # r'\d+' matches each run of one or more digits in the string
    list_of_findings = re.findall(r'\d+', dictionary[key])
    dictionary[key] = list(map(int, list_of_findings))
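A variant of the regex approach that builds a new dict instead of overwriting the original, in case the raw strings are still needed. This is a sketch under assumptions: the names INT_PATTERN and extract_ints are made up for illustration, and the optional minus sign in the pattern is an addition for data that might contain negative numbers (the sample data does not).

```python
import re

# Optional minus sign followed by one or more digits.
INT_PATTERN = re.compile(r'-?\d+')


def extract_ints(text):
    """Return every integer found in text, in order of appearance."""
    return [int(match) for match in INT_PATTERN.findall(text)]


sample = {1187498: '[ 5855196 7755392 11183886 22894980 24648618 27185399]'}
converted = {key: extract_ints(value) for key, value in sample.items()}
print(converted)
```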
CodePudding user response:
Another possible solution, based on the following ideas:
- Using the data as a string.
- Using string manipulation and regex to remove ' and add the missing commas.
- Using eval to execute the manipulated string and get the result in x.
import re
text = """
{569328: '[ 596005 4321416 5802640 6031690 6043910 8600475 8642629 9203255 9345445 10177065 10455451 13428248 22139349 22591458 24627241 24750476 26261826 26405611 27079105 27096884]',
574660: '[ 5956195 11260528 22181831 22437920 22642946 23278096 23407037 23458128 24244657 24355363 25014714 25115774 25156886 27047688 27089078 27398716]',
1187498: '[ 5855196 7755392 11183886 22894980 24648618 27185399]',
1226468: '[ 3573464 6279285 6294985 6542463 6981930 7427770 10325811 14970234 16878329 17935009 21811002 22329817 23543436 23907898 24456108 25283772]',
1236571: '[ 2777078 2826073 5944733 10484188 11052747 14682645 15688752 22333410 22614097 22646501 22783765 22978728 23231683 24259740 24605606 24839432 25492752 27009992 27044704]'}
"""
s = re.sub(r'(?<=\d)\s+(?=\d)', ',', text.replace("'", ""))
x = eval(s)
x
Output:
{569328: [596005, 4321416, 5802640, 6031690, 6043910, 8600475, 8642629, 9203255, 9345445, 10177065, 10455451, 13428248, 22139349, 22591458, 24627241, 24750476, 26261826, 26405611, 27079105, 27096884],
574660: [5956195, 11260528, 22181831, 22437920, 22642946, 23278096, 23407037, 23458128, 24244657, 24355363, 25014714, 25115774, 25156886, 27047688, 27089078, 27398716],
1187498: [5855196, 7755392, 11183886, 22894980, 24648618, 27185399],
1226468: [3573464, 6279285, 6294985, 6542463, 6981930, 7427770, 10325811, 14970234, 16878329, 17935009, 21811002, 22329817, 23543436, 23907898, 24456108, 25283772],
1236571: [2777078, 2826073, 5944733, 10484188, 11052747, 14682645, 15688752, 22333410, 22614097, 22646501, 22783765, 22978728, 23231683, 24259740, 24605606, 24839432, 25492752, 27009992, 27044704]}
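If the text might come from an untrusted source, ast.literal_eval from the standard library can stand in for eval in the approach above, since it only accepts Python literals and does not execute arbitrary expressions. A minimal sketch, using a shortened, made-up sample string in place of the full data:

```python
import ast
import re

# Shortened sample in the same shape as the original text.
text = "{1187498: '[ 5855196 7755392 11183886]', 574660: '[ 5956195 ]'}"

# Remove the quotes, then replace whitespace between two digits with a comma.
s = re.sub(r'(?<=\d)\s+(?=\d)', ',', text.replace("'", ""))

# literal_eval parses dicts, lists, and numbers, but rejects function calls.
x = ast.literal_eval(s)
print(x)
```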