How to get the nth string in a list of tuples with a list comprehension?

I am trying to get the 1st (index 0) and 2nd (index 1) strings of each tuple in a DataFrame column.

import pandas as pd

df = {'col': ["[('affect', 'the risks')]", "[('have', 'we'), ('breached', 'our systems'), ('cease', 'our computer'), ('suffer', ''), ('allow', ''), ('damage', 'misappropriation'), ('require', 'proprietary'), ('incur', 'us'), ('remediate', 'us'), ('resolve', 'us')]"]}
df = pd.DataFrame(df)

The expected output for item0 and item1 is:

df = {
    'item0': ["'affect'", "'have', 'breached', 'cease', 'suffer', 'allow', 'damage', 'require', 'incur', 'remediate', 'resolve'"],
    'item1': ["'the risks'", "'we', 'our systems', 'our computer', '', '', 'misappropriation', 'proprietary', 'us', 'us', 'us'"],
}
df = pd.DataFrame(df)

I think I should use the zip() function, but I couldn't figure out how, because my data is in a DataFrame.

Resources I went through:
1) https://docs.python.org/3/tutorial/datastructures.html#nested-list-comprehensions
2) Python - List comprehension list of tuple list to tuple list

Answer:

If you actually have a list of tuples, like this:

data = {
    "col": [
        ("affect", "the risks"),
        ("have", "we"),
        ("breached", "our systems"),
        ("cease", "our computer"),
        ("suffer", ""),
        ("allow", ""),
        ("damage", "misappropriation"),
        ("require", "proprietary"),
        ("incur", "us"),
        ("remediate", "us"),
        ("resolve", "us"),
    ],
}

Then extracting things the way you want is relatively easy:

item0 = [x[0] for x in data["col"]]
item1 = [x[1] for x in data["col"]]

print("item0:", item0)
print("item1:", item1)

That gets us:

item0: ['affect', 'have', 'breached', 'cease', 'suffer', 'allow', 'damage', 'require', 'incur', 'remediate', 'resolve']
item1: ['the risks', 'we', 'our systems', 'our computer', '', '', 'misappropriation', 'proprietary', 'us', 'us', 'us']
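Since the question mentions zip(): instead of two separate comprehensions, zip(*...) can transpose the list of pairs into two sequences in one step. A minimal sketch, using a shortened copy of the data above so the snippet runs on its own:

```python
# Shortened copy of the list-of-tuples data from above.
data = {
    "col": [
        ("affect", "the risks"),
        ("have", "we"),
        ("breached", "our systems"),
    ],
}

# zip(*pairs) passes each tuple to zip as a separate argument, so all the
# first elements are grouped together, then all the second elements.
item0, item1 = zip(*data["col"])

# zip yields tuples; convert to lists to match the comprehension output.
item0, item1 = list(item0), list(item1)

print("item0:", item0)
print("item1:", item1)
```

One caveat: if the list is empty, zip(*[]) produces nothing and the two-name unpacking raises ValueError, so the plain comprehensions are the safer choice when a row may have no pairs.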

Unfortunately, you don't have a list of tuples, and it's not clear from your question whether that's just a typo or whether you have simply misdescribed your data. When you write:

df = {
    "col": [
        "[('affect', 'the risks')]",
        "[('have', 'we'), ('breached', 'our systems'), ('cease', 'our computer'), ('suffer', ''), ('allow', ''), ('damage', 'misappropriation'), ('require', 'proprietary'), ('incur', 'us'), ('remediate', 'us'), ('resolve', 'us')]",
    ]
}

You have a list of two strings. The first is:

>>> df['col'][0]
"[('affect', 'the risks')]"

And the second is:

>>> df['col'][1]
"[('have', 'we'), ('breached', 'our systems'), ('cease', 'our computer'), ('suffer', ''), ('allow', ''), ('damage', 'misappropriation'), ('require', 'proprietary'), ('incur', 'us'), ('remediate', 'us'), ('resolve', 'us')]"
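To see why the comprehensions above don't work on this data: indexing a string gives back single characters, not tuple elements. A quick check using just the first string:

```python
df = {
    "col": [
        "[('affect', 'the risks')]",
    ]
}

first = df["col"][0]
print(type(first))  # <class 'str'>

# x[0] and x[1] on a string return its first two characters.
print(first[0])     # [
print(first[1])     # (
```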

Processing these is trickier, because each string has to be parsed back into a list of tuples before you can index into it. Things will be much easier if you can arrange for your data to be stored as a list of tuples in the first place.
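If you're stuck with the strings, one option, assuming every entry is a valid Python literal like the two shown, is to parse each one with ast.literal_eval from the standard library and then apply the same comprehensions. A minimal sketch; `rows`, `split_pairs`, and the `*_joined` names are illustrative, not from the question:

```python
import ast

rows = [
    "[('affect', 'the risks')]",
    "[('have', 'we'), ('breached', 'our systems'), ('cease', 'our computer'), ('suffer', ''), ('allow', ''), ('damage', 'misappropriation'), ('require', 'proprietary'), ('incur', 'us'), ('remediate', 'us'), ('resolve', 'us')]",
]


def split_pairs(text):
    """Parse a string like "[('a', 'b'), ...]" and return (firsts, seconds)."""
    # literal_eval only evaluates Python literals (lists, tuples, strings, ...),
    # so unlike eval() it will not run arbitrary code.
    pairs = ast.literal_eval(text)
    firsts = [p[0] for p in pairs]
    seconds = [p[1] for p in pairs]
    return firsts, seconds


item0 = []
item1 = []
for text in rows:
    firsts, seconds = split_pairs(text)
    item0.append(firsts)
    item1.append(seconds)

# The question's expected output is one comma-separated string per row with
# each item quoted; repr() adds the quotes (double quotes if an item itself
# contains a single quote).
item0_joined = [", ".join(repr(s) for s in firsts) for firsts in item0]
item1_joined = [", ".join(repr(s) for s in seconds) for seconds in item1]

print(item0_joined)
print(item1_joined)
```

For the DataFrame in the question, the parsing step can be applied to the whole column with df['col'].apply(ast.literal_eval), which gives a Series of lists of tuples that the comprehensions can then work on. literal_eval raises ValueError or SyntaxError on malformed input, so catch those if some rows might not parse.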