I am trying to get the 1st (index 0) and 2nd (index 1) strings of each tuple in a DataFrame column.
import pandas as pd

df = {'col': ["[('affect', 'the risks')]", "[('have', 'we'), ('breached', 'our systems'), ('cease', 'our computer'), ('suffer', ''), ('allow', ''), ('damage', 'misappropriation'), ('require', 'proprietary'), ('incur', 'us'), ('remediate', 'us'), ('resolve', 'us')]"]}
df = pd.DataFrame(df)
The expected output for item0 and item1 is:
expected = {
    'item0': ["'affect'", "'have', 'breached', 'cease', 'suffer', 'allow', 'damage', 'require', 'incur', 'remediate', 'resolve'"],
    'item1': ["'the risks'", "'we', 'our systems', 'our computer', '', '', 'misappropriation', 'proprietary', 'us', 'us', 'us'"],
}
expected_df = pd.DataFrame(expected)
I think the zip() function should help, but I couldn't figure out how to apply it because my data is in a DataFrame.
Resources I went through: (1) https://docs.python.org/3/tutorial/datastructures.html#nested-list-comprehensions; (2) "Python - List comprehension list of tuple list to tuple list".
Answer:
If you actually have a list of tuples, like this:
data = {
"col": [
("affect", "the risks"),
("have", "we"),
("breached", "our systems"),
("cease", "our computer"),
("suffer", ""),
("allow", ""),
("damage", "misappropriation"),
("require", "proprietary"),
("incur", "us"),
("remediate", "us"),
("resolve", "us"),
],
}
Then extracting things the way you want is relatively easy:
item0 = [x[0] for x in data["col"]]
item1 = [x[1] for x in data["col"]]
print("item0:", item0)
print("item1:", item1)
That gets us:
item0: ['affect', 'have', 'breached', 'cease', 'suffer', 'allow', 'damage', 'require', 'incur', 'remediate', 'resolve']
item1: ['the risks', 'we', 'our systems', 'our computer', '', '', 'misappropriation', 'proprietary', 'us', 'us', 'us']
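Since the question mentions zip(): unpacking the list with * and passing it to zip() transposes the pairs, so both sequences come out in one step. A minimal sketch using the same list of tuples as above; note that zip() produces tuples rather than lists, and the unpacking into two names would fail on an empty list.

```python
data = {
    "col": [
        ("affect", "the risks"),
        ("have", "we"),
        ("breached", "our systems"),
        ("cease", "our computer"),
        ("suffer", ""),
        ("allow", ""),
        ("damage", "misappropriation"),
        ("require", "proprietary"),
        ("incur", "us"),
        ("remediate", "us"),
        ("resolve", "us"),
    ],
}

# zip(*pairs) groups every first element together and every second element together
item0, item1 = zip(*data["col"])
print("item0:", list(item0))
print("item1:", list(item1))
```

This prints the same two lists as the list-comprehension version.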
Unfortunately, you don't have a list of tuples, and it's not clear from your question if that's just a typo or if you have simply mis-described your data. When you write:
df = {
"col": [
"[('affect', 'the risks')]",
"[('have', 'we'), ('breached', 'our systems'), ('cease', 'our computer'), ('suffer', ''), ('allow', ''), ('damage', 'misappropriation'), ('require', 'proprietary'), ('incur', 'us'), ('remediate', 'us'), ('resolve', 'us')]",
]
}
You have a list of two strings. The first is:
>>> df['col'][0]
"[('affect', 'the risks')]"
And the second is:
>>> df['col'][1]
"[('have', 'we'), ('breached', 'our systems'), ('cease', 'our computer'), ('suffer', ''), ('allow', ''), ('damage', 'misappropriation'), ('require', 'proprietary'), ('incur', 'us'), ('remediate', 'us'), ('resolve', 'us')]"
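Because each value is a str, indexing into it returns single characters, not tuples, which is why x[0] and x[1] don't behave as expected here. A minimal check using the first string from the question:

```python
df = {"col": ["[('affect', 'the risks')]"]}

first = df["col"][0]
print(type(first))  # <class 'str'>
print(first[0])     # [  (the first character of the string, not a tuple)
```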
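Processing these strings is trickier: each one first has to be parsed back into a Python list of tuples (for example with ast.literal_eval from the standard library) before you can pick out the elements. Things will be much easier if you can arrange for your data to be formatted as a list of tuples instead.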
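If changing the data upstream isn't possible, one approach is to parse each cell with ast.literal_eval and then pull out the first and second elements of each row's tuples. This is a sketch that assumes every cell is a valid Python literal for a list of 2-tuples; the column names item0 and item1 come from the question, and the final join step only exists to reproduce the quoted, comma-separated strings in the expected output.

```python
import ast

import pandas as pd

df = pd.DataFrame({
    "col": [
        "[('affect', 'the risks')]",
        "[('have', 'we'), ('breached', 'our systems'), ('cease', 'our computer'), "
        "('suffer', ''), ('allow', ''), ('damage', 'misappropriation'), "
        "('require', 'proprietary'), ('incur', 'us'), ('remediate', 'us'), ('resolve', 'us')]",
    ]
})

# Turn each string back into a real list of tuples (literal_eval only accepts literals, not arbitrary code).
parsed = df["col"].apply(ast.literal_eval)

# For each row, collect the first and the second element of every tuple.
df["item0"] = parsed.apply(lambda pairs: [p[0] for p in pairs])
df["item1"] = parsed.apply(lambda pairs: [p[1] for p in pairs])

# Optional: render each list as "'a', 'b', ..." to match the expected output in the question.
out = pd.DataFrame({
    "item0": df["item0"].apply(lambda xs: ", ".join(repr(x) for x in xs)),
    "item1": df["item1"].apply(lambda xs: ", ".join(repr(x) for x in xs)),
})
print(out)
```

Keeping item0 and item1 as lists (the df columns) is usually more convenient for further processing; the joined strings in out are only useful if you really need that exact text.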