I have created a sample DataFrame that contains a column called 'Body'. Its content looks like this:
sample['Body'][1]
'['Former India captains should have shown the maturity to sort out the matter privately', 'When egos clash, the results are often disastrous. Ugly too. And the row tends to rumble on. That’s what has happened in the Virat Kohli-Sourav Ganguly spat. The fallout from the controversy continues to hog headlines with the news that Ganguly had planned to issue a show-cause notice to Kohli over his statements.', 'Ahead of the South Africa tour, at an online press conference, Kohli denied Ganguly’s claims that he had tried to dissuade the former from quitting as T20 captain. That came on the heels of the Indian cricket board’s move to strip Kohli of captaincy in One-Day Internationals.', 'Kohli’s denial lit an inferno as Ganguly was deemed to have lied. While the Indian cricket board president didn’t clear the air, chief selector Chetan Sharma sprang to Ganguly’s defence, saying that everyone at the selection committee meeting had asked Kohli to reconsider his decision. That runs counter to Kohli’s assertion that the Board for Control of Cricket in India (BCCI) received the news well and even called it a “progressive step”.', 'Somebody is not telling the truth. But there’s no point in dredging the matter since Kohli has relinquished the Indian captaincy in all formats. Amid the ruins of the Test series loss to South Africa comes the news that Ganguly had to be persuaded against issuing a show-cause notice to Kohli, reports said. Fortunately for Indian cricket, better sense prevailed.', 'Yet I wonder why two individuals in responsible positions didn’t show the maturity to settle the matter amicably. If Ganguly had indeed spoken to Kohli on T20 captaincy, there was no point going public since it wouldn’t have prompted Kohli to reverse his decision.', 'Similarly, Kohli shouldn’t have tarred Ganguly publicly, even if the Board president didn’t discuss the T20 captaincy. 
He should have spoken to Ganguly privately, and the matter would have rested there.', 'Instead, the former India captains chose to go at each other in full public view. It’s nothing but an ego clash. And the fallout was certainly undesirable. It undoubtedly drove Kohli to relinquish the Test captaincy, which to me, was entirely avoidable.', 'In the end, it looked as if Kohli was pressured into giving up the leadership role. If that’s true, it’s absolutely reprehensible. That’s no way to treat a skipper who helped transform India into the dominant force in cricket for the last few years.', 'Well, nobody is irreplaceable. Ganguly would know that since he was unceremoniously dumped as captain after he had helped turn India into world-beaters. Having been at the receiving end of an unsavoury saga, Ganguly shouldn’t have allowed a similar fate to befall on Kohli, who was told of his ouster as ODI captain only 40 minutes before the selection meeting.', 'But then, that’s politics. Cricket could do well without that. I hope we have heard the last of the Kohli-Ganguly spat. It benefits no one. And Indian cricket would be the loser.', '@ShyamKris_', 'Shyam A. Krishna is Senior Associate Editor at Gulf News. He writes on health and sport.', '', 'GetBreaking NewsAlerts From Gulf News', 'We’ll send you latest news updates through the day. You can manage them any time by clicking on the notification icon.', 'Dear Reader,', 'This section is aboutLiving in UAEand essential information you cannot live without.', 'Register to read and get full access to gulfnews.com', "By clicking below to sign up, you're agreeing to ourTerms of UseandPrivacy Policy", 'Forgot password', 'or']'
I want to remove the list formatting and convert the column to plain text for preprocessing (that is, remove the commas between sentences and the square brackets, leaving only the plain news text). I am using the code below, but it still gives me the output in list format. I can't work out what is wrong.
print(''.join(sample.Body[1]))
['Former India captains should have shown the maturity to sort out the matter privately', 'When egos clash, the results are often disastrous. Ugly too. And the row tends to rumble on. That’s what has happened in the Virat Kohli-Sourav Ganguly spat. The fallout from the controversy continues to hog headlines with the news that Ganguly had planned to issue a show-cause notice to Kohli over his statements.', 'Ahead of the South Africa tour, at an online press conference, Kohli denied Ganguly’s claims that he had tried to dissuade the former from quitting as T20 captain. That came on the heels of the Indian cricket board’s move to strip Kohli of captaincy in One-Day Internationals.', 'Kohli’s denial lit an inferno as Ganguly was deemed to have lied. While the Indian cricket board president didn’t clear the air, chief selector Chetan Sharma sprang to Ganguly’s defence, saying that everyone at the selection committee meeting had asked Kohli to reconsider his decision. That runs counter to Kohli’s assertion that the Board for Control of Cricket in India (BCCI) received the news well and even called it a “progressive step”.', 'Somebody is not telling the truth. But there’s no point in dredging the matter since Kohli has relinquished the Indian captaincy in all formats. Amid the ruins of the Test series loss to South Africa comes the news that Ganguly had to be persuaded against issuing a show-cause notice to Kohli, reports said. Fortunately for Indian cricket, better sense prevailed.', 'Yet I wonder why two individuals in responsible positions didn’t show the maturity to settle the matter amicably. If Ganguly had indeed spoken to Kohli on T20 captaincy, there was no point going public since it wouldn’t have prompted Kohli to reverse his decision.', 'Similarly, Kohli shouldn’t have tarred Ganguly publicly, even if the Board president didn’t discuss the T20 captaincy. 
He should have spoken to Ganguly privately, and the matter would have rested there.', 'Instead, the former India captains chose to go at each other in full public view. It’s nothing but an ego clash. And the fallout was certainly undesirable. It undoubtedly drove Kohli to relinquish the Test captaincy, which to me, was entirely avoidable.', 'In the end, it looked as if Kohli was pressured into giving up the leadership role. If that’s true, it’s absolutely reprehensible. That’s no way to treat a skipper who helped transform India into the dominant force in cricket for the last few years.', 'Well, nobody is irreplaceable. Ganguly would know that since he was unceremoniously dumped as captain after he had helped turn India into world-beaters. Having been at the receiving end of an unsavoury saga, Ganguly shouldn’t have allowed a similar fate to befall on Kohli, who was told of his ouster as ODI captain only 40 minutes before the selection meeting.', 'But then, that’s politics. Cricket could do well without that. I hope we have heard the last of the Kohli-Ganguly spat. It benefits no one. And Indian cricket would be the loser.', '@ShyamKris_', 'Shyam A. Krishna is Senior Associate Editor at Gulf News. He writes on health and sport.', '', 'GetBreaking NewsAlerts From Gulf News', 'We’ll send you latest news updates through the day. You can manage them any time by clicking on the notification icon.', 'Dear Reader,', 'This section is aboutLiving in UAEand essential information you cannot live without.', 'Register to read and get full access to gulfnews.com', "By clicking below to sign up, you're agreeing to ourTerms of UseandPrivacy Policy", 'Forgot password', 'or']
CodePudding user response:
I am not sure what the type of the object is.
If you can convert it to a string, you can try this simple solution:
test_string = """'['Former India captains should have shown the maturity to sort out the matter privately', 'When egos clash, the results are often disastrous. Ugly too. And the row tends to rumble on. That’s what has happened in the Virat Kohli-Sourav Ganguly spat. The fallout from the controversy continues to hog headlines with the news that Ganguly had planned to issue a show-cause notice to Kohli over his statements.', 'Ahead of the South Africa tour, at an online press conference, Kohli denied Ganguly’s claims that he had tried to dissuade the former from quitting as T20 captain. That came on the heels of the Indian cricket board’s move to strip Kohli of captaincy in One-Day Internationals.', 'Kohli’s denial lit an inferno as Ganguly was deemed to have lied. While the Indian cricket board president didn’t clear the air, chief selector Chetan Sharma sprang to Ganguly’s defence, saying that everyone at the selection committee meeting had asked Kohli to reconsider his decision. That runs counter to Kohli’s assertion that the Board for Control of Cricket in India (BCCI) received the news well and even called it a “progressive step”.', 'Somebody is not telling the truth. But there’s no point in dredging the matter since Kohli has relinquished the Indian captaincy in all formats. Amid the ruins of the Test series loss to South Africa comes the news that Ganguly had to be persuaded against issuing a show-cause notice to Kohli, reports said. Fortunately for Indian cricket, better sense prevailed.', 'Yet I wonder why two individuals in responsible positions didn’t show the maturity to settle the matter amicably. If Ganguly had indeed spoken to Kohli on T20 captaincy, there was no point going public since it wouldn’t have prompted Kohli to reverse his decision.', 'Similarly, Kohli shouldn’t have tarred Ganguly publicly, even if the Board president didn’t discuss the T20 captaincy. 
He should have spoken to Ganguly privately, and the matter would have rested there.', 'Instead, the former India captains chose to go at each other in full public view. It’s nothing but an ego clash. And the fallout was certainly undesirable. It undoubtedly drove Kohli to relinquish the Test captaincy, which to me, was entirely avoidable.', 'In the end, it looked as if Kohli was pressured into giving up the leadership role. If that’s true, it’s absolutely reprehensible. That’s no way to treat a skipper who helped transform India into the dominant force in cricket for the last few years.', 'Well, nobody is irreplaceable. Ganguly would know that since he was unceremoniously dumped as captain after he had helped turn India into world-beaters. Having been at the receiving end of an unsavoury saga, Ganguly shouldn’t have allowed a similar fate to befall on Kohli, who was told of his ouster as ODI captain only 40 minutes before the selection meeting.', 'But then, that’s politics. Cricket could do well without that. I hope we have heard the last of the Kohli-Ganguly spat. It benefits no one. And Indian cricket would be the loser.', '@ShyamKris_', 'Shyam A. Krishna is Senior Associate Editor at Gulf News. He writes on health and sport.', '', 'GetBreaking NewsAlerts From Gulf News', 'We’ll send you latest news updates through the day. You can manage them any time by clicking on the notification icon.', 'Dear Reader,', 'This section is aboutLiving in UAEand essential information you cannot live without.', 'Register to read and get full access to gulfnews.com', "By clicking below to sign up, you're agreeing to ourTerms of UseandPrivacy Policy", 'Forgot password', 'or']'"""
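print(test_string)
print("After Edit:")
# Note: replace(",", "") strips every comma, including commas inside sentences,
# and the quotes around the individual sentences are left in place.
print(test_string.replace(",","").replace("'[","").replace("]'","").replace("'' ",""))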
I removed the extra single quotes on the outside, as well as an empty sentence, which goes slightly beyond what you asked for. You can add them back by modifying the replace calls.
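If the cell really is a string that merely looks like a list, that would also explain why ''.join(...) changes nothing: joining a string iterates over its characters and glues them back together. A minimal sketch of an alternative, assuming the cell is a valid Python list literal (the short `raw` value below is a hypothetical stand-in for one 'Body' cell, not the real data), is to parse it with ast.literal_eval first and then join the sentences:

```python
import ast

# Hypothetical stand-in for one cell of the 'Body' column:
# a *string* that looks like a list, not an actual list.
raw = "['First sentence, with a comma.', 'Second sentence.', '', 'Third.']"

# Joining a string with '' walks its characters and reassembles them,
# so the result is identical to the input.
unchanged = "".join(raw)

# Parse the string into a real Python list (assumes a valid list literal).
parts = ast.literal_eval(raw)

# Join the non-empty sentences with a space; commas inside sentences survive.
text = " ".join(p for p in parts if p)
print(text)
```

Unlike chained replace calls, this keeps the punctuation inside each sentence untouched. It will raise an error if the cell is not a well-formed list literal, so it may need a try/except around it for messy data.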
CodePudding user response:
Use the .str.join method of pandas.Series (this assumes each cell holds an actual list of strings). Consider the following example:
import pandas as pd
df = pd.DataFrame({'Body':[["Hello","World"],["This is","important","news"],["List with single element"]]})
df['BodyString'] = df.Body.str.join(" ")
print(df)
This gives the output:
                         Body                BodyString
0              [Hello, World]               Hello World
1  [This is, important, news]    This is important news
2  [List with single element]  List with single element
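If the column in the question stores each list as a string (the print output there suggests it might), .str.join would be applied to the characters of that string rather than to the sentences. A rough sketch that handles both cases, assuming string cells are valid Python list literals; the helper name `to_plain_text` and the column name 'BodyText' are made up for illustration:

```python
import ast
import pandas as pd

# Hypothetical sample: each cell is a string that looks like a list.
sample = pd.DataFrame({
    "Body": [
        "['Hello', 'World']",
        "['This is', '', 'important', 'news']",
    ]
})

def to_plain_text(cell):
    """Turn a list (or its string representation) into one plain-text string."""
    # Parse only when the cell is a string; real lists pass through unchanged.
    items = ast.literal_eval(cell) if isinstance(cell, str) else cell
    # Skip empty entries and join the rest with a single space.
    return " ".join(s for s in items if s)

sample["BodyText"] = sample["Body"].apply(to_plain_text)
print(sample["BodyText"].tolist())
```

Checking type(sample['Body'][1]) first should show which case applies; if it is already a list, the .str.join approach above is enough on its own.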