How to get rid of b' , all these \x00, \x** like things in bunch of strings in python 3.6?

Time:03-19

I have strings as below:

content =
"b'MAJOR CONRAD A. PREEDOM\\n2354 Fairchild Dr., Suite 6H-126\\nUSAF Academy, CO 
\\xe2\\x80\\x93 Present 160 Flight Hours/145 Instructor Pilot Hours in Diamond Star
 DA-40 (USAF T-52)\\n2004 \\xe2\\x80\\x93 2007\\n442 Hours/45 Flight Lead Hours in 
McDonnell Douglas F-15E Strike Eagle\\n2003 \\xe2\\x80\\x93 2004\\n19 Hours in 
Northrop AT-38B\\n2000 \\xe2\\x80\\x93 2003\\n1,311 Flight Hours/1051 Instructor 
Pilot Hours in Cessna T-37B\\n1999 \\xe2\\x80\\x93 2000\\n26 Flight Hours in Northrop
 T-38A\\n1995 PA \\xe2\\x80\\x93 1999\\nDistinguished Graduate, United States Air 
Force Academy, CO \\xe2\\x80\\x93 1998\\nOmega Rho Honor Society for Operations 
Research, United States Air Force Academy, CO \\xe2\\x80\\x93 1998\\nAIR FORCE AWARDS 
AND DECORATIONS\\nMeritorious Service Medal\\nAir Force Commendation Medal\\nAir 
Force Achievement Medal\\nAir Force Outstanding Unit Award\\nAir Force Organizational
 Excellence Award\\nCombat Readiness Medal\\nNational Defense Service Medal\\nGlobal
 War on Terrorism Service Medal\\nKorean Defense Service Medal\\nAF Longevity 
Service\\nSmall Arms Expert Marksmanship Ribbon (Pistol)\\nAF Training Ribbon'"

I want to get rid of the b' prefix and every \x escape followed by two hex digits, such as \xe2, \x80 and so on. I don't know how to do that. I tried

content.decode("utf-8", errors="ignore")

But because content is already a str, I can't decode it. So I tried the line below to turn it into bytes, drop the parts I don't want, and convert it back to a string, but it does not work at all.

new_content =content.encode("ascii").decode("utf-8", errors="ignore")

When I run the code below on a real bytes literal, the b' and \x** parts do go away, so I tried everything I could think of, but I do not know how to turn my string into a bytes object like this one. I can convert content to bytes, but that doesn't remove the unwanted parts.

b'\x80abc sadad dkfbkafaf /n   \n \x80dajhbahsdsabj'.decode("utf-8", errors="ignore")

Do you have any idea how I can strip the b' and all of the \x** escapes from my 'content'?

CodePudding user response:

You have a str value that contains the string representation of a bytes value, which itself is a UTF-8-encoded string. Use ast.literal_eval to get the actual bytes value, then decode it.

>>> import ast
>>> print(ast.literal_eval(content).decode())
MAJOR CONRAD A. PREEDOM
2354 Fairchild Dr., Suite 6H-126
USAF Academy, CO – Present 160 Flight Hours/145 Instructor Pilot Hours in Diamond Star DA-40 (USAF T-52)
2004 – 2007
[etc]
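
To see why the encode/decode round trip from the question leaves the text unchanged while ast.literal_eval works, here is a small self-contained sketch. The variable `sample` is a made-up, shortened stand-in for `content`, not the original data.

```python
import ast

# Hypothetical, shortened stand-in for `content`:
# a str that holds the repr of a bytes object.
sample = "b'USAF Academy, CO \\xe2\\x80\\x93 Present\\n2004 \\xe2\\x80\\x93 2007'"

# The escapes are ordinary characters here (backslash, x, e, 2), not raw bytes,
# so encoding to ASCII and decoding again returns exactly the same text.
round_trip = sample.encode("ascii").decode("utf-8", errors="ignore")

# literal_eval parses the text as a Python bytes literal and yields real bytes.
raw = ast.literal_eval(sample)

# Decoding those bytes as UTF-8 turns the sequence E2 80 93 into an en dash.
text = raw.decode("utf-8")
print(text)
```

ast.literal_eval only accepts Python literals, so unlike eval it will not run arbitrary code found in the input.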

CodePudding user response:

I ran into a problem. When I tried the above solution, it worked for only a few of my str values; the others raise the error below.

Traceback (most recent call last):

  File "/home/hojun/anaconda3/envs/ai/lib/python3.6/site-packages/IPython/core/interactiveshell.py", line 3343, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)

  File "<ipython-input-36-60aff8098ca5>", line 1, in <module>
    ast.literal_eval(content)

  File "/home/hojun/anaconda3/envs/ai/lib/python3.6/ast.py", line 48, in literal_eval
    node_or_string = parse(node_or_string, mode='eval')

  File "/home/hojun/anaconda3/envs/ai/lib/python3.6/ast.py", line 35, in parse
    return compile(source, filename, mode, PyCF_ONLY_AST)

  File "<unknown>", line 1
    b'HUMAN RESOURCES DIRECTOR\n\xef\x82\xb7Expert in organizational \effectiveness.\n\xef\x82\xb7Recognized consensus-builder among diverse\ groups.\n\xef\x82\xb7Innovative problem solver.\n\xef\x82\xb7Strategic \partner.\n\xef\x82\xb7Effective executive \coach.\n\xef\x82\xb7Facilitator of management/staff collaboration to\ achieve business goals.\n\xef\x82\xb7Watchdog against corporate legal\ liability and exposure.\nVALUE PROPOSITION\nBringing balance and \simplicity to Human Resources processes, I create a spectrum of human\ resources support for worldwide imaging device division\nproducing \revenues approaching $1 billion. Manage all legal and compliance \issues; perform\nexecutive-level consulting in organizational \development; coaching; results-oriented training,\ndevelopment, and\ implementation; and strategic planning.\nKey \Accomplishments\n\xef\x82\xb7Improved employee satisfaction 20% by \implementing division-wide 360-degree\nfeedback process to identify and \correct problem areas.\n\xef\x82\xb7Developed and instituted innovative\ staffing plan that reduced turnaround time (from\ntime-to-\post and time-to-fill) by 25%.\n\xef\x82\xb7Developed policies that \addressed positions.\n\xef\x82\xb7Provided executive coaching to Vice\ President of Engineering to identify leaders and\nstructure the \department to improve organizational \effectiveness.\n\xef\x82\xb7Advised managers on fair hiring practices\ and employee performance issues to reduce\ncorporate \liability.\nCONTINENTAL COMPUTER CORPORATION (acquired by XCom, 1998),\ Shrewsbury, MA\n1983-1998\nSenior Human Resources Manager, Worldwide \Sales and Marketing Division Headquarters\nHeld positions of increasing\ success.\n\xef\x82\xb7Implemented an Alternative Dispute \Resolution (ADR) program with anticipated\nsavings of millions of \dollars in litigation costs.\n\xef\x82\xb7Provided leadership in \XCom/Continental acquisition by identifying acquisition , Metropolitan Mediation Services, Cambridge, 
MA\nMBA Executive Program, Babson College, Babson Park, MA\nBA Communications, Speech, and English, State University College of New York at Buffalo'
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                ^
SyntaxError: invalid syntax

I have no clue why that is. The str that raises the error looks just like the other str values that worked fine.
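
The traceback alone does not pin down the cause. One visible difference is that the literal in the error message appears to be split across two physical lines (after "Cambridge, "), which a single-quoted bytes literal cannot contain; whether that is the actual cause here is an assumption. A minimal fallback sketch under that assumption: try ast.literal_eval first and, if parsing fails, strip the b'...' wrapper and convert the \xNN and \n escapes by hand. The helper name `bytes_repr_to_text` and both sample strings are hypothetical.

```python
import ast
import re

# Matches a literal backslash, "x", and two hex digits, e.g. \xe2
_HEX_ESCAPE = re.compile(rb"\\x([0-9a-fA-F]{2})")

def bytes_repr_to_text(s: str) -> str:
    """Hypothetical helper: turn the repr of a bytes object back into text."""
    try:
        raw = ast.literal_eval(s)
        if isinstance(raw, bytes):
            return raw.decode("utf-8", errors="replace")
    except (SyntaxError, ValueError):
        pass  # not a valid Python literal; fall back to manual conversion

    body = s
    # Drop the b'...' or b"..." wrapper if it is present.
    if len(body) >= 3 and body[0] == "b" and body[1] in "'\"" and body[-1] == body[1]:
        body = body[2:-1]

    data = body.encode("utf-8")
    # Replace each \xNN escape with the single byte it stands for.
    data = _HEX_ESCAPE.sub(lambda m: bytes([int(m.group(1), 16)]), data)
    # Replace the two-character escape \n with a real newline.
    data = data.replace(b"\\n", b"\n")
    return data.decode("utf-8", errors="replace")

# Made-up inputs: one valid literal, one with a real line break inside the quotes.
good = "b'CO \\xe2\\x80\\x93 1998\\nAIR FORCE'"
bad = "b'Cambridge, \nMA\\n\\xef\\x82\\xb7Expert'"
print(bytes_repr_to_text(good))
print(bytes_repr_to_text(bad))
```

The manual path only handles \xNN and \n; other escapes, including the stray backslashes seen in the failing string (for example "\effectiveness"), are left as they are, so the result may still need cleaning.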
