Delete text and all newline characters between two words in Python

Time:09-20

I have the following text:

\nOUTPUTFORMAT \n  
\'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat\'\nLOCATION\n  
\'hdfs://nameservice1/user/hive/warehouse/dev_cmt.db/badge\'\nTBLPROPERTIES (\n  
\'spark.sql.create.version\'=\'2.4.0-cdh6.3.2\', \n  
\'spark.sql.sources.schema.numPartCols\'=\'1\', \n  \'spark.sql.sources.schema.numParts\'=\'1\'

I want to delete everything from the word LOCATION up to, but not including, TBLPROPERTIES. I have been trying to use a regex, but without success so far. The expected result is:

\nOUTPUTFORMAT \n  
\'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat\'\nTBLPROPERTIES (\n  
\'spark.sql.create.version\'=\'2.4.0-cdh6.3.2\', \n  
\'spark.sql.sources.schema.numPartCols\'=\'1\', \n  
\'spark.sql.sources.schema.numParts\'=\'1\'

Thanks in advance for your suggestions.

CodePudding user response:

import re

text = "\nOUTPUTFORMAT \n\'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat\'\nLOCATION\n\'hdfs://nameservice1/user/hive/warehouse/dev_cmt.db/badge\'\nTBLPROPERTIES (\n\'spark.sql.create.version\'=\'2.4.0-cdh6.3.2\', \n\'spark.sql.sources.schema.numPartCols\'=\'1\', \n\'spark.sql.sources.schema.numParts\'=\'1\'"
# re.DOTALL lets "." match newlines; the non-greedy ".*?" stops at the first TBLPROPERTIES.
text = re.sub(r'LOCATION.*?TBLPROPERTIES', 'TBLPROPERTIES', text, flags=re.DOTALL)
print(text)

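This should work for your sample: the re.DOTALL flag lets . match newline characters, so the pattern spans every line between LOCATION and TBLPROPERTIES, and the replacement string puts TBLPROPERTIES back.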
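If you need the same thing for other marker words, the idea can be wrapped in a small helper. This is only a sketch: the function name remove_between is made up for illustration, and it assumes the text contains real newline characters rather than a literal backslash followed by n. It uses a lookahead so the end marker is kept without having to be re-inserted.

```python
import re


def remove_between(text, start, end):
    """Delete everything from `start` up to, but not including, `end`.

    Hypothetical helper for illustration; not part of the `re` module.
    """
    # re.escape protects against regex metacharacters inside the markers.
    # ".*?" is non-greedy, so each match stops at the first `end` after `start`.
    # "(?=...)" is a lookahead: `end` must follow, but it is not consumed,
    # so it remains in the output.
    pattern = re.escape(start) + r".*?(?=" + re.escape(end) + r")"
    return re.sub(pattern, "", text, flags=re.DOTALL)


sample = (
    "\nOUTPUTFORMAT \n"
    "'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'\n"
    "LOCATION\n"
    "'hdfs://nameservice1/user/hive/warehouse/dev_cmt.db/badge'\n"
    "TBLPROPERTIES (\n"
    "'spark.sql.create.version'='2.4.0-cdh6.3.2'"
)

print(remove_between(sample, "LOCATION", "TBLPROPERTIES"))
```

If the start marker is missing, or no end marker follows it, the text is returned unchanged, because re.sub finds nothing to replace.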
