I have the following text:
\nOUTPUTFORMAT \n
\'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat\'\nLOCATION\n
\'hdfs://nameservice1/user/hive/warehouse/dev_cmt.db/badge\'\nTBLPROPERTIES (\n
\'spark.sql.create.version\'=\'2.4.0-cdh6.3.2\', \n
\'spark.sql.sources.schema.numPartCols\'=\'1\', \n \'spark.sql.sources.schema.numParts\'=\'1\'
I want to delete everything from the word LOCATION up to (but not including) TBLPROPERTIES. I have been trying to use a regex, but without success so far. The result should look like this:
\nOUTPUTFORMAT \n
\'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat\'\nTBLPROPERTIES (\n
\'spark.sql.create.version\'=\'2.4.0-cdh6.3.2\', \n
\'spark.sql.sources.schema.numPartCols\'=\'1\', \n
\'spark.sql.sources.schema.numParts\'=\'1\'
Thanks in advance for your suggestions.
Answer:
import re
text = "\nOUTPUTFORMAT \n\'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat\'\nLOCATION\n\'hdfs://nameservice1/user/hive/warehouse/dev_cmt.db/badge\'\nTBLPROPERTIES (\n\'spark.sql.create.version\'=\'2.4.0-cdh6.3.2\', \n\'spark.sql.sources.schema.numPartCols\'=\'1\', \n\'spark.sql.sources.schema.numParts\'=\'1\'"
# re.DOTALL lets '.' match newlines too; the non-greedy '.*?' stops at the first TBLPROPERTIES
text = re.sub(r'LOCATION.*?TBLPROPERTIES', 'TBLPROPERTIES', text, flags=re.DOTALL)
print(text)
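The key is the re.DOTALL flag: by default `.` does not match a newline, so a pattern like LOCATION.*TBLPROPERTIES cannot cross the line breaks between the two words and nothing gets replaced. With DOTALL, `.` matches newlines as well and the whole LOCATION clause is removed.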
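To see the effect of the flag directly, here is a small sketch that runs the same substitution with and without re.DOTALL. It uses a lookahead, `(?=TBLPROPERTIES)`, instead of re-inserting the word TBLPROPERTIES in the replacement; that is my variation, not part of the answer above. It also assumes the text contains real newline characters, as in the answer's string. If your text instead holds literal backslash-n sequences (two characters), `.` already matches them and DOTALL is not strictly needed.

```python
import re

# Sample DDL fragment with real newline characters (assumed input format).
text = (
    "\nOUTPUTFORMAT \n"
    "'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'\n"
    "LOCATION\n"
    "'hdfs://nameservice1/user/hive/warehouse/dev_cmt.db/badge'\n"
    "TBLPROPERTIES (\n"
    "'spark.sql.create.version'='2.4.0-cdh6.3.2', \n"
    "'spark.sql.sources.schema.numPartCols'='1', \n"
    "'spark.sql.sources.schema.numParts'='1'"
)

# Lazily match from LOCATION up to, but not including, TBLPROPERTIES.
pattern = r"LOCATION.*?(?=TBLPROPERTIES)"

# Without DOTALL, '.' stops at the newline right after LOCATION, so nothing matches.
no_dotall = re.sub(pattern, "", text)

# With DOTALL, '.*?' can span newlines; the lookahead leaves TBLPROPERTIES in place.
with_dotall = re.sub(pattern, "", text, flags=re.DOTALL)

print(no_dotall == text)          # True
print("LOCATION" in with_dotall)  # False
print(with_dotall)
```

The lookahead only checks that TBLPROPERTIES comes next without consuming it, so the replacement string can simply be empty.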