Hi, I am trying to save a PySpark DataFrame to a file, but the output does not match the actual data: double quotes are being removed in the CSV file. Could you please help me resolve this issue?
Example:
Raw_Layer:
SAPCLIENT|MATERIAL|MATERIALNAME|MATERIALTYPE|INDUSTRYSECTOR|MATERIALGROUP|MATERIALBASEUNIT|BASICMATERIAL|INDUSTRYSTANDARDNAME|LABORATORYORDESIGNOFFICE|MATERIALWEIGHTUNIT|MATERIALISBATCHMANAGED|MATERIALISMARKEDFORDELETION|MATERIALOLDID|GLOBALPRODHIERARCHY4|MATERIALEXTERNALGROUP|DIVISION|CATALOGUENUMBER|CROSSPLANTMATERIALSTATUS|MAINTENANCESTATUS|DATEOFLASTCHANGE|INTERNATIONALARTICLENUMBER|EANCATEGORY|CREATEDON|OBJECTCREATEDBY|PURCHASEORDERUNITOFMEASURE|DOCUMENTNUMBER|DOCUMENTTYPE|DOCUMENTVERSION|PAGEFORMATOFDOCUMENT|DOCUMENTCHANGENUMBER|PURCHASINGVALUEKEY|CONTAINERREQUIREMENTS|STORAGECONDITIONS|TEMP_COND_INDICATOR|TRANSPORTATIONGROUP|HAZARDOUSMATERIALNUMBER|LABELTYPE|LABELFORM|UNITINLENGTH_WIDTH_HEIGHT|QMINPROCUREMENTISACTIVE|PACKAGINGMATERIALTYPE|MATERIALCANBECOPRODUCT|EXPLICIT_SERIALNUMBER|SHELFEXPIRATIONDATE|ROUNDINGRULEFORSLED|GENERALITEMCATEGORYGROUP|EXPIRATIONDATE|COUNTRY_ORIGIN_MATERIAL|MATERIALFREIGHTGROUP|PARENTCODE|DPPRODUCTGROUP|PRODUCTCATEGORY1|PRODUCTCATEGORY2|MATERIALSUBTYPE|P3LEVEL|PRODUCTSEGMENTATION|KITINDICATOR|SPAREPARTDESCRIPTION|VARIANTFORPARENTCODE|SECTOR|MMSFINANCECLASSIFICATION|MMSTYPEOFMATERIAL|MMSTYPEOFSURGERY|MMSSELLINGPROCESS|MMSSTERILE|MMSTEMPSENSITIVE|MMSMRSLCONSIGNMENT|LEGACYSYSTEM|LEGACYMATERIALNUMBER|LEGACYDESCRIPTION|COUNTRY|TAXTYPE|TAXABLE|MMSEFFECTFROM|MMSEFFECTTO|INTERNALCHARACTERISTIC|CONFIGURATIONOBJECT|JMDNCODE|CHARACTERISTICNAME|EANUPC|INTERNATIONALARTICLENUMBEREANUPC|GLOBALMATERIALCODE|MMSLEGACYINDICATORDESC|MATERIALNETWEIGHT|MATERIALGROSSWEIGHT|UNITFORLWH|LOGSYS|GLOBALPRODHIERARCHY1|GLOBALPRODHIERARCHY1NAME|GLOBALPRODHIERARCHY2|GLOBALPRODHIERARCHY2NAME|GLOBALPRODHIERARCHY3|GLOBALPRODHIERARCHY3NAME|GLOBALPRODHIERARCHY4NAME|VOLUMEUNIT|UOM_NUMERATORBASEUNIT|UOM_DENOMINATORBASEUNIT|UOM_LENGHT|UOM_WIDTH|UOM_HEIGHT|UOM_VOLUME|UOM_VOLUMEUNIT|UOM_GROSSWEIGHT|UOM_WEIGHTUNIT|UOM_ALTSTOCKKEEPING|CHARACTERISTICS_DATE|CROSSDISTRIBUTIONSTATUS|MFRNR|MATERIALSIZEORDIMEDESC|OBJECTID|DIVISIONNAME|MATERIALNAME_F|MATERIALGROUPNAME|ALTERNATEUNITOFMEASURE|MAINEANINDICATOR|PRODUCTLICENSEAPPROVALDATE|PRODUCTIONORINSPECTIONMEMOTXT|PRODUCTLICENSEEXPIRYDATE|REGISTRATIONTODATE|REGISTRATIONVALIDFROMDATE|MEDICALDEVICECLASSIFICATION|PRODUCTLICENSENUMBER|MATERIALSPECIFICATION|SPECIFICATIONREVISIONLEVEL|IMPORTMATERIALNUMBER|VOLUME_MARA|LENGTH|WIDTH|HEIGHT|MINREMAINSHELFLIFE|TOTALSHELFLIFE|MATERIALCOUNTER
100|279729310|IEXP TEAR DROP 10MM HEX CONN\|FERT|P|42290000|EA|||DS|G|X|||055078237823040630||DS|279729310||KVDCEBGLQAS|20211210|10705034199702|IC|20161127|CR5XIP50XPU||887018654||C|||Z004||02|A|0001||||CM|X||||| |NORM|B|||279729310|SPINE|INSTRUMENT||01|XCRVAA|A|N|||MDD||||||||||||||||0000000046|000000000003755815||BFS|""|""|||28.3|28.3||P50CLNT100|0550|Orthopaedics|05507823|DePuy Spine|055078237823|DePuy Spine|Posterior Thoracolumbar|CC|1.0|1.0|0.0|0.0|0.0|0.0||0.0|G|ZNF||||L2.54CMXW2.54CMXH2.54CM|279729310|DePuy Spine||Ortho, gen surg, cas|""|""|""|16.387|2.54|2.54|2.54|0.0|0.0|""|1||||||||||0.0|||XC-RV-AA|XC-RV-AA|XC-RV-AA|ETL_USER|2021-12-22
100|279729310|"IEXP TEAR DROP 10MM HEX CONN"|FERT|P|42290000|EA|||DS|G|X|||055078237823040630||DS|279729310||KVDCEBGLQAS|20211210|10705034199702|IC|20161127|CR5XIP50XPU||887018654||C|||Z004||02|A|0001||||CM|X||||| |NORM|B|||279729310|SPINE|INSTRUMENT||01|XCRVAA|A|N|||MDD||||||||||||||||0000000046|000000000003755815||BFS|""|""|||28.3|28.3||P50CLNT100|0550|Orthopaedics|05507823|DePuy Spine|055078237823|DePuy Spine|Posterior Thoracolumbar|CC|1.0|1.0|0.0|0.0|0.0|0.0||0.0|G|ZNF||||L2.54CMXW2.54CMXH2.54CM|279729310|DePuy Spine||Ortho, gen surg, cas|""|""|""|16.387|2.54|2.54|2.54|0.0|0.0|""|1||||||||||0.0|||XC-RV-AA|XC-RV-AA|XC-RV-AA|ETL_USER|2021-12-22
But in the output:
SAPCLIENT|MATERIAL|MATERIALNAME|MATERIALTYPE|INDUSTRYSECTOR|MATERIALGROUP|MATERIALBASEUNIT|BASICMATERIAL|INDUSTRYSTANDARDNAME|LABORATORYORDESIGNOFFICE|MATERIALWEIGHTUNIT|MATERIALISBATCHMANAGED|MATERIALISMARKEDFORDELETION|MATERIALOLDID|GLOBALPRODHIERARCHY4|MATERIALEXTERNALGROUP|DIVISION|CATALOGUENUMBER|CROSSPLANTMATERIALSTATUS|MAINTENANCESTATUS|DATEOFLASTCHANGE|INTERNATIONALARTICLENUMBER|EANCATEGORY|CREATEDON|OBJECTCREATEDBY|PURCHASEORDERUNITOFMEASURE|DOCUMENTNUMBER|DOCUMENTTYPE|DOCUMENTVERSION|PAGEFORMATOFDOCUMENT|DOCUMENTCHANGENUMBER|PURCHASINGVALUEKEY|CONTAINERREQUIREMENTS|STORAGECONDITIONS|TEMP_COND_INDICATOR|TRANSPORTATIONGROUP|HAZARDOUSMATERIALNUMBER|LABELTYPE|LABELFORM|UNITINLENGTH_WIDTH_HEIGHT|QMINPROCUREMENTISACTIVE|PACKAGINGMATERIALTYPE|MATERIALCANBECOPRODUCT|EXPLICIT_SERIALNUMBER|SHELFEXPIRATIONDATE|ROUNDINGRULEFORSLED|GENERALITEMCATEGORYGROUP|EXPIRATIONDATE|COUNTRY_ORIGIN_MATERIAL|MATERIALFREIGHTGROUP|PARENTCODE|DPPRODUCTGROUP|PRODUCTCATEGORY1|PRODUCTCATEGORY2|MATERIALSUBTYPE|P3LEVEL|PRODUCTSEGMENTATION|KITINDICATOR|SPAREPARTDESCRIPTION|VARIANTFORPARENTCODE|SECTOR|MMSFINANCECLASSIFICATION|MMSTYPEOFMATERIAL|MMSTYPEOFSURGERY|MMSSELLINGPROCESS|MMSSTERILE|MMSTEMPSENSITIVE|MMSMRSLCONSIGNMENT|LEGACYSYSTEM|LEGACYMATERIALNUMBER|LEGACYDESCRIPTION|COUNTRY|TAXTYPE|TAXABLE|MMSEFFECTFROM|MMSEFFECTTO|INTERNALCHARACTERISTIC|CONFIGURATIONOBJECT|CHARACTERISTICVALUE|CHARACTERISTICNAME|GTIN_NUMBER|GTIN_CATEGORY|GLOBALMATERIALCODE|MMSLEGACYINDICATORDESC|MATERIALNETWEIGHT|MATERIALGROSSWEIGHT|UNITFORLWH|LOGSYS|GLOBALPRODHIERARCHY1|GLOBALPRODHIERARCHY1NAME|GLOBALPRODHIERARCHY2|GLOBALPRODHIERARCHY2NAME|GLOBALPRODHIERARCHY3|GLOBALPRODHIERARCHY3NAME|GLOBALPRODHIERARCHY4NAME|VOLUMEUNIT|UOM_NUMERATORBASEUNIT|UOM_DENOMINATORBASEUNIT|UOM_LENGHT|UOM_WIDTH|UOM_HEIGHT|UOM_VOLUME|UOM_VOLUMEUNIT|UOM_GROSSWEIGHT|UOM_WEIGHTUNIT|UOM_ALTSTOCKKEEPING|CHARACTERISTICS_DATE|CROSSDISTRIBUTIONSTATUS|MFRNR|MATERIALSIZEORDIMEDESC|OBJECTID|DIVISIONNAME|MATERIALNAME_F|MATERIALGROUPNAME|MEINH|MEINH_1|HPEAN|VOLUME_MARA|LENGTH|WIDTH|HEIGHT|MINREMAINSHELFLIFE|TOTALSHELFLIFE|ATFLV|MATERIALCOUNTER|JMDNCODE|EANUPC|INTARTICLENUMBEREANUPC|ALTERNATEUNITOFMEASURE|MAINEANINDICATOR|PRODUCTLICENSEAPPROVALDATE|PRODUCTIONORINSPMEMOTXT|PRODUCTLICENSEEXPIRYDATE|REGISTRATIONTODATE|REGISTRATIONVALIDFROMDATE|MEDICALDEVICECLASSIFICATION|PRODUCTLICENSENUMBER|MATERIALSPECIFICATION|SPECIFICATIONREVISIONLEVEL|IMPORTMATERIALNUMBER|updt_by|updt_ts
100|279729310|IEXP TEAR DROP 10MM HEX CONN\|FERT|P|42290000|EA|||DS|G|X|||055078237823040630||DS|279729310||KVDCEBGLQAS|20211210|10705034199702|IC|20161127|CR5XIP50XPU||887018654||C|||Z004||02|A|0001||||CM|X||||| |NORM|B|||279729310|SPINE|INSTRUMENT||01|XCRVAA|A|N|||MDD||||||||||||||||0000000046|000000000003755815||BFS|""|""|||28.3|28.3||P50CLNT100|0550|Orthopaedics|05507823|DePuy Spine|055078237823|DePuy Spine|Posterior Thoracolumbar|CC|1.0|1.0|0.0|0.0|0.0|0.0||0.0|G|ZNF||||L2.54CMXW2.54CMXH2.54CM|279729310|DePuy Spine||Ortho, gen surg, cas|""|""|""|||||||""||||||||16.387|2.54|2.54|2.54|0.0|0.0||1|1|ETL_USER|2021-12-23
100|279729310|IEXP TEAR DROP 10MM HEX CONN|FERT|P|42290000|EA|||DS|G|X|||055078237823040630||DS|279729310||KVDCEBGLQAS|20211210|10705034199702|IC|20161127|CR5XIP50XPU||887018654||C|||Z004||02|A|0001||||CM|X||||| |NORM|B|||279729310|SPINE|INSTRUMENT||01|XCRVAA|A|N|||MDD||||||||||||||||0000000046|000000000003755815||BFS|""|""|||28.3|28.3||P50CLNT100|0550|Orthopaedics|05507823|DePuy Spine|055078237823|DePuy Spine|Posterior Thoracolumbar|CC|1.0|1.0|0.0|0.0|0.0|0.0||0.0|G|ZNF||||L2.54CMXW2.54CMXH2.54CM|279729310|DePuy Spine||Ortho, gen surg, cas|""|""|""|||||||""||||||||16.387|2.54|2.54|2.54|0.0|0.0||1|1|ETL_USER|2021-12-23
I am using the code below:
df.repartition(1).write.format('com.databricks.spark.csv').mode('overwrite').save(output_path,quote='',escape='\"', sep='|', header='True', nullValue=None)
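A minimal sketch of one way the quotes can disappear, using Python's standard csv module as a stand-in for Spark's CSV parser (an analogy, not Spark itself): a parser whose quote character is the double quote treats the surrounding quotes as CSV quoting and strips them, while turning quote handling off keeps them as data. Spark's CSV reader also uses the double quote as its default quote character, so if the raw layer was loaded with default options, the quotes may already be gone from the DataFrame before it is written. The sample line and the parse helper are hypothetical.

```python
import csv
import io

# Hypothetical sample line modeled on the raw-layer row above (shortened).
LINE = '100|"IEXP TEAR DROP 10MM HEX CONN"|FERT\n'


def parse(line, **fmt):
    """Parse one pipe-delimited line with Python's csv module (hypothetical helper)."""
    return next(csv.reader(io.StringIO(line), delimiter='|', **fmt))


# Default quotechar '"': the surrounding quotes are read as CSV quoting,
# so they are stripped from the value.
stripped = parse(LINE)

# QUOTE_NONE: quote characters get no special treatment and stay in the value.
literal = parse(LINE, quoting=csv.QUOTE_NONE)

print(stripped[1])  # IEXP TEAR DROP 10MM HEX CONN
print(literal[1])   # "IEXP TEAR DROP 10MM HEX CONN"
```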
Answer:
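You confused the escape argument with the quote argument. Your code passes an empty string as quote, which Spark's CSV writer interprets as the null character (\u0000), so quoting is effectively turned off, and it passes the double quote as escape instead. Pass the double quote as quote: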
df.repartition(1).write.format('com.databricks.spark.csv').mode('overwrite').save(output_path, quote='\"', sep='|', header='True', nullValue=None)
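This should work. For reference, the same CSV options (quote, escape, delimiter, header) are documented for sparklyr's spark_write_csv: https://spark.rstudio.com/reference/spark_write_csv.html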
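The write side can be sketched the same way, again with Python's standard csv module as an analogy rather than Spark's writer: with the double quote as the quote character and the default minimal quoting, only fields containing the separator or a quote character are wrapped in quotes, embedded quotes are doubled, and reading the result back with the same quote character restores the original value. A plain value such as FERT is written without quotes. Spark's CSV writer escapes embedded quotes with its escape option (a backslash by default) instead of doubling them, so its exact output differs; the sample row is hypothetical.

```python
import csv
import io

# Hypothetical row whose second value still carries its literal double quotes.
row = ['100', '"IEXP TEAR DROP 10MM HEX CONN"', 'FERT']

buf = io.StringIO()
# quotechar='"' with the default QUOTE_MINIMAL: only fields containing the
# separator, the quote character, or a line break are quoted, and embedded
# quote characters are doubled (doublequote=True is the default).
csv.writer(buf, delimiter='|', quotechar='"', lineterminator='\n').writerow(row)
written = buf.getvalue()
print(written, end='')  # 100|"""IEXP TEAR DROP 10MM HEX CONN"""|FERT

# Reading back with the same quote character restores the literal quotes.
restored = next(csv.reader(io.StringIO(written), delimiter='|', quotechar='"'))
print(restored[1])  # "IEXP TEAR DROP 10MM HEX CONN"
```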