txt to csv with python pandas-CodePudding

I have a txt file that I would like to read and export in csv format, but i have one problem.

Data in txt file looks like this |

Sales Organization|Distribution Channel|Sold-To #|Sold-To Name  |Ship-To #|Ship-To Name             |Mark-For #|Mark-For Name|Z1 : Sales Rep|Z1 : Sales Rep (Name)|Order Number|Sales Doc Type|Order Reason|PO Number|PO Type|Header Department|Delivery Block (H)|Billing Block (H)|Doc Date  |RDD (H)   |Cancel Date (H)|RDD (L)   |Cancel date (L)|Division|Plant|Material  |Sales Doc Item|Size  |Schedule Line|Size Confirm Date|Item Category|Rej.Reason (SL)|Order Qty (SL)|Confirmed Qty (SL)|Unconfirmed Qty (SL)|Cancelled Qty (SL)|Open Qty (SL)|Reserved Qty (SL)|Fixed Qty (SL)|% Allocation (SL)|Delivered Qty (SL)|PGI Qty (SL)|Invoiced Qty (SL)|Net Unit Price|Confirmed Net Value (SL)|Dollars Shipped (SL)|Currency|% Shipped/Allocated (SL)|Delivery Block (SL)|Sales UOM|Credit Limit Status Text              |EAN/UPC      |Customer Material|
|   EU01              |10                  |10026276 | EU SARL|20056417 |Fulfillmemt Poland|          |             |              |                     |1805338693  |ZOR           |ZST         |86LRD5JM |EDI    |                 |                  |                 |14.02.2022|14.02.2022|03.03.2022     |14.02.2022|03.03.2022     |20      |3045 |35524-0004|           410|36  32|            1|14.02.2022       |ZTAN         |               |        1,000 |            1,000 |              0,000 |            0,000 |       0,000 |           0,000 |        0,000 |           0,000 |            1,000 |      0,000 |           0,000 |       41,600 |                 41,600 |              0,000 |EUR     |                100,000 |                   |EA       |Credit check was executed, document OK|5400898540995|                 |
|   EU01              |10                  |10026276 | EU SARL|20056417 | Fulfillmemt Poland|          |             |              |                     |1805338693  |ZOR           |ZST         |86LRD5JM |EDI    |                 |                  |                 |14.02.2022|14.02.2022|03.03.2022     |14.02.2022|03.03.2022     |20      |3045 |35524-0004|           410|33  34|            2|14.02.2022       |ZTAN         |               |        1,000 |            1,000 |              0,000 |            0,000 |       0,000 |           0,000 |        0,000 |           0,000 |            1,000 |      0,000 |           0,000 |       41,600 |                 41,600 |              0,000 |EUR     |                100,000 |                   |EA       |Credit check was executed, document OK|5400898540926|                 |
|   EU01              |10                  |10026276 | EU SARL|20056417 | Fulfillmemt Poland|          |             |              |                     |1805338693  |ZOR           |ZST         |86LRD5JM |EDI    |                 |                  |                 |14.02.2022|14.02.2022|03.03.2022     |14.02.2022|03.03.2022     |20      |3045 |35524-0004|           410|32  32|            3|14.02.2022       |ZTAN         |P6             |        2,000 |            0,000 |              0,000 |            2,000 |       0,000 |           0,000 |        0,000 |           0,000 |            0,000 |      0,000 |           0,000 |       41,600 |                  0,000 |              0,000 |EUR     |                  0,000 |                   |EA       |Credit check was executed, document OK|5400898508124|                 |
|   EU01              |10                  |10026276 | EU SARL|20056417 | Fulfillmemt Poland|          |             |              |                     |1805338693  |ZOR           |ZST         |86LRD5JM |EDI    |                 |                  |                 |14.02.2022|14.02.2022|03.03.2022     |14.02.2022|03.03.2022     |20      |3045 |85862-0041|           530|29  - |            1|14.02.2022       |ZTAN         |P6             |        1,000 |            0,000 |              0,000 |            1,000 |       0,000 |           0,000 |        0,000 |           0,000 |            0,000 |      0,000 |           0,000 |       21,100 |                  0,000 |              0,000 |EUR     |                  0,000 |                   |EA       |Credit check was executed, document OK|5400970111273|                 |
|   EU01              |10                  |10026276 | EU SARL|20056417 | Fulfillmemt Poland|          |             |72646         |John, Smith|1805339436  |ZOR           |ZST         |4QRNXHPH |EDI    |                 |                  |                 |14.02.2022|14.02.2022|04.03.2022     |14.02.2022|04.03.2022     |10      |3045 |00501-3199|            10|36  34|            1|14.02.2022       |ZTAN         |X9             |       17,000 |            0,000 |              0,000 |           17,000 |       0,000 |           0,000 |        0,000 |           0,000 |            0,000 |      0,000 |           0,000 |       47,800 |                  0,000 |              0,000 |EUR     |                  0,000 |                   |EA       |Credit check was executed, document OK|5400970332180|                 |
|   EU01              |10                  |10026276 | EU SARL|20056417 | Fulfillmemt Poland|          |             |72646         |John, Smith   |1805339436  |ZOR           |ZST         |4QRNXHPH |EDI    |                 |                  |                 |14.02.2022|14.02.2022|04.03.2022     |14.02.2022|04.03.2022     |10      |3045 |04511-4432|            20|40  32|            1|14.02.2022       |ZTAN         |J2             |        2,000 |            0,000 |              0,000 |            2,000 |       0,000 |           0,000 |        0,000 |           0,000 |            0,000 |      0,000 |           0,000 |       41,300 |                  0,000 |              0,000 |EUR     |                  0,000 |                   |EA       |Credit check was executed, document OK|5400898076951|                 |
|   EU01              |10                  |10026276 |EU SARL|20056417 | Fulfillmemt Poland|          |             |72646         |John, Smith   |1805339436  |ZOR           |ZST         |4QRNXHPH |EDI    |                 |                  |                 |14.02.2022|14.02.2022|04.03.2022     |14.02.2022|04.03.2022     |10      |3045 |04511-5115|            30|36  32|            1|14.02.2022       |ZTAN         |P6             |        5,000 |            0,000 |              0,000 |            5,000 |       0,000 |           0,000 |        0,000 |           0,000 |            0,000 |      0,000 |           0,000 |       47,800 |                  0,000 |              0,000 |EUR     |                  0,000 |                   |EA       |Credit check was executed, document OK|5400970262012|                 |
|   EU01              |10                  |10026276 | EU SARL|20056417 | Fulfillmemt Poland|          |             |72646         |John, Smith   |1805339436  |ZOR           |ZST         |4QRNXHPH |EDI    |                 |                  |                 |14.02.2022|14.02.2022|04.03.2022     |14.02.2022|04.03.2022     |10      |3045 |04511-5155|            40|28  30|            1|14.02.2022       |ZTAN         |X9             |        1,000 |            0,000 |              0,000 |            1,000 |       0,000 |           0,000 |        0,000 |           0,000 |            0,000 |      0,000 |           0,000 |       56,500 |                  0,000 |              0,000 |EUR     |                  0,000 |                   |EA       |Credit check was executed, document OK|5400970254963|                 |
|   EU01              |10                  |10026276 | EU SARL|20056417 | Fulfillmemt Poland|          |             |72646

and I would like to show each columns separately in csv file. Now as you see columns are separated by | . As an example - Sales Organization should be a header and EU01 should be its value and so on.

df =pd.read_csv('1.txt', sep='delimiter', header= None, engine='python')
df =df.iloc[3:]

df.to_csv(path   '123.csv', index=False, header=True)

CodePudding user response：

Well, Heres what i did:

Let's call this file, data.txt. I add an additional '|' Before Sales Organisation

Now I strip white space off it:

with open('data.txt', 'r') as f:
    lines = f.readlines()

Stripped = [line.replace(' ', '') for line in lines]

with open('data.txt', 'w') as f:
    f.writelines(Stripped)

we then get a clean-looking data.txt:

|SalesOrganization|DistributionChannel|Sold-To|Sold-ToName|Ship-To|Ship-ToName|Mark-For|Mark-ForName|Z1:SalesRep|Z1:SalesRep(Name)|OrderNumber|SalesDocType|OrderReason|PONumber|POType|HeaderDepartment|DeliveryBlock(H)|BillingBlock(H)|DocDate|RDD(H)|CancelDate(H)|RDD(L)|Canceldate(L)|Division|Plant|Material|SalesDocItem|Size|ScheduleLine|SizeConfirmDate|ItemCategory|Rej.Reason(SL)|OrderQty(SL)|ConfirmedQty(SL)|UnconfirmedQty(SL)|CancelledQty(SL)|OpenQty(SL)|ReservedQty(SL)|FixedQty(SL)|%Allocation(SL)|DeliveredQty(SL)|PGIQty(SL)|InvoicedQty(SL)|NetUnitPrice|ConfirmedNetValue(SL)|DollarsShipped(SL)|Currency|%Shipped/Allocated(SL)|DeliveryBlock(SL)|SalesUOM|CreditLimitStatusText|EAN/UPC|CustomerMaterial|
|EU01|10|10026276|EUSARL|20056417|FulfillmemtPoland|||||1805338693|ZOR|ZST|86LRD5JM|EDI||||14.02.2022|14.02.2022|03.03.2022|14.02.2022|03.03.2022|20|3045|35524-0004|410|3632|1|14.02.2022|ZTAN||1,000|1,000|0,000|0,000|0,000|0,000|0,000|0,000|1,000|0,000|0,000|41,600|41,600|0,000|EUR|100,000||EA|Creditcheckwasexecuted,documentOK|5400898540995||
|EU01|10|10026276|EUSARL|20056417|FulfillmemtPoland|||||1805338693|ZOR|ZST|86LRD5JM|EDI||||14.02.2022|14.02.2022|03.03.2022|14.02.2022|03.03.2022|20|3045|35524-0004|410|3334|2|14.02.2022|ZTAN||1,000|1,000|0,000|0,000|0,000|0,000|0,000|0,000|1,000|0,000|0,000|41,600|41,600|0,000|EUR|100,000||EA|Creditcheckwasexecuted,documentOK|5400898540926||
|EU01|10|10026276|EUSARL|20056417|FulfillmemtPoland|||||1805338693|ZOR|ZST|86LRD5JM|EDI||||14.02.2022|14.02.2022|03.03.2022|14.02.2022|03.03.2022|20|3045|35524-0004|410|3232|3|14.02.2022|ZTAN|P6|2,000|0,000|0,000|2,000|0,000|0,000|0,000|0,000|0,000|0,000|0,000|41,600|0,000|0,000|EUR|0,000||EA|Creditcheckwasexecuted,documentOK|5400898508124||
|EU01|10|10026276|EUSARL|20056417|FulfillmemtPoland|||||1805338693|ZOR|ZST|86LRD5JM|EDI||||14.02.2022|14.02.2022|03.03.2022|14.02.2022|03.03.2022|20|3045|85862-0041|530|29-|1|14.02.2022|ZTAN|P6|1,000|0,000|0,000|1,000|0,000|0,000|0,000|0,000|0,000|0,000|0,000|21,100|0,000|0,000|EUR|0,000||EA|Creditcheckwasexecuted,documentOK|5400970111273||
|EU01|10|10026276|EUSARL|20056417|FulfillmemtPoland|||72646|John,Smith|1805339436|ZOR|ZST|4QRNXHPH|EDI||||14.02.2022|14.02.2022|04.03.2022|14.02.2022|04.03.2022|10|3045|00501-3199|10|3634|1|14.02.2022|ZTAN|X9|17,000|0,000|0,000|17,000|0,000|0,000|0,000|0,000|0,000|0,000|0,000|47,800|0,000|0,000|EUR|0,000||EA|Creditcheckwasexecuted,documentOK|5400970332180||
|EU01|10|10026276|EUSARL|20056417|FulfillmemtPoland|||72646|John,Smith|1805339436|ZOR|ZST|4QRNXHPH|EDI||||14.02.2022|14.02.2022|04.03.2022|14.02.2022|04.03.2022|10|3045|04511-4432|20|4032|1|14.02.2022|ZTAN|J2|2,000|0,000|0,000|2,000|0,000|0,000|0,000|0,000|0,000|0,000|0,000|41,300|0,000|0,000|EUR|0,000||EA|Creditcheckwasexecuted,documentOK|5400898076951||
|EU01|10|10026276|EUSARL|20056417|FulfillmemtPoland|||72646|John,Smith|1805339436|ZOR|ZST|4QRNXHPH|EDI||||14.02.2022|14.02.2022|04.03.2022|14.02.2022|04.03.2022|10|3045|04511-5115|30|3632|1|14.02.2022|ZTAN|P6|5,000|0,000|0,000|5,000|0,000|0,000|0,000|0,000|0,000|0,000|0,000|47,800|0,000|0,000|EUR|0,000||EA|Creditcheckwasexecuted,documentOK|5400970262012||
|EU01|10|10026276|EUSARL|20056417|FulfillmemtPoland|||72646|John,Smith|1805339436|ZOR|ZST|4QRNXHPH|EDI||||14.02.2022|14.02.2022|04.03.2022|14.02.2022|04.03.2022|10|3045|04511-5155|40|2830|1|14.02.2022|ZTAN|X9|1,000|0,000|0,000|1,000|0,000|0,000|0,000|0,000|0,000|0,000|0,000|56,500|0,000|0,000|EUR|0,000||EA|Creditcheckwasexecuted,documentOK|5400970254963||
|EU01|10|10026276|EUSARL|20056417|FulfillmemtPoland|||72646

Now I simply read it into pandas and drop the first column:

df = pd.read_csv('data.txt', sep='|'  , engine='python')
df = df.drop(['Unnamed: 0'], axis = 1)

Heres the output!

  SalesOrganization  DistributionChannel  ...  CustomerMaterial Unnamed: 54
0              EU01                   10  ...               NaN         NaN
1              EU01                   10  ...               NaN         NaN
2              EU01                   10  ...               NaN         NaN
3              EU01                   10  ...               NaN         NaN
4              EU01                   10  ...               NaN         NaN
5              EU01                   10  ...               NaN         NaN
6              EU01                   10  ...               NaN         NaN
7              EU01                   10  ...               NaN         NaN
8              EU01                   10  ...               NaN         NaN

You can now convert it to a CSV, if you like:

df.to_csv('data.csv', index=False)

CodePudding user response：

In your provided sample txt file. You have an | at the start and end of every row. So you need remove that before you read csv. Otherwise, it will give you ParseError. Once this is fixed, you can use sep='|' Like this:

df =pd.read_csv('1.txt', sep='|', header= None)