Home > Back-end >  What is the best way to split a column of data in a txt file based on the first delimiter?
What is the best way to split a column of data in a txt file based on the first delimiter?

Time:08-27

I have a large number of txt files with a single column of data. There are no headers in the files.

The data are an email address followed by a : and then a string of varchar, which sometimes includes :s.

My goal is to convert the following

[email protected]:v@rch:r$tR:ng
[email protected]::multipleTypes
[email protected]:&ofTxtGoAfT3rThe:

To a tsv with headers.

column1           column2
[email protected]   v@rch:r$tR:ng
[email protected]   :multipleTypes
[email protected]   &ofTxtGoAfT3rThe:

These files will then be uploaded into a postgres database.

Any insight/advice is greatly appreciated.

CodePudding user response:

sed can do that. Without /g, its substitution replaces the first occurrence on each line:

sed -e 's/:/\t/' *.txt > file.tsv

CodePudding user response:

In postgres you can split the text at the appropriate position and export the hole with COPY to an TSV

this would look like

CREATE TABLE mytab(mytext text)
INSERT INTO mytab VALUES('[email protected]:v@rch:r$tR:ng'),('[email protected]::multipleTypes'),('[email protected]:&ofTxtGoAfT3rThe:')
SELECT 'column1', 'column2'
UNION ALL
SELECT  substr(mytext,1, strpos(mytext, ':') -1) col1,
substr(mytext,strpos(mytext, ':') 1) col2 FROM mytab
?column?        | ?column?         
:-------------- | :----------------
column1         | column2          
[email protected] | v@rch:r$tR:ng    
[email protected] | :multipleTypes   
[email protected] | &ofTxtGoAfT3rThe:
COPY (SELECT 'column1', 'column2'
UNION ALL
SELECT  substr(mytext,1, strpos(mytext, ':') -1) col1,
substr(mytext,strpos(mytext, ':') 1) col2 FROM mytab) TO '/tmp/ind.tsv'  CSV  HEADER DELIMITER E'\t';
SELECT  substr('[email protected]::multipleTypes',1, strpos('[email protected]::multipleTypes', ':') -1) col1,
substr('[email protected]::multipleTypes',strpos('[email protected]::multipleTypes', ':') 1) col2
col1            | col2          
:-------------- | :-------------
[email protected] | :multipleTypes
SELECT  substr('[email protected]:&ofTxtGoAfT3rThe:',1, strpos('[email protected]:&ofTxtGoAfT3rThe:', ':') -1) col1,
substr('[email protected]:&ofTxtGoAfT3rThe:',strpos('[email protected]:&ofTxtGoAfT3rThe:', ':') 1) col2
col1            | col2             
:-------------- | :----------------
[email protected] | &ofTxtGoAfT3rThe:

db<>fiddle here

CodePudding user response:

@choroba is correct with Sed. This can also be done in python.

x = '[email protected]::multipleTypes'

x = x.split(':', 1)

print(x)

This splits the text at the first :

  • Related