Ubuntu command line: extract part number and quantity from a converted Excel file

I have an Excel file with 40 components, and I converted it (online) to a txt file so I can work on it from the command line. I want to extract the part numbers (6- or 7-digit numbers) from it. Some follow a specific pattern. I want to extract them and save them in a txt file. My data:

list.txt
        Product number 1  ac162049-2/slid||product|1971904|pgrid|119732683897|ptaid   1
        Product number 2, its accessories  1-82/pcrid|5194541117|pkw|product|3418376|-SHOPPING 10
        Product number 3  dip-40/dp/9761446       2

Expected output:

productnumber.txt
        1971904   
        3418376 
        9761446

My code:

grep -Po '/\K.[0-9] [1-9]' hardware\ components_prashant.txt > serialnumber.txt

Present output:

CodePudding user response：

From looking at your sample data, I believe the column delimiter is the pipe?

Assuming the part number is column 1 and the quantity is column 8, you can extract them like this:

awk -F'|' '{ print $1, $8 }' list.txt > quantity.txt
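
Before relying on fixed column numbers, it may help to print every pipe-separated field together with its index. The following is a minimal sketch, not part of the original answer: it assumes the three sample lines from the question are the contents of list.txt (recreated here in a scratch directory so the snippet is self-contained), and show_fields is a made-up helper name.

```shell
# Recreate the question's sample data in a scratch directory
# (assumption: this mirrors the real list.txt).
workdir=$(mktemp -d)
cat > "$workdir/list.txt" <<'EOF'
        Product number 1  ac162049-2/slid||product|1971904|pgrid|119732683897|ptaid   1
        Product number 2, its accessories  1-82/pcrid|5194541117|pkw|product|3418376|-SHOPPING 10
        Product number 3  dip-40/dp/9761446       2
EOF

# Hypothetical helper: print "line:field:content" for every field.
# The quoted '|' keeps the shell from treating it as a pipeline.
show_fields() {
    awk -F'|' '{ for (i = 1; i <= NF; i++) printf "%d:%d:%s\n", NR, i, $i }' "$1"
}

show_fields "$workdir/list.txt"
```

On this sample the part number lands in field 4 on the first line and field 5 on the second, and the third line contains no pipe at all, so a single fixed column index is unlikely to work for every row.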

CodePudding user response：

Is it just any run of six or seven digits with non-alphanumeric characters before and after it?

grep -Eo '\b[0-9]{6,7}\b' productnumber.txt
1971904
3418376
9761446

With -E (extended regular expressions), \b matches a "word boundary" (cf. this tutorial). You could also use \< and \>, as I did below.

[...] is a character class that matches any single character in the given set. A dash (-) indicates a range, so [0-9] matches any digit from zero to nine, inclusive. {...} specifies repetition limits, so [0-9]{6,7} matches a run of no fewer than six and no more than seven digits.
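
As a quick check of how the word boundaries and the {6,7} limit interact, the sketch below feeds a few fragments of the sample data through the same pattern. The helper name extract_part_numbers is made up for illustration, and \b is assumed to behave as it does in GNU grep on Ubuntu.

```shell
# Hypothetical helper: print every standalone 6- or 7-digit number read from stdin.
# Assumes GNU grep, where \b is supported as a word boundary.
extract_part_numbers() {
    grep -Eo '\b[0-9]{6,7}\b'
}

# Seven digits between pipes: matched.
printf '%s\n' 'product|1971904|pgrid' | extract_part_numbers

# Six digits glued to letters: no boundary before the 1, so nothing is printed.
# grep exits non-zero when nothing matches, hence the "|| true".
printf '%s\n' 'ac162049-2/slid' | extract_part_numbers || true

# Twelve digits: too long to fit between two boundaries, so nothing is printed.
printf '%s\n' 'pgrid|119732683897|ptaid' | extract_part_numbers || true
```

To save the numbers to a file as asked in the question, the output can be redirected, for example extract_part_numbers < list.txt > productnumber.txt.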

If you wanted the fields you mentioned before: (...) is a capture group, and ^ at the start of a character class negates it, so:

sed -E 's/^ *([^0-9]+[0-9]+).*\<([0-9]{6,7})\>.* ([0-9]+) *$/\1|\2|\3/' productnumber.txt
Product number 1|1971904|1
Product number 2|3418376|10
Product number 3|9761446|2
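
For reference, here is a sketch of such a substitution with each piece annotated. It is an illustration under assumptions rather than a definitive version: name_part_qty is a made-up helper name, the input is the question's sample data, and GNU sed is assumed (for -E together with \< and \>).

```shell
# Hypothetical helper: rewrite each stdin line as "name|part number|quantity".
#   ^ *                   leading spaces, discarded
#   ([^0-9]+[0-9]+)       group 1: non-digits followed by the first run of digits
#   .*\<([0-9]{6,7})\>    group 2: a standalone 6- or 7-digit number
#                         (the greedy .* prefers the last one on the line)
#   .* ([0-9]+) *$        group 3: the final number on the line (the quantity)
name_part_qty() {
    sed -E 's/^ *([^0-9]+[0-9]+).*\<([0-9]{6,7})\>.* ([0-9]+) *$/\1|\2|\3/'
}

name_part_qty <<'EOF'
        Product number 1  ac162049-2/slid||product|1971904|pgrid|119732683897|ptaid   1
        Product number 2, its accessories  1-82/pcrid|5194541117|pkw|product|3418376|-SHOPPING 10
        Product number 3  dip-40/dp/9761446       2
EOF
```

Lines that do not fit the pattern are passed through unchanged, so it may be worth checking the output for rows that still contain the original text.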