Problem splitting lines into a list in Python

I have data like this in data.txt:

Product list
Name                Quantity    Price           
Iphone11            5       14000000.0   
SS note 10          4       13000000.0   
Nokia C100          1       20000000.0

And this is my code to filter the data:

fname = open("Data.txt")
num_line = 0
for line in fname:
    num_line += 1
    if num_line == 1 or num_line == 2:
        continue
    data = line.strip().split()
    if data == []:
        continue
    
    print(data)

And I get this result:

['Iphone11', '5', '14000000.0']
['SS', 'note', '10', '4', '13000000.0']
['Nokia', 'C100', '1', '20000000.0']

I want the output in this format so I can put it into my database:

['Iphone11', '5', '14000000.0']
['SS note 10', '4', '13000000.0']
['Nokia C100', '1', '20000000.0']

Please help me.

CodePudding user response：

You can use str.rsplit() to split from the right. Leave sep=None to split on any run of whitespace, and set maxsplit=2 so that at most two splits are made, counted from the right. The price and quantity are split off, and whatever is left (the name, including any spaces inside it) stays in one piece. (It is also better to open the file with a with statement, as below; otherwise you need to close the file yourself after reading it.)

with open("Data.txt") as fname:
    num_line = 0
    for line in fname:
        num_line += 1
        if num_line == 1 or num_line == 2:
            continue
            
        data = line.rsplit(maxsplit=2)  # same as line.rsplit(sep=None, maxsplit=2)
        if data == []:
            continue
        print(data)

Output:

['Iphone11', '5', '14000000.0']
['SS note 10', '4', '13000000.0']
['Nokia C100', '1', '20000000.0']

Explanation:

Signature: str.rsplit(self, /, sep=None, maxsplit=-1)

Docstring:

sep -> The delimiter according which to split the string. None (the default value) means split according to any whitespace, and discard empty strings from the result.
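
To make the difference concrete, here is a small sketch comparing split() and rsplit() with the same maxsplit on one line from the question. The variable names (line, left, right) are only for illustration.

```python
line = "SS note 10          4       13000000.0   \n"

# split() counts its splits from the left, so the name is broken up first
# and the rest of the line is left unsplit in the last element.
left = line.split(maxsplit=2)

# rsplit() counts its splits from the right, so the price and quantity are
# peeled off and the whole name stays together in the first element.
right = line.rsplit(maxsplit=2)

print(left[:2])  # ['SS', 'note']
print(right)     # ['SS note 10', '4', '13000000.0']

# A blank line yields an empty list, which is why the loop above skips [].
print("\n".rsplit(maxsplit=2))  # []
```

Note that this only works because the last two columns never contain spaces; if the price could contain a space, the split would land in the wrong place.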

CodePudding user response：

If you cannot modify your data to be properly formatted with commas, then parsing by space will "break up" names with spaces in them.

So, what you can do is use a starred target ("*") in the unpacking assignment to gather all the parts of the name, and then join them back together as shown below.

(The data variable simulates reading the lines from the file.)

data = ['iphone11 5 1100.0', 'nokia 5 plus 2 1220.0', 'batphone 13 extra plus 3 2000.0']

for line in data:
    *name, qty, price = line.split()
    name = ' '.join(name)
    print(name)
    print(f'   qty: {qty}, price: {price}')

Output:

iphone11
   qty: 5, price: 1100.0
nokia 5 plus
   qty: 2, price: 1220.0
batphone 13 extra plus
   qty: 3, price: 2000.0
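
Since the goal is to put the rows into a database, the same starred unpacking can be wrapped in a small function that also converts the types. This is only a sketch: parse_product is a hypothetical helper name, and it assumes the quantity is always an integer and the price a float.

```python
def parse_product(line):
    """Split one line into (name, quantity, price) with converted types."""
    # The starred target collects every token before the last two.
    *name_parts, qty, price = line.split()
    if not name_parts:
        # Only two tokens were present, so there is no product name.
        raise ValueError(f"no product name in line: {line!r}")
    return " ".join(name_parts), int(qty), float(price)


data = ['iphone11 5 1100.0', 'nokia 5 plus 2 1220.0']
products = [parse_product(line) for line in data]
print(products)
```

A line with fewer than two tokens fails in the unpacking itself with a ValueError, so the caller only has one exception type to handle for malformed lines.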

CodePudding user response：

Take a look at pandas!

See the pandas IO user guide. read_fwf reads fixed-width files like this one:

import pandas as pd
from io import StringIO

s="""Product list
Name                Quantity    Price           
Iphone11            5       14000000.0   
SS note 10          4       13000000.0   
Nokia C100          1       20000000.0 """

widths = [20, 8, 10]
df = pd.read_fwf(StringIO(s), widths=widths, header=None, skiprows=2)

read_fwf has many options that cover most fixed-width layouts.

>>> df
            0  1           2
0    Iphone11  5  14000000.0
1  SS note 10  4  13000000.0
2  Nokia C100  1  20000000.0

If you want to capture the headers then try this:

df = pd.read_fwf(StringIO(s), widths=widths, header=0, skiprows=1)
df
         Name  Quantity       Price
0    Iphone11         5  14000000.0
1  SS note 10         4  13000000.0
2  Nokia C100         1  20000000.0
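
If pandas is not available, the same fixed-width idea can be sketched in plain Python by slicing each line at the column widths. This is not how read_fwf is implemented, only a minimal illustration; slice_fixed_width is a hypothetical helper, and the sample lines are rebuilt with ljust so that they match the widths [20, 8, 10] used above.

```python
WIDTHS = [20, 8, 10]

# Rebuild fixed-width lines like those in the question (assumed layout).
sample_rows = [
    ("Iphone11", "5", "14000000.0"),
    ("SS note 10", "4", "13000000.0"),
    ("Nokia C100", "1", "20000000.0"),
]
lines = [
    "".join(value.ljust(width) for value, width in zip(row, WIDTHS))
    for row in sample_rows
]


def slice_fixed_width(line, widths):
    """Cut a line into consecutive fields of the given widths and trim padding."""
    fields = []
    start = 0
    for width in widths:
        fields.append(line[start:start + width].strip())
        start += width
    return fields


records = [slice_fixed_width(line, WIDTHS) for line in lines if line.strip()]
print(records)
```

Unlike the whitespace-based approaches, slicing by width keeps working even if a later column contains spaces, but it breaks as soon as a value is longer than its column.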

CodePudding user response：

By default, the string split method splits on any whitespace.

If the lines read from data.txt use a different separator, for instance a comma (,), try this:

data = line.strip().split(",")
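
If the file really were comma-separated, the standard csv module would be another option, since it also copes with quoted fields. The sketch below assumes a hypothetical comma-separated version of the data, held in a string instead of a file so that it runs on its own.

```python
import csv
from io import StringIO

# Hypothetical comma-separated version of the product list.
csv_text = (
    "Name,Quantity,Price\n"
    "Iphone11,5,14000000.0\n"
    "SS note 10,4,13000000.0\n"
)

reader = csv.reader(StringIO(csv_text))
header = next(reader)                 # first row holds the column names
rows = [row for row in reader if row] # skip any empty rows

print(header)
print(rows)
```

With a real file, StringIO(csv_text) would be replaced by a file object opened with newline="", as the csv documentation recommends.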