How do I combine lines in a text file in a specific order?

I'm trying to transform the text in a file according to the following rule: for each line, if the line does not begin with "https", add that word to the beginning of each subsequent line until you hit another line that does not begin with "https".

For example, given this file:

Fruit
https://www.apple.com//
https://www.banana.com//
Vegetable
https://www.cucumber.com//
https://www.lettuce.com//

I want

Fruit-https://www.apple.com//
Fruit-https://www.banana.com//
Vegetable-https://www.cucumber.com//
Vegetable-https://www.lettuce.com//

Here is my attempt:

one = open("links.txt", "r")
for two in one.readlines():

    if "https" not in two:
        sitex = two
        
    else:
        print (sitex + "-" + two)

Here is the output of that program, using the above sample input file:

Fruit
-https://www.apple.com//

Fruit
-https://www.banana.com//       

Vegetable
-https://www.cucumber.com//     

Vegetable
-https://www.lettuce.com//

What is wrong with my code?

CodePudding user response：

First, call rstrip() on sitex to remove the newline character at the end of the string (credit to BrokenBenchmark).

Second, print() appends a newline by default every time it is called, and two already ends with its own newline, so pass end="" to avoid printing a blank line after each result.

So your code should look like this:

one = open("links.txt", "r")
for two in one.readlines():
    if "https" not in two:
        sitex = two.rstrip()
    else:
        print(sitex + "-" + two, end="")
one.close()

Also, always close the file when you are done with it.

CodePudding user response：

Lines in your file end with "\n", the newline character.

You can remove whitespace (including "\n") from a string using strip() (both ends) or rstrip() / lstrip() (one end only).
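As a quick illustration of the difference between the three methods, here is a small sketch. The sample string raw is made up for the demonstration and is not taken from the question's file.

```python
# Hypothetical sample line, padded with whitespace on both ends.
raw = "  Fruit\n"

both = raw.strip()    # removes leading and trailing whitespace
right = raw.rstrip()  # removes trailing whitespace only (including "\n")
left = raw.lstrip()   # removes leading whitespace only

# repr() makes the remaining whitespace visible.
print(repr(both))
print(repr(right))
print(repr(left))
```

For the problem in the question, rstrip() is enough, because the unwanted newline is always at the end of the line.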

print() adds a "\n" at the end by default; you can change this with the end parameter:

print("something", end=" ")
print("more")   # ==> something more (on one line)

Fix for your code:

# use a context manager for better file handling
with open("data.txt","w") as f:
    f.write("""Fruit
https://www.apple.com//
https://www.banana.com//
Vegetable
https://www.cucumber.com//
https://www.lettuce.com//
""")


with open("data.txt") as f:
    what = ""
    # iterate file line by line instead of reading all at once
    for line in f:
        # remove whitespace from current line, including \n
        # front AND back - you could use rstrip() here as well
        line = line.strip()
        # only do something for non-empty lines (your file does not
        # contain empty lines, but the last line may be empty)
        if line:
            # easier to understand condition without negation
            if line.startswith("http"):
                # printing adds a \n at the end
                print(f"{what}-{line}") # line & what are stripped
            else:
                what = line

Output:

Fruit-https://www.apple.com//
Fruit-https://www.banana.com//
Vegetable-https://www.cucumber.com//
Vegetable-https://www.lettuce.com//

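See the Python documentation for str.strip([chars]), str.rstrip([chars]) and str.lstrip([chars]).

The [chars] argument is optional; if it is not given, whitespace is removed.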
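To make the optional [chars] argument concrete, here is a minimal sketch using one of the URLs from the question. Removing the trailing slashes is only an illustration; the question does not ask for it.

```python
url = "https://www.apple.com//\n"

# No argument: only trailing whitespace (here the "\n") is removed.
no_newline = url.rstrip()

# With an argument: the string is treated as a set of characters, and
# every trailing character found in that set is removed.
no_slashes = no_newline.rstrip("/")

print(no_newline)
print(no_slashes)
```

Note that the argument is a set of characters, not a suffix, so rstrip("/") removes both trailing slashes here.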

CodePudding user response：

You need to strip the trailing newline from the line if it doesn't contain 'https':

sitex = two

should be

sitex = two.rstrip()

You need to do something similar for the else block as well, as ShadowRanger points out:

print (sitex + "-" + two)

should be

print (sitex + "-" + two.rstrip())
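Putting both changes together, the logic could be sketched as a small function. The name prefix_links and the in-memory sample list are assumptions made for this example; with a real file you would pass the open file object instead. It also uses startswith("https") rather than the "in" test from the question, since the rule is stated in terms of how the line begins.

```python
def prefix_links(lines):
    """Return each link line prefixed with the most recent non-link line."""
    result = []
    heading = ""
    for raw in lines:
        line = raw.rstrip()  # drop the trailing newline
        if not line:
            continue  # skip empty lines
        if line.startswith("https"):
            result.append(heading + "-" + line)
        else:
            heading = line  # remember the word for the following links
    return result


# Sample input mirroring the file from the question.
sample = [
    "Fruit\n",
    "https://www.apple.com//\n",
    "https://www.banana.com//\n",
    "Vegetable\n",
    "https://www.cucumber.com//\n",
    "https://www.lettuce.com//\n",
]

for combined in prefix_links(sample):
    print(combined)
```

Returning a list instead of printing inside the loop keeps the function easy to reuse, for example to write the combined lines to another file.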