Adding an extra blank line when a column value changes while writing a df to a txt file

I have a df like this:

    Sentence #   Word           POS  Tag  join
0   Sentence: 1  Thousands      NNS  O    Thousands O
1   Sentence: 1  of             IN   O    of O
2   Sentence: 1  demonstrators  NNS  O    demonstrators O
3   Sentence: 2  have           VBP  O    have O
4   Sentence: 2  marched        VBN  O    marched O
.....

I want to write the column df['join'] to a .txt file and add an extra blank line (\n) whenever the value of df['Sentence #'] changes.

For the sample df above, the txt file should contain:

Thousands O
of O
demonstrators O

have O
marched O...

I tried doing this in Python, but the output I get is:

Thousands O
of O
demonstrators O
have O
marched O...

The code I have written:

for i, g in df.groupby('Sentence #')['join']:
    out = g.append(pd.Series({'new':'/n'}))
    out.to_csv('file.txt', index=False, header=None, mode='a')

Any help is appreciated. Thank you.

CodePudding user response：

import pandas as pd

# recreate the example dataframe from the question
ingest = [["Sentence: 1", "Thousands", "NNS", "O"],
          ["Sentence: 1", "of", "IN", "O"],
          ["Sentence: 1", "demonstrators", "NNS", "O"],
          ["Sentence: 2", "have", "VBP", "O"],
          ["Sentence: 2", "marched", "VBN", "O"]]

# column names of the example
column = ["Sentence #", "Word", "POS", "Tag"]

# build the dataframe, then combine "Word" and "Tag" into the "join" column
data = pd.DataFrame(ingest, columns=column)
data["join"] = data[["Word", "Tag"]].apply(" ".join, axis=1)

# open the file once and write row by row
# (mode "a" appends to an existing file; use "w" to start from an empty one)
with open("output.txt", "a") as file:
    for nr in range(len(data)):
        # a change in "Sentence #" compared with the previous row starts a new block
        if nr > 0 and data["Sentence #"][nr] != data["Sentence #"][nr - 1]:
            file.write("\n")
        file.write(f"{data['join'][nr]}\n")
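
One caveat with the loop above: data['join'][nr] looks rows up by index label, so it assumes the default 0..n-1 index and would break on a filtered or re-sorted frame. Here is a rough sketch that avoids those lookups by comparing each row with the previous one via shift(). The non-default index, the "w" file mode and the variable names are my own assumptions for illustration, not something from the question:

```python
import pandas as pd

# Sample rows from the question, deliberately given a non-default index.
data = pd.DataFrame(
    {
        "Sentence #": ["Sentence: 1"] * 3 + ["Sentence: 2"] * 2,
        "join": ["Thousands O", "of O", "demonstrators O", "have O", "marched O"],
    },
    index=[10, 11, 12, 13, 14],
)

# True wherever "Sentence #" differs from the row before it.
starts_new = data["Sentence #"].ne(data["Sentence #"].shift())
# The first row always differs from the (missing) previous row; it needs no blank line.
starts_new.iloc[0] = False

lines = []
for text, new_sentence in zip(data["join"], starts_new):
    if new_sentence:
        lines.append("")  # an empty entry becomes the blank separator line
    lines.append(text)

# "w" overwrites the file on every run (assumption; the answer above appends).
with open("output.txt", "w", encoding="utf-8") as fh:
    fh.write("\n".join(lines) + "\n")
```

Because the comparison looks only at neighbouring rows, this inserts a blank line at every change of value, even if the same sentence label shows up again later in the frame.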

CodePudding user response：

I use a snippet like the one below very often during data exploration; it seems similar to what you're looking for:

print(*df
  .groupby('Sentence #')['join']
  .apply(lambda s: "\n".join(s.values))
  .tolist(), 
  sep='\n\n')

To write to a file instead of printing, replace the print call with '\n\n'.join(...) and write the resulting string to the file.
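
A minimal sketch of that replacement, assuming the sample data from the question and the file name file.txt used there. Note that groupby sorts the group keys by default, so "Sentence: 10" would be written before "Sentence: 2"; passing sort=False keeps the groups in order of first appearance. The variable names blocks and text are just placeholders:

```python
import pandas as pd

# Sample frame rebuilt from the question; "join" holds "<Word> <Tag>".
df = pd.DataFrame({
    "Sentence #": ["Sentence: 1"] * 3 + ["Sentence: 2"] * 2,
    "join": ["Thousands O", "of O", "demonstrators O", "have O", "marched O"],
})

# One newline-joined block per sentence; sort=False keeps first-appearance order.
blocks = df.groupby("Sentence #", sort=False)["join"].agg("\n".join)

# A blank line between blocks, plus a trailing newline at the end of the file.
text = "\n\n".join(blocks) + "\n"

# Mode "w" overwrites file.txt on each run instead of appending to it.
with open("file.txt", "w", encoding="utf-8") as fh:
    fh.write(text)
```

Unlike a row-by-row comparison, groupby collects all rows with the same "Sentence #" into one block, even if they are not adjacent in the frame.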

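Also note: try to avoid column names that match existing method names, e.g. "join" in this case. DataFrame.join is a method, so df.join gives you that method rather than the column, and you always have to write df['join'].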