I have a DataFrame df like this:
    Sentence #   Word           POS  Tag  join
0   Sentence: 1  Thousands      NNS  O    Thousands O
1   Sentence: 1  of             IN   O    of O
2   Sentence: 1  demonstrators  NNS  O    demonstrators O
3   Sentence: 2  have           VBP  O    have O
4   Sentence: 2  marched        VBN  O    marched O
.....
I want to write the column df['join'] to a .txt file and add a blank line (an extra \n) whenever the value of df['Sentence #'] changes.
For the sample df above, the .txt file should contain:
Thousands O
of O
demonstrators O

have O
marched O...
I tried doing this in Python, but the output I get has no blank line between sentences:
Thousands O
of O
demonstrators O
have O
marched O...
The code I have written:
for i, g in df.groupby('Sentence #')['join']:
    out = g.append(pd.Series({'new':'/n'}))
    out.to_csv('file.txt', index=False, header=None, mode='a')
Any help is appreciated. Thank you.
CodePudding user response:
import pandas as pd
# recreate the example dataframe from the question
ingest = [["Sentence: 1", "Thousands", "NNS", "O"],
          ["Sentence: 1", "of", "IN", "O"],
          ["Sentence: 1", "demonstrators", "NNS", "O"],
          ["Sentence: 2", "have", "VBP", "O"],
          ["Sentence: 2", "marched", "VBN", "O"]]
# column names for the example
column = ["Sentence #", "Word", "POS", "Tag"]
# build the dataframe and combine "Word" and "Tag" into the "join" column
data = pd.DataFrame(ingest, columns=column)
data["join"] = data[["Word", "Tag"]].apply(" ".join, axis=1)
# open the file once and write each row, following your logic
with open("output.txt", "a") as file:
    count = 0
    for nr in range(len(data["Sentence #"])):
        if count == 0:
            # first row: nothing to compare against yet
            file.write(f"{data['join'][nr]}\n")
            count = 1
        elif data["Sentence #"][nr] == data["Sentence #"][nr-1]:
            # same sentence as the previous row
            file.write(f"{data['join'][nr]}\n")
        else:
            # sentence changed: put a blank line before this row
            file.write(f"\n{data['join'][nr]}\n")
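The previous-row comparison in the loop above can also be done with `Series.shift`, which avoids the manual counter. This is only a minimal sketch: `render_with_breaks` is a hypothetical helper name, and the small frame is rebuilt here so the block runs on its own.

```python
import pandas as pd

# Rebuild the sample data so this block is self-contained.
data = pd.DataFrame({
    "Sentence #": ["Sentence: 1"] * 3 + ["Sentence: 2"] * 2,
    "join": ["Thousands O", "of O", "demonstrators O", "have O", "marched O"],
})

def render_with_breaks(frame, key="Sentence #", value="join"):
    """Hypothetical helper: one value per line, blank line when `key` changes."""
    if frame.empty:
        return ""
    # True on rows whose key differs from the previous row's key.
    changed = frame[key].ne(frame[key].shift())
    # The first row always differs from the shifted NaN; it needs no blank line.
    changed.iloc[0] = False
    lines = []
    for is_new, text in zip(changed, frame[value]):
        if is_new:
            lines.append("")  # becomes the blank separator line
        lines.append(text)
    return "\n".join(lines) + "\n"

text = render_with_breaks(data)
```

The resulting `text` can then be written in one call, for example `file.write(text)` inside the same `with open(...)` block.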
CodePudding user response:
I use a function like below very often during data exploration -- seems similar to what you're looking for:
print(*df
      .groupby('Sentence #')['join']
      .apply(lambda s: "\n".join(s.values))
      .tolist(),
      sep='\n\n')
To write to a file instead of printing, replace the print call with '\n\n'.join(...) on the same list and write the resulting string to the file.
Also note: try not to use column names that match built-in DataFrame method names, e.g. "join" in this case. df.join refers to the method rather than the column, so the column is only reachable as df['join'].
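As a rough sketch of the file-writing variant, assuming a frame shaped like the one in the question (rebuilt here so the block runs on its own; `blocks` and `text` are illustrative names):

```python
import pandas as pd

# Sample frame with the two columns that matter for the output.
df = pd.DataFrame({
    "Sentence #": ["Sentence: 1"] * 3 + ["Sentence: 2"] * 2,
    "join": ["Thousands O", "of O", "demonstrators O", "have O", "marched O"],
})

# sort=False keeps groups in order of first appearance; the default sorts
# the keys as strings, which would put "Sentence: 10" before "Sentence: 2".
blocks = (
    df.groupby("Sentence #", sort=False)["join"]
    .apply(lambda s: "\n".join(s))  # one newline-joined string per sentence
    .tolist()
)

# A blank line between sentences, plus a trailing newline at end of file.
text = "\n\n".join(blocks) + "\n"
```

Writing it out is then `open("file.txt", "w").write(text)`, or the equivalent `with` block. Use mode "w" rather than "a" if reruns should not append to the previous output.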