Issue with Python and WhatsApp

Time:10-22

I exported the data from WhatsApp into a txt file. I need to create four columns (Date, Time, Name and Message) in my output file.

import pandas as pd

# read file by lines
with open(r'D:\Analysis\example_chat_whatsapp.txt', encoding="utf-8") as f:
    data = f.readlines()


# sanity stats
print('num lines: %s' % len(data))

# parse text and create list of lists structure
# remove first whatsapp info message
dataset = data[1:]
cleaned_data = []
for line in dataset:
    # grab the info and cut it out
    date = line.split(" ")[0]
    line2 = line[len(date):]
    time = line2.split(" ")[0][:2]
    line3 = line2[len(time):]
    name = line3.split(":")[0][:4]
    line4 = line3[len(name):]
    message = line4[6:-1]  # strip newline character

    #print(date, time, name, message)
cleaned_data.append([date, time, name, message])

  
#Create the DataFrame 
df = pd.DataFrame(cleaned_data, columns = ['Date', 'Time', 'Name', 'Message']) 
df

The problem is that the Time column comes out empty and the Name column has the wrong output. Date and Message are fine and match the expected output.

CodePudding user response:

If the commented-out print(date, time, name, message) prints valid data once you uncomment it, then just add 4 spaces before cleaned_data.append([date, time, name, message]) so that it runs inside the loop.

for line in dataset:
    # grab the info and cut it out
    date = line.split(" ")[0]
    line2 = line[len(date) + 1:]
    time = line2.split(" ")[0]
    line3 = line2[len(time):]
    name = line3.split(":")[0]
    line4 = line3[len(name):]
    message = line4
    
    row = (date[1:], time[:-1], name[1:], message[2:-1])
    # print("'%s', '%s', '%s', '%s'" % row)
    cleaned_data.append(row)

s[1:] returns s without its first character, s[:-1] returns s without its last character, and so on.
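
To see what each slice does, here is a minimal sketch that runs the same steps on a single line. The sample line and its "[date time] name: message" layout are an assumption, since the original export file is not shown; if your export looks different, the slices will need adjusting.

```python
# Hypothetical sample line; the real WhatsApp export format may differ.
line = "[22/10/2021 10:15:30] Anna: Hello there\n"

date = line.split(" ")[0]        # "[22/10/2021"
line2 = line[len(date) + 1:]     # skip the date and the space after it
time = line2.split(" ")[0]       # "10:15:30]"
line3 = line2[len(time):]        # " Anna: Hello there\n"
name = line3.split(":")[0]       # " Anna"
message = line3[len(name):]      # ": Hello there\n"

# Trim the leftover bracket, space, colon and newline characters.
row = (date[1:], time[:-1], name[1:], message[2:-1])
print(row)

# Without the "+ 1", the remainder starts with a space,
# so split(" ")[0] is an empty string.
time_without_skip = line[len(date):].split(" ")[0]
print(repr(time_without_skip))
```

Under that assumed layout, the second print shows why Time can end up empty: when the remainder still begins with the separator space, the first piece returned by split(" ") is the empty string.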

CodePudding user response:

dataset = data[1:]
cleaned_data = []
for line in dataset:
    # grab the info and cut it out
    date = line.split(" ")[0]
    line2 = line[len(date) + 1:]
    time = line2.split(" ")[0]
    line3 = line2[len(time):]
    name = line3.split(":")[0]
    line4 = line3[len(name):]
    message = line4
    
    row = (date[1:], time[:-1], name[1:], message[2:-1])
    # print("'%s', '%s', '%s', '%s'" % row)
cleaned_data.append(row)
  
df=pd.DataFrame(cleaned_data, columns = ['date', 'time', 'name', 'message']) 
df.count()

date       1
time       1
name       1
message    1
dtype: int64

Do you know why it is taking only one row and not appending the remaining ones?
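
A count of 1 is what you would expect when cleaned_data.append(row) is not indented, as in the listing above: the loop overwrites row on every pass, and append then runs only once, after the loop has finished, with the last row. A minimal sketch of the difference, using made-up stand-in values rather than real chat data:

```python
rows_in = ["a", "b", "c"]  # stand-in values, not real chat data

outside = []
for item in rows_in:
    row = item.upper()
outside.append(row)        # not indented: runs once, after the loop ends

inside = []
for item in rows_in:
    row = item.upper()
    inside.append(row)     # indented: runs on every pass of the loop

print(outside)
print(inside)
```

Indenting the append line by 4 spaces, so that it lines up with the row = ... line, should make every parsed line end up in the DataFrame.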
