The structure of the file is like this:
Time,Weight,Change,BMI,Body Fat,Skeletal Muscle Mass,Bone Mass,Body Water,
" Oct 10, 2020",
5:15 PM,82.7 kg,1.1 kg,22.4,17.4 %,35.6 kg,5.8kg,60.3 %,
" Nov 2, 2020",
11:40 PM,84.0 kg,0.1 kg,22.8,17.5 %,35.9 kg,6.0 kg,60.3 %,
10:20 AM,83.9 kg,0.5 kg,22.8,18.1 %,35.9 kg,5.9 kg,59.8 %,
13:40 AM,84.4 kg,0.8 kg,22.9,17.3 %,36.0 kg,6.0 kg,60.4 %,
The problem is that importing the file usually doesn't recognise the date as the same line, and some dates have several records.
Is there a way to import the file somehow so that each row has the corresponding date? I suppose there should be two delimiters: start of new line and comma and it has to be combined somehow.
The usual way I tried only gives the output where each date is in a separate line.
data = pd.read_csv('test.txt', sep=',')
data.columns = ["Date","Time", "Weight", "Change", "BMI", "Body Fat", "Skeletal Muscle Mass", "Bone Mass", "Body Water"]
Output:
Date | Time | Weight | Change | BMI | Body Fat | Skeletal Muscle Mass | Bone Mass | Body Water |
---|---|---|---|---|---|---|---|---|
Sep 21, 2020 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
5:46 PM | 76.5 kg | 1.1 kg | 22.4 | 17.4 % | 35.6 kg | 5.8 kg | 60.3 % | NaN |
Oct 12, 2020 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
8:01 AM | 75.3 kg | 1.0 kg | 22.2 | 16.8 % | 33.2 kg | 4.9 kg | 50.8 % | NaN |
10:00 AM | 75.4 kg | 1.1 kg | 22.2 | 16.9 % | 33.2 kg | 4.9 kg | 50.8 % | NaN |
CodePudding user response:
Your file looks like not structured "well".
e.g., your date is in the one line and their corresponding time records with data are in a different line. Pandas is expecting each line is a record according to the column definition you provided.
So you need to correct the input file and append date infront of the time records and remove the date line. You may write a small script to correct that if the file is big. Afterwards, you can use pandas.
CodePudding user response:
Code:
with open('test.txt') as file:
contents = file.read() #First, Read the file
li = contents.split('\n') #Convert to the list based on line
col = list(filter(None,li[0].split(',')))
col.insert(0, 'Date') if 'Date' not in col else col #Insert Date
col
Col list look like:
['Date',
'Time',
'Weight',
'Change',
'BMI',
'Body Fat',
'Skeletal Muscle Mass',
'Bone Mass',
'Body Water']
Now getting the data
D = None #Temporary variable
data=[] #New List to story data
for i, j in enumerate(li):
if len(j) < 20:
D = j.replace('"', '') #" Oct 10, 2020"
else:
if D is not None:
data.append([D[:-1]] li[i][:-1].split(','))
data
data something look like this:
[[' Oct 10, 2020',
'5:15 PM',
'82.7 kg',
'1.1 kg',
'22.4',
'17.4 %',
'35.6 kg',
'5.8kg',
'60.3 %'],
[' Nov 2, 2020',
'11:40 PM',
'84.0 kg',
'0.1 kg',
'22.8',
'17.5 %',
'35.9 kg',
'6.0 kg',
'60.3 %'],
[' Nov 2, 2020',
'10:20 AM',
'83.9 kg',
'0.5 kg',
'22.8',
'18.1 %',
'35.9 kg',
'5.9 kg',
'59.8 %'],
[' Nov 2, 2020',
'13:40 AM',
'84.4 kg',
'0.8 kg',
'22.9',
'17.3 %',
'36.0 kg',
'6.0 kg',
'60.4 %']]
Now merging the column and data lists and converting that to the DataFrame
data.insert(0, col) if col not in data else data
pd.DataFrame(data[1:],columns=data[0])
Output:
Date Time Weight Change BMI Body Fat Skeletal Muscle Mass Bone Mass Body Water
0 Oct 10, 2020 5:15 PM 82.7 kg 1.1 kg 22.4 17.4 % 35.6 kg 5.8kg 60.3 %
1 Nov 2, 2020 11:40 PM 84.0 kg 0.1 kg 22.8 17.5 % 35.9 kg 6.0 kg 60.3 %
2 Nov 2, 2020 10:20 AM 83.9 kg 0.5 kg 22.8 18.1 % 35.9 kg 5.9 kg 59.8 %
3 Nov 2, 2020 13:40 AM 84.4 kg 0.8 kg 22.9 17.3 % 36.0 kg 6.0 kg 60.4 %