Consider a dataframe from txt file.
Data txt file: txt
I removed the unwanted lines that contains "Iteration"
import pandas as pd
data = pd.read_csv('results.txt', sep=" ", header=None)
data = data[~data[1].astype(str).str.startswith('Iteration')]
Now its like this: pd
I wanted to remove all the columns that contains ":" to result something like: {2, 7}{2, 7}{2, 7}{2, 7}{1, 2, 7}{1, 2, 6, 7} {1, 2, 6, 7}{1, 2, 3, 6, 7}{1, 2, 3, 6, 7} {1, 3, 4, 6, 7}{1, 3, 4, 5, 6} {3, 4, 5, 6, 7}{1, 3, 4, 5, 7} {1, 4, 5, 6, 7}, .......etc.
and to merge columns such as: "{2," " 7}," into one column "{2,7}" Then I can replace all the column names with timestamps (6:30, 7:00 etc) to make it a proper table like :
6:30 7:00 7:30 8:00 8:30 9:00
{2,7} {2,7} {2,7} {2,7} {1,2,7} {1,2,6,7}
{5,6} {5,6} {5,6} {5,6} {1,5,6} {1,4,5,6}
....
My first step now is to remove columns with ":" , I tried these:
data.loc[:, ~(data[0:122].astype(str).apply(lambda x: x.str.contains(':'))).any()]
data.loc[:, ~(data.contains(':')).any()]
obviously they won't work since contains() cannot be use on data frames, I am not very familier with pandas. I searched for many other answers and wasn't able to get it right. I wonder what is the correct way to remove columns with cells that contains ":"
Or if my approch full of unnecessary steps, its there a better approach to turn the txt file into a dataframe of
6:30 7:00 7:30 8:00 8:30 9:00
{2,7} {2,7} {2,7} {2,7} {1,2,7} {1,2,6,7}
{5,6} {5,6} {5,6} {5,6} {1,5,6,} {1,4,5,6}
like sort it when reading the txt file at first place and loop through lines and make dictionaries instead? '6:30': {2, 7} ....
CodePudding user response:
Preprocess your file before create dataframe:
import pandas as pd
import ast
data = []
with open('results.txt') as fp:
for line in fp:
if line.startswith('Shifts:'):
data.append(ast.literal_eval(line.split(':', maxsplit=1)[1].strip()))
df = pd.DataFrame.from_dict(data)
Important update: BUG FIX
ast.literal_eval
works well here but I can guarantee the order of the sets after evaluation
>>> ast.literal_eval('{8, 2, 3, 7}')
{2, 3, 7, 8} # We don't want that!
Maybe you can use re.findall
to get the same output:
import pandas as pd
import re
data = []
with open('results.txt') as fp:
for line in fp:
if line.startswith('Shifts:'):
data.append(dict(re.findall(r"'([^'] )': (\{[^\}] \})", d[1:-1])))
df = pd.DataFrame.from_dict(data)
The output still the same but the cell values are not set
but str
:
>>> df
6:30 7:00 7:30 8:00 8:30 ... 16:30 17:00 17:30 18:00 18:30
0 {2, 7} {2, 7} {2, 7} {2, 7} {1, 2, 7} ... {8, 3, 4, 5} {8, 5} {8, 5} {8, 5} {8, 5}
1 {2, 7} {2, 7} {2, 7} {2, 7} {1, 2, 7} ... {8, 3, 4, 5} {8, 5} {8, 5} {8, 5} {8, 5}
2 {2, 7} {2, 7} {2, 7} {2, 7} {1, 2, 7} ... {8, 3, 4, 5} {8, 5} {8, 5} {8, 5} {8, 5}
3 {2, 7} {2, 7} {2, 7} {2, 7} {1, 2, 7} ... {8, 3, 4, 5} {8, 5} {8, 5} {8, 5} {8, 5}
4 {2, 7} {2, 7} {2, 7} {2, 7} {1, 2, 7} ... {8, 3, 4, 5} {8, 5} {8, 5} {8, 5} {8, 5}
.. ... ... ... ... ... ... ... ... ... ... ...
142 {2, 7} {2, 7} {2, 7} {2, 7} {1, 2, 7} ... {8, 3, 4, 5} {8, 5} {8, 5} {8, 5} {8, 5}
143 {2, 7} {2, 7} {2, 7} {2, 7} {1, 2, 7} ... {8, 3, 4, 5} {8, 5} {8, 5} {8, 5} {8, 5}
144 {2, 7} {2, 7} {2, 7} {2, 7} {1, 2, 7} ... {8, 3, 4, 5} {8, 5} {8, 5} {8, 5} {8, 5}
145 {2, 7} {2, 7} {2, 7} {2, 7} {1, 2, 7} ... {8, 3, 4, 5} {8, 5} {8, 5} {8, 5} {8, 5}
146 {2, 7} {2, 7} {2, 7} {2, 7} {1, 2, 7} ... {8, 3, 4, 5} {8, 5} {8, 5} {8, 5} {8, 5}
[147 rows x 25 columns]
CodePudding user response:
data = []
with open('results.txt') as f:
for line in f:
if line.startswith('Shifts: '):
data.append(eval(line[8:]))
df = pd.DataFrame(data)
print(df.head())
Output:
6:30 7:00 7:30 8:00 8:30 9:00 9:30 \
0 {2, 7} {2, 7} {2, 7} {2, 7} {1, 2, 7} {1, 2, 6, 7} {1, 2, 6, 7}
1 {5, 6} {5, 6} {5, 6} {5, 6} {1, 5, 6} {1, 4, 5, 6} {1, 4, 5, 6}
2 {8, 1} {8, 1} {8, 1} {8, 1} {8, 1, 7} {8, 1, 2, 7} {8, 1, 2, 7}
3 {1, 6} {1, 6} {1, 6} {1, 6} {1, 5, 6} {1, 4, 5, 6} {1, 4, 5, 6}
4 {3, 4} {3, 4} {3, 4} {3, 4} {3, 4, 5} {2, 3, 4, 5} {2, 3, 4, 5}
10:00 10:30 11:00 ... 14:00 \
0 {1, 2, 3, 6, 7} {1, 2, 3, 6, 7} {1, 3, 4, 6, 7} ... {1, 3, 4, 6, 7}
1 {1, 4, 5, 6, 7} {1, 4, 5, 6, 7} {1, 4, 5, 6, 7} ... {1, 4, 5, 6, 7}
2 {1, 2, 3, 7, 8} {1, 2, 3, 7, 8} {1, 2, 3, 6, 7} ... {1, 2, 3, 6, 7}
3 {1, 3, 4, 5, 6} {1, 3, 4, 5, 6} {3, 4, 5, 6, 7} ... {1, 3, 4, 5, 7}
4 {1, 2, 3, 4, 5} {1, 2, 3, 4, 5} {1, 2, 4, 5, 7} ... {1, 2, 3, 5, 7}
14:30 15:00 15:30 16:00 \
0 {1, 3, 4, 5, 6} {1, 3, 4, 5, 6, 8} {1, 3, 4, 5, 8} {1, 3, 4, 5, 8}
1 {1, 2, 3, 4, 7} {1, 2, 3, 4, 7, 8} {1, 2, 3, 7, 8} {1, 2, 3, 7, 8}
2 {2, 3, 5, 6, 7} {2, 3, 4, 5, 6, 7} {3, 4, 5, 6, 7} {3, 4, 5, 6, 7}
3 {3, 4, 5, 7, 8} {2, 3, 4, 5, 7, 8} {2, 3, 5, 7, 8} {2, 3, 5, 7, 8}
4 {1, 2, 5, 6, 7} {1, 2, 5, 6, 7, 8} {1, 5, 6, 7, 8} {1, 5, 6, 7, 8}
16:30 17:00 17:30 18:00 18:30
0 {8, 3, 4, 5} {8, 5} {8, 5} {8, 5} {8, 5}
1 {8, 2, 3, 7} {8, 3} {8, 3} {8, 3} {8, 3}
2 {3, 4, 5, 6} {4, 5} {4, 5} {4, 5} {4, 5}
3 {8, 2, 3, 7} {2, 7} {2, 7} {2, 7} {2, 7}
4 {8, 1, 6, 7} {8, 7} {8, 7} {8, 7} {8, 7}
[5 rows x 25 columns]