I have been trying to convert this text file into a DataFrame, but I get either an error or NaN values. I need guidance. Below are a sample of materials.txt and my code.
_accurender\Ceiling\Acoustic Tile_Standard, Gray, 2' x 2' Generic-051 _accurender\Ceiling\Acoustic Tile_Standard, White, 2' x 2' Generic-013 _accurender\Ceiling\Acoustic Tile_Standard, White, 2' x 4' Generic-011 _accurender\Ceramic Tile\Mosaic\Square\2"_Salmon,High Gloss Ceramic-043 _accurender\Concrete\Exposed Aggregate, Pink Concrete-028 _accurender\Concrete\Exposed Aggregate, Tan Concrete-029 _accurender\Exterior\Shakes\Roofing,Shake,Square, Non-Uniform Weathering Generic-052 _accurender\Masonry\Brick\Brown, Non-uniform,_8",Running Masonry-030 _accurender\Masonry\Brick\Brown,_8",Soldier Masonry-029
df = pd.read_csv('materials.txt', sep=';', header=None,names=['Revit_type', 'Material_Category', 'Material_Name', 'Material_Description'], encoding = 'latin')
I expect the DataFrame to look like this:
Material_Type Material_Category Material_Name Material_Description
0 _accurender Masonry Brick Brown,_8",Soldier Masonry-029
Please, assist. Thank you.
CodePudding user response:
Hope this helps. Before running the code, you have to edit your text file so that each material is on its own line, like this:
_accurender\Ceiling\Acoustic Tile_Standard, Gray, 2' x 2' Generic-051
_accurender\Ceiling\Acoustic Tile_Standard, White, 2' x 2' Generic-013
_accurender\Ceiling\Acoustic Tile_Standard, White, 2' x 4' Generic-011
_accurender\Ceramic Tile\Mosaic\Square\2"_Salmon,High Gloss Ceramic-043
_accurender\Concrete\Exposed Aggregate, Pink Concrete-028
_accurender\Concrete\Exposed Aggregate, Tan Concrete-029
_accurender\Exterior\Shakes\Roofing,Shake,Square, Non-Uniform Weathering Generic-052
_accurender\Masonry\Brick\Brown, Non-uniform,_8",Running Masonry-030
_accurender\Masonry\Brick\Brown,_8",Soldier Masonry-029
After editing the file, run this code:
import pandas as pd

Material_Type = []
Material_Category = []
Material_Name = []
Material_Description = []

# Read all lines; the context manager closes the file afterwards
with open('sample.txt', 'r') as file1:
    Lines = file1.readlines()

for line in Lines:
    line = line.strip()  # drop the trailing newline
    if not line:
        continue  # skip blank lines
    res_list = line.split('\\')
    if len(res_list) == 3:
        # Three fields: the first two words of the last field become
        # the name and the description
        Material_Type.append(res_list[0])
        Material_Category.append(res_list[1])
        Material_Name.append(res_list[2].split()[0])
        Material_Description.append(res_list[2].split()[1])
    else:
        # Otherwise assume at least four fields and use the first four
        Material_Type.append(res_list[0])
        Material_Category.append(res_list[1])
        Material_Name.append(res_list[2])
        Material_Description.append(res_list[3])

final_dict = {
    "Material_Type": Material_Type,
    "Material_Category": Material_Category,
    "Material_Name": Material_Name,
    "Material_Description": Material_Description
}
df = pd.DataFrame(final_dict)
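If the file really is one long line, as the sample in the question suggests, the manual reformatting step could probably be scripted instead. The following is only a sketch: `split_records` is a hypothetical helper, and it assumes every record starts with the literal prefix `_accurender\`, which holds for the sample but may not hold for the full file.

```python
import re

# Hypothetical helper; assumes each record begins with "_accurender\".
def split_records(text, prefix="_accurender\\"):
    # Split on whitespace that is immediately followed by the prefix,
    # so spaces inside a record are left untouched.
    pattern = r"\s+(?=" + re.escape(prefix) + ")"
    return [rec.strip() for rec in re.split(pattern, text.strip()) if rec.strip()]

# Three records from the question's sample, joined on one line
raw = (
    "_accurender\\Concrete\\Exposed Aggregate, Pink Concrete-028 "
    "_accurender\\Concrete\\Exposed Aggregate, Tan Concrete-029 "
    "_accurender\\Masonry\\Brick\\Brown,_8\",Soldier Masonry-029"
)
records = split_records(raw)
for rec in records:
    print(rec)
```

Writing `"\n".join(records)` back to a file would give the one-record-per-line layout shown above.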
CodePudding user response:
Try using sep='\\' instead of sep=';'. Please note that this will not solve the problem for every entry in your input file, because its format is inconsistent: the number of backslash-separated fields varies from line to line.
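One possible way to cope with the varying field count is to split each line by hand rather than rely on a fixed separator. The sketch below is an assumption about the desired result, not something confirmed by the question: it splits on at most the first three backslashes, leaves any further backslashes inside the description, and pads short lines with None. `parse_line` and `COLUMNS` are illustrative names.

```python
COLUMNS = ["Material_Type", "Material_Category", "Material_Name", "Material_Description"]

def parse_line(line):
    # Split on at most the first three backslashes; any further
    # backslashes remain part of the last field.
    parts = line.strip().split("\\", 3)
    # Pad lines with fewer than four fields so every row has four values.
    parts += [None] * (len(COLUMNS) - len(parts))
    return parts

# Lines taken from the question's sample (3, 4 and 5 fields respectively)
sample = [
    "_accurender\\Concrete\\Exposed Aggregate, Pink Concrete-028",
    "_accurender\\Masonry\\Brick\\Brown,_8\",Soldier Masonry-029",
    "_accurender\\Ceramic Tile\\Mosaic\\Square\\2\"_Salmon,High Gloss Ceramic-043",
]
rows = [parse_line(line) for line in sample]
for row in rows:
    print(row)
```

The rows can then be turned into a DataFrame with `pd.DataFrame(rows, columns=COLUMNS)`. Whether a three-field line should leave the description empty, as here, depends on what the data is meant to represent.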