Text to Pandas dataframe

I have been trying to convert this text file into a dataframe, but I keep getting either an error or NaN values. I need some guidance. Below are my code and a sample of the text file (material.txt):

_accurender\Ceiling\Acoustic Tile_Standard, Gray, 2' x 2' Generic-051 _accurender\Ceiling\Acoustic Tile_Standard, White, 2' x 2' Generic-013 _accurender\Ceiling\Acoustic Tile_Standard, White, 2' x 4' Generic-011 _accurender\Ceramic Tile\Mosaic\Square\2"_Salmon,High Gloss Ceramic-043 _accurender\Concrete\Exposed Aggregate, Pink Concrete-028 _accurender\Concrete\Exposed Aggregate, Tan Concrete-029 _accurender\Exterior\Shakes\Roofing,Shake,Square, Non-Uniform Weathering Generic-052 _accurender\Masonry\Brick\Brown, Non-uniform,_8",Running Masonry-030 _accurender\Masonry\Brick\Brown,_8",Soldier Masonry-029

df = pd.read_csv('materials.txt', sep=';', header=None,names=['Revit_type', 'Material_Category', 'Material_Name', 'Material_Description'], encoding = 'latin')

I expect the dataframe to look like this:

     Material_Type   Material_Category   Material_Name   Material_Description
0    _accurender     Masonry             Brick           Brown,_8",Soldier   Masonry-029

Please assist. Thank you.

CodePudding user response：

Hope this helps. Before running the code, you have to edit your text file so that each record is on its own line, like this:

_accurender\Ceiling\Acoustic Tile_Standard, Gray, 2' x 2' Generic-051 
_accurender\Ceiling\Acoustic Tile_Standard, White, 2' x 2' Generic-013 
_accurender\Ceiling\Acoustic Tile_Standard, White, 2' x 4' Generic-011 
_accurender\Ceramic Tile\Mosaic\Square\2"_Salmon,High Gloss Ceramic-043 
_accurender\Concrete\Exposed Aggregate, Pink Concrete-028 
_accurender\Concrete\Exposed Aggregate, Tan Concrete-029 
_accurender\Exterior\Shakes\Roofing,Shake,Square, Non-Uniform Weathering Generic-052 
_accurender\Masonry\Brick\Brown, Non-uniform,_8",Running Masonry-030 
_accurender\Masonry\Brick\Brown,_8",Soldier Masonry-029

After editing the file, run this code:

import pandas as pd

Material_Type = []
Material_Category = []
Material_Name = []
Material_Description = []

# Read the edited file (one record per line); 'with' closes it afterwards.
with open('sample.txt', 'r') as file1:
    Lines = file1.readlines()

for line in Lines:
    line = line.strip()  # drop the trailing newline and spaces
    if not line:
        continue  # skip blank lines
    # Split on at most three backslashes so deeper paths stay in the last field.
    res_list = line.split('\\', 3)
    if len(res_list) < 3:
        continue  # not enough fields to fill the four columns
    Material_Type.append(res_list[0])
    Material_Category.append(res_list[1])
    if len(res_list) == 3:
        # No separate name field: the first word becomes the name,
        # the rest of the text becomes the description.
        name, _, description = res_list[2].partition(' ')
    else:
        name, description = res_list[2], res_list[3]
    Material_Name.append(name)
    Material_Description.append(description)

final_dict = {
    "Material_Type": Material_Type,
    "Material_Category": Material_Category,
    "Material_Name": Material_Name,
    "Material_Description": Material_Description
}
df = pd.DataFrame(final_dict)
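
The manual edit described at the start of this answer (one record per line) could probably also be done in code. Below is a minimal sketch, assuming every record starts with the prefix `_accurender\`, which holds for the sample but may not hold for the whole file. The helper name `split_records` and the shortened `SAMPLE` string are only illustrative.

```python
import re

# Three records from the question's sample, joined on one line.
SAMPLE = (
    r"_accurender\Ceiling\Acoustic Tile_Standard, Gray, 2' x 2' Generic-051 "
    r"_accurender\Concrete\Exposed Aggregate, Pink Concrete-028 "
    r'_accurender\Masonry\Brick\Brown,_8",Soldier Masonry-029'
)


def split_records(text, prefix="_accurender\\"):
    """Split text into records, each starting with the given prefix.

    Assumption: the prefix never occurs in the middle of a record.
    """
    # Split on whitespace that is immediately followed by the prefix.
    pattern = r"\s+(?=" + re.escape(prefix) + ")"
    return [rec.strip() for rec in re.split(pattern, text.strip()) if rec.strip()]


records = split_records(SAMPLE)
for rec in records:
    print(rec)
```

Each printed record can then be passed through the same backslash split as in the loop above, so the text file itself does not have to be edited by hand.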

CodePudding user response：

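Try sep='\\' instead of sep=';'. The file contains no semicolons, so with sep=';' each line ends up in the first column and the other three columns are filled with NaN. Note that sep='\\' will not solve the problem for every entry in your input file, because the number of backslash-separated fields varies from line to line.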
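
To see how inconsistent the rows are, one can count the backslash-separated fields per line. This is a minimal sketch using four lines taken from the sample in the question. The helper name `field_counts` is only illustrative.

```python
from collections import Counter

# Four lines copied from the sample in the question.
LINES = [
    r"_accurender\Ceiling\Acoustic Tile_Standard, White, 2' x 4' Generic-011",
    r'_accurender\Ceramic Tile\Mosaic\Square\2"_Salmon,High Gloss Ceramic-043',
    r"_accurender\Exterior\Shakes\Roofing,Shake,Square, Non-Uniform Weathering Generic-052",
    r'_accurender\Masonry\Brick\Brown,_8",Soldier Masonry-029',
]


def field_counts(lines, sep="\\"):
    """Count how many lines have each number of sep-separated fields."""
    return Counter(len(line.split(sep)) for line in lines)


counts = field_counts(LINES)
for n_fields, n_lines in sorted(counts.items()):
    print(f"{n_fields} fields: {n_lines} line(s)")
```

With these four lines the counts are spread over 3, 4 and 5 fields, so a single separator cannot map every row onto the same four columns without extra handling.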