Save pandas dataframe as txt file in Python, with dataframe columns containing either single int val

Time:07-01

I am trying to save a complex pandas dataframe as a txt file in Python. The dataframe holds data obtained with OpenCV: characteristics of objects detected by computer vision code. For example, it contains the object height, object width, class label, and contour coordinates (x and y) of each object present in the image.

To help you visualize this, here is what the dataframe looks like:

   Height  Width     Label  Contour
1      12     32  SpeciesA  [[[ 670 1921]] [[ 666 1925]] [[ 665 1924]] ... [[ 658 1870]]]
2      15     30  SpeciesB  [[[ 670 1921]] [[ 666 1925]] [[ 665 1924]] ... [[ 658 1870]]]
3      11     31  SpeciesC  [[[ 670 1921]] [[ 666 1925]] [[ 665 1924]] ... [[ 658 1870]]]
4      10     27  SpeciesD  [[[ 670 1921]] [[ 666 1925]] [[ 665 1924]] ... [[ 658 1870]]]

My pandas dataframe is thus composed of integers, strings, and lists, and this is where the trouble starts. How can I properly save it as a txt file so that I can later load it again and access each element of the dataframe?

I am asking because, before I started adding the contour lists to the dataframe, my code worked fine. It looked like this:

import pandas as pd

# Export dataframe as txt
df_to_save['Height'] = df_to_save['Height'].astype(int)
df_to_save['Width'] = df_to_save['Width'].astype(int)
df_to_save['Label'] = df_to_save['Label'].astype(str)
# Mode 'w' overwrites the file; mode 'a' would append a second header row on every run
with open("savedDF.txt", 'w') as tfile:
    tfile.write(df_to_save.to_string(index=False))

# Load txt into pandas dataframe
df_loaded = pd.read_csv("savedDF.txt", sep=r'\s{1,}', engine='python')

This worked fine, but now that I have added the contour list as a new column in my pandas dataframe, the export still works while the import no longer does.

# Export dataframe as txt
df_to_save['Height'] = df_to_save['Height'].astype(int)
df_to_save['Width'] = df_to_save['Width'].astype(int)
df_to_save['Label'] = df_to_save['Label'].astype(str)
df_to_save['Contour'] = df_to_save['Contour'].astype(object)
# Mode 'w' overwrites the file; mode 'a' would append a second header row on every run
with open("savedDF.txt", 'w') as tfile:
    tfile.write(df_to_save.to_string(index=False))

# Load txt into pandas dataframe
df_loaded = pd.read_csv("savedDF.txt", sep=r'\s{1,}', engine='python')

The import now raises this error:

File "...python_parser.py", line 739, in _alert_malformed raise ParserError(msg)

pandas.errors.ParserError: Expected 88 fields in line 10, saw 95. Error could possibly be due to quotes being ignored when a multi-char delimiter is used

I am therefore wondering what the proper way is to generate, export, and import a dataframe composed of multiple types of data (integers, strings, and lists). The idea is to be able to access each value contained in a table cell, and also to redraw object contours on the image. I therefore wish to preserve the list format, or at least a format that lets me regenerate the object contours after loading the saved txt file, using the OpenCV drawContours function.

Thank you for your help.

CodePudding user response:

This is more of a comment with a bunch of observations - but with more room.

First, I've never been able to get the df_to_save.to_string() approach to produce an exact string representation of a dataframe. Usually the problem is that the string representation of a wide column gets collapsed.

Second, writing the data out as a text file and then reading it back in as a CSV presents some problems. Before you write out the file, Contour is a single column. When you read it back in, it becomes dozens of columns: read_csv with sep=r'\s{1,}' treats every run of contiguous whitespace in the original Contour field as a field separator. It would be possible to make that work, but every row of the Contour column would need exactly the same number of whitespace runs. Yours do not, and that is why you ended up with ... Expected 88 fields in line 10, saw 95 ...
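To make the field-count mismatch concrete, here is a small sketch that applies the same separator pattern to two rows. The two row strings are made up for illustration (they mimic what to_string() might print for contours of different lengths), so treat them as assumptions rather than actual file content.

```python
import re

# Two hypothetical rows as to_string() might render them; the printed
# contour arrays contain different numbers of whitespace-separated tokens.
row_a = "12 32 SpeciesA [[[ 670 1921]] [[ 666 1925]]]"
row_b = "15 30 SpeciesB [[[ 670 1921]] [[ 666 1925]] [[ 665 1924]]]"

# Same pattern as sep=r'\s{1,}' in the read_csv call
fields_a = re.split(r"\s{1,}", row_a)
fields_b = re.split(r"\s{1,}", row_b)

# The rows no longer agree on how many fields they contain
print(len(fields_a), len(fields_b))  # prints: 9 12
```

One extra contour point adds three whitespace-separated tokens, so rows holding contours of different lengths cannot share one column layout.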

You may want to investigate other approaches.
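One such approach, sketched here as an assumption rather than as what the answer had in mind, is to serialize each contour to a JSON string before saving and to decode it when loading. The sample contours, the in-memory buffer standing in for the txt file, and the variable names are all hypothetical.

```python
import io
import json

import numpy as np
import pandas as pd

# Hypothetical contours with the (N, 1, 2) int32 layout OpenCV uses
contours = [
    np.array([[[670, 1921]], [[666, 1925]], [[665, 1924]]], dtype=np.int32),
    np.array([[[658, 1870]], [[660, 1872]]], dtype=np.int32),
]

df = pd.DataFrame({
    "Height": [12, 15],
    "Width": [32, 30],
    "Label": ["SpeciesA", "SpeciesB"],
    # tolist() yields nested lists of plain ints, which json.dumps accepts
    "Contour": [json.dumps(c.tolist()) for c in contours],
})

buffer = io.StringIO()  # stands in for a file such as "savedDF.txt"
df.to_csv(buffer, sep="\t", index=False)
buffer.seek(0)

# json.loads turns each Contour cell back into nested lists while reading
loaded = pd.read_csv(buffer, sep="\t", converters={"Contour": json.loads})

# Rebuild int32 arrays, one per row, for later use with drawContours
loaded_contours = [np.array(c, dtype=np.int32) for c in loaded["Contour"]]
```

Because the separator is a tab and JSON text contains no tabs, each contour stays in a single field regardless of how many points it has.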

CodePudding user response:

I have found a working solution for my issue. The problem with the contours was the nested array structure rather than the list format itself.

Here is my solution:

1. Storing contour X and Y coordinates

First, I store the X and Y coordinates of each object's contour as lists:

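# int() gives plain Python ints; NumPy 2 would otherwise write each value as np.int32(670)
cntCoordX = [int(x[0][0]) for x in cnt]
cntCoordY = [int(x[0][1]) for x in cnt]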
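As a sketch of the same extraction with NumPy indexing, assuming cnt is an array of shape (N, 1, 2) as OpenCV produces (the sample values below are made up):

```python
import numpy as np

# Hypothetical contour with the (N, 1, 2) layout used by OpenCV
cnt = np.array([[[670, 1921]], [[666, 1925]], [[665, 1924]]], dtype=np.int32)

# Column 0 holds x, column 1 holds y; tolist() returns plain Python ints
cntCoordX = cnt[:, 0, 0].tolist()
cntCoordY = cnt[:, 0, 1].tolist()

print(cntCoordX)  # prints: [670, 666, 665]
print(cntCoordY)  # prints: [1921, 1925, 1924]
```

Plain Python ints matter here because the lists are later written to the file as text and parsed back with int().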

My pandas dataframe then looks like this:

   Height  Width     Label  ContourXCoord          ContourYCoord
1      12     32  SpeciesA  [670 666 665 ... 658]  [1921 1925 1924 ... 1870]
2      15     30  SpeciesB  [670 666 665 ... 658]  [1921 1925 1924 ... 1870]
3      11     31  SpeciesC  [670 666 665 ... 658]  [1921 1925 1924 ... 1870]
4      10     27  SpeciesD  [670 666 665 ... 658]  [1921 1925 1924 ... 1870]

2. Saving dataframe as txt

The dataframe is then saved as follows:

dataframe.to_csv("filename.txt", index = False, sep='\t')
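A minimal sketch of what this save step does to list cells, using a made-up one-row dataframe and an in-memory buffer in place of "filename.txt" (both are assumptions for illustration). The point is that a list is written as its text form and comes back as a string, which is why the reload step has to parse it.

```python
import io

import pandas as pd

# Hypothetical one-row dataframe with list-valued cells
df = pd.DataFrame({
    "Label": ["SpeciesA"],
    "ContourXCoord": [[670, 666, 665]],
    "ContourYCoord": [[1921, 1925, 1924]],
})

buffer = io.StringIO()  # stands in for "filename.txt"
df.to_csv(buffer, index=False, sep="\t")
buffer.seek(0)

reloaded = pd.read_csv(buffer, sep="\t")
cell = reloaded.loc[0, "ContourXCoord"]

# The cell is now text, not a list
print(type(cell).__name__, cell)  # prints: str [670, 666, 665]
```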

3. Later, reloading the txt file into a pandas dataframe and recalculating the contours

In the other module of my code, where I need to load the saved dataframe, the following code reads the file and creates a new column that reconstructs the contours from the X and Y coordinates:

import numpy as np
import pandas as pd

dataframe = pd.read_csv("filename.txt", sep='\t', engine='python')
allContours = []

for i in range(len(dataframe)):
    # Each coordinate cell is read back as a string such as "[670, 666, 665]"
    contourXCoord = dataframe.loc[i, 'ContourXCoord']
    contourXCoord = contourXCoord.replace("[","")
    contourXCoord = contourXCoord.replace("]","")
    contourXCoord = contourXCoord.split(", ")
    contourXCoord = [int(x) for x in contourXCoord]
    contourYCoord = dataframe.loc[i, 'ContourYCoord']
    contourYCoord = contourYCoord.replace("[","")
    contourYCoord = contourYCoord.replace("]","")
    contourYCoord = contourYCoord.split(", ")
    contourYCoord = [int(y) for y in contourYCoord]

    # Pair the coordinates back into [x, y] points
    # (index j, so the row index i of the outer loop is not overwritten)
    contourCoords = []
    for j in range(len(contourXCoord)):
        contourCoord = [contourXCoord[j],contourYCoord[j]]
        contourCoords.append(contourCoord)

    # Restore the (N, 1, 2) int32 layout that OpenCV uses for contours
    contourCoordsArray = np.array(contourCoords).reshape((-1,1,2)).astype(np.int32)
    allContours.append(contourCoordsArray)

dataframe['Contours'] = allContours
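The bracket stripping and splitting in the loading code can also be handled by the standard library. The sketch below uses ast.literal_eval, which safely evaluates a Python literal stored as text; the helper name parse_coord_string and the sample strings are hypothetical, and the approach assumes the cells were written from lists of plain ints.

```python
import ast

def parse_coord_string(text):
    """Parse a string such as "[670, 666, 665]" into a list of ints."""
    values = ast.literal_eval(text)
    return [int(v) for v in values]

# Hypothetical cell contents as read back from the txt file
xs = parse_coord_string("[670, 666, 665]")
ys = parse_coord_string("[1921, 1925, 1924]")

# Pair x and y back into [x, y] points
points = [[x, y] for x, y in zip(xs, ys)]
print(points)  # prints: [[670, 1921], [666, 1925], [665, 1924]]
```

The resulting points list can then be passed through np.array(points).reshape((-1,1,2)).astype(np.int32), as in the loop above.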

I hope this solution helps people facing the same issue.
