Python - how to store multiple variables in one column of a .csv file and then read those variables-CodePudding

I think this issue is pretty basic, but I haven't seen it answered online. I'm using python and I have 'pandas' installed to make things easier. If there's a way to do it without 'pandas' that would be awesome too! I'm coding a node connected map. I want to be able to take in some .csv file with a 'previous' and 'next' node list. I want this data to be then read by the program and stored in a list. For example:

.csv file:

Name	Previous	Next
Alpha	one two	Three
Beta	four	five
Charlie	six	seven eight

what I want in my program:

alpha, [one, two], [three]
beta, [four], [five]
charlie, [six], [seven, eight]

I have heard about two ways to write multiple variables in one .csv column. One way was placing a space in between the two values/variables: alpha,one two,three

and another way I've heard to solve this is use " marks and separate with a comma: alpha,"one,two",three

Although I have heard about these answers before, I haven't been able to implement them. When reading the data in my program, it will assume that the space is part of the string or that the comma is part of the string.

file = pd.read_csv("connections.csv")
previous_alpha = []
previous_alpha.append(file.previous[0])

So, instead of having a list of two strings [one, two] my program will have a list containing one string that looks like ["one,two"] or [one two]

I can change the way the variables are structured in the .csv file or the code reading in the data. Thanks for all the help in advance!

CodePudding user response：

There are multiple ways of doing this. Each for a different way you start with CSV data.

First method will have the data in CSV as a single row with lists of things:

Name,Previous,Next
Alpha,"One,Two",Three
Beta,Four,Five
Charlie,Six,"Seven,Eight"

Note the quotation around the lists. We can use apply to change the values. The convert function will just split the string using , as the delimiter.

import pandas as pd
def convert(x):
    return x.split(',')

df = pd.read_csv('file.csv')
df['Previous'] = df['Previous'].apply(convert)
df['Next'] = df['Previous'].apply(convert)

Second, each row is repeated for Name with the values in CSV:

Name,Previous,Next
Alpha,One,Three
Alpha,Two,Three
Beta,Four,Five
Charlie,Six,Seven
Charlie,Six,Eight

We can you the agg function to aggregate. The convert function drops the duplicates and returns as a list.

import pandas as pd
def convert(x):
    return x.drop_duplicates().to_list()

df = pd.read_csv('file.csv')
df = df.groupby('Name').agg({'Previous': convert, 'Next': convert})

The results should look like this:

           Previous            Next
Name                               
Alpha    [One, Two]         [Three]
Beta         [Four]          [Five]
Charlie       [Six]  [Seven, Eight]

CodePudding user response：

If you have this DataFrame:

      Name Previous         Next
0    Alpha  one two        Three
1     Beta     four         five
2  Charlie      six  seven eight

Then you can split the strings in required columns and save the CSV normally:

df["Previous"] = df["Previous"].str.split()
df["Next"] = df["Next"].str.split()

print(df)
df.to_csv("data.csv", index=False)

      Name    Previous            Next
0    Alpha  [one, two]         [Three]
1     Beta      [four]          [five]
2  Charlie       [six]  [seven, eight]

To load the data back, you can use pd.read_csv with converters= parameter:

from ast import literal_eval

df = pd.read_csv(
    "data.csv", converters={"Previous": literal_eval, "Next": literal_eval}
)
print(df)

Prints:

      Name    Previous            Next
0    Alpha  [one, two]         [Three]
1     Beta      [four]          [five]
2  Charlie       [six]  [seven, eight]