I think this issue is pretty basic, but I haven't seen it answered online. I'm using python and I have 'pandas' installed to make things easier. If there's a way to do it without 'pandas' that would be awesome too! I'm coding a node connected map. I want to be able to take in some .csv file with a 'previous' and 'next' node list. I want this data to be then read by the program and stored in a list. For example:
.csv file:
Name | Previous | Next |
---|---|---|
Alpha | one two | Three |
Beta | four | five |
Charlie | six | seven eight |
what I want in my program:
alpha, [one, two], [three]
beta, [four], [five]
charlie, [six], [seven, eight]
I have heard about two ways to write multiple variables in one .csv column.
One way was placing a space in between the two values/variables:
alpha,one two,three
and another way I've heard to solve this is use " marks and separate with a comma:
alpha,"one,two",three
Although I have heard about these answers before, I haven't been able to implement them. When reading the data in my program, it will assume that the space is part of the string or that the comma is part of the string.
file = pd.read_csv("connections.csv")
previous_alpha = []
previous_alpha.append(file.previous[0])
So, instead of having a list of two strings [one, two]
my program will have a list containing one string that looks like ["one,two"]
or [one two]
I can change the way the variables are structured in the .csv file or the code reading in the data. Thanks for all the help in advance!
CodePudding user response:
There are multiple ways of doing this. Each for a different way you start with CSV data.
First method will have the data in CSV as a single row with lists of things:
Name,Previous,Next
Alpha,"One,Two",Three
Beta,Four,Five
Charlie,Six,"Seven,Eight"
Note the quotation around the lists. We can use apply
to change the values. The convert function will just split the string using ,
as the delimiter.
import pandas as pd
def convert(x):
return x.split(',')
df = pd.read_csv('file.csv')
df['Previous'] = df['Previous'].apply(convert)
df['Next'] = df['Previous'].apply(convert)
Second, each row is repeated for Name
with the values in CSV:
Name,Previous,Next
Alpha,One,Three
Alpha,Two,Three
Beta,Four,Five
Charlie,Six,Seven
Charlie,Six,Eight
We can you the agg
function to aggregate. The convert
function drops the duplicates and returns as a list.
import pandas as pd
def convert(x):
return x.drop_duplicates().to_list()
df = pd.read_csv('file.csv')
df = df.groupby('Name').agg({'Previous': convert, 'Next': convert})
The results should look like this:
Previous Next
Name
Alpha [One, Two] [Three]
Beta [Four] [Five]
Charlie [Six] [Seven, Eight]
CodePudding user response:
If you have this DataFrame:
Name Previous Next
0 Alpha one two Three
1 Beta four five
2 Charlie six seven eight
Then you can split the strings in required columns and save the CSV normally:
df["Previous"] = df["Previous"].str.split()
df["Next"] = df["Next"].str.split()
print(df)
df.to_csv("data.csv", index=False)
Name Previous Next
0 Alpha [one, two] [Three]
1 Beta [four] [five]
2 Charlie [six] [seven, eight]
To load the data back, you can use pd.read_csv
with converters=
parameter:
from ast import literal_eval
df = pd.read_csv(
"data.csv", converters={"Previous": literal_eval, "Next": literal_eval}
)
print(df)
Prints:
Name Previous Next
0 Alpha [one, two] [Three]
1 Beta [four] [five]
2 Charlie [six] [seven, eight]