I have a .csv file that follows this layout:
name, number, 2dlist, bool
"entry1", 1, [[0,1],[2,3]], true
"entry2", 2, [[4,5],[6,7]], true
What kind of regex do I need to split the rows into four columns, so that everything inside the double square brackets, i.e. [[ ... ]], is treated as one column?
I'm new to regex but managed to adapt the following code sample:
df = pd.read_csv("../file.csv", sep=r",(?![^\[]*[\]])",header=0, engine="python")
which works with single brackets but not with double ones: the comma between the inner lists, as in 1],[2, is still recognized as a separator even though it shouldn't be.
This is part of a hobby project and I might change the initial approach for the better. However, at this point I'm only curious about the regex that would work in this specific case.
CodePudding user response:
With your sample, you can probably split the columns on a comma followed by a space (', '), though it may not be that simple with your real data:
df = pd.read_csv('data.csv', sep=', ', engine='python')
print(df)
# Output
       name  number         2dlist  bool
0  "entry1"       1  [[0,1],[2,3]]  True
1  "entry2"       2  [[4,5],[6,7]]  True
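To see why the ', ' separator is enough for this sample, here is a minimal sketch using only the standard re module. The rows are copied from the question; the last row with a space inside the list is a made-up counterexample showing where this approach would probably stop working.

```python
import re

rows = [
    '"entry1", 1, [[0,1],[2,3]], true',
    '"entry2", 2, [[4,5],[6,7]], true',
]

# Split on "comma + space". The commas inside the lists are never
# followed by a space in this sample, so they are left alone.
fields = [re.split(r", ", row) for row in rows]
for f in fields:
    print(f)

# Hypothetical row (not from the question) with a space inside the list:
# the same separator now cuts the list into two pieces.
broken = re.split(r", ", '"entry3", 3, [[8, 9]], true')
print(len(broken))  # one field more than the expected four
```

So this only holds as long as the lists are written without spaces after their commas.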
CodePudding user response:
If your CSV looks like this:
name,number,2dlist,bool
0,"entry1",1,"[[0,1],[2,3]]",True
1,"entry2",2,"[[4,5],[6,7]]",True
this would work fine:
df = pd.read_csv('data.csv', sep=',')
This works because the list is now stored between double quotes (not apostrophes), so the commas inside it are ignored. If the data is not stored that way, a carefully written regex is required to separate the columns in a generic way. Try adding the regex tag to the question; you might get better answers then.
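For the unquoted layout from the question, one possible regex is sketched below. It is not a general solution: it assumes the list column is always wrapped in exactly two levels of brackets, [[ ... ]]. The name SEP and the pattern itself are my own suggestion, and the example uses re.split directly so it runs without pandas.

```python
import re

# Assumed pattern: a comma (plus any following spaces) counts as a
# separator unless a "]]" can be reached from it without first
# passing a "[[", i.e. unless the comma sits inside [[ ... ]].
SEP = r",\s*(?!(?:(?!\[\[).)*\]\])"

lines = [
    'name, number, 2dlist, bool',
    '"entry1", 1, [[0,1],[2,3]], true',
    '"entry2", 2, [[4,5],[6,7]], true',
]

table = [re.split(SEP, line) for line in lines]
for row in table:
    print(row)
```

The same pattern can probably be passed to pandas as pd.read_csv(..., sep=SEP, engine="python"). Keep in mind that with a regex separator the double quotes around the names are likely to stay part of the values, and deeper nesting such as [[[ ... ]]] is not covered by this sketch.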