Regex separator for splitting a CSV with double brackets / nd lists

Time:01-03

I have a .csv file that follows this layout:

name, number, 2dlist, bool
"entry1", 1, [[0,1],[2,3]], true
"entry2", 2, [[4,5],[6,7]], true

What kind of regex do I need to separate the rows into four columns, so that everything inside the double square brackets, i.e. [[ ... ]], is treated as one column?

I'm new to regex but managed to adapt the following code sample:

df = pd.read_csv("../file.csv", sep=r",(?![^\[]*[\]])",header=0, engine="python")

This works with single brackets but not with double ones: the comma between the inner lists (1],[2) is still recognized as a separator even though it shouldn't be.

This is part of a hobby project, and I might change the initial approach for the better. At this point, however, I'm only curious about the regex that would work in this specific case.

CodePudding user response:

With your sample, you can probably split the columns on ', ' (a comma followed by a space), but maybe it's not so simple:

df = pd.read_csv('data.csv', sep=', ', engine='python')
print(df)

# Output
       name  number         2dlist  bool
0  "entry1"       1  [[0,1],[2,3]]  True
1  "entry2"       2  [[4,5],[6,7]]  True
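
If the ', ' split turns out to be too fragile, here is a rough sketch of a regex that targets the double brackets directly. It treats a comma as a separator unless a closing ]] can be reached ahead of it without first passing an opening [[. The names SEP and split_row are my own, and the demo uses plain re.split on the sample lines rather than pandas. I'd expect the same pattern to be usable as sep= with engine="python", but I have not checked that, and it assumes exactly two levels of nesting.

```python
import re

# A comma is a separator unless, looking ahead, we can reach "]]"
# without first crossing an opening "[[" (i.e. we are inside a 2d list).
# Assumption: lists are nested exactly two levels deep.
SEP = re.compile(r",(?!(?:(?!\[\[).)*\]\])")


def split_row(line):
    """Split one CSV line into fields, keeping [[...]] together."""
    return [field.strip() for field in SEP.split(line)]


lines = [
    'name, number, 2dlist, bool',
    '"entry1", 1, [[0,1],[2,3]], true',
    '"entry2", 2, [[4,5],[6,7]], true',
]

for line in lines:
    print(split_row(line))
```

Each line should come out as four fields, with the whole [[ ... ]] block in the third one.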

CodePudding user response:

If your csv looks like this:

      name,number,2dlist,bool
0,"entry1",1,"[[0,1],[2,3]]",True
1,"entry2",2,"[[4,5],[6,7]]",True

this would work fine:

df = pd.read_csv('data.csv', sep=',')

Because the list is now stored between double quotes, the spaces and commas inside it are ignored. If the data is not stored that way, a good regex is required to separate the columns in a generic way. Try adding the regex tag to the question; you might get better solutions then.
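
To illustrate the quoting point, here is a small sketch using Python's built-in csv module, which follows the same double-quote convention. The sample text is a hypothetical copy of the layout above without the index column, and turning the list column into a real nested list with json.loads is my own addition, not something the answer itself does.

```python
import csv
import io
import json

# Hypothetical sample: the 2d list is wrapped in double quotes,
# so the commas inside it are not treated as field separators.
data = '''name,number,2dlist,bool
"entry1",1,"[[0,1],[2,3]]",True
"entry2",2,"[[4,5],[6,7]]",True
'''

rows = list(csv.DictReader(io.StringIO(data)))

# The list column arrives as a string; json.loads turns it into nested lists.
for row in rows:
    row["2dlist"] = json.loads(row["2dlist"])

for row in rows:
    print(row["name"], row["2dlist"])
```

With pandas, the same conversion could probably be done by passing converters={"2dlist": json.loads} to read_csv, though I have not tried it on this exact file.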
