Convert string of tuples to list of tuples

Time:06-28

I am reading a CSV file with Pandas and encountering a parsing problem wherein single quotes are dropped, thereby changing string values into undefined variables as seen by Python. NOTE: I have not found a "convert string to list" topic here that applies to my problem.

I have a CSV file that looks like this:

template_name,detect_time,no_chans,detect_val,detect_ratio,chans
2019_04_27t01_41_43,2018-05-04T12:18:09.633400Z,2,1.33368,0.666838109493,"('CHI', 'BHZ'), ('S14K', 'BHZ')"
2018_09_02t00_56_23,2018-05-10T16:40:33.508400Z,2,-1.34189,-0.670946359634,"('FALS', 'BHZ'), ('SDPT', 'BHZ')"

The last column, chans, should be read as a list of tuples. I am reading the file with Pandas and have tried converting the column with both pd.eval and ast.literal_eval. Both of these strip the inner single quotes, so I end up with a variable name instead of a string.

import ast
import pandas as pd

df = pd.read_csv(dfile, converters={'chans': ast.literal_eval})
df['chans']
0                               ((CHI, BHZ), (S14K, BHZ))
1                              ((FALS, BHZ), (SDPT, BHZ))

Using pd.eval, the result is virtually the same, except that each row becomes a list of lists:

0                               [[CHI, BHZ], [S14K, BHZ]]
1                              [[FALS, BHZ], [SDPT, BHZ]]

The single quotes around the strings have been dropped and now Python interprets (CHI, BHZ) as a tuple of two undefined variables.
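Before post-processing anything, it may be worth checking whether the quotes are really gone or just not displayed: pandas prints strings nested inside tuples and lists without their quotes, whereas repr() of a single cell shows the actual Python objects. A minimal sketch, assuming the two sample rows above are inlined through io.StringIO in place of the real dfile:

```python
import ast
import io

import pandas as pd

# The two sample rows from the question, inlined in place of the real file.
CSV_TEXT = """template_name,detect_time,no_chans,detect_val,detect_ratio,chans
2019_04_27t01_41_43,2018-05-04T12:18:09.633400Z,2,1.33368,0.666838109493,"('CHI', 'BHZ'), ('S14K', 'BHZ')"
2018_09_02t00_56_23,2018-05-10T16:40:33.508400Z,2,-1.34189,-0.670946359634,"('FALS', 'BHZ'), ('SDPT', 'BHZ')"
"""

df = pd.read_csv(io.StringIO(CSV_TEXT), converters={"chans": ast.literal_eval})

first = df.loc[0, "chans"]
print(repr(first))        # repr() shows the quotes that the Series display omits
print(type(first[0][0]))  # inspect the innermost item's type directly
```

If the innermost items come back as str, the converter has already produced tuples of strings and only the printed Series lacks the quotes.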

If I don't use a converter at all (just pd.read_csv(dfile)), I get plain strings like this:

0                         ('CHI', 'BHZ'), ('S14K', 'BHZ')
1                        ('FALS', 'BHZ'), ('SDPT', 'BHZ')

I guess I could post-process these strings to get what I want (a list of those tuples for each row), but is there a better way to do it?
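If you would rather post-process the raw strings without evaluating them as code, a regular expression can pull the pairs out directly, because re.findall returns a list of tuples when the pattern has more than one capture group. This is a minimal sketch that assumes every pair looks exactly like ('NAME', 'NAME'), with single-quoted names that contain no quotes themselves; the helper name parse_chans is hypothetical:

```python
import re

import pandas as pd

# Matches one ('NAME', 'NAME') pair and captures both names.
PAIR_RE = re.compile(r"\(\s*'([^']*)'\s*,\s*'([^']*)'\s*\)")


def parse_chans(text: str) -> list[tuple[str, str]]:
    """Hypothetical helper: "('A', 'B'), ('C', 'D')" -> [('A', 'B'), ('C', 'D')]."""
    # With two capture groups, findall yields one (group1, group2) tuple per match.
    return PAIR_RE.findall(text)


# Raw strings as pd.read_csv(dfile) returns them without a converter.
raw = pd.Series(["('CHI', 'BHZ'), ('S14K', 'BHZ')", "('FALS', 'BHZ'), ('SDPT', 'BHZ')"])
chans = raw.map(parse_chans)
print(chans[0])
```

The same function could be passed as converters={'chans': parse_chans} to pd.read_csv so the parsing happens while the file is read.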

Answer:

You can use a lambda that wraps eval in list() as your converter:

df = pd.read_csv(dfile, converters={'chans': lambda x: list(eval(x))})

This gives you a list of tuples for each row instead of a plain string.
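A variant of the same converter swaps eval for ast.literal_eval, which only accepts Python literals (strings, numbers, tuples, lists and the like) and so cannot run arbitrary code from the file. One caveat with list(...): if a row holds a single pair (the sample rows all have no_chans = 2, so this is an assumption about the rest of the file), the cell evaluates to ('CHI', 'BHZ') itself and list() would split it into ['CHI', 'BHZ']. Appending a trailing comma before evaluating forces an outer tuple in both cases. A minimal sketch; the helper name to_pairs is hypothetical:

```python
import ast
import io

import pandas as pd


def to_pairs(text: str) -> list:
    """Hypothetical converter: "('A', 'B'), ('C', 'D')" -> [('A', 'B'), ('C', 'D')]."""
    # The trailing comma makes a lone "('A', 'B')" parse as a 1-tuple holding
    # that pair, so list() always yields a list of tuples.
    return list(ast.literal_eval(text + ","))


# Small illustrative CSV with one two-pair row and one single-pair row.
csv_text = (
    "no_chans,chans\n"
    "2,\"('CHI', 'BHZ'), ('S14K', 'BHZ')\"\n"
    "1,\"('FALS', 'BHZ')\"\n"
)
df = pd.read_csv(io.StringIO(csv_text), converters={"chans": to_pairs})
print(df["chans"].tolist())
```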

Answer:

I found a simple solution: the much-maligned Python function eval. I had ignored this possibility because of the many warnings about it (it is slow and insecure). But security is not an issue for me, and eval can do the job. However, eval seems to work correctly only when called from plain Python, not from within Pandas, so my code gets a bit ugly:

import pandas as pd

df = pd.read_csv(dfile)
for index, row in df.iterrows():
    # row is a copy, so write the parsed value back to df with .at
    df.at[index, 'chans'] = eval(row['chans'])

These other "solutions" don't do the job, because they strip the single quotes:

df = pd.read_csv(dfile)
df['chans'] = df['chans'].apply(eval)

Or:

df = pd.read_csv(dfile, converters={'chans':eval})

It's a pity that I have to use iterrows.

I'd be curious to hear other solutions. Since I don't care whether the string is parsed into tuples or lists, going through JSON seemed like a possibility.
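The JSON route can work if the cell text is first rewritten into JSON syntax: parentheses become brackets, single quotes become double quotes, and the whole thing is wrapped in an outer list. A minimal sketch, assuming the station and channel names never contain quotes or parentheses themselves (the plain text replacement would break otherwise); the helper name chans_from_json is hypothetical:

```python
import json

import pandas as pd


def chans_from_json(text: str) -> list:
    """Hypothetical helper: "('A', 'B'), ('C', 'D')" -> [['A', 'B'], ['C', 'D']]."""
    # Rewrite the tuple syntax as JSON: ( ) -> [ ], ' -> ", plus an outer list.
    as_json = "[" + text.replace("(", "[").replace(")", "]").replace("'", '"') + "]"
    return json.loads(as_json)


# Raw strings as pd.read_csv(dfile) returns them without a converter.
raw = pd.Series(["('CHI', 'BHZ'), ('S14K', 'BHZ')", "('FALS', 'BHZ'), ('SDPT', 'BHZ')"])
chans = raw.map(chans_from_json)
print(chans.tolist())
```

Because of the outer brackets, a single-pair cell still comes back as a list containing one pair. The result is a list of lists rather than tuples, which the question says is acceptable.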
