I have a txt file with rows like the following:
"Hello I'm in Tensorflow"
"My name is foo"
'Mr "alias" is running'
...
As you can see, there is just one string per row. When I create a tf.data.Dataset from the file, the output looks like this:
import tensorflow as tf

conver = tf.data.TextLineDataset('path_to.txt')
for utter in conver:
    print(utter)
    break
# tf.Tensor(b'"Hello I'm in Tensorflow"', shape=(), dtype=string)
Notice that the double quotation mark " is still present at the beginning and end of the string (in addition to the single quotes ' that the tensor printout puts around the bytes value). My desired output would be:
# tf.Tensor(b'Hello I'm in Tensorflow', shape=(), dtype=string)
That is, without the quotation marks. Thank you in advance
CodePudding user response:
You could use tf.strings.regex_replace:
import tensorflow as tf

conver = tf.data.TextLineDataset('/content/text.txt')

def remove_quotes(text):
    # Remove every double quote, then every single quote (apostrophes included).
    text = tf.strings.regex_replace(text, '\"', '')
    text = tf.strings.regex_replace(text, '\'', '')
    return text

conver = conver.map(remove_quotes)

for s in conver:
    print(s)
# tf.Tensor(b'Hello Im in Tensorflow', shape=(), dtype=string)
# tf.Tensor(b'My name is foo', shape=(), dtype=string)
# tf.Tensor(b'Mr alias is running', shape=(), dtype=string)
Or, if you only want to remove the leading and trailing quotes, use this inside remove_quotes instead:
text = tf.strings.regex_replace(text, '^[\"\']*|[\"\']*$', '')
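To check what the anchored pattern keeps and what it removes, here is a minimal sketch that applies the same pattern with Python's built-in re module instead of TensorFlow. The helper name strip_edge_quotes is hypothetical, and the sketch assumes that re and the regex engine behind tf.strings.regex_replace treat this simple pattern the same way.

```python
import re

# Same pattern as above: a run of quotes at the start, or a run of quotes at the end.
EDGE_QUOTES = re.compile("^[\"']*|[\"']*$")

def strip_edge_quotes(line: str) -> str:
    """Remove leading and trailing quote characters, keeping inner ones."""
    return EDGE_QUOTES.sub("", line)

rows = [
    '"Hello I\'m in Tensorflow"',
    '"My name is foo"',
    "'Mr \"alias\" is running'",
]
for row in rows:
    print(strip_edge_quotes(row))
```

Because the pattern removes whole runs of quotes, a row whose content itself ends with a quote, such as 'say "hi"', would lose that inner closing quote as well.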
CodePudding user response:
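The eval() function should do it, once the tensor has been converted to a Python string (ast.literal_eval is the safer choice if the file is not trusted):
for utter in conver:
    print(eval(utter.numpy().decode('utf-8')))
    break
or you can simply use str.replace on the decoded string:
for utter in conver:
    print(utter.numpy().decode('utf-8').replace('"', ''))
    break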
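The idea of reading each row as a Python string literal can be sketched without TensorFlow. This is only an illustration under some assumptions: the helper name unquote_literal is hypothetical, ast.literal_eval is swapped in for eval so that arbitrary code in the file is never executed, and the input is the raw bytes of one row (in the loops above, utter.numpy() would supply that value).

```python
import ast

def unquote_literal(raw: bytes) -> str:
    """Parse a quoted row as a Python string literal; fall back to the raw text."""
    text = raw.decode("utf-8")
    try:
        value = ast.literal_eval(text)
    except (ValueError, SyntaxError):
        # Not a valid Python literal (e.g. an unquoted row): keep it unchanged.
        return text
    # Rows such as 42 parse to non-strings; keep the original text for those.
    return value if isinstance(value, str) else text

print(unquote_literal(b'"Hello I\'m in Tensorflow"'))
print(unquote_literal(b'\'Mr "alias" is running\''))
print(unquote_literal(b'My name is foo'))
```

Unlike a blanket replace, this keeps the inner quotes of the third sample row, since only the delimiters of the literal are consumed.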
CodePudding user response:
If you want to preserve quotation marks that are not at the start or the end of the string:
for utter in conver:
    line = utter.numpy().decode('utf-8')
    # Drop a double quote only when it is the first or last character.
    print(''.join(c for i, c in enumerate(line) if not (c == '"' and i in (0, len(line) - 1))))
    break
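A variation on the same idea, sketched with slicing rather than a character-by-character join: remove the outer quotes only when the row starts and ends with the same quote character. The function name strip_outer_pair is hypothetical, and handling single quotes as well as double quotes is an assumption based on the sample rows in the question.

```python
def strip_outer_pair(line: str) -> str:
    """Remove one pair of matching quotes wrapping the whole line, if present."""
    if len(line) >= 2 and line[0] == line[-1] and line[0] in ('"', "'"):
        return line[1:-1]
    return line

rows = [
    '"Hello I\'m in Tensorflow"',
    "'Mr \"alias\" is running'",
    "no quotes here",
]
for row in rows:
    print(strip_outer_pair(row))
```

Since only one matching pair is removed, a row like 'say "hi"' keeps its inner closing quote.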