Pandas: remove double quotation marks from a column

Time:11-25

input:

"""""""NW_020998607.1"""    397418
"""""""NW_020998607.1"""    2583299
"""""""NW_020998607.1"""    2742463
"""""""NW_020998607.1"""    9131893
"""""""NW_020998607.1"""    11763556
"""""""NW_020998607.1"""    11763572

expected output:

NW_020998607.1  397418
NW_020998607.1  2583299
NW_020998607.1  2742463
NW_020998607.1  9131893
NW_020998607.1  11763556
NW_020998607.1  11763572

actual output:

"""""""NW_020998607.1"""    397418
"""""""NW_020998607.1"""    2583299
"""""""NW_020998607.1"""    2742463
"""""""NW_020998607.1"""    9131893
"""""""NW_020998607.1"""    11763556
"""""""NW_020998607.1"""    11763572

code:

import pandas as pd

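# read the tab-separated file into a list of [id, position] string pairs
with open('input.txt', 'r') as aaa:
    lines_1 = [line.rstrip('\n').split('\t') for line in aaa]

df = pd.DataFrame(lines_1)

# attempt to remove the double quotes from column 0
df[0] = df[0].replace('"', '')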

I tried to replace '"' with '', but nothing happened. Could you help me remove the double quotation marks?

Answer 1:

You can use pandas.Series.str.strip("\""), which removes every leading and trailing double quote from each value in the column.

>>> import pandas as pd
>>>
>>> with open("input.txt") as f:
...     df = pd.read_csv(f, sep=r"\s+", header=None)
...     df[0] = df[0].str.strip("\"")
...     print(df)
...
                0         1
0  NW_020998607.1    397418
1  NW_020998607.1   2583299
2  NW_020998607.1   2742463
3  NW_020998607.1   9131893
4  NW_020998607.1  11763556
5  NW_020998607.1  11763572

Note: pd.read_csv can read the data directly from the file object; sep=r"\s+" splits each line on any run of whitespace (tabs or spaces).
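A self-contained sketch of the same approach, useful for trying it without a file on disk. Here io.StringIO stands in for input.txt, and its contents are assumed to be the first three sample rows from the question, separated by whitespace; the variable name raw is only illustrative.

```python
import io

import pandas as pd

# Assumed stand-in for input.txt: the first three sample rows from the question.
raw = (
    '"""""""NW_020998607.1"""    397418\n'
    '"""""""NW_020998607.1"""    2583299\n'
    '"""""""NW_020998607.1"""    2742463\n'
)

# r"\s+" splits on any run of whitespace; header=None keeps the first row as data.
df = pd.read_csv(io.StringIO(raw), sep=r"\s+", header=None)

# Strip every leading and trailing double quote from the ID column.
df[0] = df[0].str.strip('"')
print(df)
```

Because str.strip only trims the ends of each string, a quote inside an ID would be kept; if that matters, the substring replacement from the next answer is the safer choice.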

Answer 2:

You can also use Python's built-in str.replace method, which removes every occurrence of the double quote:

name = '"""""""NW_020998607.1"""    397418'

print(name.replace("\"",""))

output

NW_020998607.1    397418
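The same idea carries over to a pandas column through the .str accessor, and it also explains why the df.replace attempt in the question did nothing: with a plain string and the default regex=False, Series.replace only replaces cells whose entire value equals '"', whereas Series.str.replace works on substrings. A minimal sketch, assuming a small hypothetical DataFrame shaped like the question's tab-split data:

```python
import pandas as pd

# Hypothetical frame shaped like the question's data (both columns are strings after split('\t')).
df = pd.DataFrame(
    [['"""""""NW_020998607.1"""', "397418"],
     ['"""""""NW_020998607.1"""', "2583299"]]
)

# Series.replace matches whole cell values, so '"' matches nothing and the column is unchanged.
unchanged = df[0].replace('"', "")

# Series.str.replace works on substrings; regex=False treats '"' as a literal character.
df[0] = df[0].str.replace('"', "", regex=False)
print(df)
```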
