Read txt file with pandas into dataframe-CodePudding

I want to read the txt file from here with Dota 2 mmrs for different players. It has the form as below:

      1) "103757918"
      2) "1"
      3) "107361667"
      4) "1"
      5) "108464725"
      6) "1"
      7) "110818765"
      8) "1"
      9) "111436016"
     10) "1"
     11) "113518306"
     12) "1"
     13) "118896321"
     14) "1"
     15) "119780733"
     16) "1"
     17) "120360801"
     18) "1"
     19) "120870684"
     20) "1"
     21) "122616345"
     22) "1"
     23) "124393917"
     24) "1"
     25) "124487030"

With the account_id (e.g 103757918) followed by the mmr of the player (e.g 1). How can I read this in a pandas dataframe with two columns = account_id, mmr?

I don't need the index numbers.

CodePudding user response：

Read the data how you normally would, slice the data every other row and concat. After you can rename the columns to whatever you want.

Slicing the data this way assumes that the first value is always the account_id followed by the mmr. Notice how row 25 is missing the mmr in your sample data and is therefore null.

df = pd.read_csv(data.txt, sep='\s ', header=None)
pd.concat([df[1][::2].reset_index(drop=True),
           df[1][1::2].reset_index(drop=True)], axis=1)

Here is a working example based on your sample data

s = '''1) "103757918"
2) "1"
3) "107361667"
4) "1"
5) "108464725"
6) "1"
7) "110818765"
8) "1"
9) "111436016"
10) "1"
11) "113518306"
12) "1"
13) "118896321"
14) "1"
15) "119780733"
16) "1"
17) "120360801"
18) "1"
19) "120870684"
20) "1"
21) "122616345"
22) "1"
23) "124393917"
24) "1"
25) "124487030"'''


from io import StringIO

df = pd.read_csv(StringIO(s),sep='\s ', header=None)
data = pd.concat([df[1][::2].reset_index(drop=True),
              df[1][1::2].reset_index(drop=True)], axis=1)

data.columns = ['account_id', 'mmr']

    account_id  mmr
0    103757918  1.0
1    107361667  1.0
2    108464725  1.0
3    110818765  1.0
4    111436016  1.0
5    113518306  1.0
6    118896321  1.0
7    119780733  1.0
8    120360801  1.0
9    120870684  1.0
10   122616345  1.0
11   124393917  1.0
12   124487030  NaN

CodePudding user response：

You can use read_table since read_csv with \n as a delimiter won't work. Then shift and arrange the values like this:

df = pd.read_table('./mmr.txt', delim_whitespace=True, names=['account_id', 'mmr'])
df['account_id'] = df['mmr']
df['mmr'] = df['mmr'].shift(-1)
df = df[df.index % 2 == 0].reset_index(drop=True)

output will be:

    account_id  mmr
0   103757918   1.0
1   107361667   1.0
2   108464725   1.0
3   110818765   1.0
...
1069671 93119769    8674.0