I want to read the txt file from here with Dota 2 mmrs for different players. It has the form as below:
1) "103757918"
2) "1"
3) "107361667"
4) "1"
5) "108464725"
6) "1"
7) "110818765"
8) "1"
9) "111436016"
10) "1"
11) "113518306"
12) "1"
13) "118896321"
14) "1"
15) "119780733"
16) "1"
17) "120360801"
18) "1"
19) "120870684"
20) "1"
21) "122616345"
22) "1"
23) "124393917"
24) "1"
25) "124487030"
With the account_id (e.g 103757918) followed by the mmr of the player (e.g 1). How can I read this in a pandas dataframe with two columns = account_id, mmr?
I don't need the index numbers.
CodePudding user response:
Read the data how you normally would, slice the data every other row and concat. After you can rename the columns to whatever you want.
Slicing the data this way assumes that the first value is always the account_id followed by the mmr. Notice how row 25 is missing the mmr in your sample data and is therefore null.
df = pd.read_csv(data.txt, sep='\s ', header=None)
pd.concat([df[1][::2].reset_index(drop=True),
df[1][1::2].reset_index(drop=True)], axis=1)
Here is a working example based on your sample data
s = '''1) "103757918"
2) "1"
3) "107361667"
4) "1"
5) "108464725"
6) "1"
7) "110818765"
8) "1"
9) "111436016"
10) "1"
11) "113518306"
12) "1"
13) "118896321"
14) "1"
15) "119780733"
16) "1"
17) "120360801"
18) "1"
19) "120870684"
20) "1"
21) "122616345"
22) "1"
23) "124393917"
24) "1"
25) "124487030"'''
from io import StringIO
df = pd.read_csv(StringIO(s),sep='\s ', header=None)
data = pd.concat([df[1][::2].reset_index(drop=True),
df[1][1::2].reset_index(drop=True)], axis=1)
data.columns = ['account_id', 'mmr']
account_id mmr
0 103757918 1.0
1 107361667 1.0
2 108464725 1.0
3 110818765 1.0
4 111436016 1.0
5 113518306 1.0
6 118896321 1.0
7 119780733 1.0
8 120360801 1.0
9 120870684 1.0
10 122616345 1.0
11 124393917 1.0
12 124487030 NaN
CodePudding user response:
You can use read_table since read_csv with \n as a delimiter won't work. Then shift and arrange the values like this:
df = pd.read_table('./mmr.txt', delim_whitespace=True, names=['account_id', 'mmr'])
df['account_id'] = df['mmr']
df['mmr'] = df['mmr'].shift(-1)
df = df[df.index % 2 == 0].reset_index(drop=True)
output will be:
account_id mmr
0 103757918 1.0
1 107361667 1.0
2 108464725 1.0
3 110818765 1.0
...
1069671 93119769 8674.0