I am facing a little challenge here. I have the following dataframe:
ID Cards ... opp cards1
0 flop850262436159b10 ('3h', 'tc', '5s') ... 0.000000 10
1 flop850262436159b10 ('3h', 'tc', '5s') ... 0.533690 10
2 flop850262436159b10 ('3h', 'tc', '5s') ... 0.021292 10
3 flop850262436159b10 ('3h', 'tc', '5s') ... 0.022805 10
4 flop850262436159b10 ('3h', 'tc', '5s') ... 0.999691 10
... ... ... ... ...
749995 flop573952955203b10 ('ts', '2c', 'jd') ... 0.162980 10
749996 flop573952955203b10 ('ts', '2c', 'jd') ... 0.541003 10
749997 flop573952955203b10 ('ts', '2c', 'jd') ... 0.341836 10
749998 flop573952955203b10 ('ts', '2c', 'jd') ... 0.219956 10
749999 flop573952955203b10 ('ts', '2c', 'jd') ... 0.363605 10
What I want to do is the following: retrieve the first letter from the strings stored in column cards. If the string's 3rd character is a letter, then I want to convert it into an integer; if not, leave it as it is.
For instance, with ('ts', '2c', 'jd') we would transform t into 10.
I have tried the following code. However it does not seem to work. The filter I am using does not seem to apply to the new column in which I am storing the new value.
import pandas as pd

df = pd.read_csv('path', sep=";")
if (df['cards'].astype(str).str[0] =="t").any() == True:
    df['cards1'] = 10
else:
    df['cards1'] = df['cards'].astype(str).str[0]
The dataframe returned is shown below. As you can see in column cards1, the value 10 is always returned.
ID Cards ... opp cards1
0 flop850262436159b10 ('3h', 'tc', '5s') ... 0.000000 10
1 flop850262436159b10 ('3h', 'tc', '5s') ... 0.533690 10
2 flop850262436159b10 ('3h', 'tc', '5s') ... 0.021292 10
3 flop850262436159b10 ('3h', 'tc', '5s') ... 0.022805 10
4 flop850262436159b10 ('3h', 'tc', '5s') ... 0.999691 10
... ... ... ... ...
749995 flop573952955203b10 ('ts', '2c', 'jd') ... 0.162980 10
749996 flop573952955203b10 ('ts', '2c', 'jd') ... 0.541003 10
749997 flop573952955203b10 ('ts', '2c', 'jd') ... 0.341836 10
749998 flop573952955203b10 ('ts', '2c', 'jd') ... 0.219956 10
749999 flop573952955203b10 ('ts', '2c', 'jd') ... 0.363605 10
I have no clue how to change that, to be honest. I would be more than happy to discuss some alternative methods.
Thanks for the help, folks.
CodePudding user response:
Hi, I can't comment yet, but I might be able to point you in the right direction.
df['cards'].astype(str).str[0]
will return the first character of the whole string, which is '('. Instead, you could reach the desired part of the object by indexing into it.
df['cards'][0][0]
will return "3h" as a string (provided the column holds real tuples rather than their text representation), and that string can be indexed in turn.
if (df['cards'][0][1][0] =="t") == True:
will return True for the first row, ('3h', 'tc', '5s'), since we access the second element of the tuple and then its first character.
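One caveat: a column loaded with read_csv normally holds the text "('3h', 'tc', '5s')" rather than a tuple, so the indexing above would hit '(' again. A minimal sketch of one way around that, assuming the values are valid Python literals; the two-row frame and the names parsed, rank and is_ten are made up for illustration:

```python
import ast

import pandas as pd

# Hypothetical stand-in for the CSV: Cards is plain text, as read_csv would deliver it.
df = pd.DataFrame({"Cards": ["('3h', 'tc', '5s')", "('ts', '2c', 'jd')"]})

# Turn each string into a real tuple of card strings.
parsed = df["Cards"].apply(ast.literal_eval)

# Row by row: second card of the tuple, then its first character.
rank = parsed.apply(lambda cards: cards[1][0])

# Element-wise comparison gives one boolean per row, not a single answer.
is_ten = rank == "t"
```

Swap cards[1] for cards[0] if the first card is the one you are after.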
CodePudding user response:
If by "string 3rd character" you meant the ordinal position of a character within the tuple, use the following approach:
import numpy as np
import pandas as pd
# first character of the second card in each tuple
cards_3rdc = df.Cards.apply(lambda x: x[1][0])
df['cards1'] = pd.to_numeric(np.where(cards_3rdc == 't', 10, cards_3rdc))
Sample output of df:
ID Cards ... opp cards1
idx
0 flop850262436159b10 (3h, tc, 5s) ... 0.000000 10
1 flop850262436159b10 (3h, tc, 5s) ... 0.533690 10
2 flop850262436159b10 (3h, tc, 5s) ... 0.021292 10
3 flop850262436159b10 (3h, tc, 5s) ... 0.022805 10
4 flop850262436159b10 (3h, tc, 5s) ... 0.999691 10
749995 flop573952955203b10 (ts, 2c, jd) ... 0.162980 2
749996 flop573952955203b10 (ts, 2c, jd) ... 0.541003 2
749997 flop573952955203b10 (ts, 2c, jd) ... 0.341836 2
749998 flop573952955203b10 (ts, 2c, jd) ... 0.219956 2
749999 flop573952955203b10 (ts, 2c, jd) ... 0.363605 2
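To show why the if/else in the question fills every row with 10 while np.where does not, here is a small self-contained sketch. The two-row frame is invented for illustration, and errors="coerce" is my own addition so that other letters (j, q, ...) would become NaN instead of raising:

```python
import numpy as np
import pandas as pd

# Hypothetical frame whose Cards column already holds tuples.
df = pd.DataFrame({"Cards": [("3h", "tc", "5s"), ("ts", "2c", "jd")]})
rank = df["Cards"].apply(lambda cards: cards[1][0])

# .any() collapses the whole column to ONE boolean, so an if/else built on it
# picks a single branch and assigns the same thing to every row.
any_ten = bool((rank == "t").any())

# np.where decides per row: "10" where the rank is "t", the original rank elsewhere.
df["cards1"] = pd.to_numeric(np.where(rank == "t", "10", rank), errors="coerce")
```

Here any_ten is a single True for the whole frame, whereas cards1 gets a separate value for each row.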
CodePudding user response:
First make sure that the column Cards is of type str.
Then we can take the character at index 8 and check whether it is a digit from 0 to 9.
If the condition is true, write "cards11" into the cards1 column.
This could look like this:
df["Cards"] = df["Cards"].astype(str)
# index 8 of "('3h', 'tc', '5s')" is the first character of the second card
digits = list("0123456789")
df.loc[df["Cards"].str[8].isin(digits), "cards1"] = "cards11"
#output
0 ('3h', 'tc', '5s')
1 ('ts', '2c', 'jd') cards11
2 ('3h', 'tc', '5s')
Or, even simpler, if you want to replace it inside your Cards column:
df["Cards"] = df["Cards"].replace("t", "10", regex=True)
#output
#0 ('3h', '10c', '5s')
#1 ('10s', '2c', 'jd')
#2 ('3h', '10c', '5s')
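The blanket replace works here because "t" only ever shows up as a rank. If you want to be stricter, a pattern tied to the card layout is one option. This is only a sketch, assuming the suits are limited to c, d, h and s as in the sample rows; the two-row frame is made up:

```python
import pandas as pd

# Hypothetical frame with Cards stored as text.
df = pd.DataFrame({"Cards": ["('3h', 'tc', '5s')", "('ts', '2c', 'jd')"]})

# Match a quote followed by "t" only when a suit letter and a closing quote come next,
# so a stray "t" anywhere else in the string is left alone.
df["Cards"] = df["Cards"].str.replace(r"'t(?=[cdhs]')", "'10", regex=True)
```

The lookahead keeps the suit and closing quote out of the match, so only the rank is rewritten.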