Pandas - Column operations with filters (string characters transformation)

Time:01-28

I am facing a little challenge here. I have the following dataframe:

                         ID               Cards  ...       opp cards1
0       flop850262436159b10  ('3h', 'tc', '5s')  ...  0.000000     10
1       flop850262436159b10  ('3h', 'tc', '5s')  ...  0.533690     10
2       flop850262436159b10  ('3h', 'tc', '5s')  ...  0.021292     10
3       flop850262436159b10  ('3h', 'tc', '5s')  ...  0.022805     10
4       flop850262436159b10  ('3h', 'tc', '5s')  ...  0.999691     10
                    ...                 ...  ...       ...    ...
749995  flop573952955203b10  ('ts', '2c', 'jd')  ...  0.162980     10
749996  flop573952955203b10  ('ts', '2c', 'jd')  ...  0.541003     10
749997  flop573952955203b10  ('ts', '2c', 'jd')  ...  0.341836     10
749998  flop573952955203b10  ('ts', '2c', 'jd')  ...  0.219956     10
749999  flop573952955203b10  ('ts', '2c', 'jd')  ...  0.363605     10

What I want to do is the following: retrieve the first letter from the strings stored in the Cards column. If the 3rd character of the string is a letter, I want to convert it into an integer; if not, leave it as it is.

For instance, with ('ts', '2c', 'jd') we would transform t into 10.

I have tried the following code. However it does not seem to work. The filter I am using does not seem to apply to the new column in which I am storing the new value.

df = pd.read_csv('path', sep=";")


if (df['cards'].astype(str).str[0] =="t").any() == True:
    df['cards1'] = 10
else:
    df['cards1'] = df['cards'].astype(str).str[0]

    

Dataframe returned below. As you can see in column cards1, the value 10 is always returned.

                         ID               Cards  ...       opp cards1
0       flop850262436159b10  ('3h', 'tc', '5s')  ...  0.000000     10
1       flop850262436159b10  ('3h', 'tc', '5s')  ...  0.533690     10
2       flop850262436159b10  ('3h', 'tc', '5s')  ...  0.021292     10
3       flop850262436159b10  ('3h', 'tc', '5s')  ...  0.022805     10
4       flop850262436159b10  ('3h', 'tc', '5s')  ...  0.999691     10
                    ...                 ...  ...       ...    ...
749995  flop573952955203b10  ('ts', '2c', 'jd')  ...  0.162980     10
749996  flop573952955203b10  ('ts', '2c', 'jd')  ...  0.541003     10
749997  flop573952955203b10  ('ts', '2c', 'jd')  ...  0.341836     10
749998  flop573952955203b10  ('ts', '2c', 'jd')  ...  0.219956     10
749999  flop573952955203b10  ('ts', '2c', 'jd')  ...  0.363605     10

To be honest, I have no clue how to change that. I would be more than happy to discuss alternative methods.

Thanks for the help, folks.

CodePudding user response:

Hi, I can't comment yet, but I might be able to point you in the right direction.

df['cards'].astype(str).str[0]

will return the first character of the whole string, which is '('. You could instead reach the part you want through its index.

df['cards'][0][0]

will return "3h" as a string, provided the column holds actual tuples rather than their text representation, and each character of it is accessible via its index.

if df['cards'][0][1][0] == "t":

will evaluate to True for the first row, since it takes the first character of the second element of the tuple:

('3h', 'tc', '5s')
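
One thing worth checking first: after pd.read_csv, the Cards column most likely holds the text "('3h', 'tc', '5s')" rather than a real tuple, in which case indexing returns characters of that text. Below is a minimal sketch, assuming the cells are stored as such strings, that parses each cell with ast.literal_eval before indexing. The two-row frame and the variable names are made up for illustration.

```python
import ast

import pandas as pd

# Hypothetical two-row sample; the real data would come from pd.read_csv.
df = pd.DataFrame({"Cards": ["('3h', 'tc', '5s')", "('ts', '2c', 'jd')"]})

# As plain text, the first character of each cell is the opening parenthesis.
first_char_of_text = df["Cards"].str[0].tolist()

# Parse each string into a real tuple, then index into the tuple.
parsed = df["Cards"].apply(ast.literal_eval)
first_card = parsed.apply(lambda cards: cards[0]).tolist()
second_card_rank = parsed.apply(lambda cards: cards[1][0]).tolist()
```

If the column already contains tuples, the parsing step is unnecessary and ast.literal_eval would raise an error on them.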

CodePudding user response:

If by "string 3rd character" you meant the ordinal position of a character within a tuple of strings, use the following approach:

import numpy as np
import pandas as pd
cards_3rdc = df.Cards.apply(lambda x: x[1][0])
df['cards1'] = pd.to_numeric(np.where(cards_3rdc == 't', 10, cards_3rdc))

Sample output of df:

                         ID         Cards  ...       opp  cards1
idx                                                             
0       flop850262436159b10  (3h, tc, 5s)  ...  0.000000      10
1       flop850262436159b10  (3h, tc, 5s)  ...  0.533690      10
2       flop850262436159b10  (3h, tc, 5s)  ...  0.021292      10
3       flop850262436159b10  (3h, tc, 5s)  ...  0.022805      10
4       flop850262436159b10  (3h, tc, 5s)  ...  0.999691      10
749995  flop573952955203b10  (ts, 2c, jd)  ...  0.162980       2
749996  flop573952955203b10  (ts, 2c, jd)  ...  0.541003       2
749997  flop573952955203b10  (ts, 2c, jd)  ...  0.341836       2
749998  flop573952955203b10  (ts, 2c, jd)  ...  0.219956       2
749999  flop573952955203b10  (ts, 2c, jd)  ...  0.363605       2
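
For context, the `if ... .any()` test in the question reduces the whole column to a single True/False, so only one branch runs and the scalar 10 is assigned to every row. Here is a small self-contained sketch of the row-wise version, assuming Cards holds real tuples and using a made-up three-row frame. It reads the first card of each row, as in the question; change the index to target another card.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data standing in for the real frame.
df = pd.DataFrame(
    {"Cards": [("3h", "tc", "5s"), ("ts", "2c", "jd"), ("9d", "7h", "2s")]}
)

# Rank character of the first card in each row.
rank = df["Cards"].apply(lambda cards: cards[0][0])

# One boolean for the whole column: this is all the question's `if` sees.
any_ten = bool((rank == "t").any())

# Row-wise choice: "10" where the rank is "t", otherwise the original character.
df["cards1"] = pd.to_numeric(np.where(rank == "t", "10", rank))
```

Other letter ranks (j, q, k, a) would make pd.to_numeric raise an error here. The question does not say which numbers they should map to, so they are left out of the sketch.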

CodePudding user response:

First make sure that the column Cards is of type str.

Then we can take the character at index 8 and check whether it is a digit from 0 to 9.

If this condition is True, write "cards11" into the result column.

This could look like this:

df["Cards"] = df["Cards"].astype(str)

df.loc[df["Cards"].str[8].isin(["0","1","2","3","4","5","6","7","8","9"]), "result"] = "cards11"
#output
0   ('3h', 'tc', '5s')  
1   ('ts', '2c', 'jd')  cards11
2   ('3h', 'tc', '5s')  

Or, more simply, if you want to replace it inside your Cards column:

df["Cards"] = df["Cards"].replace("t", "10", regex=True)
#output
#0  ('3h', '10c', '5s') 
#1  ('10s', '2c', 'jd') 
#2  ('3h', '10c', '5s') 
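
One caveat with the plain "t" pattern: it replaces every t in the text, which is harmless here only because none of the suit letters shown is a t. Here is a hedged sketch, assuming Cards is stored as text, that limits the replacement to a t directly after an opening quote and does the index-8 check with str.isdigit. The column name Cards10 and the two-row frame are made up for illustration.

```python
import pandas as pd

# Hypothetical two-row sample with Cards stored as text.
df = pd.DataFrame({"Cards": ["('3h', 'tc', '5s')", "('ts', '2c', 'jd')"]})

# Character 8 of the text is the rank of the second card; True where it is a digit.
second_is_digit = df["Cards"].str[8].str.isdigit()

# Replace "t" only when it directly follows a quote, i.e. in rank position.
df["Cards10"] = df["Cards"].str.replace(r"(?<=')t", "10", regex=True)
```

Writing the result to a new column keeps the original text available for comparison.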