How to split and remove a number in the middle of a string in Python?

Time:08-02

DataFrame

# |Name                |Price    |24h   |Volume(24h)
50|Maker50MKR          |$1,096.96|4,52  |$351,617,227
36|Decentraland36MANA  |$0.9754  |4,11  |$265,949,302
47|Bitcoin SV47BSV     |$60.38   |4,08  |$50,895,114
86|1inch Network861INCH|$0.7637  |3,74  |$72,279,229
38|Hedera38HBAR        |$0.07594 |3,72  |$58,825,304

Desired Result

# |Name         |Ticker|Price    |24h   |Volume(24h)
50|Maker        |MKR   |$1,096.96|4,52  |$351,617,227
36|Decentraland |MANA  |$0.9754  |4,11  |$265,949,302
47|Bitcoin SV   |BSV   |$60.38   |4,08  |$50,895,114
86|1inch Network|1INCH |$0.7637  |3,74  |$72,279,229
38|Hedera       |HBAR  |$0.07594 |3,72  |$58,825,304

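The problems are:

  • the number embedded in the name has no fixed number of digits (it ranges from 0 to 100)
  • the number can sit next to digits that belong to the name or ticker (e.g. 1inch)
  • the ticker has no fixed length or format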

CodePudding user response:

Creating a simple DataFrame from your dataset:

import pandas as pd

simple_dict = {
    "#" : [50, 36, 47, 86, 38],
    "Name" : ["Maker50MKR", "Decentraland36MANA", "Bitcoin SV47BSV", "1inch Network861INCH", "Hedera38HBAR"],
    "Price" : ["$1,096.96", "$0.9754", "$60.38", "$0.7637", "$0.07594"]
}
df = pd.DataFrame(simple_dict)
>>> df
    #                  Name      Price
0  50            Maker50MKR  $1,096.96
1  36    Decentraland36MANA    $0.9754
2  47       Bitcoin SV47BSV     $60.38
3  86  1inch Network861INCH    $0.7637
4  38          Hedera38HBAR   $0.07594

Following the approach suggested in a comment on the question, split each name on the number from its own # column:

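updated_dict = {}
for i, row in df.iterrows():
    # split the name once on the number taken from the "#" column
    name, ticker = row["Name"].split(str(row["#"]), 1)
    row.loc["Name"] = name
    row.loc["Ticker"] = ticker
    updated_dict[i] = row

# each dict entry becomes a column, so the result comes out transposed
new_df = pd.DataFrame(updated_dict)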
>>> new_df
                0             1           2              3         4
#              50            36          47             86        38
Name        Maker  Decentraland  Bitcoin SV  1inch Network    Hedera
Price   $1,096.96       $0.9754      $60.38        $0.7637  $0.07594
Ticker        MKR          MANA         BSV          1INCH      HBAR

To get one row per coin, transpose the result with .transpose() or .T:

>>> new_df.T
    #           Name      Price Ticker
0  50          Maker  $1,096.96    MKR
1  36   Decentraland    $0.9754   MANA
2  47     Bitcoin SV     $60.38    BSV
3  86  1inch Network    $0.7637  1INCH
4  38         Hedera   $0.07594   HBAR
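
The transpose step can be avoided by collecting the rows in a list of dicts instead of a dict of Series. The following is only a sketch on the sample data from this answer; the variable name rows is an assumption made for illustration, and it relies on the number from the # column appearing in the name before the ticker starts.

```python
import pandas as pd

df = pd.DataFrame({
    "#": [50, 36, 47, 86, 38],
    "Name": ["Maker50MKR", "Decentraland36MANA", "Bitcoin SV47BSV",
             "1inch Network861INCH", "Hedera38HBAR"],
    "Price": ["$1,096.96", "$0.9754", "$60.38", "$0.7637", "$0.07594"],
})

rows = []  # hypothetical name: one dict per output row
for _, row in df.iterrows():
    # split once on the number taken from the "#" column
    name, ticker = row["Name"].split(str(row["#"]), 1)
    rows.append({**row.to_dict(), "Name": name, "Ticker": ticker})

# a list of dicts is read row by row, so no .T is needed afterwards
new_df = pd.DataFrame(rows)
print(new_df)
```

Because every dict carries the original columns plus Ticker, the new column ends up last; reorder the columns afterwards if Ticker should sit next to Name.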

CodePudding user response:

You can split the current name into name and ticker using the value in the # column. The code below is probably neither the cleanest nor the fastest, but it does what you need.

Perhaps a pandas guru can optimize this. I would be very interested in that as well.

# insert Ticker column
df.insert(df.columns.get_loc("Name") + 1, "Ticker", None)


for index, row in df.iterrows():
    # split the name once on the '#' column value and update both columns
    df.at[index, "Name"], df.at[index, "Ticker"] = row["Name"].split(str(row["#"]), 1)

print(df)

resulting df:

    #           Name Ticker      Price   24h   Volume(24h)
0  50          Maker    MKR  $1,096.96  4,52  $351,617,227
1  36   Decentraland   MANA    $0.9754  4,11  $265,949,302
2  47     Bitcoin SV    BSV     $60.38  4,08   $50,895,114
3  86  1inch Network  1INCH    $0.7637  3,74   $72,279,229
4  38         Hedera   HBAR   $0.07594  3,72   $58,825,304
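
One possible way to drop iterrows is to pair each name with its number using zip and split with str.partition. This is a sketch on the sample data, not a benchmarked optimization; the variable name parts is an assumption, and the split happens at the first occurrence of the number, so it would go wrong if the name itself contained the same digits before the real separator.

```python
import pandas as pd

df = pd.DataFrame({
    "#": [50, 36, 47, 86, 38],
    "Name": ["Maker50MKR", "Decentraland36MANA", "Bitcoin SV47BSV",
             "1inch Network861INCH", "Hedera38HBAR"],
    "Price": ["$1,096.96", "$0.9754", "$60.38", "$0.7637", "$0.07594"],
})

# partition returns (text before, separator, text after) for the first match
parts = [name.partition(str(rank)) for name, rank in zip(df["Name"], df["#"])]

df["Name"] = [head for head, _, _ in parts]
# place Ticker directly after Name
df.insert(df.columns.get_loc("Name") + 1, "Ticker", [tail for _, _, tail in parts])
print(df)
```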

CodePudding user response:

# split every name on its '#' value; this returns a new frame with columns 0 (name) and 1 (ticker)
pd.DataFrame(df.apply(lambda x: x.Name.split(str(x["#"])), axis=1).values.tolist())
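
The one-liner builds a separate frame and leaves df unchanged. A minimal sketch of writing the result back, assuming the sample data from the question; the names pairs and split are chosen here for illustration only.

```python
import pandas as pd

df = pd.DataFrame({
    "#": [50, 36, 47, 86, 38],
    "Name": ["Maker50MKR", "Decentraland36MANA", "Bitcoin SV47BSV",
             "1inch Network861INCH", "Hedera38HBAR"],
    "Price": ["$1,096.96", "$0.9754", "$60.38", "$0.7637", "$0.07594"],
})

# Series holding one [name, ticker] list per row
pairs = df.apply(lambda x: x["Name"].split(str(x["#"]), 1), axis=1)

# expand the lists into named columns that share df's index
split = pd.DataFrame(pairs.tolist(), index=df.index, columns=["Name", "Ticker"])
df["Name"] = split["Name"]
df["Ticker"] = split["Ticker"]
print(df)
```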

CodePudding user response:

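import re

# keep only the leading letters and spaces of each name (the ticker is not extracted);
# a name that starts with a digit, such as "1inch Network861INCH", gives no match and .group() raises
df['Name'] = df['Name'].apply(lambda name: re.search(r"^[a-zA-Z\s]+", name).group())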
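
A regex can also use the # value as the separator, which handles names that begin with a digit and returns the ticker too. The helper split_name below is hypothetical, written for this sketch on the sample data; it splits at the first occurrence of the number and leaves the name untouched when the number is not found.

```python
import re
import pandas as pd

def split_name(name, rank):
    """Hypothetical helper: 'Maker50MKR' with rank 50 gives ('Maker', 'MKR')."""
    # (.+?) is lazy, so the split happens at the first occurrence of the rank
    match = re.fullmatch(rf"(.+?){re.escape(str(rank))}(.+)", name)
    if match is None:
        return name, None  # rank not found: keep the name, no ticker
    return match.group(1), match.group(2)

df = pd.DataFrame({
    "#": [50, 36, 47, 86, 38],
    "Name": ["Maker50MKR", "Decentraland36MANA", "Bitcoin SV47BSV",
             "1inch Network861INCH", "Hedera38HBAR"],
})

pairs = [split_name(name, rank) for name, rank in zip(df["Name"], df["#"])]
df["Name"] = [name for name, _ in pairs]
df["Ticker"] = [ticker for _, ticker in pairs]
print(df)
```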