How to split and remove a number in the middle of a string in Python?

DataFrame

# |Name                |Price    |24h   |Volume(24h)
50|Maker50MKR          |$1,096.96|4,52  |$351,617,227
36|Decentraland36MANA  |$0.9754  |4,11  |$265,949,302
47|Bitcoin SV47BSV     |$60.38   |4,08  |$50,895,114
86|1inch Network861INCH|$0.7637  |3,74  |$72,279,229
38|Hedera38HBAR        |$0.07594 |3,72  |$58,825,304

Desired Result

# |Name         |Ticker|Price    |24h   |Volume(24h)
50|Maker        |MKR   |$1,096.96|4,52  |$351,617,227
36|Decentraland |MANA  |$0.9754  |4,11  |$265,949,302
47|Bitcoin SV   |BSV   |$60.38   |4,08  |$50,895,114
86|1inch Network|1INCH |$0.7637  |3,74  |$72,279,229
38|Hedera       |HBAR  |$0.07594 |3,72  |$58,825,304

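The problems are:

the number embedded in the name has no fixed number of digits (it ranges from 0 to 100)
the number can sit next to digits that belong to the name or ticker (e.g. 1inch / 1INCH)
the ticker has no fixed length or format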
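
To make the second point concrete, here is a minimal sketch in plain Python. It first shows what happens when the string is split on any run of digits, then splits on the known value from the # column instead. The helper name split_on_rank is hypothetical and only used for illustration.

```python
import re

# (rank, raw name) pairs taken from the table above
rows = [
    (50, "Maker50MKR"),
    (36, "Decentraland36MANA"),
    (47, "Bitcoin SV47BSV"),
    (86, "1inch Network861INCH"),
    (38, "Hedera38HBAR"),
]

# Naive approach: split on any run of digits.
# This also cuts at the leading "1" and swallows the "1" of "1INCH".
naive = re.split(r"\d+", "1inch Network861INCH")


def split_on_rank(raw, rank):
    """Hypothetical helper: split raw at the first occurrence of the rank."""
    name, sep, ticker = raw.partition(str(rank))
    if not sep:
        raise ValueError(f"rank {rank} not found in {raw!r}")
    return name, ticker


pairs = [split_on_rank(raw, rank) for rank, raw in rows]
for name, ticker in pairs:
    print(name, "|", ticker)
```

This relies on the assumption that the digits of the rank do not also occur earlier in the name. That holds for the sample rows, but it is not guaranteed in general.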

CodePudding user response：

Creating a simple DataFrame from your dataset:

import pandas as pd

simple_dict = {
    "#" : [50, 36, 47, 86, 38],
    "Name" : ["Maker50MKR", "Decentraland36MANA", "Bitcoin SV47BSV", "1inch Network861INCH", "Hedera38HBAR"],
    "Price" : ["$1,096.96", "$0.9754", "$60.38", "$0.7637", "$0.07594"]
}
df = pd.DataFrame(simple_dict)

>>> df

	#	Name	Price
0	50	Maker50MKR	$1,096.96
1	36	Decentraland36MANA	$0.9754
2	47	Bitcoin SV47BSV	$60.38
3	86	1inch Network861INCH	$0.7637
4	38	Hedera38HBAR	$0.07594

Following a comment on the question, split each name on the value in the # column:

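updated_dict = {}
for i, row in df.iterrows():
    # Split on the rank from the "#" column, e.g. "Maker50MKR" -> ["Maker", "MKR"]
    ans = row["Name"].split(str(row["#"]))
    row.loc["Name"] = ans[0]
    row.loc["Ticker"] = ans[1]
    updated_dict[i] = row

# Each stored row becomes a column here, so the result comes out transposed
new_df = pd.DataFrame(updated_dict)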

>>> new_df

	0	1	2	3	4
#	50	36	47	86	38
Name	Maker	Decentraland	Bitcoin SV	1inch Network	Hedera
Price	$1,096.96	$0.9754	$60.38	$0.7637	$0.07594
Ticker	MKR	MANA	BSV	1INCH	HBAR

To get the usual orientation, use transpose() or .T:

>>> new_df.T

	#	Name	Price	Ticker
0	50	Maker	$1,096.96	MKR
1	36	Decentraland	$0.9754	MANA
2	47	Bitcoin SV	$60.38	BSV
3	86	1inch Network	$0.7637	1INCH
4	38	Hedera	$0.07594	HBAR

CodePudding user response：

You can split the current name into a name and a ticker based on the # column. The code below is probably neither the best nor the most efficient, but it does what you need.

Perhaps a pandas guru can optimize this. I would be very interested in that as well.

# insert Ticker column
df.insert(df.columns.get_loc("Name") + 1, "Ticker", None)


for index, row in df.iterrows():
    # split the thing based on '#' column and update the columns
    df.at[index, "Name"], df.at[index, "Ticker"] = row["Name"].split(str(row["#"]))

print(df)

resulting df:

    #           Name Ticker      Price   24h   Volume(24h)
0  50          Maker    MKR  $1,096.96  4,52  $351,617,227
1  36   Decentraland   MANA    $0.9754  4,11  $265,949,302
2  47     Bitcoin SV    BSV     $60.38  4,08   $50,895,114
3  86  1inch Network  1INCH    $0.7637  3,74   $72,279,229
4  38         Hedera   HBAR   $0.07594  3,72   $58,825,304
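
One possible way to avoid iterrows, sketched here rather than offered as the definitive optimization, is to zip the two columns and split each name once with str.partition. The sample frame below is rebuilt from the question's data with only three columns, so the block runs on its own.

```python
import pandas as pd

df = pd.DataFrame({
    "#": [50, 36, 47, 86, 38],
    "Name": ["Maker50MKR", "Decentraland36MANA", "Bitcoin SV47BSV",
             "1inch Network861INCH", "Hedera38HBAR"],
    "Price": ["$1,096.96", "$0.9754", "$60.38", "$0.7637", "$0.07594"],
})

# partition() returns (before, separator, after) for the first occurrence
parts = [name.partition(str(rank)) for name, rank in zip(df["Name"], df["#"])]

# Overwrite Name with the part before the rank ...
df["Name"] = [before for before, _, _ in parts]
# ... and place the part after the rank in a new column right after Name
df.insert(df.columns.get_loc("Name") + 1, "Ticker", [after for _, _, after in parts])

print(df)
```

Iterating over plain Python values with zip is usually cheaper than iterrows because no Series is built per row, although for a table this small the difference hardly matters.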

CodePudding user response：

pd.DataFrame(df.apply(lambda x: x.Name.split(str(x["#"])), axis=1).values.tolist())
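
The expression above builds a new two-column frame whose columns are labelled 0 and 1. It does not modify df. A minimal sketch of one way to write the result back, assuming the same sample data as in the question (the variable name split_cols is made up for this example):

```python
import pandas as pd

df = pd.DataFrame({
    "#": [50, 36, 47, 86, 38],
    "Name": ["Maker50MKR", "Decentraland36MANA", "Bitcoin SV47BSV",
             "1inch Network861INCH", "Hedera38HBAR"],
    "Price": ["$1,096.96", "$0.9754", "$60.38", "$0.7637", "$0.07594"],
})

# Same split as the one-liner, but with named columns and df's own index
split_cols = pd.DataFrame(
    df.apply(lambda x: x["Name"].split(str(x["#"])), axis=1).values.tolist(),
    columns=["Name", "Ticker"],
    index=df.index,
)

# Write both pieces back into the original frame
df["Name"] = split_cols["Name"]
df["Ticker"] = split_cols["Ticker"]

print(df)
```

Here Ticker is appended as the last column. Use df.insert instead if it should sit directly after Name.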

CodePudding user response：

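This keeps only the leading letters and spaces of each name. Note that it finds no match for a name that begins with a digit, such as 1inch Network, so re.search returns None for that row and .group() raises an AttributeError.

import re

df['Name'] = df['Name'].apply(lambda name: re.search(r"^[a-zA-Z\s]+", name).group())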
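
A letters-only pattern cannot match a name that starts with a digit (1inch Network), so a possible variation is to anchor the pattern on the value from the # column. This is only a sketch: the helper name name_before_rank is hypothetical, and the two-row frame is a cut-down copy of the question's data.

```python
import re

import pandas as pd

df = pd.DataFrame({
    "#": [50, 86],
    "Name": ["Maker50MKR", "1inch Network861INCH"],
})


def name_before_rank(name, rank):
    """Hypothetical helper: return the text before the rank, or name unchanged."""
    # Lazy .+? stops at the first place where the rank digits follow
    match = re.match(rf"(?P<name>.+?){re.escape(str(rank))}(?P<ticker>.+)", name)
    return match.group("name") if match else name


df["Name"] = [name_before_rank(n, r) for n, r in zip(df["Name"], df["#"])]

print(df)
```

Returning the input unchanged when there is no match is a design choice made for this sketch. Raising an error may be preferable if every row is expected to contain its rank.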