DataFrame
# |Name |Price |24h |Volume(24h)
50|Maker50MKR |$1,096.96|4,52 |$351,617,227
36|Decentraland36MANA |$0.9754 |4,11 |$265,949,302
47|Bitcoin SV47BSV |$60.38 |4,08 |$50,895,114
86|1inch Network861INCH|$0.7637 |3,74 |$72,279,229
38|Hedera38HBAR |$0.07594 |3,72 |$58,825,304
Desired Result
# |Name |Ticker|Price |24h |Volume(24h)
50|Maker |MKR |$1,096.96|4,52 |$351,617,227
36|Decentraland |MANA |$0.9754 |4,11 |$265,949,302
47|Bitcoin SV |BSV |$60.38 |4,08 |$50,895,114
86|1inch Network|1INCH |$0.7637 |3,74 |$72,279,229
38|Hedera |HBAR |$0.07594 |3,72 |$58,825,304
The problem is:
- the number embedded in the string has no fixed number of digits (0-100)
- the number can overlap with digits in the name or ticker (e.g. 1inch)
- the ticker has no fixed format
CodePudding user response:
Creating a simple DataFrame from your dataset:
import pandas as pd

simple_dict = {
    "#": [50, 36, 47, 86, 38],
    "Name": ["Maker50MKR", "Decentraland36MANA", "Bitcoin SV47BSV", "1inch Network861INCH", "Hedera38HBAR"],
    "Price": ["$1,096.96", "$0.9754", "$60.38", "$0.7637", "$0.07594"],
}
df = pd.DataFrame(simple_dict)
>>> df
 | # | Name | Price |
---|---|---|---|
0 | 50 | Maker50MKR | $1,096.96 |
1 | 36 | Decentraland36MANA | $0.9754 |
2 | 47 | Bitcoin SV47BSV | $60.38 |
3 | 86 | 1inch Network861INCH | $0.7637 |
4 | 38 | Hedera38HBAR | $0.07594 |
According to this [comment] (How to split & remove a number in the middle of string in a python?), you can split each Name on the value in the # column:
updated_dict = {}
for i, row in df.iterrows():
    # split the combined string on the rank number from the '#' column
    ans = row["Name"].split(str(row["#"]))
    row.loc["Name"] = ans[0]
    row.loc["Ticker"] = ans[1]
    updated_dict[i] = row
new_df = pd.DataFrame(updated_dict)
>>> new_df
 | 0 | 1 | 2 | 3 | 4 |
---|---|---|---|---|---|
# | 50 | 36 | 47 | 86 | 38 |
Name | Maker | Decentraland | Bitcoin SV | 1inch Network | Hedera |
Price | $1,096.96 | $0.9754 | $60.38 | $0.7637 | $0.07594 |
Ticker | MKR | MANA | BSV | 1INCH | HBAR |
To display it in the usual orientation, use transpose() or .T:
>>> new_df.T
 | # | Name | Price | Ticker |
---|---|---|---|---|
0 | 50 | Maker | $1,096.96 | MKR |
1 | 36 | Decentraland | $0.9754 | MANA |
2 | 47 | Bitcoin SV | $60.38 | BSV |
3 | 86 | 1inch Network | $0.7637 | 1INCH |
4 | 38 | Hedera | $0.07594 | HBAR |
CodePudding user response:
You can split the current name into name and ticker based on the # column. The code below is probably neither the best nor the most efficient, but it does what you need.
Perhaps a pandas guru can optimize it; I would be very interested in that as well.
# insert an empty Ticker column directly after Name
df.insert(df.columns.get_loc("Name") + 1, "Ticker", None)
for index, row in df.iterrows():
    # split the combined string on the '#' value and update both columns
    df.at[index, "Name"], df.at[index, "Ticker"] = row["Name"].split(str(row["#"]))
print(df)
resulting df:
# Name Ticker Price 24h Volume(24h)
0 50 Maker MKR $1,096.96 4,52 $351,617,227
1 36 Decentraland MANA $0.9754 4,11 $265,949,302
2 47 Bitcoin SV BSV $60.38 4,08 $50,895,114
3 86 1inch Network 1INCH $0.7637 3,74 $72,279,229
4 38 Hedera HBAR $0.07594 3,72 $58,825,304
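As a possible answer to the optimization request, here is a minimal sketch that avoids iterrows() by zipping the two columns and splitting once per row. It assumes the rank in # appears in the combined string before it appears anywhere in the ticker, and it uses only the three columns from the sample data; the variable name pairs is just for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "#": [50, 36, 47, 86, 38],
    "Name": ["Maker50MKR", "Decentraland36MANA", "Bitcoin SV47BSV",
             "1inch Network861INCH", "Hedera38HBAR"],
    "Price": ["$1,096.96", "$0.9754", "$60.38", "$0.7637", "$0.07594"],
})

# Split each combined string once, on the first occurrence of its rank.
# maxsplit=1 guarantees exactly two parts even if the rank digits recur later.
pairs = [name.split(str(rank), 1) for name, rank in zip(df["Name"], df["#"])]

# Overwrite Name and place the new Ticker column right after it.
df["Name"] = [p[0] for p in pairs]
df.insert(df.columns.get_loc("Name") + 1, "Ticker", [p[1] for p in pairs])

print(df)
```

If the rank digits could also occur inside the name itself (before the real separator), this split would cut in the wrong place, so the data should be checked for that case first.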
CodePudding user response:
pd.DataFrame(df.apply(lambda x: x.Name.split(str(x["#"])), axis=1).values.tolist())
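The one-liner above returns a new two-column frame with the default column labels 0 and 1. A sketch of one way to name those columns and put them back next to the original data follows; the names parts and result are illustrative, and only the three sample columns are used.

```python
import pandas as pd

df = pd.DataFrame({
    "#": [50, 36, 47, 86, 38],
    "Name": ["Maker50MKR", "Decentraland36MANA", "Bitcoin SV47BSV",
             "1inch Network861INCH", "Hedera38HBAR"],
    "Price": ["$1,096.96", "$0.9754", "$60.38", "$0.7637", "$0.07594"],
})

# Same row-wise split as the one-liner, but with named columns and the
# original index so the pieces line up with df.
parts = pd.DataFrame(
    df.apply(lambda x: x["Name"].split(str(x["#"]), 1), axis=1).values.tolist(),
    columns=["Name", "Ticker"],
    index=df.index,
)

# Replace the combined Name column with the two split columns.
result = pd.concat([df.drop(columns="Name"), parts], axis=1)
result = result[["#", "Name", "Ticker", "Price"]]

print(result)
```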
CodePudding user response:
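import re

# Keeps only the leading letters and spaces, i.e. the name without the number and ticker.
# Note: a name that starts with a digit (e.g. "1inch Network") does not match,
# so re.search returns None and .group() raises AttributeError for that row.
df['Name'] = df['Name'].apply(lambda name: re.search(r"^[a-zA-Z\s]+", name).group())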
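A regular expression on the name alone cannot tell where the rank ends and a ticker such as 1INCH begins, so a regex variant probably needs the # value as well. The sketch below builds one pattern per row; split_name is a hypothetical helper, and the pattern assumes tickers consist only of uppercase letters and digits, which holds for the sample rows but is not guaranteed in general.

```python
import re
import pandas as pd

def split_name(name, rank):
    """Hypothetical helper: split 'Name<rank>TICKER' into (name, ticker)."""
    # Lazy name, then the literal rank, then an uppercase/digit ticker to the end.
    pattern = rf"(.+?){re.escape(str(rank))}([A-Z0-9]+)"
    match = re.fullmatch(pattern, name)
    if match is None:
        # Leave the value untouched when it does not fit the assumed layout.
        return name, None
    return match.group(1), match.group(2)

df = pd.DataFrame({
    "#": [50, 36, 47, 86, 38],
    "Name": ["Maker50MKR", "Decentraland36MANA", "Bitcoin SV47BSV",
             "1inch Network861INCH", "Hedera38HBAR"],
    "Price": ["$1,096.96", "$0.9754", "$60.38", "$0.7637", "$0.07594"],
})

pairs = [split_name(name, rank) for name, rank in zip(df["Name"], df["#"])]
df["Ticker"] = [ticker for _, ticker in pairs]
df["Name"] = [name for name, _ in pairs]

print(df)
```

Requiring the ticker to be uppercase lets the pattern skip rank digits that occur inside a mixed-case name, at the cost of returning None for tickers that break that assumption.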