I wrote a function that builds a table from a list.
It is part of a web scraping script I'm working on.
The function works (it's not great, but good enough for its purpose), but is there a better way to get the same result?
For example, here's a list I would want to turn into a table:
listings = ["Search Result", "Advanced Search", "Item Trader Location Price Last Seen", "Sealed Blacksmithing Writ", "Rewards 356 Vouchers",
"Level 1", "@rscus2001", "Shadowfen: Stormhold", "Ghost Sea Trading Co", "71,200", "X", "1", "=", "71,200 3 Hour ago", "Sealed Blacksmithing Writ", "Rewards 328 Vouchers",
"Level 1", "@Deirdre531", "Grahtwood: Elden Root", "piston", "100,000", "X", "1", "=", "100,000 6 Hour ago", "Sealed Blacksmithing Writ", "Rewards 328 Vouchers",
"Level 1", "@Araxas", "Luminous Legion", "100,000", "X", "1", "=", "100,000 9 Hour ago", "Sealed Blacksmithing Writ", "Rewards 356 Vouchers",
"Level 1", "@CaffeinatedMayhem", "Craglorn: Belkarth", "Masser's Merchants", "25,000", "X", "1", "=", "25,000 13 Hour ago", "Sealed Blacksmithing Writ", "Rewards 287 Vouchers",
"Level 1", "@Gregori_Weissteufel", "Wrothgar: Morkul Stronghold", "The Cutthroat Mutineers", "45,000", "X", "1", "=", "45,000 13 Hour ago", "<", "1", ">"]
Result:
   0                          1                     2        3                     4                            5                        6        7  8  9                   10
0  Sealed Blacksmithing Writ  Rewards 356 Vouchers  Level 1  @rscus2001            Shadowfen: Stormhold         Ghost Sea Trading Co     71,200   X  1  =                   71,200 3 Hour ago
1  Sealed Blacksmithing Writ  Rewards 328 Vouchers  Level 1  @Deirdre531           Grahtwood: Elden Root        piston                   100,000  X  1  =                   100,000 6 Hour ago
2  Sealed Blacksmithing Writ  Rewards 328 Vouchers  Level 1  @Araxas               Luminous Legion              100,000                  X        1  =  100,000 9 Hour ago  None
3  Sealed Blacksmithing Writ  Rewards 356 Vouchers  Level 1  @CaffeinatedMayhem    Craglorn: Belkarth           Masser's Merchants       25,000   X  1  =                   25,000 13 Hour ago
4  Sealed Blacksmithing Writ  Rewards 287 Vouchers  Level 1  @Gregori_Weissteufel  Wrothgar: Morkul Stronghold  The Cutthroat Mutineers  45,000   X  1  =                   45,000 13 Hour ago
Below is my code:
import re
import pandas as pd

pd.set_option('display.max_columns', None)
pd.options.display.width = None

def MakeTable(listings):
    hour_idx = [i for i, item in enumerate(listings) if re.search(r"([0-9,]*\s[0-9]*\s(Minute|Hour)\sago|[0-9,]*\sNow)", item)]
    if len(hour_idx) == 1:
        ls = [listings[3:hour_idx[0] + 1]]
    elif len(hour_idx) == 2:
        ls = [listings[3:hour_idx[0] + 1], listings[hour_idx[0] + 1:hour_idx[1] + 1]]
    elif len(hour_idx) == 3:
        ls = [listings[3:hour_idx[0] + 1], listings[hour_idx[0] + 1:hour_idx[1] + 1], listings[hour_idx[1] + 1:hour_idx[2] + 1]]
    elif len(hour_idx) == 4:
        ls = [listings[3:hour_idx[0] + 1], listings[hour_idx[0] + 1:hour_idx[1] + 1], listings[hour_idx[1] + 1:hour_idx[2] + 1], listings[hour_idx[2] + 1:hour_idx[3] + 1]]
    else:
        ls = [listings[3:hour_idx[0] + 1], listings[hour_idx[0] + 1:hour_idx[1] + 1], listings[hour_idx[1] + 1:hour_idx[2] + 1], listings[hour_idx[2] + 1:hour_idx[3] + 1], listings[hour_idx[3] + 1:hour_idx[4] + 1]]
    df = pd.DataFrame(ls)
    print(df)
CodePudding user response:
We can use a list comprehension and zip:
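def MakeTable(listings):
    hour_idx = [i for i, item in enumerate(listings) if re.search(r"([0-9,]*\s[0-9]*\s(Minute|Hour)\sago|[0-9,]*\sNow)", item)]
    # First row: everything after the three header items, up to the first time stamp.
    ls = [listings[3:hour_idx[0] + 1]]
    # Remaining rows: pair each time-stamp index with the next one.
    ls_2 = [listings[start + 1:end + 1] for start, end in zip(hour_idx, hour_idx[1:])]
    # extend() modifies ls in place and returns None, so do not reassign ls.
    ls.extend(ls_2)
    df = pd.DataFrame(ls)
    print(df)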
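To see what the zip-based slicing produces, here is a minimal pandas-free sketch run on a shortened, made-up list. The `sample` data and the names `TIME_RE`, `split_rows` and `header_len` are assumptions for illustration and are not part of the original script.

```python
import re

# Same pattern as in the question: matches items such as "71,200 3 Hour ago".
TIME_RE = re.compile(r"([0-9,]*\s[0-9]*\s(Minute|Hour)\sago|[0-9,]*\sNow)")

def split_rows(listings, header_len=3):
    """Cut `listings` into rows, each ending at a time-stamp item."""
    hour_idx = [i for i, item in enumerate(listings) if TIME_RE.search(item)]
    if not hour_idx:
        return []
    # The first row starts after the header items; every later row starts
    # right after the previous row's time-stamp item.
    starts = [header_len] + [i + 1 for i in hour_idx[:-1]]
    return [listings[s:e + 1] for s, e in zip(starts, hour_idx)]

# Made-up sample in the same shape as the scraped list.
sample = ["Search Result", "Advanced Search", "Item Trader Location Price Last Seen",
          "Writ A", "@alice", "71,200 3 Hour ago",
          "Writ B", "@bob", "Guild", "100,000 6 Hour ago",
          "<", "1", ">"]
rows = split_rows(sample)
print(rows)
```

The rows can then be passed to `pd.DataFrame(rows)`, which pads the shorter rows with None.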
CodePudding user response:
I guess it's already answered - but I had a wee go for fun:
import re
import pandas as pd

pd.set_option('display.max_columns', None)
pd.options.display.width = None

# Compile the pattern once instead of on every item.
human_time_re = re.compile(r"([0-9,]*\s[0-9]*\s(Minute|Hour)\sago|[0-9,]*\sNow)")

def make_table(listings):
    hour_idx = [i for i, item in enumerate(listings) if human_time_re.search(item)]
    hour_key = lambda key: hour_idx[key] + 1
    idx = lambda key, key2=0: listings[key:hour_key(key2)]
    idx_more = lambda key=0, key2=1: listings[hour_key(key):hour_key(key2)]
    ls = (idx(3),) + tuple(idx_more(i, i + 1) for i in range(len(hour_idx) - 1))
    return ls

# `listings` is the list from the question.
res = make_table(listings)
df = pd.DataFrame(res)
print(df)  # print the DataFrame rather than the raw tuple
As far as I can see, it does exactly the same as your posted version.
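Another way to express the same row splitting, without keeping an index list at all, is to walk the items once and close a row whenever a time-stamp item appears. This is a different technique from the lambdas above, sketched under the assumption that the first three items are always page header text. `iter_rows`, `header_len` and the `sample` data are hypothetical names and values.

```python
import re

# Same pattern as in the question: matches items such as "71,200 3 Hour ago".
TIME_RE = re.compile(r"([0-9,]*\s[0-9]*\s(Minute|Hour)\sago|[0-9,]*\sNow)")

def iter_rows(listings, header_len=3):
    """Yield one row per listing; a row ends at its time-stamp item."""
    row = []
    for item in listings[header_len:]:
        row.append(item)
        if TIME_RE.search(item):
            yield row
            row = []
    # Anything still in `row` here (for example pagination such as
    # "<", "1", ">") never reached a time stamp and is dropped.

# Made-up sample in the same shape as the scraped list.
sample = ["Search Result", "Advanced Search", "Item Trader Location Price Last Seen",
          "Writ A", "@alice", "71,200 3 Hour ago",
          "Writ B", "@bob", "Guild", "100,000 6 Hour ago",
          "<", "1", ">"]
rows = list(iter_rows(sample))
print(rows)
```

Unlike the version in the question, this sketch does not stop after five rows.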
CodePudding user response:
Since Python 3.10 you can use a match statement (Python's counterpart to a switch statement), with the syntax below:
def MakeTable(listings):
    hour_idx = [i for i, item in enumerate(listings) if re.search(r"([0-9,]*\s[0-9]*\s(Minute|Hour)\sago|[0-9,]*\sNow)", item)]
    match len(hour_idx):
        case 1:
            ls = [listings[3:hour_idx[0] + 1]]
        case 2:
            ls = [listings[3:hour_idx[0] + 1], listings[hour_idx[0] + 1:hour_idx[1] + 1]]
        case 3:
            ls = [listings[3:hour_idx[0] + 1], listings[hour_idx[0] + 1:hour_idx[1] + 1], listings[hour_idx[1] + 1:hour_idx[2] + 1]]
        case 4:
            ls = [listings[3:hour_idx[0] + 1], listings[hour_idx[0] + 1:hour_idx[1] + 1], listings[hour_idx[1] + 1:hour_idx[2] + 1], listings[hour_idx[2] + 1:hour_idx[3] + 1]]
        case _:
            ls = [listings[3:hour_idx[0] + 1], listings[hour_idx[0] + 1:hour_idx[1] + 1], listings[hour_idx[1] + 1:hour_idx[2] + 1], listings[hour_idx[2] + 1:hour_idx[3] + 1], listings[hour_idx[3] + 1:hour_idx[4] + 1]]
    df = pd.DataFrame(ls)
    print(df)
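The five cases above all follow one pattern, so they could also be produced by a short loop. The sketch below assumes a cap of five rows to mirror the branches above, and it returns an empty list when nothing matches, a case the branches do not handle. `build_rows`, `header_len`, `max_rows` and the generated `sample` data are hypothetical.

```python
import re

# Same pattern as in the question: matches items such as "100 1 Hour ago".
TIME_RE = re.compile(r"([0-9,]*\s[0-9]*\s(Minute|Hour)\sago|[0-9,]*\sNow)")

def build_rows(listings, header_len=3, max_rows=5):
    """Build at most `max_rows` rows, each ending at a time-stamp item."""
    hour_idx = [i for i, item in enumerate(listings) if TIME_RE.search(item)]
    rows = []
    start = header_len  # skip the page header items
    for end in hour_idx[:max_rows]:
        rows.append(listings[start:end + 1])
        start = end + 1  # the next row begins right after this time stamp
    return rows

# Made-up data: three header items followed by six two-item listings.
sample = ["Search Result", "Advanced Search", "Item Trader Location Price Last Seen"]
for n in range(1, 7):
    sample += [f"Writ {n}", f"{n * 100} {n} Hour ago"]

rows = build_rows(sample)
print(len(rows))
print(rows[0], rows[-1])
```

Passing the result to `pd.DataFrame(rows)` would give the same kind of table as the match version.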