The output of the pandas DataFrame produced by the following code:
payload = {}
files = {}
headers = {
    'Accept': 'text/csv',
    'Authorization': 'Bearer ' + token}
for k in request_dic.keys():
    base_url = "https://feeds.myfeed.com/api/"
    url = base_url + request_dic[k]
    print(url)
    response = requests.request("GET", url, headers=headers, data=payload, files=files)
    dt = pd.read_csv(StringIO(response.text), sep="|", encoding='base64')
Is:
Can someone help with a regex that will remove the unwanted characters from the column names?
CodePudding user response:
Something like this, maybe:
import re
for k in request_dic.keys():
    base_url = "https://feeds.myfeed.com/api/"
    url = base_url + request_dic[k]
    print(url)
    response = requests.request("GET", url, headers=headers, data=payload, files=files)
    dt = pd.read_csv(StringIO(response.text), sep="|", encoding='base64')
    # Keep each column name from its first uppercase letter onward
    for col in dt.columns:
        dt.rename({col: re.findall('([A-Z].*)', col)[0]}, inplace=True, axis=1)
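A minimal sketch of the same idea on a small stand-in DataFrame, in case it helps to try it without the API call. The sample data, the `\ufeff` prefix, and the helper name `from_first_uppercase` are assumptions for illustration only. Using `re.search` with a fallback avoids the `IndexError` that `re.findall(...)[0]` raises when a column name contains no uppercase letter.

```python
import re

import pandas as pd


def from_first_uppercase(name: str) -> str:
    """Return the name from its first uppercase ASCII letter onward.

    Falls back to the unchanged name when no uppercase letter is found.
    """
    match = re.search(r"[A-Z].*", name)
    return match.group(0) if match else name


# Hypothetical data: the first header carries an unwanted leading character.
dt = pd.DataFrame(
    {"\ufeffCOUNTRY ID": [10, 10007], "COUNTRY NAME": ["Greece", "Romania"]}
)

# rename accepts a function and applies it to every column label.
dt = dt.rename(columns=from_first_uppercase)
print(list(dt.columns))
```

Passing a function to `rename` also replaces the explicit loop over `dt.columns`.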
CodePudding user response:
"".join([ch for ch in "COUNTRY ID" if str.isascii(ch)]).strip()
I prefer this approach; use something like it in the rename method, as @SuperStew does.
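One way this could be plugged into `rename`, sketched on made-up data. The column names, the stray characters in front of the first header, and the helper name `ascii_only` are assumptions, not taken from the real feed.

```python
import pandas as pd


def ascii_only(name: str) -> str:
    """Drop every non-ASCII character, then trim surrounding whitespace."""
    return "".join(ch for ch in name if ch.isascii()).strip()


# Hypothetical data: three non-ASCII characters precede the first header.
dt = pd.DataFrame(
    {"\u00ef\u00bb\u00bfCOUNTRY ID": [10, 10007], " COUNTRY NAME ": ["Greece", "Romania"]}
)

# Apply the filter to every column label in one call.
dt = dt.rename(columns=ascii_only)
print(list(dt.columns))
```

Unlike the uppercase-based regex, this keeps lowercase names and digits intact, but it would also strip legitimate non-ASCII letters from a header.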
CodePudding user response:
Since you specifically ask for a regex, the following line will remove any characters which are not (^) in the upper- or lowercase alphabet (A-Za-z) and not a whitespace (\s).
dt.columns = dt.columns.str.replace(r'[^A-Za-z\s]', '', regex=True)
If you have non-ASCII characters in your regular column names, you might need to adjust the regex. If you also need numbers you could add 0-9 to the regex.
Result:
| | COUNTRY ID | COUNTRY NAME |
|---|---|---|
| 0 | 10 | Greece |
| 1 | 10007 | Romania |
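A self-contained sketch of the regex approach with digits kept as well, as suggested above. The sample frame and the leading `\ufeff` character are assumptions for illustration. `regex=True` is passed explicitly because pandas 2.0 and later treat the pattern as a literal string by default.

```python
import pandas as pd

# Hypothetical data: one header has an unwanted leading character,
# the other contains a digit that should survive the cleanup.
dt = pd.DataFrame(
    {"\ufeffCOUNTRY ID": [10, 10007], "COUNTRY NAME 2": ["Greece", "Romania"]}
)

# Remove everything that is not an ASCII letter, a digit, or whitespace,
# then trim any whitespace left at either end of the label.
dt.columns = dt.columns.str.replace(r"[^A-Za-z0-9\s]", "", regex=True).str.strip()
print(list(dt.columns))
```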