We have the following column names:
our_df.columns
Index(['Rank', 'Album', 'Artist', 'Label', 'Label Description',
'Peak Position', 'Last Week Rank', 'Last 2 Week Rank', 'Weeks On Chart',
'TW Total Activity', '% CHG', 'LW Total Activity', 'TW Album Sales',
'TW Song Sales', 'TW TEA', 'TW Audio Streaming Activity',
'TW Video Streaming Activity', 'TW Total SEA (Audio Video)', 'ATD'],
dtype='object')
And we'd like to do something like this:
our_df.columns.str.replace(' ','_').replace('.', '').replace('+/-','plus_minus').lower()
...where we first make all of the replacements, and then convert everything to lowercase. However, this fails with the error 'Index' object has no attribute 'replace'. We've updated this to
our_df.columns.str.replace(' ','_').str.replace('.', '')
# and get the warning
FutureWarning: The default value of regex will change from True to False in a future version. In addition, single character regular expressions will *not* be treated as literal strings when regex=True.
Lastly, when we try
our_df.columns.str.replace(' ','_').str.replace('.', '').str.replace('+/-','plus_minus').str.lower()
...we get the error nothing to repeat at position 0.
- Is it necessary to repeat str. between each .replace() call?
- How can we resolve the FutureWarning?
- The last error is caused by .replace('+/-','plus_minus') because there is no +/- in the columns. How can we handle this so that rather than throwing an error, it simply makes no replacements?
- Am I doing this right?
CodePudding user response:
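If you want to use str.replace, you need to repeat .str before each call, because each call returns a new Index. The + and . characters must be escaped with \ because they are regex special characters: the error nothing to repeat at position 0 comes from the unescaped +, which the regex engine reads as a quantifier, not from the pattern being absent from the columns. Passing regex=True explicitly removes the FutureWarning: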
(our_df.columns.str.replace(' ', '_', regex=True)
.str.replace(r'\.', '', regex=True)
.str.replace(r'\+/-', 'plus_minus', regex=True)
.str.lower())
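Since none of these patterns actually needs regex features, another option is to pass regex=False so that every pattern is matched literally. Below is a minimal sketch of that variant; the column names are hypothetical and chosen only to include the characters being replaced.

```python
import pandas as pd

# Hypothetical column names, chosen to contain the characters of interest.
cols = pd.Index(['Peak Position', '% CHG', 'Vol. Total', 'Audio +/- Video', 'ATD'])

# regex=False treats each pattern as a plain string, so no escaping is needed,
# and a pattern that never occurs simply leaves the names unchanged.
cleaned = (cols.str.replace(' ', '_', regex=False)
               .str.replace('.', '', regex=False)
               .str.replace('+/-', 'plus_minus', regex=False)
               .str.lower())

print(list(cleaned))
```

With literal matching, a pattern that is missing from the columns results in no replacement rather than an error.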
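Or, if you want to pass a dictionary of replacements, first convert the Index to a Series with to_series(), because calling replace directly on the Index raises the error: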
AttributeError: 'Index' object has no attribute 'replace'
import pandas as pd

c = ['Rank', 'Album', 'Artist', 'Label', 'Label Description',
'Peak Position', 'Last Week Rank', 'Last 2 Week Rank', 'Weeks On Chart',
'TW Total Activity', '% CHG', 'LW Total Activity', 'TW Album Sales',
'TW Song Sales', 'TW TEA', 'TW Audio Streaming Activity',
'TW Video Streaming Activity', 'TW Total SEA (Audio +/- Video)', 'ATD']
our_df = pd.DataFrame(columns=c)
# keys are regex patterns: runs of whitespace, a literal dot, a literal +/-
our_df.columns = our_df.columns.to_series().replace({r'\s+': '_', r'\.': '', r'\+/-': 'plus_minus'}, regex=True).str.lower()
print (our_df)
Empty DataFrame
Columns: [rank, album, artist, label, label_description, peak_position, last_week_rank, last_2_week_rank, weeks_on_chart, tw_total_activity, %_chg, lw_total_activity, tw_album_sales, tw_song_sales, tw_tea, tw_audio_streaming_activity, tw_video_streaming_activity, tw_total_sea_(audio_plus_minus_video), atd]
Index: []
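If you would rather write the dictionary keys as plain strings and not escape them by hand, re.escape from the standard library can build the patterns for you. The following is only a sketch of that idea: clean_columns is a hypothetical helper name, not a pandas function, and the example columns are made up.

```python
import re
import pandas as pd

def clean_columns(columns, replacements):
    """Hypothetical helper: apply literal replacements, then lowercase."""
    # re.escape turns each literal key into a regex that matches it exactly,
    # so characters such as '+' and '.' lose their special meaning.
    patterns = {re.escape(old): new for old, new in replacements.items()}
    return columns.to_series().replace(patterns, regex=True).str.lower()

# Made-up frame used only for illustration.
df = pd.DataFrame(columns=['Peak Position', 'Audio +/- Video', 'No. 1'])
df.columns = clean_columns(df.columns, {' ': '_', '.': '', '+/-': 'plus_minus'})
print(list(df.columns))
```

This keeps the dictionary readable, at the cost of giving up real regex patterns such as \s+ in the keys.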