How to remove underscores from column names within plots?

Time:12-11

I'm working with data frames whose column names use underscores in place of spaces. As I understand it, this is good practice, so I don't want to rename the columns to use spaces.

When I plot the correlation of the various columns with each other, the column names appear in the plot with the underscores. I'd like a cleaner-looking visual that uses spaces instead.

Is there a way to modify the names displayed within plots, so that I can replace all underscores with spaces and/or relabel the charts with something different or clearer than the actual column names? I'm using both matplotlib and seaborn for my plots at the moment.

CodePudding user response:

I find <your_col_name_here>.replace("_", " ").title() very helpful for this.
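As a quick illustration of what that expression produces, here is a minimal sketch. The helper name `prettify_label` is hypothetical, chosen only for this example.

```python
def prettify_label(col_name):
    """Turn a snake_case column name into a title-cased display label.

    `prettify_label` is a hypothetical helper name used for illustration.
    """
    return col_name.replace("_", " ").title()


print(prettify_label("sepal_length"))  # Sepal Length
print(prettify_label("GDP_usd"))       # Gdp Usd
```

Note that `.title()` lowercases everything after the first letter of each word, so acronyms such as "GDP" come out as "Gdp". If that matters for your columns, dropping `.title()` and keeping only the `replace` call may be the safer choice.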

Here are some examples...

Example 1 - Matplotlib:

import matplotlib.pyplot as plt

# Assumes `iris` is a pandas DataFrame that has the four columns listed below.
def show_iris_histograms():
    cols_to_plot = [
        "sepal_length",
        "sepal_width",
        "petal_length",
        "petal_width",
    ]
    fig = plt.figure(figsize=(10, 6))
    for i, col_name in enumerate(cols_to_plot):
        # Subplot indices are 1-based, hence i + 1.
        ax = fig.add_subplot(2, 2, i + 1)
        iris[col_name].hist(bins=15, ax=ax)
        # "sepal_length" -> "Sepal Length"
        col_name_label = col_name.replace("_", " ").title()
        ax.set_title(col_name_label + " Distribution")
    fig.tight_layout()
    plt.show()

show_iris_histograms()

Example 2 - Seaborn:

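import matplotlib.pyplot as plt
import seaborn as sns

# Assumes `iris` is a pandas DataFrame with four measurement columns and a "class" column.
def show_iris_pairplots():
    pair_plot = sns.pairplot(iris, hue="class")
    # Assumes the legend is the figure's last child artist; capitalize its title.
    pair_plot.fig.get_children()[-1].set_title("Class")
    # Rewrite the axis labels on every subplot of the 4 x 4 grid.
    for i in range(4):
        for j in range(4):
            x_label = pair_plot.axes[i][j].get_xlabel()
            y_label = pair_plot.axes[i][j].get_ylabel()
            pair_plot.axes[i][j].set_xlabel(
                x_label.replace("_", " ").title(),
            )
            pair_plot.axes[i][j].set_ylabel(
                y_label.replace("_", " ").title(),
            )
    plt.show()

show_iris_pairplots()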

CodePudding user response:

If you are passing the entire data frame to the plotting method

Use the pandas rename method together with Python's string replace method to replace the underscores with spaces.

One way is to create an auxiliary data frame with the modified column names and pass that new data frame to the plotting method, e.g.:

import pandas as pd

# Dummy df
df = pd.DataFrame({
    'Column_1': pd.Series([1, 2, 3]), 
    'Column_2': pd.Series([1, 2, 3]),
    'Column_3': pd.Series([1, 2, 3]),
})

modified_df = df.rename(columns=lambda name: name.replace('_', ' '))

print(df)
print(modified_df)

Outputs:

   Column_1  Column_2  Column_3
0         1         1         1
1         2         2         2
2         3         3         3
   Column 1  Column 2  Column 3
0         1         1         1
1         2         2         2
2         3         3         3

If you don't mind changing the original data frame, you can do the rename in place, without creating the auxiliary data frame:

df.rename(columns=lambda name: name.replace('_', ' '), inplace=True)
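Since the question was about a correlation plot, another option is to rename only the correlation matrix and leave the data frame itself alone. The following is a minimal sketch under that assumption: the numbers are made up for illustration, and `pretty` is a hypothetical helper name.

```python
import pandas as pd

# Made-up numbers, for illustration only.
df = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 6.3, 5.8],
    "sepal_width": [3.5, 3.0, 3.3, 2.7],
    "petal_length": [1.4, 1.4, 6.0, 5.1],
})


def pretty(name):
    """Hypothetical helper: replace underscores with spaces."""
    return name.replace("_", " ")


# Compute the correlation on the original columns, then rename both
# the row labels and the column labels of the result.
corr = df.corr()
pretty_corr = corr.rename(index=pretty, columns=pretty)

print(list(df.columns))           # original names are untouched
print(list(pretty_corr.columns))  # display-friendly names
```

Passing `pretty_corr` to something like `sns.heatmap` should then show the spaced names on both axes, because the tick labels are taken from the data frame's index and columns.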

If you just need a list with the modified labels

You can generate the new labels using list comprehension:

labels = [col.replace('_', ' ') for col in df.columns]
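If some labels should read differently from the column name altogether, the same idea extends to a mapping with optional overrides. This is only a sketch: `make_labels` and its `overrides` parameter are hypothetical names, not part of pandas, matplotlib, or seaborn.

```python
def make_labels(columns, overrides=None):
    """Map each column name to a display label.

    `make_labels` and `overrides` are hypothetical names used for
    illustration. Columns listed in `overrides` get the custom label;
    all others just have underscores replaced with spaces.
    """
    overrides = overrides or {}
    return {
        col: overrides.get(col, col.replace("_", " "))
        for col in columns
    }


columns = ["Column_1", "Column_2", "Column_3"]
labels = make_labels(columns, overrides={"Column_3": "Third column"})
print(labels)
# {'Column_1': 'Column 1', 'Column_2': 'Column 2', 'Column_3': 'Third column'}
```

The resulting dict can be passed as `df.rename(columns=labels)` to get a renamed copy for plotting, or `list(labels.values())` can be supplied wherever a plotting call accepts explicit tick labels, such as the `xticklabels` and `yticklabels` arguments of `sns.heatmap`.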