I am looking for help plotting a heat map that should look like this:
Sorry in advance: I would have posted some code, but I have no idea how to begin. I know this is not the way Stack Overflow works, but I am unable to get started.
For the x axis: an array of years from 1975 to 2018, [1975, ..., 2018].
For the y axis: an array of months [January to December].
For the x-y intersection values, as shown in the image, one can use 1, 2 or 3.
In the image I added, crosses represent data gaps and white cells represent zero (0) values.
Thank you in advance.
UPDATE:
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np
months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
df = pd.read_csv('Events_in_Month_and_Year.xlsx', encoding='unicode_escape', error_bad_lines=False)
pivoted = df.pivot_table(index='month', columns='year', aggfunc=len, fill_value=0)
pivoted = pivoted.loc[months] # change the order of the rows to be the same as months
for _ in range(20):
    # set some random locations to "not filled in"
    pivoted.iloc[np.random.randint(0, len(pivoted)), np.random.randint(0, len(pivoted.columns))] = np.nan
max_val = np.nanmax(pivoted.to_numpy())
ax = sns.heatmap(pivoted, cmap=plt.get_cmap('Greys', max_val + 1), vmin=-0.5, vmax=max_val + 0.5)
ax.patch.set_facecolor('white')
ax.patch.set_edgecolor('black') # will be used for hatching
ax.patch.set_hatch('xxxx')
spines = ax.collections[0].colorbar.ax.spines
for s in spines:
    spines[s].set_visible(True) # show border around colorbar
plt.tight_layout()
plt.show()
I have tried this code, but I get the following error:
Error tokenizing data. C error: Buffer overflow caught - possible malformed input file
My data is stored in a .xlsx file which looks like this
CodePudding user response:
You can use
PS: If you have your original data in, e.g., Excel, you can save it as a CSV file and load it with df = pd.read_csv(filename).
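As a rough sketch of that loading step, the snippet below reads CSV text from an in-memory buffer instead of a real file. The layout (a `year` column followed by month columns) and the values are assumptions made up for illustration, not the data from the question.

```python
import io
import pandas as pd

# Hypothetical CSV content, as it might look after saving a sheet as CSV.
# The empty field (Feb 1976) is read by pandas as NaN.
csv_text = """year,Jan,Feb,Mar
1975,0,1,2
1976,3,,0
"""

# index_col=0 uses the first column (the years) as the row index
df = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(df)
```

With a real file, the `io.StringIO(...)` argument would simply be replaced by the file name.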
The code for a file similar to the one in the post could look like the following. To distinguish between 0 and a "data gap", missing data can be represented in the Excel file by an empty cell.
Empty rows for missing years can be added by assigning a new index.
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np
# read the dataframe from a .csv file
pivoted = pd.read_csv('test.csv', index_col=0) # maybe: delimiter=';'
# extend the index to include all intermediate years
pivoted = pd.DataFrame(pivoted, index=range(pivoted.index.min(), pivoted.index.max() + 1))
# exchange columns and rows
pivoted = pivoted.T
# largest value, ignoring NaN; cast to int because it is used as the number of colors
max_val = int(np.nanmax(pivoted.to_numpy()))
ax = sns.heatmap(pivoted, cmap=plt.get_cmap('Greys', max_val + 1), vmin=-0.5, vmax=max_val + 0.5,
                 cbar_kws={'ticks': np.arange(max_val + 1)})
ax.patch.set_facecolor('white')
ax.patch.set_edgecolor('black') # will be used for hatching
ax.patch.set_hatch('xxxx')
ax.collections[0].colorbar.outline.set_linewidth(1) # make outline visible
ax.collections[0].colorbar.outline.set_edgecolor('black')
plt.tight_layout()
plt.show()
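To see what the reindexing step does in isolation, here is a small sketch with made-up data (the years and values are assumptions, not the data from the post). It uses `DataFrame.reindex`, which should give the same result as the `pd.DataFrame(pivoted, index=...)` construction above: years absent from the file become all-NaN rows, while explicit zeros stay zeros.

```python
import numpy as np
import pandas as pd

# Made-up example: 1977 is missing entirely, and Feb 1976 is a data gap (NaN)
data = pd.DataFrame(
    {"Jan": [0, 3, 1], "Feb": [1, np.nan, 2]},
    index=[1975, 1976, 1978],
)

# Add a row for every year between the first and the last one;
# rows for years not present in the data are filled with NaN
full = data.reindex(range(data.index.min(), data.index.max() + 1))

# Months as rows, years as columns, as in the heatmap
full = full.T

# Largest value, ignoring NaN; cast to int so it can serve as a color count
max_val = int(np.nanmax(full.to_numpy()))
print(full)
print(max_val)
```

Passing `full` to `sns.heatmap` leaves the NaN cells unpainted, so the hatched background set via `ax.patch` shows through only there.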