I want to plot the number of positions in a sliding window of 1000 and a step of 20 for each sample (A-D).
Interpretation: 1: position exists; NA: position does not exist.
I have tested a dozen tools in bash, R and others, but I am looking for a Python solution.
Your advice please.
#This is an example of my data:
window = 1000
step = 20
# Example of dataframe
POSITION A B C D
1250 1 1 1 1
1750 NA 1 NA 1
1786 1 NA 1 1
1812 1 1 1 1
1855 1 1 1 1
1896 1 NA 1 NA
2635 NA 1 1 1
1689 1 1 NA NA
3250 1 1 1 1
3655 1 NA 1 1
3589 NA 1 1 1
I am looking for something like this:
Any help will be appreciated!
CodePudding user response:
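I would try a rolling count, then keep every step-th row. After that you can melt the df into long format in order to use the hue parameter of the seaborn lineplot. Note that an integer window in rolling counts rows, not position units, so the window and step in the code below are numbers of rows.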
import pandas as pd
import numpy as np
import seaborn as sns
df = pd.DataFrame({'POSITION': [1250, 1750, 1786, 1812, 1855, 1896, 2635, 1689, 3250, 3655, 3589],
'A': [1.0, np.nan, 1.0, 1.0, 1.0, 1.0, np.nan, 1.0, 1.0, 1.0, np.nan],
'B': [1.0, 1.0, np.nan, 1.0, 1.0, np.nan, 1.0, 1.0, 1.0, np.nan, 1.0],
'C': [1.0, np.nan, 1.0, 1.0, 1.0, 1.0, 1.0, np.nan, 1.0, 1.0, 1.0],
'D': [1.0, 1.0, 1.0, 1.0, 1.0, np.nan, 1.0, np.nan, 1.0, 1.0, 1.0]})
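window = 5  # number of rows per window (not position units)
step = 2    # keep every 2nd window
# Sort by position so the window slides along the coordinate, then count non-NA values per sample
df = df.set_index('POSITION').sort_index().rolling(window).count().reset_index().iloc[::step, :]
# Long format: one row per (POSITION, sample), so seaborn can colour the lines by sample
df = df.melt(id_vars='POSITION', value_vars=['A','B','C','D'], value_name='polym', var_name='chromop')
sns.lineplot(data=df, x='POSITION', y='polym', hue='chromop')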
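If the window has to be in position units (1000 wide, moving by 20) rather than in rows, here is a minimal sketch of one way to do it. It rests on assumptions the question does not state: windows are half-open, [start, start + window), and the first window starts at the smallest position. `window_counts` and the `start` column are hypothetical names for illustration, not part of pandas.

```python
import numpy as np
import pandas as pd

# Same example data as in the question (NaN = position does not exist)
df = pd.DataFrame({
    "POSITION": [1250, 1750, 1786, 1812, 1855, 1896, 2635, 1689, 3250, 3655, 3589],
    "A": [1, np.nan, 1, 1, 1, 1, np.nan, 1, 1, 1, np.nan],
    "B": [1, 1, np.nan, 1, 1, np.nan, 1, 1, 1, np.nan, 1],
    "C": [1, np.nan, 1, 1, 1, 1, 1, np.nan, 1, 1, 1],
    "D": [1, 1, 1, 1, 1, np.nan, 1, np.nan, 1, 1, 1],
})


def window_counts(df, window=1000, step=20, pos_col="POSITION"):
    """Hypothetical helper: count non-NA positions per sample in
    half-open windows [start, start + window), moving by `step`."""
    df = df.sort_values(pos_col)
    pos = df[pos_col].to_numpy()
    samples = [c for c in df.columns if c != pos_col]

    # 1 where the position exists for a sample, 0 where it is NA
    present = df[samples].notna().to_numpy().astype(int)
    # Cumulative counts with a leading row of zeros, so that
    # csum[hi] - csum[lo] is the count over sorted rows lo..hi-1
    csum = np.vstack([np.zeros((1, len(samples)), dtype=int),
                      present.cumsum(axis=0)])

    # Assumption: window starts run from the smallest to the largest position
    starts = np.arange(pos.min(), pos.max() + 1, step)
    lo = np.searchsorted(pos, starts, side="left")           # first row >= start
    hi = np.searchsorted(pos, starts + window, side="left")  # first row >= start + window

    out = pd.DataFrame(csum[hi] - csum[lo], columns=samples)
    out.insert(0, "start", starts)
    return out


counts = window_counts(df, window=1000, step=20)
print(counts.head())
```

To plot it, the result can be melted the same way as above, for example counts.melt(id_vars="start", var_name="sample", value_name="n_positions"), and passed to sns.lineplot with x="start", y="n_positions", hue="sample". The cumulative-sum trick avoids looping over the rows for every window, which should matter once there are many positions.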