Home > front end >  get unique labels of a list and convert to dataframe for plotting
get unique labels of a list and convert to dataframe for plotting

Time:11-05

i have a set of lists and i need the labels to be a single column in a dataframe and the values in label that is not in the value(after the merge) should be 0.

label = [0, 2, 1, 3, 4, 9, 8, 24, 86, 5] value = [103, 16, 10, 9, 5, 4, 4, 3, 3, 3]

label1 = [0, 2, 3, 1, 9, 15, 4, 6, 5, 23] value1 = [73, 14, 8, 7, 3, 3, 3, 2, 2, 2]

label2 = [0, 2, 1, 3, 9, 6, 10, 15, 4, 114] value2 = [48, 11, 7, 6, 3, 2, 2, 2, 2, 1]

i need to merge all of these to look like this:

label = [0, 2, 1, 3, 4, 9, 8, 15, 23, 10, 24, 86, 5, 114]
value = [103, 16, 10, 9, 5, 4, 4, 0, 0, 0, 3, 3, 3, 0]
value1 = [73, 14, 7, 8, 0, 3, 0, 3, 2, 0, 0, 0, 2, 0]
value2 = [48, 11, 7, 6, 2, 3, 0, 2, 0, 2, 0, 0, 0, 1]

the reason for this is so that i can plot the data on a seaborn scatterplot/pairplot but i do not mind if there is a way to plot the first sets of data on scatterplot/pairplot

CodePudding user response:

one way to do this is to use outer merge, which will merge columns of dataframe based on the column you choose. Below is the code. First convert the label-value pairs into dataframes, then merge the 3 dataframes. Finally, change the column names to what you want to see...

## Your data
label = [0, 2, 1, 3, 4, 9, 8, 24, 86, 5] 
value = [103, 16, 10, 9, 5, 4, 4, 3, 3, 3]
label1 = [0, 2, 3, 1, 9, 15, 4, 6, 5, 23] 
value1 = [73, 14, 8, 7, 3, 3, 3, 2, 2, 2]
label2 = [0, 2, 1, 3, 9, 6, 10, 15, 4, 114] 
value2 = [48, 11, 7, 6, 3, 2, 2, 2, 2, 1]

## Conver each list into dataframe
df=pd.DataFrame({'label':label, 'value':value})
df1=pd.DataFrame({'label':label1, 'value':value1})
df2=pd.DataFrame({'label':label2, 'value':value2})

## Merge the dataframes on label column
dfresult = pd.merge(df, df1, how="outer", on='label')
dfresult = pd.merge(dfresult, df2, how="outer", on='label').fillna(0)

## Rename columns and convert data to int
dfresult.rename(columns={'value':'value2', 'value_x':'value', 'value_y':'value1'}, inplace=True)
dfresult=dfresult.astype(int)
dfresult

Output

    label   value   value1  value2
0   0   103 73  48
1   2   16  14  11
2   1   10  7   7
3   3   9   8   6
4   4   5   3   2
5   9   4   3   3
6   8   4   0   0
7   24  3   0   0
8   86  3   0   0
9   5   3   2   0
10  15  0   3   2
11  6   0   2   2
12  23  0   2   0
13  10  0   0   2
14  114 0   0   1
  • Related