I have the two following tables of data, one named data1, the other named data2. The left-hand column is a categorical variable and the right hand column is frequency I would like to rewrite these tables but where there are missing categories in the left-hand column I would like it to put in the correct missing category and then put a '0' in the right-hand frequency column.
data1 = [
1 170
2 120
3 100
4 40
5 30
6 20
7 10
9 8
10 2
11 1
14 1
];
data2 = [
1 240
2 200
3 180
4 60
5 50
6 40
7 30
8 20
9 8
10 2
12 1
19 1
];
To be clearer I will explain with an example. In data1, 8 12 and 13 are missing in the left-hand column. I would like matlab to recreate this table but with 0 values for 8, 12 and 13 so it looks as follows. I would also like it to have additional empty categories after '14' because data2 is longer and has more categories. I have also included what data2 should look like with filled in values.
data1 = [
1 170
2 120
3 100
4 40
5 30
6 20
7 10
8 0
9 8
10 2
11 1
12 0
13 0
14 1
15 0
16 0
17 0
18 0
19 0
];
data2 = [
1 240
2 200
3 180
4 60
5 50
6 40
7 30
8 20
9 8
10 2
11 0
12 1
13 0
14 0
15 0
16 0
17 0
18 0
19 1
];
I have a handful of datasets which generally all start with 1,2,3,4,5...etc but then they all have slightly different categories on the left-hand column, because where values are missing it just omits the category rather than putting 0. How do i write a code so that it automatically fills in any blanks with a 0. It would be good if the code could identify what the 'highest' number of categories is amongst all the datasets and then fill in blanks based on this.
my aim is to put together a grouped bar chart with data series that are all the same length.
CodePudding user response:
Feel free to change the maximum length of the array, this is a generic answer. The maximum length is max(data1(:,1))
, but you can compute this in any way, e.g. the maximum value of multiple arrays.
% make new data
new_data1=zeros(max(data1(:,1),2));
new_data(:,1)=1:max(data1(:,1));
% Fill data. You can do this in a loop if its easier for you to understand.
% in essence, it says: in all the data1(:,1) indices of new_data's second column, put data1(:,2)
new_data(data1(:,1),2)=data1(:,2);
CodePudding user response:
You can convert the datasets to a table and then use outerjoin. Then you can replace the NaNs with whatever you want using fillmissing.
table1 = array2table(data1);
table1.Properties.VariableNames = {'A', 'B'};
table2 = array2table(data2);
table2.Properties.VariableNames = {'A', 'B'};
newTable = outerjoin(table1, table2, 'LeftKeys', {'A'}, 'RightKeys', {'A'}, 'MergeKeys', true)
which produces:
A B_table1 B_table2
__ ________ ________
1 170 240
2 120 200
3 100 180
4 40 60
5 30 50
6 20 40
7 10 30
8 NaN 20
9 8 8
10 2 2
11 1 NaN
12 NaN 1
14 1 NaN
19 NaN 1
And then get your zeros with newTable2 = fillmissing(newTable, 'constant', 0)
, which prints:
A B_table1 B_table2
__ ________ ________
1 170 240
2 120 200
3 100 180
4 40 60
5 30 50
6 20 40
7 10 30
8 0 20
9 8 8
10 2 2
11 1 0
12 0 1
14 1 0
19 0 1