I am using MATLAB 2021b and I have the following data:
ID = {'a','a','a','a','b','b','b','b'}';
DATE = [2010,2010,2011,2011,2011,2011,2012,2012]';
FIELD_ID = {'f1','f2','f1','f2','f1','f2','f1','f2'}';
VALUE = [1,2,5,6,1,1,7,8]';
T_before = table(ID,DATE,FIELD_ID,VALUE);
T_before =
8×4 table
ID DATE FIELD_ID VALUE
_____ ____ ________ _____
{'a'} 2010 {'f1'} 1
{'a'} 2010 {'f2'} 2
{'a'} 2011 {'f1'} 5
{'a'} 2011 {'f2'} 6
{'b'} 2011 {'f1'} 1
{'b'} 2011 {'f2'} 1
{'b'} 2012 {'f1'} 7
{'b'} 2012 {'f2'} 8
In reality the table is a lot longer and contains more fields. The latest DATE for a given ID can be different. The column FIELD_ID contains fields and their respective value is in the column VALUE.
What I would like to do is unstack this table in long format to have one row per ID with the fields as colums. I have one condition, I want only to unstack the rows containing the latest value in the field DATE
. It should look like the following:
T_after =
2×4 table
ID DATE f1 f2
_____ ____ __ __
{'a'} 2011 5 6
{'b'} 2012 7 8
CodePudding user response:
One for
loop can do this quickly.
ID = {'a','a','a','a','b','b','b','b'}';
DATE = [2010,2010,2011,2011,2011,2011,2012,2012]';
FIELD_ID = {'f1','f2','f1','f2','f1','f2','f1','f2'}';
VALUE = [1,2,5,6,1,1,7,8]';
T_before = table(ID,DATE,FIELD_ID,VALUE);
% get unique IDs
id_unique = unique(ID);
n = numel(id_unique);
f1 = cell(1,n);
f2 = f1;
dates = f1;
for i = 1:n
%filter DATE by unique ID, then get the latest date
filter_id = ismember(ID, id_unique{i});
date_filtered = DATE(filter_id);
dates{i} = max(date_filtered);
%filter FIELD_ID and VALUE by unique ID and dates
filter_date = DATE == dates{i};
filter = filter_id & filter_date;
field_id_filtered = FIELD_ID(filter);
value_filtered = VALUE(filter);
%find F1 and F2
if strcmp(field_id_filtered{1}, 'f1')
f1{i} = value_filtered(1);
f2{i} = value_filtered(2);
else
f1{i} = value_filtered(2);
f2{i} = value_filtered(1);
end
end
%make the final table
T = table(id_unique(:),dates(:),f1(:),f2(:), 'VariableNames',{'ID','DATE','f1','f2'});