The dataset contains data about COVID-19 patients. It is in both EXCEL and CSV file formats, and contains several variables and over 7 thousand records (rows) which has made the problem extremely harder and very time consuming to solve manually. Below are the 4 most important variables (columns) needed in solving the problem; 1: id
for identifying each record (row), 2: day_at_hosp
for each day a patient remained admitted at the hospital, 3: sex
of patient, 4: death
for whether the patient eventually died or survived.
I want to create a new variable total_days_at_hosp
which should contain a total of days a patient remained admitted at hospital.
Old Table:
_______________________________________
| id | day_at_hosp | sex | death |
|_______|_____________|________|________|
| 1 | 0 | male | no |
| 2 | 1 | | |
| 3 | 2 | | |
| 4 | 0 | female | no |
| 5 | 1 | | |
| 6 | 0 | male | no |
| 7 | 0 | female | no |
| 8 | 0 | male | no |
| 9 | 1 | | |
| 10 | 2 | | |
| 11 | 3 | | |
| 12 | 4 | | |
| ... | ... | ... | ... |
| 7882 | 0 | female | no |
| 7883 | 1 | | |
|_______|_____________|________|________|
New Table:
I want to convert table above into table below:
____________________________________________
| id |total_days_at_hosp| sex | death |
|_______|__________________|________|________|
| 1 | 3 | male | no |
| 4 | 2 | male | yes |
| 6 | 1 | male | yes |
| 7 | 1 | female | no |
| 8 | 5 | male | no |
| ... | ... | ... | ... |
| 2565 | 2 | female | no |
|_______|__________________|________|________|
NOTE: the id
column is for every record entered, and multiple records were entered for each patient depending on how long a patient remained admitted at the hospital. The day_at_hosp
variable contains days: 0=initial day at hospital, 1=second day at hospital, ... , n=nth last day at hospital.
The record (row) where the variable (column) day_at_hosp
is 0 corresponds to all entries in other columns, if the record (row) for day_at_hosp
is *not 0, say 1,2,3, ...,5 then it belongs to the patient right above, and all the corresponding variables (columns) are left blank.
However the dataset I need should look like the table below.
It should include a new variable (column) called total_days_at_hosp
generated from the variable (column) day_at_hosp
. The new variable (column) total_days_at_hosp
is more useful in statistical tests to be conducted and will replace variable (column) day_at_hosp
, so that all blank rows can be deleted.
To move from old table to new table the needed program should do the following:
day_at_hosp ===> total_days_at_hosp
0
1 ---> 3
2
-------------------------------------
0 ---> 2
1
-------------------------------------
0 ---> 1
-------------------------------------
0 ---> 1
-------------------------------------
0
1
2 ---> 5
3
4
-------------------------------------
...
-------------------------------------
0 ---> 2
1
-------------------------------------
How can I achieve this?
CodePudding user response:
Another formula option without dummy value placed at end of the Old/New Table.
1] Create New Table by >>
- Copy and paste all Old Table data to a unused area
- Click "Autofilter"
- In "days_at_hospital" column select =0 value
- Copy and paste filter of admissions to New Table column F
- Delete all 0s in rows of Column G
Then,
2] In G2
, formula copied down :
=IF(F2="","",IF(F3="",MATCH(9^9,A:A) 1,MATCH(F3,A:A,0))-MATCH(F2,A:A,0))
Remark : If your "ID Column" is Text value, formula changed to :
=IF(F2="","",IF(F3="",MATCH("zzz",A:A) 1,MATCH(F3,A:A,0))-MATCH(F2,A:A,0))
CodePudding user response:
It is apparent that your data are sorted by patient, and that your desired table will be much 'shorter' - accordingly the starting point for this answer is to apply an AutoFilter
to your original data, setting the filter criterion to be days_at_hospital = 0
, and then copy this filter of admissions to column F:
after deleting the old column G data, the formula below can then be entered in cell G2
and copied down
=INDEX(B:B,MATCH(F3,A:A,0)-1) 1
to keep the formula simple the same dummy maximum value should be entered at both the end of the old and new tables.