Home > OS >  How to generate new variable (column) based on a specific given column containing cycles of numbers
How to generate new variable (column) based on a specific given column containing cycles of numbers

Time:05-23

The dataset contains data about COVID-19 patients. It is in both EXCEL and CSV file formats, and contains several variables and over 7 thousand records (rows) which has made the problem extremely harder and very time consuming to solve manually. Below are the 4 most important variables (columns) needed in solving the problem; 1: id for identifying each record (row), 2: day_at_hosp for each day a patient remained admitted at the hospital, 3: sex of patient, 4: death for whether the patient eventually died or survived. I want to create a new variable total_days_at_hosp which should contain a total of days a patient remained admitted at hospital.

Old Table:
 _______________________________________
|   id  | day_at_hosp |  sex   | death  |
|_______|_____________|________|________|
|   1   |     0       |  male  |   no   |
|   2   |     1       |        |        |
|   3   |     2       |        |        |
|   4   |     0       | female |   no   |
|   5   |     1       |        |        |
|   6   |     0       |  male  |   no   |
|   7   |     0       | female |   no   |
|   8   |     0       |  male  |   no   |
|   9   |     1       |        |        |
|  10   |     2       |        |        |
|  11   |     3       |        |        |
|  12   |     4       |        |        |
| ...   |    ...      |   ...  |  ...   |
| 7882  |     0       | female |   no   |
| 7883  |     1       |        |        |
|_______|_____________|________|________|

New Table:
I want to convert table above into table below:
 ____________________________________________
|   id  |total_days_at_hosp|  sex   | death  |
|_______|__________________|________|________|
|   1   |        3         |  male  |   no   |
|   4   |        2         |  male  |   yes  |
|   6   |        1         |  male  |   yes  |
|   7   |        1         | female |   no   |
|   8   |        5         |  male  |   no   |
| ...   |       ...        |   ...  |  ...   |
| 2565  |        2         | female |   no   |
|_______|__________________|________|________|

NOTE: the id column is for every record entered, and multiple records were entered for each patient depending on how long a patient remained admitted at the hospital. The day_at_hosp variable contains days: 0=initial day at hospital, 1=second day at hospital, ... , n=nth last day at hospital. The record (row) where the variable (column) day_at_hosp is 0 corresponds to all entries in other columns, if the record (row) for day_at_hosp is *not 0, say 1,2,3, ...,5 then it belongs to the patient right above, and all the corresponding variables (columns) are left blank. However the dataset I need should look like the table below. It should include a new variable (column) called total_days_at_hosp generated from the variable (column) day_at_hosp. The new variable (column) total_days_at_hosp is more useful in statistical tests to be conducted and will replace variable (column) day_at_hosp, so that all blank rows can be deleted. To move from old table to new table the needed program should do the following:

day_at_hosp ===> total_days_at_hosp
    0                                 
    1        --->        3               
    2
-------------------------------------                              
    0        --->        2           
    1                              
-------------------------------------
    0        --->        1          
-------------------------------------
    0        --->        1
-------------------------------------
    0                                  
    1                                  
    2        --->        5                
    3
    4                                  
-------------------------------------
   ...                                 
------------------------------------- 
    0         --->       2                                              
    1
-------------------------------------

How can I achieve this?

CodePudding user response:

Another formula option without dummy value placed at end of the Old/New Table.

1] Create New Table by >>

  • Copy and paste all Old Table data to a unused area
  • Click "Autofilter"
  • In "days_at_hospital" column select =0 value
  • Copy and paste filter of admissions to New Table column F
  • Delete all 0s in rows of Column G

Then,

2] In G2, formula copied down :

=IF(F2="","",IF(F3="",MATCH(9^9,A:A) 1,MATCH(F3,A:A,0))-MATCH(F2,A:A,0))

Remark : If your "ID Column" is Text value, formula changed to :

=IF(F2="","",IF(F3="",MATCH("zzz",A:A) 1,MATCH(F3,A:A,0))-MATCH(F2,A:A,0))

enter image description here

CodePudding user response:

It is apparent that your data are sorted by patient, and that your desired table will be much 'shorter' - accordingly the starting point for this answer is to apply an AutoFilter to your original data, setting the filter criterion to be days_at_hospital = 0, and then copy this filter of admissions to column F: enter image description here after deleting the old column G data, the formula below can then be entered in cell G2 and copied down

=INDEX(B:B,MATCH(F3,A:A,0)-1) 1

to keep the formula simple the same dummy maximum value should be entered at both the end of the old and new tables.

  • Related