How to make data frame from two vectors in R?

Time:10-03

I have two vectors here. One is all the data about population for various countries:

##   [1] "China 1439323776 0.39 5540090 153 9388211 -348399 1.7 38 61 18.47"           
##   [2] "India 1380004385 0.99 13586631 464 2973190 -532687 2.2 28 35 17.70"          
##   [3] "United States 331002651 0.59 1937734 36 9147420 954806 1.8 38 83 4.25"       
##   [4] "Indonesia 273523615 1.07 2898047 151 1811570 -98955 2.3 30 56 3.51"          
##   [5] "Pakistan 220892340 2.00 4327022 287 770880 -233379 3.6 23 35 2.83"           
##   [6] "Brazil 212559417 0.72 1509890 25 8358140 21200 1.7 33 88 2.73"               
##   [7] "Nigeria 206139589 2.58 5175990 226 910770 -60000 5.4 18 52 2.64"             
##   [8] "Bangladesh 164689383 1.01 1643222 1265 130170 -369501 2.1 28 39 2.11"        
##   [9] "Russia 145934462 0.04 62206 9 16376870 182456 1.8 40 74 1.87 "
##   [10] "Tokelau 1357 1.27  17 136 10   N.A. N.A. 0  0.00"                             
##   [11] "Holy See 801 0.25  2 2003 0   N.A. N.A. N.A. 0.00"            

The other vector is all the column names in the exact order corresponding to the country name and those numbers above:

##  [1] "Country(ordependency)" "Population(2020)"      "YearlyChange"         
##  [4] "NetChange"             "Density(P/Km²)"       "LandArea(Km²)"       
##  [7] "Migrants(net)"         "Fert.Rate"             "Med.Age"              
## [10] "UrbanPop%"             "WorldShare"

How do I make a data frame whose column names match the corresponding data, like this:

head(population)

 Country (or dependency)  Population (2020)   Yearly Change    Net Change  Density (P/Km²)  ......             
1                  China         1439323776            0.39       5540090   ... ....
2                  India         1380004385            0.99      13586631   .......
3          United States          331002651            0.59       1937734   .......
4              Indonesia          273523615            1.07       2898047   .......
5               Pakistan          220892340            2.00       4327022   .......

Note: For the last two countries Tokelau and Holy See there are no "Migrants(net)" data.

TIA!

EDIT:

Some more samples are here:

##  [53] "Côte d'Ivoire 26378274 2.57  661730 83 318000 -8000 4.7 19 51  0.34" 
##  [86] "Czech Republic (Czechia) 10708981 0.18  19772 139 77240 22011 1.6 43 74  0.14"
##  [93] "United Arab Emirates 9890402 1.23  119873 118 83600 40000 1.4 33 86  0.13"   
##  [98] "Papua New Guinea 8947024 1.95  170915 20 452860 -800 3.6 22 13  0.11" 
## [135] "Bosnia and Herzegovina 3280819 -0.61  -20181 64 51000 -21585 1.3 43 52  0.04" 
## [230] "Saint Pierre & Miquelon 5794 -0.48  -28 25 230   N.A. N.A. 100  0.00" 

UPDATES: Here is the problem:

tail(population)

##           Country(ordependency) Population(2020) YearlyChange NetChange
## 230 Saint Pierre & Miquelon             5794        -0.48       -28
## 231                  Montserrat             4992         0.06         3
## 232            Falkland Islands             3480         3.05       103
## 233                        Niue             1626         0.68        11
## 234                     Tokelau             1357         1.27        17
## 235                    Holy See              801         0.25         2
##     Density(P/Km²) LandArea(Km²) Migrants(net) Fert.Rate **Med.Age** **UrbanPop%**
## 230              25            230          N.A.      N.A.     100      0.00
## 231              50            100          N.A.      N.A.      10      0.00
## 232               0          12170          N.A.      N.A.      66      0.00
## 233               6            260          N.A.      N.A.      46      0.00
## 234             136             10          N.A.      N.A.       0      0.00
## 235            2003              0          N.A.      N.A.    N.A.      0.00
##     **WorldShare**
## 230         NA
## 231         NA
## 232         NA
## 233         NA
## 234         NA
## 235         NA

All the rows with 10 variables instead of 11 are here:

## [202] "Isle of Man 85033 0.53  449 149 570   N.A. N.A. 53  0.00"                     
## [203] "Andorra 77265 0.16  123 164 470   N.A. N.A. 88  0.00"                         
## [204] "Dominica 71986 0.25  178 96 750   N.A. N.A. 74  0.00"                         
## [205] "Cayman Islands 65722 1.19  774 274 240   N.A. N.A. 97  0.00"                  
## [206] "Bermuda 62278 -0.36  -228 1246 50   N.A. N.A. 97  0.00"                       
## [207] "Marshall Islands 59190 0.68  399 329 180   N.A. N.A. 70  0.00"                
## [208] "Northern Mariana Islands 57559 0.60  343 125 460   N.A. N.A. 88  0.00"        
## [209] "Greenland 56770 0.17  98 0 410450   N.A. N.A. 87  0.00"                       
## [210] "American Samoa 55191 -0.22  -121 276 200   N.A. N.A. 88  0.00"                
## [211] "Saint Kitts & Nevis 53199 0.71  376 205 260   N.A. N.A. 33  0.00"         
## [212] "Faeroe Islands 48863 0.38  185 35 1396   N.A. N.A. 43  0.00"                  
## [213] "Sint Maarten 42876 1.15  488 1261 34   N.A. N.A. 96  0.00"                    
## [214] "Monaco 39242 0.71  278 26337 1   N.A. N.A. N.A. 0.00"                         
## [215] "Turks and Caicos 38717 1.38  526 41 950   N.A. N.A. 89  0.00"                 
## [216] "Saint Martin 38666 1.75  664 730 53   N.A. N.A. 0  0.00"                      
## [217] "Liechtenstein 38128 0.29  109 238 160   N.A. N.A. 15  0.00"                   
## [218] "San Marino 33931 0.21  71 566 60   N.A. N.A. 97  0.00"                        
## [219] "Gibraltar 33691 -0.03  -10 3369 10   N.A. N.A. N.A. 0.00"                     
## [220] "British Virgin Islands 30231 0.67  201 202 150   N.A. N.A. 52  0.00"          
## [221] "Caribbean Netherlands 26223 0.94  244 80 328   N.A. N.A. 75  0.00"            
## [222] "Palau 18094 0.48  86 39 460   N.A. N.A. N.A. 0.00"                            
## [223] "Cook Islands 17564 0.09  16 73 240   N.A. N.A. 75  0.00"                      
## [224] "Anguilla 15003 0.90  134 167 90   N.A. N.A. N.A. 0.00"                        
## [225] "Tuvalu 11792 1.25  146 393 30   N.A. N.A. 62  0.00"                           
## [226] "Wallis & Futuna 11239 -1.69  -193 80 140   N.A. N.A. 0  0.00"             
## [227] "Nauru 10824 0.63  68 541 20   N.A. N.A. N.A. 0.00"                            
## [228] "Saint Barthelemy 9877 0.30  30 470 21   N.A. N.A. 0  0.00"                    
## [229] "Saint Helena 6077 0.30  18 16 390   N.A. N.A. 27  0.00"                       
## [230] "Saint Pierre & Miquelon 5794 -0.48  -28 25 230   N.A. N.A. 100  0.00"     
## [231] "Montserrat 4992 0.06  3 50 100   N.A. N.A. 10  0.00"                          
## [232] "Falkland Islands 3480 3.05  103 0 12170   N.A. N.A. 66  0.00"                 
## [233] "Niue 1626 0.68  11 6 260   N.A. N.A. 46  0.00"                                
## [234] "Tokelau 1357 1.27  17 136 10   N.A. N.A. 0  0.00"                             
## [235] "Holy See 801 0.25  2 2003 0   N.A. N.A. N.A. 0.00"

CodePudding user response:

This would be easiest to read with read.table using whitespace as the delimiter. The catch is that 'Country' may contain several words, and it has to be read as a single column. To handle that, we can use sub to wrap the country name in double quotes and then call read.table with col.names set to 'v2'.

df1 <- read.table(text = sub("^([^0-9]+)\\s", '"\\1" ', v1), 
   header = FALSE, col.names = v2, fill = TRUE, check.names = FALSE)

-output

df1
      Country(ordependency) Population(2020) YearlyChange NetChange Density(P/Km²) LandArea(Km²) Migrants(net) Fert.Rate Med.Age UrbanPop%
1                        China       1439323776         0.39   5540090             153        9388211       -348399       1.7      38        61
2                        India       1380004385         0.99  13586631             464        2973190       -532687       2.2      28        35
3                United States        331002651         0.59   1937734              36        9147420        954806       1.8      38        83
4                    Indonesia        273523615         1.07   2898047             151        1811570        -98955       2.3      30        56
5                     Pakistan        220892340         2.00   4327022             287         770880       -233379       3.6      23        35
6                       Brazil        212559417         0.72   1509890              25        8358140         21200       1.7      33        88
7                      Nigeria        206139589         2.58   5175990             226         910770        -60000       5.4      18        52
8                   Bangladesh        164689383         1.01   1643222            1265         130170       -369501       2.1      28        39
9                       Russia        145934462         0.04     62206               9       16376870        182456       1.8      40        74
10                     Tokelau             1357         1.27        17             136             10          N.A.      N.A.       0         0
11                    Holy See              801         0.25         2            2003              0          N.A.      N.A.    N.A.         0
12         C&ocirc;te d'Ivoire         26378274         2.57    661730              83         318000         -8000       4.7      19        51
13    Czech Republic (Czechia)         10708981         0.18     19772             139          77240         22011       1.6      43        74
14        United Arab Emirates          9890402         1.23    119873             118          83600         40000       1.4      33        86
15            Papua New Guinea          8947024         1.95    170915              20         452860          -800       3.6      22        13
16      Bosnia and Herzegovina          3280819        -0.61    -20181              64          51000        -21585       1.3      43        52
17 Saint Pierre &amp; Miquelon             5794        -0.48       -28              25            230          N.A.      N.A.     100         0
   WorldShare
1       18.47
2       17.70
3        4.25
4        3.51
5        2.83
6        2.73
7        2.64
8        2.11
9        1.87
10         NA
11         NA
12       0.34
13       0.14
14       0.13
15       0.11
16       0.04
17         NA
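To see what the sub() step does before read.table parses the text, here is a minimal sketch on two rows from the question. The names `rows`, `quoted`, and `parsed` are only illustrative, and the pattern assumes the country name contains no digits and is followed by whitespace.

```r
# Two sample rows from the question: a single-word and a multi-word country
rows <- c("China 1439323776 0.39 5540090 153 9388211 -348399 1.7 38 61 18.47",
          "United States 331002651 0.59 1937734 36 9147420 954806 1.8 38 83 4.25")

# Capture the leading run of non-digit characters (the country name) and wrap
# it in double quotes; the trailing space keeps it apart from the next field
quoted <- sub("^([^0-9]+)\\s", '"\\1" ', rows)
print(quoted)

# read.table now treats each quoted name as one field
parsed <- read.table(text = quoted, header = FALSE)
print(dim(parsed))
```

Double quotes are used rather than single quotes because a name such as "Côte d'Ivoire" already contains an apostrophe.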

For the rows that have fewer fields (10 instead of 11), we can count the fields in each row and then shift the values into the correct columns with row/column indexing

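library(stringr)
# fields per row: whitespace runs after the country name, plus 2
cnt <- str_count(sub("^([^0-9]+)\\s", '', v1), "\\s+") + 2
i1 <- cnt == 10
# move the last two values one column to the right, then blank out column 9
df1[i1, 10:11] <- df1[i1, 9:10]
df1[i1, 9] <- NA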
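One possible follow-up: because of the "N.A." strings, some columns are read as character rather than numeric. Below is a minimal sketch, assuming you want "N.A." treated as a missing value, using base R's type.convert on a toy data frame. The names `toy` and `clean` are made up for illustration; the values come from the China and Tokelau rows.

```r
# Toy stand-in for a few columns of the data frame after the shift;
# everything is stored as character, as it would be when "N.A." is present
toy <- data.frame(
  Country   = c("China", "Tokelau"),
  Fert.Rate = c("1.7", "N.A."),
  Med.Age   = c("38", NA),
  stringsAsFactors = FALSE
)

# Treat "N.A." as missing and let R choose a type for each column;
# as.is = TRUE keeps text columns as character instead of factor
clean <- type.convert(toy, na.strings = c("NA", "N.A."), as.is = TRUE)
str(clean)
```

Alternatively, passing na.strings = "N.A." to read.table itself should have a similar effect at read time.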

data

v1 <- c("China 1439323776 0.39 5540090 153 9388211 -348399 1.7 38 61 18.47", 
"India 1380004385 0.99 13586631 464 2973190 -532687 2.2 28 35 17.70", 
"United States 331002651 0.59 1937734 36 9147420 954806 1.8 38 83 4.25", 
"Indonesia 273523615 1.07 2898047 151 1811570 -98955 2.3 30 56 3.51", 
"Pakistan 220892340 2.00 4327022 287 770880 -233379 3.6 23 35 2.83", 
"Brazil 212559417 0.72 1509890 25 8358140 21200 1.7 33 88 2.73", 
"Nigeria 206139589 2.58 5175990 226 910770 -60000 5.4 18 52 2.64", 
"Bangladesh 164689383 1.01 1643222 1265 130170 -369501 2.1 28 39 2.11", 
"Russia 145934462 0.04 62206 9 16376870 182456 1.8 40 74 1.87 ", 
"Tokelau 1357 1.27  17 136 10   N.A. N.A. 0  0.00", "Holy See 801 0.25  2 2003 0   N.A. N.A. N.A. 0.00", 
"C&ocirc;te d'Ivoire 26378274 2.57 661730 83 318000 -8000 4.7 19 51 0.34", 
"Czech Republic (Czechia) 10708981 0.18  19772 139 77240 22011 1.6 43 74  0.14", 
"United Arab Emirates 9890402 1.23  119873 118 83600 40000 1.4 33 86  0.13", 
"Papua New Guinea 8947024 1.95  170915 20 452860 -800 3.6 22 13  0.11", 
"Bosnia and Herzegovina 3280819 -0.61  -20181 64 51000 -21585 1.3 43 52  0.04", 
"Saint Pierre &amp; Miquelon 5794 -0.48  -28 25 230   N.A. N.A. 100  0.00"
)


v2 <- c("Country(ordependency)", "Population(2020)", "YearlyChange", 
"NetChange", "Density(P/Km²)", "LandArea(Km²)", "Migrants(net)", 
"Fert.Rate", "Med.Age", "UrbanPop%", "WorldShare")

CodePudding user response:

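I am not sure what you mean, but you could try the following, assuming vector1 is a list that holds one vector of fields per row (do.call needs a list as its second argument):

 df <- do.call(rbind.data.frame, vector1)
 colnames(df) <- vector2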
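
For this kind of row binding to work, each string first has to be split into its fields. Here is a rough sketch of one way to do that, under the assumption that the country name is everything before the first digit. It swaps in plain rbind() followed by as.data.frame() instead of rbind.data.frame, and the names `rows`, `cols`, `name`, `rest`, and `parts` are illustrative. Rows with only 10 fields would need padding before this step.

```r
# Two sample rows and the column names from the question
rows <- c("China 1439323776 0.39 5540090 153 9388211 -348399 1.7 38 61 18.47",
          "United States 331002651 0.59 1937734 36 9147420 954806 1.8 38 83 4.25")
cols <- c("Country(ordependency)", "Population(2020)", "YearlyChange",
          "NetChange", "Density(P/Km²)", "LandArea(Km²)", "Migrants(net)",
          "Fert.Rate", "Med.Age", "UrbanPop%", "WorldShare")

# Country name: the leading run of non-digit characters, trimmed
name <- trimws(sub("^([^0-9]+).*$", "\\1", rows))
# Remaining fields: drop the name, then split on runs of whitespace
rest <- strsplit(trimws(sub("^[^0-9]+", "", rows)), "\\s+")

# One character vector of 11 fields per row, bound into a matrix
parts <- unname(Map(c, name, rest))
df <- as.data.frame(do.call(rbind, parts), stringsAsFactors = FALSE)
colnames(df) <- cols
print(df[, 1:3])
```

All columns come out as character here, so a conversion step is still needed before doing arithmetic on them.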