I have two vectors here. One is all the data about population for various countries:
## [1] "China 1439323776 0.39 5540090 153 9388211 -348399 1.7 38 61 18.47"
## [2] "India 1380004385 0.99 13586631 464 2973190 -532687 2.2 28 35 17.70"
## [3] "United States 331002651 0.59 1937734 36 9147420 954806 1.8 38 83 4.25"
## [4] "Indonesia 273523615 1.07 2898047 151 1811570 -98955 2.3 30 56 3.51"
## [5] "Pakistan 220892340 2.00 4327022 287 770880 -233379 3.6 23 35 2.83"
## [6] "Brazil 212559417 0.72 1509890 25 8358140 21200 1.7 33 88 2.73"
## [7] "Nigeria 206139589 2.58 5175990 226 910770 -60000 5.4 18 52 2.64"
## [8] "Bangladesh 164689383 1.01 1643222 1265 130170 -369501 2.1 28 39 2.11"
## [9] "Russia 145934462 0.04 62206 9 16376870 182456 1.8 40 74 1.87 "
## [10] "Tokelau 1357 1.27 17 136 10 N.A. N.A. 0 0.00"
## [11] "Holy See 801 0.25 2 2003 0 N.A. N.A. N.A. 0.00"
The other vector is all the column names in the exact order corresponding to the country name and those numbers above:
## [1] "Country(ordependency)" "Population(2020)" "YearlyChange"
## [4] "NetChange" "Density(P/Km²)" "LandArea(Km²)"
## [7] "Migrants(net)" "Fert.Rate" "Med.Age"
## [10] "UrbanPop%" "WorldShare"
How do I build a data frame whose columns carry these names, with each value in its matching column, like this:
head(population)
Country (or dependency) Population (2020) Yearly Change Net Change Density (P/Km²) ......
1 China 1439323776 0.39 5540090 ... ....
2 India 1380004385 0.99 13586631 .......
3 United States 331002651 0.59 1937734 .......
4 Indonesia 273523615 1.07 2898047 .......
5 Pakistan 220892340 2.00 4327022 .......
Note: For the last two countries, Tokelau and Holy See, there is no "Migrants(net)" data.
TIA!
EDIT:
Some more samples are here:
## [53] "Côte d'Ivoire 26378274 2.57 661730 83 318000 -8000 4.7 19 51 0.34"
## [86] "Czech Republic (Czechia) 10708981 0.18 19772 139 77240 22011 1.6 43 74 0.14"
## [93] "United Arab Emirates 9890402 1.23 119873 118 83600 40000 1.4 33 86 0.13"
## [98] "Papua New Guinea 8947024 1.95 170915 20 452860 -800 3.6 22 13 0.11"
## [135] "Bosnia and Herzegovina 3280819 -0.61 -20181 64 51000 -21585 1.3 43 52 0.04"
## [230] "Saint Pierre & Miquelon 5794 -0.48 -28 25 230 N.A. N.A. 100 0.00"
UPDATES: Here is the problem:
tail(population)
## Country(ordependency) Population(2020) YearlyChange NetChange
## 230 Saint Pierre & Miquelon 5794 -0.48 -28
## 231 Montserrat 4992 0.06 3
## 232 Falkland Islands 3480 3.05 103
## 233 Niue 1626 0.68 11
## 234 Tokelau 1357 1.27 17
## 235 Holy See 801 0.25 2
## Density(P/Km²) LandArea(Km²) Migrants(net) Fert.Rate **Med.Age** **UrbanPop%**
## 230 25 230 N.A. N.A. 100 0.00
## 231 50 100 N.A. N.A. 10 0.00
## 232 0 12170 N.A. N.A. 66 0.00
## 233 6 260 N.A. N.A. 46 0.00
## 234 136 10 N.A. N.A. 0 0.00
## 235 2003 0 N.A. N.A. N.A. 0.00
## **WorldShare**
## 230 NA
## 231 NA
## 232 NA
## 233 NA
## 234 NA
## 235 NA
All the rows with 10 variables instead of 11 are here:
## [202] "Isle of Man 85033 0.53 449 149 570 N.A. N.A. 53 0.00"
## [203] "Andorra 77265 0.16 123 164 470 N.A. N.A. 88 0.00"
## [204] "Dominica 71986 0.25 178 96 750 N.A. N.A. 74 0.00"
## [205] "Cayman Islands 65722 1.19 774 274 240 N.A. N.A. 97 0.00"
## [206] "Bermuda 62278 -0.36 -228 1246 50 N.A. N.A. 97 0.00"
## [207] "Marshall Islands 59190 0.68 399 329 180 N.A. N.A. 70 0.00"
## [208] "Northern Mariana Islands 57559 0.60 343 125 460 N.A. N.A. 88 0.00"
## [209] "Greenland 56770 0.17 98 0 410450 N.A. N.A. 87 0.00"
## [210] "American Samoa 55191 -0.22 -121 276 200 N.A. N.A. 88 0.00"
## [211] "Saint Kitts & Nevis 53199 0.71 376 205 260 N.A. N.A. 33 0.00"
## [212] "Faeroe Islands 48863 0.38 185 35 1396 N.A. N.A. 43 0.00"
## [213] "Sint Maarten 42876 1.15 488 1261 34 N.A. N.A. 96 0.00"
## [214] "Monaco 39242 0.71 278 26337 1 N.A. N.A. N.A. 0.00"
## [215] "Turks and Caicos 38717 1.38 526 41 950 N.A. N.A. 89 0.00"
## [216] "Saint Martin 38666 1.75 664 730 53 N.A. N.A. 0 0.00"
## [217] "Liechtenstein 38128 0.29 109 238 160 N.A. N.A. 15 0.00"
## [218] "San Marino 33931 0.21 71 566 60 N.A. N.A. 97 0.00"
## [219] "Gibraltar 33691 -0.03 -10 3369 10 N.A. N.A. N.A. 0.00"
## [220] "British Virgin Islands 30231 0.67 201 202 150 N.A. N.A. 52 0.00"
## [221] "Caribbean Netherlands 26223 0.94 244 80 328 N.A. N.A. 75 0.00"
## [222] "Palau 18094 0.48 86 39 460 N.A. N.A. N.A. 0.00"
## [223] "Cook Islands 17564 0.09 16 73 240 N.A. N.A. 75 0.00"
## [224] "Anguilla 15003 0.90 134 167 90 N.A. N.A. N.A. 0.00"
## [225] "Tuvalu 11792 1.25 146 393 30 N.A. N.A. 62 0.00"
## [226] "Wallis & Futuna 11239 -1.69 -193 80 140 N.A. N.A. 0 0.00"
## [227] "Nauru 10824 0.63 68 541 20 N.A. N.A. N.A. 0.00"
## [228] "Saint Barthelemy 9877 0.30 30 470 21 N.A. N.A. 0 0.00"
## [229] "Saint Helena 6077 0.30 18 16 390 N.A. N.A. 27 0.00"
## [230] "Saint Pierre & Miquelon 5794 -0.48 -28 25 230 N.A. N.A. 100 0.00"
## [231] "Montserrat 4992 0.06 3 50 100 N.A. N.A. 10 0.00"
## [232] "Falkland Islands 3480 3.05 103 0 12170 N.A. N.A. 66 0.00"
## [233] "Niue 1626 0.68 11 6 260 N.A. N.A. 46 0.00"
## [234] "Tokelau 1357 1.27 17 136 10 N.A. N.A. 0 0.00"
## [235] "Holy See 801 0.25 2 2003 0 N.A. N.A. N.A. 0.00"
CodePudding user response:
This is easiest to read with read.table, using whitespace as the delimiter. The catch is that the 'Country' value may contain several words and still has to be read as a single column. To handle that, wrap the country name in double quotes with sub, then call read.table with col.names set to 'v2'. With fill = TRUE, rows that have fewer values are padded with NA on the right.
# quote the leading non-digit run (the country name) so it parses as one field
df1 <- read.table(text = sub("^([^0-9]+)\\s", '"\\1" ', v1),
    header = FALSE, col.names = v2, fill = TRUE, check.names = FALSE)
-output
df1
Country(ordependency) Population(2020) YearlyChange NetChange Density(P/Km²) LandArea(Km²) Migrants(net) Fert.Rate Med.Age UrbanPop%
1 China 1439323776 0.39 5540090 153 9388211 -348399 1.7 38 61
2 India 1380004385 0.99 13586631 464 2973190 -532687 2.2 28 35
3 United States 331002651 0.59 1937734 36 9147420 954806 1.8 38 83
4 Indonesia 273523615 1.07 2898047 151 1811570 -98955 2.3 30 56
5 Pakistan 220892340 2.00 4327022 287 770880 -233379 3.6 23 35
6 Brazil 212559417 0.72 1509890 25 8358140 21200 1.7 33 88
7 Nigeria 206139589 2.58 5175990 226 910770 -60000 5.4 18 52
8 Bangladesh 164689383 1.01 1643222 1265 130170 -369501 2.1 28 39
9 Russia 145934462 0.04 62206 9 16376870 182456 1.8 40 74
10 Tokelau 1357 1.27 17 136 10 N.A. N.A. 0 0
11 Holy See 801 0.25 2 2003 0 N.A. N.A. N.A. 0
12 Côte d'Ivoire 26378274 2.57 661730 83 318000 -8000 4.7 19 51
13 Czech Republic (Czechia) 10708981 0.18 19772 139 77240 22011 1.6 43 74
14 United Arab Emirates 9890402 1.23 119873 118 83600 40000 1.4 33 86
15 Papua New Guinea 8947024 1.95 170915 20 452860 -800 3.6 22 13
16 Bosnia and Herzegovina 3280819 -0.61 -20181 64 51000 -21585 1.3 43 52
17 Saint Pierre & Miquelon 5794 -0.48 -28 25 230 N.A. N.A. 100 0
WorldShare
1 18.47
2 17.70
3 4.25
4 3.51
5 2.83
6 2.73
7 2.64
8 2.11
9 1.87
10 NA
11 NA
12 0.34
13 0.14
14 0.13
15 0.11
16 0.04
17 NA
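To make the quoting step easier to follow, here is a minimal sketch of what the sub call produces before read.table sees it. The vector name `v1_demo` and its two shortened rows are made up for illustration; only the regular expression comes from the answer.

```r
# Two shortened sample rows (hypothetical, trimmed from the question's data)
v1_demo <- c("United States 331002651 0.59", "Holy See 801 0.25")

# Capture everything up to the whitespace before the first digit
# and wrap it in double quotes so it counts as a single field
quoted <- sub("^([^0-9]+)\\s", '"\\1" ', v1_demo)

# cat() prints the strings without R's own escaping of the quotes
cat(quoted, sep = "\n")
```

Because `[^0-9]+` is greedy, it first grabs the trailing space as well, then backtracks one character so that `\\s` can match the space in front of the population figure.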
For rows that have fewer values (10 instead of 11), we can move the trailing values one column to the right using row/column indexing
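library(stringr)
# count the separators after the country name; add 1 to get the number of
# numeric fields and 1 more for the country itself
cnt <- str_count(sub("^([^0-9]+)\\s", '', v1), "\\s+") + 2
i1 <- cnt == 10  # rows that are one value short
# shift the last two values one column right, then blank the vacated column
df1[i1, 10:11] <- df1[i1, 9:10]
df1[i1, 9] <- NA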
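As a possible variation (not part of the answer above), the same shift can be sketched in base R only, with na.strings = "N.A." so that the placeholder text becomes a real NA and the affected columns stay numeric. This assumes "N.A." always marks a missing value in this data. The three-row `v1` below is a shortened sample, and the helper names `rest`, `n_fields` and `short` are introduced just for this illustration.

```r
# Shortened sample: one full row and two rows that are one value short
v1 <- c("China 1439323776 0.39 5540090 153 9388211 -348399 1.7 38 61 18.47",
        "Tokelau 1357 1.27 17 136 10 N.A. N.A. 0 0.00",
        "Holy See 801 0.25 2 2003 0 N.A. N.A. N.A. 0.00")
v2 <- c("Country(ordependency)", "Population(2020)", "YearlyChange",
        "NetChange", "Density(P/Km²)", "LandArea(Km²)", "Migrants(net)",
        "Fert.Rate", "Med.Age", "UrbanPop%", "WorldShare")

# Same quoting trick as above; "N.A." is read as NA (assumption)
df1 <- read.table(text = sub("^([^0-9]+)\\s", '"\\1" ', v1),
                  header = FALSE, col.names = v2, fill = TRUE,
                  check.names = FALSE, na.strings = "N.A.")

# Count fields per row without stringr: numeric fields plus the country
rest <- trimws(sub("^([^0-9]+)\\s", "", v1))
n_fields <- lengths(strsplit(rest, "\\s+")) + 1
short <- n_fields == 10

# Shift the last two values one column to the right for the short rows
df1[short, 10:11] <- df1[short, 9:10]
df1[short, 9] <- NA

str(df1)
```

The trimws call guards against trailing spaces such as the one in the "Russia" row, which would otherwise inflate the field count.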
data
v1 <- c("China 1439323776 0.39 5540090 153 9388211 -348399 1.7 38 61 18.47",
"India 1380004385 0.99 13586631 464 2973190 -532687 2.2 28 35 17.70",
"United States 331002651 0.59 1937734 36 9147420 954806 1.8 38 83 4.25",
"Indonesia 273523615 1.07 2898047 151 1811570 -98955 2.3 30 56 3.51",
"Pakistan 220892340 2.00 4327022 287 770880 -233379 3.6 23 35 2.83",
"Brazil 212559417 0.72 1509890 25 8358140 21200 1.7 33 88 2.73",
"Nigeria 206139589 2.58 5175990 226 910770 -60000 5.4 18 52 2.64",
"Bangladesh 164689383 1.01 1643222 1265 130170 -369501 2.1 28 39 2.11",
"Russia 145934462 0.04 62206 9 16376870 182456 1.8 40 74 1.87 ",
"Tokelau 1357 1.27 17 136 10 N.A. N.A. 0 0.00", "Holy See 801 0.25 2 2003 0 N.A. N.A. N.A. 0.00",
"Côte d'Ivoire 26378274 2.57 661730 83 318000 -8000 4.7 19 51 0.34",
"Czech Republic (Czechia) 10708981 0.18 19772 139 77240 22011 1.6 43 74 0.14",
"United Arab Emirates 9890402 1.23 119873 118 83600 40000 1.4 33 86 0.13",
"Papua New Guinea 8947024 1.95 170915 20 452860 -800 3.6 22 13 0.11",
"Bosnia and Herzegovina 3280819 -0.61 -20181 64 51000 -21585 1.3 43 52 0.04",
"Saint Pierre & Miquelon 5794 -0.48 -28 25 230 N.A. N.A. 100 0.00"
)
v2 <- c("Country(ordependency)", "Population(2020)", "YearlyChange",
"NetChange", "Density(P/Km²)", "LandArea(Km²)", "Migrants(net)",
"Fert.Rate", "Med.Age", "UrbanPop%", "WorldShare")
CodePudding user response:
I am not sure what you mean but you could try:
df <- do.call(rbind.data.frame, vector1)
colnames(df) <- vector2