How to manipulate datasets whose names are listed in a vector?

Time:05-28

I'd like to do several manipulations with the datasets that are built into R from the packages I have. First, I made a vector of the datasets' names, but when I tried to pick out the datasets that have only one column, I got an error saying that the argument is of length 0. Here is the code:

for (i in datasets){
  if (ncol(i)==1){
    dataset <- i
    datasets <- c(dataset, datasets)
  }
}

R treats the entries as character strings rather than as the datasets themselves. Here is the head of the vector: [1] ability.cov airmiles AirPassengers airquality anscombe attenu. It may be a silly question, but how can I treat the entries as data frames?

CodePudding user response:

I don't fully understand your logic, but based on your code, you want to identify which datasets have exactly one column by testing ncol(x) == 1. If so, there are three issues to deal with:

  1. The datasets have different structures. ncol returns the number of columns for a data.frame or a matrix, but NULL for objects without dimensions, such as a univariate time series. For example, ncol(anscombe) returns 8 but ncol(AirPassengers) returns NULL. If you want to use ncol, coerce each dataset to a data.frame first with as.data.frame.
  2. The vector holds the names of the datasets as character strings. You need to pass the dataset itself, not its name, to as.data.frame. One way of doing this is eval(parse(text = the_name)).
  3. The way the result is stored. You can combine the results with c(), but c() flattens them, so the datasets lose their initial structure. I recommend a list, which preserves the data frame structure of each dataset.
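The first two points can be checked directly at the console. The following is a minimal sketch, assuming the default datasets package is attached; get() is used here only as a shorter stand-in for eval(parse(text = ...)).

```r
# Built-in datasets come in several structures
class(anscombe)        # "data.frame"
class(AirPassengers)   # "ts"

ncol(anscombe)                        # 8
is.null(ncol(AirPassengers))          # TRUE: a univariate time series has no dim attribute
is.null(ncol("anscombe"))             # TRUE: a character string has no columns either
ncol(as.data.frame(AirPassengers))    # 1

# Comparing NULL with 1 gives a zero-length logical, which if() rejects
# with "argument is of length zero"
cond <- ncol(AirPassengers) == 1
length(cond)                          # 0

# A name is not the dataset: look the object up by its name
x <- get("anscombe")
identical(x, anscombe)                # TRUE
```

This is why ncol(i) in the original loop fails: i is a character string, so ncol(i) is NULL and the if() condition has length zero.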

Here is one possible solution based on those considerations:

datasets <- c("ability.cov", "airmiles", "AirPassengers", "airquality", "anscombe", "attenu")

# One slot per name; slots for datasets that do not match stay NULL
single_col_datasets <- vector('list', length(datasets))
for (i in seq_along(datasets)){
    # Turn the name into the object, then coerce so that ncol() is always defined
    df <- as.data.frame(eval(parse(text = datasets[i])))
    if (ncol(df) == 1){
        names(df) <- datasets[i]
        single_col_datasets[[i]] <- df
    }
}
# Drop the empty slots, then pair the matches with the original vector of names
not.null.element <- single_col_datasets[lengths(single_col_datasets) != 0]
new.datasets <- list(not.null.element, datasets)

Here is the result:

new.datasets
[[1]]
[[1]][[1]]
   airmiles
1       412
2       480
3       683
4      1052
5      1385
6      1418
7      1634
8      2178
9      3362
10     5948
11     6109
12     5981
13     6753
14     8003
15    10566
16    12528
17    14760
18    16769
19    19819
20    22362
21    25340
22    25343
23    29269
24    30514

[[1]][[2]]
    AirPassengers
1             112
2             118
3             132
4             129
5             121
6             135
7             148
8             148
9             136
10            119
11            104
12            118
13            115
14            126
15            141
16            135
17            125
18            149
19            170
20            170
21            158
22            133
23            114
24            140
25            145
26            150
27            178
28            163
29            172
30            178
31            199
32            199
33            184
34            162
35            146
36            166
37            171
38            180
39            193
40            181
41            183
42            218
43            230
44            242
45            209
46            191
47            172
48            194
49            196
50            196
51            236
52            235
53            229
54            243
55            264
56            272
57            237
58            211
59            180
60            201
61            204
62            188
63            235
64            227
65            234
66            264
67            302
68            293
69            259
70            229
71            203
72            229
73            242
74            233
75            267
76            269
77            270
78            315
79            364
80            347
81            312
82            274
83            237
84            278
85            284
86            277
87            317
88            313
89            318
90            374
91            413
92            405
93            355
94            306
95            271
96            306
97            315
98            301
99            356
100           348
101           355
102           422
103           465
104           467
105           404
106           347
107           305
108           336
109           340
110           318
111           362
112           348
113           363
114           435
115           491
116           505
117           404
118           359
119           310
120           337
121           360
122           342
123           406
124           396
125           420
126           472
127           548
128           559
129           463
130           407
131           362
132           405
133           417
134           391
135           419
136           461
137           472
138           535
139           622
140           606
141           508
142           461
143           390
144           432


[[2]]
[1] "ability.cov"   "airmiles"      "AirPassengers" "airquality"    "anscombe"      "attenu"  
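The same filtering can also be written without an explicit loop. This is only a sketch of an alternative, not the approach used above: it assumes the datasets package is attached, and the names objs, dfs and single_col are illustrative.

```r
datasets <- c("ability.cov", "airmiles", "AirPassengers",
              "airquality", "anscombe", "attenu")

# mget() returns a named list of objects; inherits = TRUE lets the lookup
# continue into attached packages such as datasets
objs <- mget(datasets, inherits = TRUE)

# Coerce every object to a data frame so that ncol() is defined for each one
dfs <- lapply(objs, as.data.frame)

# Keep only the single-column data frames; the list names are preserved
single_col <- Filter(function(d) ncol(d) == 1, dfs)
names(single_col)
```

For the six names above this should keep airmiles and AirPassengers, which agrees with the result shown. Each element stays a data frame and can be accessed by name, for example single_col[["airmiles"]].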

CodePudding user response:

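You can use the get function, which returns the object that a name refers to. Because ncol returns NULL for objects without dimensions, coerce the object to a data frame before testing:

for (i in datasets){
  # get(i) fetches the dataset named by i; as.data.frame makes ncol() well defined
  if (ncol(as.data.frame(get(i)))==1){
    dataset <- i
    datasets <- c(dataset, datasets)
  }
}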
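If only the names of the single-column datasets are needed, a vectorised sketch along the same lines might look like this. It assumes the datasets package is attached and coerces with as.data.frame so that ncol() never returns NULL; is_single and single_col_names are illustrative names.

```r
datasets <- c("ability.cov", "airmiles", "AirPassengers",
              "airquality", "anscombe", "attenu")

# TRUE where the named object has exactly one column after coercion
is_single <- vapply(datasets,
                    function(nm) ncol(as.data.frame(get(nm))) == 1,
                    logical(1))

# Names of the datasets that passed the test
single_col_names <- datasets[is_single]
single_col_names
```

Keeping the matches in a separate vector avoids prepending them to datasets, so the original vector of names is left unchanged.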