I am very new to R so please excuse any stupid questions.
I have a data frame that consists of multiple columns; let's say, for argument's sake, A through E (so A, B, C, D, E). I want to transpose these columns to create one new column called Variable, which will hold the values A, B, C, D, E, all populating down in a new data frame:
Variable
A
B
C
D
E
Is this possible? Nothing I have tried works, such as melt or t(x).
So I have:
I need:
Also, the variables will differ greatly from one file to another and will not always have the same names, so I need something like allvars rather than naming each of them.
Thanks
Colm
CodePudding user response:
You can use stack to stack the vectors (columns) of a data frame into a single column:
stack(x)
# values ind
#1 1 A
#2 2 A
#3 3 A
#4 4 B
#5 5 B
#6 6 B
#7 7 C
#8 8 C
#9 9 C
Or do you want something like this?
data.frame(variable = names(x))
# variable
#1 A
#2 B
#3 C
Data:
x <- data.frame(A = 1:3, B=4:6, C=7:9)
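Since the column names change from file to file, it may help to see that stack never needs them typed out. The following is a minimal sketch, assuming the same x as above; the output column names Variable and Value are illustrative choices, not something stack produces itself.

```r
# Example data; the column names are arbitrary and are never typed out below.
x <- data.frame(A = 1:3, B = 4:6, C = 7:9)

# stack() returns two columns: `values` (the data) and `ind` (the source
# column name, stored as a factor).
long <- stack(x)

# Rebuild the result with a `Variable` column first.
# `Variable` and `Value` are assumed names chosen for this example.
result <- data.frame(Variable = as.character(long$ind),
                     Value = long$values)
result
```

The same three lines should work unchanged on a data frame with different column names, because only x itself is referenced.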
CodePudding user response:
Here is an update using base R. I think I now understand what you want.
- Create a data frame df.
- Then use the colnames() function to store its column names in a new data frame with a column called Variable:
Update:
df <- data.frame(VARIABLE1 = character(),
                 VARIABLE2 = character(),
                 VARIABLE3 = character(),
                 VARIABLE4 = character(),
                 VARIABLE5 = character(),
                 stringsAsFactors = FALSE)
df_result <- data.frame(Variable = c(colnames(df)))
df_result
Variable
1 VARIABLE1
2 VARIABLE2
3 VARIABLE3
4 VARIABLE4
5 VARIABLE5
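If this has to be repeated for many files with different columns, the same idea can be wrapped in a small function. This is only a sketch: names_as_variable is a hypothetical helper name invented here, and it assumes the goal is a one-column data frame of column names.

```r
# Hypothetical helper: return the column names of any data frame
# as a one-column data frame called `Variable`.
names_as_variable <- function(data) {
  data.frame(Variable = names(data), stringsAsFactors = FALSE)
}

# Works whatever the columns are called.
names_as_variable(data.frame(A = 1, B = 2, C = 3, D = 4, E = 5))

# Also works on a built-in data set with unrelated column names.
names_as_variable(iris)
```

Because the function only calls names() on its argument, no column name is ever hard-coded.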
First answer:
library(tidyverse)
df <- tribble(
  ~A, ~B, ~C, ~D, ~E,
  NA_character_, NA_character_, NA_character_, NA_character_, NA_character_)
df %>%
  pivot_longer(
    everything()
  ) %>%
  select(1)
name
<chr>
1 A
2 B
3 C
4 D
5 E
CodePudding user response:
If I understand you correctly, you can use the following code. First, create fake data:
df <- data.frame(A = runif(10, 0, 1),
B = runif(10, 0, 1),
C = runif(10, 0, 1),
D = runif(10, 0, 1),
E = runif(10, 0, 1))
df
Output:
A B C D E
1 0.3711955 0.60529893 0.4139218 0.38474364 0.33032247
2 0.5689305 0.08455396 0.4547466 0.50095761 0.44413510
3 0.3220088 0.42268926 0.3171933 0.30313125 0.09273389
4 0.3734473 0.90371386 0.8634857 0.66372492 0.42984325
5 0.5363005 0.64373088 0.2279042 0.11793754 0.12238873
6 0.2101622 0.64878858 0.0329526 0.01596763 0.26302076
7 0.9438709 0.84500107 0.5446587 0.27072456 0.57295920
8 0.4185947 0.33992763 0.4755329 0.36225772 0.47469022
9 0.7987066 0.84397516 0.3679846 0.87131562 0.95563427
10 0.7866157 0.46601434 0.5662175 0.66513184 0.29728068
Next, use melt from the reshape package like this:
library(reshape)
melt(df)
Output:
Using as id variables
variable value
1 A 0.37119552
2 A 0.56893045
3 A 0.32200884
4 A 0.37344734
5 A 0.53630053
6 A 0.21016220
7 A 0.94387092
8 A 0.41859472
9 A 0.79870656
10 A 0.78661568
11 B 0.60529893
12 B 0.08455396
13 B 0.42268926
14 B 0.90371386
15 B 0.64373088
16 B 0.64878858
17 B 0.84500107
18 B 0.33992763
19 B 0.84397516
20 B 0.46601434
21 C 0.41392181
22 C 0.45474660
23 C 0.31719334
24 C 0.86348573
25 C 0.22790422
26 C 0.03295260
27 C 0.54465868
28 C 0.47553287
29 C 0.36798461
30 C 0.56621750
31 D 0.38474364
32 D 0.50095761
33 D 0.30313125
34 D 0.66372492
35 D 0.11793754
36 D 0.01596763
37 D 0.27072456
38 D 0.36225772
39 D 0.87131562
40 D 0.66513184
41 E 0.33032247
42 E 0.44413510
43 E 0.09273389
44 E 0.42984325
45 E 0.12238873
46 E 0.26302076
47 E 0.57295920
48 E 0.47469022
49 E 0.95563427
50 E 0.29728068
As you can see, it returns a column named variable with the column names (A, B, etc.) and a column called value that holds the data.
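If the output columns need specific names, the newer reshape2 package (a reboot of reshape) lets melt set them directly. The sketch below assumes reshape2 is installed; the small data frame and the names Variable and Value are illustrative.

```r
library(reshape2)

# Small example data; no column name is passed to melt() below.
df <- data.frame(A = c(0.1, 0.2), B = c(0.3, 0.4), C = c(0.5, 0.6))

# With no id variables given, all columns are treated as measured variables.
# variable.name and value.name rename the two output columns.
long <- melt(df, variable.name = "Variable", value.name = "Value")
long
```

As in the reshape example, melt prints a message about the id variables it chose; here there are none, so every column ends up in Variable.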
CodePudding user response:
Let's say you have df:
set.seed(123)
df <- data.frame(VARIABLE1 = runif(1),
VARIABLE2 = runif(1),
VARIABLE3 = runif(1),
VARIABLE4 = runif(1),
VARIABLE5 = runif(1))
df
VARIABLE1 VARIABLE2 VARIABLE3 VARIABLE4 VARIABLE5
1 0.2875775 0.7883051 0.4089769 0.8830174 0.9404673
Then you can use tidyr:
library(tidyr)
pivot_longer(df, everything())
# A tibble: 5 × 2
name value
<chr> <dbl>
1 VARIABLE1 0.288
2 VARIABLE2 0.788
3 VARIABLE3 0.409
4 VARIABLE4 0.883
5 VARIABLE5 0.940
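To get a column literally called Variable, as the question asks, pivot_longer accepts names_to and values_to. This is a minimal sketch with made-up values; the name Value for the second column is an assumption.

```r
library(tidyr)

# Made-up example data; everything() selects all columns, so none are named.
df <- data.frame(VARIABLE1 = 0.1, VARIABLE2 = 0.2, VARIABLE3 = 0.3)

# names_to sets the name of the column holding the old column names,
# values_to sets the name of the column holding the data.
long <- pivot_longer(df, everything(),
                     names_to = "Variable", values_to = "Value")
long
```

One caveat: if the columns have different types (for example numeric and character), pivot_longer may refuse to combine them into one value column; converting all columns to character beforehand is one possible workaround.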