I am working with R. I have 120 documents that look like this:
chair table hill block chain ball money house
2 4 5 6 7 2 4 5
1 3 6 1 8 3 9 1
2 1 1 6 1 8 2 3
6 4 5 4 2 5 8 4
5 5 5 5 3 2 6 7
I created a function:
myFunc <- function(x) c(mean = mean(x), n = length(x),
SD = sd(x))
Then, I used the sapply function and converted the results to a data frame:
lol3 <- sapply(mydat, myFunc)
lol4 <- as.data.frame(lol3)
How can I pass all 120 of my documents through this and obtain the same output for each one?
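A minimal sketch that reproduces the single-document step on the sample table shown in the question. It assumes each document is whitespace-separated text with a header row; the inline `text =` input stands in for a real file:

```r
# Rebuild the sample document from the question (header row + whitespace-separated values)
mydat <- read.table(header = TRUE, text = "
chair table hill block chain ball money house
2 4 5 6 7 2 4 5
1 3 6 1 8 3 9 1
2 1 1 6 1 8 2 3
6 4 5 4 2 5 8 4
5 5 5 5 3 2 6 7
")

# Summary statistics for one numeric vector (sd() uses the n - 1 denominator)
myFunc <- function(x) c(mean = mean(x), n = length(x), SD = sd(x))

# sapply() applies myFunc to every column and simplifies to a 3 x 8 matrix
lol3 <- sapply(mydat, myFunc)
lol4 <- as.data.frame(lol3)  # rows: mean, n, SD; one column per variable
print(lol4)
```

Everything below is about repeating exactly this step for every document.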
Answer:
You can read your data into R (assuming .csv files) with something like this, which puts the 120 data frames into a list, dfs:
# pattern is a regular expression, not a wildcard: match a literal ".csv" at the end
temp <- list.files(pattern = "\\.csv$",
                   full.names = TRUE)
dfs <- lapply(temp, read.csv)
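To keep track of which summary belongs to which file, one option (a sketch, not part of the original answer) is to name the list by file name. The demo folder and the files doc1.csv and doc2.csv are hypothetical and are created only so the example runs on its own:

```r
myFunc <- function(x) c(mean = mean(x), n = length(x), SD = sd(x))

# Hypothetical demo folder with two small CSV files standing in for the 120 documents
dir <- file.path(tempdir(), "docs_demo")
dir.create(dir, showWarnings = FALSE)
write.csv(data.frame(chair = c(2, 1, 2), table = c(4, 3, 1)),
          file.path(dir, "doc1.csv"), row.names = FALSE)
write.csv(data.frame(chair = c(6, 5, 5), table = c(4, 5, 5)),
          file.path(dir, "doc2.csv"), row.names = FALSE)

# pattern is a regular expression: a literal ".csv" at the end of the file name
temp <- list.files(dir, pattern = "\\.csv$", full.names = TRUE)

# Name each data frame after its file so the results stay traceable
dfs <- setNames(lapply(temp, read.csv), basename(temp))

res <- lapply(dfs, function(x) as.data.frame(sapply(x, myFunc)))
res
```

With a named list, the results print as `$doc1.csv`, `$doc2.csv` instead of `[[1]]`, `[[2]]`.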
Then, you can use purrr to apply your function to all columns in each data frame:
library(tidyverse)
purrr::map(dfs, function(x) as.data.frame(map(x, myFunc)))
Or, if you want to stick to the apply family, you can combine sapply and lapply, which produces the same output:
lapply(dfs, FUN = function(x) as.data.frame(sapply(x, myFunc)))
Output
[[1]]
chair table hill
mean 5.707284 2.016096 4.632898
n 5.000000 5.000000 5.000000
SD 3.037238 2.609171 2.891649
[[2]]
chair table hill
mean 4.972276 3.522378 2.779039
n 5.000000 5.000000 5.000000
SD 2.309736 2.402731 1.805293
[[3]]
chair table hill
mean 4.614903 2.994203 3.573117
n 5.000000 5.000000 5.000000
SD 2.341367 2.250936 2.388503
[[4]]
chair table hill
mean 3.962970 5.326495 5.289796
n 5.000000 5.000000 5.000000
SD 3.454892 3.250463 2.261613
Data and function
dfs <-
list(
structure(
list(
chair = c(
8.5190371570643,
7.96348396944813,
0.909618176054209,
6.22358219046146,
4.92070004786365
),
table = c(
6.57108870637603,
1.24985219235532,
0.31808012444526,
1.61383197060786,
0.327624693978578
),
hill = c(
6.09723530360498,
4.0752498563379,
0.514291892526671,
8.34481998276897,
4.13289327942766
)
),
class = "data.frame",
row.names = c(NA,-5L)
),
structure(
list(
chair = c(
6.7650549269747,
3.03855412406847,
4.73554673418403,
7.83120877831243,
2.49101754790172
),
table = c(
0.390065581072122,
4.98121203482151,
2.45721989544109,
6.66204453259706,
3.12134596821852
),
hill = c(
1.91045304620638,
4.58099147421308,
0.0588874609675258,
3.53219888708554,
3.81266354466788
)
),
class = "data.frame",
row.names = c(NA,-5L)
),
structure(
list(
chair = c(
6.3361757278908,
6.55107694189064,
0.718950896523893,
4.66502936370671,
4.80328303738497
),
table = c(
3.64353043888696,
0.393067884491757,
5.97248072689399,
1.12406565248966,
3.83787051541731
),
hill = c(
2.51783188269474,
1.69069789093919,
2.80711475084536,
3.11010800697841,
7.73983212606981
)
),
class = "data.frame",
row.names = c(NA,-5L)
),
structure(
list(
chair = c(
1.01416687178425,
8.35876755136997,
6.14014947647229,
4.20261403801851,
0.0991506285499781
),
table = c(
5.58780367858708,
6.96770576946437,
8.87186487391591,
0.143813505303115,
5.06128515000455
),
hill = c(
5.23900085524656,
8.74273954122327,
5.80283471778966,
2.78146772808395,
3.8829374271445
)
),
class = "data.frame",
row.names = c(NA,-5L)
)
)
myFunc <- function (x)
c(mean = mean(x),
n = length(x),
SD = sd(x))
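If a single table is more convenient than a list of 120 data frames, the per-document summaries can be stacked with base R's `do.call(rbind, ...)`. This is a sketch using a small hypothetical `dfs` list (doc1, doc2) in place of the real data:

```r
myFunc <- function(x) c(mean = mean(x), n = length(x), SD = sd(x))

# Hypothetical stand-in for the list of data frames read from disk
dfs <- list(
  doc1 = data.frame(chair = c(2, 1, 2, 6, 5), table = c(4, 3, 1, 4, 5)),
  doc2 = data.frame(chair = c(1, 3, 6, 1, 8), table = c(2, 2, 2, 2, 2))
)

# Per-document summaries, as in the lapply()/sapply() version above
res <- lapply(dfs, function(x) as.data.frame(sapply(x, myFunc)))

# Tag each summary's rows with the document name and the statistic, then stack them
long <- do.call(rbind, Map(function(tab, doc) {
  data.frame(document = doc, statistic = rownames(tab), tab, row.names = NULL)
}, res, names(res)))
rownames(long) <- NULL
long
```

Each document contributes three rows (mean, n, SD), so 120 documents give a 360-row table that is easy to filter or reshape.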
Answer:
You need to read in the paths of your documents first. I once wrote a Medium article about path-handling functions in R; point 4 there covers how to recursively list files in a folder.
# let's say all your files end in .txt
paths <- list.files("/to/dir_containing_all_the_files",
                    pattern = "\\.txt$",  # regular expression: ".txt" at the end of the name
                    full.names = TRUE,    # prepend the directory (absolute paths here)
                    recursive = TRUE)     # also search subfolders for files
# choose the right file-reading function:
# let's assume that, like your input, they are tab (?) separated
dfs <- lapply(paths, function(path) read.delim(path, header=TRUE, sep="\t"))
# in case of csv files:
dfs <- lapply(paths, function(path) read.csv(path, header=TRUE))
# take a single file first and adjust the read.delim() arguments until it reads correctly;
# test it on a few files until it works!
# dfs should then contain one data frame per file.
# you know how you proceeded for a single data frame -
# just write a function for a single data frame:
process_df <- function(df) sapply(df, myFunc)
# then lapply it over dfs (sapply would collapse equally sized results into one
# big matrix and lose the row/column labels):
process_dfs <- function(dfs) lapply(dfs, process_df)
# then you call:
result <- process_dfs(dfs)
# and see how it is structured:
str(result)
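If the folder mixes .csv and tab-separated .txt files, a small dispatcher can pick the reader per file. `read_any()` is a hypothetical helper, and the demo folder and files are created only so the sketch runs; `tools::file_ext()` returns the extension without the dot:

```r
myFunc <- function(x) c(mean = mean(x), n = length(x), SD = sd(x))

# Hypothetical helper: choose the reader from the file extension
read_any <- function(path) {
  ext <- tolower(tools::file_ext(path))
  switch(ext,
    csv = read.csv(path, header = TRUE),
    txt = ,  # empty branch falls through to the tsv branch
    tsv = read.delim(path, header = TRUE, sep = "\t"),
    stop("Don't know how to read '", basename(path), "'")
  )
}

# Demo folder with one CSV and one tab-separated TXT file
dir <- file.path(tempdir(), "read_any_demo")
dir.create(dir, showWarnings = FALSE)
write.csv(data.frame(chair = 1:3, hill = 4:6),
          file.path(dir, "a.csv"), row.names = FALSE)
write.table(data.frame(chair = 7:9, hill = 1:3), file.path(dir, "b.txt"),
            sep = "\t", row.names = FALSE, quote = FALSE)

paths <- list.files(dir, pattern = "\\.(csv|txt)$", full.names = TRUE)
dfs <- lapply(paths, read_any)

# lapply() keeps one 3 x k matrix (rows mean, n, SD) per file
result <- lapply(dfs, function(df) sapply(df, myFunc))
str(result)
```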
Alternatively, write the whole procedure for a single path and then loop over the paths:
process_path <- function(path) {
df <- read.delim(path, header=TRUE, sep="\t")
sapply(df, myFunc)
}
# finally, you can write (simplify = FALSE keeps one matrix per file, named by its path):
process_paths <- function(paths) sapply(paths, process_path, simplify = FALSE)
# and call:
process_paths(paths)
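With 120 files, one malformed file can stop the whole loop. A possible safeguard (an addition, not part of the original answer) wraps `process_path()` in `tryCatch()` so failing files are skipped with a warning. `safe_process_path()` and the demo files are hypothetical:

```r
myFunc <- function(x) c(mean = mean(x), n = length(x), SD = sd(x))

# Same single-file step as in the answer above
process_path <- function(path) {
  df <- read.delim(path, header = TRUE, sep = "\t")
  sapply(df, myFunc)
}

# Hypothetical wrapper: on error, warn and return NULL instead of stopping the loop
safe_process_path <- function(path) {
  tryCatch(process_path(path), error = function(e) {
    warning("Skipping ", basename(path), ": ", conditionMessage(e), call. = FALSE)
    NULL
  })
}

# Demo folder: one valid tab-separated file and one empty file that read.delim() rejects
dir <- file.path(tempdir(), "safe_demo")
dir.create(dir, showWarnings = FALSE)
write.table(data.frame(chair = c(2, 1, 2), hill = c(5, 6, 1)),
            file.path(dir, "good.txt"), sep = "\t", row.names = FALSE, quote = FALSE)
invisible(file.create(file.path(dir, "empty.txt")))

paths <- list.files(dir, pattern = "\\.txt$", full.names = TRUE)
results <- lapply(setNames(paths, basename(paths)), safe_process_path)
results <- Filter(Negate(is.null), results)  # keep only files that were processed
names(results)
```

The warnings tell you which files to inspect and fix with the single-file test described above.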