How to calculate conditional means from .txt files

I'm fairly new to programming and am looking for some guidance. Any help is appreciated.

Here's what I'm trying to do: I have a large number of .txt files from a cognitive experiment (Flanker task, if curious) that I need to compute means for based on condition. The files have no headers and look like below:

XXXXX 1 1 675
XXYXX 0 1 844
YYYYY 1 1 599
YYXYY 0 1 902

I would like to compute mean reaction times in milliseconds (rightmost column; c4) for each experimental condition (0 or 1; c2). I also need the file name of each .txt file (my participant ID) included in the output.

I'm most familiar with R but really just for data analysis. I also have a little experience with Python and Matlab if those (or something else) better suit my needs. Again, a point in any direction would be greatly appreciated.

Thanks

CodePudding user response:

The tidyverse collection of packages, especially dplyr and readr, can easily do this for you with a grammar similar to SQL.

Something like

# load packages
library(tidyverse)

# import data: columns are separated by whitespace and there is no header row
df <- read_table("file.txt", col_names = c("col1", "col2", "col3", "col4"))

# mean of col4 only for rows where col2 == 1
df %>%
  filter(col2 == 1) %>%
  summarize(mean_exp = mean(col4))

# mean of col4 for each value of col2
df %>%
  group_by(col2) %>%
  summarize(mean_exp = mean(col4))

I suggest looking up the cheatsheets for dplyr and readr. They are easy to follow, and the code in them is easy to reproduce.
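Since the question mentions Python as an option, here is a rough standard-library sketch of the same group-by-condition idea for a single file's worth of lines. The function name mean_rt_by_condition is made up for illustration, and it assumes every valid line has exactly four whitespace-separated fields, as in the sample data.

```python
from collections import defaultdict
from statistics import mean


def mean_rt_by_condition(lines):
    """Average the 4th column (reaction time) for each value of the 2nd column (condition)."""
    rts = defaultdict(list)
    for line in lines:
        fields = line.split()  # split on any run of whitespace
        if len(fields) != 4:
            continue  # assumption: anything that is not 4 fields is blank or malformed
        rts[int(fields[1])].append(float(fields[3]))
    return {condition: mean(values) for condition, values in sorted(rts.items())}


# the four sample lines from the question
sample = [
    "XXXXX 1 1 675",
    "XXYXX 0 1 844",
    "YYYYY 1 1 599",
    "YYXYY 0 1 902",
]
print(mean_rt_by_condition(sample))  # {0: 873.0, 1: 637.0}
```

To use it on a real file, pass the open file handle (or its lines) instead of the sample list.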

CodePudding user response:

Here is how you could do it in R:

# mimic your text files

cat("XXXXX 1 1 675",file="XXXXX.txt",sep="\n")
cat("XXYXX 0 1 844",file="XXYXX.txt",sep="\n")
cat("YYYYY 1 1 599",file="YYYYY.txt",sep="\n")
cat("YYXYY 0 1 902",file="YYXYY.txt",sep="\n")


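# list the .txt files in the working directory (pattern is a regular expression)
my_list_txt <- list.files(pattern = "\\.txt$")

# read each file into a data frame (the files have no header row)
files_df <- lapply(my_list_txt, function(x) read.table(file = x, header = FALSE))

# stack the data frames into one
df <- do.call(rbind, files_df)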

# do the group calculation
library(dplyr)
df %>% 
  group_by(V2) %>% 
  summarise(mean = mean(V4))

     V2  mean
  <int> <dbl>
1     0   873
2     1   637
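
The question also asks for the file name (participant ID) in the output, which the pooled means above do not include. Below is a minimal Python sketch of one way to get a mean per participant and condition, assuming one .txt file per participant in a single folder. The function names, the CSV column names, and the file names p01.txt and p02.txt are hypothetical, chosen only for illustration.

```python
import csv
import tempfile
from collections import defaultdict
from pathlib import Path
from statistics import mean


def summarize_participants(data_dir):
    """Return (participant_id, condition, mean_rt) rows, one per file and condition."""
    rows = []
    for path in sorted(Path(data_dir).glob("*.txt")):
        rts = defaultdict(list)
        for line in path.read_text().splitlines():
            fields = line.split()
            if len(fields) != 4:
                continue  # assumption: skip blank or malformed lines
            rts[int(fields[1])].append(float(fields[3]))
        for condition, values in sorted(rts.items()):
            # path.stem is the file name without ".txt", used as the participant ID
            rows.append((path.stem, condition, mean(values)))
    return rows


def write_summary(rows, out_path):
    """Write the rows to a CSV file with a header line."""
    with open(out_path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["participant", "condition", "mean_rt"])
        writer.writerows(rows)


if __name__ == "__main__":
    # demo on two made-up participant files in a temporary folder
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "p01.txt").write_text(
            "XXXXX 1 1 675\nXXYXX 0 1 844\nYYYYY 1 1 599\nYYXYY 0 1 902\n"
        )
        Path(tmp, "p02.txt").write_text("XXXXX 1 1 500\nXXYXX 0 1 700\n")
        for row in summarize_participants(tmp):
            print(*row)
```

The resulting CSV has one row per participant and condition, so it can be read back into R with read.csv() for further analysis.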