I often work like this:
for skra in `ls *txt` ; do paste foo.csv <(cut -f 5 $skra) > foo.csv; done
to loop through the files in a directory using ls.
Now I don't understand why this command does not add a column to foo.csv on every pass of the loop.
What is happening under the hood? It seems that foo.csv is not saved on every iteration.
The output I get is only field 5 from the last file, not even the original foo.csv, as I would get if I only ran paste foo.csv bar.txt.
EDIT: All files are tab delimited
foo.csv contains just one column to begin with.
example.txt as seen in vim with set list:
(101,6352)(11174,51391)(10000,60000)^INC_044048.1^I35000^I6253^I0.038250$
(668,7819)(23384,69939)(20000,70000)^INC_044048.1^I45000^I7153^I0.034164$
(2279,8111)(32691,73588)(30000,80000)^INC_044048.1^I55000^I5834^I0.031908$
Here is a Python script that does what I want:
import pandas

rammi = []
with open('window.list') as f:
    for line in f:
        nafn = line.strip()
        df = pandas.read_csv(nafn, header=None, names=[nafn], sep='\t', usecols=[4])
        rammi.append(df)

frame = pandas.concat(rammi, axis=1)
frame.to_csv('rammi.allra', sep='\t', encoding='utf-8')
It pastes column 5 (usecols=[4], zero-indexed) from every file listed in window.list into one frame (initially I wanted to retain one original column as well, but it was not necessary). The question, though, is about why bash does not update the file that serves as both input and output inside the for loop.
CodePudding user response:
As already noted in the comments, opening foo.csv for output will truncate it in most shells. (Even if that was not the case, opening the file and running cut and paste repeatedly looks quite inefficient.)
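The shell opens and truncates the redirection target before the command even starts, so foo.csv is already empty by the time paste reads it; the same thing happens with, say, cat file > file, which leaves file empty. A minimal sketch of the usual workaround, writing to a temporary file and renaming it afterwards (the temporary name here is arbitrary):

for skra in *.txt; do
    paste foo.csv <(cut -f 5 "$skra") > foo.tmp && mv foo.tmp foo.csv
done

Globbing with *.txt instead of parsing the output of ls also survives file names containing whitespace.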
If you don’t mind keeping all the data in memory at one point in time, a simple AWK or Bash script can do this type of processing without any further processes such as cut or paste.
awk -F'\t' '{ lines[FNR] = lines[FNR] "\t" $5 }
     END { for (i = 1; i <= FNR; i++) print substr(lines[i], 2) }' \
    *.txt > foo.csv
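(FNR restarts at 1 for every input file, so field 5 of line N in each file is appended to the same row. Iterating by numeric index in the END block preserves the original line order; a plain for (l in lines) loop makes no ordering guarantee in awk. Using FNR as the upper bound assumes all files have the same number of lines, which column-wise pasting requires anyway.)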
(The output should not be called .csv, but I’m sticking with the naming from the question nonetheless.)
Actually, one doesn’t really need awk for this; Bash will do:
#!/bin/bash
lines=()
for file in *.txt; do
    declare -i i=0
    while IFS=$'\t' read -ra line; do
        # append field 5 of this file to row i, then advance the row counter
        lines[i++]+=$'\t'"${line[4]}"
    done < "$file"
done
printf '%s\n' "${lines[@]/#?}" > foo.csv
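(Here read -ra line splits each input line on tabs into the array line, so "${line[4]}" is the fifth field, the same one the original cut -f 5 selected. The array subscript is an arithmetic context, so lines[i++] appends to the current row and advances the counter in one step; declare -i i=0 resets the counter at the start of each file.)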
(As a side note, "${lines[@]:1}" would remove the first line, not the first (\t) character of each line. (This particular expansion syntax works differently for strings (scalars) and arrays in Bash.) Hence "${lines[@]/#?}", another way to express the removal of the first character, which does get applied to each array element.)
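A tiny illustration of the difference between the two expansions, using a throwaway array:

a=(ax by cz)
printf '%s\n' "${a[@]:1}"     # by, cz  -- drops the whole first element
printf '%s\n' "${a[@]/#?}"    # x, y, z -- strips the first character of each element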