I have a quick question. I want to import a .cvs file, and then i want to export it back. But in some cells there are more informations separated by \n. I manage to separete it but, i want to know is there a more faster way to do it? And after i finish with the data, i want to export it back, with the \n. I know is a bit confusing, i will put the example:
From This .csv read
Column1 | Column2 | Column3 |
---|---|---|
asd\nfgd | 11\n22 | Sys1 |
mada\nchaca\nhera | 32\n120\n35 | Sys2 |
yay | 344 | Sys3 |
to this
Column1 | Column2 | Column3 |
---|---|---|
asd | 11 | Sys1 |
fgd | 22 | Sys1 |
mada | 32 | Sys2 |
chaca | 120 | Sys2 |
hera | 35 | Sys2 |
yay | 344 | Sys3 |
and vice-versa.
This is my code ( i did it only one way):
import pandas as pd
tabel=pd.read_csv('Test1.csv')
print(tabel)
Column1=[]
Column2=[]
Column3=[]
for i in range(0,len(tabel.index)):
sys_n=[tabel.iloc[i, 2]]
if "\n" in tabel.iloc[i,0] and "\n" in tabel.iloc[i,1]:
Column1.append(tabel.iloc[i, 0].splitlines())
Column2.append(tabel.iloc[i, 1].splitlines())
nr_sys=len(tabel.iloc[i, 0].splitlines())
Column3.extend([sys_n]*nr_sys)
else:
Column1.append(tabel.iloc[i, 0].splitlines())
Column2.append(tabel.iloc[i, 1].splitlines())
Column3.extend([sys_n])
print(Column1)
result=sum(Column1,[])
result1=sum(Column2,[])
result3=sum(Column3,[])
tabel2=pd.DataFrame(list(zip(result,result1,result3)), columns=[['Column1','Column2','Column3']])
print(tabel2)
tabel2.to_csv('Test1_1.csv',index=False)
Note: If Column1 has 2 lines, Column2 will have it too.
CodePudding user response:
Given tabel
Column1 Column2 Column3
0 asd\nfgd 11\n22 Sys1
1 mada\nchaca\nhera 32\n120\n35 Sys2
2 yay 344 Sys3
you could do
tabel2 = pd.DataFrame(
{"Column1": c_1, "Column2": c_2, "Column3": c_3}
for col_1, col_2, c_3 in tabel.itertuples(index=False)
for c_1, c_2 in zip(col_1.split("\n"), col_2.split("\n"))
)
or
tabel2 = pd.DataFrame(
(
[c_1, c_2, c_3]
for col_1, col_2, c_3 in tabel.itertuples(index=False)
for c_1, c_2 in zip(col_1.split("\n"), col_2.split("\n"))
),
columns=tabel.columns
)
Result:
Column1 Column2 Column3
0 asd 11 Sys1
1 fgd 22 Sys1
2 mada 32 Sys2
3 chaca 120 Sys2
4 hera 35 Sys2
5 yay 344 Sys3
Given tabel2
you could do
col_1, col_2, col_3 = tabel2.columns
tabel = tabel2.groupby(col_3, as_index=False).agg(
{col_1: "\n".join, col_2: "\n".join, col_3: "first"}
)
Result:
Column1 Column2 Column3
0 asd\nfgd 11\n22 Sys1
1 mada\nchaca\nhera 32\n120\n35 Sys2
2 yay 344 Sys3
CodePudding user response:
Assuming every cell of each row has the same number of '\n'
characters, you can split each cell on '\n'
into a list using Series.str.split
. Then, explode each list into a different row using Series.explode
. You can apply that logic to every column using Series.apply
.
Expanding
>>> df
Column1 Column2 Column3
0 asd\nfgd 11\n22 Sys1
1 mada\nchaca\nhera 32\n120\n35 Sys2
2 yay 344 Sys3
>>> long = df.apply(lambda col: col.str.split(r'\\n').explode())
>>> long
Column1 Column2 Column3
0 asd 11 Sys1
0 fgd 22 Sys1
1 mada 32 Sys2
1 chaca 120 Sys2
1 hera 35 Sys2
2 yay 344 Sys3
Compressing
To invert the process you can group the rows by their index (level=0
) and concatenate string groups using str.join
>>> short = long.groupby(level=0).agg(r'\n'.join)
>>> short
Column1 Column2 Column3
0 asd\nfgd 11\n22 Sys1\nSys1
1 mada\nchaca\nhera 32\n120\n35 Sys2\nSys2\nSys2
2 yay 344 Sys3