I'm creating a program, and a part of it needs to convert rows into columns.
Here is what my CSV file looks like:
ID | Items | Status
1 Item1 Submitted
1
1 Item2 Pending
2 Item1 Submitted
3 Item1 Submitted
I want to achieve this using pandas. True if submitted and false if pending:
ID | Item1 | Item2
1 true false
2 true false
3 true false
I tried using .dropna().unique()
since I have rows with null values and I don't want them to become a column.
CodePudding user response:
Create helper column new
, remove rows for non matched dictionary values (Submitted
,Pending
) and last pivoting by DataFrame.pivot
:
df = (df.assign(new=df['Status'].map({'Submitted':True, 'Pending':False}))
.dropna(subset=['new'])
.pivot('ID','Items','new'))
print (df)
Items Item1 Item2
ID
1 True False
2 True NaN
3 True NaN
CodePudding user response:
Assuming your original csv is like this:
id,items,status
1,itemA,submitted
1,,
1,itemB,pending
2,itemA,submitted
3,itemA,submitted
And also assuming you want the resulting/output csv to look like this (you can easily read the output using pandas without further processing):
id,itemA,itemB
1,true,false
2,true,false
3,true,false
Then, the code you want is this:
import io
from collections import OrderedDict
data_array = []
data_array_new = []
unique_column = []
unique_row = OrderedDict()
file_object = io.open('dataset.txt', 'r')
header_line = file_object.readline().strip()
while True:
line = file_object.readline().strip()
if line != '':
data_array.append(line)
else:
break
file_object.close()
for x in data_array:
bad_line = False
for y in x.split(","):
if len(y) == 0:
bad_line = True
if not bad_line:
data_array_new.append(x)
unique_column.append(x.split(",")[1])
if x.split(",")[2] == "submitted":
evaluation = "true"
else:
evaluation = "false"
try:
unique_row[x.split(",")[0]][x.split(",")[1]] = evaluation
except:
unique_row[x.split(",")[0]] = OrderedDict()
unique_row[x.split(",")[0]][x.split(",")[1]] = evaluation
unique_column = sorted(list(set(unique_column)))
for key, value in unique_row.items():
for x in unique_column:
try:
unique_row[key][x]
except:
unique_row[key][x] = "false"
output_string = header_line.split(",")[0] "," ",".join(unique_column) "\n"
for key, value in unique_row.items():
output_string = output_string key
for x in unique_column:
output_string = output_string "," value[x]
output_string = output_string "\n"
f = open("new_dataset.txt", "w")
f.write(output_string)
f.close()