Python Pandas Create Columns Using Row and Add Values When Criteria is Met-CodePudding

I'm creating a program, and a part of it needs to convert rows into columns.

Here is what my CSV file looks like:

ID   | Items  | Status
1      Item1    Submitted
1
1      Item2    Pending
2      Item1    Submitted
3      Item1    Submitted

I want to achieve this using pandas. True if submitted and false if pending:

ID | Item1 | Item2
1    true    false
2    true    false
3    true    false

I tried using .dropna().unique() since I have rows with null values and I don't want them to become a column.

CodePudding user response：

Create helper column new, remove rows for non matched dictionary values (Submitted,Pending) and last pivoting by DataFrame.pivot:

df = (df.assign(new=df['Status'].map({'Submitted':True, 'Pending':False}))
        .dropna(subset=['new'])
        .pivot('ID','Items','new'))
print (df)
Items Item1  Item2
ID                
1      True  False
2      True    NaN
3      True    NaN

CodePudding user response：

Assuming your original csv is like this:

id,items,status
1,itemA,submitted
1,,
1,itemB,pending
2,itemA,submitted
3,itemA,submitted

And also assuming you want the resulting/output csv to look like this (you can easily read the output using pandas without further processing):

id,itemA,itemB
1,true,false
2,true,false
3,true,false

Then, the code you want is this:

import io
from collections import OrderedDict

data_array = []
data_array_new = []
unique_column = []
unique_row = OrderedDict()

file_object = io.open('dataset.txt', 'r')
header_line = file_object.readline().strip()
while True:
    line = file_object.readline().strip()
    if line != '':
        data_array.append(line)
    else:
        break
file_object.close()

for x in data_array:
    bad_line = False
    for y in x.split(","):
        if len(y) == 0:
            bad_line = True
    if not bad_line:
        data_array_new.append(x)
        unique_column.append(x.split(",")[1])
        if x.split(",")[2] == "submitted":
            evaluation = "true"
        else:
            evaluation = "false"
        try:
            unique_row[x.split(",")[0]][x.split(",")[1]] = evaluation
        except:
            unique_row[x.split(",")[0]] = OrderedDict()
            unique_row[x.split(",")[0]][x.split(",")[1]] = evaluation

unique_column = sorted(list(set(unique_column)))

for key, value in unique_row.items():
    for x in unique_column:
        try:
            unique_row[key][x]
        except:
            unique_row[key][x] = "false"

output_string = header_line.split(",")[0] "," ",".join(unique_column)   "\n"

for key, value in unique_row.items():
    output_string = output_string   key
    for x in unique_column:
        output_string = output_string   ","   value[x]
    output_string = output_string   "\n"

f = open("new_dataset.txt", "w")
f.write(output_string)
f.close()