Home > Net >  Convert data into comma separated values
Convert data into comma separated values

Time:01-29

How do i convert data into comma separated values, i want to convert like

I have this data in excel on single cell "ABCD x3 ABC, BAC x 3" Want to convert to ABCD,ABCD,ABCD,ABC,BAC,BAC,BAC

can't find an easy way to do that.

I am trying to solve it in python so i can get a structured data

CodePudding user response:

there might be multiple ways around this issue and I think the best fix will be to restructure how the data is Formatted into the cell.

here are a couple of ways you can work with the data. it wont directly solve your issue but I hope it will help you think about how you approach it (not being rude I don't actually have a good way to handle your example <3 )

split() will split your string as character 'x' and return a list of substrings you can iterate over.

data = 'ABCD ABCD ABCD ABC BAC BAC BAC'
splitdata = data.split(' ')
print(splitdata)
#prints - ['ABCD', 'ABCD', 'ABCD', 'ABC', 'BAC', 'BAC', 'BAC']

you could also try and match strings from the data

import re

data2 = "ABCD x3 ABC BAC x3"
result = []

for match in re.finditer(r'(\w ) x(\d )', data2):
    substring, count = match.groups()
    result.extend([substring] * int(count))
    print(result)

use re.finditer to go through the string and match the data with the following format = '(\w ) x(\d )' each match then gets added to the list.

'\w' is used to match a character. '\d' is used to match a digit. ' ' is the quantifier, means one or more.

so we are matching = '(\w ) x(\d )',

which broken down means we are matching (\w ) one or more characters followed by a 'space' then 'x' followed by (\d ) one or more digits

so because your cell data is essentially a string followed by a multiplier then a string followed by another string and then another multiplier, the data just feels too random for a general solution and i think this requires a direct solution that can only work if you know exactly what data is already in the cell. that's why i think the best way to fix it is to rework the data in the cell first. im in no way an expert and this answer is to help you think of ways around the problem and to add to the discussion :) ,if someone wants to correct me and offer a better solution to this I would love to know myself.

CodePudding user response:

Hi Zeeshan to try and sort the string into usable data while also multiplying certain parts of the string is kind of tricky for me. the best solution I can think of is kind of gross but it seems to work. hopefully my comments aren't too confusing <3

import re

data = "ABCD x3 AB BAC x2"

#this will split the string into a list that you can iterate through.
Datalist = re.findall(r'(\w )', data)

#create a new list for the final result
newlist = []

 
for object in Datalist:
    #for each object in the Datalist list
    #if the object starts with 'x'
    if re.search("x.*", object):
        #convert the multiplier to type(string) and then split the x from the multiplier number string
        xvalue = str(object).split('x')
        #grab and remove the last item added to the newlist because it hasnt been multiplied.
        lastitem = newlist.pop()
        #now we can add the last item back in by as many times as the x value
        newlist.extend([lastitem] * int(xvalue[1]))
    else:
        #if the object doesnt start with an x then we can just add it to the list.
         newlist.extend([object])
#print result
print(newlist)

#re.search() - looks for a match in a string
#.split() - splits a string into multiple substrings
#.pop() - removes the last item from a list and returns that item. 
#.extend() - adds an item to the end of a list

keep in mind that to find the multiplier its looking for x followed by a number (x1). if there is a space for example = (x 1) then it will match x but it wont return a value because there is a space.

  • Related