I have a sample dataframe as follows:
Back Unamed: 2 Unamed: 3 Unamed: 4 Unamed: 5 Unamed: 6
Q5
Text q5?
Ireland Poland Spain France Chile
Option 1 9% 56% 78% 23% 22%
Option 2 19% 44% 65% 33% 44%
Option 3 78% 88% 66% 54% 97%
Option 4 43% 32% 67% 23% 21%
Q6
Text q6?
Ireland Poland Spain France Chile
option 1 39% 16% 38% 13% 22%
Option 2 38% 82% 64% 54% 97%
Option 3 53% 12% 97% 13% 91%
Data in text:
{'back': {6: 'Q5',
7: 'How long has the business been operating?',
9: nan,
11: 'Less than 1 year',
12: '1 - 3 years',
13: '4 - 10 years',
14: 'More than 10 years',
19: 'Q6',
20: 'How many employees are regularly involved in running your business? These employees can be part-time, full-time, volunteers or informal/unregistered workers.',
22: nan,
24: '1 employee',
25: '2 to 4 employees',
26: '5 to 9 employees',
31: 'Q7',
32: 'Which sector does your company belong to?',
34: nan,
36: 'Catering - Hospitality, Events, etc',
37: 'Retail - Online, Physical',
38: 'Creative - Arts, Restoration, Media, etc',
39: 'Production - Food, Drinks, Carpentry, Clothing etc',
40: 'Personal - Healthcare, Beauty, Fitness, etc',
41: 'Household Services - Delivery, Cleaning, Gardening, etc',
42: 'Construction - Building, Electrics, Plumbing, etc',
43: 'Motor Trade - Servicing, Tyres, Sales, etc',
44: 'Education/ Social - Learning, Associations, etc',
45: 'Professional - Technology, Legal, Financial, etc',
46: 'Other Sector (please specify)',
51: 'Q8',
52: 'What challenges do you currently experience in your business?',
55: nan,
57: 'Challenges to attract new customers',
58: 'Lack of funding',
59: 'Standing out against competitors',
60: 'Managing stress and burnout',
61: 'Retaining customers',
62: 'Lack of support',
63: 'Difficulty managing business finances',
64: 'Lack of knowledge on growing the business',
65: 'Lack of premises',
66: 'No access to digital skills training',
67: 'Lack of cyber security',
68: 'No access to business skills training',
69: 'Not sure how to access training for my team/company',
70: 'Other (please specify)',
71: "I don't experience any challenges at the moment*",
72: '*exclusive',
77: 'Q9',
78: 'Which devices are used by you and any of your employees?',
81: nan,
83: 'Smartphone',
84: 'Laptop',
85: 'Desktop PC',
86: 'iPad or tablet',
87: 'Other (please specify)',
92: 'Q10',
93: 'What software/applications are used by your business?',
96: nan,
98: 'Marketing platforms such as email and social media channels',
99: 'Business productivity applications such as Microsoft Office or Google Workspace',
100: 'Financial tools to manage your cash flow, budgets and payments',
101: 'Online security software to protect against online threats',
102: 'Marketing tools to schedule and automate your posts',
103: 'Website building tools such as Wix, Webflow or WordPress',
104: 'Inventory management tools to keep a track of your stock',
105: 'HR tech for recruiting employees and managing HR systems',
106: 'Other (please specify)',
107: 'None of the above.*',
108: '*exclusive',
113: 'Q11',
114: 'What type of internet package does your business have?',
116: nan,
118: 'Fiber Optic',
119: 'Wireless broadband',
120: 'Satellite broadband',
121: 'Other (please specify)',
126: 'Q12',
127: 'Do you need to be able to view content offline? This could be due to frequent WiFi or data issues.',
129: nan,
131: 'Yes',
132: 'No',
133: "I don't know"},
'Unnamed: 2': {6: nan,
7: nan,
9: 'Ireland',
11: '6%',
12: '30%',
13: '31%',
14: '33%',
19: nan,
20: nan,
22: 'Ireland',
24: '37%',
25: '36%',
26: '27%',
31: nan,
32: nan,
34: 'Ireland',
36: '7%',
37: '15%',
38: '14%',
39: '5%',
40: '13%',
41: '3%',
42: '11%',
43: '1%',
44: '9%',
45: '13%',
46: '9%',
51: nan,
52: nan,
55: 'Ireland',
57: '43%',
58: '27%',
59: '29%',
60: '33%',
61: '13%',
62: '15%',
63: '23%',
64: '11%',
65: '8%',
66: '9%',
67: '5%',
68: '11%',
69: '6%',
70: '4%',
71: '11%',
72: nan,
77: nan,
78: nan,
81: 'Ireland',
83: '83%',
84: '69%',
85: '39%',
86: '38%',
87: '1%',
92: nan,
93: nan,
96: 'Ireland',
98: '43%',
99: '48%',
100: '35%',
101: '37%',
102: '29%',
103: '35%',
104: '17%',
105: '9%',
106: '1%',
107: '10%',
108: nan,
113: nan,
114: nan,
116: 'Ireland',
118: '27%',
119: '63%',
120: '5%',
121: '5%',
126: nan,
127: nan,
129: 'Ireland',
131: '51%',
132: '43%',
133: '6%'},
'Unnamed: 3': {6: nan,
7: nan,
9: 'Poland',
11: '12%',
12: '29%',
13: '35%',
14: '25%',
19: nan,
20: nan,
22: 'Poland',
24: '58%',
25: '31%',
26: '12%',
31: nan,
32: nan,
34: 'Poland',
36: '5%',
37: '21%',
38: '8%',
39: '5%',
40: '10%',
41: '7%',
42: '10%',
43: '2%',
44: '6%',
45: '21%',
46: '8%',
51: nan,
52: nan,
55: 'Poland',
57: '42%',
58: '30%',
59: '26%',
60: '20%',
61: '26%',
62: '20%',
63: '18%',
64: '10%',
65: '6%',
66: '5%',
67: '7%',
68: '6%',
69: '5%',
70: '2%',
71: '17%',
72: nan,
77: nan,
78: nan,
81: 'Poland',
83: '90%',
84: '78%',
85: '48%',
86: '23%',
87: '-',
92: nan,
93: nan,
96: 'Poland',
98: '49%',
99: '66%',
100: '31%',
101: '33%',
102: '20%',
103: '16%',
104: '17%',
105: '6%',
106: '1%',
107: '11%',
108: nan,
113: nan,
114: nan,
116: 'Poland',
118: '49%',
119: '41%',
120: '7%',
121: '3%',
126: nan,
127: nan,
129: 'Poland',
131: '34%',
132: '53%',
133: '13%'},
'Unnamed: 4': {6: nan,
7: nan,
9: 'Spain',
11: '9%',
12: '25%',
13: '30%',
14: '37%',
19: nan,
20: nan,
22: 'Spain',
24: '42%',
25: '40%',
26: '19%',
31: nan,
32: nan,
34: 'Spain',
36: '16%',
37: '15%',
38: '13%',
39: '4%',
40: '10%',
41: '3%',
42: '7%',
43: '2%',
44: '7%',
45: '18%',
46: '8%',
51: nan,
52: nan,
55: 'Spain',
57: '37%',
58: '26%',
59: '31%',
60: '26%',
61: '20%',
62: '21%',
63: '11%',
64: '12%',
65: '12%',
66: '8%',
67: '10%',
68: '6%',
69: '6%',
70: '4%',
71: '11%',
72: nan,
77: nan,
78: nan,
81: 'Spain',
83: '83%',
84: '60%',
85: '61%',
86: '36%',
87: '1%',
92: nan,
93: nan,
96: 'Spain',
98: '54%',
99: '47%',
100: '37%',
101: '39%',
102: '26%',
103: '22%',
104: '30%',
105: '10%',
106: '2%',
107: '10%',
108: nan,
113: nan,
114: nan,
116: 'Spain',
118: '76%',
119: '18%',
120: '4%',
121: '2%',
126: nan,
127: nan,
129: 'Spain',
131: '42%',
132: '47%',
133: '12%'},
'Unnamed: 5': {6: nan,
7: nan,
9: 'France',
11: '16%',
12: '36%',
13: '29%',
14: '21%',
19: nan,
20: nan,
22: 'France',
24: '64%',
25: '27%',
26: '10%',
31: nan,
32: nan,
34: 'France',
36: '6%',
37: '18%',
38: '12%',
39: '5%',
40: '22%',
41: '4%',
42: '6%',
43: '2%',
44: '3%',
45: '13%',
46: '11%',
51: nan,
52: nan,
55: 'France',
57: '44%',
58: '25%',
59: '26%',
60: '23%',
61: '21%',
62: '14%',
63: '11%',
64: '18%',
65: '11%',
66: '5%',
67: '5%',
68: '8%',
69: '4%',
70: '5%',
71: '13%',
72: nan,
77: nan,
78: nan,
81: 'France',
83: '85%',
84: '72%',
85: '43%',
86: '32%',
87: '-',
92: nan,
93: nan,
96: 'France',
98: '37%',
99: '36%',
100: '35%',
101: '31%',
102: '28%',
103: '36%',
104: '19%',
105: '8%',
106: '2%',
107: '14%',
108: nan,
113: nan,
114: nan,
116: 'France',
118: '61%',
119: '33%',
120: '6%',
121: '2%',
126: nan,
127: nan,
129: 'France',
131: '42%',
132: '50%',
133: '8%'},
'Unnamed: 6': {6: nan,
7: nan,
9: 'Chile',
11: '19%',
12: '50%',
13: '27%',
14: '5%',
19: nan,
20: nan,
22: 'Chile',
24: '34%',
25: '52%',
26: '15%',
31: nan,
32: nan,
34: 'Chile',
36: '9%',
37: '26%',
38: '10%',
39: '11%',
40: '9%',
41: '2%',
42: '11%',
43: '4%',
44: '5%',
45: '5%',
46: '10%',
51: nan,
52: nan,
55: 'Chile',
57: '38%',
58: '36%',
59: '23%',
60: '25%',
61: '15%',
62: '23%',
63: '18%',
64: '14%',
65: '15%',
66: '13%',
67: '10%',
68: '6%',
69: '11%',
70: '3%',
71: '6%',
72: nan,
77: nan,
78: nan,
81: 'Chile',
83: '85%',
84: '65%',
85: '35%',
86: '21%',
87: '-',
92: nan,
93: nan,
96: 'Chile',
98: '56%',
99: '39%',
100: '22%',
101: '15%',
102: '30%',
103: '18%',
104: '23%',
105: '12%',
106: '2%',
107: '10%',
108: nan,
113: nan,
114: nan,
116: 'Chile',
118: '47%',
119: '39%',
120: '9%',
121: '6%',
126: nan,
127: nan,
129: 'Chile',
131: '53%',
132: '36%',
133: '12%'}}
that I want to convert to row wise in the format
Question No Question Answers Country Response
Q5 Text q5? Option 1 Ireland. 9%
Q5 Text q5? Option 1 Poland 56%
Q5 Text q5? Option 1 Spain 78%
Q5 Text q5? Option 1 France 23%
Q5 Text q5? Option 1 Chile 22%
Q5 Text q5? Option 2 Ireland 19%
Q5 Text q5? Option 2 Poland 44%
Q5 Text q5? Option 2 Spain 65%
Q5 Text q5? Option 2 France 33%
Q5 Text q5? Option 2 Chile 44%
Q5 Text q5? Option 3 Ireland 78%
Q5 Text q5? Option 3 Poland 88%
Q5 Text q5? Option 3 Spain 66%
Q5 Text q5? Option 3 France 54%
Q5 Text q5? Option 3 Chile 97%
Q5 Text q5? Option 4 Ireland 43%
Q5 Text q5? Option 4 Poland 32%
Q5 Text q5? Option 4 Spain 67%
Q5 Text q5? Option 4 France 23%
Q5 Text q5? Option 4 Chile 21%
Q6 Text q6? Option 1 Ireland 39%
Q6 Text q6? Option 1 Poland 16%
Q6 Text q6? Option 1 Spain 38%
Q6 Text q6? Option 1 France 13%
Q6 Text q6? Option 1 Chile 22%
Q6 Text q6? Option 2 Ireland 38%
Q6 Text q6? Option 2 Poland 82%
Q6 Text q6? Option 2 Spain 64%
Q6 Text q6? Option 2 France 54%
Q6 Text q6? Option 2 Chile 97%
Q6 Text q6? Option 3 Ireland 53%
Q6 Text q6? Option 3 Poland 12%
Q6 Text q6? Option 3 Spain 97%
Q6 Text q6? Option 3 France 13%
Q6 Text q6? Option 3 Chile 91%
I have tried un pivoting the Dataframe but since the country names are not in columns neither are they in multi indexed form I did not get the required format.
df_unpivot = pd.melt(surveys, id_vars='back', value_vars=['Unnamed: 2', 'Unnamed: 3', 'Unnamed: 4', 'Unnamed: 5', 'Unnamed: 6'],
var_name='Country', value_name='Response')
I have created a new Dataframe to copy all the information in that
surveys_cleaned['Question_Number'] = surveys['back'].str.contains("^Q", na=False)
but not sure how to Unpivot the rest of the table. Any help or lead would be appreciated.
UPDATE: I have renamed the columns and applied pivot again so my data looks like this now
The code:
surveys = surveys.rename(columns={"Unnamed: 2": "Ireland", "Unnamed: 3": "Poland","Unnamed: 4": "Spain","Unnamed: 5": "France", "Unnamed: 6": "Chile"})
surveys = surveys[surveys['back'].notna()]
CodePudding user response:
The main idea is that you should even turn a pivot table into a pseudo-plane
data
is your dict.
data = {'back': {6: 'Q5',
7: 'How long has the business been operating?',...
Read the data and rename columns (as in your example)
df = pd.DataFrame.from_dict(data).rename(columns={"Unnamed: 2": "Ireland", "Unnamed: 3": "Poland","Unnamed: 4": "Spain","Unnamed: 5": "France", "Unnamed: 6": "Chile"})[df['back'].notna()]
Create Q
field for "Question No" from back
field
df.loc[df['back'].str.contains("^Q"), 'Q'] = df.loc[df['back'].str.contains("^Q"), 'back']
df['Q'] = df['Q'].fillna(method='ffill')
Drop rows with "Question No" only
df = df[~df['back'].str.contains("^Q")]
Create field quest
for "Question"" (as Question No field)
df.loc[df['Ireland'].isnull(), 'quest'] = df.loc[df['Ireland'].isnull(), 'back']
df['quest'] = df['quest'].fillna(method='ffill')
df = df[~df['Ireland'].isnull()]
Unpivot the dataframe and rename columns
unpivot_df = df.melt(id_vars=['Q', 'quest', 'back'])
unpivot_df.columns = ['Question No', 'Question', 'Answer', 'Country', 'Response']
Full code at https://gitlab.com/-/snippets/2376741
With best regards
CodePudding user response:
# Fix column names.
df.columns = ['back', 'Ireland', 'Poland', 'Spain', 'France', 'Chile']
# Remove the asterisk lines.
df = df[~df.back.fillna('').str.startswith('*')]
# Fill question number across, drop remaining null lines.
df = df.ffill(axis=1).dropna()
# Make a new column for Question number and fill it down.
q_rows = df.back.str.match('Q\d*')
df['Question'] = df.back.where(q_rows, np.nan).ffill()
# Remove the question rows.
df = df[~q_rows]
dfs = []
# For each question group...
for question, group in df.groupby('Question'):
# Reset the index of the group.
group = group.reset_index(drop=True)
# Make the columns
group.columns = pd.MultiIndex.from_arrays([group.columns, group.loc[0, :]])
# Grab the question.
text = group.iat[0,0]
# Drop the question Row.
group = group.drop(0)
# Melt the group, keeping the question_no and text.
group = group.melt([('Question', question), ('back', text)])
# Update your column names.
group.columns = ['Question No', 'Answers', 'Country', 'Question', 'Response']
# Add the group to the list.
dfs.append(group)
# Put all the groups back together.
cleaned = pd.concat(dfs, ignore_index=True)
# Re-order the columns how you want.
cleaned = cleaned[['Question No', 'Question', 'Answers', 'Country', 'Response']]
print(cleaned.sample(50))
Output of 50 random rows:
Question No Question Answers Country Response
176 Q8 What challenges do you currently experience in... Lack of funding Ireland 27%
21 Q10 What software/applications are used by your bu... Business productivity applications such as Mic... Spain 47%
10 Q10 What software/applications are used by your bu... Marketing platforms such as email and social m... Poland 49%
124 Q7 Which sector does your company belong to? Personal - Healthcare, Beauty, Fitness, etc Ireland 13%
187 Q8 What challenges do you currently experience in... Not sure how to access training for my team/co... Ireland 6%
211 Q8 What challenges do you currently experience in... Difficulty managing business finances Spain 11%
29 Q10 What software/applications are used by your bu... None of the above.* Spain 10%
233 Q8 What challenges do you currently experience in... Other (please specify) France 5%
48 Q10 What software/applications are used by your bu... Other (please specify) Chile 2%
138 Q7 Which sector does your company belong to? Motor Trade - Servicing, Tyres, Sales, etc Poland 2%
166 Q7 Which sector does your company belong to? Creative - Arts, Restoration, Media, etc Chile 10%
114 Q6 How many employees are regularly involved in r... 1 employee France 64%
244 Q8 What challenges do you currently experience in... No access to digital skills training Chile 13%
59 Q11 What type of internet package does your busine... Wireless broadband Spain 18%
49 Q10 What software/applications are used by your bu... None of the above.* Chile 10%
221 Q8 What challenges do you currently experience in... Lack of funding France 25%
38 Q10 What software/applications are used by your bu... Other (please specify) France 2%
90 Q5 How long has the business been operating? 1 - 3 years Poland 29%
177 Q8 What challenges do you currently experience in... Standing out against competitors Ireland 29%
151 Q7 Which sector does your company belong to? Professional - Technology, Legal, Financial, etc Spain 18%
22 Q10 What software/applications are used by your bu... Financial tools to manage your cash flow, budg... Spain 37%
32 Q10 What software/applications are used by your bu... Financial tools to manage your cash flow, budg... France 35%
11 Q10 What software/applications are used by your bu... Business productivity applications such as Mic... Poland 66%
63 Q11 What type of internet package does your busine... Wireless broadband France 33%
85 Q5 How long has the business been operating? Less than 1 year Ireland 6%
163 Q7 Which sector does your company belong to? Other Sector (please specify) France 11%
174 Q7 Which sector does your company belong to? Other Sector (please specify) Chile 10%
123 Q7 Which sector does your company belong to? Production - Food, Drinks, Carpentry, Clothing... Ireland 5%
112 Q6 How many employees are regularly involved in r... 2 to 4 employees Spain 40%
141 Q7 Which sector does your company belong to? Other Sector (please specify) Poland 8%
213 Q8 What challenges do you currently experience in... Lack of premises Spain 12%
94 Q5 How long has the business been operating? 1 - 3 years Spain 25%
17 Q10 What software/applications are used by your bu... HR tech for recruiting employees and managing ... Poland 6%
1 Q10 What software/applications are used by your bu... Business productivity applications such as Mic... Ireland 48%
56 Q11 What type of internet package does your busine... Satellite broadband Poland 7%
54 Q11 What type of internet package does your busine... Fiber Optic Poland 49%
4 Q10 What software/applications are used by your bu... Marketing tools to schedule and automate your ... Ireland 29%
189 Q8 What challenges do you currently experience in... I don't experience any challenges at the moment* Ireland 11%
265 Q9 Which devices are used by you and any of your ... Smartphone France 85%
108 Q6 How many employees are regularly involved in r... 1 employee Poland 58%
91 Q5 How long has the business been operating? 4 - 10 years Poland 35%
162 Q7 Which sector does your company belong to? Professional - Technology, Legal, Financial, etc France 13%
78 Q12 Do you need to be able to view content offline... I don't know Spain 12%
5 Q10 What software/applications are used by your bu... Website building tools such as Wix, Webflow or... Ireland 35%
255 Q9 Which devices are used by you and any of your ... Smartphone Poland 90%
185 Q8 What challenges do you currently experience in... Lack of cyber security Ireland 5%
266 Q9 Which devices are used by you and any of your ... Laptop France 72%
127 Q7 Which sector does your company belong to? Motor Trade - Servicing, Tyres, Sales, etc Ireland 1%
98 Q5 How long has the business been operating? 1 - 3 years France 36%
263 Q9 Which devices are used by you and any of your ... iPad or tablet Spain 36%