Use pandas to create dict out of CSV columns

I have a CSV file (with a single line) like this:

drop1,drop2,key1,value1,key2,value2,key3,value3...keyN,valueN

The output I need is

{
'key1':'value1',
'key2':'value2',
..
'keyN':'valueN',
}

I intend to use DataFrames to do this. I tried using reshape and pivot, but being new to pandas, I have not been able to figure it out.

Any pointers would be a great help.

CodePudding user response：

You can reshape the values after the first two columns to shape (-1, 2), where the first column holds the keys and the second column holds the values:

import pandas as pd

df = pd.read_csv('your.csv', header=None)
# skip the first two columns, pair up the rest, then map column 0 -> column 1
out = (pd.DataFrame(df.iloc[:, 2:].values.reshape(-1, 2))
       .set_index(0)[1].to_dict())

print(df)

       0      1     2       3     4       5     6       7     8       9
0  drop1  drop2  key1  value1  key2  value2  key3  value3  keyN  valueN

print(out)

{'key1': 'value1', 'key2': 'value2', 'key3': 'value3', 'keyN': 'valueN'}
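
To try the reshape idea without a file on disk, here is a minimal sketch. It assumes the row looks like the sample in the question; `csv_text` is a hypothetical stand-in for 'your.csv', and the even-length check is an added safeguard, not part of the answer above.

```python
import io
import pandas as pd

# Hypothetical sample data standing in for 'your.csv'
csv_text = "drop1,drop2,key1,value1,key2,value2,key3,value3,keyN,valueN\n"

df = pd.read_csv(io.StringIO(csv_text), header=None)

# Everything after the first two columns of the only row, as a flat 1-D array
flat = df.iloc[0, 2:].to_numpy()

# reshape(-1, 2) only works when the cells pair up evenly
if flat.size % 2 != 0:
    raise ValueError("expected an even number of key/value cells")

# Each row of the reshaped array is one [key, value] pair
pairs = flat.reshape(-1, 2)
out = dict(pairs.tolist())
print(out)
```

If the row has an odd number of cells after the dropped columns, the check raises before reshape does, which makes the cause easier to spot.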

CodePudding user response：

IIUC:

df = pd.read_csv('your.csv', header=None)

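# take the cells of the first (only) row; list(df) would return the column labels 0, 1, 2, ...
lst = df.iloc[0].tolist()

# remove the strings with 'drop' in them
lst = [s for s in lst if 'drop' not in s]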

# create key/value list based on lst
keys = [s for s in lst if 'key' in s]
value = [s for s in lst if 'val' in s]

# create dictionary using zip
d = dict(zip(keys, value))

Output:

{'key1': 'value1', 'key2': 'value2', 'key3': 'value3', 'keyN': 'valueN'}

CodePudding user response：

Try this method:

Read the CSV without headers and pick only the first row.
Filter out the entries of this Series that start with "drop" (or use any other condition).
Reshape what is left into a (key, value) array and convert it to a dict.

import pandas as pd

s = pd.read_csv("test.csv", header=None).iloc[0]  # read csv without headers and pick the first row as a Series

keep = ~s.str.match("drop")  # True for values that do NOT start with "drop" (str.match anchors at the start)
arr = s[keep].to_numpy().reshape(-1, 2)  # reshape the remaining values into (key, value) rows
output = dict(arr)  # convert to dict
print(output)

{'key1': 'value1', 'key2': 'value2', 'key3': 'value3', 'keyN': 'valueN'}

CodePudding user response：

If the keys and values are not in order:

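# df is the frame read with header=None, as in the answers above
cols = df.iloc[0].to_list()

# sort the full strings so that matching suffixes line up (key1 <-> value1, ...)
keys = sorted(val for val in cols if val.startswith('key'))
values = sorted(val for val in cols if val.startswith('val'))

# assumes every key has exactly one matching value
my_dict = dict(zip(keys, values))
print(my_dict)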
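
When the cells are shuffled, another option is to pair them by their shared suffix instead of by position. This is only a sketch under assumptions: it takes the literal prefixes 'key' and 'value' from the question's sample, and `csv_text` is a made-up shuffled row.

```python
import io
import pandas as pd

# Hypothetical shuffled row: keys and values are no longer adjacent
csv_text = "drop1,value2,key1,drop2,key10,value1,value10,key2\n"
cols = pd.read_csv(io.StringIO(csv_text), header=None).iloc[0].to_list()

# Map each suffix ('1', '2', '10', ...) to the full cell text
keys = {c[len("key"):]: c for c in cols if c.startswith("key")}
values = {c[len("value"):]: c for c in cols if c.startswith("value")}

# Pair cells that share a suffix; keys without a matching value are skipped
my_dict = {keys[sfx]: values[sfx] for sfx in keys if sfx in values}
print(my_dict)
```

Matching on the whole suffix keeps multi-digit names such as key10 and value10 together, and a key with no partner is dropped rather than paired with the wrong value.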

CodePudding user response：

Without numpy and comprehensions:

s = pd.read_csv(r'c:\test\test111111.txt', header=None).iloc[0, 2:]  # get the Series without first two elements
print(s[1::2].set_axis(s[::2]).to_dict())  # get odd elements (values) and make index from even elements (keys)

Prints:

{'key1': 'value1', 'key2': 'value2', 'key3': 'value3'}
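
The even/odd slicing can also be written out step by step. The sketch below uses `.iloc` to make it explicit that the slices are positional, and reads from a hypothetical in-memory string (`csv_text`) instead of the file path above.

```python
import io
import pandas as pd

# Hypothetical sample row standing in for the file used above
csv_text = "drop1,drop2,key1,value1,key2,value2,key3,value3\n"
s = pd.read_csv(io.StringIO(csv_text), header=None).iloc[0, 2:]

keys = s.iloc[::2]     # positions 0, 2, 4: key1, key2, key3
values = s.iloc[1::2]  # positions 1, 3, 5: value1, value2, value3

# zip pairs the two slices element by element
result = dict(zip(keys, values))
print(result)
```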