I have a CSV file (with a single line) like this:
drop1,drop2,key1,value1,key2,value2,key3,value3...keyN,valueN
The output I need is
{
'key1':'value1',
'key2':'value2',
..
'keyN':'valueN',
}
I intend to use DataFrames to do this. I tried using reshape and pivot, but being new to pandas, I am not able to figure it out.
Any pointers would be a great help.
CodePudding user response:
You can try reshaping the values after the first two columns to shape (-1, 2), where the first column is the key and the second column is the value:
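import pandas as pd
df = pd.read_csv('your.csv', header=None)
# drop the first two columns, reshape the rest into (key, value) rows, then map column 0 -> column 1
out = (pd.DataFrame(df.iloc[:, 2:].values.reshape(-1, 2))
       .set_index(0)[1].to_dict())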
print(df)
0 1 2 3 4 5 6 7 8 9
0 drop1 drop2 key1 value1 key2 value2 key3 value3 keyN valueN
print(out)
{'key1': 'value1', 'key2': 'value2', 'key3': 'value3', 'keyN': 'valueN'}
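A minimal, self-contained sketch of the same reshape idea, assuming the row really does alternate key, value after the two leading columns. The in-memory `csv_text` is a hypothetical stand-in for `your.csv`, and the even-length check is an addition that is not part of the answer above:

```python
import io

import pandas as pd

# Hypothetical in-memory stand-in for 'your.csv' so the sketch runs as-is.
csv_text = "drop1,drop2,key1,value1,key2,value2,key3,value3\n"

df = pd.read_csv(io.StringIO(csv_text), header=None)

# Skip the two leading columns; what remains alternates key, value, key, value...
cells = df.iloc[0, 2:].to_numpy()
if len(cells) % 2:
    raise ValueError("expected an even number of cells after the dropped columns")

# Each row of the reshaped array is one (key, value) pair.
pairs = cells.reshape(-1, 2)
out = dict(pairs.tolist())
print(out)
```

The explicit check turns a confusing reshape error on a malformed row into a readable message.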
CodePudding user response:
IIUC:
import pandas as pd
df = pd.read_csv('your.csv', header=None)
# take the cell values of the single row (list(df) would return the integer column labels)
lst = df.iloc[0].tolist()
# remove the strings with 'drop' in them
lst = [s for s in lst if 'drop' not in s]
# create key/value lists based on lst
keys = [s for s in lst if 'key' in s]
values = [s for s in lst if 'val' in s]
# create dictionary using zip
d = dict(zip(keys, values))
Output:
{'key1': 'value1', 'key2': 'value2', 'key3': 'value3', 'keyN': 'valueN'}
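The answer above depends on the cells literally containing 'key' and 'val'. If the real data does not follow that naming, pairing by position may be safer. Here is a rough sketch using only the standard `csv` module rather than pandas, assuming two leading cells to discard and strictly alternating key, value cells after them; `csv_text` is a hypothetical stand-in for the file:

```python
import csv
import io

# Hypothetical in-memory stand-in for the single-line CSV file.
csv_text = "drop1,drop2,key1,value1,key2,value2,key3,value3\n"

# Read the only row as a plain list of strings.
row = next(csv.reader(io.StringIO(csv_text)))

# Discard the two leading cells, then pair even positions (keys)
# with odd positions (values).
payload = row[2:]
d = dict(zip(payload[::2], payload[1::2]))
print(d)
```

With a real file, `io.StringIO(csv_text)` would be replaced by an open file handle.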
CodePudding user response:
Try this method:
- Read the CSV without headers and pick only the first row.
- Filter this series on whether each value starts with "drop" (or on any other condition).
- Reshape it to a key, value shaped array and convert it to a dict.
import pandas as pd
s = pd.read_csv("test.csv", header=None).iloc[0]  # read csv without headers and pick the first row as a Series
keep_mask = ~s.str.match("drop")  # True for values that do NOT start with "drop" (or use any other condition)
arr = s[keep_mask].to_numpy().reshape(-1, 2)  # reshape the remaining values into (key, value) rows
output = dict(arr)  # convert to dict
print(output)
{'key1': 'value1', 'key2': 'value2', 'key3': 'value3', 'keyN': 'valueN'}
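One detail worth knowing when choosing the filter condition: `Series.str.match` only tests the start of each string, while `Series.str.contains` finds the pattern anywhere. A small sketch with made-up sample values (the `"backdrop"` entry is purely illustrative):

```python
import pandas as pd

# Made-up sample values; "backdrop" contains "drop" but does not start with it.
s = pd.Series(["drop1", "key1", "backdrop", "value1"])

starts_with_drop = s.str.match("drop")     # anchored at the start of each string
contains_drop = s.str.contains("drop")     # matches anywhere in the string

print(starts_with_drop.tolist())
print(contains_drop.tolist())
```

Which of the two is appropriate depends on whether a key or value could itself legitimately contain the substring being filtered on.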
CodePudding user response:
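If keys and values are not ordered:
import pandas as pd
df = pd.read_csv('your.csv', header=None)
cols = df.iloc[0].to_list()
# index each key/value cell by its suffix, so that key2 is paired with value2 wherever they appear
keys = {val[len('key'):]: val for val in cols if val.startswith('key')}
values = {val[len('value'):]: val for val in cols if val.startswith('value')}
my_dict = {keys[suffix]: values[suffix] for suffix in keys if suffix in values}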
print(my_dict)
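Matching on the suffix can be sketched without pandas as follows. This assumes the cells are literally named `key<suffix>` and `value<suffix>`; the shuffled `cells` list is made-up sample data, and the regular expression is one possible way to split prefix from suffix:

```python
import re

# Made-up, deliberately shuffled sample row.
cells = ["drop1", "value2", "key1", "drop2", "key2", "value1", "value10", "key10"]

keys, values = {}, {}
for cell in cells:
    m = re.fullmatch(r"(key|value)(.+)", cell)
    if m is None:
        continue  # neither a key nor a value cell, e.g. the 'drop' cells
    target = keys if m.group(1) == "key" else values
    target[m.group(2)] = cell  # index the cell by its suffix

# Pair every key with the value that carries the same suffix.
my_dict = {keys[s]: values[s] for s in keys if s in values}
print(my_dict)
```

Using the whole suffix rather than only the last character keeps `key1` and `key10` distinct.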
CodePudding user response:
Without numpy and comprehensions:
import pandas as pd
s = pd.read_csv(r'c:\test\test111111.txt', header=None).iloc[0, 2:]  # get the Series without the first two elements
print(s.iloc[1::2].set_axis(s.iloc[::2]).to_dict())  # take the odd positions (values) and index them by the even positions (keys)
Prints:
{'key1': 'value1', 'key2': 'value2', 'key3': 'value3'}