This is my data frame.
Date | Country | Value |
---|---|---|
1/4/1971 | Sweden | 5.1643 |
1/5/1971 | Sweden | 5.1628 |
1/6/1971 | Sweden | 5.1614 |
1/7/1971 | Sweden | 5.1649 |
1/8/1971 | Sweden | 5.1631 |
1/4/1971 | Canada | 1.0109 |
1/5/1971 | Canada | 1.0102 |
1/6/1971 | Canada | 1.0106 |
1/7/1971 | Canada | 1.0148 |
1/8/1971 | Canada | 1.0154 |
1/4/1971 | India | 8.02 |
1/5/1971 | India | 8.00 |
1/6/1971 | India | 8.01 |
1/7/1971 | India | 8.00 |
1/8/1971 | India | 8.03 |
I want above data frame like bellow using python and panda.
Date | Sweden | Canada | India |
---|---|---|---|
1/4/1971 | 5.1643 | 1.0109 | 8.02 |
1/5/1971 | 5.1628 | 1.0102 | 8 |
1/6/1971 | 5.1614 | 1.0106 | 8.01 |
1/7/1971 | 5.1649 | 1.0148 | 8 |
1/8/1971 | 5.1631 | 1.0154 | 8.03 |
Please help me. Thank You.
CodePudding user response:
You could do this with the pivot
method of the dataframe.
Code
The following code assumes the original data is in a file named test.csv
.
import pandas as pd
df = pd.read_csv('test.csv')
print(df)
df = df.pivot(index='Date', columns='Country', values = 'Value').reset_index()
print(df)
Before
Date | Country | Value |
---|---|---|
1/4/1971 | Sweden | 5.1643 |
1/5/1971 | Sweden | 5.1628 |
1/6/1971 | Sweden | 5.1614 |
1/7/1971 | Sweden | 5.1649 |
1/8/1971 | Sweden | 5.1631 |
1/4/1971 | Canada | 1.0109 |
1/5/1971 | Canada | 1.0102 |
1/6/1971 | Canada | 1.0106 |
1/7/1971 | Canada | 1.0148 |
1/8/1971 | Canada | 1.0154 |
1/4/1971 | India | 8.02 |
1/5/1971 | India | 8 |
1/6/1971 | India | 8.01 |
1/7/1971 | India | 8 |
1/8/1971 | India | 8.03 |
After
Country | Date | Canada | India | Sweden |
---|---|---|---|---|
0 | 1/4/1971 | 1.0109 | 8.02 | 5.1643 |
1 | 1/5/1971 | 1.0102 | 8 | 5.1628 |
2 | 1/6/1971 | 1.0106 | 8.01 | 5.1614 |
3 | 1/7/1971 | 1.0148 | 8 | 5.1649 |
4 | 1/8/1971 | 1.0154 | 8.03 | 5.1631 |
CodePudding user response:
Here we create your dataframe for test..
import pandas as pd
arr = [['1/4/1971', 'Sweden', '5.1643'],
['1/5/1971', 'Sweden', '5.1628'],
['1/6/1971', 'Sweden', '5.1614'],
['1/7/1971', 'Sweden', '5.1649'],
['1/8/1971', 'Sweden', '5.1631'],
['1/4/1971', 'Canada', '1.0109'],
['1/5/1971', 'Canada', '1.0102'],
['1/6/1971', 'Canada', '1.0106'],
['1/7/1971', 'Canada', '1.0148'],
['1/8/1971', 'Canada', '1.0154'],
['1/4/1971', 'India', '8.02'],
['1/5/1971', 'India', '8.00'],
['1/6/1971', 'India', '8.01'],
['1/7/1971', 'India', '8.00'],
['1/8/1971', 'India', '8.03']]
df = pd.DataFrame(arr,columns=['Date','Country','Value'])
print('old form')
print(df)
The output should be like:
old form
Date Country Value
0 1/4/1971 Sweden 5.1643
1 1/5/1971 Sweden 5.1628
2 1/6/1971 Sweden 5.1614
3 1/7/1971 Sweden 5.1649
4 1/8/1971 Sweden 5.1631
5 1/4/1971 Canada 1.0109
6 1/5/1971 Canada 1.0102
7 1/6/1971 Canada 1.0106
8 1/7/1971 Canada 1.0148
9 1/8/1971 Canada 1.0154
10 1/4/1971 India 8.02
11 1/5/1971 India 8.00
12 1/6/1971 India 8.01
13 1/7/1971 India 8.00
14 1/8/1971 India 8.03
Let's do our magic:
note: this code not optimized but works well
table = {}
for row in df.values:
date = row[0]
country = row[1]
value = row[2]
if date not in table:table[date] = {country:value}
else:table[date][country] = value
arr = []
for date in table.keys():
row = table[date]
row = [date,row['Sweden'],row['Canada'],row['India']]
arr.append(row)
df2 = pd.DataFrame(arr,columns=['Date','Sweden','Canada','India'])
print('new form')
print(df2)
The final output should be
new form
Date Sweden Canada India
0 1/4/1971 5.1643 1.0109 8.02
1 1/5/1971 5.1628 1.0102 8.00
2 1/6/1971 5.1614 1.0106 8.01
3 1/7/1971 5.1649 1.0148 8.00
4 1/8/1971 5.1631 1.0154 8.03