Home > Software engineering >  Pivot wider two columns with different values
Pivot wider two columns with different values

Time:11-28

I have two data frames

The first one: AEZ_2

    head(AEZ_2)
     gid                           AEZ AEZ_area_sum
1 142976   Tropics, lowland; semi-arid     18.20585
2 142977   Tropics, lowland; semi-arid    924.40126
3 142978   Tropics, lowland; semi-arid    509.15215
4 142978   Tropics, lowland; sub-humid     11.47290
5 143696   Tropics, lowland; semi-arid    858.59022
6 143697 Dominantly hydromorphic soils    589.91021

dput(AEZ_2[1:6, c(1: 3)])
structure(list(gid = c(142976, 142977, 142978, 142978, 143696, 
143697), AEZ = c("Tropics, lowland; semi-arid", "Tropics, lowland; semi-arid", 
"Tropics, lowland; semi-arid", "Tropics, lowland; sub-humid", 
"Tropics, lowland; semi-arid", "Dominantly hydromorphic soils"
), AEZ_area_sum = c(18.2058489094, 924.401258895, 509.152149209, 
11.4728955973, 858.590216109, 589.9102080814)), row.names = c(NA, 
6L), class = "data.frame")

The second one: Farming system

      gid                           Farming_system  FS_area_sum
1   142976 5. Cereal-root crop mixed farming system 1.820585e 01
2   142977 5. Cereal-root crop mixed farming system 9.244013e 02
3   142978 5. Cereal-root crop mixed farming system 5.206250e 02
4   143696           2. Agropastoral farming system 6.979757e 02
5   143696 5. Cereal-root crop mixed farming system 1.606145e 02
6   143697           2. Agropastoral farming system 2.107575e 03
dput(FS_2[1:5, c(1:3)])
structure(list(gid = c(142976, 142977, 142978, 143696, 143696
), Farming_system = structure(c(9L, 9L, 9L, 6L, 9L), .Label = c("1. Maize mixed farming system", 
"10. Forest-based farming system", "11. Large-scale irrigated farming system", 
"12. Perennial mixed farming system", "13. Arid pastoral oasis farming system", 
"2. Agropastoral farming system", "3. Highland perennial farming system", 
"4. Root and tuber crop farming system", "5. Cereal-root crop mixed farming system", 
"6. Highland mixed farming system", "7. Humid lowland tree crop farming system", 
"8. Pastoral farming system", "9. Fish-based farming system"), class = "factor"), 
    FS_area_sum = c(18.205849004, 924.40125911, 520.625044495, 
    697.975740616, 160.614476324)), row.names = c(NA, 5L), class = "data.frame")

I merged the two data frames: Final.27

   head(Final.27)
    gid                     Farming_system FS_area_sum                                       AEZ AEZ_area_sum
1 62356                               <NA>          NA Land with severe soil/terrain limitations     334.9770
2 79599       9. Fish-based farming system 198.0554185   Sub-tropics, moderately cool; sub-humid      74.4029
3 79599       9. Fish-based farming system 198.0554185 Land with severe soil/terrain limitations     123.7758
4 79599 12. Perennial mixed farming system   0.1306899   Sub-tropics, moderately cool; sub-humid      74.4029
5 79599 12. Perennial mixed farming system   0.1306899 Land with severe soil/terrain limitations     123.7758
6 79600       9. Fish-based farming system 603.2818466 Land with severe soil/terrain limitations     141.4297
dput(Final.27[1:5, c(1:5)])
structure(list(gid = c(62356, 79599, 79599, 79599, 79599), Farming_system = structure(c(NA, 
13L, 13L, 4L, 4L), .Label = c("1. Maize mixed farming system", 
"10. Forest-based farming system", "11. Large-scale irrigated farming system", 
"12. Perennial mixed farming system", "13. Arid pastoral oasis farming system", 
"2. Agropastoral farming system", "3. Highland perennial farming system", 
"4. Root and tuber crop farming system", "5. Cereal-root crop mixed farming system", 
"6. Highland mixed farming system", "7. Humid lowland tree crop farming system", 
"8. Pastoral farming system", "9. Fish-based farming system"), class = "factor"), 
    FS_area_sum = c(NA, 198.0554184899, 198.0554184899, 0.130689938047, 
    0.130689938047), AEZ = c("Land with severe soil/terrain limitations", 
    "Sub-tropics, moderately cool; sub-humid", "Land with severe soil/terrain limitations", 
    "Sub-tropics, moderately cool; sub-humid", "Land with severe soil/terrain limitations"
    ), AEZ_area_sum = c(334.9769749362, 74.4028953581, 123.77575431, 
    74.4028953581, 123.77575431)), row.names = c(NA, 5L), class = "data.frame")

They have in common the 'gid' column (=cell identifier of 0.5x0.5). So a gid can have multiple observations (=polygons). That is why we can see that some gid are duplicated. In addition, sometimes the gid is fulfilled by an AEZ and not the farming system. Which is fine for me to have a NA observations (3rd column).

What I would like is a final table where the AEZ column is pivoted and filled with AEZ_area_sum and the FS column is pivoted and filled with FS_area_sum. In order to have many observations by gid and not a gid duplicated according to the different observations.

I tried to do this

First.flanked <- Final.27 %>% 
  pivot_wider(names_from = AEZ, values_from = AEZ_area_sum)

and then do it a second time for the farming system

Final.flanked <- First.flanked %>% 
pivot_wider(names_from = Farming_system, value_from = FS_area_sum)

but I get:

Erreur : Failed to create output due to bad names.

  • Choose another strategy with names_repair Run rlang::last_error() to see where the error occurred. De plus : Message d'avis : Values are not uniquely identified; output will contain list-cols.
  • Use values_fn = list to suppress this warning.
  • Use values_fn = length to identify where the duplicates arise
  • Use values_fn = {summary_fun} to summarise duplicates

I know it's not working because of the duplicated gid. But how can I get the results I want ? to say => One gid for multiple observations within the same row ?

CodePudding user response:

Final.flanked <- Final.27 %>% 
  pivot_wider(id_cols = gid, names_from = c(Farming_system, AEZ), values_from = c(AEZ_area_sum, FS_area_sum))

View(Final.flanked)

enter image description here I'm not sure you want all these columns, but it's easy to eliminate the unneeded ones from here.

  • Related