Turn Koeppen Climate Legend into meaningful csv with regex

Time:12-07

I have this table:

    1:  Af   Tropical, rainforest                  [0 0 255]
    2:  Am   Tropical, monsoon                     [0 120 255]
    3:  Aw   Tropical, savannah                    [70 170 250]
    4:  BWh  Arid, desert, hot                     [255 0 0]
    5:  BWk  Arid, desert, cold                    [255 150 150]
    6:  BSh  Arid, steppe, hot                     [245 165 0]
    7:  BSk  Arid, steppe, cold                    [255 220 100]
    8:  Csa  Temperate, dry summer, hot summer     [255 255 0]
    9:  Csb  Temperate, dry summer, warm summer    [200 200 0]
    10: Csc  Temperate, dry summer, cold summer    [150 150 0]
    11: Cwa  Temperate, dry winter, hot summer     [150 255 150]
    12: Cwb  Temperate, dry winter, warm summer    [100 200 100]
    13: Cwc  Temperate, dry winter, cold summer    [50 150 50]
    14: Cfa  Temperate, no dry season, hot summer  [200 255 80]
    15: Cfb  Temperate, no dry season, warm summer [100 255 80]
    16: Cfc  Temperate, no dry season, cold summer [50 200 0]
    17: Dsa  Cold, dry summer, hot summer          [255 0 255]
    18: Dsb  Cold, dry summer, warm summer         [200 0 200]
    19: Dsc  Cold, dry summer, cold summer         [150 50 150]
    20: Dsd  Cold, dry summer, very cold winter    [150 100 150]
    21: Dwa  Cold, dry winter, hot summer          [170 175 255]
    22: Dwb  Cold, dry winter, warm summer         [90 120 220]
    23: Dwc  Cold, dry winter, cold summer         [75 80 180]
    24: Dwd  Cold, dry winter, very cold winter    [50 0 135]
    25: Dfa  Cold, no dry season, hot summer       [0 255 255]
    26: Dfb  Cold, no dry season, warm summer      [55 200 255]
    27: Dfc  Cold, no dry season, cold summer      [0 125 125]
    28: Dfd  Cold, no dry season, very cold winter [0 70 95]
    29: ET   Polar, tundra                         [178 178 178]
    30: EF   Polar, frost                          [102 102 102]

It is really hard to get this into a CSV. I would like to have the code (first column) and the long description (e.g. "Tropical, rainforest" for the first row). So I thought I would handle this with a regex, but apparently I have hit the limits of my understanding of how regexes work. I tried doing it in R, and I'd be super grateful for any help.

I tried something like this:

    str_match(a, "\\d{1,2}:\\s[a-zA-Z]{2,3}.*([a-zA-Z,]).*\\[")

but it fails.
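The failure can be reproduced outside R. Below is a small JavaScript sketch using the same pattern (the R string escapes "\\" become single backslashes in a regex literal); the two sample rows come from the table above and the variable names are only illustrative. It shows two separate problems: \s matches exactly one whitespace character, and ([a-zA-Z,]) captures a single character rather than the whole description.

```javascript
// JavaScript port of the pattern from the question.
const attempt = /\d{1,2}:\s[a-zA-Z]{2,3}.*([a-zA-Z,]).*\[/;

const rows = [
  "    1:  Af   Tropical, rainforest                  [0 0 255]",
  "    10: Csc  Temperate, dry summer, cold summer    [150 150 0]",
];

for (const row of rows) {
  const m = row.match(attempt);
  // Row 1: \s matches only ONE space, but "1:" is followed by two, so there is no match (null).
  // Row 10: it matches, but ([a-zA-Z,]) is a single character, and the greedy .*
  // before it leaves only the last letter before the padding: "r" from "summer".
  console.log(row.trim(), "->", m ? m[1] : null);
}
```

So the pattern needs a repeating whitespace token after the colon and a group that spans the whole description, such as a lazy (.*?) followed by \s*\[, which is the approach the answers below take.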

Answer:

I prepared an example. Since the descriptions contain commas, I used tabs as delimiters instead.

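// The "+" quantifiers matter: \d+ and \w+ match whole tokens, and " +" accepts
// both "1:  Af" (two spaces) and "10: Csc" (one space). The lazy group followed
// by " *\[" keeps the padding before "[" out of the description.
const regex = /^ *(\d+): +\w+ +([^\[]*?) *\[.*/gm;
const str = `
    1:  Af   Tropical, rainforest                  [0 0 255]
    2:  Am   Tropical, monsoon                     [0 120 255]
    3:  Aw   Tropical, savannah                    [70 170 250]
    8:  Csa  Temperate, dry summer, hot summer     [255 255 0]
    9:  Csb  Temperate, dry summer, warm summer    [200 200 0]
    10: Csc  Temperate, dry summer, cold summer    [150 150 0]
    11: Cwa  Temperate, dry winter, hot summer     [150 255 150]`;
// Number and description separated by a tab, because the descriptions contain commas
const subst = `$1\t$2`;

// The substituted value will be contained in the result variable
const result = str.replace(regex, subst);

console.log('Substitution result: ', result);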
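Tabs avoid the comma problem, but a real CSV can also keep commas inside a field as long as that field is wrapped in double quotes (with any inner quotes doubled), which is the usual CSV convention. A minimal sketch, assuming the legend is available as a string; the helpers csvField and legendToCsv and the header names are hypothetical:

```javascript
// Quote a field when it contains a comma, a double quote or a newline; inner quotes are doubled.
function csvField(value) {
  const s = String(value);
  return /[",\n]/.test(s) ? `"${s.replace(/"/g, '""')}"` : s;
}

// Hypothetical helper: turn legend lines into "number,description" CSV rows.
function legendToCsv(text) {
  const rowRe = /^ *(\d+): +\w+ +([^\[]*?) *\[/;
  const out = ["number,description"];
  for (const line of text.split("\n")) {
    const m = line.match(rowRe);
    if (m) out.push([m[1], m[2]].map(csvField).join(","));
  }
  return out.join("\n");
}

const legend = `    1:  Af   Tropical, rainforest                  [0 0 255]
    10: Csc  Temperate, dry summer, cold summer    [150 150 0]`;

console.log(legendToCsv(legend));
```

Most CSV readers, including R's read.csv, treat a quoted "Tropical, rainforest" as one field, so the comma no longer has to be avoided.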

Answer:

You can use either of these:

str_match(a, "(\\d{1,2}):\\s*(.*?)\\s*\\[(.*)\\]")
str_match(a, "(\\d{1,2}):\\s*(\\w+)\\s*(.*?)\\s*\\[(.*)\\]")


Details:

  • (\d{1,2}) - Group 1: one or two digits
  • :\s* - : and zero or more whitespaces
  • (\w+) - Group 2: one or more letters, digits or _
  • \s* - zero or more whitespaces
  • (.*?) - Group 3: any zero or more chars other than line break chars, as few as possible
  • \s* - zero or more whitespaces
  • \[ - a [ char
  • (.*) - Group 4: any zero or more chars other than line break chars, as many as possible
  • \] - a ] char.
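To try the second pattern outside R, here is a rough JavaScript equivalent (again, R's "\\" becomes a single backslash in a regex literal). With the g flag, String.prototype.matchAll returns one match per row, each holding the full match followed by the four groups, much like one row of the matrix str_match returns. The sample lines come from the question; the field names are illustrative.

```javascript
// Second pattern from the answer above: number, climate code, description, RGB.
const pattern = /(\d{1,2}):\s*(\w+)\s*(.*?)\s*\[(.*)\]/g;

const table = `    1:  Af   Tropical, rainforest                  [0 0 255]
    4:  BWh  Arid, desert, hot                     [255 0 0]
    28: Dfd  Cold, no dry season, very cold winter [0 70 95]`;

// Each match is [full match, group 1, group 2, group 3, group 4].
const parsed = [...table.matchAll(pattern)].map(([, num, code, desc, rgb]) => ({
  num: Number(num),
  code,
  desc,
  rgb: rgb.split(" ").map(Number), // "0 0 255" -> [0, 0, 255]
}));

console.log(parsed);
```

Because "." does not match line breaks, the greedy group 4 stops at the "]" on the same line, so each row is parsed independently.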

Answer:

Use this regex:

([0-9]+):(?:\s+)([a-zA-Z]+)(?:\s+)(.*?)\s*\[(.*)\]


([0-9]+): captures the number part.

(?:\s+)([a-zA-Z]+)(?:\s+)(.*?)\s*: captures the climate code and the description.

\[(.*)\]: captures the part between the brackets.
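The same pattern can be made a little more self-documenting with named groups, which JavaScript supports since ES2018. This is a variant, not the answer's exact regex: the (?:\s+) wrappers are written as plain \s+ because a non-capturing group around a single token has no effect, and the group names are illustrative choices.

```javascript
// Named-group variant of the regex above; the names num, code, desc and rgb are illustrative.
const legendRow = /(?<num>[0-9]+):\s+(?<code>[a-zA-Z]+)\s+(?<desc>.*?)\s*\[(?<rgb>.*)\]/;

const line = "    17: Dsa  Cold, dry summer, hot summer          [255 0 255]";
const { num, code, desc, rgb } = line.match(legendRow).groups;

console.log(num, code, desc, rgb);
```

The groups are still numbered 1 to 4, so code written against the positional version keeps working.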