I have this table:
1: Af Tropical, rainforest [0 0 255]
2: Am Tropical, monsoon [0 120 255]
3: Aw Tropical, savannah [70 170 250]
4: BWh Arid, desert, hot [255 0 0]
5: BWk Arid, desert, cold [255 150 150]
6: BSh Arid, steppe, hot [245 165 0]
7: BSk Arid, steppe, cold [255 220 100]
8: Csa Temperate, dry summer, hot summer [255 255 0]
9: Csb Temperate, dry summer, warm summer [200 200 0]
10: Csc Temperate, dry summer, cold summer [150 150 0]
11: Cwa Temperate, dry winter, hot summer [150 255 150]
12: Cwb Temperate, dry winter, warm summer [100 200 100]
13: Cwc Temperate, dry winter, cold summer [50 150 50]
14: Cfa Temperate, no dry season, hot summer [200 255 80]
15: Cfb Temperate, no dry season, warm summer [100 255 80]
16: Cfc Temperate, no dry season, cold summer [50 200 0]
17: Dsa Cold, dry summer, hot summer [255 0 255]
18: Dsb Cold, dry summer, warm summer [200 0 200]
19: Dsc Cold, dry summer, cold summer [150 50 150]
20: Dsd Cold, dry summer, very cold winter [150 100 150]
21: Dwa Cold, dry winter, hot summer [170 175 255]
22: Dwb Cold, dry winter, warm summer [90 120 220]
23: Dwc Cold, dry winter, cold summer [75 80 180]
24: Dwd Cold, dry winter, very cold winter [50 0 135]
25: Dfa Cold, no dry season, hot summer [0 255 255]
26: Dfb Cold, no dry season, warm summer [55 200 255]
27: Dfc Cold, no dry season, cold summer [0 125 125]
28: Dfd Cold, no dry season, very cold winter [0 70 95]
29: ET Polar, tundra [178 178 178]
30: EF Polar, frost [102 102 102]
First, it is really hard to get this into a CSV, because the descriptions themselves contain commas.
I would like to have the code (first column) and the long description (e.g. "Tropical, rainforest" for the first row), so I thought I would handle this with a regex. But apparently I am hitting the limits of my understanding of how regexes work. I tried doing it in R, but I'd be super grateful for any help.
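CSV itself can cope with commas inside a field if that field is wrapped in double quotes. A minimal sketch in JavaScript (the language of the first answer), assuming every line follows the "<id>: <code> <description> [<r> <g> <b>]" layout shown above; toCsv and csvField are hypothetical helper names, and the output keeps both the row number and the letter code since either could be meant by "code":

```javascript
// Sketch: convert lines like "1: Af Tropical, rainforest [0 0 255]" to CSV.
// Assumption: every line follows "<id>: <code> <description> [<r> <g> <b>]".
const LINE_RE = /^\s*(\d+):\s*(\w+)\s+(.*?)\s*\[[^\]]*\]\s*$/;

// Quote a field when it contains a comma, a double quote or a newline,
// doubling any embedded double quotes (the usual CSV convention).
function csvField(value) {
  return /[",\n]/.test(value) ? `"${value.replace(/"/g, '""')}"` : value;
}

// Hypothetical helper: build a CSV string with id, code and description columns.
function toCsv(text) {
  const rows = ["id,code,description"];
  for (const line of text.split("\n")) {
    const m = LINE_RE.exec(line);
    if (m === null) continue; // skip blank or malformed lines
    rows.push([m[1], m[2], m[3]].map(csvField).join(","));
  }
  return rows.join("\n");
}

const sample = `1: Af Tropical, rainforest [0 0 255]
29: ET Polar, tundra [178 178 178]`;
console.log(toCsv(sample));
// id,code,description
// 1,Af,"Tropical, rainforest"
// 29,ET,"Polar, tundra"
```

Any CSV reader that follows the quoting convention (R's read.csv included) should then read "Tropical, rainforest" back as a single field.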
I tried something like this:
str_match(a, "\\d{1,2}:\\s[a-zA-Z]{2,3}.*([a-zA-Z,]).*\\[")
but it fails: the match succeeds, yet the capture group only ever holds a single character.
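The reason the attempt captures only one character can be checked by running the same pattern through JavaScript's regex engine. This is a sketch under the assumption that, for the constructs used here (greedy and lazy quantifiers, character classes), JavaScript and the ICU engine behind stringr behave the same way:

```javascript
// The question's pattern, with R's doubled backslashes reduced to single ones.
const attempt = /\d{1,2}:\s[a-zA-Z]{2,3}.*([a-zA-Z,]).*\[/;
const line = "1: Af Tropical, rainforest [0 0 255]";

// The greedy .* before the group consumes as much as it can, and the group
// has no quantifier, so it captures exactly one character: the last letter
// (or comma) that still leaves room for the trailing .*\[ to match.
console.log(attempt.exec(line)[1]); // t

// Making the description group quantified and lazy, and fencing it between
// the code and the "[", captures the whole description instead.
const fixed = /^(\d{1,2}):\s*([a-zA-Z]{2,3})\s+(.*?)\s*\[/;
const [, id, code, description] = fixed.exec(line);
console.log(id, code, description); // 1 Af Tropical, rainforest
```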
Answer:
I prepared an example. Since the descriptions contain commas, I used tab delimiters in the output.
// Group 1: the row number; group 2: the description, without the space before "["
const regex = /(\d+): \w+ ([^\[]+?)\s*\[.*/gm;
const str = `
1: Af Tropical, rainforest [0 0 255]
2: Am Tropical, monsoon [0 120 255]
3: Aw Tropical, savannah [70 170 250]
8: Csa Temperate, dry summer, hot summer [255 255 0]
9: Csb Temperate, dry summer, warm summer [200 200 0]
10: Csc Temperate, dry summer, cold summer [150 150 0]
11: Cwa Temperate, dry winter, hot summer [150 255 150]`;
const subst = `$1\t$2`;
// The substituted value will be contained in the result variable
const result = str.replace(regex, subst);
console.log('Substitution result: ', result);
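If structured rows are more convenient than a tab-separated string, String.prototype.matchAll can collect the groups directly. A sketch only; ROW_RE and parseRows are hypothetical names, and the pattern assumes each row sits on its own line and ends with the bracketed colour:

```javascript
// Sketch: collect {id, code, description} objects instead of a TSV string.
// With the g and m flags, ^ and $ match at the start and end of every line.
const ROW_RE = /^(\d+):\s*(\w+)\s+(.*?)\s*\[.*\]$/gm;

function parseRows(text) {
  // matchAll requires the g flag; each match m holds the capture groups.
  return Array.from(text.matchAll(ROW_RE), (m) => ({
    id: Number(m[1]),
    code: m[2],
    description: m[3],
  }));
}

const rows = parseRows(`
1: Af Tropical, rainforest [0 0 255]
8: Csa Temperate, dry summer, hot summer [255 255 0]`);
console.log(JSON.stringify(rows));
// [{"id":1,"code":"Af","description":"Tropical, rainforest"},{"id":8,"code":"Csa","description":"Temperate, dry summer, hot summer"}]
```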
Answer:
You may use either of these (str_match comes from the stringr package, so load it with library(stringr)):
str_match(a, "(\\d{1,2}):\\s*(.*?)\\s*\\[(.*)\\]")
str_match(a, "(\\d{1,2}):\\s*(\\w+)\\s*(.*?)\\s*\\[(.*)\\]")
See the regex demo #1 and regex demo #2.
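The first pattern keeps the climate code and the description together in group 2. A JavaScript port illustrates this (in a JS regex literal, the R string's doubled backslashes become single ones; matching behaviour is assumed to agree with stringr for these constructs), along with one way to split the code off afterwards:

```javascript
// JavaScript port of the first stringr pattern.
const PATTERN_1 = /(\d{1,2}):\s*(.*?)\s*\[(.*)\]/;

const match1 = PATTERN_1.exec("1: Af Tropical, rainforest [0 0 255]");
// match1[1] = "1", match1[2] = "Af Tropical, rainforest", match1[3] = "0 0 255"

// Group 2 still holds the code and the description together; splitting at
// the first run of whitespace separates them.
const [, code, description] = match1[2].match(/^(\S+)\s+(.*)$/);
console.log(code, "|", description); // Af | Tropical, rainforest
```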
Details (second pattern):
(\d{1,2}) - Group 1: one or two digits
:\s* - a : and zero or more whitespaces
(\w+) - Group 2: one or more letters, digits or _
\s* - zero or more whitespaces
(.*?) - Group 3: any zero or more chars other than line break chars, as few as possible
\s* - zero or more whitespaces
\[ - a [ char
(.*) - Group 4: any zero or more chars other than line break chars, as many as possible
\] - a ] char.
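To check the group-by-group breakdown of the second pattern against real input, here is a JavaScript port (a sketch; JavaScript and the ICU engine behind stringr are assumed to agree for these constructs). It also shows why group 3 is lazy: with a greedy (.*), the \s* that follows may match nothing, so the space before "[" ends up inside the description.

```javascript
// JavaScript port of the second stringr pattern (four capture groups).
const PATTERN_2 = /(\d{1,2}):\s*(\w+)\s*(.*?)\s*\[(.*)\]/;

const line = "8: Csa Temperate, dry summer, hot summer [255 255 0]";
const [, id, code, description, rgb] = PATTERN_2.exec(line);
console.log(JSON.stringify({ id, code, description, rgb }));
// {"id":"8","code":"Csa","description":"Temperate, dry summer, hot summer","rgb":"255 255 0"}

// Same pattern with a greedy group 3: the trailing space is captured too.
const GREEDY = /(\d{1,2}):\s*(\w+)\s*(.*)\s*\[(.*)\]/;
console.log(JSON.stringify(GREEDY.exec(line)[3]));
// "Temperate, dry summer, hot summer "
```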
Answer:
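Use this regex:
([0-9]+):(?:\s+)([a-zA-Z]+)(?:\s+)(.*?)\s*\[(.*)\]
([0-9]+): matches the number part (Group 1).
(?:\s+)([a-zA-Z]+)(?:\s+)(.*?)\s*: matches the code (Group 2) and the description (Group 3), keeping the surrounding whitespace out of the groups.
\[(.*)\]: matches the part between the brackets (Group 4).
In an R string literal, double every backslash, e.g. "\\s" instead of "\s".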
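Group 4 of this regex holds the colour as a string such as "255 150 150". A sketch of turning it into numbers, in JavaScript; parseLine is a hypothetical helper, and it assumes the bracketed part always contains whitespace-separated integers:

```javascript
// JavaScript port of the regex above, "+" quantifiers included.
const THIRD_RE = /([0-9]+):(?:\s+)([a-zA-Z]+)(?:\s+)(.*?)\s*\[(.*)\]/;

// Hypothetical helper: returns null when the line does not fit the layout.
function parseLine(line) {
  const m = THIRD_RE.exec(line);
  if (m === null) return null;
  return {
    id: Number(m[1]),
    code: m[2],
    description: m[3],
    // Group 4 is e.g. "255 150 150": split on whitespace, convert to numbers.
    rgb: m[4].trim().split(/\s+/).map(Number),
  };
}

const row = parseLine("5: BWk Arid, desert, cold [255 150 150]");
console.log(JSON.stringify(row));
// {"id":5,"code":"BWk","description":"Arid, desert, cold","rgb":[255,150,150]}
```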