Parse strings of numbers and domain/subdomain strings, group, then sum numbers in each group

Time:11-26

In PHP, I have an array that shows how many times users clicked on each individual domain, like this:

$counts = [ 
     "900,google.com",
     "60,mail.yahoo.com",
     "10,mobile.sports.yahoo.com",
     "40,sports.yahoo.com",
     "300,yahoo.com",
     "10,stackoverflow.com",
     "20,overflow.com",
     "5,com.com",
     "2,en.wikipedia.org",
     "1,m.wikipedia.org",
     "1,mobile.sports",
     "1,google.co.uk"
];

How can I pass this input to a function that returns a data structure containing the number of clicks recorded on each domain AND each subdomain under it? For example, a click on "mail.yahoo.com" counts toward the totals for "mail.yahoo.com", "yahoo.com", and "com". (Subdomains are added to the left of their parent domain, so "mail" and "mail.yahoo" are not valid domains. Note that "mobile.sports" appears as a separate domain near the bottom of the input.)

Sample output (in any order/format):

calculateClicksByDomain($counts) =>
com:                     1345
google.com:              900
stackoverflow.com:       10
overflow.com:            20
yahoo.com:               410
mail.yahoo.com:          60
mobile.sports.yahoo.com: 10
sports.yahoo.com:        50
com.com:                 5
org:                     3
wikipedia.org:           3
en.wikipedia.org:        2
m.wikipedia.org:         1
mobile.sports:           1
sports:                  1
uk:                      1
co.uk:                   1
google.co.uk:            1

The first step I am stuck on is how to get the subdomains from, for example,

"mobile.sports.yahoo.com" 

such that the result is

[com, yahoo.com, sports.yahoo.com, mobile.sports.yahoo.com] 

CodePudding user response:

This code would work:

$counts = [ 
     "900,google.com",
     "60,mail.yahoo.com",
     "10,mobile.sports.yahoo.com",
     "40,sports.yahoo.com",
     "300,yahoo.com",
     "10,stackoverflow.com",
     "20,overflow.com",
     "5,com.com",
     "2,en.wikipedia.org",
     "1,m.wikipedia.org",
     "1,mobile.sports",
     "1,google.co.uk"
];

function calculateClicksByDomain($dataLines)
{
    $output = [];
    foreach ($dataLines as $dataLine) {
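        // Split a line such as "900,google.com" into the click count and the domain.
        [$count, $domain] = explode(',', $dataLine);
        $nameParts = [];
        // Walk the labels right to left ("com", then "google", ...).
        foreach (array_reverse(explode('.', $domain)) as $namePart) {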
            array_unshift($nameParts, $namePart);
            $domain = implode('.', $nameParts);
            $output[$domain] = ($output[$domain] ?? 0) + $count;
        }
    }   
    return $output;
}

print_r(calculateClicksByDomain($counts));

See: https://3v4l.org/o5VgJ#v8.0.26

This function walks over each line of the data and explodes it into a count and a domain. After that it explodes the domain by the dots, reverses it, and walks over those name parts. In that loop it reconstructs the various subdomains and counts them into the output.
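
To isolate the step the question asks about (getting every parent domain of one name), here is a minimal sketch of that part on its own. The helper name domainSuffixes is hypothetical and is not part of the code above. It assumes the input is a plain dot-separated name with no empty labels.

```php
<?php
// Hypothetical helper for illustration: returns every domain suffix of
// $domain, from the top-level label down to the full name.
function domainSuffixes(string $domain): array
{
    $parts = explode('.', $domain);
    $suffixes = [];
    // Take the last 1, 2, ... n labels and join each slice back together.
    for ($i = count($parts) - 1; $i >= 0; $i--) {
        $suffixes[] = implode('.', array_slice($parts, $i));
    }
    return $suffixes;
}

print_r(domainSuffixes('mobile.sports.yahoo.com'));
// Contains, in order: com, yahoo.com, sports.yahoo.com, mobile.sports.yahoo.com
```

Adding the click count to a running total for each returned suffix gives the same grouping as calculateClicksByDomain; slicing from the right is just an alternative to the reverse-and-unshift loop.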
