Hive UDF - extremely slow when parsing IP addresses

Time:09-17

I have a column of IP addresses that I need to resolve to countries and cities. For example, select IPUtils('199.999.999.999') returns ['Asia', 'Hongkong', 'xxx', 'Hongkong'].

I wrote a Hive UDF to do this, but it runs extremely slowly, as shown below:

INFO : 2021-09-08 18:51:10,817 Stage-2 map = 100%, reduce = 30%, Cumulative CPU 9074.06 sec

The map phase is at 100%, while the reduce phase advances by only 1 percent every 15 minutes.

The UDF reads a file from the project's resources folder, so maybe it is reading the file again on every call? The UDF is shown below; any help is appreciated:

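import com.maxmind.geoip2.DatabaseReader;
import com.maxmind.geoip2.exception.GeoIp2Exception;
import com.maxmind.geoip2.model.CityResponse;
import com.maxmind.geoip2.record.City;
import com.maxmind.geoip2.record.Continent;
import com.maxmind.geoip2.record.Country;
import com.maxmind.geoip2.record.Subdivision;
import org.apache.hadoop.hive.ql.exec.UDF;
import org.junit.Test;

import java.io.IOException;
import java.io.InputStream;
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.LinkedList;
import java.util.List;

public class IPUtil extends UDF {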

    public List<String> evaluate(String ip) {
        try{
            ClassLoader classloader = Thread.currentThread().getContextClassLoader();

            // I put the mmdb file in resource folder of the java project
            InputStream is = classloader.getResourceAsStream("GeoLite2-City.mmdb");
            DatabaseReader reader = new DatabaseReader.Builder(is).build();

            InetAddress ipAddress = InetAddress.getByName(ip);
            CityResponse response = reader.city(ipAddress);
            Country country = response.getCountry();
            Subdivision subdivision = response.getMostSpecificSubdivision();
            City city = response.getCity();
            Continent continent = response.getContinent();

            List<String> list = new LinkedList<String>();

            list.add(continent.getNames().get("zh-CN"));
            list.add(country.getNames().get("zh-CN"));
            list.add(subdivision.getNames().get("zh-CN"));
            list.add(city.getNames().get("zh-CN"));

            return list;

        } catch (UnknownHostException e) {
            e.printStackTrace();
            return null;
        } catch (IOException e) {
            e.printStackTrace();
            return null;
        } catch (GeoIp2Exception e) {
            e.printStackTrace();
            return null;
        }
    }

    @Test
    public void test() throws Exception {
        System.out.println(evaluate("175.45.20.138"));
    }
}

Answer:

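Move these two lines

InputStream is = classloader.getResourceAsStream("GeoLite2-City.mmdb");
DatabaseReader reader = new DatabaseReader.Builder(is).build();

out of evaluate() and into class initialization. As written, evaluate() runs once per row, so the database file is re-read and re-parsed for every IP address.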
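
A minimal sketch of that load-once pattern, using only the standard library so it can run on its own. FakeGeoDatabase and CachedIpLookup are hypothetical stand-ins and are not part of Hive or GeoIP2; in the real UDF, the static field would presumably hold the DatabaseReader built from the resource stream, and the class would still extend UDF.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for an expensive-to-build lookup database
// (in the question, this role is played by GeoIP2's DatabaseReader).
final class FakeGeoDatabase {
    // Counts how many times the "database" was built; for demonstration only.
    static final AtomicInteger BUILD_COUNT = new AtomicInteger();

    FakeGeoDatabase() {
        // Imagine reading and parsing the .mmdb file here.
        BUILD_COUNT.incrementAndGet();
    }

    List<String> lookup(String ip) {
        // Placeholder result; a real reader would return location names.
        return Arrays.asList("continent-of-" + ip, "country-of-" + ip);
    }
}

// Sketch of the UDF shape: the database is built during class
// initialization and reused by every evaluate() call.
class CachedIpLookup {
    // Initialized once, when the class is first loaded.
    private static final FakeGeoDatabase DATABASE = new FakeGeoDatabase();

    List<String> evaluate(String ip) {
        if (ip == null) {
            return null; // pass NULL column values through unchanged
        }
        return DATABASE.lookup(ip);
    }
}
```

However many rows pass through evaluate(), BUILD_COUNT stays at 1, whereas the version in the question would rebuild the database on every row.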
