How robust is online sourcing of js scripts, and best practices

I come from an R background and I am starting to learn some javascript for data visualization purposes (think leaflet, d3, chart,...).

I am trying to wrap my head around the fact that many tutorials and templates suggest loading packages, CSS, or even data directly from online sources. For example, https://leafletjs.com/examples/quick-start/ recommends:

Before writing any code for the map, you need to do the following preparation steps on your page:

Include Leaflet CSS file in the head section of your document:

 <link rel="stylesheet" href="https://unpkg.com/[email protected]/dist/leaflet.css"
   integrity="sha512-xodZBNTC5n17Xt2atTPuE1HxjVMSvLVW9ocqUKLsCC5CXdbqCmblAshOMAS6/keqq/sMZMZ19scR4PsZChSR7A=="
   crossorigin=""/>

Include Leaflet JavaScript file after Leaflet’s CSS:

 <!-- Make sure you put this AFTER Leaflet's CSS -->
 <script src="https://unpkg.com/[email protected]/dist/leaflet.js"
   integrity="sha512-XQoYMqMTK8LvdxXYG3nZ448hOEQiglfqkJs1NOQV44cWnUrBc8PkAOcXy20w0vlaXaVUearIOBhiXZ5V3ynxwA=="
   crossorigin=""></script>

It's not that you can't do things like that in R as well. But still, coming from "an R culture", I am used to the feeling that I have a local "hard copy" of every package and piece of data my code relies on. Then, when I ship my code (e.g., when I publish a Shiny app), a snapshot of all required dependencies ships with it so it works as a standalone. I understand the downside in terms of storage space on the server, but my sense is this might be faster and more reliable.

What I'd like to know is whether my understanding of online sourcing and its tradeoffs in javascript is correct, and if so, what the best practices are to address potential shortcomings. In particular:

Do I understand correctly that dependencies like https://unpkg.com/[email protected]/dist/leaflet.js or https://unpkg.com/[email protected]/dist/leaflet.css are reloaded every time I refresh the page?
The page is therefore dependent on those links not breaking, right? Or are there some inner mechanics I am not aware of that avoid this kind of wasteful reloading and risky dependency?
If there are not, do people just live with the risk of links breaking down? Or is it best practice to keep a local copy of scripts like https://unpkg.com/[email protected]/dist/leaflet.js and source them locally instead? Or even, is there yet another best practice, like using a "safer provider" as a source for dependencies (do I understand correctly that this is the role of services like https://www.jsdelivr.com/?)?

CodePudding user response：

Do I understand correctly that dependencies like https://unpkg.com/[email protected]/dist/leaflet.js or https://unpkg.com/[email protected]/dist/leaflet.css are reloaded every time I refresh the page?

No. HTTP clients perform caching.

The page is therefore dependent on those links not breaking, right?

Yes. Here "breaking" includes being blocked by a firewall (a particular problem for users in China, who often find that they can access a website but its JS doesn't work because it is hosted somewhere blocked by the Great Firewall) and the CDN server being taken over by someone malicious.
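
The `integrity` attributes in the snippet from the question are relevant to the malicious-takeover case: the browser hashes the downloaded file and refuses to use it if the hash does not match. A minimal sketch of that check, assuming Node.js and a single hash per attribute (the function names and the stand-in file contents are hypothetical):

```javascript
const { createHash } = require("crypto");

// Sketch: build a Subresource Integrity value ("sha512-<base64 digest>")
// for a file's contents.
function sriHash(content, algorithm = "sha512") {
  const digest = createHash(algorithm).update(content).digest("base64");
  return `${algorithm}-${digest}`;
}

// Compare downloaded contents against the value pinned in the page.
// Assumes the integrity string holds exactly one hash.
function matchesIntegrity(content, integrity) {
  const [algorithm] = integrity.split("-", 1);
  return sriHash(content, algorithm) === integrity;
}

const original = "console.log('library code');"; // stand-in for a real library file
const pinned = sriHash(original);

console.log(matchesIntegrity(original, pinned));                    // true
console.log(matchesIntegrity(original + "/* tampered */", pinned)); // false
```

This only protects against the file being altered; it does nothing if the CDN is unreachable.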

do people just live with the risk of links breaking down?

Yes. Risk is relative though. CDNs are generally selected because the provider is trusted.

The potential benefits include faster access to the JS through the CDN making use of edge servers and the possibility that (for popular libraries, at least) a client will have already cached the data because another site used the same library.

You're also using the CDN host's bandwidth to serve the JS instead of your own, which can be a cost saving.

CodePudding user response：

Do I understand correctly that dependencies like https://unpkg.com/[email protected]/dist/leaflet.js or https://unpkg.com/[email protected]/dist/leaflet.css are reloaded every time I refresh the page?

Yes and no. The important thing here is caching. Browsers will cache resources that have been loaded. Therefore, if a user visits the page and hits refresh over and over, they would only download these resources once, and each reload will use the cached version. So no, they are not reloaded every time.

However, any time the user clears the cache, or a new user arrives without the resource in their cache, the file will be downloaded. Cache expiration in browsers is not entirely predictable, as it is controlled by the user to a large extent. Chances are that if a user visited today and then again next week using the same browser, they would still have the item in their cache. But if their cache is flushed, or they use a different browser or a different machine, or it is an entirely different user who visits, then yes, they would load the resource again.
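
Besides the user's own actions, how long a cached copy may be reused also depends on the `Cache-Control` header the server sends with the file. A rough sketch of reading that header follows; the function name is hypothetical, the header values are illustrative rather than captured from any real CDN, and other caching headers are ignored:

```javascript
// Sketch: how many seconds a response may be reused without contacting
// the server again, judged only by its Cache-Control header value.
function cacheLifetimeSeconds(cacheControl) {
  const directives = cacheControl
    .toLowerCase()
    .split(",")
    .map((d) => d.trim());

  // "no-store" forbids caching; "no-cache" requires revalidation before reuse.
  if (directives.includes("no-store") || directives.includes("no-cache")) {
    return 0;
  }

  const maxAge = directives.find((d) => d.startsWith("max-age="));
  if (!maxAge) return null; // no explicit lifetime given in this header

  const seconds = Number.parseInt(maxAge.slice("max-age=".length), 10);
  return Number.isNaN(seconds) ? null : seconds;
}

// Illustrative values only.
console.log(cacheLifetimeSeconds("public, max-age=31536000")); // 31536000 (one year)
console.log(cacheLifetimeSeconds("no-store"));                 // 0
```

You can see the real header for any resource in the Network tab of your browser's developer tools.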

The page is therefore dependent on those links not breaking, right? Or are there some inner mechanics I am not aware of that avoid this kind of wasteful reloading and risky dependency?

The inner mechanics are the caching described above. However, if a resource link is taken down for whatever reason, then the page cannot use it. This could happen because:

The source of the link is no longer working.
The source is blocked by a firewall or other mechanism, so the user does not have access to it.
The user employs a blocking mechanism, such as an ad blocker or a script blocker extension, which means they have opted into preventing requests for certain resources.

In all these cases the result is similar: the user might have access to the full functionality of the page if they have a cached copy of the resources it needs. Otherwise, they cannot use them. Script files will not be executed, stylesheets will not be applied, images will not show, etc.

The way to fix each of these would be different:

For non-working links you need to find a new source or maybe even host it yourself.
For blocked resources, you might need to find a hosting site that is acceptable for the blocking mechanism. Self-hosting might be an option.
If the user is blocking the scripts, they would most likely need to unblock them. A combination of the two approaches above might also work, though: hosting on a domain not known for ads might avoid the block, and so might self-hosting. In the case of uMatrix at least, the add-on by default blocks all scripts external to the page (with very few exceptions), whereas scripts that come from the same domain are allowed by default.
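
One way to combine an external source with a self-hosted copy is to try them in order and use the first that loads. The following is only a sketch of that idea: both URLs are placeholders, and the function names are made up for illustration.

```javascript
// Sketch: try each source in order and return the first one that loads.
// `load` is passed in so the same logic works in a browser or in a test.
async function loadFirstAvailable(urls, load) {
  const failures = [];
  for (const url of urls) {
    try {
      await load(url);
      return url; // stop at the first source that works
    } catch (err) {
      failures.push(`${url}: ${err.message}`);
    }
  }
  throw new Error(`All sources failed:\n${failures.join("\n")}`);
}

// Browser-side loader: resolves once the <script> element has loaded,
// rejects if the request fails or is blocked.
function loadScriptTag(url) {
  return new Promise((resolve, reject) => {
    const el = document.createElement("script");
    el.src = url;
    el.onload = () => resolve(url);
    el.onerror = () => reject(new Error("failed to load"));
    document.head.appendChild(el);
  });
}

// Usage in a page (both URLs are placeholders):
// loadFirstAvailable(
//   ["https://cdn.example.com/leaflet.js", "/vendor/leaflet.js"],
//   loadScriptTag
// ).then((url) => console.log("loaded from", url));
```

Code that depends on the library has to wait for the returned promise, which is the main cost of this pattern compared with plain `<script>` tags.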

If there are not, do people just live with the risk of links breaking down? Or is it best practice to keep a local copy of scripts like https://unpkg.com/[email protected]/dist/leaflet.js and source them locally instead?

There are essentially two approaches here, each with its strengths and downsides. A quick breakdown:

You can accept externally hosted resources.
- Advantage: There are several big content delivery networks (CDNs) such as unpkg or jsdelivr. They are widely used for their reliability. In addition, some libraries might offer their own CDN; jQuery does that, for example. Getting the resource from a CDN saves you bandwidth and space, and can also improve load speed, since CDNs might be faster than your own hosting. Furthermore, for widely used libraries, the CDN copy may already be cached on the user's side from visiting another site that used the same CDN copy as you, although browsers that partition their cache per site will not reuse it across sites.
- Disadvantage: It does leave you dependent on resources you do not control. Big CDNs are reliable, but you still have no direct control if anything happens. And if you use a smaller CDN (e.g., library-specific or otherwise), then you do not have data about its reliability. If an external link dies, your website will not work until it is fixed, and that might take days: you have to find out about it, figure out what was changed, hopefully find a replacement link, and update the site. If you cannot find a drop-in replacement, you might need to make more changes.
You can host the resources yourself.
- Advantage: You are in complete control of when and how things are stored. You can even process the resources in some way to help with your application. Scripts can be minified, and images can be scaled and resized into several different sizes to optimise the display in different places (e.g., an icon, a small image, and a full-size image).
- Disadvantage: more space taken, more bandwidth taken. Also, now you have to manage all of these resources and make sure they exist, they are available etc.

You can of course also use a mixed approach. Host some resources, use others from an external place. Depends on what you want to do with your application and what level of control you want to retain versus how much extra effort and costs you want.

Having said all that, for a lot of small projects it does not matter much which path is chosen. If you only use a handful of libraries, it matters little whether you use them from a CDN or host them yourself. As long as a reliable CDN provider is chosen, the chance of an outage is acceptably small. If you host the resources yourself, chances are they would take up a few hundred kilobytes (if that).

If your project grows and the list of dependencies you have starts to get bigger and bigger, it might be time to take stock and decide where you host them and how you consume them. There is no single answer to this question; it will likely depend on what you already have. Perhaps your hosting has very little space. Or you pay per megabyte downloaded. In that case, external hosting would make more sense. Or perhaps you have a robust storage option for yourself and you are confident you can ensure the availability of your application, in which case self-hosting might be preferable.