[pkg-discuss] The future of the pkg mirror
- From: Erik Trauschke <
- Subject: [pkg-discuss] The future of the pkg mirror
- Date: Tue, 30 Apr 2013 09:07:09 -0700
Lately we had some talks with other groups, and the issue of properly
supporting a content delivery network (CDN) came up again.
At the moment it is possible to do that, but it's kind of a hack. One
can configure the CDN as a pkg mirror in the publisher setup, and if
(and only if) the CDN is faster than the main origin, the file content
is retrieved from it.
Another issue is that "pkg mirror" is a rather confusing term, since
one would expect a mirror to be a complete alternate repo, which it is
not. I think that is the main reason why Shawn was contemplating getting
rid of it altogether.
Pkg should support a proper way of offloading pkg content to a CDN
though, since this will make pkg downloads faster for a lot of people
who do not live in the US. It also allows transporting bulk data
unencrypted (and therefore faster), while the corresponding metadata
will still be transmitted encrypted and secure.
So I'm proposing the following:
- provide a mechanism to let a pkg client download from an alternate source
- the mechanism shall be invisible to the user, so no additional setup
of source URIs in set-publisher shall be necessary. Also, the output of
'pkg publisher' shall not be cluttered by it.
- The only data on the alternate source shall be bulk data. I think at
the moment that would only be content under file/1. Metadata shall only
come from an origin.
- The pkg client shall try to retrieve bulk data from the alternate
source only. However, if the client encounters too many errors, it will
fall back to downloading bulk data from the origin.
- The pkg client shall support a list of alternate sources and decide
which one to download from by determining which is fastest.
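
To make the split concrete, here is a rough Python sketch of how the
client could route a request. This is only an illustration under my own
assumptions: the function names, the speeds dictionary and the example
URIs are hypothetical and not part of the current pkg code.

```python
# Hypothetical sketch; none of these names exist in the pkg code base.

# Per the proposal, only file content (file/1) lives on alternate sources.
BULK_PREFIX = "file/1/"


def is_bulk(path):
    """Return True if the request path refers to bulk file content."""
    return path.startswith(BULK_PREFIX)


def fastest(sources, speeds):
    """Return the source with the highest measured throughput.

    speeds maps a source URI to bytes per second; a source without a
    measurement counts as 0.0.
    """
    return max(sources, key=lambda s: speeds.get(s, 0.0))


def choose_source(path, origin, alt_sources, speeds):
    """Pick where to send a request.

    Bulk data goes to the fastest alternate source if any are known;
    everything else (metadata) always goes to the origin.
    """
    if is_bulk(path) and alt_sources:
        return fastest(alt_sources, speeds)
    return origin
```

With two alternate sources, choose_source("file/1/<hash>", ...) returns
whichever one has the higher measured throughput, and any path outside
file/1 returns the origin.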
I think it shouldn't take that much effort to add this functionality
into the current pkg client because most of it is already there in one
form or another. What I'm thinking of is putting a new property in the
publisher object. In particular that would be a list called
content_source (or alternate_file_source, alt_file_src, whatever, let
the bikeshedding begin).
The pkg client will get this list from the repo and use the entries to
determine the fastest one to download its data from. However, contrary
to the current behavior, it will not consider the origin for downloading.
If the client encounters a lot of errors from all alternate sources, it
will at some point ignore them and fall back to retrieving content
only from the origin.
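
The fallback could look roughly like the following Python sketch. It is
an assumption-laden illustration, not the actual transport code: the
class name, its methods and the max_errors threshold are all made up
here, and "a lot of errors" is modeled simply as a per-source error
count.

```python
# Hypothetical sketch; the class and the threshold are assumptions.


class BulkSourceSelector:
    """Track alternate sources and fall back to the origin on errors."""

    def __init__(self, origin, alt_sources, max_errors=3):
        self.origin = origin
        self.max_errors = max_errors  # assumed cutoff for "too many"
        self.errors = {s: 0 for s in alt_sources}
        self.speeds = {s: 0.0 for s in alt_sources}

    def record_success(self, source, bytes_per_sec):
        """Remember the latest measured throughput of a source."""
        if source in self.speeds:
            self.speeds[source] = bytes_per_sec

    def record_error(self, source):
        """Count a failed transfer against an alternate source."""
        if source in self.errors:
            self.errors[source] += 1

    def usable(self):
        """Alternate sources that have not yet hit the error cutoff."""
        return [s for s, n in self.errors.items() if n < self.max_errors]

    def pick(self):
        """Return the fastest usable alternate source.

        The origin is used for bulk data only once every alternate
        source has been ruled out.
        """
        candidates = self.usable()
        if not candidates:
            return self.origin
        return max(candidates, key=lambda s: self.speeds[s])
```

Note that the origin never competes on speed here; it is chosen only
when no alternate source is left, which is the difference from the
current mirror behavior.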
This way we'd have a supported, simple mechanism that is transparent to
the user for deploying the split model and offloading some of the
traffic to CDNs like Akamai.
Let me know what you guys think.