                     GNU Wget Metalink module

   Evaluation of the Metalink/XML and Metalink/HTTP implementations


1. Introduction
***************

This document, and the results contained in it, focus on the
evaluation of the Metalink/XML and Metalink/HTTP implementations.

The "Directory Options" mentioned here are used on the command line
in conjunction with the option '--input-metalink=file' for
Metalink/XML, and '--metalink-over-http' for Metalink/HTTP.

    $ wget --input-metalink=file [directory options]
    $ wget --metalink-over-http url [directory options]

2. Notes
********

Tests for metalink:file names beginning with '/', '~/', './', or
'../' (e.g. "/path/file") shall be run manually due to security
concerns.

3. Metalink files used as reference
***********************************

3.1 Test: metalink:file with "path/file" name format
====================================================

cat > test.meta4 << EOF
543
d37d3965f8e1a7b16504b4273b09c392776b7e4dd17e601256c7b2fd9ce5f56e
0f6ff5cdc15603f1b81227b5a296f001
http://wrongurl.really/gnu/wget/wget-1.18.tar.xz.sig
http://ftpmirror.gnu.org/wget/wget-1.18.tar.xz.sig
http://ftp.gnu.org/gnu/wget/wget-1.18.tar.xz.sig
http://nl.mirror.babylon.network/gnu/wget/wget-1.18.tar.xz.sig
EOF

4. `wget --input-metalink=test.meta4`
*************************************

4.1 Implemented safety features
===============================

Any metalink:file name parsed from a Metalink/XML file that contains
an absolute, relative, or home path (see '2. Notes') is rejected.
This is a libmetalink design decision, implemented in the function
metalink_check_safe_path(). This feature shall not be modified.

All of the above conforms to RFC 5854.

References:
  https://tools.ietf.org/html/rfc5854#section-4.1.2.1
  https://tools.ietf.org/html/rfc5854#section-4.2.8.3

4.2 File download behaviour
===========================

When a Metalink/XML file is parsed:
  1. create the metalink:file "path/file" tree;
  2. download the metalink:url file as "path/file";
  3. verify the "path/file" size, if declared;
  4. verify the "path/file" checksum.

All of the above conforms to RFC 5854.

References:
  https://tools.ietf.org/html/rfc5854

4.3 Questionable behaviours
===========================

If several metalink:file elements are identical, wget downloads all
of them.

5. `wget --metalink-over-http`
******************************

5.1 Implemented safety features
===============================

The function url_file_name() is responsible for parsing the URL's
file name and applying the "Directory Options" given on the command
line.

The use of libmetalink's metalink_check_safe_path() should not be
necessary (see '4.1 Implemented safety features').

All of the above conforms to Wget's usual download behaviour.

References:
  wget(1)

5.2 File download behaviour
===========================

When a Metalink/HTTP header is parsed:
  1. extract metalink metadata from the header;
  2. download the file from the mirror with the highest priority;
  3. verify the file's size, if declared;
  4. verify the file's checksum.

All of the above conforms to Wget's usual download behaviour and to
RFC 6249.

References:
  wget(1)
  https://tools.ietf.org/html/rfc6249

6. Directory Options
********************

'-nd'
'--no-directories'
    Does not apply to Metalink/XML files (aka --input-metalink=file).

    Applies to Metalink/HTTP URLs as described in the Wget manual,
    see wget(1). The target URL is the URL given on the command line.

'-x'
'--force-directories'
    Does not apply to Metalink/XML files (aka --input-metalink=file).

    Applies to Metalink/HTTP URLs as described in the Wget manual,
    see wget(1). The target URL is the URL given on the command line.

'-nH'
'--no-host-directories'
    Does not apply to Metalink/XML files (aka --input-metalink=file).

    Applies to Metalink/HTTP URLs as described in the Wget manual,
    see wget(1). The target URL is the URL given on the command line.

'--protocol-directories'
    Does not apply to Metalink/XML files (aka --input-metalink=file).

    Applies to Metalink/HTTP URLs as described in the Wget manual,
    see wget(1). The target URL is the URL given on the command line.

'--cut-dirs=number'
    Does not apply to Metalink/XML files (aka --input-metalink=file).

    Applies to Metalink/HTTP URLs as described in the Wget manual,
    see wget(1). The target URL is the URL given on the command line.

'-P prefix'
'--directory-prefix=prefix'
    Sets the top of the retrieval tree to prefix for both Metalink/XML
    and Metalink/HTTP downloads, see wget(1).

    If combining the prefix with the file name results in an absolute,
    relative, or home path, the directory components are stripped and
    only the basename is used. See '4.1 Implemented safety features'.
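
The '-P prefix' rule above can be illustrated with a minimal sketch.
It is only one literal reading of this document's wording, not Wget's
or libmetalink's actual code: the names UNSAFE_PREFIXES, is_unsafe()
and resolve_target() are hypothetical, the list of unsafe prefixes is
taken from '2. Notes', and the real metalink_check_safe_path() performs
its own checks, which are not reproduced here.

```python
import posixpath

# Prefixes this document treats as unsafe (see '2. Notes').
# Assumption: the real libmetalink check may cover more cases.
UNSAFE_PREFIXES = ("/", "~/", "./", "../")


def is_unsafe(path):
    """Return True if path begins with an absolute, home, or relative marker."""
    return path.startswith(UNSAFE_PREFIXES)


def resolve_target(prefix, name):
    """Hypothetical model of how '-P prefix' combines with a metalink:file name.

    Returns None when the name itself is rejected (section 4.1),
    otherwise the path the file would be saved under.
    """
    # Section 4.1: unsafe metalink:file names are rejected outright.
    if is_unsafe(name):
        return None

    # Combine the prefix with the name; an empty prefix leaves the name as is.
    combined = posixpath.join(prefix, name) if prefix else name

    # Section 6, '-P': if the combined path is absolute, relative, or a
    # home path, strip the directory components and keep the basename.
    if is_unsafe(combined):
        return posixpath.basename(name)

    return combined
```

Under these assumptions, resolve_target("downloads", "path/file") gives
"downloads/path/file", while a prefix of "../out" falls back to the
basename "file", and a name such as "/etc/passwd" is rejected.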