                        GNU Wget Metalink module

    Evaluation of the Metalink/XML and Metalink/HTTP implementations


1. Introduction
***************

This document and the results it contains focus on the evaluation of
the Metalink/XML and Metalink/HTTP implementations.

The "Directory Options" mentioned here are used on the command line
together with the option '--input-metalink=file' for Metalink/XML, and
'--metalink-over-http' for Metalink/HTTP.

    $ wget --input-metalink=file [directory options]

    $ wget --metalink-over-http url [directory options]


2. Notes
********

Tests for metalink:file names beginning with '/', '~/', './', or '../'
(e.g. "/path/file") shall be run manually due to security concerns.


3. Metalink files used as reference
***********************************

3.1 Test: metalink:file with "path/file" name format
====================================================

cat > test.meta4 << EOF
543
d37d3965f8e1a7b16504b4273b09c392776b7e4dd17e601256c7b2fd9ce5f56e
0f6ff5cdc15603f1b81227b5a296f001
http://wrongurl.really/gnu/wget/wget-1.18.tar.xz.sig
http://ftpmirror.gnu.org/wget/wget-1.18.tar.xz.sig
http://ftp.gnu.org/gnu/wget/wget-1.18.tar.xz.sig
http://nl.mirror.babylon.network/gnu/wget/wget-1.18.tar.xz.sig
EOF


4. `wget --input-metalink=test.meta4`
*************************************

4.1 Implemented safety features
===============================

Any metalink:file name parsed from a Metalink/XML file that contains an
absolute, relative, or home path (see '2. Notes') is rejected. This is
a libmetalink design decision, implemented in the function
metalink_check_safe_path(). This feature shall not be modified.

All of the above conforms to RFC 5854.

References:
    https://tools.ietf.org/html/rfc5854#section-4.1.2.1
    https://tools.ietf.org/html/rfc5854#section-4.2.8.3

4.2 File download behaviour
===========================

When a Metalink/XML file is parsed, wget does the following:

    1. create the metalink:file "path/file" tree;
    2. download the metalink:url file as "path/file";
    3. verify the "path/file" size, if declared;
    4. verify the "path/file" checksum.

All of the above conforms to RFC 5854.

References:
    https://tools.ietf.org/html/rfc5854

4.3 Questionable behaviours
===========================

If several metalink:file elements are identical, wget downloads all of
them.


5. `wget --metalink-over-http`
******************************

5.1 Implemented safety features
===============================

The function url_file_name() is responsible for parsing the URL's file
name and applying the "Directory Options" given on the command line.
Because the local name is derived from the URL in this way, the use of
libmetalink's metalink_check_safe_path() should not be necessary (see
'4.1 Implemented safety features').

All of the above conforms to Wget's usual download behaviour.

References:
    wget(1)

5.2 File download behaviour
===========================

When a Metalink/HTTP header is parsed, wget does the following:

    1. extract the metalink metadata from the header;
    2. download the file from the mirror with the highest priority;
    3. verify the file's size, if declared;
    4. verify the file's checksum.

All of the above conforms to Wget's usual download behaviour and to
RFC 6249.

References:
    wget(1)
    https://tools.ietf.org/html/rfc6249


6. Directory Options
********************

'-nd'
'--no-directories'

    Does not apply to Metalink/XML files (aka --input-metalink=).

    Applies to Metalink/HTTP URLs as described in the Wget manual, see
    wget(1). The target URL is the URL given on the command line.

'-x'
'--force-directories'

    Does not apply to Metalink/XML files (aka --input-metalink=).

    Applies to Metalink/HTTP URLs as described in the Wget manual, see
    wget(1). The target URL is the URL given on the command line.

'-nH'
'--no-host-directories'

    Does not apply to Metalink/XML files (aka --input-metalink=).

    Applies to Metalink/HTTP URLs as described in the Wget manual, see
    wget(1). The target URL is the URL given on the command line.

'--protocol-directories'

    Does not apply to Metalink/XML files (aka --input-metalink=).

    Applies to Metalink/HTTP URLs as described in the Wget manual, see
    wget(1). The target URL is the URL given on the command line.

'--cut-dirs=number'

    Does not apply to Metalink/XML files (aka --input-metalink=).

    Applies to Metalink/HTTP URLs as described in the Wget manual, see
    wget(1). The target URL is the URL given on the command line.

'-P prefix'
'--directory-prefix=prefix'

    Does not apply to Metalink/XML files (aka --input-metalink=).

    Applies to Metalink/HTTP downloads. The directory prefix is the
    directory under which all other files and subdirectories are saved,
    see wget(1).
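
As a rough aid for checking the Metalink/HTTP results against the
options listed above, the following shell sketch predicts the local
path for a single URL. The function predict_path is hypothetical and is
not part of Wget; it does not reproduce url_file_name(). It assumes a
plain URL of the form scheme://host/dirs/file with no query string, and
it models the options as wget(1) describes them for a single,
non-recursive download: directories are created only with '-x', and
'-nH', '--protocol-directories' and '--cut-dirs' act on that hierarchy.

```shell
#!/bin/sh
# Hypothetical helper (not part of Wget): predict the local path of a
# single download for a subset of the directory options.
# Usage: predict_path URL [directory options]
# The body runs in a subshell so no variables leak into the caller.
predict_path() (
    url=$1; shift
    force=0 nohost=0 proto=0 cut=0 prefix=
    while [ $# -gt 0 ]; do
        case $1 in
            -x|--force-directories)    force=1 ;;
            -nd|--no-directories)      force=0 ;;
            -nH|--no-host-directories) nohost=1 ;;
            --protocol-directories)    proto=1 ;;
            --cut-dirs=*)              cut=${1#--cut-dirs=} ;;
            --directory-prefix=*)      prefix=${1#--directory-prefix=} ;;
            -P) if [ $# -gt 1 ]; then prefix=$2; shift; fi ;;
        esac
        shift
    done

    # Split the URL (assumed form: scheme://host/dirs/file).
    scheme=${url%%://*}
    rest=${url#*://}
    host=${rest%%/*}
    path=${rest#*/}
    file=${path##*/}
    case $path in
        */*) dirs=${path%/*} ;;
        *)   dirs= ;;
    esac

    out=
    if [ "$force" -eq 1 ]; then
        if [ "$proto" -eq 1 ]; then out=$scheme; fi
        if [ "$nohost" -eq 0 ]; then out=${out:+$out/}$host; fi
        # Drop the first $cut remote directory components.
        i=0
        while [ "$i" -lt "$cut" ] && [ -n "$dirs" ]; do
            case $dirs in
                */*) dirs=${dirs#*/} ;;
                *)   dirs= ;;
            esac
            i=$((i + 1))
        done
        if [ -n "$dirs" ]; then out=${out:+$out/}$dirs; fi
    fi
    out=${out:+$out/}$file
    if [ -n "$prefix" ]; then out=$prefix/$out; fi
    printf '%s\n' "$out"
)

u=http://ftp.gnu.org/gnu/wget/wget-1.18.tar.xz.sig
predict_path "$u"                       # wget-1.18.tar.xz.sig
predict_path "$u" -x                    # ftp.gnu.org/gnu/wget/wget-1.18.tar.xz.sig
predict_path "$u" -x -nH --cut-dirs=1   # wget/wget-1.18.tar.xz.sig
predict_path "$u" -P downloads          # downloads/wget-1.18.tar.xz.sig
```

The predicted path is only an expectation to compare with what
`wget --metalink-over-http` actually writes. For Metalink/XML input no
such prediction applies, since the directory options are ignored there.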