wget/doc/metalink.txt

166 lines
5.0 KiB
Plaintext
Raw Permalink Normal View History

GNU Wget Metalink module
Evaluation of the Metalink/XML and Metalink/HTTP implementations
1. Introduction
***************
This document, and the results contained in it, is focused over the
evaluation of the Metalink/XML and Metalink/HTTP implementations.
The "Directory Options" mentioned here are used on the command line in
conjunction with the option '--input-metalink=file' for Metalink/XML,
and '--metalink-over-http' for Metalink/HTTP.
$ wget --input-metalink=<file> [directory options]
$ wget --metalink-over-http [directory options] <url>
2. Notes
********
Tests for metalink:file names beginning with '/', '~/', './', or '../'
(e.g. "/path/file") shall be run manually due to security concerns.
3. Metalink files used as reference
***********************************
3.1 Test: metalink:file with "path/file" name format
====================================================
cat > test.meta4 << EOF
<?xml version="1.0" encoding="UTF-8"?>
<metalink xmlns="urn:ietf:params:xml:ns:metalink">
<file name="path/file">
<size>543</size>
<hash type="sha256">d37d3965f8e1a7b16504b4273b09c392776b7e4dd17e601256c7b2fd9ce5f56e</hash>
<hash type="md5">0f6ff5cdc15603f1b81227b5a296f001</hash>
<url>http://wrongurl.really/gnu/wget/wget-1.18.tar.xz.sig</url>
<url>http://ftpmirror.gnu.org/wget/wget-1.18.tar.xz.sig</url>
<url>http://ftp.gnu.org/gnu/wget/wget-1.18.tar.xz.sig</url>
<url>http://nl.mirror.babylon.network/gnu/wget/wget-1.18.tar.xz.sig</url>
</file>
</metalink>
EOF
4. `wget --input-metalink=test.meta4`
*************************************
4.1 Implemented safety features
===============================
Any metalink:file name containing an absolute, relative, or home path
(see '2. Notes') parsed from Metalink/XML files is rejected.
This is a libmetalink's design decision implemented in the function
metalink_check_safe_path(). This feature shall not be modified.
All the above conform to the RFC5854 standard.
References:
https://tools.ietf.org/html/rfc5854#section-4.1.2.1
https://tools.ietf.org/html/rfc5854#section-4.2.8.3
4.2 File download behaviour
===========================
When a Metalink/XML file is parsed:
1. create the metalink:file "path/file" tree;
2. download the metalink:url file as "path/file";
3. verify the "path/file" size, if declared;
4. verify the "path/file" checksum.
All the above conform to the RFC5854 standard.
References:
https://tools.ietf.org/html/rfc5854
4.3 Questionable behaviours
===========================
If more metalink:file elements are the same, wget downloads them all.
5. `wget --metalink-over-http`
******************************
5.1 Implemented safety features
===============================
The function url_file_name() is responsible of parsing the url's file
name and mixing in the "Directory Options" wrote on the command line.
The use of libmetalink's metalink_check_safe_path() shouldn't be
necessary (see '4.1 Implemented safety features').
All the above comform to the usual Wget's download behaviour.
References:
wget(1)
5.2 File download behaviour
===========================
When a Metalink/HTTP header is parsed:
1. extract metalink metadata from the header;
2. download the file from the mirror with the highest priority;
3. verify the file's size, if declared;
4. verify the file's checksum.
All the above comform to the usual Wget's download behaviour and to
the RFC6249 standard.
References:
wget(1)
https://tools.ietf.org/html/rfc6249
6. Directory Options
********************
'-nd'
'--no-directories'
Do not apply to Metalink/XML files (aka --input-metalink=<file>).
Apply to Metalink/HTTP urls as described in the Wget's manual, see
wget(1). The target url is the url wrote on the command line.
'-x'
'--force-directories'
Do not apply to Metalink/XML files (aka --input-metalink=<file>).
Apply to Metalink/HTTP urls as described in the Wget's manual, see
wget(1). The target url is the url wrote on the command line.
'-nH'
'--no-host-directories'
Do not apply to Metalink/XML files (aka --input-metalink=<file>).
Apply to Metalink/HTTP urls as described in the Wget's manual, see
wget(1). The target url is the url wrote on the command line.
'--protocol-directories'
Do not apply to Metalink/XML files (aka --input-metalink=<file>).
Apply to Metalink/HTTP urls as described in the Wget's manual, see
wget(1). The target url is the url wrote on the command line.
'--cut-dirs=number'
Do not apply to Metalink/XML files (aka --input-metalink=<file>).
Apply to Metalink/HTTP urls as described in the Wget's manual, see
wget(1). The target url is the url wrote on the command line.
'-P prefix'
'--directory-prefix=prefix'
Set the top of the retrieval tree to prefix for both Metalink/XML
and Metalink/HTTP downloads, see wget(1).
If combining the prefix with the file name results in an absolute,
relative, or home path, the directory components are stripped and
only the basename is used. See '4.1 Implemented safety features'.