wget/doc/metalink.txt
Matthew White 43ec7008f2 Enforce Metalink file name verification, strip directory if necessary
* NEWS: Mention the use of a safe Metalink destination path
* src/metalink.h: Add declaration of functions get_metalink_basename(),
  last_component(), metalink_check_safe_path()
* src/metalink.c: Add directive #include "dosname.h"
* src/metalink.c: Add function get_metalink_basename() to return the
  basename of a file name, strip w32's drive letter prefixes
* src/metalink.c (retrieve_from_metalink): Enforce Metalink file name
  verification, if the file name is unsafe try its basename
* doc/metalink.txt: Update document. Explain --directory-prefix

The function get_metalink_basename() uses FILE_SYSTEM_PREFIX_LEN to
catch any 'C:D:file' (w32 environment), then it removes each drive
letter prefix, i.e. 'C:' and 'D:'.

Unsafe file names contain an absolute, relative, or home path.  Safe
paths can be verified by libmetalink's metalink_check_safe_path().
2016-09-30 19:44:03 +02:00

166 lines
5.0 KiB
Plaintext

GNU Wget Metalink module
Evaluation of the Metalink/XML and Metalink/HTTP implementations
1. Introduction
***************
This document, and the results contained in it, is focused over the
evaluation of the Metalink/XML and Metalink/HTTP implementations.
The "Directory Options" mentioned here are used on the command line in
conjunction with the option '--input-metalink=file' for Metalink/XML,
and '--metalink-over-http' for Metalink/HTTP.
$ wget --input-metalink=<file> [directory options]
$ wget --metalink-over-http [directory options] <url>
2. Notes
********
Tests for metalink:file names beginning with '/', '~/', './', or '../'
(e.g. "/path/file") shall be run manually due to security concerns.
3. Metalink files used as reference
***********************************
3.1 Test: metalink:file with "path/file" name format
====================================================
cat > test.meta4 << EOF
<?xml version="1.0" encoding="UTF-8"?>
<metalink xmlns="urn:ietf:params:xml:ns:metalink">
<file name="path/file">
<size>543</size>
<hash type="sha256">d37d3965f8e1a7b16504b4273b09c392776b7e4dd17e601256c7b2fd9ce5f56e</hash>
<hash type="md5">0f6ff5cdc15603f1b81227b5a296f001</hash>
<url>http://wrongurl.really/gnu/wget/wget-1.18.tar.xz.sig</url>
<url>http://ftpmirror.gnu.org/wget/wget-1.18.tar.xz.sig</url>
<url>http://ftp.gnu.org/gnu/wget/wget-1.18.tar.xz.sig</url>
<url>http://nl.mirror.babylon.network/gnu/wget/wget-1.18.tar.xz.sig</url>
</file>
</metalink>
EOF
4. `wget --input-metalink=test.meta4`
*************************************
4.1 Implemented safety features
===============================
Any metalink:file name containing an absolute, relative, or home path
(see '2. Notes') parsed from Metalink/XML files is rejected.
This is a libmetalink's design decision implemented in the function
metalink_check_safe_path(). This feature shall not be modified.
All the above conform to the RFC5854 standard.
References:
https://tools.ietf.org/html/rfc5854#section-4.1.2.1
https://tools.ietf.org/html/rfc5854#section-4.2.8.3
4.2 File download behaviour
===========================
When a Metalink/XML file is parsed:
1. create the metalink:file "path/file" tree;
2. download the metalink:url file as "path/file";
3. verify the "path/file" size, if declared;
4. verify the "path/file" checksum.
All the above conform to the RFC5854 standard.
References:
https://tools.ietf.org/html/rfc5854
4.3 Questionable behaviours
===========================
If more metalink:file elements are the same, wget downloads them all.
5. `wget --metalink-over-http`
******************************
5.1 Implemented safety features
===============================
The function url_file_name() is responsible of parsing the url's file
name and mixing in the "Directory Options" wrote on the command line.
The use of libmetalink's metalink_check_safe_path() shouldn't be
necessary (see '4.1 Implemented safety features').
All the above comform to the usual Wget's download behaviour.
References:
wget(1)
5.2 File download behaviour
===========================
When a Metalink/HTTP header is parsed:
1. extract metalink metadata from the header;
2. download the file from the mirror with the highest priority;
3. verify the file's size, if declared;
4. verify the file's checksum.
All the above comform to the usual Wget's download behaviour and to
the RFC6249 standard.
References:
wget(1)
https://tools.ietf.org/html/rfc6249
6. Directory Options
********************
'-nd'
'--no-directories'
Do not apply to Metalink/XML files (aka --input-metalink=<file>).
Apply to Metalink/HTTP urls as described in the Wget's manual, see
wget(1). The target url is the url wrote on the command line.
'-x'
'--force-directories'
Do not apply to Metalink/XML files (aka --input-metalink=<file>).
Apply to Metalink/HTTP urls as described in the Wget's manual, see
wget(1). The target url is the url wrote on the command line.
'-nH'
'--no-host-directories'
Do not apply to Metalink/XML files (aka --input-metalink=<file>).
Apply to Metalink/HTTP urls as described in the Wget's manual, see
wget(1). The target url is the url wrote on the command line.
'--protocol-directories'
Do not apply to Metalink/XML files (aka --input-metalink=<file>).
Apply to Metalink/HTTP urls as described in the Wget's manual, see
wget(1). The target url is the url wrote on the command line.
'--cut-dirs=number'
Do not apply to Metalink/XML files (aka --input-metalink=<file>).
Apply to Metalink/HTTP urls as described in the Wget's manual, see
wget(1). The target url is the url wrote on the command line.
'-P prefix'
'--directory-prefix=prefix'
Set the top of the retrieval tree to prefix for both Metalink/XML
and Metalink/HTTP downloads, see wget(1).
If combining the prefix with the file name results in an absolute,
relative, or home path, the directory components are stripped and
only the basename is used. See '4.1 Implemented safety features'.