wget/doc/metalink.txt
Matthew White 96554861f9 Bugfix: Fix NULL filename and output_stream in Metalink module
* NEWS: Mention the Metalink "path/file" name format handling
* src/metalink.c (retrieve_from_metalink): Fix NULL filename, set
  filename to the right "path/file" value
* src/metalink.c (retrieve_from_metalink): Fix NULL output_stream, set
  output_stream to filename when it is created by retrieve_url()
* src/metalink.c (retrieve_from_metalink): Add RFC5854 comments about
  proper metalink:file "path/file" name format handling
* doc/metalink.txt: Update document. Remove resolved bugs

If unique_create() cannot create/open the destination file, filename
and output_stream remain NULL. If fopen() is used instead, filename
always remains NULL. Neither function can create "path/file" trees.

Setting filename to the right value is sufficient to prevent the
SIGSEGV caused by testing a NULL value. This also allows retrieve_url()
to create a "path/file" tree through opt.output_document.

Reading NULL as output_stream when it should not be NULL leads to wrong
results. For instance, a non-NULL output_stream indicates that a stream
was interrupted; reading NULL instead means assuming the contrary.

This patch conforms to the RFC5854 specification:
  The Metalink Download Description Format
  4.1.2.1.  The "name" Attribute
  https://tools.ietf.org/html/rfc5854#section-4.1.2.1
2016-09-27 20:17:08 +02:00


GNU Wget Metalink module
Evaluation of the Metalink/XML and Metalink/HTTP implementations
1. Introduction
***************
This document, and the results contained in it, focus on the
evaluation of the Metalink/XML and Metalink/HTTP implementations.
The "Directory Options" mentioned here are used on the command line in
conjunction with the option '--input-metalink=file' for Metalink/XML,
and '--metalink-over-http' for Metalink/HTTP.
$ wget --input-metalink=<file> [directory options]
$ wget --metalink-over-http [directory options] <url>
2. Notes
********
Tests for metalink:file names beginning with '/', '~/', './', or '../'
(e.g. "/path/file") shall be run manually due to security concerns.
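For manual runs, one Metalink/XML file per unsafe name can be generated
with a loop like the one below. This is only a sketch: the scratch
directory and the "unsafe-N.meta4" file names are assumptions made for
illustration, and the URL is borrowed from the reference file in 3.1.

```shell
#!/bin/sh
# Write one Metalink/XML file per unsafe metalink:file name into a
# scratch directory. Directory and file names are illustrative only.
outdir=$(mktemp -d) || exit 1
n=0
for name in "/path/file" "~/path/file" "./path/file" "../path/file"; do
  n=$((n + 1))
  cat > "$outdir/unsafe-$n.meta4" << EOF
<?xml version="1.0" encoding="UTF-8"?>
<metalink xmlns="urn:ietf:params:xml:ns:metalink">
<file name="$name">
<url>http://ftp.gnu.org/gnu/wget/wget-1.18.tar.xz.sig</url>
</file>
</metalink>
EOF
done
echo "test files written to $outdir"
```

Each generated file can then be passed by hand to
'wget --input-metalink=<file>' from a scratch directory.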
3. Metalink files used as reference
***********************************
3.1 Test: metalink:file with "path/file" name format
====================================================
cat > test.meta4 << EOF
<?xml version="1.0" encoding="UTF-8"?>
<metalink xmlns="urn:ietf:params:xml:ns:metalink">
<file name="path/file">
<size>543</size>
<hash type="sha256">d37d3965f8e1a7b16504b4273b09c392776b7e4dd17e601256c7b2fd9ce5f56e</hash>
<hash type="md5">0f6ff5cdc15603f1b81227b5a296f001</hash>
<url>http://wrongurl.really/gnu/wget/wget-1.18.tar.xz.sig</url>
<url>http://ftpmirror.gnu.org/wget/wget-1.18.tar.xz.sig</url>
<url>http://ftp.gnu.org/gnu/wget/wget-1.18.tar.xz.sig</url>
<url>http://nl.mirror.babylon.network/gnu/wget/wget-1.18.tar.xz.sig</url>
</file>
</metalink>
EOF
4. `wget --input-metalink=test.meta4`
*************************************
4.1 Implemented safety features
===============================
Any metalink:file name containing an absolute, relative, or home path
(see '2. Notes') parsed from Metalink/XML files is rejected.
This is a libmetalink design decision, implemented in the function
metalink_check_safe_path(). This feature shall not be modified.
All of the above conforms to the RFC5854 standard.
References:
https://tools.ietf.org/html/rfc5854#section-4.1.2.1
https://tools.ietf.org/html/rfc5854#section-4.2.8.3
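The prefix rule from '2. Notes' can be approximated in the shell as
follows. The helper is_unsafe_name is hypothetical and is not the code
of metalink_check_safe_path(); the real libmetalink check may cover
more cases than the four prefixes tested here.

```shell
#!/bin/sh
# Return 0 (true) when a metalink:file name starts with one of the
# prefixes listed in '2. Notes'. Hypothetical helper, not libmetalink code.
is_unsafe_name() {
  case "$1" in
    /*|"~/"*|./*|../*) return 0 ;;   # "~/" is quoted to avoid tilde expansion
    *) return 1 ;;
  esac
}

for name in "path/file" "/path/file" "~/path/file" "./path/file" "../path/file"; do
  if is_unsafe_name "$name"; then
    echo "reject: $name"
  else
    echo "accept: $name"
  fi
done
```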
4.2 File download behaviour
===========================
When a Metalink/XML file is parsed:
1. create the metalink:file "path/file" tree;
2. download the metalink:url file as "path/file";
3. verify the "path/file" checksum.
All of the above conforms to the RFC5854 standard.
References:
https://tools.ietf.org/html/rfc5854
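The three steps can be mimicked by hand in the shell. The sketch below
is not Wget's code: it assumes a hypothetical local file "source.bin"
standing in for the metalink:url resource (so cp replaces the actual
download), and it assumes sha256sum from GNU coreutils is available.

```shell
#!/bin/sh
# Work in a scratch directory; all names here are illustrative.
cd "$(mktemp -d)" || exit 1
printf 'example payload\n' > source.bin           # stand-in for the remote file
expected=$(sha256sum source.bin | cut -d' ' -f1)  # stand-in for <hash type="sha256">
name="path/file"                                  # metalink:file name

# 1. create the "path/file" tree
mkdir -p "$(dirname "$name")"
# 2. "download" the resource as "path/file" (cp stands in for the transfer)
cp source.bin "$name"
# 3. verify the "path/file" checksum
actual=$(sha256sum "$name" | cut -d' ' -f1)
if [ "$actual" = "$expected" ]; then
  echo "checksum OK: $name"
else
  echo "checksum mismatch: $name" >&2
fi
```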
4.3 Questionable behaviours
===========================
If several metalink:file elements are identical, wget downloads all of
them.
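Repeated entries can be spotted before running wget. The sketch below
compares metalink:file names only, which is a simplification, and it is
line-oriented rather than a real XML parser: it assumes one
<file name="..."> start tag per line, as in the reference file of 3.1.
The helper duplicate_names is hypothetical.

```shell
#!/bin/sh
# Print metalink:file names that occur more than once in a Metalink/XML file.
# Line-oriented sketch: assumes one <file name="..."> start tag per line.
duplicate_names() {
  sed -n 's/.*<file name="\([^"]*\)".*/\1/p' "$1" | sort | uniq -d
}

tmp=$(mktemp) || exit 1
cat > "$tmp" << EOF
<metalink xmlns="urn:ietf:params:xml:ns:metalink">
<file name="path/file"></file>
<file name="other/file"></file>
<file name="path/file"></file>
</metalink>
EOF
duplicate_names "$tmp"
```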
4.4 Bugs
========
The download succeeds even when the metalink:file size is wrong.
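Until the size is checked by wget itself, a downloaded file can be
compared with the declared size by hand. The helper size_matches below
is hypothetical and only illustrates the comparison; the 5-byte sample
file is made up, while 543 is the size declared in 3.1.

```shell
#!/bin/sh
# Compare a file's byte count with an expected metalink:file size.
# Hypothetical helper for manual checks; not part of Wget.
size_matches() {
  actual=$(( $(wc -c < "$1") ))   # arithmetic expansion strips padding from wc
  [ "$actual" -eq "$2" ]
}

tmp=$(mktemp) || exit 1
printf '12345' > "$tmp"           # 5-byte sample file
if size_matches "$tmp" 5; then echo "size OK"; fi
if ! size_matches "$tmp" 543; then echo "size mismatch"; fi
```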
5. `wget --metalink-over-http`
******************************
5.1 Implemented safety features
===============================
The function url_file_name() is responsible for parsing the URL's file
name and mixing in the "Directory Options" written on the command line.
The use of libmetalink's metalink_check_safe_path() shouldn't be
necessary (see '4.1 Implemented safety features').
All of the above conforms to Wget's usual download behaviour.
References:
wget(1)
5.2 File download behaviour
===========================
When a Metalink/HTTP header is parsed:
1. extract metalink metadata from the header;
2. download the file from the mirror with the highest priority;
3. verify the file's checksum.
All of the above conforms to Wget's usual download behaviour and to
the RFC6249 standard.
References:
wget(1)
https://tools.ietf.org/html/rfc6249
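Mirror selection can be illustrated with a small filter over the Link
headers. This is a sketch, not Wget's parser: the helper best_mirror
and the sample example.com headers are made up, and treating a missing
"pri" as 999999 is an assumption. In RFC6249 a lower "pri" value means
a higher priority.

```shell
#!/bin/sh
# Pick the rel=duplicate mirror with the lowest "pri" value, which
# RFC6249 treats as the highest priority. Sample headers are made up.
best_mirror() {
  awk '
    /^Link:/ && /rel=duplicate/ {
      url = $0; sub(/^[^<]*</, "", url); sub(/>.*/, "", url)
      pri = 999999                      # assumed default when pri is absent
      if (match($0, /pri=[0-9]+/))
        pri = substr($0, RSTART + 4, RLENGTH - 4) + 0
      if (best == "" || pri < bestpri) { best = url; bestpri = pri }
    }
    END { if (best != "") print best }
  '
}

best_mirror << EOF
Link: <http://mirror2.example.com/file.ext>; rel=duplicate; pri=2
Link: <http://mirror1.example.com/file.ext>; rel=duplicate; pri=1
Link: <http://example.com/file.ext.sig>; rel=describedby
EOF
```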
6. Directory Options
********************
'-nd'
'--no-directories'
Does not apply to Metalink/XML files (i.e. --input-metalink=<file>).
Applies to Metalink/HTTP URLs as described in the Wget manual, see
wget(1). The target URL is the URL written on the command line.
'-x'
'--force-directories'
Does not apply to Metalink/XML files (i.e. --input-metalink=<file>).
Applies to Metalink/HTTP URLs as described in the Wget manual, see
wget(1). The target URL is the URL written on the command line.
'-nH'
'--no-host-directories'
Does not apply to Metalink/XML files (i.e. --input-metalink=<file>).
Applies to Metalink/HTTP URLs as described in the Wget manual, see
wget(1). The target URL is the URL written on the command line.
'--protocol-directories'
Does not apply to Metalink/XML files (i.e. --input-metalink=<file>).
Applies to Metalink/HTTP URLs as described in the Wget manual, see
wget(1). The target URL is the URL written on the command line.
'--cut-dirs=number'
Does not apply to Metalink/XML files (i.e. --input-metalink=<file>).
Applies to Metalink/HTTP URLs as described in the Wget manual, see
wget(1). The target URL is the URL written on the command line.
'-P prefix'
'--directory-prefix=prefix'
Does not apply to Metalink/XML files (i.e. --input-metalink=<file>).
Applies to Metalink/HTTP downloads.
The directory prefix is the directory where all other files and
subdirectories will be saved, see wget(1).