mirror of
https://github.com/mirror/wget.git
synced 2025-03-27 20:40:27 +08:00
Bugfix: Fix NULL filename and output_stream in Metalink module
* NEWS: Mention the Metalink "path/file" name format handling * src/metalink.c (retrieve_from_metalink): Fix NULL filename, set filename to the right "path/file" value * src/metalink.c (retrieve_from_metalink): Fix NULL output_stream, set output_stream to filename when it is created by retrieve_url() * src/metalink.c (retrieve_from_metalink): Add RFC5854 comments about proper metalink:file "path/file" name format handling * doc/metalink.txt: Update document. Remove resolved bugs If unique_create() cannot create/open the destination file, filename and output_stream remain NULL. If fopen() is used instead, filename always remains NULL. Both functions cannot create "path/file" trees. Setting filename to the right value is sufficient to prevent SIGSEGV generating from testing a NULL value. This also allows retrieve_url() to create a "path/file" tree through opt.output_document. Reading NULL as output_stream, when it shall not be, leads to wrong results. For instance, a non-NULL output_stream tells when a stream was interrupted, reading NULL instead means to assume the contrary. This patch conforms to the RFC5854 specification: The Metalink Download Description Format 4.1.2.1. The "name" Attribute https://tools.ietf.org/html/rfc5854#section-4.1.2.1
This commit is contained in:
parent
bcb9bf7ae4
commit
96554861f9
5
NEWS
5
NEWS
@ -9,6 +9,11 @@ Please send GNU Wget bug reports to <bug-wget@gnu.org>.
|
|||||||
|
|
||||||
* Changes in Wget X.Y.Z
|
* Changes in Wget X.Y.Z
|
||||||
|
|
||||||
|
* When processing a Metalink file, create the parent directories of a
|
||||||
|
"path/file" destination file name:
|
||||||
|
https://tools.ietf.org/html/rfc5854#section-4.1.2.1
|
||||||
|
https://tools.ietf.org/html/rfc5854#section-4.2.8.3
|
||||||
|
|
||||||
* On a recursive download, append a .tmp suffix to temporary files
|
* On a recursive download, append a .tmp suffix to temporary files
|
||||||
that will be deleted after being parsed, and create them
|
that will be deleted after being parsed, and create them
|
||||||
readable/writable only by the owner.
|
readable/writable only by the owner.
|
||||||
|
139
doc/metalink.txt
139
doc/metalink.txt
@ -1,24 +1,26 @@
|
|||||||
GNU Wget Metalink module (--input-metalink)
|
GNU Wget Metalink module
|
||||||
|
|
||||||
Evaluation of "Directory Options" on the command line
|
Evaluation of the Metalink/XML and Metalink/HTTP implementations
|
||||||
|
|
||||||
|
|
||||||
1. Introduction
|
1. Introduction
|
||||||
***************
|
***************
|
||||||
|
|
||||||
This document, and the results contained in it, is focused over the
|
This document, and the results contained in it, is focused over the
|
||||||
testing of the metalink:file "path/file" name format.
|
evaluation of the Metalink/XML and Metalink/HTTP implementations.
|
||||||
|
|
||||||
The "Directory Options" mentioned here are used on the command line in
|
The "Directory Options" mentioned here are used on the command line in
|
||||||
conjunction with the option '--input-metalink=file':
|
conjunction with the option '--input-metalink=file' for Metalink/XML,
|
||||||
|
and '--metalink-over-http' for Metalink/HTTP.
|
||||||
|
|
||||||
$ wget --input-metalink=file <directory options>
|
$ wget --input-metalink=<file> [directory options]
|
||||||
|
$ wget --metalink-over-http [directory options] <url>
|
||||||
|
|
||||||
2. Notes
|
2. Notes
|
||||||
********
|
********
|
||||||
|
|
||||||
Tests containing a metalink:file "/path/file", "./path/file", or
|
Tests for metalink:file names beginning with '/', '~/', './', or '../'
|
||||||
"../path/file" name shall be run manually due to security concerns.
|
(e.g. "/path/file") shall be run manually due to security concerns.
|
||||||
|
|
||||||
3. Metalink files used as reference
|
3. Metalink files used as reference
|
||||||
***********************************
|
***********************************
|
||||||
@ -47,17 +49,30 @@ EOF
|
|||||||
4.1 Implemented safety features
|
4.1 Implemented safety features
|
||||||
===============================
|
===============================
|
||||||
|
|
||||||
Do not follow relative or absolute paths: "/path/file", "./path/file",
|
Any metalink:file name containing an absolute, relative, or home path
|
||||||
and "../path/file" as metalink:file name formats are all ignored (wget
|
(see '2. Notes') parsed from Metalink/XML files is rejected.
|
||||||
refuses to start). The options --trust-server-names changes nothing.
|
|
||||||
|
|
||||||
4.2 Actual behaviour
|
This is a libmetalink's design decision implemented in the function
|
||||||
====================
|
metalink_check_safe_path(). This feature shall not be modified.
|
||||||
|
|
||||||
Given a metalink:file "path/file" name, if "path" exists, download
|
All the above conform to the RFC5854 standard.
|
||||||
"path/file", then compute its checksum. If "path" doesn't exist,
|
|
||||||
download the url's file in the working directory; then the checksum
|
References:
|
||||||
fails: cannot find "path/file".
|
https://tools.ietf.org/html/rfc5854#section-4.1.2.1
|
||||||
|
https://tools.ietf.org/html/rfc5854#section-4.2.8.3
|
||||||
|
|
||||||
|
4.2 File download behaviour
|
||||||
|
===========================
|
||||||
|
|
||||||
|
When a Metalink/XML file is parsed:
|
||||||
|
1. create the metalink:file "path/file" tree;
|
||||||
|
2. download the metalink:url file as "path/file";
|
||||||
|
3. verify the "path/file" checksum.
|
||||||
|
|
||||||
|
All the above conform to the RFC5854 standard.
|
||||||
|
|
||||||
|
References:
|
||||||
|
https://tools.ietf.org/html/rfc5854
|
||||||
|
|
||||||
4.3 Questionable behaviours
|
4.3 Questionable behaviours
|
||||||
===========================
|
===========================
|
||||||
@ -69,69 +84,85 @@ If more metalink:file elements are the same, wget downloads them all.
|
|||||||
|
|
||||||
The download is OK even when metalink:file size is wrong.
|
The download is OK even when metalink:file size is wrong.
|
||||||
|
|
||||||
5. Directory Options
|
5. `wget --metalink-over-http`
|
||||||
|
******************************
|
||||||
|
|
||||||
|
5.1 Implemented safety features
|
||||||
|
===============================
|
||||||
|
|
||||||
|
The function url_file_name() is responsible of parsing the url's file
|
||||||
|
name and mixing in the "Directory Options" wrote on the command line.
|
||||||
|
|
||||||
|
The use of libmetalink's metalink_check_safe_path() shouldn't be
|
||||||
|
necessary (see '4.1 Implemented safety features').
|
||||||
|
|
||||||
|
All the above comform to the usual Wget's download behaviour.
|
||||||
|
|
||||||
|
References:
|
||||||
|
wget(1)
|
||||||
|
|
||||||
|
5.2 File download behaviour
|
||||||
|
===========================
|
||||||
|
|
||||||
|
When a Metalink/HTTP header is parsed:
|
||||||
|
1. extract metalink metadata from the header;
|
||||||
|
2. download the file from the mirror with the highest priority;
|
||||||
|
3. verify the file's checksum.
|
||||||
|
|
||||||
|
All the above comform to the usual Wget's download behaviour and to
|
||||||
|
the RFC6249 standard.
|
||||||
|
|
||||||
|
References:
|
||||||
|
wget(1)
|
||||||
|
https://tools.ietf.org/html/rfc6249
|
||||||
|
|
||||||
|
6. Directory Options
|
||||||
********************
|
********************
|
||||||
|
|
||||||
'-nd'
|
'-nd'
|
||||||
'--no-directories'
|
'--no-directories'
|
||||||
|
|
||||||
Used alone has no effect (see `wget --input-metalink=test.meta4`).
|
Do not apply to Metalink/XML files (aka --input-metalink=<file>).
|
||||||
|
|
||||||
Used in conjunction with --recursive, given "path/file", if "path"
|
Apply to Metalink/HTTP urls as described in the Wget's manual, see
|
||||||
exists, download "path/file" and compute its checksum. If "path"
|
wget(1). The target url is the url wrote on the command line.
|
||||||
doesnt' exist, download the url's file in the working directory,
|
|
||||||
then the checksum fails: cannot find "path/file".
|
|
||||||
|
|
||||||
'-x'
|
'-x'
|
||||||
'--force-directories'
|
'--force-directories'
|
||||||
|
|
||||||
Given "path/file", if "path" exists, download "path/file", then
|
Do not apply to Metalink/XML files (aka --input-metalink=<file>).
|
||||||
compute its checksum. If "path" doesn't exist, create the url
|
|
||||||
hierarchy, then the checksum fails: cannot find "path/file".
|
Apply to Metalink/HTTP urls as described in the Wget's manual, see
|
||||||
|
wget(1). The target url is the url wrote on the command line.
|
||||||
|
|
||||||
'-nH'
|
'-nH'
|
||||||
'--no-host-directories'
|
'--no-host-directories'
|
||||||
|
|
||||||
Given "path/file", if "path" exists, download "path/file", then
|
Do not apply to Metalink/XML files (aka --input-metalink=<file>).
|
||||||
compute its checksum. If "path" doesn't exist, download the url's
|
|
||||||
file in the working directory, then the checksum fails: cannot
|
Apply to Metalink/HTTP urls as described in the Wget's manual, see
|
||||||
find "path/file"; in this context, if --force-directories is
|
wget(1). The target url is the url wrote on the command line.
|
||||||
present, create the url hierarchy omitting the host component.
|
|
||||||
|
|
||||||
'--protocol-directories'
|
'--protocol-directories'
|
||||||
|
|
||||||
Used alone has no effect (see `wget --input-metalink=test.meta4`).
|
Do not apply to Metalink/XML files (aka --input-metalink=<file>).
|
||||||
|
|
||||||
In conjunction with --force-directories, use the protocol name as
|
Apply to Metalink/HTTP urls as described in the Wget's manual, see
|
||||||
the first directory component (see --force-directories).
|
wget(1). The target url is the url wrote on the command line.
|
||||||
|
|
||||||
'--cut-dirs=number'
|
'--cut-dirs=number'
|
||||||
|
|
||||||
Used alone has no effect (see `wget --input-metalink=test.meta4`).
|
Do not apply to Metalink/XML files (aka --input-metalink=<file>).
|
||||||
|
|
||||||
In conjunction with --force-directories, ignore 'number' directory
|
Apply to Metalink/HTTP urls as described in the Wget's manual, see
|
||||||
components after the domain (see --force-directories).
|
wget(1). The target url is the url wrote on the command line.
|
||||||
|
|
||||||
'-P prefix'
|
'-P prefix'
|
||||||
'--directory-prefix=prefix'
|
'--directory-prefix=prefix'
|
||||||
|
|
||||||
This is buggy or non-intuitive.
|
Do not apply to Metalink/XML files (aka --input-metalink=<file>).
|
||||||
|
|
||||||
Given "path/file", and more metalink:url uris for the same file,
|
Apply to Metalink/HTTP downloads.
|
||||||
if '-P path' is specified, the first url's file is downloaded as
|
|
||||||
"path/<url_file>", and the second url's file as "path/file". The
|
|
||||||
first file fails the checksum: cannot find "path/file". The file
|
|
||||||
"path/file" passes the checksum verification.
|
|
||||||
|
|
||||||
Given "path/file", and more metalink:url uris for the same file,
|
The directory prefix is the directory where all other files and
|
||||||
if '-P newp' is specified, all the urls' files are downloaded as
|
subdirectories will be saved to, see wget(1).
|
||||||
"newp/<url_file>. A suffix counter is added to the file names to
|
|
||||||
not overwrite existing files. Then all the checksums fail: cannot
|
|
||||||
find "path/file".
|
|
||||||
|
|
||||||
Given "path/file", and more metalink:url uris for the same file,
|
|
||||||
if '-P ../path' is specified, the same things as if '-P ../newp'
|
|
||||||
or '-P newp' will happen, e.g. "newp/<url_file> and checksums
|
|
||||||
failures.
|
|
||||||
|
|
||||||
[write here more wrong things happening]
|
|
||||||
|
@ -170,7 +170,26 @@ retrieve_from_metalink (const metalink_t* metalink)
|
|||||||
|
|
||||||
output_stream_regular = true;
|
output_stream_regular = true;
|
||||||
|
|
||||||
/* Store the real file name for displaying in messages. */
|
/*
|
||||||
|
At this point, if output_stream is NULL, the file
|
||||||
|
couldn't be created/opened.
|
||||||
|
|
||||||
|
This happens when the metalink:file has a "path/file"
|
||||||
|
name format and its directory tree cannot be created:
|
||||||
|
* stdio.h (fopen)
|
||||||
|
* src/utils.c (unique_create)
|
||||||
|
|
||||||
|
RFC5854 requires a proper "path/file" format handling,
|
||||||
|
this can be achieved setting opt.output_document while
|
||||||
|
output_stream is left to NULL:
|
||||||
|
* src/http.c (open_output_stream): If output_stream is
|
||||||
|
NULL, create the opt.output_document "path/file"
|
||||||
|
*/
|
||||||
|
if (!filename)
|
||||||
|
filename = xstrdup (mfile->name);
|
||||||
|
|
||||||
|
/* Store the real file name for displaying in messages,
|
||||||
|
and for proper RFC5854 "path/file" handling. */
|
||||||
opt.output_document = filename;
|
opt.output_document = filename;
|
||||||
|
|
||||||
opt.metalink_over_http = false;
|
opt.metalink_over_http = false;
|
||||||
@ -178,6 +197,15 @@ retrieve_from_metalink (const metalink_t* metalink)
|
|||||||
retr_err = retrieve_url (url, mres->url, NULL, NULL,
|
retr_err = retrieve_url (url, mres->url, NULL, NULL,
|
||||||
NULL, NULL, opt.recursive, iri, false);
|
NULL, NULL, opt.recursive, iri, false);
|
||||||
opt.metalink_over_http = _metalink_http;
|
opt.metalink_over_http = _metalink_http;
|
||||||
|
|
||||||
|
/*
|
||||||
|
Bug: output_stream is NULL, but retrieve_url() somehow
|
||||||
|
created filename.
|
||||||
|
|
||||||
|
Bugfix: point output_stream to filename if it exists.
|
||||||
|
*/
|
||||||
|
if (!output_stream && file_exists_p (filename))
|
||||||
|
output_stream = fopen (filename, "ab");
|
||||||
}
|
}
|
||||||
url_free (url);
|
url_free (url);
|
||||||
iri_free (iri);
|
iri_free (iri);
|
||||||
|
Loading…
Reference in New Issue
Block a user