New option --metalink-index to process Metalink application/metalink4+xml

* NEWS: Mention the effect of --metalink-index over Metalink
* src/init.c: Add new option metalinkindex (opt.metalink_index),
  initialize to -1
* src/main.c: Add new option metalink-index (--metalink-index=NUMBER)
* src/options.h: Add new option metalink_index (int)
* src/metalink.h: Add declaration of functions fetch_metalink_file(),
  replace_metalink_basename()
* src/metalink.c: Add functions fetch_metalink_file() simple file
  fetch, replace_metalink_basename() replace file basename
* src/metalink.c (retrieve_from_metalink): New. Process Metalink
  application/metalink4+xml of opt.metalink_index ordinal number
* doc/wget.texi: Add new option metalink-index (--metalink-index)
  documentation
* doc/metalink-standard.txt: Updated doc. Add documentation about
  Metalink application/metalink4+xml metaurls download naming system
* doc/metalink-standard.txt: Update Metalink/XML and HTTP examples
* testenv/Makefile.am: Add new files
* testenv/Test-metalink-http-xml.py: New file. Metalink/HTTP automated
  Metalink/XML "application/metalink4+xml" --metalink-index tests
* testenv/Test-metalink-http-xml-trust.py: New file. Metalink/HTTP
  automated Metalink/XML "application/metalink4+xml" --metalink-index
  retrieval with --trust-server-names tests

WARNING: Do not use lib/dirname.c (dir_name) to get the directory
name, it may append a dot '.' character to the directory name.
Matthew White 2016-08-26 11:19:24 +02:00
parent acb1d1a668
commit 0538e791fb
11 changed files with 947 additions and 8 deletions

NEWS

@ -9,6 +9,9 @@ Please send GNU Wget bug reports to <bug-wget@gnu.org>.
* Changes in Wget X.Y.Z
* When processing a Metalink header, --metalink-index=<number> selects
which of the header's application/metalink4+xml files to process.
* When processing a Metalink file, --trust-server-names enables the
use of the destination file names specified in the Metalink file,
otherwise a safe destination file name is computed.


@ -65,10 +65,16 @@ ignored, see '1. Security features'.
====================================
When --trust-server-names is off, the basename of the --input-metalink
file, if available, or of the mother URL is trusted.
file, if available, or of the mother URL is trusted. This trusted name
is the radix of any subsequent file name.
When a Metalink/HTTP is encountered, any fetched Metalink/XML file has
its own ordinal number appended as a suffix to the trusted name. In
this scenario, a unique Metalink/XML file is saved each time, applying
an additional suffix to the currently computed name when necessary.
The files described by a Metalink/XML file will be named sequentially
applying a suffix to the trusted name.
applying an additional suffix to the currently trusted/computed name.
3.1.1.2 With --trust-server-names
=================================
@ -91,6 +97,8 @@ found unsafe too, the file is not downloaded.
4.1 Example files
=================
See [1 #section-1.1].
cat > bugus.meta4 << EOF
<?xml version="1.0" encoding="UTF-8"?>
<metalink xmlns="urn:ietf:params:xml:ns:metalink">
@ -134,8 +142,9 @@ be informed to the caller of libmetalink's metalink_parse_file().
Fetched metalink:file elements shall be written using the unique "name"
field as file name [1 #section-4.1.2.1].
A metalink:file url's file name shall not substitute the "name" field,
see '3. Download file name'.
A metalink:file url's file name shall not substitute the "name" field.
Security exceptions are explained in '3. Download file name'.
4.5 Multi-Source download
=========================
@ -160,9 +169,19 @@ $ wget --metalink-over-http http://127.0.0.1/dir/file.ext
5.3 Metalink/HTTP header answer
===============================
Link: http://ftpmirror.gnu.org/bash/bash-4.3-patches/bash43-001; rel=duplicate; pref; pri=2
Link: http://another.url/common_name; rel=duplicate; pref; pri=1
Digest: SHA-256=7LPf8mSGZ1E+MVVLOtBUzNifzjjjM2fJRZrDooUVN0I=
See [2 #section-1.1].
Etag: "thvDyvhfIqlvFe+A9MYgxAfm1q5="
Link: <http://www2.example.com/example.ext>; rel=duplicate
Link: <ftp://ftp.example.com/example.ext>; rel=duplicate
Link: <http://example.com/example.ext.torrent>; rel=describedby;
type="application/x-bittorrent"
Link: <http://example.com/example.ext.meta4>; rel=describedby;
type="application/metalink4+xml"
Link: <http://example.com/example.ext.asc>; rel=describedby;
type="application/pgp-signature"
Digest: SHA-256=MWVkMWQxYTRiMzk5MDQ0MzI3NGU5NDEyZTk5OWY1ZGFmNzgyZTJlO
DYzYjRjYzFhOTlmNTQwYzI2M2QwM2U2MQ==
5.4 Saving files
================
@ -173,7 +192,8 @@ purpose of the "Directory Options" is as usual, and the file name is
the cli's url file name, see wget(1).
The url followed to download the file shall not substitute the cli's
url to compute the file name to write, see '3. Download file name'.
url to compute the file name to write, except when it redirects to a
Metalink/XML file, following the rules in '3. Download file name'.
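
The naming rules referenced in '3. Download file name' can be illustrated with a minimal Python sketch. This is not Wget's code: the helper names metaurl_file_name and described_file_names are hypothetical. The ".meta#" and ".#" separators are taken from this commit's append_suffix_number (&metafile, ".meta#", meta_count) call and from the file names expected by testenv/Test-metalink-http-xml.py. The sketch assumes --trust-server-names is off and ignores the extra suffix that may be added to keep a saved file unique.

```python
from typing import List


def append_suffix_number(name: str, sep: str, num: int) -> str:
    """Return NAME with SEP and NUM appended (illustrative stand-in)."""
    return f"{name}{sep}{num}"


def metaurl_file_name(trusted_name: str, ordinal: int) -> str:
    """Name for the fetched Metalink/XML file with the given ordinal number."""
    return append_suffix_number(trusted_name, ".meta#", ordinal)


def described_file_names(metalink_xml_name: str, count: int) -> List[str]:
    """Sequential names for COUNT files described by a Metalink/XML file."""
    return [append_suffix_number(metalink_xml_name, ".#", i)
            for i in range(1, count + 1)]


# Trusted name: basename of the mother URL, as in the test suite.
meta_name = metaurl_file_name("main.metalink", 2)
file_names = described_file_names(meta_name, 4)
print(meta_name)
print(file_names)
```

With these inputs the sketch reproduces the names the non-trusting test expects: the Metalink/XML file is saved as main.metalink.meta#2, and the four accepted files follow as main.metalink.meta#2.#1 through main.metalink.meta#2.#4.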
5.5 Multi-Source download
=========================


@ -524,6 +524,15 @@ Issues HTTP HEAD request instead of GET and extracts Metalink metadata
from response headers. Then it switches to Metalink download.
If no valid Metalink metadata is found, it falls back to ordinary HTTP download.
@cindex metalink-index
@item --metalink-index=@var{number}
Set the Metalink @samp{application/metalink4+xml} metaurl ordinal
NUMBER. From 1 to the total number of ``application/metalink4+xml''
available. Specify 0 or @samp{inf} to choose the first good one.
Metaurls, such as those from a @samp{--metalink-over-http}, may have
been sorted by priority key's value; keep this in mind to choose the
right NUMBER.
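
As a rough illustration of how NUMBER relates to the sorted metaurls, here is a minimal Python sketch of the selection rule. It is a sketch under stated assumptions rather than Wget's implementation: the Metaurl tuple, choose_metaurl and the fetch_ok callback are hypothetical names, and sorting by ascending pri is assumed from the test suite, where pri=1 is described as the highest priority.

```python
from typing import Callable, List, NamedTuple, Optional

METALINK_XML = "application/metalink4+xml"


class Metaurl(NamedTuple):
    url: str
    mediatype: str
    pri: int


def choose_metaurl(metaurls: List[Metaurl], index: int,
                   fetch_ok: Callable[[Metaurl], bool]) -> Optional[Metaurl]:
    """Pick a Metalink/XML metaurl by ordinal number.

    INDEX > 0 selects exactly that ordinal; INDEX == 0 (also what 'inf'
    means for this option) tries each candidate until one succeeds.
    """
    ordinal = 0
    # Assumption: lower pri value means higher priority; sorted() is stable.
    for murl in sorted(metaurls, key=lambda m: m.pri):
        if murl.mediatype != METALINK_XML:
            continue  # only application/metalink4+xml entries are counted
        ordinal += 1
        if index > 0:
            if ordinal < index:
                continue
            if ordinal > index:
                break
        if fetch_ok(murl):
            return murl
        if index > 0:
            break  # a specific ordinal was requested and it failed
    return None


links = [
    Metaurl("http://127.0.0.1/test1.metalink", METALINK_XML, 2),
    Metaurl("http://127.0.0.1/file.torrent", "application/x-bittorrent", 1),
    Metaurl("http://127.0.0.1/test2.metalink", METALINK_XML, 1),
]
chosen = choose_metaurl(links, 2, lambda m: True)
print(chosen.url)
```

Here test2.metalink sorts first because of pri=1, so ordinal 2 resolves to test1.metalink even though it is listed first in the header. The torrent entry is never counted.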
@cindex preferred-location
@item --preferred-location
Set preferred location for Metalink resources. This has effect if multiple


@ -248,6 +248,7 @@ static const struct {
{ "login", &opt.ftp_user, cmd_string },/* deprecated*/
{ "maxredirect", &opt.max_redirect, cmd_number },
#ifdef HAVE_METALINK
{ "metalinkindex", &opt.metalink_index, cmd_number_inf },
{ "metalinkoverhttp", &opt.metalink_over_http, cmd_boolean },
#endif
{ "method", &opt.method, cmd_string_uppercase },
@ -385,6 +386,10 @@ defaults (void)
bit pattern will be the least of the implementors' worries. */
xzero (opt);
#ifdef HAVE_METALINK
opt.metalink_index = -1;
#endif
opt.cookies = true;
opt.verbose = -1;
opt.ntry = 20;


@ -354,6 +354,7 @@ static struct cmdline_option option_data[] =
{ "rejected-log", 0, OPT_VALUE, "rejectedlog", -1 },
{ "max-redirect", 0, OPT_VALUE, "maxredirect", -1 },
#ifdef HAVE_METALINK
{ "metalink-index", 0, OPT_VALUE, "metalinkindex", -1 },
{ "metalink-over-http", 0, OPT_BOOLEAN, "metalinkoverhttp", -1 },
#endif
{ "method", 0, OPT_VALUE, "method", -1 },
@ -713,6 +714,8 @@ Download:\n"),
#ifdef HAVE_METALINK
N_("\
--keep-badhash keep files with checksum mismatch (append .badhash)\n"),
N_("\
--metalink-index=NUMBER Metalink application/metalink4+xml metaurl ordinal NUMBER\n"),
N_("\
--metalink-over-http use Metalink metadata from HTTP response headers\n"),
N_("\


@ -189,6 +189,184 @@ retrieve_from_metalink (const metalink_t* metalink)
}
}
/* Process the chosen application/metalink4+xml metaurl. */
if (opt.metalink_index >= 0)
{
int _metalink_index = opt.metalink_index;
metalink_metaurl_t **murl_ptr;
int abs_count = 0, meta_count = 0;
uerr_t x_retr_err = METALINK_MISSING_RESOURCE;
opt.metalink_index = -1;
DEBUGP (("Searching application/metalink4+xml ordinal number %d...\n", _metalink_index));
if (mfile->metaurls && mfile->metaurls[0])
for (murl_ptr = mfile->metaurls; *murl_ptr; murl_ptr++)
{
metalink_t* metaurl_xml;
metalink_error_t meta_err;
metalink_metaurl_t *murl = *murl_ptr;
char *_dir_prefix = opt.dir_prefix;
char *_input_metalink = opt.input_metalink;
char *metafile = NULL;
char *metadest = NULL;
char *metadir = NULL;
abs_count++;
if (strcmp (murl->mediatype, "application/metalink4+xml"))
continue;
meta_count++;
DEBUGP ((" Ordinal number %d: %s\n", meta_count, quote (murl->url)));
if (_metalink_index > 0)
{
if (meta_count < _metalink_index)
continue;
else if (meta_count > _metalink_index)
break;
}
logprintf (LOG_NOTQUIET,
_("Processing metaurl %s...\n"), quote (murl->url));
/* Metalink/XML download file name. */
metafile = xstrdup (safename);
if (opt.trustservernames)
replace_metalink_basename (&metafile, murl->url);
else
append_suffix_number (&metafile, ".meta#", meta_count);
if (!metalink_check_safe_path (metafile))
{
logprintf (LOG_NOTQUIET,
_("Rejecting metaurl file %s. Unsafe name.\n"),
quote (metafile));
xfree (metafile);
if (_metalink_index > 0)
break;
continue;
}
/* For security reasons, always save metalink metaurl
files as new unique files. Keep them on failure. */
x_retr_err = fetch_metalink_file (murl->url, false, false,
metafile, &metadest);
/* On failure, try the next metalink metaurl. */
if (x_retr_err != RETROK)
{
logprintf (LOG_VERBOSE,
_("Failed to download %s. Skipping metaurl.\n"),
quote (metadest ? metadest : metafile));
inform_exit_status (x_retr_err);
xfree (metadest);
xfree (metafile);
if (_metalink_index > 0)
break;
continue;
}
/* Parse Metalink/XML. */
meta_err = metalink_parse_file (metadest, &metaurl_xml);
/* On failure, try the next metalink metaurl. */
if (meta_err)
{
logprintf (LOG_NOTQUIET,
_("Unable to parse metaurl file %s.\n"), quote (metadest));
x_retr_err = METALINK_PARSE_ERROR;
inform_exit_status (x_retr_err);
xfree (metadest);
xfree (metafile);
if (_metalink_index > 0)
break;
continue;
}
/* We need to sort the resources if preferred location
was specified by the user. */
if (opt.preferred_location && opt.preferred_location[0])
{
metalink_file_t **x_mfile_ptr;
for (x_mfile_ptr = metaurl_xml->files; *x_mfile_ptr; x_mfile_ptr++)
{
metalink_resource_t **x_mres_ptr;
metalink_file_t *x_mfile = *x_mfile_ptr;
size_t mres_count = 0;
for (x_mres_ptr = x_mfile->resources; *x_mres_ptr; x_mres_ptr++)
mres_count++;
stable_sort (x_mfile->resources,
mres_count,
sizeof (metalink_resource_t *),
metalink_res_cmp);
}
}
/* Insert the current "Directory Options". */
if (metalink->origin)
{
/* WARNING: Do not use lib/dirname.c (dir_name) to
get the directory name, it may append a dot '.'
character to the directory name. */
metadir = xstrdup (planname);
replace_metalink_basename (&metadir, NULL);
}
else
{
metadir = xstrdup (opt.dir_prefix);
}
opt.dir_prefix = metadir;
opt.input_metalink = metadest;
x_retr_err = retrieve_from_metalink (metaurl_xml);
if (x_retr_err != RETROK)
logprintf (LOG_NOTQUIET,
_("Could not download all resources from %s.\n"),
quote (metadest));
metalink_delete (metaurl_xml);
metaurl_xml = NULL;
opt.input_metalink = _input_metalink;
opt.dir_prefix = _dir_prefix;
xfree (metadir);
xfree (metadest);
xfree (metafile);
break;
}
if (x_retr_err != RETROK)
logprintf (LOG_NOTQUIET, _("Metaurls processing returned with error.\n"));
xfree (destname);
xfree (filename);
xfree (trsrname);
xfree (planname);
opt.output_document = _output_document;
output_stream_regular = _output_stream_regular;
output_stream = _output_stream;
opt.metalink_index = _metalink_index;
return x_retr_err;
}
/* Resources are sorted by priority. */
for (mres_ptr = mfile->resources; *mres_ptr && !skip_mfile; mres_ptr++)
{
@ -739,6 +917,72 @@ gpg_skip_verification:
return last_retr_err;
}
/*
Replace/remove the basename of a file name.
The file name is permanently modified.
Always set NAME to a string, even an empty one.
Use REF's basename as replacement. If REF is NULL or if it doesn't
provide a valid basename candidate, then remove NAME's basename.
*/
void
replace_metalink_basename (char **name, char *ref)
{
int n;
char *p, *new, *basename;
if (!name)
return;
/* Strip old basename. */
if (*name)
{
basename = last_component (*name);
if (basename == *name)
xfree (*name);
else
*basename = '\0';
}
/* New basename from file name reference. */
if (ref)
ref = last_component (ref);
/* Replace the old basename. */
new = aprintf ("%s%s", *name ? *name : "", ref ? ref : "");
xfree (*name);
*name = new;
/* Remove prefix drive letters if required, i.e. when in w32
environments. */
p = new;
while (p[0] != '\0')
{
while ((n = FILE_SYSTEM_PREFIX_LEN (p)) > 0)
p += n;
if (p != new)
{
while (ISSLASH (p[0]))
++p;
new = p;
continue;
}
break;
}
if (*name != new)
{
new = xstrdup (new);
xfree (*name);
*name = new;
}
}
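
The intent of replace_metalink_basename() can be approximated in a few lines of Python. This is only a sketch for readers, not a port: replace_basename is a hypothetical name, and it assumes POSIX-style paths without trailing slashes and does not handle the W32 drive-letter prefixes that the C function strips.

```python
import posixpath
from typing import Optional


def replace_basename(name: str, ref: Optional[str] = None) -> str:
    """Replace the basename of NAME with the basename of REF.

    When REF is None the basename is simply removed, leaving the
    directory part including its trailing slash, or an empty string
    when NAME has no directory component.
    """
    head, sep, _old_basename = name.rpartition("/")
    directory = head + "/" if sep else ""
    new_basename = posixpath.basename(ref) if ref else ""
    return directory + new_basename


# Directory part only, as used to build the "Directory Options" prefix.
only_dir = replace_basename("dir/File1")
# A bare file name yields an empty string, never a '.' placeholder.
no_dir = replace_basename("File1")
# Basename taken from the metaurl when server names are trusted.
trusted = replace_basename("dir/main.metalink",
                           "http://127.0.0.1/x/test1.metalink")
print(repr(only_dir), repr(no_dir), repr(trusted))
```

Returning an empty string for a bare file name matters for the warning in the commit message: a directory helper that appends '.' would change the computed prefix, while removing the basename in place does not.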
/*
Strip the directory components from the given name.
@ -891,6 +1135,110 @@ badhash_or_remove (char *name)
}
}
/*
Simple file fetch.
Set DESTNAME to the name of the saved file.
Resume previous download if RESUME is true. To disable
Metalink/HTTP, set METALINK_HTTP to false.
*/
uerr_t
fetch_metalink_file (const char *url_str,
bool resume, bool metalink_http,
const char *filename, char **destname)
{
FILE *_output_stream = output_stream;
bool _output_stream_regular = output_stream_regular;
char *_output_document = opt.output_document;
bool _metalink_http = opt.metalink_over_http;
char *local_file = NULL;
uerr_t retr_err = URLERROR;
struct iri *iri;
struct url *url;
int url_err;
/* Parse the URL. */
iri = iri_new ();
set_uri_encoding (iri, opt.locale, true);
url = url_parse (url_str, &url_err, iri, false);
if (!url)
{
char *error = url_error (url_str, url_err);
logprintf (LOG_NOTQUIET, "%s: %s.\n", url_str, error);
inform_exit_status (retr_err);
iri_free (iri);
xfree (error);
return retr_err;
}
output_stream = NULL;
if (resume)
/* continue previous download */
output_stream = fopen (filename, "ab");
else
/* create a file with a unique name */
output_stream = unique_create (filename, true, &local_file);
output_stream_regular = true;
/*
If output_stream is NULL, the file couldn't be created/opened.
This could be due to the impossibility to create a directory tree:
* stdio.h (fopen)
* src/utils.c (unique_create)
A call to retrieve_url() can indirectly create a directory tree,
when opt.output_document is set to the destination file name and
output_stream is left to NULL:
* src/http.c (open_output_stream): If output_stream is NULL,
create the destination opt.output_document "path/file"
*/
if (!local_file)
local_file = xstrdup (filename);
/* Store the real file name for displaying in messages, and for
proper "path/file" handling. */
opt.output_document = local_file;
opt.metalink_over_http = metalink_http;
DEBUGP (("Storing to %s\n", local_file));
retr_err = retrieve_url (url, url_str, NULL, NULL,
NULL, NULL, opt.recursive, iri, false);
if (retr_err == RETROK)
{
if (destname)
*destname = local_file;
else
xfree (local_file);
}
if (output_stream)
{
fclose (output_stream);
output_stream = NULL;
}
opt.metalink_over_http = _metalink_http;
opt.output_document = _output_document;
output_stream_regular = _output_stream_regular;
output_stream = _output_stream;
inform_exit_status (retr_err);
iri_free (iri);
url_free (url);
return retr_err;
}
int metalink_res_cmp (const void* v1, const void* v2)
{
const metalink_resource_t *res1 = *(metalink_resource_t **) v1,


@ -51,12 +51,16 @@ int metalink_meta_cmp (const void* meta1, const void* meta2);
int metalink_check_safe_path (const char *path);
char *last_component (char const *name);
void replace_metalink_basename (char **name, char *ref);
char *get_metalink_basename (char *name);
void append_suffix_number (char **str, const char *sep, wgint num);
void clean_metalink_string (char **str);
void dequote_metalink_string (char **str);
void badhash_suffix (char *name);
void badhash_or_remove (char *name);
uerr_t fetch_metalink_file (const char *url_str,
bool resume, bool metalink_http,
const char *filename, char **destname);
bool find_key_value (const char *start,
const char *end,


@ -67,6 +67,7 @@ struct options
char *input_filename; /* Input filename */
#ifdef HAVE_METALINK
char *input_metalink; /* Input metalink file */
int metalink_index; /* Metalink application/metalink4+xml metaurl ordinal number. */
bool metalink_over_http; /* Use Metalink if present in HTTP response */
char *preferred_location; /* Preferred location for Metalink resources */
#endif


@ -29,6 +29,8 @@
if METALINK_IS_ENABLED
METALINK_TESTS = Test-metalink-http.py \
Test-metalink-http-quoted.py \
Test-metalink-http-xml.py \
Test-metalink-http-xml-trust.py \
Test-metalink-xml.py \
Test-metalink-xml-continue.py \
Test-metalink-xml-relpath.py \


@ -0,0 +1,272 @@
#!/usr/bin/env python3
from sys import exit
from test.http_test import HTTPTest
from misc.wget_file import WgetFile
import hashlib
from base64 import b64encode
"""
This is to test Metalink/HTTP with Metalink/XML Link headers.
With --trust-server-names, trust the metalink:file names.
Without --trust-server-names, don't trust the metalink:file names:
use the basename of --input-metalink, and add a sequential number
(e.g. .#1, .#2, etc.).
Strip the directory from unsafe paths.
"""
############# File Definitions ###############################################
bad = "Ouch!"
bad_sha256 = hashlib.sha256 (bad.encode ('UTF-8')).hexdigest ()
File1 = "Would you like some Tea?"
File1_lowPref = "Do not take this"
File1_sha256 = hashlib.sha256 (File1.encode ('UTF-8')).hexdigest ()
File2 = "This is gonna be good"
File2_lowPref = "Not this one too"
File2_sha256 = hashlib.sha256 (File2.encode ('UTF-8')).hexdigest ()
File3 = "A little more, please"
File3_lowPref = "That's just too much"
File3_sha256 = hashlib.sha256 (File3.encode ('UTF-8')).hexdigest ()
File4 = "Maybe a biscuit?"
File4_lowPref = "No, thanks"
File4_sha256 = hashlib.sha256 (File4.encode ('UTF-8')).hexdigest ()
File5 = "More Tea...?"
File5_lowPref = "I have to go..."
File5_sha256 = hashlib.sha256 (File5.encode ('UTF-8')).hexdigest ()
MetaXml1 = \
"""<?xml version="1.0" encoding="utf-8"?>
<metalink version="3.0" xmlns="http://www.metalinker.org/">
<publisher>
<name>GNU Wget</name>
</publisher>
<license>
<name>GNU GPL</name>
<url>http://www.gnu.org/licenses/gpl.html</url>
</license>
<identity>Wget Test Files</identity>
<version>1.2.3</version>
<description>Wget Test Files description</description>
<files>
<file name="dir/File1">
<verification>
<hash type="sha256">{{FILE1_HASH}}</hash>
</verification>
<resources>
<url type="http" preference="35">http://{{SRV_HOST}}:{{SRV_PORT}}/wrong_file</url>
<url type="http" preference="40">http://{{SRV_HOST}}:{{SRV_PORT}}/404</url>
<url type="http" preference="25">http://{{SRV_HOST}}:{{SRV_PORT}}/File1_lowPref</url>
<url type="http" preference="30">http://{{SRV_HOST}}:{{SRV_PORT}}/File1</url>
</resources>
</file>
<file name="dir/File2">
<verification>
<hash type="sha256">{{FILE2_HASH}}</hash>
</verification>
<resources>
<url type="http" preference="35">http://{{SRV_HOST}}:{{SRV_PORT}}/wrong_file</url>
<url type="http" preference="40">http://{{SRV_HOST}}:{{SRV_PORT}}/404</url>
<url type="http" preference="25">http://{{SRV_HOST}}:{{SRV_PORT}}/File2_lowPref</url>
<url type="http" preference="30">http://{{SRV_HOST}}:{{SRV_PORT}}/File2</url>
</resources>
</file>
<file name="/dir/File3"> <!-- rejected by libmetalink -->
<verification>
<hash type="sha256">{{FILE3_HASH}}</hash>
</verification>
<resources>
<url type="http" preference="35">http://{{SRV_HOST}}:{{SRV_PORT}}/wrong_file</url>
<url type="http" preference="40">http://{{SRV_HOST}}:{{SRV_PORT}}/404</url>
<url type="http" preference="25">http://{{SRV_HOST}}:{{SRV_PORT}}/File3_lowPref</url>
<url type="http" preference="30">http://{{SRV_HOST}}:{{SRV_PORT}}/File3</url>
</resources>
</file>
<file name="dir/File4">
<verification>
<hash type="sha256">{{FILE4_HASH}}</hash>
</verification>
<resources>
<url type="http" preference="35">http://{{SRV_HOST}}:{{SRV_PORT}}/wrong_file</url>
<url type="http" preference="40">http://{{SRV_HOST}}:{{SRV_PORT}}/404</url>
<url type="http" preference="25">http://{{SRV_HOST}}:{{SRV_PORT}}/File4_lowPref</url>
<url type="http" preference="30">http://{{SRV_HOST}}:{{SRV_PORT}}/File4</url>
</resources>
</file>
<file name="dir/File5">
<verification>
<hash type="sha256">{{FILE5_HASH}}</hash>
</verification>
<resources>
<url type="http" preference="35">http://{{SRV_HOST}}:{{SRV_PORT}}/wrong_file</url>
<url type="http" preference="40">http://{{SRV_HOST}}:{{SRV_PORT}}/404</url>
<url type="http" preference="25">http://{{SRV_HOST}}:{{SRV_PORT}}/File5_lowPref</url>
<url type="http" preference="30">http://{{SRV_HOST}}:{{SRV_PORT}}/File5</url>
</resources>
</file>
</files>
</metalink>
"""
MetaXml2 = \
"""<?xml version="1.0" encoding="utf-8"?>
<metalink version="3.0" xmlns="http://www.metalinker.org/">
<publisher>
<name>GNU Wget</name>
</publisher>
<license>
<name>GNU GPL</name>
<url>http://www.gnu.org/licenses/gpl.html</url>
</license>
<identity>Wget Test Files</identity>
<version>1.2.3</version>
<description>Wget Test Files description</description>
<files>
<file name="bad">
<verification>
<hash type="sha256">{{BAD_HASH}}</hash>
</verification>
<resources>
<url type="http" preference="35">http://{{SRV_HOST}}:{{SRV_PORT}}/wrong_file</url>
<url type="http" preference="40">http://{{SRV_HOST}}:{{SRV_PORT}}/404</url>
<url type="http" preference="30">http://{{SRV_HOST}}:{{SRV_PORT}}/bad</url>
</resources>
</file>
</files>
</metalink>
"""
LinkHeaders = [
# This file has the lowest priority, and should go last
"<http://{{SRV_HOST}}:{{SRV_PORT}}/test1.metalink>; rel=describedby; pri=2; type=\"application/metalink4+xml\"",
# This file has the highest priority, and should go first
"<http://{{SRV_HOST}}:{{SRV_PORT}}/test2.metalink>; rel=describedby; pri=1; type=\"application/metalink4+xml\""
]
# This will be filled as soon as we know server hostname and port
MetaHTTPRules = {'SendHeader' : {}}
MetaHTTP = WgetFile ("main.metalink", rules=MetaHTTPRules)
wrong_file = WgetFile ("wrong_file", bad)
File1_orig = WgetFile ("File1", File1)
File1_down = WgetFile ("dir/File1", File1)
File1_nono = WgetFile ("File1_lowPref", File1_lowPref)
File2_orig = WgetFile ("File2", File2)
File2_down = WgetFile ("dir/File2", File2)
File2_nono = WgetFile ("File2_lowPref", File2_lowPref)
# rejected by libmetalink
File3_orig = WgetFile ("File3", File3)
File3_nono = WgetFile ("File3_lowPref", File3_lowPref)
File4_orig = WgetFile ("File4", File4)
File4_down = WgetFile ("dir/File4", File4)
File4_nono = WgetFile ("File4_lowPref", File4_lowPref)
File5_orig = WgetFile ("File5", File5)
File5_down = WgetFile ("dir/File5", File5)
File5_nono = WgetFile ("File5_lowPref", File5_lowPref)
MetaFile1 = WgetFile ("test1.metalink", MetaXml1)
MetaFile1_down = WgetFile ("test1.metalink", MetaXml1)
MetaFile2 = WgetFile ("test2.metalink", MetaXml2)
WGET_OPTIONS = "--trust-server-names --metalink-over-http --metalink-index=2"
WGET_URLS = [["main.metalink"]]
RequestList = [[
"HEAD /main.metalink",
"GET /404",
"GET /wrong_file",
"GET /test1.metalink",
"GET /File1",
"GET /File2",
"GET /File4",
"GET /File5"
]]
Files = [[
MetaHTTP,
wrong_file,
MetaFile1, MetaFile2,
File1_orig, File1_nono,
File2_orig, File2_nono,
File3_orig, File3_nono,
File4_orig, File4_nono,
File5_orig, File5_nono
]]
Existing_Files = []
ExpectedReturnCode = 0
ExpectedDownloadedFiles = [
MetaFile1_down,
File1_down,
File2_down,
File4_down,
File5_down
]
################ Pre and Post Test Hooks #####################################
pre_test = {
"ServerFiles" : Files,
"LocalFiles" : Existing_Files
}
test_options = {
"WgetCommands" : WGET_OPTIONS,
"Urls" : WGET_URLS
}
post_test = {
"ExpectedFiles" : ExpectedDownloadedFiles,
"ExpectedRetcode" : ExpectedReturnCode,
"FilesCrawled" : RequestList
}
http_test = HTTPTest (
pre_hook=pre_test,
test_params=test_options,
post_hook=post_test
)
http_test.server_setup()
### Get and use dynamic server sockname
srv_host, srv_port = http_test.servers[0].server_inst.socket.getsockname ()
MetaXml1 = MetaXml1.replace('{{FILE1_HASH}}', File1_sha256)
MetaXml1 = MetaXml1.replace('{{FILE2_HASH}}', File2_sha256)
MetaXml1 = MetaXml1.replace('{{FILE3_HASH}}', File3_sha256)
MetaXml1 = MetaXml1.replace('{{FILE4_HASH}}', File4_sha256)
MetaXml1 = MetaXml1.replace('{{FILE5_HASH}}', File5_sha256)
MetaXml1 = MetaXml1.replace('{{SRV_HOST}}', srv_host)
MetaXml1 = MetaXml1.replace('{{SRV_PORT}}', str (srv_port))
MetaFile1.content = MetaXml1
MetaFile1_down.content = MetaXml1
MetaXml2 = MetaXml2.replace('{{BAD_HASH}}', bad_sha256)
MetaXml2 = MetaXml2.replace('{{SRV_HOST}}', srv_host)
MetaXml2 = MetaXml2.replace('{{SRV_PORT}}', str (srv_port))
MetaFile2.content = MetaXml2
# Helper function for hostname and port substitution
def SubstituteServerInfo (text, host, port):
text = text.replace('{{SRV_HOST}}', host)
text = text.replace('{{SRV_PORT}}', str (port))
return text
MetaHTTPRules["SendHeader"] = {
'Link': [ SubstituteServerInfo (LinkHeader, srv_host, srv_port)
for LinkHeader in LinkHeaders ]
}
err = http_test.begin ()
exit (err)

testenv/Test-metalink-http-xml.py Executable file

@ -0,0 +1,272 @@
#!/usr/bin/env python3
from sys import exit
from test.http_test import HTTPTest
from misc.wget_file import WgetFile
import hashlib
from base64 import b64encode
"""
This is to test Metalink/HTTP with Metalink/XML Link headers.
With --trust-server-names, trust the metalink:file names.
Without --trust-server-names, don't trust the metalink:file names:
use the basename of --input-metalink, and add a sequential number
(e.g. .#1, .#2, etc.).
Strip the directory from unsafe paths.
"""
############# File Definitions ###############################################
bad = "Ouch!"
bad_sha256 = hashlib.sha256 (bad.encode ('UTF-8')).hexdigest ()
File1 = "Would you like some Tea?"
File1_lowPref = "Do not take this"
File1_sha256 = hashlib.sha256 (File1.encode ('UTF-8')).hexdigest ()
File2 = "This is gonna be good"
File2_lowPref = "Not this one too"
File2_sha256 = hashlib.sha256 (File2.encode ('UTF-8')).hexdigest ()
File3 = "A little more, please"
File3_lowPref = "That's just too much"
File3_sha256 = hashlib.sha256 (File3.encode ('UTF-8')).hexdigest ()
File4 = "Maybe a biscuit?"
File4_lowPref = "No, thanks"
File4_sha256 = hashlib.sha256 (File4.encode ('UTF-8')).hexdigest ()
File5 = "More Tea...?"
File5_lowPref = "I have to go..."
File5_sha256 = hashlib.sha256 (File5.encode ('UTF-8')).hexdigest ()
MetaXml1 = \
"""<?xml version="1.0" encoding="utf-8"?>
<metalink version="3.0" xmlns="http://www.metalinker.org/">
<publisher>
<name>GNU Wget</name>
</publisher>
<license>
<name>GNU GPL</name>
<url>http://www.gnu.org/licenses/gpl.html</url>
</license>
<identity>Wget Test Files</identity>
<version>1.2.3</version>
<description>Wget Test Files description</description>
<files>
<file name="dir/File1">
<verification>
<hash type="sha256">{{FILE1_HASH}}</hash>
</verification>
<resources>
<url type="http" preference="35">http://{{SRV_HOST}}:{{SRV_PORT}}/wrong_file</url>
<url type="http" preference="40">http://{{SRV_HOST}}:{{SRV_PORT}}/404</url>
<url type="http" preference="25">http://{{SRV_HOST}}:{{SRV_PORT}}/File1_lowPref</url>
<url type="http" preference="30">http://{{SRV_HOST}}:{{SRV_PORT}}/File1</url>
</resources>
</file>
<file name="dir/File2">
<verification>
<hash type="sha256">{{FILE2_HASH}}</hash>
</verification>
<resources>
<url type="http" preference="35">http://{{SRV_HOST}}:{{SRV_PORT}}/wrong_file</url>
<url type="http" preference="40">http://{{SRV_HOST}}:{{SRV_PORT}}/404</url>
<url type="http" preference="25">http://{{SRV_HOST}}:{{SRV_PORT}}/File2_lowPref</url>
<url type="http" preference="30">http://{{SRV_HOST}}:{{SRV_PORT}}/File2</url>
</resources>
</file>
<file name="/dir/File3"> <!-- rejected by libmetalink -->
<verification>
<hash type="sha256">{{FILE3_HASH}}</hash>
</verification>
<resources>
<url type="http" preference="35">http://{{SRV_HOST}}:{{SRV_PORT}}/wrong_file</url>
<url type="http" preference="40">http://{{SRV_HOST}}:{{SRV_PORT}}/404</url>
<url type="http" preference="25">http://{{SRV_HOST}}:{{SRV_PORT}}/File3_lowPref</url>
<url type="http" preference="30">http://{{SRV_HOST}}:{{SRV_PORT}}/File3</url>
</resources>
</file>
<file name="dir/File4">
<verification>
<hash type="sha256">{{FILE4_HASH}}</hash>
</verification>
<resources>
<url type="http" preference="35">http://{{SRV_HOST}}:{{SRV_PORT}}/wrong_file</url>
<url type="http" preference="40">http://{{SRV_HOST}}:{{SRV_PORT}}/404</url>
<url type="http" preference="25">http://{{SRV_HOST}}:{{SRV_PORT}}/File4_lowPref</url>
<url type="http" preference="30">http://{{SRV_HOST}}:{{SRV_PORT}}/File4</url>
</resources>
</file>
<file name="dir/File5">
<verification>
<hash type="sha256">{{FILE5_HASH}}</hash>
</verification>
<resources>
<url type="http" preference="35">http://{{SRV_HOST}}:{{SRV_PORT}}/wrong_file</url>
<url type="http" preference="40">http://{{SRV_HOST}}:{{SRV_PORT}}/404</url>
<url type="http" preference="25">http://{{SRV_HOST}}:{{SRV_PORT}}/File5_lowPref</url>
<url type="http" preference="30">http://{{SRV_HOST}}:{{SRV_PORT}}/File5</url>
</resources>
</file>
</files>
</metalink>
"""
MetaXml2 = \
"""<?xml version="1.0" encoding="utf-8"?>
<metalink version="3.0" xmlns="http://www.metalinker.org/">
<publisher>
<name>GNU Wget</name>
</publisher>
<license>
<name>GNU GPL</name>
<url>http://www.gnu.org/licenses/gpl.html</url>
</license>
<identity>Wget Test Files</identity>
<version>1.2.3</version>
<description>Wget Test Files description</description>
<files>
<file name="bad">
<verification>
<hash type="sha256">{{BAD_HASH}}</hash>
</verification>
<resources>
<url type="http" preference="35">http://{{SRV_HOST}}:{{SRV_PORT}}/wrong_file</url>
<url type="http" preference="40">http://{{SRV_HOST}}:{{SRV_PORT}}/404</url>
<url type="http" preference="30">http://{{SRV_HOST}}:{{SRV_PORT}}/bad</url>
</resources>
</file>
</files>
</metalink>
"""
LinkHeaders = [
# This file has the lowest priority, and should go last
"<http://{{SRV_HOST}}:{{SRV_PORT}}/test1.metalink>; rel=describedby; pri=2; type=\"application/metalink4+xml\"",
# This file has the highest priority, and should go first
"<http://{{SRV_HOST}}:{{SRV_PORT}}/test2.metalink>; rel=describedby; pri=1; type=\"application/metalink4+xml\""
]
# This will be filled as soon as we know server hostname and port
MetaHTTPRules = {'SendHeader' : {}}
MetaHTTP = WgetFile ("main.metalink", rules=MetaHTTPRules)
wrong_file = WgetFile ("wrong_file", bad)
File1_orig = WgetFile ("File1", File1)
File1_down = WgetFile ("main.metalink.meta#2.#1", File1)
File1_nono = WgetFile ("File1_lowPref", File1_lowPref)
File2_orig = WgetFile ("File2", File2)
File2_down = WgetFile ("main.metalink.meta#2.#2", File2)
File2_nono = WgetFile ("File2_lowPref", File2_lowPref)
# rejected by libmetalink
File3_orig = WgetFile ("File3", File3)
File3_nono = WgetFile ("File3_lowPref", File3_lowPref)
File4_orig = WgetFile ("File4", File4)
File4_down = WgetFile ("main.metalink.meta#2.#3", File4)
File4_nono = WgetFile ("File4_lowPref", File4_lowPref)
File5_orig = WgetFile ("File5", File5)
File5_down = WgetFile ("main.metalink.meta#2.#4", File5)
File5_nono = WgetFile ("File5_lowPref", File5_lowPref)
MetaFile1 = WgetFile ("test1.metalink", MetaXml1)
MetaFile1_down = WgetFile ("main.metalink.meta#2", MetaXml1)
MetaFile2 = WgetFile ("test2.metalink", MetaXml2)
WGET_OPTIONS = "--metalink-over-http --metalink-index=2"
WGET_URLS = [["main.metalink"]]
RequestList = [[
"HEAD /main.metalink",
"GET /404",
"GET /wrong_file",
"GET /test1.metalink",
"GET /File1",
"GET /File2",
"GET /File4",
"GET /File5"
]]
Files = [[
MetaHTTP,
wrong_file,
MetaFile1, MetaFile2,
File1_orig, File1_nono,
File2_orig, File2_nono,
File3_orig, File3_nono,
File4_orig, File4_nono,
File5_orig, File5_nono
]]
Existing_Files = []
ExpectedReturnCode = 0
ExpectedDownloadedFiles = [
MetaFile1_down,
File1_down,
File2_down,
File4_down,
File5_down
]
################ Pre and Post Test Hooks #####################################
pre_test = {
"ServerFiles" : Files,
"LocalFiles" : Existing_Files
}
test_options = {
"WgetCommands" : WGET_OPTIONS,
"Urls" : WGET_URLS
}
post_test = {
"ExpectedFiles" : ExpectedDownloadedFiles,
"ExpectedRetcode" : ExpectedReturnCode,
"FilesCrawled" : RequestList
}
http_test = HTTPTest (
pre_hook=pre_test,
test_params=test_options,
post_hook=post_test
)
http_test.server_setup()
### Get and use dynamic server sockname
srv_host, srv_port = http_test.servers[0].server_inst.socket.getsockname ()
MetaXml1 = MetaXml1.replace('{{FILE1_HASH}}', File1_sha256)
MetaXml1 = MetaXml1.replace('{{FILE2_HASH}}', File2_sha256)
MetaXml1 = MetaXml1.replace('{{FILE3_HASH}}', File3_sha256)
MetaXml1 = MetaXml1.replace('{{FILE4_HASH}}', File4_sha256)
MetaXml1 = MetaXml1.replace('{{FILE5_HASH}}', File5_sha256)
MetaXml1 = MetaXml1.replace('{{SRV_HOST}}', srv_host)
MetaXml1 = MetaXml1.replace('{{SRV_PORT}}', str (srv_port))
MetaFile1.content = MetaXml1
MetaFile1_down.content = MetaXml1
MetaXml2 = MetaXml2.replace('{{BAD_HASH}}', bad_sha256)
MetaXml2 = MetaXml2.replace('{{SRV_HOST}}', srv_host)
MetaXml2 = MetaXml2.replace('{{SRV_PORT}}', str (srv_port))
MetaFile2.content = MetaXml2
# Helper function for hostname and port substitution
def SubstituteServerInfo (text, host, port):
text = text.replace('{{SRV_HOST}}', host)
text = text.replace('{{SRV_PORT}}', str (port))
return text
MetaHTTPRules["SendHeader"] = {
'Link': [ SubstituteServerInfo (LinkHeader, srv_host, srv_port)
for LinkHeader in LinkHeaders ]
}
err = http_test.begin ()
exit (err)