Commit Graph

3614 Commits

Author SHA1 Message Date
Matthew White
ff444ebc2a Bugfix: Keep the download progress when alternating metalink:url
* NEWS: Mention the effects of --continue over Metalink
* src/metalink.c (retrieve_from_metalink): On download error, resume
  output_stream with the next mres->url. Keep fully downloaded files
  started with --continue, otherwise rename/remove the file
* testenv/Makefile.am: Add new file
* testenv/Test-metalink-xml-continue.py: New file. Metalink/XML
  continue/keep existing files (HTTP 416) with --continue tests

Before this patch, with --continue, existing and/or fully retrieved
files which fail the sanity tests were renamed (--keep-badhash), or
removed.

This patch ensures that --continue doesn't rename/remove existing
and/or fully retrieved files (HTTP 416) which fail the sanity tests.
2016-09-27 20:28:50 +02:00
Matthew White
96554861f9 Bugfix: Fix NULL filename and output_stream in Metalink module
* NEWS: Mention the Metalink "path/file" name format handling
* src/metalink.c (retrieve_from_metalink): Fix NULL filename, set
  filename to the right "path/file" value
* src/metalink.c (retrieve_from_metalink): Fix NULL output_stream, set
  output_stream to filename when it is created by retrieve_url()
* src/metalink.c (retrieve_from_metalink): Add RFC5854 comments about
  proper metalink:file "path/file" name format handling
* doc/metalink.txt: Update document. Remove resolved bugs

If unique_create() cannot create/open the destination file, filename
and output_stream remain NULL. If fopen() is used instead, filename
always remains NULL. Both functions cannot create "path/file" trees.

Setting filename to the right value is sufficient to prevent SIGSEGV
generating from testing a NULL value. This also allows retrieve_url()
to create a "path/file" tree through opt.output_document.

Reading NULL as output_stream, when it shall not be, leads to wrong
results. For instance, a non-NULL output_stream tells when a stream
was interrupted, reading NULL instead means to assume the contrary.

This patch conforms to the RFC5854 specification:
  The Metalink Download Description Format
  4.1.2.1.  The "name" Attribute
  https://tools.ietf.org/html/rfc5854#section-4.1.2.1
2016-09-27 20:17:08 +02:00
Matthew White
bcb9bf7ae4 Add metalink description
* doc/metalink.txt

Evaluation of "Directory Options" on the command line interacting with
the option '--input-metalink=file':

$ wget --input-metalink=file <directory options>
2016-09-19 05:56:40 +02:00
Matthew White
ea038006d1 Use python .replace instead than re.sub in Metalink tests
* testenv/Test-metalink-http.py: Use python .replace
* testenv/Test-metalink-xml.py: Use python .replace
* testenv/Test-metalink-xml-abspath.py: Use python .replace
* testenv/Test-metalink-xml-relpath.py: Use python .replace

Use python .replace instead than re.sub, remove 'import re'.
2016-09-19 05:56:40 +02:00
Matthew White
77e2c54991 Fix: Change Metalink/XML v3 file name into test.metalink
* testenv/Test-metalink-xml-abspath.py: Change Metalink/XML v3 file
  name from test.meta4 into test.metalink
* testenv/Test-metalink-xml-relpath.py: Change Metalink/XML v3 file
  name from test.meta4 into test.metalink
* testenv/Test-metalink-xml.py: Change Metalink/XML v3 file name from
  test.meta4 into test.metalink
2016-09-19 05:55:04 +02:00
Tim Rühsen
7ffbccec4c Add two Metalink/XML tests
* testenv/Test-metalink-xml-abspath.py: Reject absolute paths
* testenv/Test-metalink-xml-relpath.py: Reject relative paths
* testenv/Makefile.am: Add both new files to metalink tests
2016-09-13 11:45:02 +02:00
Tim Rühsen
a2c4849900 Fix crash on 'srcset' inline URIs
* src/html-url.c (tag_handle_img): Check append_url() for NULL
  return value before dereference.

Crashed reproducable with parsing srcset="data:..." inline data.
Reported-by: Coverity
2016-09-09 11:44:02 +02:00
Tim Rühsen
40870e1271 * src/hsts.c (hsts_store_open): NULL check param for fclose().
Reported-by: Coverity
2016-09-09 10:22:58 +02:00
Tim Rühsen
15c1e0eb7b * src/ftp-ls.c (ftp_parse_winnt_ls): Fix memset params 2016-09-09 10:22:58 +02:00
Tim Rühsen
eba724a128 * src/utils.c (stable_sort): Use xmalloc instead of malloc 2016-09-09 10:22:58 +02:00
Tim Rühsen
66a9883c8f * src/ftp-ls.c (ftp_parse_winnt_ls): Initialize struct fileinfo cur
Reported-by: Coverity
2016-09-08 16:51:15 +02:00
Tim Rühsen
4febe72bd2 Add const to url param of some functions
* src/http.c: Add const to first param of initialize_request(),
  initialize_proxy_configuration(), establish_connection(),
  check_file_output(), check_auth(), gethttp(), http_loop().
* src/http.h: Add const to first param of http_loop().
2016-09-08 16:13:54 +02:00
Tim Rühsen
c629ec7fd1 * Makefile.am: Remove trailing empty line 2016-09-08 14:20:33 +02:00
Tim Rühsen
03da900c5b * src/recur.c (retrieve_tree): Fix possible NULL dereference
Reported-by: Coverity
2016-09-08 13:04:37 +02:00
Tim Rühsen
b7b67e23cd * src/http.c (initialize_request): Fix check for user
Reported-by: Coverity
2016-09-08 12:48:32 +02:00
Tim Rühsen
22aed3ed4b * src/retr.c (retrieve_url): NULL check mynewloc
Reported-by: Coverity
2016-09-08 12:46:25 +02:00
Tim Rühsen
b4465afa8a * src/utils.c (stable_sort): Reduce tmp allocation size
Reported-by: Coverity
2016-09-08 12:44:17 +02:00
Tim Rühsen
a232835fd1 * Makefile.am: Add target 'check-valgrind' 2016-09-08 11:20:25 +02:00
Tim Rühsen
a78b83b1e9 Fix some issues detected by Coverity
* src/connect.c (connect_to_ip): Check return value of setsockopt.
* src/ftp.c (ftp_retrieve_list): Check return value of chmod.
* src/http.c (digest_authentication_encode): Cleanup code.
* src/init.c (setval_internal): Explicitely check comind range.
* src/main.c (main): Explicitely check optarg.
* src/retr.c (retr_rate): Use snprintf instead sprintf,
  (retrieve_from_file): More verbose error message,
  (rotate_backups): Use snprintf instead sprintf, check return
  value of rename().
* src/url.c (mkalldirs): Check return value of unlink().
* src/utils.c (strdupdelim): Explicitely check beg and end for NULL,
  (merge_vecs): Fix sizeof argument to char *,
  (stable_sort): Use malloc instead of alloca.
2016-09-08 10:12:02 +02:00
Tim Rühsen
37a5257c66 Code cleanup for --use-askpass
* bootstrap.conf: Add xmemdup0 and strpbrk.
* src/init.c (cmd_use_askpass): Add 'const' to char *,
  remove check for file existence.
* src/main.c (run_use_askpass): C89 compat init of argv,
  added \n to error messages,
  fixed stripping of \n and \r from input,
  make run_use_askpass and use_askpass static.
2016-09-08 09:07:32 +02:00
Tim Rühsen
49af22ca94 * src/http.c (check_file_output): Replace asprintf by aprint 2016-09-07 09:31:43 +02:00
Tim Rühsen
d505714a32 * testenv/README: Remove obsolete references to TEST_NAME 2016-09-04 14:56:06 +02:00
Liam R. Howlett
21e1725e12 Add --use-askpass=COMMAND support
* doc/wget.texi: Add --use-askpass to documentation.
* src/init.c: Add cmd_use_askpasss to set opt.use_askpass based on
argument, WGET_ASKPASS, and SSH_ASKPASS environment variables.
opt.wget-askpass is freed in cleanup ()
* src/main.c: Update options & add spawn process of opt.use_askpass
command.
* src/options.h: Addition of string use_askpass.
* src/url.c: Function scheme_leading_string to access the leading
string of a parsed url.
* src/url.h: Prototype for scheme_leading_string for returning the
leading string.
* bootstrap.conf: Add posix_spawn to gnulib_modules

This adds the --use-askpass option which is disabled by default.

--use-askpass=COMMAND will request the username and password for a given
URL by executing the external program COMMAND.  If COMMAND is left
blank, then the external program in the environment variable
WGET_ASKPASS will be used.  If WGET_ASKPASS is not set then the
environment variable SSH_ASKPASS is used.  If there is no value set, an
error is returned.  If an error occurs requesting the username or
password, wget will exit.

Signed-off-by: Liam R. Howlett <Liam.Howlett@WindRiver.com>
2016-09-03 21:01:24 +02:00
Dale R. Worley
796e30dcea Add tests for recursion and redirection.
* testenv/Test-recursive-basic.py: New file. Test basic recursion
    * testenv/Test-recursive-include.py: New File. Recursion test with
    include directories
    * testenv/Test-redirect.py: New File. Basic redirection tests
    * testenv/Makefile.am: Add new tests to makefile
2016-09-02 17:46:36 +02:00
Dale R. Worley
b919f988f2 Sort test names into order.
* testenv/Makefile.am: Sort all the python tests in alphabetical
    order
2016-09-02 17:45:19 +02:00
Dale R. Worley
ca1ee7d32f Corrections and amplifications to test documentation
* testenv/README: Update documentation to meet current project
    status
    * testenv/Test-Proto.py: Same
2016-09-02 17:44:10 +02:00
Giuseppe Scrivano
690c47e3b1 Append .tmp to temporary files
* src/http.c (struct http_stat): Add `temporary` flag.
(check_file_output): Append .tmp to temporary files.
(open_output_stream): Refactor condition to use hs->temporary instead.

Reported-by: "Misra, Deapesh" <dmisra@verisign.com>
Discovered by: Dawid Golunski (http://legalhackers.com)
2016-08-24 12:29:01 +02:00
Tim Rühsen
9ffb64ba6a Limit file mode to u=rw on temp. downloaded files
* bootstrap.conf: Add gnulib modules fopen, open.
* src/http.c (open_output_stream): Limit file mode to u=rw
on temporary downloaded files.

Reported-by: "Misra, Deapesh" <dmisra@verisign.com>
Discovered by: Dawid Golunski (http://legalhackers.com)
2016-08-24 12:28:55 +02:00
Giuseppe Scrivano
6698260f15 Fix some make syntax-check issues
cfg.mk: Skip .der files.
testenv/certs/server-template.cfg: Remove empty final line.
testenv/certs/ca-template.cfg: Likewise.
2016-08-21 15:35:36 +02:00
Tim Rühsen
0787d7253e * src/css-url.c (get_urls_css): Fix memory leak 2016-08-17 23:13:27 +02:00
Tim Rühsen
964f4646da * src/html-url.c (get_urls_html): Fix memory leak 2016-08-17 23:12:25 +02:00
Tim Rühsen
262baeb113 Improve PSL cookie checking
* configure.ac: Add --with-psl-file to set a PSL file
* src/cookies.c (check_domain_match): Load PSL_FILE with
  fallback to built-in data.

This change allows package maintainers to make Wget use the latest
PSL (DAFSA or plain text), without updating libpsl itself.

E.g. Debian now comes with a DAFSA binary within the 'publicsuffix'
package which allows very fast loading (no parsing or processing needed).
2016-08-17 16:32:26 +02:00
Tobias Stoeckmann
f4aeb41899 Fix stack overflow with way too many cookies
* src/cookies.c (cookie_header): Use heap instead of stack.
* src/http.c (request_send): Likewise.

If wget has to handle an insanely large amount of cookies (~700,000 on
32 bit systems or ~530,000 on 64 bit systems), the stack is not large
enough to hold these pointers, leading to undefined behaviour according
to POSIX; expect a segmentation fault in real life. ;)

Signed-off-by: Tobias Stoeckmann <tobias@stoeckmann.org>
2016-08-10 19:59:25 +02:00
Tobias Stoeckmann
a9d49e5b15 Fix signal race condition
The signal handler for SIGALRM calls longjmp, but the handler is
installed before the jump target has been initialized. If another
process sends SIGALRM right between handler installation and target
initialization, the jump leads to undefined behavior.

This can easily be fixed by moving the signal handler installation
into the "SETJMP == 0" conditional block, which means that the target
has just been initialized.

* src/utils.c: call signal after SETJMP.

Signed-off-by: Tobias Stoeckmann <tobias@stoeckmann.org>
2016-08-09 17:38:29 +02:00
Jeffery To
0fe79eeacb Remove hyphens from command names
* src/init.c: Remove hyphens from command names
* src/main.c: Likewise

Options with hyphens (or underscores) in their command name cannot be
set in a wgetrc file.

Signed-off-by: Jeffery To <jeffery.to@gmail.com>
2016-08-05 09:45:09 +02:00
Tim Rühsen
e3fb4c3859 * src/metalink.c (badhash_suffix): Fix quoting 2016-08-04 13:09:28 +02:00
Matthew White
943a6d585f Add new option --keep-badhash to keep Metalink's files with a bad hash
* src/init.c: Add keepbadhash
* src/main.c: Add keep-badhash
* src/options.h: Add keep_badhash
* doc/wget.texi: Add docs for --keep-badhash
* src/metalink.h: Add prototypes badhash_suffix(), badhash_or_remove()
* src/metalink.c: New functions badhash_suffix(), badhash_or_remove().
  (retrieve_from_metalink): Call badhash_or_remove() on download error

With --keep-badhash, append .badhash to Metalink's files with checksum
mismatch. (retrieve_from_metalink): unique_create() may append another
suffix to avoid overwriting existing files.

Without --keep-badhash, remove downloaded files with checksum mismatch
(this conforms to the old behaviour).
2016-08-04 12:03:49 +02:00
Tim Rühsen
7fad76db4c * src/metalink.c: Remove C++ style comments 2016-08-03 13:48:07 +02:00
Matthew White
3e7e29f358 Add gnulib modules 'link', 'unlink' and 'symlink'
* bootstrap.conf: Add 'link', 'unlink' and 'symlink'
2016-08-03 13:44:34 +02:00
Matthew White
e0b60fd073 New: --continue continues partially downloaded Metalink's files
* src/metalink.c (retrieve_from_metalink): Continue file download if
  opt.always_rest is true

Without --continue, download as a new file with an unique name (this
conforms to the old behaviour).
2016-08-03 13:37:27 +02:00
Matthew White
9db02a0c46 Add support for Metalink's md2, and md4 hashes
* bootstrap.conf: Add crypto/md2, and crypto/md4
* src/metalink.c (retrieve_from_metalink): Add md2, and md4 support

This patch adds support for the deprecated (insecure) md2, and md4
Message-Digest algorithms to the Metalink module.
2016-08-03 12:58:43 +02:00
Matthew White
edad3c1df3 Add support for Metalink's md5, sha1, sha224, sha384, and sha512 hashes
* bootstrap.conf: Add crypto/sha512
* src/metalink.c (retrieve_from_metalink): Add md5, sha1, sha224,
  sha384, and sha512 support

Metalink's checksum verification was limited to sha256. This patch
adds support for md5, sha1, sha224, sha384, and sha512.
2016-08-03 12:49:26 +02:00
Sean Burford
20cac2c5ab Style fixes and DEBUG on setxattr failure.
* src/ftp.c: Fix style.
* src/http.c: Likewise.
* src/xattr.h: Likewise.
* src/xattr.c: Likewise,
  (write_xattr_metadata): Print debug msg on error.
2016-07-27 17:05:57 +02:00
Sean Burford
a933bdd31e Keep fetched URLs in POSIX extended attributes
* configure.ac: Check for xattr availability
* src/Makefile.am: Add xattr.c
* src/ftp.c: Include xattr.h.
  (getftp): Set attributes if enabled.
* src/http.c: Include xattr.h.
  (gethttp): Add parameter 'original_url',
  set attributes if enabled.
  (http_loop): Add 'original_url' to call of gethttp().
* src/init.c: Add new option --xattr.
* src/main.c: Add new option --xattr, add description to help text.
* src/options.h: Add new config member 'enable_xattr'.
* src/xatrr.c: New file.
* src/xattr.h: New file.

These attributes provide a lightweight method of later determining
where a file was downloaded from.

This patch changes:
*   autoconf detects whether extended attributes are available and
    enables the code if they are.
*   The new flags --xattr and --no-xattr control whether xattr is enabled.
*   The new command "xattr = (on|off)" can be used in ~/.wgetrc or /etc/wgetrc
*   The original and redirected URLs are recorded as shown below.
*   This works for both single fetches and recursive mode.

The attributes that are set are:
user.xdg.origin.url: The URL that the content was fetched from.
user.xdg.referrer.url: The URL that was originally requested.

Here is an example, where http://archive.org redirects to https://archive.org:
$ wget --xattr http://archive.org
...
$ getfattr -d index.html
user.xdg.origin.url="https://archive.org/"
user.xdg.referrer.url="http://archive.org/"

These attributes were chosen based on those stored by Google Chrome
https://bugs.chromium.org/p/chromium/issues/detail?id=45903
and curl https://github.com/curl/curl/blob/master/src/tool_xattr.c
2016-07-22 13:42:23 +02:00
Noël Köthe
ef372a4f27 Fix typos
* ChangeLog-2014-12-10: invokation -> invocation
* doc/wget.texi: invokation -> invocation
* src/main.c: seperated -> separated
* src/options.h: seperated -> separated
* testenv/README: invokation -> invocation
* testenv/conf/wget_commands.py: invokation -> invocation
2016-07-02 19:01:24 +02:00
Tim Rühsen
0b151f51eb Fix creating docs when make uses 'sh -e'
* doc/Makefile.am: Save fallback for pod2man --utf8

Reported-by: Jérémie Courrèges-Anglas <jca@wxcvbn.org>
2016-06-30 15:19:26 +02:00
Tim Rühsen
309e72c74f Fix compilation for OpenSSL 1.1.0
* src/openssl.c (ssl_init): Use SSL_is_init_finished() instead of
  SSL_state(), conditionally skip SSLeay function calls

The python test suite makes SSL_peek() hang, consuming 100% CPU time.
This does not happen on real world TLS connections, though, but needs
investigations.
2016-06-30 13:24:33 +02:00
Tim Rühsen
2318c309d4 Add script to generate test certs non-interactive
* Test-pinnedpubkey-hash-https.py: Read hashed pubkey from file
* Test-pinnedpubkey-hash-no-check-fail-https.py: Use invalid hash
* certs/make_ca.sh: New script to generate test certs non-interactive
* certs/ca-template.cfg: New file (template for CA cert)
* certs/server-template.cfg: New file (template for server cert)
* certs/server-pubkey-sha256.base64: New file (pubkey sha256 hash)
2016-06-29 12:54:06 +02:00
Ander Juaristi
cdc3e28d8e Bypass world-writable checks on Windows
* src/hsts.c (hsts_file_access_valid): we should check for "world-writable"
   files only on Unix-based systems. It's difficult to mimic the same behavior
   on Windows, so it's better to just not do it.

Reported-by: Gisle Vanem <gvanem@yahoo.no>
Reported-by: Eli Zaretskii <eliz@gnu.org>
2016-06-27 09:54:32 +02:00
Tim Rühsen
43359f47c4 Update gnulib and bootstrap
* gnulib: Sync gnulib submodule with upstream
* bootstrap: Update to latest version from gnulib/build-aux/
2016-06-14 09:27:58 +02:00