Commit Graph

2699 Commits

Author SHA1 Message Date
Tim Rühsen
c125d24762 Don't use extended attributes (--xattr) by default
* src/init.c (defaults): Set enable_xattr to false by default
* src/main.c (print_help): Reverse option logic of --xattr
* doc/wget.texi: Add description for --xattr

Users may not be aware that the origin URL and Referer are saved
including credentials, and possibly access tokens within
the urls.
2018-12-26 14:06:38 +01:00
Jay Satiro
61271d87f6 * src/init.c: Stop freeing the pointer returned by ws_mypath()
.. since ws_mypath() saves the address it returns in a static pointer
for reuse, to also be returned in later calls.
2018-11-13 15:51:51 +01:00
Darshit Shah
2bc2d2f803 * src/ftp.c(ftp_retrieve_glob): Honor {accept,reject}-regex switches as well 2018-11-13 15:51:51 +01:00
Darshit Shah
8c741da256 * src/ftp.c (ftp_retrieve_glob): Refactor to prevent looping over listing multiple times 2018-11-13 15:51:51 +01:00
Tim Rühsen
11fad3fa72 Revert "Bail out on unexpected 416 server errors"
This reverts commit 6f3b995993.

The code is obviously wrong, see https://savannah.gnu.org/bugs/?54963
Also, the example from the original post doesn't work any more.
With other words, the broken server behavior has been fixed meanwhile.
2018-11-09 16:16:43 +01:00
Rosen Penev
a3643c6076 openssl: Do not use engines when OpenSSL does not support
* src/openssl.c: Check for OPENSSL_NO_ENGINE before
 including openssl/engine.h and before calling ENGINE_load_builtin_engines()

Fixes compilation with no engines compiled.

Copyright-paperwork-exempt: Yes
Signed-off-by: Rosen Penev <rosenp@gmail.com>
2018-11-09 16:01:51 +01:00
Kapus, Timotej
6d7cd9313c Replace some loops with string.h functions
* src/init.c: Replace loop with strspn
* src/url.c: Replace loop with strrchr

Copyright-paperwork-exempt: Yes
2018-10-28 10:36:46 +01:00
Luiz Angelo Daros de Luca
fd85ac9cc6 * src/host.c (sufmatch): Fix dot-prefixed domain matching
Current sufmatch does not match when domain is dot-prefixed.
The example of no_proxy in man (.mit.edu) does use a dot-prefixed
domain.

Signed-off-by: Luiz Angelo Daros de Luca <luizluca@gmail.com>
Copyright-paperwork-exempt: Yes
2018-10-26 22:54:26 +02:00
Tim Rühsen
21daa24e72 * src/convert.c (convert_links): Fix fallthrough 2018-10-26 22:52:41 +02:00
Nikos Mavrogiannopoulos
c11cc83d9e Enable post-handshake auth under gnutls on TLS1.3 2018-10-08 15:55:48 +02:00
Tim Rühsen
0727b8f3a9 * src/http.c (resp_new): Fix code to avoid false positive by clang 2018-09-20 14:59:06 +02:00
Tim Rühsen
88a49c1e41 * src/convert.c (convert_links): Fix code to avoid false positive by clang 2018-09-20 14:58:27 +02:00
Tim Rühsen
02afe1e41c Add support for PCRE2 pattern matching
* configure.ac: Check for libpcre2-8
* src/init.c (choices): Test for HAVE_LIBPCRE2
* src/main.c (main): Set regex compile and match functions
* src/options.h: Test for HAVE_LIBPCRE2
* src/utils.c: Include pcre2.h, add functions
  compile_pcre2_regex() and match_pcre2_regex()
* src/utils.h: Declare compile_pcre2_regex() and match_pcre2_regex()

Fixes #54677
Reported-by: Noël Köthe
2018-09-19 16:22:25 +02:00
Tomas Hozza
2bbdfd76da Add TLS 1.3 support for GnuTLS
* doc/wget.texi: Add "TLSv1_3" to --secure-protocol
* src/gnutls.c (set_prio_default): Use GNUTLS_TLS1_3 where needed

Wget currently allows specifying "TLSv1_3" as the parameter for
--secure-protocol option. However it is only implemented for OpenSSL
and in case wget is compiled with GnuTLS, it causes wget to abort with:
GnuTLS: unimplemented 'secure-protocol' option value 6

GnuTLS contains TLS 1.3 implementation since version 3.6.3 [1]. However
currently it must be enabled explicitly in the application of it to be
used. This will change after the draft is finalized. [2] However for
the time being, I enabled it explicitly in case "TLSv1_3" is used with
--secure-protocol.

I also fixed man page to contain "TLSv1_3" in all listings of available
parameters for --secure-protocol

[1] https://lists.gnupg.org/pipermail/gnutls-devel/2018-July/008584.html
[2] https://nikmav.blogspot.com/2018/05/gnutls-and-tls-13.html

Signed-off-by: Tomas Hozza <thozza@redhat.com>
2018-09-07 09:56:02 +02:00
Tomas Korbar
7ddcebd61e Avoid creating empty wget-log when using -O and -q in background
* src/log.c (check_redirect_output): Check for quiet mode
2018-08-29 12:34:03 +02:00
Tomas Hozza
2f451dbf4e * src/warc.c (warc_write_cdx_record): Fix RESOURCE LEAK found by Coverity
Error: RESOURCE_LEAK (CWE-772): - REAL ERROR
wget-1.19.5/src/warc.c:1376: alloc_fn: Storage is returned from allocation function "url_escape".
wget-1.19.5/src/url.c:284:3: alloc_fn: Storage is returned from allocation function "url_escape_1".
wget-1.19.5/src/url.c:255:3: alloc_fn: Storage is returned from allocation function "xmalloc".
wget-1.19.5/lib/xmalloc.c:41:11: alloc_fn: Storage is returned from allocation function "malloc".
wget-1.19.5/lib/xmalloc.c:41:11: var_assign: Assigning: "p" = "malloc(n)".
wget-1.19.5/lib/xmalloc.c:44:3: return_alloc: Returning allocated memory "p".
wget-1.19.5/src/url.c:255:3: var_assign: Assigning: "newstr" = "xmalloc(newlen + 1)".
wget-1.19.5/src/url.c:258:3: var_assign: Assigning: "p2" = "newstr".
wget-1.19.5/src/url.c:275:3: return_alloc: Returning allocated memory "newstr".
wget-1.19.5/src/url.c:284:3: return_alloc_fn: Directly returning storage allocated by "url_escape_1".
wget-1.19.5/src/warc.c:1376: var_assign: Assigning: "redirect_location" = storage returned from "url_escape(redirect_location)".
wget-1.19.5/src/warc.c:1381: noescape: Resource "redirect_location" is not freed or pointed-to in "fprintf".
wget-1.19.5/src/warc.c:1387: leaked_storage: Returning without freeing "redirect_location" leaks the storage that it points to.
\# 1385|     fflush (warc_current_cdx_file);
\# 1386|
\# 1387|->   return true;
\# 1388|   }
\# 1389|

url_escape() really returns a newly allocated memory and it leaks when the warc_write_cdx_record() returns. The memory returned from url_escape() is usually stored in a temporary variable in other parts of the project and then freed. I took the same approach.

Signed-off-by: Tomas Hozza <thozza@redhat.com>
2018-08-27 13:25:34 +02:00
Tomas Hozza
8b451f9f21 * src/warc.c (warc_write_start_record): Fix potential RESOURCE LEAK
In warc_write_start_record() function, the reutrn value of dup() is
directly used in gzdopen() call and not stored anywhere. However the
zlib documentation says that "The duplicated descriptor should be saved
to avoid a leak, since gzdopen does not close fd if it fails." [1].
This change stores the FD in a variable and closes it in case gzopen()
fails.

[1] https://www.zlib.net/manual.html

Error: RESOURCE_LEAK (CWE-772):
wget-1.19.5/src/warc.c:217: open_fn: Returning handle opened by "dup".
wget-1.19.5/src/warc.c:217: leaked_handle: Failing to save or close handle opened by "dup(fileno(warc_current_file))" leaks it.
\#  215|
\#  216|         /* Start a new GZIP stream. */
\#  217|->       warc_current_gzfile = gzdopen (dup (fileno (warc_current_file)), "wb9");
\#  218|         warc_current_gzfile_uncompressed_size = 0;
\#  219|

Signed-off-by: Tomas Hozza <thozza@redhat.com>
2018-08-27 13:25:07 +02:00
Tomas Hozza
c045cdded4 * src/utils.c (open_stat): Fix RESOURCE LEAK found by Coverity
Error: RESOURCE_LEAK (CWE-772):
wget-1.19.5/src/utils.c:914: open_fn: Returning handle opened by "open". [Note: The source code implementation of the function has been overridden by a user model.]
wget-1.19.5/src/utils.c:914: var_assign: Assigning: "fd" = handle returned from "open(fname, flags, mode)".
wget-1.19.5/src/utils.c:921: noescape: Resource "fd" is not freed or pointed-to in "fstat". [Note: The source code implementation of the function has been overridden by a builtin model.]
wget-1.19.5/src/utils.c:924: leaked_handle: Handle variable "fd" going out of scope leaks the handle.
\#  922|     {
\#  923|       logprintf (LOG_NOTQUIET, _("Failed to stat file %s, error: %s\n"), fname, strerror(errno));
\#  924|->     return -1;
\#  925|     }
\#  926|   #if !(defined(WINDOWS) || defined(__VMS))

This seems to be a real issue, since the opened file descriptor in "fd"
would leak. There is also additional check below the "fstat" call, which
closes the opened "fd".

Signed-off-by: Tomas Hozza <thozza@redhat.com>
2018-08-27 13:24:46 +02:00
Tomas Hozza
dfef92bac3 * src/http.c (http_loop): Fix RESOURCE LEAK found by Coverity
Error: RESOURCE_LEAK (CWE-772):
wget-1.19.5/src/http.c:4486: alloc_fn: Storage is returned from allocation function "url_string".
wget-1.19.5/src/url.c:2248:3: alloc_fn: Storage is returned from allocation function "xmalloc".
wget-1.19.5/lib/xmalloc.c:41:11: alloc_fn: Storage is returned from allocation function "malloc".
wget-1.19.5/lib/xmalloc.c:41:11: var_assign: Assigning: "p" = "malloc(n)".
wget-1.19.5/lib/xmalloc.c:44:3: return_alloc: Returning allocated memory "p".
wget-1.19.5/src/url.c:2248:3: var_assign: Assigning: "result" = "xmalloc(size)".
wget-1.19.5/src/url.c:2248:3: var_assign: Assigning: "p" = "result".
wget-1.19.5/src/url.c:2250:3: noescape: Resource "p" is not freed or pointed-to in function "memcpy". [Note: The source code implementation of the function has been overridden by a builtin model.]
wget-1.19.5/src/url.c:2253:7: noescape: Resource "p" is not freed or pointed-to in function "memcpy". [Note: The source code implementation of the function has been overridden by a builtin model.]
wget-1.19.5/src/url.c:2257:11: noescape: Resource "p" is not freed or pointed-to in function "memcpy". [Note: The source code implementation of the function has been overridden by a builtin model.]
wget-1.19.5/src/url.c:2264:3: noescape: Resource "p" is not freed or pointed-to in function "memcpy". [Note: The source code implementation of the function has been overridden by a builtin model.]
wget-1.19.5/src/url.c:2270:7: identity_transfer: Passing "p" as argument 1 to function "number_to_string", which returns an offset off that argument.
wget-1.19.5/src/utils.c:1776:11: var_assign_parm: Assigning: "p" = "buffer".
wget-1.19.5/src/utils.c:1847:3: return_var: Returning "p", which is a copy of a parameter.
wget-1.19.5/src/url.c:2270:7: noescape: Resource "p" is not freed or pointed-to in function "number_to_string".
wget-1.19.5/src/utils.c:1774:25: noescape: "number_to_string(char *, wgint)" does not free or save its parameter "buffer".
wget-1.19.5/src/url.c:2270:7: var_assign: Assigning: "p" = "number_to_string(p, url->port)".
wget-1.19.5/src/url.c:2273:3: noescape: Resource "p" is not freed or pointed-to in function "full_path_write".
wget-1.19.5/src/url.c:1078:47: noescape: "full_path_write(struct url const *, char *)" does not free or save its parameter "where".
wget-1.19.5/src/url.c:2287:3: return_alloc: Returning allocated memory "result".
wget-1.19.5/src/http.c:4486: var_assign: Assigning: "hurl" = storage returned from "url_string(u, URL_AUTH_HIDE_PASSWD)".
wget-1.19.5/src/http.c:4487: noescape: Resource "hurl" is not freed or pointed-to in "logprintf".
wget-1.19.5/src/http.c:4513: leaked_storage: Variable "hurl" going out of scope leaks the storage it points to.
\# 4511|               {
\# 4512|                 printwhat (count, opt.ntry);
\# 4513|->               continue;
\# 4514|               }
\# 4515|             else

There are two conditional branches, which call continue, without freeing memory potentially allocated and pointed to by"hurl" pointer. In fase "!opt.verbose" is True and some of the appropriate conditions in the following if/else if construction, in which "continue" is called, are also true, then the memory allocated to "hurl" will leak.

Signed-off-by: Tomas Hozza <thozza@redhat.com>
2018-08-27 13:24:24 +02:00
Tomas Hozza
b8be904ac7 * src/http.c (check_auth): Fix RESOURCE LEAK found by Coverity
Error: RESOURCE_LEAK (CWE-772):
wget-1.19.5/src/http.c:2434: alloc_fn: Storage is returned from allocation function "xmalloc".
wget-1.19.5/lib/xmalloc.c:41:11: alloc_fn: Storage is returned from allocation function "malloc".
wget-1.19.5/lib/xmalloc.c:41:11: var_assign: Assigning: "p" = "malloc(n)".
wget-1.19.5/lib/xmalloc.c:44:3: return_alloc: Returning allocated memory "p".
wget-1.19.5/src/http.c:2434: var_assign: Assigning: "auth_stat" = storage returned from "xmalloc(4UL)".
wget-1.19.5/src/http.c:2446: noescape: Resource "auth_stat" is not freed or pointed-to in "create_authorization_line".
wget-1.19.5/src/http.c:5203:70: noescape: "create_authorization_line(char const *, char const *, char const *, char const *, char const *, _Bool *, uerr_t *)" does not free or save its parameter "auth_err".
wget-1.19.5/src/http.c:2476: leaked_storage: Variable "auth_stat" going out of scope leaks the storage it points to.
\# 2474|                 /* Creating the Authorization header went wrong */
\# 2475|               }
\# 2476|->         }
\# 2477|         else
\# 2478|           {

Error: RESOURCE_LEAK (CWE-772):
wget-1.19.5/src/http.c:2431: alloc_fn: Storage is returned from allocation function "url_full_path".
wget-1.19.5/src/url.c:1105:19: alloc_fn: Storage is returned from allocation function "xmalloc".
wget-1.19.5/lib/xmalloc.c:41:11: alloc_fn: Storage is returned from allocation function "malloc".
wget-1.19.5/lib/xmalloc.c:41:11: var_assign: Assigning: "p" = "malloc(n)".
wget-1.19.5/lib/xmalloc.c:44:3: return_alloc: Returning allocated memory "p".
wget-1.19.5/src/url.c:1105:19: var_assign: Assigning: "full_path" = "xmalloc(length + 1)".
wget-1.19.5/src/url.c:1107:3: noescape: Resource "full_path" is not freed or pointed-to in function "full_path_write".
wget-1.19.5/src/url.c:1078:47: noescape: "full_path_write(struct url const *, char *)" does not free or save its parameter "where".
wget-1.19.5/src/url.c:1110:3: return_alloc: Returning allocated memory "full_path".
wget-1.19.5/src/http.c:2431: var_assign: Assigning: "pth" = storage returned from "url_full_path(u)".
wget-1.19.5/src/http.c:2446: noescape: Resource "pth" is not freed or pointed-to in "create_authorization_line".
wget-1.19.5/src/http.c:5203:40: noescape: "create_authorization_line(char const *, char const *, char const *, char const *, char const *, _Bool *, uerr_t *)" does not free or save its parameter "path".
wget-1.19.5/src/http.c:2476: leaked_storage: Variable "pth" going out of scope leaks the storage it points to.
\# 2474|                 /* Creating the Authorization header went wrong */
\# 2475|               }
\# 2476|->         }
\# 2477|         else
\# 2478|           {

Both "pth" and "auth_stat" are allocated in "check_auth()" function. These are used for creating the HTTP Authorization Request header via "create_authorization_line()" function. In case the creation went OK (auth_err == RETROK), then the memory previously allocated to "pth" and "auth_stat" is freed. However if the creation failed, then the memory is never freed and it leaks.

Signed-off-by: Tomas Hozza <thozza@redhat.com>
2018-08-27 13:23:52 +02:00
Tomas Hozza
b24351183e * src/ftp.c (getftp): Fix RESOURCE LEAK found by Coverity
Error: RESOURCE_LEAK (CWE-772):
wget-1.19.5/src/ftp.c:1493: alloc_fn: Storage is returned from allocation function "fopen".
wget-1.19.5/src/ftp.c:1493: var_assign: Assigning: "fp" = storage returned from "fopen(con->target, "wb")".
wget-1.19.5/src/ftp.c:1811: leaked_storage: Variable "fp" going out of scope leaks the storage it points to.
\# 1809|     if (fp && !output_stream)
\# 1810|       fclose (fp);
\# 1811|->   return err;
\# 1812|   }
\# 1813|

It can happen, that "if (!output_stream || con->cmd & DO_LIST)" on line #1398 can be true, even though "output_stream != NULL". In this case a new file is opened to "fp". Later it may happen in the FTPS branch, that some error will occure and code will jump to label "exit_error". In "exit_error", the "fp" is closed only if "output_stream == NULL". However this may not be true as described earlier and "fp" leaks.

On line #1588, there is the following conditional free of "fp":

  /* Close the local file.  */
  if (!output_stream || con->cmd & DO_LIST)
    fclose (fp);

Therefore the conditional at the end of the function after "exit_error" label should be modified to:

  if (fp && (!output_stream || con->cmd & DO_LIST))
    fclose (fp);

This will ensure that "fp" does not leak in any case it sould be opened.

Signed-off-by: Tomas Hozza <thozza@redhat.com>
2018-08-27 13:20:48 +02:00
Tim Rühsen
122a9f08a3 * src/gnutls.c (ssl_check_certificate): Fix grammar of error msg
Reported-by: Nicholas Sielicki
2018-06-13 20:34:24 +02:00
Tim Rühsen
4fc69950da * src/http.c (http_loop): Fix --retry-on-host-error 2018-06-13 20:16:22 +02:00
ethus3h
e7979da9e8 Add new option --retry-on-host-error
* doc/wget.texi: Add docs for --retry-on-host-error
* src/http.c (http_loop): Add code for HOSTERR
* src/init.c: Add option --retry-on-host-error
* src/main.c: Likewise
* src/options.h: Add options.retry_on_host_error

Copyright-paperwork-exempt: Yes
2018-06-13 20:10:28 +02:00
Tim Rühsen
ad261f41ce Save original data to WARC file
* src/retr.c (write_data): Cleanup,
  (fd_read_body): Write to WARC before uncompressing

Fixes: #53968
2018-05-29 10:52:20 +02:00
Tim Rühsen
4188fcdced * src/main.c (print_version): Silence UBSAN message 2018-05-09 13:56:20 +02:00
Tim Rühsen
4bdb09d3a7 * src/utils.ci (file_exists_p): Fix stat(NULL,...) 2018-05-09 12:37:03 +02:00
Tim Rühsen
35f5f79ce1 * src/hsts.c (open_hsts_test_store): Fix unlink(NULL) 2018-05-09 12:29:39 +02:00
Tim Rühsen
3cbdc67c96 * src/hash.c: Silence UBSAN for hash functions 2018-05-09 12:16:51 +02:00
Tim Rühsen
ace96e4412 * src/hsts.h: Fix header guard 2018-05-08 10:17:06 +02:00
Tim Rühsen
77286a2e03 * src/version.h: Add header guard 2018-05-08 10:10:44 +02:00
Tim Rühsen
7eff94e881 * src/host.c (wait_ares): Remove void assignment
Reported-by: Josef Moellers
2018-05-08 09:36:48 +02:00
Tim Rühsen
1fc9c95ec1 Fix cookie injection (CVE-2018-0494)
* src/http.c (resp_new): Replace \r\n by space in continuation lines

Fixes #53763
 "Malicious website can write arbitrary cookie entries to cookie jar"

HTTP header parsing left the \r\n from continuation line intact.
The Set-Cookie code didn't check and could be tricked to write
\r\n into the cookie jar, allowing a server to generate cookies at will.
2018-05-06 18:24:58 +02:00
Tim Rühsen
77cf701416 * src/init.c: Bring new --ciphers into right order in options array 2018-05-06 12:49:46 +02:00
Ander Juarist
b9c4cadd84 OpenSSL: Better seeding of PRNG
* src/openssl.c (init_prng): keep gathering entropy even though we
                              already have enough
   (ssl_connect_with_timeout_callback): reseed PRNG again just before
                                        the handshake

Reported-by: Jeffrey Walton <noloader@gmail.com>
2018-05-05 22:49:06 +02:00
Ander Juaristi
744671aac6 Enhance SSL/TLS security
This commit hardens SSL/TLS a bit more in the following ways:

 * Explicitly exclude NULL authentication and the 'MEDIUM' cipher list
   category. Ciphers in the 'HIGH' level are only considered - this
   includes all symmetric ciphers with key lengths larger than 128 bits,
   and some ('modern') 128-bit ciphers, such as AES in GCM mode.
 * Allow RSA key exchange by default, but exclude it when
   Perfect Forward Secrecy is desired (with --secure-protocol=PFS).
 * Introduce new option --ciphers to set the cipher list that the SSL/TLS
   engine will favor. This string is fed directly to the underlying TLS
   library (GnuTLS or OpenSSL) without further processing, and hence its
   format and syntax are directly dependent on the specific library.

Reported-by: Jeffrey Walton <noloader@gmail.com>
2018-05-05 22:49:06 +02:00
Tim Rühsen
26a50942d8 * src/netrc.c (parse_netrc_fp): Fix two memleaks 2018-04-28 20:50:30 +02:00
Tim Rühsen
a1c9018797 Add new fuzzer for the .netrc parser
* fuzz/wget_netrc_fuzzer.c: New fuzzer
* fuzz/wget_netrc_fuzzer.dict: Fuzzer dictionary
* fuzz/wget_netrc_fuzzer.in: Initial corpora
* src/ftp.c (getftp): Amend call to search_netrc()
* src/http.c (initialize_request): Likewise
* src/netrc.c: Cleanup, prepare code for fuzzing
* src/netrc.h: Cleanup
2018-04-28 20:49:57 +02:00
Tim Rühsen
734d0aee15 * src/utils.c (match_tail): Fix unsigned integer overflow 2018-04-27 12:56:25 +02:00
Tim Rühsen
78838d761f Fix buffer overflow in CSS parser
* src/css-url.c (get_uri_string): Check input length
* fuzz/wget_css_fuzzer.repro/buffer-overflow-6600180399865856:
  Add reproducer corpus

Fixes OSS-Fuzz issue #8033.
This is a long standing bug affecting all versions <= 1.19.4.
2018-04-26 22:40:28 +02:00
Tim Rühsen
cb47f3aaa4 Fix buffer overflow in CSS parser
* src/css-url.c (get_urls_css): Check input string length
* fuzz/wget_css_fuzzer.repro/negative-size-param-5724866467594240:
  Add reproducer corpus

Fixes OSS-Fuzz issue #8032.
This is a long standing bug affecting all versions <= 1.19.4.
2018-04-26 21:25:28 +02:00
Tim Rühsen
caa08d7470 Update CSS grammar from 1.x to 2.2
* src/css-tokens.h: Add enums and fixate values
* src/css.l: Include config.h,
  ignore several compiler warnings,
  update the grammar to CSS 2.2

Fixes OSS-Fuzz issue #8010 (slowness issue).
This is a long standing bug affecting all versions <= 1.19.4.

Some crafted CSS input was extremely slow / CPU wasting, so it could
be used as a DOS attack against website scanning.

The code/grammar changes were backported from Wget2.x.
2018-04-26 13:10:39 +02:00
Tim Rühsen
76fb1fe6f6 * src/res.c (add_path): Fix memleak (parsing robots.txt)
Fixes OSS-Fuzz issue #8005.
This is a long standing bug affecting all versions <= 1.19.4.
2018-04-25 11:33:38 +02:00
Tim Rühsen
fe6d1247ad * src/ftp-ls.c (ftp_parse_winnt_ls): Fix integer overflow
Fixes OSS-Fuzz issue #7999.
This is a long standing bug affecting all versions <= 1.19.4.
2018-04-25 09:37:29 +02:00
Tim Rühsen
7ee3ad1c48 * src/ftp-ls.c (ftp_parse_winnt_ls): Fix integer overflow 2018-04-24 11:11:47 +02:00
Tim Rühsen
79c1f333dc * src/ftp-ls.c (ftp_parse_vms_ls): Fix integer overflow by left shift 2018-04-24 11:05:52 +02:00
Tim Rühsen
d8365b0607 * src/ftp-ls.c (ftp_parse_unix_ls): Fix integer overflow in date parsing 2018-04-24 10:55:29 +02:00
Tim Rühsen
b0f802c46c * src/ftp-ls.c (ftp_parse_winnt_ls): Fix heap-buffer-overflow
Fixes OSS-Fuzz issue #7931.
This is a long standing bug affecting all versions <= 1.19.4.
2018-04-22 12:45:51 +02:00
Tim Rühsen
96c64a859d * src/ftp-ls.c (ftp_parse_winnt_ls): Fix heap-buffer-overflow
Fixes OSS-Fuzz issue #7930.
This is a long standing bug affecting all versions <= 1.19.4.
2018-04-22 11:33:35 +02:00
Tim Rühsen
7d3da08537 * src/ftp-ls.c (eat_carets): Fix heap-buffer-overflow 2018-04-21 23:48:01 +02:00
Tim Rühsen
2b61c46183 * src/ftp-ls.c (ftp_parse_winnt_ls): Fix memleak 2018-04-21 22:52:01 +02:00
Tim Rühsen
f0d715b264 * src/ftp-ls.c (ftp_parse_vms_ls): Fix heap-buffer-overflow 2018-04-21 22:47:17 +02:00
Tim Rühsen
b3ff8ce3d5 * src/ftp-ls.c (ftp_parse_vms_ls): Fix heap-buffer-overflow 2018-04-21 22:45:03 +02:00
Tim Rühsen
c7014fbaea * src/ftp-ls.c (ftp_parse_vms_ls): Fix memleak 2018-04-21 22:42:30 +02:00
Tim Rühsen
407cd5f23b Add new fuzzer for the FTP listing parsers
* fuzz/Makefile.am: Add wget_ftpls_fuzzer
* fuzz/wget_ftpls_fuzzer.c: New fuzzer
* fuzz/wget_ftpls_fuzzer.dict: Fuzzer dictionary
* fuzz/wget_ftpls_fuzzer.in/starter: Starting corpus
* src/ftp-ls.c: Parsing function take FILE * as argument,
  new function ftp_parse_ls_fp()
* src/ftp.c: Remove static from freefileinfo()
* src/ftp.h: Add ftp_parse_ls_fp() and freefileinfo()
2018-04-21 19:24:25 +02:00
Tim Rühsen
7ecfe3ef70 * src/main.c (main): Fix memleak for fuzzing/testing 2018-04-21 18:21:52 +02:00
Tim Rühsen
7e635d173e * src/init.c: Fix fuzzing in case ~/.wgetrc doesn't exist 2018-04-21 16:33:45 +02:00
Tim Rühsen
23b0275feb Add new HTML parser fuzzer
* fuzz/Makefile.am: Add wget_html_fuzzer
* fuzz/wget_html_fuzzer.c: New fuzzer
* fuzz/wget_html_fuzzer.dict: HTML dictionary for fuzzing
* fuzz/wget_html_fuzzer.in: Initial corpora
* src/html-url.c: Add new function get_urls_html_fm()
* src/html-url.h: Add ne function get_urls_html_fm()
* src/wget.h: Fix define for fopen_wgetrc()
2018-04-20 22:33:58 +02:00
Tim Rühsen
c9a091ae45 * src/css-url.c (get_uri_string): Fix buffer overflow (read) 2018-04-20 11:37:52 +02:00
Tim Rühsen
7a5db30b01 * src/iri.h: Fix C++ compile error 2018-04-20 10:17:55 +02:00
Tim Rühsen
9d899d7bb7 * src/http.c: Download and scan CSS files in spider mode 2018-04-19 23:05:06 +02:00
Tim Rühsen
d25d036fba * src/css-url.c (get_urls_css): Call yylex_destroy() to reset CSS scanner 2018-04-19 23:05:06 +02:00
Tim Rühsen
ff3c7733b7 * src/html-url.h: Include needed header files 2018-04-18 20:41:08 +02:00
Tim Rühsen
3ae58dae13 Fix oss-fuzz issue with exit()
* src/wget.h: Define exit() as exit_wget()
* fuzz/wget_options_fuzzer.c: Implement exit_wget() and cleanup
2018-04-18 13:26:10 +02:00
Tim Rühsen
66b416b6cd Fix fopen/stdin issues with fuzzing
* fuzz/wget_options_fuzzer.c: Add fopen_wget() and fopen_wgetrc()
* src/utils.c: Use fopen_wgetrc() for config files,
  don't read from stdin when fuzzing
* src/wget.h: Define fopen as fopen_wget when fuzzing,
  define fopen_wgetrc as fopen when not fuzzing
2018-04-17 23:02:04 +02:00
Tim Rühsen
3c4a6506a5 * src/log.c: Don't check_redirect_output() when fuzzing 2018-04-17 12:40:47 +02:00
Tim Rühsen
fbb4cd231e * src/main.c (promt_for_password): Avoid getpass() when fuzzing 2018-04-17 12:15:18 +02:00
Tim Rühsen
3ceb6e5630 Fix double fclose() with -d while fuzzing
* src/ftp.c (ftp_loop_internal): Set warc_tmp to NULL after ffclose()
* src/init.c (cleanup): Set output_stream to NULL after fclose()
* src/log.c (log_close): Set global stream vars to NULL after closing
* src/recur.c (retrieve_tree): Set rejectedlog to NULL after closing
* src/warc.c (warc_close): Set stream vars to NULL after closing
2018-04-17 11:59:54 +02:00
Tim Rühsen
eaf167aaaa * src/main.c (main): Don't background if TESTING 2018-04-17 11:50:36 +02:00
Tim Rühsen
7d5de64fc9 * src/init.c (initialize): Return error, don't exit() 2018-04-17 11:42:43 +02:00
Tim Rühsen
70042265be * src/init.c (cmd_use_askpass): Return false on error 2018-04-16 23:04:53 +02:00
Tim Rühsen
64758655c4 * src/utils.c (compile_posix_regex): Hard-code string to regcomp
regcomp() may be too cpu + memory intensive for fuzzing.
See https://sourceware.org/glibc/wiki/Security%20Exceptions
2018-04-16 22:04:54 +02:00
Tim Rühsen
e737c4b10e Fix 2 more memleaks
* src/init.c (initialize): Use global var for wgetrc filename
* src/iri.c (find_locale): Return strdup'ed locale string
* src/options.h (struct options): Add wgetrcfile
2018-04-16 22:02:11 +02:00
Tim Rühsen
05a8c064e9 * src/init.c (cleanup): Set output_stream to NULL after closing 2018-04-16 13:22:29 +02:00
Tim Rühsen
01002a168a Fix homedir memory leaks
* src/hsts.c: Use opt.homedir
* src/init.c: Likewise
* src/main.c: Likewise
* src/netrc.c: Likewise
* src/options.h (struct options): Add homedir
2018-04-16 13:19:03 +02:00
Tim Rühsen
73fd57585c * src/main.c (main): Free opt.encoding_remote properly 2018-04-16 12:21:52 +02:00
Tim Rühsen
7963260e76 * src/host.c (wait_ares): Free ptimer 2018-04-16 11:58:18 +02:00
Tim Rühsen
99a7039def * src/init.c (cleanup): Free regex objects properly 2018-04-16 11:57:39 +02:00
Tim Rühsen
d7e3acb2cc * src/init.c (cleanup): Never call cleanup() twice 2018-04-16 09:58:51 +02:00
Tim Rühsen
e0860dd1ff * src/init.c (cmd_bytes_sum): Fix integer over- and underflow 2018-04-16 09:58:51 +02:00
Tim Rühsen
15ef79f808 * src/main.c (save_hsts): Free hsts_store after closing 2018-04-16 09:58:51 +02:00
Tim Rühsen
79385a29fd Use strtol() instead of selfmade function
* src/init.c (cmd_number): Use strtol() instead of selfmade function
* bootstrap.conf: Add strtol gnulib module
2018-04-16 09:58:51 +02:00
Tim Rühsen
55da9f71f0 * src/hsts.c (hsts_hash_func): Allow integer overflow 2018-04-16 09:58:51 +02:00
Tim Rühsen
bec9816f40 * init.c (cmd_spec_mirror): Fix uninitialzed stack variable 2018-04-16 09:58:51 +02:00
Tim Rühsen
b86294e1c9 * src/init.c (cleanup): Free more variables 2018-04-16 09:58:51 +02:00
Tim Rühsen
328438e69b * src/utils.c (fopen_stat): Early return to allow fuzzing/fmemopen 2018-04-16 09:58:51 +02:00
Tim Rühsen
36f029d2f0 * src/init.c (initialize): Free mem before exit() 2018-04-16 09:58:51 +02:00
Tim Rühsen
a4402120ad Add OSS-Fuzz infrastruture
* Makefile.am: Add fuzz/ to SUBDIRS
* cfg.mk: Fix 'make syntax-check'
* configure.ac: Add --enable-fuzzing
* fuzz/Makefile.am: New file
* fuzz/README.md: New file
* fuzz/fuzzer.h: New file
* fuzz/get_all_corpora: New file
* fuzz/get_ossfuzz_corpora: New file
* fuzz/glob_crash.c: New file
* fuzz/main.c: New file
* fuzz/run-afl.sh: New file
* fuzz/run-clang.sh: New file
* fuzz/view-coverage.sh: New file
* fuzz/wget_options_fuzzer.c: New file
* fuzz/wget_options_fuzzer.dict: New file
* src/init.c (cleanup): Free more resources
* src/main.c (init_switches): Initialize only once,
  (print_usage): Don't print if TESTING is defined
* src/utils.h: Include wget.h
2018-04-16 09:58:51 +02:00
Tim Rühsen
de54c970b2 Move unit-test code to tests/
* src/Makefile.am: Remove test.c and test.h
* src/test.c: Rename to tests/unit-tests.c
* src/test.h: Rename to tests/unit-tests.h
* tests/Makefile.am: Add unit-tests.c and unit-tests.h
* src/hsts.c: Amend #include
* src/http.c: Likewise
* src/init.c: Likewise
* src/metalink.c: Likewise
* src/res.c: Likewise
* src/url.c: Likewise
* src/utils.c: Likewise
2018-04-05 15:06:47 +02:00
Tim Rühsen
3e84963e84 * src/main.c: Rename main() -> main_wget() for unit tests 2018-04-05 15:06:47 +02:00
Tim Rühsen
f56f970bc2 Fix some issues found by 'infer' 2018-03-14 14:43:35 +01:00
Tim Rühsen
0b54043d17 * src/openssl.c: Fix build for OpenSSL 1.1.0 without TLS1_3_VERSION 2018-03-08 16:17:14 +01:00
Loganaden Velvindron
fde8cefd13 Add TLS1.3 support for OpenSSL build
* src/init.c: Add 'tlsv1_3 for --secure-protocol
* src/openssl.c (ssl_init): Enable TLS1.3 if possible
* src/options.h: Add secure_protocol_tlsv1_3
* doc/wget.texi: Add description of TLSv1_3

Copyright-paperwork-exempt: Yes
2018-03-08 15:30:14 +01:00
Tim Rühsen
ba2b0654b4 * src/main.c: Add help text for --retry-on-http-error
Reported-by: Giovanni Tirloni
2018-03-07 10:32:08 +01:00
Tim Rühsen
375bfa98dc * src/url.c (convert_fname): Fix invalid free on iconv_open() failure
Reported-by: Volkmar Klatt
2018-03-01 16:03:29 +01:00
Tim Rühsen
bea54e0da4 * src/mswindows.c: Fix prototype of fork_to_background()
Reported-by: Gisle Vanem
2018-02-21 19:05:15 +01:00
Tim Rühsen
9887b870d1 Use gnulib's utime()
* bootstrap.conf: Add modules utime and utime-h
* src/utils.c (touch): Remove own code for gnulib's utime()
2018-02-09 10:21:43 +01:00
Tim Rühsen
c722973212 Fix logging in background mode
* ../src/main.c: Re-init logfile if changed for background mode
* ../src/utils.c: fork_to_background() returns whether logfile changed
* ../src/utils.h: Set return type bool for fork_to_background()

Fixes: #53020
Reported-by: Noël Köthe
2018-02-09 10:21:43 +01:00
Tim Rühsen
bb7fa977a1 * src/http.c: Fix two typos in comments 2018-02-09 10:18:35 +01:00
Tim Rühsen
d27032c446 Mention list and bugtracker for --help and in man page
* doc/wget.texi: Mention list and bugtracker in man page
* src/main.c: Mention list and bugtracker for --help
2018-01-22 10:39:49 +01:00
Darshit Shah
d0a5d9f131 Switch off compression by default
Gzip compression has a number of bugs which need to be ironed out before
we can support it by default. Some of these stem from a misunderstanding
of the HTTP spec, but a lot of them are also due to many web servers not
being compliant with RFC 7231.

With this commit, I am marking GZip compression support as experimental
in GNU Wget pending further investigation and the addition of tests.

* src/init.c (defaults): Switch of compression support by default
* docs/wget.texi: State that compression is experimental
2018-01-21 10:51:11 +01:00
Darshit Shah
0d0a95a01b Revert "* src/init.c (defaults): Set compression_none as the default compression"
This reverts commit 8283ac0846.
2018-01-21 10:50:44 +01:00
Darshit Shah
8283ac0846 * src/init.c (defaults): Set compression_none as the default compression 2018-01-21 10:17:39 +01:00
Reiji
a7cc4e2b37 * src/http.c (gethttp): Fix bug that prevented all files from being decompressed
Signed-off-by: Darshit Shah <darnir@gnu.org>
2018-01-20 14:04:28 +01:00
Tim Rühsen
55d25fc20c * src/host.c (sufmatch): Fix to domain matching 2018-01-19 19:32:01 +01:00
Gisle Vanem
513cc1c0c8 * src/netrc.c: Fix Standalone compilation of netrc file 2018-01-17 14:44:52 +01:00
Darshit Shah
183fccdaad Update Copyright years 2018-01-14 11:24:43 +01:00
Darshit Shah
d26c6c0028 * src/netrc.c: Search for the correct netrc file on Windows 2018-01-14 10:55:03 +01:00
Tim Rühsen
047746eb76 * src/http.c: Exclude *.gz and *.tgz from decompression 2018-01-10 15:46:13 +01:00
Tim Rühsen
d8df356d4b * src/utils.c (wg_pin_peer_pubkey): Fix format warning 2017-12-31 13:03:25 +01:00
Peter Wu
220c24ecb5 Avoid redirecting output to file when tcgetpgrp fails
* src/log.c (check_redirect_output): tcgetpgrp can return -1 (ENOTTY),
be sure to check whether a valid controlling terminal exists before
redirecting.

Fixes: #51181
2017-12-31 12:59:15 +01:00
Darshit Shah
693cee0109 Don't assume a 416 response has no body
* http.c(gethttp): In case of a 416 response, try to drain the socket of
any bytes before reusing the connection

Reported-By: Iru Cai <mytbk920423@gmail.com>
2017-12-08 18:44:17 +01:00
Tim Rühsen
6aa6b669ef Support building with OpenSSL 1.1 w/o deprecated features
* src/openssl.c (ssl_init): Fix code for the subject's issue

Reported-by: Matthew Thode
2017-11-26 18:59:47 +01:00
Tim Rühsen
8551ceccfe Avoid link conversion after 304 Not Modified
* src/http.c (gethttp): Handle 304 before setting document content type

Fixes: #52404
Reported-by: Ben Fuchs
2017-11-25 19:33:03 +01:00
YX Hao
19060db44f Fix printing mutibyte chars as unprintable chars on Windows
* src/log.c (get_warc_log_fp): Fix return value to stderr
* src/main.c (main): Init logging as soon as possible,
  fix locale/charset on Windows
2017-11-16 12:23:20 +01:00
YX Hao
a9a953feee Convert remote path to local encoding
* src/url.c (url_file_name): Convert remote path to local encoding
2017-11-15 19:58:53 +01:00
Tim Rühsen
267cd51fff Do not use must-revalidate in Cache-Control header
As the bug report states, 'must-revalidate' is a request directive.

Fixes #52379
2017-11-10 10:57:46 +01:00
Darshit Shah
973c26ed7d Fix Segfault due to derefencing null ptr
* src/http.c(gethttp): When Encoding is gzip, ensure that the
Content-Type Header was actually seen. Without this, the "type" variable
is null causing a Segfault.

Reported-By: Noël Köthe <noel@debian.org>
2017-11-06 10:09:03 +01:00
Tim Rühsen
16d066f89c * src/http.c: Fix H_REDIRECTED 2017-11-03 22:23:04 +01:00
Tim Rühsen
a2477d487c * src/http.c: Add support for HTTP status code 308 2017-11-03 22:12:11 +01:00
Tim Rühsen
ba6b44f674 Fix heap overflow in HTTP protocol handling (CVE-2017-13090)
* src/retr.c (fd_read_body): Stop processing on negative chunk size

Reported-by: Antti Levomäki, Christian Jalio, Joonas Pihlaja from Forcepoint
Reported-by: Juhani Eronen from Finnish National Cyber Security Centre
2017-10-26 17:29:38 +02:00
Tim Rühsen
d892291fb8 Fix stack overflow in HTTP protocol handling (CVE-2017-13089)
* src/http.c (skip_short_body): Return error on negative chunk size

Reported-by: Antti Levomäki, Christian Jalio, Joonas Pihlaja from Forcepoint
Reported-by: Juhani Eronen from Finnish National Cyber Security Centre
2017-10-26 17:29:38 +02:00
YX Hao
27d78d944f Avoid unnecessary UTF-8 encoded fallback (trivial change)
* src/retr.c (retrieve_url): Check for changed URL on redirect
2017-10-25 14:26:36 +02:00
Tim Rühsen
60f033426f Add GNU extensions to .netrc parsing
src/netrc.c (parse_netrc): Add 'port' and 'force' extensions

Reported-by: September 20Tim Landscheidt
2017-09-27 12:42:06 +02:00
Josef Moellers
6f3b995993 Bail out on unexpected 416 server errors
* src/http.c (gethttp): Stop on 416 if file is incomplete
2017-09-18 16:45:49 +02:00
Tim Schlueter
c451eec155 Add gzip Content-Encoding decompression
* src/http.c (struct http_stat): Add remote_encoding field.
(read_response_body): Enable gzip decompression.
(initialize_request): Send gzip Accept-Encoding header.
(gethttp): Decompress files with gzip Content-Encoding.
* src/retr.c: include zlib.h.
(zalloc): New function.
(zfree): New function.
(fd_read_body): Decompress gzip data.
* src/retr.h (fd_read_body enum): Add rb_compressed_gzip flag.
2017-08-04 14:34:53 +02:00
Tim Schlueter
b543dfe783 Add --compression option
* doc/wget.texi: Add --compression documentation.
* src/init.c (cmd_spec_compression): New function.
(commands[]): Add opt.compression.
(defaults): Set default opt.compression value.
* src/main.c (option_data[]): Add struct for --compression.
(print_help, help[]): Add description for --compression.
(main): Add incompatibility checks for --compression.
* src/options.h (struct options): Add compression enum and field.
2017-08-04 14:34:53 +02:00
Tim Schlueter
08ed2a5530 Adjust Extension based on Content-Encoding
* doc/wget.texi (--adjust-extension, adjust_extension): Updated documentation.
* src/http.c (encoding_t): New enum.
(struct http_stat): Add local_encoding field.
(gethttp): --adjust-extension based on Content-Encoding.
2017-08-04 14:34:53 +02:00
Tim Rühsen
3ad3b3e36c * src/url.c (url_scheme): Use ASCII version of strncasecmp 2017-07-28 17:11:26 +02:00
Tim Rühsen
5fb6b6bd68 Fix misuse of strncasecmp
* src/http.c (set_content_type): Use c_strcasecmp instead of strncasecmp

See issue bug #51576
2017-07-28 16:56:27 +02:00
Tim Rühsen
21154bdc36 Check for 304 response before applying --adjust-extension
* src/http.c (gethttp): Move 304 code before --adjust-extension code

This fixes applying --adjust-extension in combination with 304
HTTP responses. It could lead to .html extensions to arbitrary
files.

Reported-by: anfractuosity
2017-06-13 11:25:20 +02:00
Tim Rühsen
ae293c945a Fix buffer overflow in Public Key Pinning
* src/utils.c (wget_base64_decode): Add param for destination size,
  (wg_pubkey_pem_to_der): Amend call to wget_base64_decode(),
  (wg_pin_peer_pubkey): Likewise and fix code style.
* src/utils.h: Add param to wget_base64_decode()
* src/http-ntlm.c (ntlm_input): Amend call to wget_base64_decode()
* src/http.c (skip_content_type): Likewise

Fixes #51227
2017-06-13 10:23:04 +02:00
Tomas Hozza
876def8ebe Add command line option to disable use of .netrc
Although internally code uses option for (not) reading .netrc for
credentials, it was not possible to turn this behavior off on command
line. Note that it was possible to turn it off using wgetrc.

Idea for this change came from Bruce Jerrick (bmj001@gmail.com).
Reference: https://bugzilla.redhat.com/show_bug.cgi?id=1425097

Signed-off-by: Tomas Hozza <thozza@redhat.com>
2017-05-15 16:06:50 +02:00
Tomas Hozza
f8c3df1f40 Fixed getting of credentials from .netrc
There seemed to be a copy&paste error in http.c code, which decides
whether to get credentials from .netrc. In ftp.c "user" and "pass"
variables are char*, while in http.c, these are char**. For this reason
they should be dereferenced when determining if password and user login
is set to some value.

Also since both variables are dereferenced on lines above the changed
code, it does not really make sense to check if they are NULL.

This patch is based on fix from Bruce Jerrick <bmj001@gmail.com>.
Fedora bug: https://bugzilla.redhat.com/show_bug.cgi?id=1425097

Signed-off-by: Tomas Hozza <thozza@redhat.com>
2017-05-15 16:06:50 +02:00
Tim Rühsen
936efc3564 * src/iri.c (idn_encode): Better IDNA 2003 compatibility 2017-05-11 11:56:26 +02:00
Tomas Hozza
0b41c7543a Mention TLSv1_1 and TLSv1_2 as secure-protocol values in help
* src/main.c: The --secure-protocol option accepts also values TLSv1_1
and TLSv1_2, as mentioned in the man page. However the help message
doesn't mention these two values. This patch adds TLSv1_1 and TLSv1_2 as
possible values to the help message.

Signed-off-by: Tomas Hozza <thozza@redhat.com>
2017-05-04 14:51:54 +02:00
Tim Rühsen
c4a2b2e77e * src/http.c (gethttp): Support Wayback Machine's X-Archive-Orig-last-modified 2017-05-03 16:37:11 +02:00
Tim Rühsen
56c78c4b09 * src/utils.c: Remove non-portable __builtin_unreachable() 2017-04-18 13:22:25 +02:00
Tim Rühsen
0ec46cb109 Skip iconv() usage if HAVE_ICONV is not defined
This helps on broken iconv implementations, e.g. Solaris.

Reported-by: Mojca Miklavec
2017-04-18 13:17:19 +02:00
Tim Rühsen
92bfe2a2e4 Fix charset transcoding issue for non-reversible codepoints
* src/url.c: Check iconv() against 0, not -1

On some libiconv implementations, unknown codepoints become
encoded as ?, e.g. when converting a non-ascii codepoint to ASCII.
This results in ambigious file names which also fails our tests.
2017-04-16 19:55:14 +02:00
Tim Rühsen
fc2f4233ed * src/iri.c: Fix WIN32 idn2_free, forgotten code 2017-04-16 19:50:10 +02:00
Darshit Shah
b2c38d33e1 * src/init.c: Set flstats correctly when using WGETRC env var 2017-04-14 01:16:49 +02:00
Tim Rühsen
6ef493b19e Fix use of idn2_free()
* src/connect.c (connect_to_ip): Use xfree() instead of idn2_free()
* src/host.c (lookup_host): Use xfree() instead of idn2_free()
* src/iri.h: Do not include idn2.h
* src/url.c (url_free): Use xfree() instead of idn2_free()
* src/url.h (struct url): Remove 'idn_allocated' from struct

Reported-by: Gisle Vanem
2017-04-08 11:05:55 +02:00
klemens
f381831d88 Fix typos in comments 2017-04-01 19:38:09 +02:00
Tim Rühsen
02d40a4676 * src/metalink.c (retrieve_from_metalink): Fix len in memset() 2017-03-31 13:15:27 +02:00
Vijo Cherian
400b8eba6c Safeguards against TOCTTOU
* src/utils.h: Add struct file_stat_s declaration,
  change prototypes of file_exists_p(),
  add prototypes for fopen_stat() and open_stat().
* src/utils.c: Extend file_exists_p(),
  new function fopen_stat() and open_stat(),
  add new param for file_exists_p().
* src/init.h: Add param file_stats_t to run_wgetrc().
* src/ftp.c: Amend calls to extended functions.
* src/hsts.c: Likewise.
* src/http.c: Likewise.
* src/init.c: Likewise.
* src/main.c: Likewise.
* src/metalink.c: Likewise.
* src/retr.c: Likewise.
* src/url.c: Likewise.

Added fopen_stat() and open_stat() that checks to makes sure the file didn't
change underneath us.
Return error from file_exists_p().
Added a way to return error from this file without major surgery to the
callers.

Fixes: #20369
2017-03-24 09:39:09 +01:00
Christof Horschitz
1d71645c06 * src/warc.c (warc_write_cdx_record): Escape URLs 2017-03-22 15:01:04 +01:00
Mike Frysinger
e249844143 Include libunistring headers only when used
* src/iri.c: Check for libidn2 < 0.14 to include libunistring headers

The unistring functions are used only when an older version of libidn2
is used, so don't include its headers either w/newer libdin2 versions.
2017-03-20 09:39:20 +01:00
Tim Rühsen
84a93f4127 Fix links to www.robotstxt.org
* NEWS: Fix links
* doc/wget.texi: Likewise
* src/res.c: Likewise

Reported-by: Noël Köthe
2017-03-18 19:05:38 +01:00
Tim Rühsen
90b487369a Include <arpa/inet.h> for Windows
Reported-by: Gisle Vanem
2017-03-08 13:00:54 +01:00
Tim Rühsen
57d748117f Fix updating HSTS entries
* src/hsts.c (hsts_store_entry): Always update 'created' field

Fixes: #50490
Reported-by: Deian Stefan, Atyansh Jaiswal, Jonathan Luck
2017-03-08 10:56:12 +01:00
Tim Rühsen
4d729e322f Fix CRLF injection in Wget host part
* src/url.c (url_parse): Reject control characters in host part of URL

Reported-by: Orange Tsai
2017-03-06 10:04:22 +01:00
Benjamin Esham
63c2aea255 * src/warc.c: Use warc_write_header_uri for all WARC-Target-URI fields
The WARC spec requires that all URIs be enclosed in angle brackets. This
was being done in most cases, but not for "WARC-Target-URI" fields in
WARC blocks of type "response", "resource", "revisit", and "metadata".
2017-03-04 12:51:39 +01:00
Tim Rühsen
ac4fed3220 Fix 504 status handling
* src/http.c (gethttp): Move 504 handling to correct place.
  (http_loop): Fix memeory leak.
* testenv/server/http/http_server.py: Add Content-Length header on non-2xx
  status codes with a body

Reported-by: Adam Sampson
2017-02-16 15:53:56 +01:00
YX Hao
cf5df5593d * src/url.c (url_file_name): Do not charset convert local directory
In a non-ASCII environment, the local path may contain non-ASCII
characters. The server responded file name must be converted before
it is concatenated to the local path. Conversion after concatenation
may result in 'iconv' errors.
2017-02-16 12:52:16 +01:00
Tim Rühsen
ac9be9b756 * src/main.c: Remove double 'verbose' option
Fixes: #50290
2017-02-12 21:18:23 +01:00
Tim Rühsen
aebd49d9d4 isrc/http.c (check_retry_on_http_error): Fix gcc warning 2017-02-11 11:54:21 +01:00
Tom Szilagyi
d6d00006a0 Add support for --retry-on-http-error
* doc/wget.text: Add documentation
* src/http.c: Add function check_retry_on_http_error ()
* src/init.c: Add opt.retry_on_http_error
* src/main.c: Add struct for retry-on-http-error to option_data[]
* src/options.h: Add retry_on_http_error to struct options
2017-02-11 11:50:24 +01:00
Tim Rühsen
05acf5d3f6 Revert "Add support for --retry-on-http-error"
This reverts commit 977276374d.
2017-02-11 11:45:11 +01:00
Tim Rühsen
80c62c238e Change libtool library deps to non-libtool deps
Reported-by: Yousong Zhou
Fixes: #50260
2017-02-10 17:20:42 +01:00
Tom Szilagyi
977276374d Add support for --retry-on-http-error
* doc/wget.texi: Add description for --retry-on-http-error
* src/http.c (gethttp):
Consider given HTTP response codes as non-fatal, transient errors.
Supply a comma-separated list of 3-digit HTTP response codes as
argument. Useful to work around special circumstances where retries
are required, but the server responds with an error code normally not
retried by Wget. Such errors might be 503 (Service Unavailable) and
429 (Too Many Requests). Retries enabled by this option are performed
subject to the normal retry timing and retry count limitations of
Wget.

Using this option is intended to support special use cases only and is
generally not recommended, as it can force retries even in cases where
the server is actually trying to decrease its load. Please use it
wisely and only if you know what you are doing.

Example use and a starting point for manual testing:
  wget --retry-on-http-error=429,503 http://httpbin.org/status/503
2017-02-09 21:17:20 +01:00
Tim Rühsen
d061e553a1 * src/http.c (initialize_request): Fix regression in .netrc auth
Reported-by: Axel Reinhold
2017-02-06 21:44:18 +01:00
Tim Rühsen
2ddd2b69e4 * src/iri.c (idn_encode): Fix memory leak 2017-02-06 21:39:44 +01:00
Tim Rühsen
31ac36e170 Fix include/define clash with gnulib's unlink module
* src/options.h: Rename options.unlink to options.unlink_requested
* src/init.c: Replace options unlink member by unlink_requested
* src/http.c: Likewise
* src/ftp.c: Likewise
2017-02-04 18:02:54 +01:00
Tim Rühsen
f2c4289557 * src/xattr.h: Fix #define fsetxattr for MacOS and FreeBSD
Reported-by: Zhiming Wang
2017-02-04 15:29:44 +01:00
Tim Rühsen
366d82f349 * src/utils.c: Move macro FMT_MAX_LENGTH into scope 2017-02-03 12:35:49 +01:00
Tim Rühsen
f2574e90b7 * src/utils.c: Fix -Wformat= warnings 2017-02-03 12:33:38 +01:00
Tim Rühsen
81b3aaf75c * src/gnutls.c: Fix -Wformat= warnings 2017-02-03 12:31:51 +01:00
Tim Rühsen
17d2f42a3d * src/iri.c: Remove unused macro IDNA_FLAGS 2017-02-03 12:28:37 +01:00
Tim Rühsen
638df40476 * src/iri.c: Remove use of __func__ macros 2017-02-03 12:28:05 +01:00
Tim Rühsen
00bafe72f1 * src/http.c: Fix -Wformat= warnings 2017-02-03 12:24:41 +01:00
Tim Rühsen
e777c01f43 * src/progress.c: Remove unused macro move_to_end 2017-02-03 12:18:14 +01:00
Tim Rühsen
3ba112ea57 * src/html-parse.c: Remove unused macro SKIP_NON_WS 2017-02-03 12:15:24 +01:00
Tim Rühsen
485fcfcc20 * src/hsts.c: Remove unused macro CHECK_EXPLICIT_PORT 2017-02-03 12:09:18 +01:00
Tim Rühsen
a5094731cd * src/hsts.c: Fix -Wformat= warnings 2017-02-03 12:08:08 +01:00
Tim Rühsen
3186eb2976 * src/hash.c: Explicitly convert float to int 2017-02-03 12:03:50 +01:00
Tim Rühsen
9947663af8 * src/ftp-ls.c: Fix -Wformat= warnings 2017-02-03 11:59:33 +01:00
Tim Rühsen
cfae085665 * src/ftp.c (ftp_retrieve_list): Add default to switch 2017-02-03 11:57:02 +01:00
Tim Rühsen
e69808256b * src/css-url.h: Remove redundant declaration 2017-02-03 11:53:28 +01:00
Tim Rühsen
11989ef669 * src/ftp.c: Fix -Wformat= warning 2017-02-03 11:52:08 +01:00
Tim Rühsen
5fceab6cb9 * src/http.c (test_parse_range_header): Fix constants 2017-02-03 10:32:42 +01:00
Tim Rühsen
d0e02a54ae * src/url.c (mkalldirs): Add newline to log message 2017-02-02 11:11:50 +01:00
Tim Rühsen
2e70409844 * src/cookies.c (check_domain_match): Add newline to DEBUG lines 2017-02-02 11:11:07 +01:00
Tim Rühsen
6de24fe3c0 * src/iri.c: Use TR46 non-transitional for toASCII conversion 2017-01-13 15:53:03 +01:00
Tim Rühsen
0ab3d92c85 * src/main.c: Fix _Noreturn compiler warnings 2017-01-13 15:50:19 +01:00
Tim Rühsen
4cf8af84e0 * src/utils.c: Fix _Noreturn compiler warning 2017-01-13 15:49:05 +01:00
Tim Rühsen
42b8761cbc * src/init.c (setval_internal): Fix sign compare warning 2017-01-13 15:47:02 +01:00
Tim Rühsen
fd0f759597 Replace home-grown portability code by gnulib modules
* bootstrap.conf: Add intprops, inttypes, limits-h, signal-h,
  stat, sys_types
* src/ftp.c: Replace 'struct_stat' by 'struct stat'
* src/hsts.c: Likewise
* src/http.c: Likewise
* src/main.c: Likewise
* src/netrc.c: Likewise
* src/retr.c: Likewise
* src/url.c: Likewise
* src/utils.c: Likewise
* src/sysdep.h: Remove old portability code

Further portability issues should be addressed by gnulib.
2017-01-13 15:38:15 +01:00
Tim Rühsen
a384f5e2e9 Replace WGET_* m4 macros by gnulib modules
* bootstrap.conf: Add hostent, inet_ntop, nanosleep, utimens
* configure.ac: Remove WGET_STRUCT_UTIMBUF, WGET_FNMATCH,
  WGET_NANOSLEEP, WGET_POSIX_CLOCK, WGET_NSL_SOCKET
* m4/wget.m4: Likewise
* src/Makefile.am: Add $(LIB_NANOSLEEP) $(LIB_POSIX_SPAWN) to LDADD
* tests/Makefile.am: Likewise
* src/host.c (print_address): Use inet_ntop also for IPV4
2017-01-13 12:54:35 +01:00
Tim Rühsen
5ae1f37902 Remove libidn vulnerability work-around
* src/iri.c (_utf8_is_valid): Removed

Since we are using libidn2 for IDNs, we no longer need
this work-around.
2017-01-13 12:03:47 +01:00
Tim Rühsen
0eb4a21b6c * src/iri.c (idn_encode): Use TR46 transitional if available 2017-01-13 11:45:18 +01:00
Tim Rühsen
1a01a6b2d0 Fix previous commit 2427ca4ac0 2017-01-07 15:59:11 +01:00
vijeth-aradhya
2427ca4ac0 Fix http.c and ftp.c passwd logic error
* src/ftp.c (getftp): Fix password/user selection
* src/http.c (initialize_request): Likewise

Before, netrc password won over interactive
--ask-password but now --ask-password wins
after change of program logic

Fixes Issue #48811
2017-01-06 16:18:40 +01:00
Giuseppe Scrivano
42c2ce71bc * src/main.c (main): Add missing \n in error message 2016-12-31 17:11:01 +01:00
Giuseppe Scrivano
def133f26f Check that fd_set has not fds bigger than FD_SETSIZE
* src/connect.c: check that the fd is not bigger than FD_SETSIZE
  before using FD_SET.  An fd_set cannot hold fds bigger than
  FD_SETSIZE, causing out-of-bounds write to a buffer on the stack.

Reported by: Jann Horn <jannh@google.com>
2016-12-28 12:24:19 +01:00
Nikos Mavrogiannopoulos
b9ed06afd8 Avoid calling the gnutls priority functions multiple times
* src/gnutls.c (ssl_connect_wget): Call gnutls_set_default_priority()
  for --secure-protocol=auto (default).

The patch fixes a behavior that may have unintended side-effects in
certain gnutls versions. Instead use the default priorities when no
options are given.

Signed-off-by: Nikos Mavrogiannopoulos <nmav@gnutls.org>
2016-12-20 14:48:31 +01:00
Tim Rühsen
1bdc20d774 Print debug message when skipping certain recursive downloads
* src/recur.c (retrieve_tree): Print debug message instead silently
  skipping recursive downloads.
2016-12-19 12:19:52 +01:00
Rahul Bedarkar
e4e9d3c1c8 Rename base64_{encode,decode} (trivial patch)
* src/http-ntlm.c: Rename base64_{encode,decode}
* src/http.c: Likewise
* src/utils.c: Likewise
* src/utils.h: Likewise

When statically linking with gnutls, we get definition clash error for
base64_encode which is also defined by gnutls.

To prevent definition clash, rename base64_{encode,decode}

Signed-off-by: Rahul Bedarkar <rahul.bedarkar@imgtec.com>
2016-12-14 15:52:52 +01:00
Tim Rühsen
dcdd618b18 Add support for psl_latest()
* configure.ac: Add check for psl_latest(),
  remove --with-psl-file
* src/cookies.c (check_domain_match): Use psl_latest() if available
2016-12-11 21:04:40 +01:00
Piotr Wajda
3c796b9a85 Respect -o parameter again
* log.c: don't choose log output dynamically when opt.lfilename is set

 Regression introduced by dd5c549f6a
 Reported-by: Dale R. Worley
2016-11-09 13:32:14 +01:00
Tim Rühsen
00ae9b4ee2 Move Wget from IDN2003 (libidn) to IDN2008 (libidn2)
* .travis.yml: Install libidn2-dev instead libidn11-dev.
* bootstrap.conf: Add modules libunistring-optional, unistr/base,
  unicase/tolower.
* configure.ac: Check for libidn2.
* src/Makefile.am: Add $(LTLIBUNISTRING) to LDADD.
* tests/Makefile.am: Set LDADD similar to LDADD in src/Makefile.am
* src/connect.c: Use libidn2 code instead of libidn.
* src/host.c: Likewise.
* src/iri.c: Likewise.
* src/iri.h: Likewise.
* src/options.h: Likewise.
* src/url.c: Likewise.
* src/url.h: Likewise.
* src/log.c: Fix C99 comment.

IDN2003 should not be used any more due to security concerns.
We use libunistring (resp. the unicode code from gnulib) for
lowercasing UTF-8 before we give data to libidn2.
TR#46 is missing, no support in libidn2 nor in libunistring.
2016-11-07 11:03:42 +01:00
Tim Rühsen
be5517f98f * src/metalink.c: Fix typo 'suceeded' -> 'succeeded'
Reported-by: Göran Uddeborg <goeran@uddeborg.se>,
             Anders Jonsson <anders.jonsson@norsjovallen.se>
2016-10-22 22:21:39 +02:00
losgrandes
dd5c549f6a Fixes #45790: wget prints it's progress even when background
* src/log.c: Use tcgetpgrp(STDIN_FILENO) != getpgrp() to determine when to print to STD* or logfile.
  Deprecate log_request_redirect_output function.
  Use different file handles for STD* and logfile, to easily switch between them when changing fg/bg.
* src/log.h: Make redirect_output function externally linked.
* src/main.c: Don't use deprecated log_request_redirect_output function. Use redirect_output instead.
* src/mswindows.c: Don't use deprecated log_request_redirect_output function. Use redirect_output instead.
2016-10-21 19:33:29 +02:00
losgrandes
78e0ec5f03 Fixes #46584: wget --spider always returns zero exit status
* src/ftp.c: Return error as exit value if even one file doesn't exist
2016-10-21 10:24:28 +02:00
Tim Rühsen
807d1c7d94 * src/http.c (gethttp): Accept 206 for request w/o Range header
Fixes: #49319
2016-10-12 14:59:36 +02:00
Tim Rühsen
517d799b6f Properly include iconv.h
* src/iri.c: Check HAVE_ICONV to include iconv.h
* src/url.c: Same
2016-10-07 13:24:15 +02:00
Tim Rühsen
e5164a8260 Amend redirection behavior
* src/recur.c (descend_redirect): Ignore WG_RR_LIST and WG_RR_REGEX
  for redirections.
* testenv/Makefile.am: Add Test-recursive-redirect.py
* testenv/Test-recursive-redirect.py: New test

Test-recursive-redirect.py written by Dale R. Worley.

Reported-by: "Dale R. Worley" <worley@ariadne.com>
2016-10-07 11:49:07 +02:00
Matthew White
c403e67935 New: --metalink-over-http Content-Type/Disposition Metalink/XML processing
* src/http.c (metalink_from_http): Process the Content-Type header.
  Add an application/metalink4+xml URL as metalink metaurl.  If the
  option opt.content_disposition is true, the Content-Disposition's
  filename is the metaurl's name
* doc/wget.texi: Update --content-disposition and --metalink-over-http
* doc/metalink-standard.txt: Update doc. Content-Type/Disposition
  processing through --metalink-over-http. Update download naming
  system about --trust-server-names and --content-disposition
* testenv/Makefile.am: Add new files
* testenv/Test-metalink-http-xml-type.py: New file. Metalink/HTTP
  Content-Type/Disposition header automated Metalink/XML tests
* testenv/Test-metalink-http-xml-type-trust.py: New file. Metalink/HTTP
  Content-Type/Disposition header with --trust-server-names automated
  Metalink/XML tests
* testenv/Test-metalink-http-xml-type-content.py: New file. Metalink/HTTP
  Content-Type/Disposition header with --content-disposition automated
  Metalink/XML tests
* testenv/Test-metalink-http-xml-type-trust-content.py: New file.
  Metalink/HTTP Content-Type/Disposition header with --trust-server-names
  and --content-disposition automated Metalink/XML tests

Process the Content-Type header, identify an application/metalink4+xml
file.  The Content-Disposition could provide an alternate name through
the "filename" field for the metalink xml file.  Respectively, the cli
options --metalink-over-http and --content-disposition are required.

When Metalink/XML auto-processing, to use the Content-Disposition's
filename, the cli option --trust-server-names is also required.
2016-09-30 19:44:06 +02:00
Matthew White
3021466817 Bugfix: Set NULL variable due to --content-disposition to Metalink origin
* src/http.c (http_loop): Prevent SIGSEGV when hstat.local_file is
  NULL, opt.content_disposition has a role in leaving the value unset
* src/http.c (gethttp): If hs->local_file is NULL (aka http_loop()'s
  hstat.local_file), set it to the value of hs->metalink->origin
2016-09-30 19:44:06 +02:00
Matthew White
c89767d8d1 New: --trust-server-names saves Metalink/HTTP xml files using the "name" field
* src/metalink.c (retrieve_from_metalink): If opt.trustservernames is
  true, use the basename of the metaurl's name to save the xml file
* doc/metalink-standard.txt: Update doc. With --trust-server-names any
  Metalink/HTTP Link application/metalink4+xml file is saved using the
  basename of the "name" field, if any. Update Metalink/HTTP examples
* testenv/Makefile.am: Add new file
* testenv/Test-metalink-http-xml-trust-name.py: New file. Metalink/HTTP
  automated Metalink/XML, save xml files using the "name" field tests
2016-09-30 19:44:06 +02:00
Matthew White
f030cdf8e2 Bugfix: Detect when a metalink:file doesn't have any hash
* src/metalink.c (retrieve_from_metalink): Reject any metalink:file
  without hashes. Prompt the error and switch to the next file
* testenv/Makefile.am: Add new file
* testenv/Test-metalink-xml-nohash.py: New file. Metalink/XML with no
  hashes tests

Prevent SIGSEGV.
2016-09-30 19:44:06 +02:00
Matthew White
5dccb2a9ce Bugfix: Detect malformed base64 Metalink/HTTP Digest header
* src/http.c (metalink_from_http): Fix hash_bin_len type. Use ssize_t
  instead than size_t. Reject -1 as base64_decode() return value
* testenv/Makefile.am: Add new file
* testenv/Test-metalink-http-baddigest.py: New file. Metalink/HTTP
  malformed base64 Digest header tests

On malformed base64 input, ssize_t base64_decode() returns -1. Such
value is too big for a size_t variable, and used as xmalloc() value
will exaust all the memory.
2016-09-30 19:44:06 +02:00
Matthew White
0538e791fb New option --metalink-index to process Metalink application/metalink4+xml
* NEWS: Mention the effect of --metalink-index over Metalink
* src/init.c: Add new option metalinkindex (opt.metalink_index),
  initialize to -1
* src/main.c: Add new option metalink-index (--metalink-index=NUMBER)
* src/options.h: Add new option metalink_index (int)
* src/metalink.h: Add declaration of functions fetch_metalink_file(),
  replace_metalink_basename()
* src/metalink.c: Add functions fetch_metalink_file() simple file
  fetch, replace_metalink_basename() replace file basename
* src/metalink.c (retrieve_from_metalink): New. Process Metalink
  application/metalink4+xml of opt.metalink_index ordinal number
* doc/wget.texi: Add new option metalink-index (--metalink-index)
  documentation
* doc/metalink-standard.txt: Updated doc. Add documentation about
  Metalink application/metalink4+xml metaurls download naming system
* doc/metalink-standard.txt: Update Metalink/XML and HTTP examples
* testenv/Makefile.am: Add new files
* testenv/Test-metalink-http-xml.py: New file. Metalink/HTTP automated
  Metalink/XML "application/metalink4+xml" --metalink-index tests
* testenv/Test-metalink-http-xml-trust.py: New file. Metalink/HTTP
  automated Metalink/XML "application/metalink4+xml" --metalink-index
  retrieval with --trust-server-names tests

WARNING: Do not use lib/dirname.c (dir_name) to get the directory
name, it may append a dot '.' character to the directory name.
2016-09-30 19:44:06 +02:00
Matthew White
acb1d1a668 Bugfix: Prevent sorting when there are less than two elements
* src/utils.c (stable_sort): Add condition nmemb > 1, sort only when
  there is more than one element

Prevent SIGSEGV.
2016-09-30 19:44:06 +02:00
Matthew White
628fb565c7 New: Parse Metalink/HTTP header for application/metalink4+xml
* src/http.c (metalink_from_http): Parse Metalink/HTTP header for
  metaurls application/metalink4+xml media types
* src/metalink.h: Add function declaration metalink_meta_cmp()
* src/metalink.c: Add function metalink_meta_cmp() compare metalink
  metaurls priorities

Add Metalink/HTTP application/metalink4+xml media types as metaurls to
the metalink variable that will be used to download the files.
2016-09-30 19:44:05 +02:00
Matthew White
9532861aef Bugfix: Remove surrounding quotes from Metalink/HTTP key's value
* src/metalink.h: Add declaration of function dequote_metalink_string()
* src/metalink.c: Add function dequote_metalink_string() remove
  surrounding quotes from string, \' or \"
* src/metalink.c (find_key_value, find_key_values): Call dequote_metalink_string()
  to remove the surrounding quotes from the parsed value
* src/metalink.c (test_find_key_value, test_find_key_values): Add
  quoted key's values for unit-tests
* testenv/Makefile.am: Add new file
* testenv/Test-metalink-http-quoted.py: New file. Metalink/HTTP quoted
  values tests

Some Metalink/HTTP keys, like "type" [2], may have a quoted value [1]:
Link: <http://example.com/example.ext.meta4>; rel=describedby;
type="application/metalink4+xml"

Wget was expecting a dequoted value from the Metalink module. This
patch addresses this problem.

References:
 [1] Metalink/HTTP: Mirrors and Hashes
     1.1. Example Metalink Server Response
     https://tools.ietf.org/html/rfc6249#section-1.1

 [2] Additional Link Relations
     6. "type"
     https://tools.ietf.org/html/rfc6903#section-6
2016-09-30 19:44:05 +02:00
Matthew White
8aca8fc80d Bugfix: Process Metalink/XML url strings containing white spaces and CRLF
* src/metalink.h: Add declaration of function clean_metalink_string()
* src/metalink.c: Add directive #include "xmemdup0.h"
* src/metalink.c: Add function clean_metalink_string() remove leading
  and trailing white spaces and CRLF from string
* src/metalink.c (retrieve_from_metalink): Remove leading and trailing
  white spaces and CRLF from url resource mres->url
* testenv/Makefile.am: Add new file
* testenv/Test-metalink-xml-urlbreak.py: New test. Metalink/XML white
  spaces and CRLF in url resources tests

White spaces and CRLF are not automatically removed by libmetalink
from url strings. The Wget's Metalink module was unable to process
such url strings. This patch implements the processing of such url
strings cleaning off leading and trailing white spaces and CRLF.

If a parsed Metalink/XML url string contains strings separated by
CRLF, only the first of the series is accepted.
2016-09-30 19:44:05 +02:00
Matthew White
70360b3eab New: Metalink file size mismatch returns error code METALINK_SIZE_ERROR
* src/wget.h (uerr_t): Add error code METALINK_SIZE_ERROR to enum
* src/metalink.c (retrieve_from_metalink): Use boolean variable
  size_ok, when false set retr_err to METALINK_SIZE_ERROR
* testenv/Makefile.am: Add new file
* testenv/Test-metalink-xml-size.py: New file. Metalink/XML file size
  tests (<size></size>)

Before this patch, no appropriate error code was returned to inform a
file size mismatch.

This patch introduces the error code METALINK_SIZE_ERROR to inform a
file size mismatch.
2016-09-30 19:44:05 +02:00
Matthew White
c29983a044 New: Metalink/XML and Metalink/HTTP file naming safety rules
* NEWS: Mention the effect of --trust-server-names over Metalink
* src/metalink.h: Add declaration of function append_suffix_number()
* src/metalink.c: Add function append_suffix_number() append number to
  string
* src/metalink.c (retrieve_from_metalink): Safer Metalink/XML and
  Metalink/HTTP download naming system, opt.trustservernames based
* doc/metalink-standard.txt: Update doc. Explain new Metalink/XML and
  Metalin/HTTP download naming system and --trust-server-names role
* testenv/Makefile.am: Add new files
* testenv/Test-metalink-xml-continue.py: Update test. Metalink/XML
  continue/keep existing files (HTTP 416) with --continue tests
* testenv/Test-metalink-xml.py: Update test. Metalink/XML naming tests
* testenv/Test-metalink-xml-trust.py: New file. Metalink/XML naming
  tests with --trust-server-names
* testenv/Test-metalink-xml-abspath.py: Update test. Metalink/XML
  absolute path tests
* testenv/Test-metalink-xml-abspath-trust.py: New file. Metalink/XML
  absolute path tests with --trust-server-names
* testenv/Test-metalink-xml-relpath.py: Update test. Metalink/XML
  relative path tests
* testenv/Test-metalink-xml-relpath-trust.py: New file. Metalink/XML
  relative path tests with --trust-server-names
* testenv/Test-metalink-xml-homepath.py: Update test. Metalink/XML
  home path and ~ (tilde) tests
* testenv/Test-metalink-xml-homepath-trust.py: New file. Metalink/XML
  home path and ~ (tilde) tests with --trust-server-names
* testenv/Test-metalink-xml-prefix.py: New file. Metalink/XML naming
  tests with --directory-prefix
* testenv/Test-metalink-xml-prefix-trust.py: New file. Metalink/XML
  naming tests with --directory-prefix and --trust-server-names
* testenv/Test-metalink-xml-absprefix.py: New file. Metalink/XML
  absolute --directory-prefix tests
* testenv/Test-metalink-xml-absprefix-trust.py: New file. Metalink/XML
  absolute --directory-prefix tests with --trust-server-names
* testenv/Test-metalink-xml-relprefix.py: New file. Metalink/XML
  relative --directory-prefix tests
* testenv/Test-metalink-xml-relprefix-trust.py: New file. Metalink/XML
  relative --directory-prefix tests with --trust-server-names
* testenv/Test-metalink-xml-homeprefix.py: New file. Metalink/XML home
  --directory-prefix tests
* testenv/Test-metalink-xml-homeprefix-trust.py: New file. Metalink/XML
  home --directory-prefix tests with --trust-server-names

The option --trust-server-names allows to use the file names parsed
from a Metalink/XML file.  Without --trust-server-names, the safety
mechanism provides secure and predictable file names.
2016-09-30 19:44:05 +02:00
Matthew White
43ec7008f2 Enforce Metalink file name verification, strip directory if necessary
* NEWS: Mention the use of a safe Metalink destination path
* src/metalink.h: Add declaration of functions get_metalink_basename(),
  last_component(), metalink_check_safe_path()
* src/metalink.c: Add directive #include "dosname.h"
* src/metalink.c: Add function get_metalink_basename() to return the
  basename of a file name, strip w32's drive letter prefixes
* src/metalink.c (retrieve_from_metalink): Enforce Metalink file name
  verification, if the file name is unsafe try its basename
* doc/metalink.txt: Update document. Explain --directory-prefix

The function get_metalink_basename() uses FILE_SYSTEM_PREFIX_LEN to
catch any 'C:D:file' (w32 environment), then it removes each drive
letter prefix, i.e. 'C:' and 'D:'.

Unsafe file names contain an absolute, relative, or home path.  Safe
paths can be verified by libmetalink's metalink_check_safe_path().
2016-09-30 19:44:03 +02:00
Matthew White
7d4942864b Implement Metalink/XML --directory-prefix option in Metalink module
* NEWS: Mention the effect of --directory-prefix over Metalink
* src/metalink.c (retrieve_from_metalink): Add opt.dir_prefix as
  prefix to the metalink:file name mfile->name
* doc/metalink.txt: Update document. Explain --directory-prefix

When --directory-prefix=<prefix> is used, set the top of the retrieval
tree to prefix. The default is . (the current directory). Metalink/XML
and Metalink/HTTP files will be downloaded under prefix.
2016-09-27 20:29:03 +02:00
Matthew White
666b7862bf Change mfile->name to filename in Metalink module's messages
* src/metalink.c (retrieve_from_metalink): Change mfile->name to
  filename when referring to the downloaded file

The file name could have been changed by unique_create() (or by any
other mean) before downloading. Use the name of the downloaded file
(filename) when printing output which refer to it.
2016-09-27 20:29:03 +02:00
Matthew White
f3f349a0cf Add file size computation in Metalink module
* NEWS: Mention Metalink's file size verification
* src/metalink.c (retrieve_from_metalink): Add file size computation
* doc/metalink.txt: Update document. Remove resolved bugs

Reject downloaded files when they do not agree with their Metalink/XML
metalink:size: https://tools.ietf.org/html/rfc5854#section-4.2.14

At the moment of writing, Metalink/HTTP headers do not provide a file
size field. This information could be obtained from the Content-Length
header field: https://tools.ietf.org/html/rfc6249#section-7
2016-09-27 20:29:03 +02:00
Matthew White
ff444ebc2a Bugfix: Keep the download progress when alternating metalink:url
* NEWS: Mention the effects of --continue over Metalink
* src/metalink.c (retrieve_from_metalink): On download error, resume
  output_stream with the next mres->url. Keep fully downloaded files
  started with --continue, otherwise rename/remove the file
* testenv/Makefile.am: Add new file
* testenv/Test-metalink-xml-continue.py: New file. Metalink/XML
  continue/keep existing files (HTTP 416) with --continue tests

Before this patch, with --continue, existing and/or fully retrieved
files which fail the sanity tests were renamed (--keep-badhash), or
removed.

This patch ensures that --continue doesn't rename/remove existing
and/or fully retrieved files (HTTP 416) which fail the sanity tests.
2016-09-27 20:28:50 +02:00
Matthew White
96554861f9 Bugfix: Fix NULL filename and output_stream in Metalink module
* NEWS: Mention the Metalink "path/file" name format handling
* src/metalink.c (retrieve_from_metalink): Fix NULL filename, set
  filename to the right "path/file" value
* src/metalink.c (retrieve_from_metalink): Fix NULL output_stream, set
  output_stream to filename when it is created by retrieve_url()
* src/metalink.c (retrieve_from_metalink): Add RFC5854 comments about
  proper metalink:file "path/file" name format handling
* doc/metalink.txt: Update document. Remove resolved bugs

If unique_create() cannot create/open the destination file, filename
and output_stream remain NULL. If fopen() is used instead, filename
always remains NULL. Both functions cannot create "path/file" trees.

Setting filename to the right value is sufficient to prevent SIGSEGV
generating from testing a NULL value. This also allows retrieve_url()
to create a "path/file" tree through opt.output_document.

Reading NULL as output_stream, when it shall not be, leads to wrong
results. For instance, a non-NULL output_stream tells when a stream
was interrupted, reading NULL instead means to assume the contrary.

This patch conforms to the RFC5854 specification:
  The Metalink Download Description Format
  4.1.2.1.  The "name" Attribute
  https://tools.ietf.org/html/rfc5854#section-4.1.2.1
2016-09-27 20:17:08 +02:00
Tim Rühsen
a2c4849900 Fix crash on 'srcset' inline URIs
* src/html-url.c (tag_handle_img): Check append_url() for NULL
  return value before dereference.

Crashed reproducable with parsing srcset="data:..." inline data.
Reported-by: Coverity
2016-09-09 11:44:02 +02:00
Tim Rühsen
40870e1271 * src/hsts.c (hsts_store_open): NULL check param for fclose().
Reported-by: Coverity
2016-09-09 10:22:58 +02:00
Tim Rühsen
15c1e0eb7b * src/ftp-ls.c (ftp_parse_winnt_ls): Fix memset params 2016-09-09 10:22:58 +02:00
Tim Rühsen
eba724a128 * src/utils.c (stable_sort): Use xmalloc instead of malloc 2016-09-09 10:22:58 +02:00
Tim Rühsen
66a9883c8f * src/ftp-ls.c (ftp_parse_winnt_ls): Initialize struct fileinfo cur
Reported-by: Coverity
2016-09-08 16:51:15 +02:00
Tim Rühsen
4febe72bd2 Add const to url param of some functions
* src/http.c: Add const to first param of initialize_request(),
  initialize_proxy_configuration(), establish_connection(),
  check_file_output(), check_auth(), gethttp(), http_loop().
* src/http.h: Add const to first param of http_loop().
2016-09-08 16:13:54 +02:00
Tim Rühsen
03da900c5b * src/recur.c (retrieve_tree): Fix possible NULL dereference
Reported-by: Coverity
2016-09-08 13:04:37 +02:00
Tim Rühsen
b7b67e23cd * src/http.c (initialize_request): Fix check for user
Reported-by: Coverity
2016-09-08 12:48:32 +02:00
Tim Rühsen
22aed3ed4b * src/retr.c (retrieve_url): NULL check mynewloc
Reported-by: Coverity
2016-09-08 12:46:25 +02:00
Tim Rühsen
b4465afa8a * src/utils.c (stable_sort): Reduce tmp allocation size
Reported-by: Coverity
2016-09-08 12:44:17 +02:00
Tim Rühsen
a78b83b1e9 Fix some issues detected by Coverity
* src/connect.c (connect_to_ip): Check return value of setsockopt.
* src/ftp.c (ftp_retrieve_list): Check return value of chmod.
* src/http.c (digest_authentication_encode): Cleanup code.
* src/init.c (setval_internal): Explicitely check comind range.
* src/main.c (main): Explicitely check optarg.
* src/retr.c (retr_rate): Use snprintf instead sprintf,
  (retrieve_from_file): More verbose error message,
  (rotate_backups): Use snprintf instead sprintf, check return
  value of rename().
* src/url.c (mkalldirs): Check return value of unlink().
* src/utils.c (strdupdelim): Explicitely check beg and end for NULL,
  (merge_vecs): Fix sizeof argument to char *,
  (stable_sort): Use malloc instead of alloca.
2016-09-08 10:12:02 +02:00
Tim Rühsen
37a5257c66 Code cleanup for --use-askpass
* bootstrap.conf: Add xmemdup0 and strpbrk.
* src/init.c (cmd_use_askpass): Add 'const' to char *,
  remove check for file existence.
* src/main.c (run_use_askpass): C89 compat init of argv,
  added \n to error messages,
  fixed stripping of \n and \r from input,
  make run_use_askpass and use_askpass static.
2016-09-08 09:07:32 +02:00
Tim Rühsen
49af22ca94 * src/http.c (check_file_output): Replace asprintf by aprint 2016-09-07 09:31:43 +02:00
Liam R. Howlett
21e1725e12 Add --use-askpass=COMMAND support
* doc/wget.texi: Add --use-askpass to documentation.
* src/init.c: Add cmd_use_askpasss to set opt.use_askpass based on
argument, WGET_ASKPASS, and SSH_ASKPASS environment variables.
opt.wget-askpass is freed in cleanup ()
* src/main.c: Update options & add spawn process of opt.use_askpass
command.
* src/options.h: Addition of string use_askpass.
* src/url.c: Function scheme_leading_string to access the leading
string of a parsed url.
* src/url.h: Prototype for scheme_leading_string for returning the
leading string.
* bootstrap.conf: Add posix_spawn to gnulib_modules

This adds the --use-askpass option which is disabled by default.

--use-askpass=COMMAND will request the username and password for a given
URL by executing the external program COMMAND.  If COMMAND is left
blank, then the external program in the environment variable
WGET_ASKPASS will be used.  If WGET_ASKPASS is not set then the
environment variable SSH_ASKPASS is used.  If there is no value set, an
error is returned.  If an error occurs requesting the username or
password, wget will exit.

Signed-off-by: Liam R. Howlett <Liam.Howlett@WindRiver.com>
2016-09-03 21:01:24 +02:00
Giuseppe Scrivano
690c47e3b1 Append .tmp to temporary files
* src/http.c (struct http_stat): Add `temporary` flag.
(check_file_output): Append .tmp to temporary files.
(open_output_stream): Refactor condition to use hs->temporary instead.

Reported-by: "Misra, Deapesh" <dmisra@verisign.com>
Discovered by: Dawid Golunski (http://legalhackers.com)
2016-08-24 12:29:01 +02:00
Tim Rühsen
9ffb64ba6a Limit file mode to u=rw on temp. downloaded files
* bootstrap.conf: Add gnulib modules fopen, open.
* src/http.c (open_output_stream): Limit file mode to u=rw
on temporary downloaded files.

Reported-by: "Misra, Deapesh" <dmisra@verisign.com>
Discovered by: Dawid Golunski (http://legalhackers.com)
2016-08-24 12:28:55 +02:00
Tim Rühsen
0787d7253e * src/css-url.c (get_urls_css): Fix memory leak 2016-08-17 23:13:27 +02:00
Tim Rühsen
964f4646da * src/html-url.c (get_urls_html): Fix memory leak 2016-08-17 23:12:25 +02:00
Tim Rühsen
262baeb113 Improve PSL cookie checking
* configure.ac: Add --with-psl-file to set a PSL file
* src/cookies.c (check_domain_match): Load PSL_FILE with
  fallback to built-in data.

This change allows package maintainers to make Wget use the latest
PSL (DAFSA or plain text), without updating libpsl itself.

E.g. Debian now comes with a DAFSA binary within the 'publicsuffix'
package which allows very fast loading (no parsing or processing needed).
2016-08-17 16:32:26 +02:00
Tobias Stoeckmann
f4aeb41899 Fix stack overflow with way too many cookies
* src/cookies.c (cookie_header): Use heap instead of stack.
* src/http.c (request_send): Likewise.

If wget has to handle an insanely large amount of cookies (~700,000 on
32 bit systems or ~530,000 on 64 bit systems), the stack is not large
enough to hold these pointers, leading to undefined behaviour according
to POSIX; expect a segmentation fault in real life. ;)

Signed-off-by: Tobias Stoeckmann <tobias@stoeckmann.org>
2016-08-10 19:59:25 +02:00
Tobias Stoeckmann
a9d49e5b15 Fix signal race condition
The signal handler for SIGALRM calls longjmp, but the handler is
installed before the jump target has been initialized. If another
process sends SIGALRM right between handler installation and target
initialization, the jump leads to undefined behavior.

This can easily be fixed by moving the signal handler installation
into the "SETJMP == 0" conditional block, which means that the target
has just been initialized.

* src/utils.c: call signal after SETJMP.

Signed-off-by: Tobias Stoeckmann <tobias@stoeckmann.org>
2016-08-09 17:38:29 +02:00
Jeffery To
0fe79eeacb Remove hyphens from command names
* src/init.c: Remove hyphens from command names
* src/main.c: Likewise

Options with hyphens (or underscores) in their command name cannot be
set in a wgetrc file.

Signed-off-by: Jeffery To <jeffery.to@gmail.com>
2016-08-05 09:45:09 +02:00
Tim Rühsen
e3fb4c3859 * src/metalink.c (badhash_suffix): Fix quoting 2016-08-04 13:09:28 +02:00
Matthew White
943a6d585f Add new option --keep-badhash to keep Metalink's files with a bad hash
* src/init.c: Add keepbadhash
* src/main.c: Add keep-badhash
* src/options.h: Add keep_badhash
* doc/wget.texi: Add docs for --keep-badhash
* src/metalink.h: Add prototypes badhash_suffix(), badhash_or_remove()
* src/metalink.c: New functions badhash_suffix(), badhash_or_remove().
  (retrieve_from_metalink): Call badhash_or_remove() on download error

With --keep-badhash, append .badhash to Metalink's files with checksum
mismatch. (retrieve_from_metalink): unique_create() may append another
suffix to avoid overwriting existing files.

Without --keep-badhash, remove downloaded files with checksum mismatch
(this conforms to the old behaviour).
2016-08-04 12:03:49 +02:00
Tim Rühsen
7fad76db4c * src/metalink.c: Remove C++ style comments 2016-08-03 13:48:07 +02:00