Wget Git mirror
Jookia e4db00d74d Add option to write URL rejections to a tab-delimited CSV log.
* main.c: Add "--rejected-log" option.
 * init.c: Add "rejectedlog" command.
 * options.h: Add "rejected_log" parameter string.
 * wget.texi: Add brief documentation on new --rejected-log option.
 * recur.c: Optionally log details of URLs not traversed.
   Add reject_reason enum.
   (download_child_p -> download_child): Return a reject_reason.
   (descend_redirect_p -> descend_redirect): Return a reject_reason.
   (retrieve_tree): Support logging reasons for rejection.
   Add write_reject_log_header that writes a CSV format header to a file.
   Add write_reject_log_url that writes a url struct to a file in CSV format.
   Add write_reject_log_reason that writes the URL and parent URL as well as the
   rejection reason to a CSV file.
 * Test--rejected-log.px: Add a basic test for the --rejected-log command.
 * tests/Makefile.am: Run Test--rejected-log.px.

This lets you find out why URLs are being rejected, along with some
context for each rejection. CSV is used as the output format because it
is easy to parse. It is delimited by tabs instead of commas so that all
(quoted) URL characters can be used, and it includes column names, which
may be used for compatibility.
2015-08-06 08:10:55 +02:00
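
As a rough illustration of why a tab-delimited log with a header row is
convenient to process, the sketch below counts rejections per reason
using Python's standard csv module. The column names (REASON, URL,
PARENT_URL) and the sample rows are hypothetical placeholders, not the
exact header or values that Wget writes; check the first line of a real
log for the actual column names.

```python
import csv
import io
from collections import Counter

# Hypothetical sample log: the column names and rows are illustrative
# only, not the exact header that Wget writes.
SAMPLE_LOG = (
    "REASON\tURL\tPARENT_URL\n"
    "ROBOTS\thttp://example.com/private/\thttp://example.com/\n"
    "DOMAIN\thttp://other.example.org/\thttp://example.com/\n"
    "ROBOTS\thttp://example.com/tmp/\thttp://example.com/\n"
)


def count_reasons(log_file, reason_column="REASON"):
    """Count rejected URLs per reason in a tab-delimited log with a header row."""
    # DictReader takes the column names from the first line of the file.
    reader = csv.DictReader(log_file, delimiter="\t")
    return Counter(row[reason_column] for row in reader)


counts = count_reasons(io.StringIO(SAMPLE_LOG))
for reason, n in counts.most_common():
    print(reason, n)
```

Reading the column names from the header, rather than relying on column
positions, keeps a script like this working if columns are later added
or reordered.
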
build-aux Fix missing extern declaration error for build_info.pl 2014-11-22 17:26:06 +05:30
contrib contrib/check-hard: Indentation and spacing cleanup 2015-06-15 01:35:53 +05:30
doc Add option to write URL rejections to a tab-delimited CSV log. 2015-08-06 08:10:55 +02:00
gnulib@875ec93e15 gnulib: update gnulib 2015-05-23 14:51:52 +02:00
m4 Fix autoconf warning 2012-09-30 17:48:13 +02:00
msdos maint: update copyright year ranges to include 2015 2015-03-09 16:32:01 +01:00
po Metalink support. 2015-07-20 15:30:39 +02:00
src Add option to write URL rejections to a tab-delimited CSV log. 2015-08-06 08:10:55 +02:00
testenv Fix metalink tests 2015-07-20 16:29:05 +02:00
tests Add option to write URL rejections to a tab-delimited CSV log. 2015-08-06 08:10:55 +02:00
util paramcheck: use explicit quoting for here-docs 2015-05-04 10:13:24 +05:30
vms Do not depend on always defined macros 2014-06-12 18:49:15 +02:00
.gitignore Add po/stamp-po to gitignore 2014-11-20 18:57:37 +05:30
.gitmodules gnulib: add as a git submodule 2013-12-22 14:12:05 +01:00
.x-sc_po_check [mq]: cfg-mk 2009-09-21 20:39:44 -07:00
.x-sc_trailing_blank [mq]: cfg-mk 2009-09-21 20:39:44 -07:00
ABOUT-NLS Prepare new release 1.16 2014-10-27 09:56:47 +01:00
AUTHORS Fix typo. 2010-09-07 18:05:19 +02:00
bootstrap Update bootstrap script 2014-11-16 15:00:24 +05:30
bootstrap.conf Metalink support. 2015-07-20 15:30:39 +02:00
cfg.mk * cfg.mk (VC_LIST_ALWAYS_EXCLUDE_REGEX): Add ChangeLog-2014-12-10. 2015-01-31 00:30:26 +01:00
ChangeLog merge ChangeLog files in ChangeLog-2014-12-10. 2014-12-24 11:04:30 +01:00
ChangeLog-2014-12-10 Generate distributed ChangeLog from git log 2014-12-24 11:04:30 +01:00
configure.ac Fix configure options for metalink 2015-07-24 23:42:20 +05:30
COPYING Fix copyright year. 2011-05-19 10:34:26 +02:00
MAILING-LIST Remove trailing empty lines 2014-06-12 18:49:15 +02:00
Makefile.am Generate distributed ChangeLog from git log 2014-12-24 11:04:30 +01:00
NEWS NEWS: cite HSTS 2015-07-20 16:31:17 +02:00
README Remove MAINTAINER from the README file. 2010-07-12 13:56:58 +02:00
README.checkout Improved test suite documentation 2015-04-13 19:36:39 +02:00

                                                          -*- text -*-
GNU Wget
========
                  Current Web home: http://www.gnu.org/software/wget/

GNU Wget is a free utility for non-interactive download of files from
the Web.  It supports HTTP, HTTPS, and FTP protocols, as well as
retrieval through HTTP proxies.

It can follow links in HTML pages and create local versions of remote
web sites, fully recreating the directory structure of the original
site.  This is sometimes referred to as "recursive downloading."
While doing that, Wget respects the Robot Exclusion Standard
(/robots.txt).  Wget can be instructed to convert the links in
downloaded HTML files to the local files for offline viewing.
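
To illustrate what respecting /robots.txt means for a crawler, here is a
minimal sketch using Python's standard urllib.robotparser module. It is
not Wget's own implementation; the robots.txt content, the "ExampleBot"
user agent, and the example.com URLs are made up for the example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content for an imaginary site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def allowed(robots_txt, user_agent, url):
    """Return True if the given robots.txt rules permit fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


public_ok = allowed(ROBOTS_TXT, "ExampleBot", "http://example.com/index.html")
private_ok = allowed(ROBOTS_TXT, "ExampleBot", "http://example.com/private/a.html")
print(public_ok, private_ok)
```

A recursive downloader would run a check of this kind before following
each link and skip the URLs that the rules disallow.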

Recursive downloading also works with FTP, where Wget can retrieve a
hierarchy of directories and files.

With both HTTP and FTP, Wget can check whether a remote file has
changed on the server since the previous run, and only download the
newer files.
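
The idea behind this check can be sketched as a comparison between the
server's Last-Modified time and the local file's modification time. The
Python code below is only an approximation of the concept, not Wget's
actual logic; the function name remote_is_newer and the sample header
value are assumptions made for the example.

```python
import os
import tempfile
from email.utils import parsedate_to_datetime


def remote_is_newer(last_modified_header, local_path):
    """Return True if the remote copy should be downloaded."""
    if not os.path.exists(local_path):
        return True  # nothing on disk yet, so always fetch
    remote_ts = parsedate_to_datetime(last_modified_header).timestamp()
    return remote_ts > os.path.getmtime(local_path)


# Hypothetical Last-Modified value, as a server might send it.
HEADER = "Thu, 06 Aug 2015 06:10:55 GMT"
remote_ts = parsedate_to_datetime(HEADER).timestamp()

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "index.html")
    with open(path, "wb") as f:
        f.write(b"old copy")

    # Pretend the local copy is 100 seconds older than the remote one.
    os.utime(path, (remote_ts - 100, remote_ts - 100))
    stale = remote_is_newer(HEADER, path)

    # Pretend the local copy is 100 seconds newer than the remote one.
    os.utime(path, (remote_ts + 100, remote_ts + 100))
    fresh = remote_is_newer(HEADER, path)

print(stale, fresh)
```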

Wget has been designed for robustness over slow or unstable network
connections; if a download fails due to a network problem, it will
keep retrying until the whole file has been retrieved.  If the server
supports regetting, it will instruct the server to continue the
download from where it left off.
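
For HTTP, continuing a download is commonly done by sending a Range
request header that names the first byte still missing. The Python
sketch below shows only how such a header could be derived from a
partial file on disk; it is not taken from Wget's source, and the
function name resume_headers and the file name file.part are
hypothetical.

```python
import os
import tempfile


def resume_headers(partial_path):
    """Build request headers asking the server for the bytes not yet on disk."""
    offset = os.path.getsize(partial_path) if os.path.exists(partial_path) else 0
    if offset == 0:
        return {}  # nothing downloaded yet: request the whole file
    return {"Range": f"bytes={offset}-"}


with tempfile.TemporaryDirectory() as tmp:
    partial = os.path.join(tmp, "file.part")
    with open(partial, "wb") as f:
        f.write(b"x" * 1024)  # simulate 1024 bytes already retrieved
    headers = resume_headers(partial)
    missing = resume_headers(os.path.join(tmp, "not-started.part"))

print(headers)
```

A server that supports this answers with a partial response and the
client appends the body to the existing file; a server that does not
support it sends the whole file again.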

If you are behind a firewall that requires the use of a SOCKS-style
gateway, you can get the SOCKS library and compile Wget with support
for SOCKS.

Most of the features are configurable, either through command-line
options or via the initialization file .wgetrc.  Wget allows you to
install a global startup file (/usr/local/etc/wgetrc by default) for
site settings.

Wget works under almost all Unix variants in use today and, unlike
many of its historical predecessors, is written entirely in C, thus
requiring no additional software, such as Perl.  The external software
it does work with, such as OpenSSL, is optional.  Because Wget uses GNU
Autoconf, it is easily built on and ported to new Unix-like systems.
The installation procedure is described in the INSTALL file.

As with other GNU software, the latest version of Wget can be found at
the master GNU archive site ftp.gnu.org, and its mirrors.  Wget
resides at <ftp://ftp.gnu.org/pub/gnu/wget/>.

Please report bugs in Wget to <bug-wget@gnu.org>.

See the file `MAILING-LIST' for information about Wget mailing lists.
Wget's home page is at <http://www.gnu.org/software/wget/>.

If you would like to contribute code for Wget, please read
http://wget.addictivecode.org/PatchGuidelines.

Wget was originally written and maintained by Hrvoje Niksic.  Please see
the file AUTHORS for a list of major contributors, and the ChangeLogs
for a detailed listing of all contributions.


Copyright (C) 1995, 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004,
2005, 2006, 2007, 2008, 2009 Free Software Foundation, Inc.

This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 3 of the License, or
(at your option) any later version.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License
along with this program; if not, write to the Free Software
Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301
USA.

Additional permission under GNU GPL version 3 section 7

If you modify this program, or any covered work, by linking or
combining it with the OpenSSL project's OpenSSL library (or a
modified version of that library), containing parts covered by the
terms of the OpenSSL or SSLeay licenses, the Free Software Foundation
grants you additional permission to convey the resulting work.
Corresponding Source for a non-source form of such a combination
shall include the source code for the parts of OpenSSL used as well
as that of the covered work.