[svn] When downloading recursively, don't ignore rejection of HTML documents that are themselves leaves of recursion.
hniksic 2002-04-15 14:57:10 -07:00
parent b9210ecbc8
commit f8b4b8bd12
2 changed files with 12 additions and 16 deletions
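
The change in `download_child_p` is easier to follow when the old and new conditions are placed side by side. The following is a minimal sketch, not the wget source: `html_suffix`, `rules_apply_old`, and `rules_apply_new` are hypothetical names, the value of `INFINITE_RECURSION` is assumed, the suffix check is simplified, and `depth` is assumed to be the level of the page whose links are being examined (so a link found at `depth == reclevel - 1` is a leaf).

```c
#include <stdbool.h>
#include <string.h>

#define INFINITE_RECURSION (-1)   /* assumed sentinel value */

/* Hypothetical, simplified stand-in for has_html_suffix_p. */
static bool html_suffix(const char *file)
{
  const char *dot = strrchr(file, '.');
  return dot != NULL
    && (strcmp(dot, ".html") == 0 || strcmp(dot, ".htm") == 0);
}

/* Condition before this commit: HTML files escape the accept/reject
   rules unless depth has already reached reclevel. */
static bool rules_apply_old(const char *file, int depth, int reclevel)
{
  return file[0] != '\0'
    && (!html_suffix(file)
        || (reclevel != INFINITE_RECURSION && depth >= reclevel));
}

/* Condition after this commit, mirroring the committed expression:
   HTML files escape the rules only while they are not leaves. */
static bool rules_apply_new(const char *file, int depth, int reclevel)
{
  return file[0] != '\0'
    && !(html_suffix(file)
         && depth < reclevel - 1
         && depth != INFINITE_RECURSION);
}
```

Under these assumptions, with a recursion limit of 2, a link to `index.html` found on a page at depth 1 is a leaf: `rules_apply_old` returns false for it (the rejection is ignored), while `rules_apply_new` returns true (the rejection is honored). A non-HTML file such as `photo.jpg` is subject to the rules under both conditions.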

@@ -1,3 +1,8 @@
+2002-04-15 Hrvoje Niksic <hniksic@arsdigita.com>
+	* recur.c (download_child_p): Don't ignore rejection of HTML
+	documents that are themselves leaves of recursion.
 2002-04-15 Ian Abbott <abbotti@mev.co.uk>
 	Makefile.in: Updated several dependencies for object files to take
@@ -16,6 +21,7 @@
 	* utils.c: Don't define `SETJMP()', `run_with_timeout_env' or
 	`abort_run_with_timeout()' when `USE_SIGNAL_TIMEOUT' is undefined.
+>>>>>>> 1.395
 2002-04-15 Hrvoje Niksic <hniksic@arsdigita.com>
 	* host.c (getaddrinfo_with_timeout): New function.

@@ -511,23 +511,13 @@ download_child_p (const struct urlpos *upos, struct url *parent, int depth,
   /* 6. */
   {
     /* Check for acceptance/rejection rules. We ignore these rules
-       for HTML documents because they might lead to other files which
-       need to be downloaded. Of course, we don't know which
-       documents are HTML before downloading them, so we guess.
-       A file is subject to acceptance/rejection rules if:
-       * u->file is not "" (i.e. it is not a directory)
-       and either:
-         + there is no file suffix,
-         + or there is a suffix, but is not "html" or "htm" or similar,
-         + both:
-           - recursion is not infinite,
-           - and we are at its very end. */
+       for directories (no file name to match) and for HTML documents,
+       which might lead to other files that do need to be downloaded.
+       That is, unless we've exhausted the recursion depth anyway. */
     if (u->file[0] != '\0'
-        && (!has_html_suffix_p (url)
-            || (opt.reclevel != INFINITE_RECURSION && depth >= opt.reclevel)))
+        && !(has_html_suffix_p (u->file)
+             && depth < opt.reclevel - 1
+             && depth != INFINITE_RECURSION))
       {
         if (!acceptable (u->file))
           {