comparison toolboxes/wget/man/cat1/wget.1.txt @ 0:e9a9cd732c1e tip

first hg version after svn
author wolffd
date Tue, 10 Feb 2015 15:05:51 +0000
1 WGET(1) GNU Wget WGET(1)
2
3
4
5 NAME
6 Wget - The non-interactive network downloader.
7
8 SYNOPSIS
9 wget [option]... [URL]...
10
11 DESCRIPTION
12 GNU Wget is a free utility for non-interactive download
13 of files from the Web. It supports HTTP, HTTPS, and FTP
14 protocols, as well as retrieval through HTTP proxies.
15
16 Wget is non-interactive, meaning that it can work in the
17 background, while the user is not logged on. This
18 allows you to start a retrieval and disconnect from the
19 system, letting Wget finish the work. By contrast, most
20 Web browsers require the user's constant presence,
21 which can be a great hindrance when transferring a lot
22 of data.
23
24 Wget can follow links in HTML and XHTML pages and create
25 local versions of remote web sites, fully recreating the
26 directory structure of the original site. This is some-
27 times referred to as "recursive downloading." While
28 doing that, Wget respects the Robot Exclusion Standard
29 (/robots.txt). Wget can be instructed to convert the
30 links in downloaded HTML files to the local files for
31 offline viewing.
32
33 Wget has been designed for robustness over slow or
34 unstable network connections; if a download fails due to
35 a network problem, it will keep retrying until the whole
36 file has been retrieved. If the server supports reget-
37 ting, it will instruct the server to continue the down-
38 load from where it left off.
39
40 OPTIONS
41 Option Syntax
42
43 Since Wget uses GNU getopt to process command-line argu-
44 ments, every option has a long form along with the short
45 one. Long options are more convenient to remember, but
46 take time to type. You may freely mix different option
47 styles, or specify options after the command-line argu-
48 ments. Thus you may write:
49
50 wget -r --tries=10 http://fly.srk.fer.hr/ -o log
51
52 The space between the option accepting an argument and
53 the argument may be omitted. Instead of -o log you can
54 write -olog.
55
56 You may put several options that do not require argu-
57 ments together, like:
58
59 wget -drc <URL>
60
61 This is completely equivalent to:
62
63 wget -d -r -c <URL>
64
65 Since the options can be specified after the arguments,
66 you may terminate them with --. So the following will
67 try to download URL -x, reporting failure to log:
68
69 wget -o log -- -x
70
71 The options that accept comma-separated lists all
72 respect the convention that specifying an empty list
73 clears its value. This can be useful to clear the
74 .wgetrc settings. For instance, if your .wgetrc sets
75 "exclude_directories" to /cgi-bin, the following example
76 will first reset it, and then set it to exclude /~nobody
77 and /~somebody. You can also clear the lists in
78 .wgetrc.
79
80 wget -X "" -X /~nobody,/~somebody
81
82 Most options that do not accept arguments are boolean
83 options, so named because their state can be captured
84 with a yes-or-no ("boolean") variable. For example,
85 --follow-ftp tells Wget to follow FTP links from HTML
86 files and, on the other hand, --no-glob tells it not to
87 perform file globbing on FTP URLs. A boolean option is
88 either affirmative or negative (beginning with --no).
89 All such options share several properties.
90
91 Unless stated otherwise, it is assumed that the default
92 behavior is the opposite of what the option accom-
93 plishes. For example, the documented existence of
94 --follow-ftp assumes that the default is to not follow
95 FTP links from HTML pages.
96
97 Affirmative options can be negated by prepending
98 --no- to the option name; negative options can be
99 negated by omitting the --no- prefix. This might seem
100 superfluous---if the default for an affirmative option
101 is to not do something, then why provide a way to
102 explicitly turn it off? But the startup file may in
103 fact change the default. For instance, using "fol-
104 low_ftp = off" in .wgetrc makes Wget not follow FTP
105 links by default, and using --no-follow-ftp is the only
106 way to restore the factory default from the command
107 line.
108
109 Basic Startup Options
110
111
112 -V
113 --version
114 Display the version of Wget.
115
116 -h
117 --help
118 Print a help message describing all of Wget's com-
119 mand-line options.
120
121 -b
122 --background
123 Go to background immediately after startup. If no
124 output file is specified via the -o option, output
125 is redirected to wget-log.
126
127 -e command
128 --execute command
129 Execute command as if it were a part of .wgetrc. A
130 command thus invoked will be executed after the com-
131 mands in .wgetrc, thus taking precedence over them.
132 If you need to specify more than one wgetrc command,
133 use multiple instances of -e.
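
       For instance, a minimal sketch (example.com is a placeholder
       URL) using the wgetrc command "robots", which controls the
       Robot Exclusion handling described above, to turn it off for
       a single run:

              wget -e robots=off -r http://example.com/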
134
135 Logging and Input File Options
136
137
138 -o logfile
139 --output-file=logfile
140 Log all messages to logfile. The messages are nor-
141 mally reported to standard error.
142
143 -a logfile
144 --append-output=logfile
145 Append to logfile. This is the same as -o, only it
146 appends to logfile instead of overwriting the old
147 log file. If logfile does not exist, a new file is
148 created.
149
150 -d
151 --debug
152 Turn on debug output, meaning various information
153 important to the developers of Wget if it does not
154 work properly. Your system administrator may have
155 chosen to compile Wget without debug support, in
156 which case -d will not work. Please note that com-
157 piling with debug support is always safe---Wget com-
158 piled with the debug support will not print any
159 debug info unless requested with -d.
160
161 -q
162 --quiet
163 Turn off Wget's output.
164
165 -v
166 --verbose
167 Turn on verbose output, with all the available data.
168 The default output is verbose.
169
170 -nv
171 --no-verbose
172 Turn off verbose without being completely quiet (use
173 -q for that), which means that error messages and
174 basic information still get printed.
175
176 -i file
177 --input-file=file
178 Read URLs from file. If - is specified as file,
179 URLs are read from the standard input. (Use ./- to
180 read from a file literally named -.)
181
182 If this function is used, no URLs need be present on
183 the command line. If there are URLs both on the
184 command line and in an input file, those on the com-
185 mand line will be the first ones to be retrieved.
186 The file need not be an HTML document (but no harm
187 if it is)---it is enough if the URLs are just listed
188 sequentially.
189
190 However, if you specify --force-html, the document
191 will be regarded as html. In that case you may have
192 problems with relative links, which you can solve
193 either by adding "<base href="url">" to the docu-
194 ments or by specifying --base=url on the command
195 line.
196
197 -F
198 --force-html
199 When input is read from a file, force it to be
200 treated as an HTML file. This enables you to
201 retrieve relative links from existing HTML files on
202 your local disk, by adding "<base href="url">" to
203 HTML, or using the --base command-line option.
204
205 -B URL
206 --base=URL
207 Prepends URL to relative links read from the file
208 specified with the -i option.
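
       For instance, a sketch (the URL and file name here are
       placeholders) that resolves relative links from a locally
       saved listing against the original site:

              wget -F -B http://example.com/ -i saved-page.html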
209
210 Download Options
211
212
213 --bind-address=ADDRESS
214 When making client TCP/IP connections, bind to
215 ADDRESS on the local machine. ADDRESS may be speci-
216 fied as a hostname or IP address. This option can
217 be useful if your machine is bound to multiple IPs.
218
219 -t number
220 --tries=number
221 Set number of retries to number. Specify 0 or inf
222 for infinite retrying. The default is to retry 20
223 times, with the exception of fatal errors like "con-
224 nection refused" or "not found" (404), which are not
225 retried.
226
227 -O file
228 --output-document=file
229 The documents will not be written to the appropriate
230 files, but all will be concatenated together and
231 written to file. If - is used as file, documents
232 will be printed to standard output, disabling link
233 conversion. (Use ./- to print to a file literally
234 named -.)
235
236 Use of -O is not intended to mean simply "use the
237 name file instead of the one in the URL;" rather, it
238 is analogous to shell redirection: wget -O file
239 http://foo is intended to work like wget -O -
240 http://foo > file; file will be truncated immedi-
241 ately, and all downloaded content will be written
242 there.
243
244 For this reason, -N (for timestamp-checking) is not
245 supported in combination with -O: since file is
246 always newly created, it will always have a very new
247 timestamp. A warning will be issued if this combina-
248 tion is used.
249
250 Similarly, using -r or -p with -O may not work as
251 you expect: Wget won't just download the first file
252 to file and then download the rest to their normal
253 names: all downloaded content will be placed in
254 file. This was disabled in version 1.11, but has
255 been reinstated (with a warning) in 1.11.2, as there
256 are some cases where this behavior can actually have
257 some use.
258
259 Note that a combination with -k is only permitted
260 when downloading a single document, as in that case
261 it will just convert all relative URIs to external
262 ones; -k makes no sense for multiple URIs when
263 they're all being downloaded to a single file.
264
265 -nc
266 --no-clobber
267 If a file is downloaded more than once in the same
268 directory, Wget's behavior depends on a few options,
269 including -nc. In certain cases, the local file
270 will be clobbered, or overwritten, upon repeated
271 download. In other cases it will be preserved.
272
273 When running Wget without -N, -nc, -r, or -p, down-
274 loading the same file in the same directory will
275 result in the original copy of file being preserved
276 and the second copy being named file.1. If that
277 file is downloaded yet again, the third copy will be
278 named file.2, and so on. When -nc is specified,
279 this behavior is suppressed, and Wget will refuse to
280 download newer copies of file. Therefore,
281 "no-clobber" is actually a misnomer in this
282 mode---it's not clobbering that's prevented (as the
283 numeric suffixes were already preventing clobber-
284 ing), but rather the multiple version saving that's
285 prevented.
286
287 When running Wget with -r or -p, but without -N or
288 -nc, re-downloading a file will result in the new
289 copy simply overwriting the old. Adding -nc will
290 prevent this behavior, instead causing the original
291 version to be preserved and any newer copies on the
292 server to be ignored.
293
294 When running Wget with -N, with or without -r or -p,
295 the decision as to whether or not to download a
296 newer copy of a file depends on the local and remote
297 timestamp and size of the file. -nc may not be
298 specified at the same time as -N.
299
300 Note that when -nc is specified, files with the suf-
301 fixes .html or .htm will be loaded from the local
302 disk and parsed as if they had been retrieved from
303 the Web.
304
305 -c
306 --continue
307 Continue getting a partially-downloaded file. This
308 is useful when you want to finish up a download
309 started by a previous instance of Wget, or by
310 another program. For instance:
311
312 wget -c ftp://sunsite.doc.ic.ac.uk/ls-lR.Z
313
314 If there is a file named ls-lR.Z in the current
315 directory, Wget will assume that it is the first
316 portion of the remote file, and will ask the server
317 to continue the retrieval from an offset equal to
318 the length of the local file.
319
320 Note that you don't need to specify this option if
321 you just want the current invocation of Wget to
322 retry downloading a file should the connection be
323 lost midway through. This is the default behavior.
324 -c only affects resumption of downloads started
325 prior to this invocation of Wget, and whose local
326 files are still sitting around.
327
328 Without -c, the previous example would just download
329 the remote file to ls-lR.Z.1, leaving the truncated
330 ls-lR.Z file alone.
331
332 Beginning with Wget 1.7, if you use -c on a non-
333 empty file, and it turns out that the server does
334 not support continued downloading, Wget will refuse
335 to start the download from scratch, which would
336 effectively ruin existing contents. If you really
337 want the download to start from scratch, remove the
338 file.
339
340 Also beginning with Wget 1.7, if you use -c on a
341 file which is of equal size as the one on the
342 server, Wget will refuse to download the file and
343 print an explanatory message. The same happens when
344 the file is smaller on the server than locally (pre-
345 sumably because it was changed on the server since
346 your last download attempt)---because "continuing"
347 is not meaningful, no download occurs.
348
349 On the other side of the coin, while using -c, any
350 file that's bigger on the server than locally will
351 be considered an incomplete download and only
352 "(length(remote) - length(local))" bytes will be
353 downloaded and tacked onto the end of the local
354 file. This behavior can be desirable in certain
355 cases---for instance, you can use wget -c to down-
356 load just the new portion that's been appended to a
357 data collection or log file.
358
359 However, if the file is bigger on the server because
360 it's been changed, as opposed to just appended to,
361 you'll end up with a garbled file. Wget has no way
362 of verifying that the local file is really a valid
363 prefix of the remote file. You need to be espe-
364 cially careful of this when using -c in conjunction
365 with -r, since every file will be considered as an
366 "incomplete download" candidate.
367
368 Another instance where you'll get a garbled file if
369 you try to use -c is if you have a lame HTTP proxy
370 that inserts a "transfer interrupted" string into
371 the local file. In the future a "rollback" option
372 may be added to deal with this case.
373
374 Note that -c only works with FTP servers and with
375 HTTP servers that support the "Range" header.
376
377 --progress=type
378 Select the type of the progress indicator you wish
379 to use. Legal indicators are "dot" and "bar".
380
381 The "bar" indicator is used by default. It draws an
382 ASCII progress bar graphic (a.k.a. "thermometer"
383 display) indicating the status of retrieval. If the
384 output is not a TTY, the "dot" bar will be used by
385 default.
386
387 Use --progress=dot to switch to the "dot" display.
388 It traces the retrieval by printing dots on the
389 screen, each dot representing a fixed amount of
390 downloaded data.
391
392 When using the dotted retrieval, you may also set
393 the style by specifying the type as dot:style. Dif-
394 ferent styles assign different meaning to one dot.
395 With the "default" style each dot represents 1K,
396 there are ten dots in a cluster and 50 dots in a
397 line. The "binary" style has a more "computer"-like
398 orientation---8K dots, 16-dots clusters and 48 dots
399 per line (which makes for 384K lines). The "mega"
400 style is suitable for downloading very large
401 files---each dot represents 64K retrieved, there are
402 eight dots in a cluster, and 48 dots on each line
403 (so each line contains 3M).
404
405 Note that you can set the default style using the
406 "progress" command in .wgetrc. That setting may be
407 overridden from the command line. The exception is
408 that, when the output is not a TTY, the "dot"
409 progress will be favored over "bar". To force the
410 bar output, use --progress=bar:force.
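
       For example, to follow a very large download with the "mega"
       dot style (the URL is a placeholder):

              wget --progress=dot:mega http://example.com/big-file.iso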
411
412 -N
413 --timestamping
414 Turn on time-stamping.
415
416 -S
417 --server-response
418 Print the headers sent by HTTP servers and responses
419 sent by FTP servers.
420
421 --spider
422 When invoked with this option, Wget will behave as a
423 Web spider, which means that it will not download
424 the pages, just check that they are there. For
425 example, you can use Wget to check your bookmarks:
426
427 wget --spider --force-html -i bookmarks.html
428
429 This feature needs much more work for Wget to get
430 close to the functionality of real web spiders.
431
432 -T seconds
433 --timeout=seconds
434 Set the network timeout to seconds seconds. This is
435 equivalent to specifying --dns-timeout, --con-
436 nect-timeout, and --read-timeout, all at the same
437 time.
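
       In other words, the following two commands behave the same
       way (example.com is a placeholder URL):

              wget --timeout=10 http://example.com/

              wget --dns-timeout=10 --connect-timeout=10 \
                   --read-timeout=10 http://example.com/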
438
439 When interacting with the network, Wget can check
440 for timeout and abort the operation if it takes too
441 long. This prevents anomalies like hanging reads
442 and infinite connects. The only timeout enabled by
443 default is a 900-second read timeout. Setting a
444 timeout to 0 disables it altogether. Unless you
445 know what you are doing, it is best not to change
446 the default timeout settings.
447
448 All timeout-related options accept decimal values,
449 as well as subsecond values. For example, 0.1 sec-
450 onds is a legal (though unwise) choice of timeout.
451 Subsecond timeouts are useful for checking server
452 response times or for testing network latency.
453
454 --dns-timeout=seconds
455 Set the DNS lookup timeout to seconds seconds. DNS
456 lookups that don't complete within the specified
457 time will fail. By default, there is no timeout on
458 DNS lookups, other than that implemented by system
459 libraries.
460
461 --connect-timeout=seconds
462 Set the connect timeout to seconds seconds. TCP
463 connections that take longer to establish will be
464 aborted. By default, there is no connect timeout,
465 other than that implemented by system libraries.
466
467 --read-timeout=seconds
468 Set the read (and write) timeout to seconds seconds.
469 The "time" of this timeout refers to idle time: if,
470 at any point in the download, no data is received
471 for more than the specified number of seconds, read-
472 ing fails and the download is restarted. This
473 option does not directly affect the duration of the
474 entire download.
475
476 Of course, the remote server may choose to terminate
477 the connection sooner than this option requires.
478 The default read timeout is 900 seconds.
479
480 --limit-rate=amount
481 Limit the download speed to amount bytes per second.
482 Amount may be expressed in bytes, kilobytes with the
483 k suffix, or megabytes with the m suffix. For exam-
484 ple, --limit-rate=20k will limit the retrieval rate
485 to 20KB/s. This is useful when, for whatever rea-
486 son, you don't want Wget to consume the entire
487 available bandwidth.
488
489 This option allows the use of decimal numbers, usu-
490 ally in conjunction with power suffixes; for exam-
491 ple, --limit-rate=2.5k is a legal value.
492
493 Note that Wget implements the limiting by sleeping
494 the appropriate amount of time after a network read
495 that took less time than specified by the rate.
496 Eventually this strategy causes the TCP transfer to
497 slow down to approximately the specified rate. How-
498 ever, it may take some time for this balance to be
499 achieved, so don't be surprised if limiting the rate
500 doesn't work well with very small files.
501
502 -w seconds
503 --wait=seconds
504 Wait the specified number of seconds between the
505 retrievals. Use of this option is recommended, as
506 it lightens the server load by making the requests
507 less frequent. Instead of in seconds, the time can
508 be specified in minutes using the "m" suffix, in
509 hours using "h" suffix, or in days using "d" suffix.
510
511 Specifying a large value for this option is useful
512 if the network or the destination host is down, so
513 that Wget can wait long enough to reasonably expect
514 the network error to be fixed before the retry. The
515 waiting interval specified by this function is
516 influenced by "--random-wait", which see.
517
518 --waitretry=seconds
519 If you don't want Wget to wait between every
520 retrieval, but only between retries of failed down-
521 loads, you can use this option. Wget will use lin-
522 ear backoff, waiting 1 second after the first fail-
523 ure on a given file, then waiting 2 seconds after
524 the second failure on that file, up to the maximum
525 number of seconds you specify. Therefore, a value
526 of 10 will actually make Wget wait up to (1 + 2 +
527 ... + 10) = 55 seconds per file.
528
529 Note that this option is turned on by default in the
530 global wgetrc file.
531
532 --random-wait
533 Some web sites may perform log analysis to identify
534 retrieval programs such as Wget by looking for sta-
535 tistically significant similarities in the time
536 between requests. This option causes the time
537 between requests to vary between 0.5 and 1.5 * wait
538 seconds, where wait was specified using the --wait
539 option, in order to mask Wget's presence from such
540 analysis.
541
542 A 2001 article in a publication devoted to develop-
543 ment on a popular consumer platform provided code to
544 perform this analysis on the fly. Its author sug-
545 gested blocking at the class C address level to
546 ensure automated retrieval programs were blocked
547 despite changing DHCP-supplied addresses.
548
549 The --random-wait option was inspired by this ill-
550 advised recommendation to block many unrelated users
551 from a web site due to the actions of one.
552
553 --no-proxy
554 Don't use proxies, even if the appropriate *_proxy
555 environment variable is defined.
556
557 -Q quota
558 --quota=quota
559 Specify download quota for automatic retrievals.
560 The value can be specified in bytes (default), kilo-
561 bytes (with k suffix), or megabytes (with m suffix).
562
563 Note that quota will never affect downloading a sin-
564 gle file. So if you specify wget -Q10k
565 ftp://wuarchive.wustl.edu/ls-lR.gz, all of the
566 ls-lR.gz will be downloaded. The same goes even
567 when several URLs are specified on the command-line.
568 However, quota is respected when retrieving either
569 recursively, or from an input file. Thus you may
570 safely type wget -Q2m -i sites---download will be
571 aborted when the quota is exceeded.
572
573 Setting quota to 0 or to inf unlimits the download
574 quota.
575
576 --no-dns-cache
577 Turn off caching of DNS lookups. Normally, Wget
578 remembers the IP addresses it looked up from DNS so
579 it doesn't have to repeatedly contact the DNS server
580 for the same (typically small) set of hosts it
581 retrieves from. This cache exists in memory only; a
582 new Wget run will contact DNS again.
583
584 However, it has been reported that in some situa-
585 tions it is not desirable to cache host names, even
586 for the duration of a short-running application like
587 Wget. With this option Wget issues a new DNS lookup
588 (more precisely, a new call to "gethostbyname" or
589 "getaddrinfo") each time it makes a new connection.
590 Please note that this option will not affect caching
591 that might be performed by the resolving library or
592 by an external caching layer, such as NSCD.
593
594 If you don't understand exactly what this option
595 does, you probably won't need it.
596
597 --restrict-file-names=mode
598 Change which characters found in remote URLs may
599 show up in local file names generated from those
600 URLs. Characters that are restricted by this option
601 are escaped, i.e. replaced with %HH, where HH is the
602 hexadecimal number that corresponds to the
603 restricted character.
604
605 By default, Wget escapes the characters that are not
606 valid as part of file names on your operating sys-
607 tem, as well as control characters that are typi-
608 cally unprintable. This option is useful for chang-
609 ing these defaults, either because you are download-
610 ing to a non-native partition, or because you want
611 to disable escaping of the control characters.
612
613 When mode is set to "unix", Wget escapes the charac-
614 ter / and the control characters in the ranges 0--31
615 and 128--159. This is the default on Unix-like
616 OSes.
617
618 When mode is set to "windows", Wget escapes the
619 characters \, |, /, :, ?, ", *, <, >, and the con-
620 trol characters in the ranges 0--31 and 128--159.
621 In addition to this, Wget in Windows mode uses +
622 instead of : to separate host and port in local file
623 names, and uses @ instead of ? to separate the query
624 portion of the file name from the rest. Therefore,
625 a URL that would be saved as
626 www.xemacs.org:4300/search.pl?input=blah in Unix
627 mode would be saved as
628 www.xemacs.org+4300/search.pl@input=blah in Windows
629 mode. This mode is the default on Windows.
630
631 If you append ,nocontrol to the mode, as in
632 unix,nocontrol, escaping of the control characters
633 is also switched off. You can use
634 --restrict-file-names=nocontrol to turn off escaping
635 of control characters without affecting the choice
636 of the OS to use as file name restriction mode.
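
       For example, when downloading to a Windows-mounted partition
       from a Unix-like system, or when keeping the platform
       defaults but allowing control characters (the URL is a
       placeholder):

              wget --restrict-file-names=windows http://example.com/

              wget --restrict-file-names=nocontrol http://example.com/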
637
638 -4
639 --inet4-only
640 -6
641 --inet6-only
642 Force connecting to IPv4 or IPv6 addresses. With
643 --inet4-only or -4, Wget will only connect to IPv4
644 hosts, ignoring AAAA records in DNS, and refusing to
645 connect to IPv6 addresses specified in URLs. Con-
646 versely, with --inet6-only or -6, Wget will only
647 connect to IPv6 hosts and ignore A records and IPv4
648 addresses.
649
650 Neither option should be needed normally. By
651 default, an IPv6-aware Wget will use the address
652 family specified by the host's DNS record. If the
653 DNS responds with both IPv4 and IPv6 addresses, Wget
654 will try them in sequence until it finds one it can
655 connect to. (Also see "--prefer-family" option
656 described below.)
657
658 These options can be used to deliberately force the
659 use of IPv4 or IPv6 address families on dual family
660 systems, usually to aid debugging or to deal with
661 broken network configuration. Only one of
662 --inet6-only and --inet4-only may be specified at
663 the same time. Neither option is available in Wget
664 compiled without IPv6 support.
665
666 --prefer-family=IPv4/IPv6/none
667 When given a choice of several addresses, connect to
668 the addresses with specified address family first.
669 IPv4 addresses are preferred by default.
670
671 This avoids spurious errors and connect attempts
672 when accessing hosts that resolve to both IPv6 and
673 IPv4 addresses from IPv4 networks. For example,
674 www.kame.net resolves to
675 2001:200:0:8002:203:47ff:fea5:3085 and to
676 203.178.141.194. When the preferred family is
677 "IPv4", the IPv4 address is used first; when the
678 preferred family is "IPv6", the IPv6 address is used
679 first; if the specified value is "none", the address
680 order returned by DNS is used without change.
681
682 Unlike -4 and -6, this option doesn't inhibit access
683 to any address family, it only changes the order in
684 which the addresses are accessed. Also note that
685 the reordering performed by this option is sta-
686 ble---it doesn't affect order of addresses of the
687 same family. That is, the relative order of all
688 IPv4 addresses and of all IPv6 addresses remains
689 intact in all cases.
690
691 --retry-connrefused
692 Consider "connection refused" a transient error and
693 try again. Normally Wget gives up on a URL when it
694 is unable to connect to the site because failure to
695 connect is taken as a sign that the server is not
696 running at all and that retries would not help.
697 This option is for mirroring unreliable sites whose
698 servers tend to disappear for short periods of time.
699
700 --user=user
701 --password=password
702 Specify the username user and password password for
703 both FTP and HTTP file retrieval. These parameters
704 can be overridden using the --ftp-user and
705 --ftp-password options for FTP connections and the
706 --http-user and --http-password options for HTTP
707 connections.
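
       For example (the credentials and URL are placeholders; see
       the warnings about command-line passwords under
       --http-password below):

              wget --user=daniel --password=secret \
                   ftp://example.com/pub/file.tar.gz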
708
709 Directory Options
710
711
712 -nd
713 --no-directories
714 Do not create a hierarchy of directories when
715 retrieving recursively. With this option turned on,
716 all files will get saved to the current directory,
717 without clobbering (if a name shows up more than
718 once, the filenames will get extensions .n).
719
720 -x
721 --force-directories
722 The opposite of -nd---create a hierarchy of directo-
723 ries, even if one would not have been created other-
724 wise. E.g. wget -x http://fly.srk.fer.hr/robots.txt
725 will save the downloaded file to fly.srk.fer.hr/ro-
726 bots.txt.
727
728 -nH
729 --no-host-directories
730 Disable generation of host-prefixed directories. By
731 default, invoking Wget with -r
732 http://fly.srk.fer.hr/ will create a structure of
733 directories beginning with fly.srk.fer.hr/. This
734 option disables such behavior.
735
736 --protocol-directories
737 Use the protocol name as a directory component of
738 local file names. For example, with this option,
739 wget -r http://host will save to http/host/...
740 rather than just to host/....
741
742 --cut-dirs=number
743 Ignore number directory components. This is useful
744 for getting a fine-grained control over the direc-
745 tory where recursive retrieval will be saved.
746
747 Take, for example, the directory at
748 ftp://ftp.xemacs.org/pub/xemacs/. If you retrieve
749 it with -r, it will be saved locally under
750 ftp.xemacs.org/pub/xemacs/. While the -nH option
751 can remove the ftp.xemacs.org/ part, you are still
752 stuck with pub/xemacs. This is where --cut-dirs
753 comes in handy; it makes Wget not "see" number
754 remote directory components. Here are several exam-
755 ples of how --cut-dirs option works.
756
757 No options -> ftp.xemacs.org/pub/xemacs/
758 -nH -> pub/xemacs/
759 -nH --cut-dirs=1 -> xemacs/
760 -nH --cut-dirs=2 -> .
761
762 --cut-dirs=1 -> ftp.xemacs.org/xemacs/
763 ...
764
765 If you just want to get rid of the directory struc-
766 ture, this option is similar to a combination of -nd
767 and -P. However, unlike -nd, --cut-dirs does not
768 lose subdirectories---for instance, with -nH
769 --cut-dirs=1, a beta/ subdirectory will be placed in
770 xemacs/beta, as one would expect.
771
772 -P prefix
773 --directory-prefix=prefix
774 Set directory prefix to prefix. The directory pre-
775 fix is the directory where all other files and sub-
776 directories will be saved to, i.e. the top of the
777 retrieval tree. The default is . (the current
778 directory).
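
       For example, to place a whole retrieval tree under a
       downloads/ directory (the URL is a placeholder):

              wget -P downloads -r http://example.com/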
779
780 HTTP Options
781
782
783 -E
784 --html-extension
785 If a file of type application/xhtml+xml or text/html
786 is downloaded and the URL does not end with the reg-
787 exp \.[Hh][Tt][Mm][Ll]?, this option will cause the
788 suffix .html to be appended to the local filename.
789 This is useful, for instance, when you're mirroring
790 a remote site that uses .asp pages, but you want the
791 mirrored pages to be viewable on your stock Apache
792 server. Another good use for this is when you're
793 downloading CGI-generated materials. A URL like
794 http://site.com/article.cgi?25 will be saved as
795 article.cgi?25.html.
796
797 Note that filenames changed in this way will be re-
798 downloaded every time you re-mirror a site, because
799 Wget can't tell that the local X.html file corre-
800 sponds to remote URL X (since it doesn't yet know
801 that the URL produces output of type text/html or
802 application/xhtml+xml). To prevent this re-download-
803 ing, you must use -k and -K so that the original
804 version of the file will be saved as X.orig.
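
       A typical invocation combining these options might look like
       the following sketch (the URL is a placeholder):

              wget -E -k -K -r http://example.com/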
805
806 --http-user=user
807 --http-password=password
808 Specify the username user and password password on
809 an HTTP server. According to the type of the chal-
810 lenge, Wget will encode them using either the
811 "basic" (insecure), the "digest", or the Windows
812 "NTLM" authentication scheme.
813
814 Another way to specify username and password is in
815 the URL itself. Either method reveals your password
816 to anyone who bothers to run "ps". To prevent the
817 passwords from being seen, store them in .wgetrc or
818 .netrc, and make sure to protect those files from
819 other users with "chmod". If the passwords are
820 really important, do not leave them lying in those
821 files either---edit the files and delete them after
822 Wget has started the download.
823
824 --no-cache
825 Disable server-side cache. In this case, Wget will
826 send the remote server an appropriate directive
827 (Pragma: no-cache) to get the file from the remote
828 service, rather than returning the cached version.
829 This is especially useful for retrieving and flush-
830 ing out-of-date documents on proxy servers.
831
832 Caching is allowed by default.
833
834 --no-cookies
835 Disable the use of cookies. Cookies are a mechanism
836 for maintaining server-side state. The server sends
837 the client a cookie using the "Set-Cookie" header,
838 and the client responds with the same cookie upon
839 further requests. Since cookies allow the server
840 owners to keep track of visitors and for sites to
841 exchange this information, some consider them a
842 breach of privacy. The default is to use cookies;
843 however, storing cookies is not on by default.
844
845 --load-cookies file
846 Load cookies from file before the first HTTP
847 retrieval. file is a textual file in the format
848 originally used by Netscape's cookies.txt file.
849
850 You will typically use this option when mirroring
851 sites that require that you be logged in to access
852 some or all of their content. The login process
853 typically works by the web server issuing an HTTP
854 cookie upon receiving and verifying your creden-
855 tials. The cookie is then resent by the browser
856 when accessing that part of the site, and so proves
857 your identity.
858
859 Mirroring such a site requires Wget to send the same
860 cookies your browser sends when communicating with
861 the site. This is achieved by --load-cookies---sim-
862 ply point Wget to the location of the cookies.txt
863 file, and it will send the same cookies your browser
864 would send in the same situation. Different
865 browsers keep textual cookie files in different
866 locations:
867
868 Netscape 4.x.
869 The cookies are in ~/.netscape/cookies.txt.
870
871 Mozilla and Netscape 6.x.
872 Mozilla's cookie file is also named cookies.txt,
873 located somewhere under ~/.mozilla, in the
874 directory of your profile. The full path usu-
875 ally ends up looking somewhat like
876 ~/.mozilla/default/some-weird-string/cook-
877 ies.txt.
878
879 Internet Explorer.
880 You can produce a cookie file Wget can use by
881 using the File menu, Import and Export, Export
882 Cookies. This has been tested with Internet
883 Explorer 5; it is not guaranteed to work with
884 earlier versions.
885
886 Other browsers.
887 If you are using a different browser to create
888 your cookies, --load-cookies will only work if
889 you can locate or produce a cookie file in the
890 Netscape format that Wget expects.
891
892 If you cannot use --load-cookies, there might still
893 be an alternative. If your browser supports a
894 "cookie manager", you can use it to view the cookies
895 used when accessing the site you're mirroring.
896 Write down the name and value of the cookie, and
897 manually instruct Wget to send those cookies,
898 bypassing the "official" cookie support:
899
900 wget --no-cookies --header "Cookie: <name>=<value>"
901
902 --save-cookies file
903 Save cookies to file before exiting. This will not
904 save cookies that have expired or that have no
905 expiry time (so-called "session cookies"), but also
906 see --keep-session-cookies.
907
908 --keep-session-cookies
909 When specified, causes --save-cookies to also save
910 session cookies. Session cookies are normally not
911 saved because they are meant to be kept in memory
912 and forgotten when you exit the browser. Saving
913 them is useful on sites that require you to log in
914 or to visit the home page before you can access some
915 pages. With this option, multiple Wget runs are
916 considered a single browser session as far as the
917 site is concerned.
918
919 Since the cookie file format does not normally carry
920 session cookies, Wget marks them with an expiry
921 timestamp of 0. Wget's --load-cookies recognizes
922 those as session cookies, but it might confuse other
923 browsers. Also note that cookies so loaded will be
924 treated as other session cookies, which means that
925 if you want --save-cookies to preserve them again,
926 you must use --keep-session-cookies again.
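
       For example, a sketch of carrying a login session across
       separate Wget runs (the URL is a placeholder):

              wget --save-cookies cookies.txt \
                   --keep-session-cookies http://example.com/login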
927
928 --ignore-length
929 Unfortunately, some HTTP servers (CGI programs, to
930 be more precise) send out bogus "Content-Length"
931 headers, which makes Wget go wild, as it thinks not
932 all the document was retrieved. You can spot this
933 syndrome if Wget retries getting the same document
934 again and again, each time claiming that the (other-
935 wise normal) connection has closed on the very same
936 byte.
937
938 With this option, Wget will ignore the "Con-
939 tent-Length" header---as if it never existed.
940
941 --header=header-line
942 Send header-line along with the rest of the headers
943 in each HTTP request. The supplied header is sent
944 as-is, which means it must contain name and value
945 separated by colon, and must not contain newlines.
946
947 You may define more than one additional header by
948 specifying --header more than once.
949
950 wget --header='Accept-Charset: iso-8859-2' \
951 --header='Accept-Language: hr' \
952 http://fly.srk.fer.hr/
953
954 Specification of an empty string as the header value
955 will clear all previous user-defined headers.
956
957 As of Wget 1.10, this option can be used to override
958 headers otherwise generated automatically. This
959 example instructs Wget to connect to localhost, but
960 to specify foo.bar in the "Host" header:
961
962 wget --header="Host: foo.bar" http://localhost/
963
964 In versions of Wget prior to 1.10 such use of
965 --header caused sending of duplicate headers.
966
967 --max-redirect=number
968 Specifies the maximum number of redirections to fol-
969 low for a resource. The default is 20, which is
970 usually far more than necessary. However, on those
971 occasions where you want to allow more (or fewer),
972 this is the option to use.
973
974 --proxy-user=user
975 --proxy-password=password
976 Specify the username user and password password for
977 authentication on a proxy server. Wget will encode
978 them using the "basic" authentication scheme.
979
980 Security considerations similar to those with
981 --http-password pertain here as well.
982
983 --referer=url
984 Include `Referer: url' header in HTTP request. Use-
985 ful for retrieving documents with server-side pro-
986 cessing that assume they are always being retrieved
987 by interactive web browsers and only come out prop-
988 erly when Referer is set to one of the pages that
989 point to them.
990
991 --save-headers
992 Save the headers sent by the HTTP server to the
993 file, preceding the actual contents, with an empty
994 line as the separator.
995
996 -U agent-string
997 --user-agent=agent-string
998 Identify as agent-string to the HTTP server.
999
1000 The HTTP protocol allows the clients to identify
1001 themselves using a "User-Agent" header field. This
1002 enables distinguishing the WWW software, usually for
1003 statistical purposes or for tracing of protocol vio-
1004 lations. Wget normally identifies as Wget/version,
1005 version being the current version number of Wget.
1006
1007 However, some sites have been known to impose the
1008 policy of tailoring the output according to the
1009 "User-Agent"-supplied information. While this is
1010 not such a bad idea in theory, it has been abused by
1011 servers denying information to clients other than
1012 (historically) Netscape or, more frequently,
1013 Microsoft Internet Explorer. This option allows you
1014 to change the "User-Agent" line issued by Wget. Use
1015 of this option is discouraged, unless you really
1016 know what you are doing.
1017
1018 Specifying empty user agent with --user-agent=""
1019 instructs Wget not to send the "User-Agent" header
1020 in HTTP requests.
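
       For example, to suppress the header entirely, or to send a
       custom identification (the agent string is a placeholder):

              wget --user-agent="" http://example.com/

              wget --user-agent="MyMirror/1.0" http://example.com/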
1021
1022 --post-data=string
1023 --post-file=file
1024 Use POST as the method for all HTTP requests and
1025 send the specified data in the request body.
1026 "--post-data" sends string as data, whereas
1027 "--post-file" sends the contents of file. Other
1028 than that, they work in exactly the same way.
1029
1030 Please be aware that Wget needs to know the size of
1031 the POST data in advance. Therefore the argument to
1032 "--post-file" must be a regular file; specifying a
1033 FIFO or something like /dev/stdin won't work. It's
1034 not quite clear how to work around this limitation
1035 inherent in HTTP/1.0. Although HTTP/1.1 introduces
1036 chunked transfer that doesn't require knowing the
1037 request length in advance, a client can't use chun-
1038 ked unless it knows it's talking to an HTTP/1.1
1039 server. And it can't know that until it receives a
1040 response, which in turn requires the request to have
1041 been completed -- a chicken-and-egg problem.
1042
1043 Note: if Wget is redirected after the POST request
1044 is completed, it will not send the POST data to the
1045 redirected URL. This is because URLs that process
1046 POST often respond with a redirection to a regular
1047 page, which does not desire or accept POST. It is
1048 not completely clear that this behavior is optimal;
1049 if it doesn't work out, it might be changed in the
1050 future.
1051
1052 This example shows how to log in to a server using POST
1053 and then proceed to download the desired pages, pre-
1054 sumably only accessible to authorized users:
1055
1056 # Log in to the server. This can be done only once.
1057 wget --save-cookies cookies.txt \
1058 --post-data 'user=foo&password=bar' \
1059 http://server.com/auth.php
1060
1061 # Now grab the page or pages we care about.
1062 wget --load-cookies cookies.txt \
1063 -p http://server.com/interesting/article.php
1064
1065 If the server is using session cookies to track user
1066 authentication, the above will not work because
1067 --save-cookies will not save them (and neither will
1068 browsers) and the cookies.txt file will be empty.
1069 In that case use --keep-session-cookies along with
1070 --save-cookies to force saving of session cookies.
1071
1072 --content-disposition
1073 If this is set to on, experimental (not fully-func-
1074 tional) support for "Content-Disposition" headers is
1075 enabled. This can currently result in extra round-
1076 trips to the server for a "HEAD" request, and is
1077 known to suffer from a few bugs, which is why it is
1078 not currently enabled by default.
1079
1080 This option is useful for some file-downloading CGI
1081 programs that use "Content-Disposition" headers to
1082 describe what the name of a downloaded file should
1083 be.
1084
1085 --auth-no-challenge
1086 If this option is given, Wget will send Basic HTTP
1087 authentication information (plaintext username and
1088 password) for all requests, just like Wget 1.10.2
1089 and prior did by default.
1090
1091 Use of this option is not recommended, and is
1092 intended only to support a few obscure servers,
1093 which never send HTTP authentication challenges, but
1094 accept unsolicited auth info, say, in addition to
1095 form-based authentication.
1096
1097 HTTPS (SSL/TLS) Options
1098
1099 To support encrypted HTTP (HTTPS) downloads, Wget must
1100 be compiled with an external SSL library, currently
1101 OpenSSL. If Wget is compiled without SSL support, none
1102 of these options are available.
1103
1104 --secure-protocol=protocol
1105 Choose the secure protocol to be used. Legal values
1106 are auto, SSLv2, SSLv3, and TLSv1. If auto is used,
1107 the SSL library is given the liberty of choosing the
1108 appropriate protocol automatically, which is
1109 achieved by sending an SSLv2 greeting and announcing
1110 support for SSLv3 and TLSv1. This is the default.
1111
1112 Specifying SSLv2, SSLv3, or TLSv1 forces the use of
1113 the corresponding protocol. This is useful when
1114 talking to old and buggy SSL server implementations
1115 that make it hard for OpenSSL to choose the correct
1116 protocol version. Fortunately, such servers are
1117 quite rare.
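
       For example, to insist on TLSv1 when talking to such a
       server (the URL is a placeholder):

              wget --secure-protocol=TLSv1 https://example.com/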
1118
1119 --no-check-certificate
1120 Don't check the server certificate against the
1121 available certificate authorities. Also don't
1122 require the URL host name to match the common name
1123 presented by the certificate.
1124
1125 As of Wget 1.10, the default is to verify the
1126 server's certificate against the recognized certifi-
1127 cate authorities, breaking the SSL handshake and
1128 aborting the download if the verification fails.
1129 Although this provides more secure downloads, it
1130 does break interoperability with some sites that
1131 worked with previous Wget versions, particularly
1132 those using self-signed, expired, or otherwise
1133 invalid certificates. This option forces an "inse-
1134 cure" mode of operation that turns the certificate
1135 verification errors into warnings and allows you to
1136 proceed.
1137
1138 If you encounter "certificate verification" errors
1139 or ones saying that "common name doesn't match
1140 requested host name", you can use this option to
1141 bypass the verification and proceed with the down-
1142 load. Only use this option if you are otherwise
1143 convinced of the site's authenticity, or if you
1144 really don't care about the validity of its certifi-
1145 cate. It is almost always a bad idea not to check
1146 the certificates when transmitting confidential or
1147 important data.
1148
1149 --certificate=file
1150 Use the client certificate stored in file. This is
1151 needed for servers that are configured to require
1152 certificates from the clients that connect to them.
1153 Normally a certificate is not required and this
1154 switch is optional.
1155
1156 --certificate-type=type
1157 Specify the type of the client certificate. Legal
1158 values are PEM (assumed by default) and DER, also
1159 known as ASN1.
1160
1161 --private-key=file
1162 Read the private key from file. This allows you to
1163 provide the private key in a file separate from the
1164 certificate.
1165
1166 --private-key-type=type
1167 Specify the type of the private key. Accepted val-
1168 ues are PEM (the default) and DER.
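
       For example, a sketch of client-certificate authentication
       (the file names and URL are placeholders):

              wget --certificate=client.pem --private-key=client.key \
                   https://example.com/protected/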
1169
1170 --ca-certificate=file
1171 Use file as the file with the bundle of certificate
1172 authorities ("CA") to verify the peers. The cer-
1173 tificates must be in PEM format.
1174
1175 Without this option Wget looks for CA certificates
1176 at the system-specified locations, chosen at OpenSSL
1177 installation time.
1178
1179 --ca-directory=directory
1180 Specifies directory containing CA certificates in
1181 PEM format. Each file contains one CA certificate,
1182 and the file name is based on a hash value derived
1183 from the certificate. This is achieved by process-
1184 ing a certificate directory with the "c_rehash"
1185 utility supplied with OpenSSL. Using --ca-directory
1186 is more efficient than --ca-certificate when many
1187 certificates are installed because it allows Wget to
1188 fetch certificates on demand.
1189
1190 Without this option Wget looks for CA certificates
1191 at the system-specified locations, chosen at OpenSSL
1192 installation time.
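
       For example, assuming your PEM certificates live in
       /etc/ssl/certs (a common but system-dependent location):

              c_rehash /etc/ssl/certs
              wget --ca-directory=/etc/ssl/certs https://example.com/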
1193
1194 --random-file=file
1195 Use file as the source of random data for seeding
1196 the pseudo-random number generator on systems with-
1197 out /dev/random.
1198
1199 On such systems the SSL library needs an external
1200 source of randomness to initialize. Randomness may
1201 be provided by EGD (see --egd-file below) or read
1202 from an external source specified by the user. If
1203 this option is not specified, Wget looks for random
1204 data in $RANDFILE or, if that is unset, in
1205 $HOME/.rnd. If none of those are available, it is
1206 likely that SSL encryption will not be usable.
1207
1208 If you're getting the "Could not seed OpenSSL PRNG;
1209 disabling SSL." error, you should provide random
1210 data using some of the methods described above.
1211
1212 --egd-file=file
1213 Use file as the EGD socket. EGD stands for Entropy
1214 Gathering Daemon, a user-space program that collects
1215 data from various unpredictable system sources and
1216 makes it available to other programs that might need
1217 it. Encryption software, such as the SSL library,
1218 needs sources of non-repeating randomness to seed
1219 the random number generator used to produce crypto-
1220 graphically strong keys.
1221
1222 OpenSSL allows the user to specify his own source of
1223 entropy using the "RAND_FILE" environment variable.
1224 If this variable is unset, or if the specified file
1225 does not produce enough randomness, OpenSSL will
1226 read random data from EGD socket specified using
1227 this option.
1228
1229 If this option is not specified (and the equivalent
1230 startup command is not used), EGD is never con-
1231 tacted. EGD is not needed on modern Unix systems
1232 that support /dev/random.
1233
1234 FTP Options
1235
1236
1237 --ftp-user=user
1238 --ftp-password=password
1239 Specify the username user and password password on
1240 an FTP server. Without this, or the corresponding
1241 startup option, the password defaults to -wget@,
1242 normally used for anonymous FTP.
1243
1244 Another way to specify username and password is in
1245 the URL itself. Either method reveals your password
1246 to anyone who bothers to run "ps". To prevent the
1247 passwords from being seen, store them in .wgetrc or
1248 .netrc, and make sure to protect those files from
1249 other users with "chmod". If the passwords are
1250 really important, do not leave them lying in those
1251 files either---edit the files and delete them after
1252 Wget has started the download.
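
       For example (the credentials and URL are placeholders):

              wget --ftp-user=daniel --ftp-password=secret \
                   ftp://example.com/pub/file.tar.gz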
1253
1254 --no-remove-listing
1255 Don't remove the temporary .listing files generated
1256 by FTP retrievals. Normally, these files contain
1257 the raw directory listings received from FTP
1258 servers. Not removing them can be useful for debug-
1259 ging purposes, or when you want to be able to easily
1260 check on the contents of remote server directories
1261 (e.g. to verify that a mirror you're running is com-
1262 plete).
1263
1264 Note that even though Wget writes to a known file-
1265 name for this file, this is not a security hole in
1266 the scenario of a user making .listing a symbolic
1267 link to /etc/passwd or something and asking "root"
1268 to run Wget in his or her directory. Depending on
1269 the options used, either Wget will refuse to write
1270 to .listing, making the globbing/recur-
1271 sion/time-stamping operation fail, or the symbolic
1272 link will be deleted and replaced with the actual
1273 .listing file, or the listing will be written to a
1274 .listing.number file.
1275
1276 Even though this situation isn't a problem,
1277 "root" should never run Wget in a non-trusted user's
1278 directory. A user could do something as simple as
1279 linking index.html to /etc/passwd and asking "root"
1280 to run Wget with -N or -r so the file will be over-
1281 written.
1282
1283 --no-glob
1284 Turn off FTP globbing. Globbing refers to the use
1285 of shell-like special characters (wildcards), like
1286 *, ?, [ and ] to retrieve more than one file from
1287 the same directory at once, like:
1288
1289 wget ftp://gnjilux.srk.fer.hr/*.msg
1290
1291 By default, globbing will be turned on if the URL
1292 contains a globbing character. This option may be
1293 used to turn globbing on or off permanently.
1294
1295 You may have to quote the URL to protect it from
1296 being expanded by your shell. Globbing makes Wget
1297 look for a directory listing, which is system-spe-
1298 cific. This is why it currently works only with
1299 Unix FTP servers (and the ones emulating Unix "ls"
1300 output).
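
       For example, quoting the URL keeps the shell from expanding
       the wildcard before Wget sees it (the URL is a placeholder):

              wget "ftp://example.com/dir/*.msg"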
1301
1302 --no-passive-ftp
1303 Disable the use of the passive FTP transfer mode.
1304 Passive FTP mandates that the client connect to the
1305 server to establish the data connection rather than
1306 the other way around.
1307
1308 If the machine is connected to the Internet
1309 directly, both passive and active FTP should work
1310 equally well. Behind most firewall and NAT configu-
1311 rations passive FTP has a better chance of working.
1312 However, in some rare firewall configurations,
1313 active FTP actually works when passive FTP doesn't.
1314 If you suspect this to be the case, use this option,
1315 or set "passive_ftp=off" in your init file.
1316
1317 --retr-symlinks
1318 Usually, when retrieving FTP directories recursively
1319 and a symbolic link is encountered, the linked-to
1320 file is not downloaded. Instead, a matching sym-
1321 bolic link is created on the local filesystem. The
1322 pointed-to file will not be downloaded unless this
1323 recursive retrieval would have encountered it sepa-
1324 rately and downloaded it anyway.
1325
1326 When --retr-symlinks is specified, however, symbolic
1327 links are traversed and the pointed-to files are
1328 retrieved. At this time, this option does not cause
1329 Wget to traverse symlinks to directories and recurse
1330 through them, but in the future it should be
1331 enhanced to do this.
1332
1333 Note that when retrieving a file (not a directory)
1334 because it was specified on the command-line, rather
1335 than because it was recursed to, this option has no
1336 effect. Symbolic links are always traversed in this
1337 case.
1338
1339 --no-http-keep-alive
1340 Turn off the "keep-alive" feature for HTTP down-
1341 loads. Normally, Wget asks the server to keep the
1342 connection open so that, when you download more than
1343 one document from the same server, they get trans-
1344 ferred over the same TCP connection. This saves
1345 time and at the same time reduces the load on the
1346 server.
1347
1348 This option is useful when, for some reason, persis-
1349 tent (keep-alive) connections don't work for you,
1350 for example due to a server bug or due to the
1351 inability of server-side scripts to cope with the
1352 connections.
1353
1354 Recursive Retrieval Options
1355
1356
1357 -r
1358 --recursive
1359 Turn on recursive retrieving.
1360
1361 -l depth
1362 --level=depth
1363 Specify recursion maximum depth level depth. The
1364 default maximum depth is 5.
1365
1366 --delete-after
1367 This option tells Wget to delete every single file
1368 it downloads, after having done so. It is useful
1369 for pre-fetching popular pages through a proxy,
1370 e.g.:
1371
1372 wget -r -nd --delete-after http://whatever.com/~popular/page/
1373
1374 The -r option is to retrieve recursively, and -nd to
1375 not create directories.
1376
1377 Note that --delete-after deletes files on the local
1378 machine. It does not issue the DELE command to
1379 remote FTP sites, for instance. Also note that when
1380 --delete-after is specified, --convert-links is
1381 ignored, so .orig files are simply not created in
1382 the first place.
1383
1384 -k
1385 --convert-links
1386 After the download is complete, convert the links in
1387 the document to make them suitable for local view-
1388 ing. This affects not only the visible hyperlinks,
1389 but any part of the document that links to external
1390 content, such as embedded images, links to style
1391 sheets, hyperlinks to non-HTML content, etc.
1392
1393 Each link will be changed in one of two ways:
1394
1395 * The links to files that have been downloaded by
1396 Wget will be changed to refer to the file they
1397 point to as a relative link.
1398
1399 Example: if the downloaded file /foo/doc.html
1400 links to /bar/img.gif, also downloaded, then the
1401 link in doc.html will be modified to point to
1402 ../bar/img.gif. This kind of transformation
1403 works reliably for arbitrary combinations of
1404 directories.
1405
1406 * The links to files that have not been downloaded
1407 by Wget will be changed to include host name and
1408 absolute path of the location they point to.
1409
1410 Example: if the downloaded file /foo/doc.html
1411 links to /bar/img.gif (or to ../bar/img.gif),
1412 then the link in doc.html will be modified to
1413 point to http://hostname/bar/img.gif.
1414
1415 Because of this, local browsing works reliably: if a
1416 linked file was downloaded, the link will refer to
1417 its local name; if it was not downloaded, the link
1418 will refer to its full Internet address rather than
1419 presenting a broken link. The fact that the former
1420 links are converted to relative links ensures that
1421 you can move the downloaded hierarchy to another
1422 directory.
1423
1424 Note that only at the end of the download can Wget
1425 know which links have been downloaded. Because of
1426 that, the work done by -k will be performed at the
1427 end of all the downloads.
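
          For example, to fetch a site one level deep and
          then convert its links for offline browsing (the
          URL is a stand-in):

              wget -r -l 1 -k http://<site>/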
1428
1429 -K
1430 --backup-converted
1431 When converting a file, back up the original version
1432 with a .orig suffix. Affects the behavior of -N.
1433
1434 -m
1435 --mirror
1436 Turn on options suitable for mirroring. This option
1437 turns on recursion and time-stamping, sets infinite
1438 recursion depth and keeps FTP directory listings.
1439 It is currently equivalent to -r -N -l inf
1440 --no-remove-listing.
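
          The following two (hypothetical) invocations are
          therefore currently interchangeable:

              wget -m http://<site>/
              wget -r -N -l inf --no-remove-listing http://<site>/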
1441
1442 -p
1443 --page-requisites
1444 This option causes Wget to download all the files
1445 that are necessary to properly display a given HTML
1446 page. This includes such things as inlined images,
1447 sounds, and referenced stylesheets.
1448
1449 Ordinarily, when downloading a single HTML page, any
1450 requisite documents that may be needed to display it
1451 properly are not downloaded. Using -r together with
1452 -l can help, but since Wget does not ordinarily dis-
1453 tinguish between external and inlined documents, one
1454 is generally left with "leaf documents" that are
1455 missing their requisites.
1456
1457 For instance, say document 1.html contains an
1458 "<IMG>" tag referencing 1.gif and an "<A>" tag
1459 pointing to external document 2.html. Say that
1460 2.html is similar but that its image is 2.gif and it
1461 links to 3.html. Say this continues up to some
1462 arbitrarily high number.
1463
1464 If one executes the command:
1465
1466 wget -r -l 2 http://<site>/1.html
1467
1468 then 1.html, 1.gif, 2.html, 2.gif, and 3.html will
1469 be downloaded. As you can see, 3.html is without
1470 its requisite 3.gif because Wget is simply counting
1471 the number of hops (up to 2) away from 1.html in
1472 order to determine where to stop the recursion.
1473 However, with this command:
1474
1475 wget -r -l 2 -p http://<site>/1.html
1476
1477 all the above files and 3.html's requisite 3.gif
1478 will be downloaded. Similarly,
1479
1480 wget -r -l 1 -p http://<site>/1.html
1481
1482 will cause 1.html, 1.gif, 2.html, and 2.gif to be
1483 downloaded. One might think that:
1484
1485 wget -r -l 0 -p http://<site>/1.html
1486
1487 would download just 1.html and 1.gif, but unfortu-
1488 nately this is not the case, because -l 0 is equiva-
1489 lent to -l inf---that is, infinite recursion. To
1490 download a single HTML page (or a handful of them,
1491 all specified on the command-line or in a -i URL
1492 input file) and its (or their) requisites, simply
1493 leave off -r and -l:
1494
1495 wget -p http://<site>/1.html
1496
1497 Note that Wget will behave as if -r had been speci-
1498 fied, but only that single page and its requisites
1499 will be downloaded. Links from that page to exter-
1500 nal documents will not be followed. Actually, to
1501 download a single page and all its requisites (even
1502 if they exist on separate websites), and make sure
1503 the lot displays properly locally, this author likes
1504 to use a few options in addition to -p:
1505
1506 wget -E -H -k -K -p http://<site>/<document>
1507
1508 To finish off this topic, it's worth knowing that
1509 Wget's idea of an external document link is any URL
1510 specified in an "<A>" tag, an "<AREA>" tag, or a
1511 "<LINK>" tag other than "<LINK REL="stylesheet">".
1512
1513 --strict-comments
1514 Turn on strict parsing of HTML comments. The
1515 default is to terminate comments at the first occur-
1516 rence of -->.
1517
1518 According to specifications, HTML comments are
1519 expressed as SGML declarations. A declaration is
1520 special markup that begins with <! and ends with >,
1521 such as <!DOCTYPE ...>, and may contain comments
1522 between a pair of -- delimiters. HTML comments are
1523 "empty declarations", SGML declarations without any
1524 non-comment text. Therefore, <!--foo--> is a valid
1525 comment, and so is <!--one-- --two-->, but
1526 <!--1--2--> is not.
1527
1528 On the other hand, most HTML writers don't perceive
1529 comments as anything other than text delimited with
1530 <!-- and -->, which is not quite the same. For
1531 example, something like <!------------> works as a
1532 valid comment as long as the number of dashes is a
1533 multiple of four (!). If not, the comment techni-
1534 cally lasts until the next --, which may be at the
1535 other end of the document. Because of this, many
1536 popular browsers completely ignore the specification
1537 and implement what users have come to expect: com-
1538 ments delimited with <!-- and -->.
1539
1540 Until version 1.9, Wget interpreted comments
1541 strictly, which resulted in missing links in many
1542 web pages that displayed fine in browsers, but had
1543 the misfortune of containing non-compliant comments.
1544 Beginning with version 1.9, Wget has joined the
1545 ranks of clients that implement "naive" comments,
1546 terminating each comment at the first occurrence of
1547 -->.
1548
1549 If, for whatever reason, you want strict comment
1550 parsing, use this option to turn it on.
1551
1552 Recursive Accept/Reject Options
1553
1554
1555 -A acclist --accept acclist
1556 -R rejlist --reject rejlist
1557 Specify comma-separated lists of file name suffixes
1558 or patterns to accept or reject. Note that if any of
1559 the wildcard characters, *, ?, [ or ], appear in an
1560 element of acclist or rejlist, it will be treated as
1561 a pattern, rather than a suffix.
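
          For example, the first (hypothetical) command
          below accepts files by suffix, while the second
          rejects by wildcard pattern; the pattern is quoted
          so that the shell does not expand it:

              wget -r -A jpg,jpeg http://<site>/
              wget -r -R "*thumb*" http://<site>/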
1562
1563 -D domain-list
1564 --domains=domain-list
1565 Set domains to be followed. domain-list is a comma-
1566 separated list of domains. Note that it does not
1567 turn on -H.
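
          For example, to span hosts but stay within two
          (hypothetical) domains, combine -D with -H:

              wget -r -H -D <domain1>,<domain2> http://<site>/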
1568
1569 --exclude-domains domain-list
1570 Specify the domains that are not to be followed.
1571
1572 --follow-ftp
1573 Follow FTP links from HTML documents. Without this
1574 option, Wget will ignore all the FTP links.
1575
1576 --follow-tags=list
1577 Wget has an internal table of HTML tag / attribute
1578 pairs that it considers when looking for linked doc-
1579 uments during a recursive retrieval. If a user
1580 wants only a subset of those tags to be considered,
1581 however, he or she should specify such tags in a
1582 comma-separated list with this option.
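
          For example, the following (hypothetical) command
          follows only plain "<A>" hyperlinks, ignoring
          images, stylesheets and other linked content:

              wget -r --follow-tags=a http://<site>/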
1583
1584 --ignore-tags=list
1585 This is the opposite of the --follow-tags option.
1586 To skip certain HTML tags when recursively looking
1587 for documents to download, specify them in a comma-
1588 separated list.
1589
1590 In the past, this option was the best bet for down-
1591 loading a single page and its requisites, using a
1592 command-line like:
1593
1594 wget --ignore-tags=a,area -H -k -K -r http://<site>/<document>
1595
1596 However, the author of this option came across a
1597 page with tags like "<LINK REL="home" HREF="/">" and
1598 came to the realization that specifying tags to
1599 ignore was not enough. One can't just tell Wget to
1600 ignore "<LINK>", because then stylesheets will not
1601 be downloaded. Now the best bet for downloading a
1602 single page and its requisites is the dedicated
1603 --page-requisites option.
1604
1605 --ignore-case
1606 Ignore case when matching files and directories.
1607 This influences the behavior of -R, -A, -I, and -X
1608 options, as well as globbing implemented when down-
1609 loading from FTP sites. For example, with this
1610 option, -A *.txt will match file1.txt, but also
1611 file2.TXT, file3.TxT, and so on.
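
          A corresponding invocation might look like this
          (the URL is a stand-in; the pattern is quoted to
          protect it from the shell):

              wget -r --ignore-case -A "*.txt" http://<site>/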
1612
1613 -H
1614 --span-hosts
1615 Enable spanning across hosts when doing recursive
1616 retrieving.
1617
1618 -L
1619 --relative
1620 Follow relative links only. Useful for retrieving a
1621 specific home page without any distractions, not
1622 even those from the same hosts.
1623
1624 -I list
1625 --include-directories=list
1626 Specify a comma-separated list of directories you
1627 wish to follow when downloading. Elements of list
1628 may contain wildcards.
1629
1630 -X list
1631 --exclude-directories=list
1632 Specify a comma-separated list of directories you
1633 wish to exclude from download. Elements of list may
1634 contain wildcards.
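
          For example, to recurse while skipping two
          (hypothetical) directory trees, one of them given
          as a wildcard pattern:

              wget -r -X "/private,/archive/*" http://<site>/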
1635
1636 -np
1637 --no-parent
1638 Do not ever ascend to the parent directory when
1639 retrieving recursively. This is a useful option,
1640 since it guarantees that only the files below a cer-
1641 tain hierarchy will be downloaded.
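
          For example, the following (hypothetical) command
          will never ascend above the /~user/ hierarchy:

              wget -r -np http://<site>/~user/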
1642
1643 FILES
1644 /usr/local/etc/wgetrc
1645 Default location of the global startup file.
1646
1647 .wgetrc
1648 User startup file.
1649
1650 BUGS
1651 You are welcome to submit bug reports via the GNU Wget
1652 bug tracker (see <http://wget.addictivecode.org/Bug-
1653 Tracker>).
1654
1655 Before actually submitting a bug report, please try to
1656 follow a few simple guidelines.
1657
1658 1. Please try to ascertain that the behavior you see
1659 really is a bug. If Wget crashes, it's a bug. If
1660 Wget does not behave as documented, it's a bug. If
1661 things work strangely, but you are not sure about the
1662 way they are supposed to work, it might well be a
1663 bug, but you might want to double-check the documen-
1664 tation and the mailing lists.
1665
1666 2. Try to repeat the bug in as simple circumstances as
1667 possible. E.g. if Wget crashes when invoked as wget
1668 -rl0 -kKE -t5 --no-proxy http://yoyodyne.com -o
1669 /tmp/log, you should try to see if the crash is
1670 repeatable, and if it will occur with a simpler set
1671 of options. You might even try to start the download
1672 at the page where the crash occurred to see if that
1673 page somehow triggered the crash.
1674
1675 Also, while I will probably be interested to know
1676 the contents of your .wgetrc file, just dumping it
1677 into the debug message is probably a bad idea.
1678 Instead, you should first try to see if the bug
1679 repeats with .wgetrc moved out of the way. Only if
1680 it turns out that .wgetrc settings affect the bug,
1681 mail me the relevant parts of the file.
1682
1683 3. Please start Wget with -d option and send us the
1684 resulting output (or relevant parts thereof). If
1685 Wget was compiled without debug support, recompile
1686 it---it is much easier to trace bugs with debug sup-
1687 port on.
1688
1689 Note: please make sure to remove any potentially
1690 sensitive information from the debug log before
1691 sending it to the bug address. The "-d" won't go
1692 out of its way to collect sensitive information, but
1693 the log will contain a fairly complete transcript of
1694 Wget's communication with the server, which may
1695 include passwords and pieces of downloaded data.
1696 Since the bug address is publicly archived, you
1697 may assume that all bug reports are visible to the
1698 public.
1699
1700 4. If Wget has crashed, try to run it in a debugger,
1701 e.g. "gdb `which wget` core" and type "where" to get
1702 the backtrace. This may not work if the system
1703 administrator has disabled core files, but it is
1704 safe to try.
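
          Such a session might look like this (the name and
          location of the core file vary between systems):

              gdb `which wget` core
              (gdb) where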
1705
1706 SEE ALSO
1707 This is not the complete manual for GNU Wget. For more
1708 complete information, including more detailed explana-
1709 tions of some of the options, and a number of commands
1710 available for use with .wgetrc files and the -e option,
1711 see the GNU Info entry for wget.
1712
1713 AUTHOR
1714 Originally written by Hrvoje Niksic
1715 <hniksic@xemacs.org>. Currently maintained by Micah
1716 Cowan <micah@cowan.name>.
1717
1718 COPYRIGHT
1719 Copyright (c) 1996, 1997, 1998, 1999, 2000, 2001, 2002,
1720 2003, 2004, 2005, 2006, 2007, 2008 Free Software Founda-
1721 tion, Inc.
1722
1723 Permission is granted to copy, distribute and/or modify
1724 this document under the terms of the GNU Free Documenta-
1725 tion License, Version 1.2 or any later version published
1726 by the Free Software Foundation; with no Invariant Sec-
1727 tions, no Front-Cover Texts, and no Back-Cover Texts. A
1728 copy of the license is included in the section entitled
1729 "GNU Free Documentation License".
1730
1731
1732
1733 GNU Wget 1.11.4 2008-06-29 WGET(1)