WGET(1)                        GNU Wget                        WGET(1)

NAME
    Wget - The non-interactive network downloader.

SYNOPSIS
    wget [option]... [URL]...

DESCRIPTION
    GNU Wget is a free utility for non-interactive download of files
    from the Web.  It supports HTTP, HTTPS, and FTP protocols, as well
    as retrieval through HTTP proxies.

    Wget is non-interactive, meaning that it can work in the background
    while the user is not logged on.  This allows you to start a
    retrieval and disconnect from the system, letting Wget finish the
    work.  By contrast, most Web browsers require the user's constant
    presence, which can be a great hindrance when transferring a lot of
    data.

    Wget can follow links in HTML and XHTML pages and create local
    versions of remote web sites, fully recreating the directory
    structure of the original site.  This is sometimes referred to as
    "recursive downloading."  While doing that, Wget respects the Robot
    Exclusion Standard (/robots.txt).  Wget can be instructed to
    convert the links in downloaded HTML files to point to the local
    files, for offline viewing.

    Wget has been designed for robustness over slow or unstable network
    connections; if a download fails due to a network problem, it will
    keep retrying until the whole file has been retrieved.  If the
    server supports regetting, it will instruct the server to continue
    the download from where it left off.

OPTIONS
  Option Syntax

    Since Wget uses GNU getopt to process command-line arguments, every
    option has a long form along with the short one.  Long options are
    more convenient to remember, but take time to type.  You may freely
    mix different option styles, or specify options after the
    command-line arguments.  Thus you may write:

        wget -r --tries=10 http://fly.srk.fer.hr/ -o log

    The space between the option accepting an argument and the argument
    may be omitted.  Instead of -o log you can write -olog.

    You may put several options that do not require arguments together,
    like:

        wget -drc <URL>

    This is completely equivalent to:

        wget -d -r -c <URL>

    Since the options can be specified after the arguments, you may
    terminate them with --.  So the following will try to download URL
    -x, reporting failure to log:

        wget -o log -- -x

    The options that accept comma-separated lists all respect the
    convention that specifying an empty list clears its value.  This
    can be useful to clear the .wgetrc settings.  For instance, if your
    .wgetrc sets "exclude_directories" to /cgi-bin, the following
    example will first reset it, and then set it to exclude /~nobody
    and /~somebody.  You can also clear the lists in .wgetrc.

        wget -X "" -X /~nobody,/~somebody

    Most options that do not accept arguments are boolean options, so
    named because their state can be captured with a yes-or-no
    ("boolean") variable.  For example, --follow-ftp tells Wget to
    follow FTP links from HTML files and, on the other hand, --no-glob
    tells it not to perform file globbing on FTP URLs.  A boolean
    option is either affirmative or negative (beginning with --no).
    All such options share several properties.

    Unless stated otherwise, it is assumed that the default behavior is
    the opposite of what the option accomplishes.  For example, the
    documented existence of --follow-ftp assumes that the default is to
    not follow FTP links from HTML pages.

    Affirmative options can be negated by prepending --no- to the
    option name; negative options can be negated by omitting the --no-
    prefix.  This might seem superfluous---if the default for an
    affirmative option is to not do something, then why provide a way
    to explicitly turn it off?  But the startup file may in fact change
    the default.  For instance, using "follow_ftp = off" in .wgetrc
    makes Wget not follow FTP links by default, and using
    --no-follow-ftp is the only way to restore the factory default from
    the command line.
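
    To illustrate, suppose your .wgetrc contains "follow_ftp = on"; the
    following hypothetical invocation restores the factory default for
    a single run:

        wget --no-follow-ftp -i urls.txt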

  Basic Startup Options

    -V
    --version
        Display the version of Wget.

    -h
    --help
        Print a help message describing all of Wget's command-line
        options.

    -b
    --background
        Go to background immediately after startup.  If no output file
        is specified via the -o option, output is redirected to
        wget-log.
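
        For example, one might start a large download in the
        background and then follow its progress (URL hypothetical):

            wget -b http://example.com/big.iso
            tail -f wget-log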

    -e command
    --execute command
        Execute command as if it were a part of .wgetrc.  A command
        thus invoked will be executed after the commands in .wgetrc,
        thus taking precedence over them.  If you need to specify more
        than one wgetrc command, use multiple instances of -e.
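
        For instance, "robots" is a wgetrc command, so the following
        sketch (URL hypothetical) disables robots.txt processing for
        one run:

            wget -e robots=off -r http://example.com/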

  Logging and Input File Options

    -o logfile
    --output-file=logfile
        Log all messages to logfile.  The messages are normally
        reported to standard error.

    -a logfile
    --append-output=logfile
        Append to logfile.  This is the same as -o, only it appends to
        logfile instead of overwriting the old log file.  If logfile
        does not exist, a new file is created.

    -d
    --debug
        Turn on debug output, meaning various information important to
        the developers of Wget if it does not work properly.  Your
        system administrator may have chosen to compile Wget without
        debug support, in which case -d will not work.  Please note
        that compiling with debug support is always safe---Wget
        compiled with debug support will not print any debug info
        unless requested with -d.

    -q
    --quiet
        Turn off Wget's output.

    -v
    --verbose
        Turn on verbose output, with all the available data.  The
        default output is verbose.

    -nv
    --no-verbose
        Turn off verbose without being completely quiet (use -q for
        that), which means that error messages and basic information
        still get printed.

    -i file
    --input-file=file
        Read URLs from file.  If - is specified as file, URLs are read
        from the standard input.  (Use ./- to read from a file
        literally named -.)

        If this function is used, no URLs need be present on the
        command line.  If there are URLs both on the command line and
        in an input file, those on the command line will be the first
        ones to be retrieved.  The file need not be an HTML document
        (but no harm if it is)---it is enough if the URLs are just
        listed sequentially.

        However, if you specify --force-html, the document will be
        regarded as HTML.  In that case you may have problems with
        relative links, which you can solve either by adding
        "<base href="url">" to the documents or by specifying
        --base=url on the command line.
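
        A short sketch of list-driven retrieval, assuming a file
        urls.txt with one URL per line:

            wget -i urls.txt -a fetch.log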

    -F
    --force-html
        When input is read from a file, force it to be treated as an
        HTML file.  This enables you to retrieve relative links from
        existing HTML files on your local disk, by adding
        "<base href="url">" to HTML, or using the --base command-line
        option.

    -B URL
    --base=URL
        Prepends URL to relative links read from the file specified
        with the -i option.

  Download Options

    --bind-address=ADDRESS
        When making client TCP/IP connections, bind to ADDRESS on the
        local machine.  ADDRESS may be specified as a hostname or IP
        address.  This option can be useful if your machine is bound
        to multiple IPs.

    -t number
    --tries=number
        Set number of retries to number.  Specify 0 or inf for
        infinite retrying.  The default is to retry 20 times, with the
        exception of fatal errors like "connection refused" or "not
        found" (404), which are not retried.

    -O file
    --output-document=file
        The documents will not be written to the appropriate files,
        but all will be concatenated together and written to file.  If
        - is used as file, documents will be printed to standard
        output, disabling link conversion.  (Use ./- to print to a
        file literally named -.)

        Use of -O is not intended to mean simply "use the name file
        instead of the one in the URL;" rather, it is analogous to
        shell redirection: wget -O file http://foo is intended to work
        like wget -O - http://foo > file; file will be truncated
        immediately, and all downloaded content will be written there.

        For this reason, -N (for timestamp-checking) is not supported
        in combination with -O: since file is always newly created, it
        will always have a very new timestamp.  A warning will be
        issued if this combination is used.

        Similarly, using -r or -p with -O may not work as you expect:
        Wget won't just download the first file to file and then
        download the rest to their normal names: all downloaded
        content will be placed in file.  This was disabled in version
        1.11, but has been reinstated (with a warning) in 1.11.2, as
        there are some cases where this behavior can actually have
        some use.

        Note that a combination with -k is only permitted when
        downloading a single document, as in that case it will just
        convert all relative URIs to external ones; -k makes no sense
        for multiple URIs when they're all being downloaded to a
        single file.
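
        As a sketch of the redirection analogy (URL hypothetical), the
        following fetches a page quietly to standard output and pipes
        it onward:

            wget -q -O - http://example.com/data.csv | head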

    -nc
    --no-clobber
        If a file is downloaded more than once in the same directory,
        Wget's behavior depends on a few options, including -nc.  In
        certain cases, the local file will be clobbered, or
        overwritten, upon repeated download.  In other cases it will
        be preserved.

        When running Wget without -N, -nc, -r, or -p, downloading the
        same file in the same directory will result in the original
        copy of file being preserved and the second copy being named
        file.1.  If that file is downloaded yet again, the third copy
        will be named file.2, and so on.  When -nc is specified, this
        behavior is suppressed, and Wget will refuse to download newer
        copies of file.  Therefore, "no-clobber" is actually a
        misnomer in this mode---it's not clobbering that's prevented
        (as the numeric suffixes were already preventing clobbering),
        but rather the multiple version saving that's prevented.

        When running Wget with -r or -p, but without -N or -nc,
        re-downloading a file will result in the new copy simply
        overwriting the old.  Adding -nc will prevent this behavior,
        instead causing the original version to be preserved and any
        newer copies on the server to be ignored.

        When running Wget with -N, with or without -r or -p, the
        decision as to whether or not to download a newer copy of a
        file depends on the local and remote timestamp and size of the
        file.  -nc may not be specified at the same time as -N.

        Note that when -nc is specified, files with the suffixes .html
        or .htm will be loaded from the local disk and parsed as if
        they had been retrieved from the Web.

    -c
    --continue
        Continue getting a partially-downloaded file.  This is useful
        when you want to finish up a download started by a previous
        instance of Wget, or by another program.  For instance:

            wget -c ftp://sunsite.doc.ic.ac.uk/ls-lR.Z

        If there is a file named ls-lR.Z in the current directory,
        Wget will assume that it is the first portion of the remote
        file, and will ask the server to continue the retrieval from
        an offset equal to the length of the local file.

        Note that you don't need to specify this option if you just
        want the current invocation of Wget to retry downloading a
        file should the connection be lost midway through.  This is
        the default behavior.  -c only affects resumption of downloads
        started prior to this invocation of Wget, and whose local
        files are still sitting around.

        Without -c, the previous example would just download the
        remote file to ls-lR.Z.1, leaving the truncated ls-lR.Z file
        alone.

        Beginning with Wget 1.7, if you use -c on a non-empty file,
        and it turns out that the server does not support continued
        downloading, Wget will refuse to start the download from
        scratch, which would effectively ruin existing contents.  If
        you really want the download to start from scratch, remove the
        file.

        Also beginning with Wget 1.7, if you use -c on a file which is
        of equal size as the one on the server, Wget will refuse to
        download the file and print an explanatory message.  The same
        happens when the file is smaller on the server than locally
        (presumably because it was changed on the server since your
        last download attempt)---because "continuing" is not
        meaningful, no download occurs.

        On the other side of the coin, while using -c, any file that's
        bigger on the server than locally will be considered an
        incomplete download and only
        (length(remote) - length(local)) bytes will be downloaded and
        tacked onto the end of the local file.  This behavior can be
        desirable in certain cases---for instance, you can use wget -c
        to download just the new portion that's been appended to a
        data collection or log file.

        However, if the file is bigger on the server because it's been
        changed, as opposed to just appended to, you'll end up with a
        garbled file.  Wget has no way of verifying that the local
        file is really a valid prefix of the remote file.  You need to
        be especially careful of this when using -c in conjunction
        with -r, since every file will be considered as an "incomplete
        download" candidate.

        Another instance where you'll get a garbled file if you try to
        use -c is if you have a lame HTTP proxy that inserts a
        "transfer interrupted" string into the local file.  In the
        future a "rollback" option may be added to deal with this
        case.

        Note that -c only works with FTP servers and with HTTP servers
        that support the "Range" header.

    --progress=type
        Select the type of the progress indicator you wish to use.
        Legal indicators are "dot" and "bar".

        The "bar" indicator is used by default.  It draws an ASCII
        progress bar graphic (a.k.a. "thermometer" display) indicating
        the status of retrieval.  If the output is not a TTY, the
        "dot" bar will be used by default.

        Use --progress=dot to switch to the "dot" display.  It traces
        the retrieval by printing dots on the screen, each dot
        representing a fixed amount of downloaded data.

        When using the dotted retrieval, you may also set the style by
        specifying the type as dot:style.  Different styles assign
        different meaning to one dot.  With the "default" style each
        dot represents 1K, there are ten dots in a cluster and 50 dots
        in a line.  The "binary" style has a more "computer"-like
        orientation---8K dots, 16-dot clusters and 48 dots per line
        (which makes for 384K lines).  The "mega" style is suitable
        for downloading very large files---each dot represents 64K
        retrieved, there are eight dots in a cluster, and 48 dots on
        each line (so each line contains 3M).

        Note that you can set the default style using the "progress"
        command in .wgetrc.  That setting may be overridden from the
        command line.  The exception is that, when the output is not a
        TTY, the "dot" progress will be favored over "bar".  To force
        the bar output, use --progress=bar:force.
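
        For instance, the "mega" style described above is selected
        like this (URL hypothetical):

            wget --progress=dot:mega http://example.com/large.bin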

    -N
    --timestamping
        Turn on time-stamping.

    -S
    --server-response
        Print the headers sent by HTTP servers and responses sent by
        FTP servers.

    --spider
        When invoked with this option, Wget will behave as a Web
        spider, which means that it will not download the pages, just
        check that they are there.  For example, you can use Wget to
        check your bookmarks:

            wget --spider --force-html -i bookmarks.html

        This feature needs much more work for Wget to get close to the
        functionality of real web spiders.

    -T seconds
    --timeout=seconds
        Set the network timeout to seconds seconds.  This is
        equivalent to specifying --dns-timeout, --connect-timeout, and
        --read-timeout, all at the same time.

        When interacting with the network, Wget can check for timeout
        and abort the operation if it takes too long.  This prevents
        anomalies like hanging reads and infinite connects.  The only
        timeout enabled by default is a 900-second read timeout.
        Setting a timeout to 0 disables it altogether.  Unless you
        know what you are doing, it is best not to change the default
        timeout settings.

        All timeout-related options accept decimal values, as well as
        subsecond values.  For example, 0.1 seconds is a legal (though
        unwise) choice of timeout.  Subsecond timeouts are useful for
        checking server response times or for testing network latency.

    --dns-timeout=seconds
        Set the DNS lookup timeout to seconds seconds.  DNS lookups
        that don't complete within the specified time will fail.  By
        default, there is no timeout on DNS lookups, other than that
        implemented by system libraries.

    --connect-timeout=seconds
        Set the connect timeout to seconds seconds.  TCP connections
        that take longer to establish will be aborted.  By default,
        there is no connect timeout, other than that implemented by
        system libraries.

    --read-timeout=seconds
        Set the read (and write) timeout to seconds seconds.  The
        "time" of this timeout refers to idle time: if, at any point
        in the download, no data is received for more than the
        specified number of seconds, reading fails and the download is
        restarted.  This option does not directly affect the duration
        of the entire download.

        Of course, the remote server may choose to terminate the
        connection sooner than this option requires.  The default read
        timeout is 900 seconds.
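
        To sketch the equivalence with -T, the three timeouts above
        can also be set individually (values illustrative, URL
        hypothetical):

            wget --dns-timeout=5 --connect-timeout=10 \
                 --read-timeout=60 http://example.com/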

    --limit-rate=amount
        Limit the download speed to amount bytes per second.  Amount
        may be expressed in bytes, kilobytes with the k suffix, or
        megabytes with the m suffix.  For example, --limit-rate=20k
        will limit the retrieval rate to 20KB/s.  This is useful when,
        for whatever reason, you don't want Wget to consume the entire
        available bandwidth.

        This option allows the use of decimal numbers, usually in
        conjunction with power suffixes; for example,
        --limit-rate=2.5k is a legal value.

        Note that Wget implements the limiting by sleeping the
        appropriate amount of time after a network read that took less
        time than specified by the rate.  Eventually this strategy
        causes the TCP transfer to slow down to approximately the
        specified rate.  However, it may take some time for this
        balance to be achieved, so don't be surprised if limiting the
        rate doesn't work well with very small files.
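
        For example, capping a recursive retrieval at roughly 200KB/s
        (URL hypothetical):

            wget --limit-rate=200k -r http://example.com/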

    -w seconds
    --wait=seconds
        Wait the specified number of seconds between the retrievals.
        Use of this option is recommended, as it lightens the server
        load by making the requests less frequent.  Instead of in
        seconds, the time can be specified in minutes using the "m"
        suffix, in hours using the "h" suffix, or in days using the
        "d" suffix.

        Specifying a large value for this option is useful if the
        network or the destination host is down, so that Wget can wait
        long enough to reasonably expect the network error to be fixed
        before the retry.  The waiting interval specified by this
        function is influenced by --random-wait, described below.

    --waitretry=seconds
        If you don't want Wget to wait between every retrieval, but
        only between retries of failed downloads, you can use this
        option.  Wget will use linear backoff, waiting 1 second after
        the first failure on a given file, then waiting 2 seconds
        after the second failure on that file, up to the maximum
        number of seconds you specify.  Therefore, a value of 10 will
        actually make Wget wait up to (1 + 2 + ... + 10) = 55 seconds
        per file.

        Note that this option is turned on by default in the global
        wgetrc file.

    --random-wait
        Some web sites may perform log analysis to identify retrieval
        programs such as Wget by looking for statistically significant
        similarities in the time between requests.  This option causes
        the time between requests to vary between 0.5 and 1.5 * wait
        seconds, where wait was specified using the --wait option, in
        order to mask Wget's presence from such analysis.

        A 2001 article in a publication devoted to development on a
        popular consumer platform provided code to perform this
        analysis on the fly.  Its author suggested blocking at the
        class C address level to ensure automated retrieval programs
        were blocked despite changing DHCP-supplied addresses.

        The --random-wait option was inspired by this ill-advised
        recommendation to block many unrelated users from a web site
        due to the actions of one.
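
        Putting the wait options together, a polite recursive
        retrieval might look like this (URL hypothetical):

            wget -r -w 1 --random-wait --waitretry=10 http://example.com/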

    --no-proxy
        Don't use proxies, even if the appropriate *_proxy environment
        variable is defined.

    -Q quota
    --quota=quota
        Specify download quota for automatic retrievals.  The value
        can be specified in bytes (default), kilobytes (with k
        suffix), or megabytes (with m suffix).

        Note that quota will never affect downloading a single file.
        So if you specify wget -Q10k ftp://wuarchive.wustl.edu/ls-lR.gz,
        all of the ls-lR.gz will be downloaded.  The same goes even
        when several URLs are specified on the command line.  However,
        quota is respected when retrieving either recursively, or from
        an input file.  Thus you may safely type wget -Q2m -i
        sites---download will be aborted when the quota is exceeded.

        Setting quota to 0 or to inf unlimits the download quota.

    --no-dns-cache
        Turn off caching of DNS lookups.  Normally, Wget remembers the
        IP addresses it looked up from DNS so it doesn't have to
        repeatedly contact the DNS server for the same (typically
        small) set of hosts it retrieves from.  This cache exists in
        memory only; a new Wget run will contact DNS again.

        However, it has been reported that in some situations it is
        not desirable to cache host names, even for the duration of a
        short-running application like Wget.  With this option Wget
        issues a new DNS lookup (more precisely, a new call to
        "gethostbyname" or "getaddrinfo") each time it makes a new
        connection.  Please note that this option will not affect
        caching that might be performed by the resolving library or by
        an external caching layer, such as NSCD.

        If you don't understand exactly what this option does, you
        probably won't need it.

    --restrict-file-names=mode
        Change which characters found in remote URLs may show up in
        local file names generated from those URLs.  Characters that
        are restricted by this option are escaped, i.e. replaced with
        %HH, where HH is the hexadecimal number that corresponds to
        the restricted character.

        By default, Wget escapes the characters that are not valid as
        part of file names on your operating system, as well as
        control characters that are typically unprintable.  This
        option is useful for changing these defaults, either because
        you are downloading to a non-native partition, or because you
        want to disable escaping of the control characters.

        When mode is set to "unix", Wget escapes the character / and
        the control characters in the ranges 0--31 and 128--159.  This
        is the default on Unix-like OSes.

        When mode is set to "windows", Wget escapes the characters \,
        |, /, :, ?, ", *, <, >, and the control characters in the
        ranges 0--31 and 128--159.  In addition to this, Wget in
        Windows mode uses + instead of : to separate host and port in
        local file names, and uses @ instead of ? to separate the
        query portion of the file name from the rest.  Therefore, a
        URL that would be saved as
        www.xemacs.org:4300/search.pl?input=blah in Unix mode would be
        saved as www.xemacs.org+4300/search.pl@input=blah in Windows
        mode.  This mode is the default on Windows.

        If you append ,nocontrol to the mode, as in unix,nocontrol,
        escaping of the control characters is also switched off.  You
        can use --restrict-file-names=nocontrol to turn off escaping
        of control characters without affecting the choice of the OS
        to use as file name restriction mode.
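
        For example (URL hypothetical), forcing Windows-style escaping
        while keeping control characters unescaped:

            wget --restrict-file-names=windows,nocontrol http://example.com/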

    -4
    --inet4-only
    -6
    --inet6-only
        Force connecting to IPv4 or IPv6 addresses.  With
        --inet4-only or -4, Wget will only connect to IPv4 hosts,
        ignoring AAAA records in DNS, and refusing to connect to IPv6
        addresses specified in URLs.  Conversely, with --inet6-only or
        -6, Wget will only connect to IPv6 hosts and ignore A records
        and IPv4 addresses.

        Neither option should normally be needed.  By default, an
        IPv6-aware Wget will use the address family specified by the
        host's DNS record.  If the DNS responds with both IPv4 and
        IPv6 addresses, Wget will try them in sequence until it finds
        one it can connect to.  (Also see the --prefer-family option
        described below.)

        These options can be used to deliberately force the use of
        IPv4 or IPv6 address families on dual-family systems, usually
        to aid debugging or to deal with broken network configuration.
        Only one of --inet6-only and --inet4-only may be specified at
        the same time.  Neither option is available in Wget compiled
        without IPv6 support.

    --prefer-family=IPv4/IPv6/none
        When given a choice of several addresses, connect to the
        addresses with the specified address family first.  IPv4
        addresses are preferred by default.

        This avoids spurious errors and connect attempts when
        accessing hosts that resolve to both IPv6 and IPv4 addresses
        from IPv4 networks.  For example, www.kame.net resolves to
        2001:200:0:8002:203:47ff:fea5:3085 and to 203.178.141.194.
        When the preferred family is "IPv4", the IPv4 address is used
        first; when the preferred family is "IPv6", the IPv6 address
        is used first; if the specified value is "none", the address
        order returned by DNS is used without change.

        Unlike -4 and -6, this option doesn't inhibit access to any
        address family, it only changes the order in which the
        addresses are accessed.  Also note that the reordering
        performed by this option is stable---it doesn't affect the
        order of addresses of the same family.  That is, the relative
        order of all IPv4 addresses and of all IPv6 addresses remains
        intact in all cases.

    --retry-connrefused
        Consider "connection refused" a transient error and try again.
        Normally Wget gives up on a URL when it is unable to connect
        to the site because failure to connect is taken as a sign that
        the server is not running at all and that retries would not
        help.  This option is for mirroring unreliable sites whose
        servers tend to disappear for short periods of time.

    --user=user
    --password=password
        Specify the username user and password password for both FTP
        and HTTP file retrieval.  These parameters can be overridden
        using the --ftp-user and --ftp-password options for FTP
        connections and the --http-user and --http-password options
        for HTTP connections.

  Directory Options

    -nd
    --no-directories
        Do not create a hierarchy of directories when retrieving
        recursively.  With this option turned on, all files will get
        saved to the current directory, without clobbering (if a name
        shows up more than once, the filenames will get extensions
        .n).

    -x
    --force-directories
        The opposite of -nd---create a hierarchy of directories, even
        if one would not have been created otherwise.  E.g.
        wget -x http://fly.srk.fer.hr/robots.txt will save the
        downloaded file to fly.srk.fer.hr/robots.txt.

    -nH
    --no-host-directories
        Disable generation of host-prefixed directories.  By default,
        invoking Wget with -r http://fly.srk.fer.hr/ will create a
        structure of directories beginning with fly.srk.fer.hr/.  This
        option disables such behavior.

    --protocol-directories
        Use the protocol name as a directory component of local file
        names.  For example, with this option, wget -r http://host
        will save to http/host/... rather than just to host/....

    --cut-dirs=number
        Ignore number directory components.  This is useful for
        getting fine-grained control over the directory where
        recursive retrieval will be saved.

        Take, for example, the directory at
        ftp://ftp.xemacs.org/pub/xemacs/.  If you retrieve it with -r,
        it will be saved locally under ftp.xemacs.org/pub/xemacs/.
        While the -nH option can remove the ftp.xemacs.org/ part, you
        are still stuck with pub/xemacs.  This is where --cut-dirs
        comes in handy; it makes Wget not "see" number remote
        directory components.  Here are several examples of how the
        --cut-dirs option works.

            No options        -> ftp.xemacs.org/pub/xemacs/
            -nH               -> pub/xemacs/
            -nH --cut-dirs=1  -> xemacs/
            -nH --cut-dirs=2  -> .

            --cut-dirs=1      -> ftp.xemacs.org/xemacs/
            ...

        If you just want to get rid of the directory structure, this
        option is similar to a combination of -nd and -P.  However,
        unlike -nd, --cut-dirs does not lose subdirectories---for
        instance, with -nH --cut-dirs=1, a beta/ subdirectory will be
        placed in xemacs/beta, as one would expect.

    -P prefix
    --directory-prefix=prefix
        Set directory prefix to prefix.  The directory prefix is the
        directory where all other files and subdirectories will be
        saved to, i.e. the top of the retrieval tree.  The default is
        . (the current directory).
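
        Combining the directory options above, a sketch that saves a
        mirror under downloads/ with no host directory (URL
        hypothetical):

            wget -r -nH -P downloads/ http://example.com/docs/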

  HTTP Options

    -E
    --html-extension
        If a file of type application/xhtml+xml or text/html is
        downloaded and the URL does not end with the regexp
        \.[Hh][Tt][Mm][Ll]?, this option will cause the suffix .html
        to be appended to the local filename.  This is useful, for
        instance, when you're mirroring a remote site that uses .asp
        pages, but you want the mirrored pages to be viewable on your
        stock Apache server.  Another good use for this is when you're
        downloading CGI-generated materials.  A URL like
        http://site.com/article.cgi?25 will be saved as
        article.cgi?25.html.

        Note that filenames changed in this way will be re-downloaded
        every time you re-mirror a site, because Wget can't tell that
        the local X.html file corresponds to remote URL X (since it
        doesn't yet know that the URL produces output of type
        text/html or application/xhtml+xml).  To prevent this
        re-downloading, you must use -k and -K so that the original
        version of the file will be saved as X.orig.

    --http-user=user
    --http-password=password
        Specify the username user and password password on an HTTP
        server.  According to the type of the challenge, Wget will
        encode them using either the "basic" (insecure), the "digest",
        or the Windows "NTLM" authentication scheme.

        Another way to specify username and password is in the URL
        itself.  Either method reveals your password to anyone who
        bothers to run "ps".  To prevent the passwords from being
        seen, store them in .wgetrc or .netrc, and make sure to
        protect those files from other users with "chmod".  If the
        passwords are really important, do not leave them lying in
        those files either---edit the files and delete them after Wget
        has started the download.
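
        As a sketch of the .netrc approach mentioned above (host and
        credentials hypothetical):

            # ~/.netrc -- protect it with: chmod 600 ~/.netrc
            machine example.com login alice password s3cret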

    --no-cache
        Disable server-side cache.  In this case, Wget will send the
        remote server an appropriate directive (Pragma: no-cache) to
        get the file from the remote service, rather than returning
        the cached version.  This is especially useful for retrieving
        and flushing out-of-date documents on proxy servers.

        Caching is allowed by default.

    --no-cookies
        Disable the use of cookies.  Cookies are a mechanism for
        maintaining server-side state.  The server sends the client a
        cookie using the "Set-Cookie" header, and the client responds
        with the same cookie upon further requests.  Since cookies
        allow the server owners to keep track of visitors and for
        sites to exchange this information, some consider them a
        breach of privacy.  The default is to use cookies; however,
        storing cookies is not on by default.

    --load-cookies file
        Load cookies from file before the first HTTP retrieval.  file
        is a textual file in the format originally used by Netscape's
        cookies.txt file.

        You will typically use this option when mirroring sites that
        require that you be logged in to access some or all of their
        content.  The login process typically works by the web server
        issuing an HTTP cookie upon receiving and verifying your
        credentials.  The cookie is then resent by the browser when
        accessing that part of the site, and so proves your identity.

        Mirroring such a site requires Wget to send the same cookies
        your browser sends when communicating with the site.  This is
        achieved by --load-cookies---simply point Wget to the location
        of the cookies.txt file, and it will send the same cookies
        your browser would send in the same situation.  Different
        browsers keep textual cookie files in different locations:

        Netscape 4.x.
            The cookies are in ~/.netscape/cookies.txt.

        Mozilla and Netscape 6.x.
            Mozilla's cookie file is also named cookies.txt, located
            somewhere under ~/.mozilla, in the directory of your
            profile.  The full path usually ends up looking somewhat
            like ~/.mozilla/default/some-weird-string/cookies.txt.

        Internet Explorer.
            You can produce a cookie file Wget can use by using the
            File menu, Import and Export, Export Cookies.  This has
            been tested with Internet Explorer 5; it is not guaranteed
            to work with earlier versions.

        Other browsers.
            If you are using a different browser to create your
            cookies, --load-cookies will only work if you can locate
            or produce a cookie file in the Netscape format that Wget
            expects.

        If you cannot use --load-cookies, there might still be an
        alternative.  If your browser supports a "cookie manager", you
        can use it to view the cookies used when accessing the site
        you're mirroring.  Write down the name and value of the
        cookie, and manually instruct Wget to send those cookies,
        bypassing the "official" cookie support:

            wget --no-cookies --header "Cookie: <name>=<value>"

    --save-cookies file
        Save cookies to file before exiting.  This will not save
        cookies that have expired or that have no expiry time
        (so-called "session cookies"), but also see
        --keep-session-cookies.

    --keep-session-cookies
        When specified, causes --save-cookies to also save session
        cookies.  Session cookies are normally not saved because they
        are meant to be kept in memory and forgotten when you exit the
        browser.  Saving them is useful on sites that require you to
        log in or to visit the home page before you can access some
        pages.  With this option, multiple Wget runs are considered a
        single browser session as far as the site is concerned.

        Since the cookie file format does not normally carry session
        cookies, Wget marks them with an expiry timestamp of 0.
        Wget's --load-cookies recognizes those as session cookies, but
        it might confuse other browsers.  Also note that cookies so
        loaded will be treated as other session cookies, which means
        that if you want --save-cookies to preserve them again, you
        must use --keep-session-cookies again.

    --ignore-length
        Unfortunately, some HTTP servers (CGI programs, to be more
        precise) send out bogus "Content-Length" headers, which makes
        Wget go wild, as it thinks not all the document was retrieved.
        You can spot this syndrome if Wget retries getting the same
        document again and again, each time claiming that the
        (otherwise normal) connection has closed on the very same
        byte.

        With this option, Wget will ignore the "Content-Length"
        header---as if it never existed.

    --header=header-line
        Send header-line along with the rest of the headers in each
        HTTP request.  The supplied header is sent as-is, which means
        it must contain name and value separated by a colon, and must
        not contain newlines.

        You may define more than one additional header by specifying
        --header more than once.

            wget --header='Accept-Charset: iso-8859-2' \
                 --header='Accept-Language: hr' \
                 http://fly.srk.fer.hr/

        Specification of an empty string as the header value will
        clear all previous user-defined headers.

        As of Wget 1.10, this option can be used to override headers
        otherwise generated automatically.  This example instructs
        Wget to connect to localhost, but to specify foo.bar in the
        "Host" header:

            wget --header="Host: foo.bar" http://localhost/

        In versions of Wget prior to 1.10 such use of --header caused
        sending of duplicate headers.

    --max-redirect=number
        Specifies the maximum number of redirections to follow for a
        resource.  The default is 20, which is usually far more than
        necessary.  However, on those occasions where you want to
        allow more (or fewer), this is the option to use.

    --proxy-user=user
    --proxy-password=password
        Specify the username user and password password for
        authentication on a proxy server.  Wget will encode them using
        the "basic" authentication scheme.

        Security considerations similar to those with --http-password
        pertain here as well.

    --referer=url
        Include `Referer: url' header in HTTP request.  Useful for
        retrieving documents with server-side processing that assume
        they are always being retrieved by interactive web browsers
        and only come out properly when Referer is set to one of the
        pages that point to them.

    --save-headers
        Save the headers sent by the HTTP server to the file,
        preceding the actual contents, with an empty line as the
        separator.

    -U agent-string
    --user-agent=agent-string
        Identify as agent-string to the HTTP server.

        The HTTP protocol allows the clients to identify themselves
        using a "User-Agent" header field.  This enables
        distinguishing the WWW software, usually for statistical
        purposes or for tracing of protocol violations.  Wget normally
        identifies as Wget/version, version being the current version
        number of Wget.

        However, some sites have been known to impose the policy of
        tailoring the output according to the "User-Agent"-supplied
        information.  While this is not such a bad idea in theory, it
        has been abused by servers denying information to clients
        other than (historically) Netscape or, more frequently,
        Microsoft Internet Explorer.  This option allows you to change
        the "User-Agent" line issued by Wget.  Use of this option is
        discouraged, unless you really know what you are doing.

        Specifying an empty user agent with --user-agent=""
        instructs Wget not to send the "User-Agent" header in HTTP
        requests.
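
        An illustrative override (the agent string is arbitrary, the
        URL hypothetical):

            wget --user-agent="Mozilla/5.0 (compatible)" http://example.com/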

    --post-data=string
    --post-file=file
        Use POST as the method for all HTTP requests and send the
        specified data in the request body.  --post-data sends string
        as data, whereas --post-file sends the contents of file.
        Other than that, they work in exactly the same way.

        Please be aware that Wget needs to know the size of the POST
        data in advance.  Therefore the argument to --post-file must
        be a regular file; specifying a FIFO or something like
        /dev/stdin won't work.  It's not quite clear how to work
        around this limitation inherent in HTTP/1.0.  Although
        HTTP/1.1 introduces chunked transfer that doesn't require
        knowing the request length in advance, a client can't use
        chunked unless it knows it's talking to an HTTP/1.1 server.
        And it can't know that until it receives a response, which in
        turn requires the request to have been completed---a
        chicken-and-egg problem.

        Note: if Wget is redirected after the POST request is
        completed, it will not send the POST data to the redirected
        URL.  This is because URLs that process POST often respond
        with a redirection to a regular page, which does not desire or
        accept POST.  It is not completely clear that this behavior is
        optimal; if it doesn't work out, it might be changed in the
        future.

        This example shows how to log in to a server using POST and
        then proceed to download the desired pages, presumably only
        accessible to authorized users:

            # Log in to the server.  This can be done only once.
            wget --save-cookies cookies.txt \
                 --post-data 'user=foo&password=bar' \
                 http://server.com/auth.php

            # Now grab the page or pages we care about.
            wget --load-cookies cookies.txt \
                 -p http://server.com/interesting/article.php

        If the server is using session cookies to track user
        authentication, the above will not work because --save-cookies
        will not save them (and neither will browsers) and the
        cookies.txt file will be empty.  In that case use
        --keep-session-cookies along with --save-cookies to force
        saving of session cookies.

    --content-disposition
        If this is set to on, experimental (not fully-functional)
        support for "Content-Disposition" headers is enabled.  This
        can currently result in extra round-trips to the server for a
        "HEAD" request, and is known to suffer from a few bugs, which
        is why it is not currently enabled by default.

        This option is useful for some file-downloading CGI programs
        that use "Content-Disposition" headers to describe what the
        name of a downloaded file should be.

    --auth-no-challenge
        If this option is given, Wget will send Basic HTTP
        authentication information (plaintext username and password)
        for all requests, just like Wget 1.10.2 and prior did by
        default.

        Use of this option is not recommended, and is intended only to
        support some few obscure servers, which never send HTTP
        authentication challenges, but accept unsolicited auth info,
        say, in addition to form-based authentication.

  HTTPS (SSL/TLS) Options

    To support encrypted HTTP (HTTPS) downloads, Wget must be compiled
    with an external SSL library, currently OpenSSL.  If Wget is
    compiled without SSL support, none of these options are available.

    --secure-protocol=protocol
        Choose the secure protocol to be used.  Legal values are auto,
        SSLv2, SSLv3, and TLSv1.  If auto is used, the SSL library is
        given the liberty of choosing the appropriate protocol
        automatically, which is achieved by sending an SSLv2 greeting
        and announcing support for SSLv3 and TLSv1.  This is the
        default.

        Specifying SSLv2, SSLv3, or TLSv1 forces the use of the
        corresponding protocol.  This is useful when talking to old
        and buggy SSL server implementations that make it hard for
        OpenSSL to choose the correct protocol version.  Fortunately,
        such servers are quite rare.

    --no-check-certificate
        Don't check the server certificate against the available
        certificate authorities.  Also don't require the URL host name
        to match the common name presented by the certificate.

        As of Wget 1.10, the default is to verify the server's
        certificate against the recognized certificate authorities,
        breaking the SSL handshake and aborting the download if the
        verification fails.  Although this provides more secure
        downloads, it does break interoperability with some sites that
        worked with previous Wget versions, particularly those using
        self-signed, expired, or otherwise invalid certificates.  This
        option forces an "insecure" mode of operation that turns the
        certificate verification errors into warnings and allows you
        to proceed.

        If you encounter "certificate verification" errors or ones
        saying that "common name doesn't match requested host name",
        you can use this option to bypass the verification and proceed
        with the download.  Only use this option if you are otherwise
        convinced of the site's authenticity, or if you really don't
        care about the validity of its certificate.  It is almost
        always a bad idea not to check the certificates when
        transmitting confidential or important data.
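
        For example, to fetch from an internal host whose self-signed
        certificate you already trust (URL hypothetical):

            wget --no-check-certificate https://internal.example.com/report.pdf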

    --certificate=file
        Use the client certificate stored in file.  This is needed for
        servers that are configured to require certificates from the
        clients that connect to them.  Normally a certificate is not
        required and this switch is optional.

    --certificate-type=type
        Specify the type of the client certificate.  Legal values are
        PEM (assumed by default) and DER, also known as ASN1.

    --private-key=file
        Read the private key from file.  This allows you to provide
        the private key in a file separate from the certificate.

    --private-key-type=type
        Specify the type of the private key.  Accepted values are PEM
        (the default) and DER.
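
        A client-certificate sketch combining the two options above
        (file names and URL hypothetical):

            wget --certificate=client.pem --private-key=client.key \
                 https://example.com/secure/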

    --ca-certificate=file
        Use file as the file with the bundle of certificate
        authorities ("CA") to verify the peers.  The certificates must
        be in PEM format.

        Without this option Wget looks for CA certificates at the
        system-specified locations, chosen at OpenSSL installation
        time.

    --ca-directory=directory
        Specifies directory containing CA certificates in PEM format.
        Each file contains one CA certificate, and the file name is
        based on a hash value derived from the certificate.  This is
        achieved by processing a certificate directory with the
        "c_rehash" utility supplied with OpenSSL.  Using
        --ca-directory is more efficient than --ca-certificate when
        many certificates are installed because it allows Wget to
        fetch certificates on demand.

        Without this option Wget looks for CA certificates at the
        system-specified locations, chosen at OpenSSL installation
        time.

    --random-file=file
        Use file as the source of random data for seeding the
        pseudo-random number generator on systems without /dev/random.

        On such systems the SSL library needs an external source of
        randomness to initialize.  Randomness may be provided by EGD
        (see --egd-file below) or read from an external source
        specified by the user.  If this option is not specified, Wget
        looks for random data in $RANDFILE or, if that is unset, in
        $HOME/.rnd.  If none of those are available, it is likely that
        SSL encryption will not be usable.

        If you're getting the "Could not seed OpenSSL PRNG; disabling
        SSL." error, you should provide random data using some of the
        methods described above.

    --egd-file=file
        Use file as the EGD socket.  EGD stands for Entropy Gathering
        Daemon, a user-space program that collects data from various
        unpredictable system sources and makes it available to other
        programs that might need it.  Encryption software, such as the
        SSL library, needs sources of non-repeating randomness to seed
        the random number generator used to produce cryptographically
        strong keys.

        OpenSSL allows the user to specify his own source of entropy
        using the "RAND_FILE" environment variable.  If this variable
        is unset, or if the specified file does not produce enough
        randomness, OpenSSL will read random data from the EGD socket
        specified using this option.

        If this option is not specified (and the equivalent startup
        command is not used), EGD is never contacted.  EGD is not
        needed on modern Unix systems that support /dev/random.

  FTP Options

    --ftp-user=user
    --ftp-password=password
        Specify the username user and password password on an FTP
        server.  Without this, or the corresponding startup option,
        the password defaults to -wget@, normally used for anonymous
        FTP.

        Another way to specify username and password is in the URL
        itself.  Either method reveals your password to anyone who
        bothers to run "ps".  To prevent the passwords from being
        seen, store them in .wgetrc or .netrc, and make sure to
        protect those files from other users with "chmod".  If the
        passwords are really important, do not leave them lying in
        those files either---edit the files and delete them after Wget
        has started the download.

    --no-remove-listing
        Don't remove the temporary .listing files generated by FTP
        retrievals.  Normally, these files contain the raw directory
        listings received from FTP servers.  Not removing them can be
        useful for debugging purposes, or when you want to be able to
        easily check on the contents of remote server directories
        (e.g. to verify that a mirror you're running is complete).

        Note that even though Wget writes to a known filename for this
        file, this is not a security hole in the scenario of a user
        making .listing a symbolic link to /etc/passwd or something
        and asking "root" to run Wget in his or her directory.
        Depending on the options used, either Wget will refuse to
        write to .listing, making the globbing/recursion/time-stamping
        operation fail, or the symbolic link will be deleted and
        replaced with the actual .listing file, or the listing will be
        written to a .listing.number file.

        Even though this situation isn't a problem, "root" should
        never run Wget in a non-trusted user's directory.  A user
        could do something as simple as linking index.html to
        /etc/passwd and asking "root" to run Wget with -N or -r so the
        file will be overwritten.
       --no-glob
           Turn off FTP globbing.  Globbing refers to the use of
           shell-like special characters (wildcards), like *, ?,
           [ and ] to retrieve more than one file from the same
           directory at once, like:

                   wget ftp://gnjilux.srk.fer.hr/*.msg

           By default, globbing will be turned on if the URL
           contains a globbing character.  This option may be
           used to turn globbing on or off permanently.

           You may have to quote the URL to protect it from
           being expanded by your shell.  Globbing makes Wget
           look for a directory listing, which is
           system-specific.  This is why it currently works only
           with Unix FTP servers (and the ones emulating Unix
           "ls" output).

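           For instance, quoting keeps the local shell from
           expanding the wildcard before Wget sees it:

                   wget "ftp://gnjilux.srk.fer.hr/*.msg"
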
       --no-passive-ftp
           Disable the use of the passive FTP transfer mode.
           Passive FTP mandates that the client connect to the
           server to establish the data connection rather than
           the other way around.

           If the machine is connected to the Internet directly,
           both passive and active FTP should work equally well.
           Behind most firewall and NAT configurations passive
           FTP has a better chance of working.  However, in some
           rare firewall configurations, active FTP actually
           works when passive FTP doesn't.  If you suspect this
           to be the case, use this option, or set
           "passive_ftp=off" in your init file.

       --retr-symlinks
           Usually, when retrieving FTP directories recursively
           and a symbolic link is encountered, the linked-to
           file is not downloaded.  Instead, a matching symbolic
           link is created on the local filesystem.  The
           pointed-to file will not be downloaded unless this
           recursive retrieval would have encountered it
           separately and downloaded it anyway.

           When --retr-symlinks is specified, however, symbolic
           links are traversed and the pointed-to files are
           retrieved.  At this time, this option does not cause
           Wget to traverse symlinks to directories and recurse
           through them, but in the future it should be enhanced
           to do this.

           Note that when retrieving a file (not a directory)
           because it was specified on the command-line, rather
           than because it was recursed to, this option has no
           effect.  Symbolic links are always traversed in this
           case.

       --no-http-keep-alive
           Turn off the "keep-alive" feature for HTTP downloads.
           Normally, Wget asks the server to keep the connection
           open so that, when you download more than one
           document from the same server, they get transferred
           over the same TCP connection.  This saves time and at
           the same time reduces the load on the server.

           This option is useful when, for some reason,
           persistent (keep-alive) connections don't work for
           you, for example due to a server bug or due to the
           inability of server-side scripts to cope with the
           connections.

   Recursive Retrieval Options

       -r
       --recursive
           Turn on recursive retrieving.

       -l depth
       --level=depth
           Specify the maximum recursion depth depth.  The
           default maximum depth is 5.

       --delete-after
           This option tells Wget to delete every single file it
           downloads, after having done so.  It is useful for
           pre-fetching popular pages through a proxy, e.g.:

                   wget -r -nd --delete-after http://whatever.com/~popular/page/

           The -r option is to retrieve recursively, and -nd to
           not create directories.

           Note that --delete-after deletes files on the local
           machine.  It does not issue the DELE command to
           remote FTP sites, for instance.  Also note that when
           --delete-after is specified, --convert-links is
           ignored, so .orig files are simply not created in the
           first place.

       -k
       --convert-links
           After the download is complete, convert the links in
           the document to make them suitable for local viewing.
           This affects not only the visible hyperlinks, but any
           part of the document that links to external content,
           such as embedded images, links to style sheets,
           hyperlinks to non-HTML content, etc.

           Each link will be changed in one of two ways:

           *   The links to files that have been downloaded by
               Wget will be changed to refer to the file they
               point to as a relative link.

               Example: if the downloaded file /foo/doc.html
               links to /bar/img.gif, also downloaded, then the
               link in doc.html will be modified to point to
               ../bar/img.gif.  This kind of transformation
               works reliably for arbitrary combinations of
               directories.

           *   The links to files that have not been downloaded
               by Wget will be changed to include host name and
               absolute path of the location they point to.

               Example: if the downloaded file /foo/doc.html
               links to /bar/img.gif (or to ../bar/img.gif),
               then the link in doc.html will be modified to
               point to http://hostname/bar/img.gif.

           Because of this, local browsing works reliably: if a
           linked file was downloaded, the link will refer to
           its local name; if it was not downloaded, the link
           will refer to its full Internet address rather than
           presenting a broken link.  The fact that the former
           links are converted to relative links ensures that
           you can move the downloaded hierarchy to another
           directory.

           Note that only at the end of the download can Wget
           know which links have been downloaded.  Because of
           that, the work done by -k will be performed at the
           end of all the downloads.

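           For example, a sketch of a shallow local copy
           (example.com is a placeholder):

                   wget -r -l 1 -k http://example.com/

           downloads the start page plus everything it links to
           directly, then rewrites the links in the saved pages
           so the copy can be browsed offline.
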
       -K
       --backup-converted
           When converting a file, back up the original version
           with a .orig suffix.  Affects the behavior of -N.

       -m
       --mirror
           Turn on options suitable for mirroring.  This option
           turns on recursion and time-stamping, sets infinite
           recursion depth and keeps FTP directory listings.  It
           is currently equivalent to -r -N -l inf
           --no-remove-listing.

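           In other words, the following two commands are
           currently interchangeable (example.com is a
           placeholder):

                   wget -m http://example.com/
                   wget -r -N -l inf --no-remove-listing http://example.com/
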
       -p
       --page-requisites
           This option causes Wget to download all the files
           that are necessary to properly display a given HTML
           page.  This includes such things as inlined images,
           sounds, and referenced stylesheets.

           Ordinarily, when downloading a single HTML page, any
           requisite documents that may be needed to display it
           properly are not downloaded.  Using -r together with
           -l can help, but since Wget does not ordinarily
           distinguish between external and inlined documents,
           one is generally left with "leaf documents" that are
           missing their requisites.

           For instance, say document 1.html contains an "<IMG>"
           tag referencing 1.gif and an "<A>" tag pointing to
           external document 2.html.  Say that 2.html is similar
           but that its image is 2.gif and it links to 3.html.
           Say this continues up to some arbitrarily high
           number.

           If one executes the command:

                   wget -r -l 2 http://<site>/1.html

           then 1.html, 1.gif, 2.html, 2.gif, and 3.html will be
           downloaded.  As you can see, 3.html is without its
           requisite 3.gif because Wget is simply counting the
           number of hops (up to 2) away from 1.html in order to
           determine where to stop the recursion.  However, with
           this command:

                   wget -r -l 2 -p http://<site>/1.html

           all the above files and 3.html's requisite 3.gif will
           be downloaded.  Similarly,

                   wget -r -l 1 -p http://<site>/1.html

           will cause 1.html, 1.gif, 2.html, and 2.gif to be
           downloaded.  One might think that:

                   wget -r -l 0 -p http://<site>/1.html

           would download just 1.html and 1.gif, but
           unfortunately this is not the case, because -l 0 is
           equivalent to -l inf---that is, infinite recursion.
           To download a single HTML page (or a handful of them,
           all specified on the command-line or in a -i URL
           input file) and its (or their) requisites, simply
           leave off -r and -l:

                   wget -p http://<site>/1.html

           Note that Wget will behave as if -r had been
           specified, but only that single page and its
           requisites will be downloaded.  Links from that page
           to external documents will not be followed.
           Actually, to download a single page and all its
           requisites (even if they exist on separate websites),
           and make sure the lot displays properly locally, this
           author likes to use a few options in addition to -p:

                   wget -E -H -k -K -p http://<site>/<document>

           To finish off this topic, it's worth knowing that
           Wget's idea of an external document link is any URL
           specified in an "<A>" tag, an "<AREA>" tag, or a
           "<LINK>" tag other than "<LINK REL="stylesheet">".

       --strict-comments
           Turn on strict parsing of HTML comments.  The default
           is to terminate comments at the first occurrence of
           -->.

           According to specifications, HTML comments are
           expressed as SGML declarations.  A declaration is
           special markup that begins with <! and ends with >,
           such as <!DOCTYPE ...>, that may contain comments
           between a pair of -- delimiters.  HTML comments are
           "empty declarations", SGML declarations without any
           non-comment text.  Therefore, <!--foo--> is a valid
           comment, and so is <!--one-- --two-->, but
           <!--1--2--> is not.

           On the other hand, most HTML writers don't perceive
           comments as anything other than text delimited with
           <!-- and -->, which is not quite the same.  For
           example, something like <!------------> works as a
           valid comment as long as the number of dashes is a
           multiple of four (!).  If not, the comment
           technically lasts until the next --, which may be at
           the other end of the document.  Because of this, many
           popular browsers completely ignore the specification
           and implement what users have come to expect:
           comments delimited with <!-- and -->.

           Until version 1.9, Wget interpreted comments
           strictly, which resulted in missing links in many web
           pages that displayed fine in browsers, but had the
           misfortune of containing non-compliant comments.
           Beginning with version 1.9, Wget has joined the ranks
           of clients that implement "naive" comment parsing,
           terminating each comment at the first occurrence of
           -->.

           If, for whatever reason, you want strict comment
           parsing, use this option to turn it on.

   Recursive Accept/Reject Options

       -A acclist --accept acclist
       -R rejlist --reject rejlist
           Specify comma-separated lists of file name suffixes
           or patterns to accept or reject.  Note that if any of
           the wildcard characters, *, ?, [ or ], appear in an
           element of acclist or rejlist, it will be treated as
           a pattern, rather than a suffix.

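           For example, a sketch that keeps only images while
           rejecting archives (example.com is a placeholder):

                   wget -r -A "*.jpg,*.png" -R "*.tar.gz" \
                        http://example.com/gallery/

           The quotes prevent the shell from expanding the
           wildcards against local file names.
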
       -D domain-list
       --domains=domain-list
           Set domains to be followed.  domain-list is a
           comma-separated list of domains.  Note that it does
           not turn on -H.

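           Since -D alone does not enable host spanning, a
           cross-host crawl must combine the two, e.g. (the
           domains are placeholders):

                   wget -r -H -D example.com,example.net \
                        http://www.example.com/
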
       --exclude-domains domain-list
           Specify the domains that are not to be followed.

       --follow-ftp
           Follow FTP links from HTML documents.  Without this
           option, Wget will ignore all the FTP links.

       --follow-tags=list
           Wget has an internal table of HTML tag / attribute
           pairs that it considers when looking for linked
           documents during a recursive retrieval.  If a user
           wants only a subset of those tags to be considered,
           however, he or she should specify such tags in a
           comma-separated list with this option.

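           For instance, a sketch that follows only plain
           hyperlinks and image references (example.com is a
           placeholder):

                   wget -r --follow-tags=a,img http://example.com/
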
       --ignore-tags=list
           This is the opposite of the --follow-tags option.  To
           skip certain HTML tags when recursively looking for
           documents to download, specify them in a
           comma-separated list.

           In the past, this option was the best bet for
           downloading a single page and its requisites, using a
           command-line like:

                   wget --ignore-tags=a,area -H -k -K -r http://<site>/<document>

           However, the author of this option came across a page
           with tags like "<LINK REL="home" HREF="/">" and came
           to the realization that specifying tags to ignore was
           not enough.  One can't just tell Wget to ignore
           "<LINK>", because then stylesheets will not be
           downloaded.  Now the best bet for downloading a
           single page and its requisites is the dedicated
           --page-requisites option.

       --ignore-case
           Ignore case when matching files and directories.
           This influences the behavior of -R, -A, -I, and -X
           options, as well as globbing implemented when
           downloading from FTP sites.  For example, with this
           option, -A *.txt will match file1.txt, but also
           file2.TXT, file3.TxT, and so on.

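           A minimal sketch (the host and path are
           placeholders):

                   wget -r --ignore-case -A "*.txt" \
                        ftp://ftp.example.com/docs/
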
       -H
       --span-hosts
           Enable spanning across hosts when doing recursive
           retrieving.

       -L
       --relative
           Follow relative links only.  Useful for retrieving a
           specific home page without any distractions, not even
           those from the same host.

       -I list
       --include-directories=list
           Specify a comma-separated list of directories you
           wish to follow when downloading.  Elements of list
           may contain wildcards.

       -X list
       --exclude-directories=list
           Specify a comma-separated list of directories you
           wish to exclude from download.  Elements of list may
           contain wildcards.

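           For example, a sketch intended to stay inside /pub
           while skipping its temporary subtrees (the host and
           paths are placeholders; verify how -I and -X interact
           on your version):

                   wget -r -I /pub -X "/pub/tmp*" \
                        ftp://ftp.example.com/
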
       -np
       --no-parent
           Do not ever ascend to the parent directory when
           retrieving recursively.  This is a useful option,
           since it guarantees that only the files below a
           certain hierarchy will be downloaded.

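           For example (example.com is a placeholder):

                   wget -r -np http://example.com/docs/manual/

           retrieves pages under /docs/manual/ without ever
           wandering up into /docs/ or the site root.
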
FILES
       /usr/local/etc/wgetrc
           Default location of the global startup file.

       .wgetrc
           User startup file.

BUGS
       You are welcome to submit bug reports via the GNU Wget
       bug tracker (see <http://wget.addictivecode.org/BugTracker>).

       Before actually submitting a bug report, please try to
       follow a few simple guidelines.

       1.  Please try to ascertain that the behavior you see
           really is a bug.  If Wget crashes, it's a bug.  If
           Wget does not behave as documented, it's a bug.  If
           things work strangely, but you are not sure about the
           way they are supposed to work, it might well be a
           bug, but you might want to double-check the
           documentation and the mailing lists.

       2.  Try to repeat the bug in as simple circumstances as
           possible.  E.g. if Wget crashes while downloading
           wget -rl0 -kKE -t5 --no-proxy http://yoyodyne.com -o
           /tmp/log, you should try to see if the crash is
           repeatable, and if it will occur with a simpler set
           of options.  You might even try to start the download
           at the page where the crash occurred to see if that
           page somehow triggered the crash.

           Also, while I will probably be interested to know the
           contents of your .wgetrc file, just dumping it into
           the debug message is probably a bad idea.  Instead,
           you should first try to see if the bug repeats with
           .wgetrc moved out of the way.  Only if it turns out
           that .wgetrc settings affect the bug, mail me the
           relevant parts of the file.

       3.  Please start Wget with the -d option and send us the
           resulting output (or relevant parts thereof).  If
           Wget was compiled without debug support, recompile
           it---it is much easier to trace bugs with debug
           support on.

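           For example, a sketch that captures the debug
           transcript in a file (example.com is a placeholder):

                   wget -d -o wget-debug.log http://example.com/
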
           Note: please make sure to remove any potentially
           sensitive information from the debug log before
           sending it to the bug address.  The "-d" won't go out
           of its way to collect sensitive information, but the
           log will contain a fairly complete transcript of
           Wget's communication with the server, which may
           include passwords and pieces of downloaded data.
           Since the bug address is publicly archived, you may
           assume that all bug reports are visible to the
           public.

       4.  If Wget has crashed, try to run it in a debugger,
           e.g. "gdb `which wget` core" and type "where" to get
           the backtrace.  This may not work if the system
           administrator has disabled core files, but it is safe
           to try.

SEE ALSO
       This is not the complete manual for GNU Wget.  For more
       complete information, including more detailed
       explanations of some of the options, and a number of
       commands available for use with .wgetrc files and the -e
       option, see the GNU Info entry for wget.

AUTHOR
       Originally written by Hrvoje Niksic <hniksic@xemacs.org>.
       Currently maintained by Micah Cowan <micah@cowan.name>.

COPYRIGHT
       Copyright (c) 1996, 1997, 1998, 1999, 2000, 2001, 2002,
       2003, 2004, 2005, 2006, 2007, 2008 Free Software
       Foundation, Inc.

       Permission is granted to copy, distribute and/or modify
       this document under the terms of the GNU Free
       Documentation License, Version 1.2 or any later version
       published by the Free Software Foundation; with no
       Invariant Sections, no Front-Cover Texts, and no
       Back-Cover Texts.  A copy of the license is included in
       the section entitled "GNU Free Documentation License".



GNU Wget 1.11.4                 2008-06-29                    WGET(1)