Wget is a free software package for retrieving files over HTTP, HTTPS, and FTP, the most widely used Internet protocols. Its name comes from World Wide Web + get. wget has many features that make retrieving large files or mirroring entire web or FTP sites easy, including:
- Can resume aborted downloads, using REST and RANGE;
- Can use filename wild cards and recursively mirror directories;
- NLS-based message files for many different languages;
- Optionally converts absolute links in downloaded documents to relative, so that downloaded documents may link to each other locally;
- Runs on most UNIX-like operating systems as well as Microsoft Windows;
- Supports HTTP proxies;
- Supports HTTP cookies;
- Supports persistent HTTP connections;
- Unattended / background operation;
- Uses local file timestamps to determine whether documents need to be re-downloaded when mirroring;
- GNU Wget is distributed under the GNU General Public License.
wget is non-interactive, which gives great flexibility in using it: it can easily be called from scripts, cron jobs, terminals, and so on, and it can keep working in the background even after the user logs out.
In this article, we will install wget on an Ubuntu 16.04 VPS and provide some useful wget example commands. Please note that although tested on Ubuntu 16.04, the instructions can be used on any other Ubuntu version.
We will be using our SSD 1 Linux VPS hosting plan running Ubuntu 16.04.
LOG IN TO YOUR SERVER VIA SSH
# ssh root@server_ip
You can check whether you have the proper Ubuntu version installed on your server with the following command:
# lsb_release -a
You should get this output:
Distributor ID: Ubuntu
Description:    Ubuntu 16.04.1 LTS
Release:        16.04
Codename:       xenial
UPDATE THE SYSTEM
Make sure your server is fully up to date using:
# apt update && apt upgrade
INSTALL AND USE WGET
Once the upgrades are done, install wget using:
# apt install wget
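Once the installation finishes, you can quickly confirm that wget is available and see which version was installed:

```shell
# Print the installed wget version (first line of --version output)
wget --version | head -n 1
```

On Ubuntu 16.04 this prints something like "GNU Wget 1.17.1 built on linux-gnu." (the exact version depends on your release).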
We can now start using wget.
I will now download the latest WordPress version using wget:
# wget https://wordpress.org/latest.zip
The output from this command includes a download progress bar showing how much of the file has been retrieved and at what speed.
By default, wget names the downloaded file after the last component of the URL. Sometimes this is an issue, as some downloads end up with a clumsy name. To avoid this, you can save the file under a filename of your choice with the -O (capital O) option. Let's modify the WordPress download command a little bit:
# wget -O wordpress.zip https://wordpress.org/latest.zip
Now the downloaded file will be named wordpress.zip and not latest.zip, as the default behavior of wget would have named it. Note that the lowercase -o option does something entirely different: it writes wget's log messages to the given file.
You can also limit the download speed. For example:
# wget --limit-rate=400k https://wordpress.org/latest.zip
I have had cases when I downloaded big files and, due to a temporarily lost connection, the download was interrupted. But have no fear, because the -c flag is here. Using -c in the command will continue the download from where it stopped. Example:
# wget -c http://sampledomain.com/file.zip
It is also recommended to put the download in the background when the file is big. This can be done using -b; wget then writes its progress to a file named wget-log in the current directory:
# wget -b http://sampledomain.com/file.zip
Sometimes the servers that files are downloaded from can be busy or slow, so a single attempt may not succeed. You can set the number of times wget retries the download:
# wget --tries=15 https://wordpress.org/latest.zip
You can also download multiple files with one command. First, let's create a file. Call it download.txt:
# touch download.txt
Now, using a text editor of your choice, enter the download URLs in the file. We are using nano:
# nano download.txt
Save and close the file. Let's see what we entered:
# cat download.txt
https://wordpress.org/latest.zip
https://downloads.joomla.org/us/cms/joomla3/3-6-5/joomla_3-6-5-stable-full_package-zip
https://ftp.drupal.org/files/projects/drupal-8.2.4.tar.gz
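If you prefer to skip the interactive editor, the same download.txt can be created non-interactively with a shell heredoc (the URLs below are the ones shown above):

```shell
# Create download.txt in one step instead of opening an editor;
# the quoted 'EOF' prevents any shell expansion inside the heredoc
cat > download.txt <<'EOF'
https://wordpress.org/latest.zip
https://downloads.joomla.org/us/cms/joomla3/3-6-5/joomla_3-6-5-stable-full_package-zip
https://ftp.drupal.org/files/projects/drupal-8.2.4.tar.gz
EOF

# Verify the contents
cat download.txt
```

This is handy in scripts, where opening nano is not an option.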
Now use the below command to download all the files from the download.txt file:
# wget -i download.txt
Very useful, right?
You can also find out when a web page was last modified by inspecting the server response headers, without actually downloading the page:
# wget --server-response --spider http://google.com
We mentioned in the introduction of this article that wget can download recursively. This way you can download a whole directory. Example:
# wget -r sampledomain.com/directory
Once, I had to migrate a Magento website but only had FTP access to the account, and believe me, migrating over FTP can be slow. So I used wget to download the data instead. You are probably wondering how? Well, this is what I did:
- Created an archive file that contains the Magento files/directories;
- Moved that file into the website document root;
- Used wget to download the file.
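The three steps above can be sketched as shell commands. Everything here is a placeholder: magento_site stands in for the real Magento installation, public_html for the website document root, and magento_domain.com is the article's example domain (the sketch builds a gzipped tar rather than the .zip used below, which wget downloads just the same):

```shell
# Demo directories standing in for the real installation and document root
mkdir -p magento_site public_html
echo "<?php // demo ?>" > magento_site/index.php

# Step 1: archive the Magento files/directories
tar -czf archivedmagento.tar.gz magento_site

# Step 2: move the archive into the website document root
mv archivedmagento.tar.gz public_html/

# Step 3: from the destination server, fetch it over HTTP
# (commented out here, as it needs network access):
#   wget http://magento_domain.com/archivedmagento.tar.gz
ls public_html
```

Because the archive is served over HTTP, the transfer is a single fast download instead of thousands of small FTP transfers.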
I reckon that you already know how I downloaded the file, but here goes the magic command that saved me from a slow migration:
# wget http://magento_domain.com/archivedmagento.zip
Since Magento data can be big, you can use some of the above options (flags) to put wget in the background or continue from where the download was interrupted.
While we are talking about FTP, you can also use wget to perform an FTP download:
# wget ftp-URL
Or download using the FTP username and password:
# wget --ftp-user=USERNAME --ftp-password=FTP_PASSWORD URL
As you can see, wget is a very useful tool for everyday Linux administration. You can find more info about wget and its options in the wget man page:
# man wget

WGET(1)                          GNU Wget                          WGET(1)

NAME
       Wget - The non-interactive network downloader.

SYNOPSIS
       wget [option]... [URL]...

DESCRIPTION
       GNU Wget is a free utility for non-interactive download of files
       from the Web. It supports HTTP, HTTPS, and FTP protocols, as well
       as retrieval through HTTP proxies.

       Wget is non-interactive, meaning that it can work in the background,
       while the user is not logged on. This allows you to start a
       retrieval and disconnect from the system, letting Wget finish the
       work. By contrast, most of the Web browsers require constant user's
       presence, which can be a great hindrance when transferring a lot of
       data.

       Wget can follow links in HTML, XHTML, and CSS pages, to create
       local versions of remote web sites, fully recreating the directory
       structure of the original site. This is sometimes referred to as
       "recursive downloading." While doing that, Wget respects the Robot
       Exclusion Standard (/robots.txt). Wget can be instructed to convert
       the links in downloaded files to point at the local files, for
       offline viewing.

       Wget has been designed for robustness over slow or unstable network
       connections; if a download fails due to a network problem, it will
       keep retrying until the whole file has been retrieved. If the
       server supports regetting, it will instruct the server to continue
       the download from where it left off.

OPTIONS
   Option Syntax
       Since Wget uses GNU getopt to process command-line arguments, every
       option has a long form along with the short one. Long options are
       more convenient to remember, but take time to type. You may freely
       mix different option styles, or specify options after the
       command-line arguments. Thus you may write:

               wget -r --tries=10 http://fly.srk.fer.hr/ -o log

       The space between the option accepting an argument and the argument
       may be omitted. Instead of -o log you can write -olog.

       You may put several options that do not require arguments together,
       like:

               wget -drc

       This is completely equivalent to:

               wget -d -r -c
Hopefully, you now have a clearer view on what wget can do for you.