So you want to run ProFTPD on an AWS EC2 instance? Due to FTP's nature as a multi-connection protocol, it is not as straightforward to use FTP within AWS EC2, but it can be done. Read on to find out how. Note that the following documentation assumes that you know how to install and configure ProFTPD already. If you are only running individual FTP servers, then the sections on AWS security groups and addresses are relevant. If you want to provide a "scalable" pool/cluster of FTP servers, then the AWS Elastic Load Balancing and AWS Route53 sections will also be of interest.
Security Groups
Every EC2 instance belongs to one or more AWS Security Groups
(often abbreviated as simply "SGs"). As the AWS documentation states, a
"security group" is effectively a set of firewall rules controlling network
access to your EC2 instance. I tend to think of SGs more like NAT rules,
since the "firewall" is the EC2 network perimeter managed by Amazon, and
an SG dictates what holes to allow from the outside world into the EC2 internal
networks.
Clients wishing to make a connection to the proftpd running on
your EC2 instance, be it FTP, FTPS, SFTP, or SCP, will thus need to be allowed
to connect by one (or more) of your SGs. Assuming your proftpd
listens on the standard FTP control port (21), you would configure one of your
SGs to allow access to that port, from any IP address, using the AWS CLI like
so:

$ aws ec2 authorize-security-group-ingress \
    --group-id sg-XXXX \
    --protocol tcp \
    --port 21 \
    --cidr 0.0.0.0/0

Note that you do not need to allow access to port 20! Many, many sites/howtos recommend opening port 20 in addition to port 21 for FTP access, but it is simply not needed. For active data transfers (i.e. where the FTP server actively connects back to the client machine for the data transfer), the source port will be port 20. But incoming connections for FTP will never be to port 20.
If you are allowing SFTP/SCP connections, e.g. to your proftpd running the
mod_sftp module on the standard SSH port (22):

$ aws ec2 authorize-security-group-ingress \
    --group-id sg-YYYY \
    --protocol tcp \
    --port 22 \
    --cidr 0.0.0.0/0

Note: I recommend using different SGs for your FTP/FTPS rules and your SFTP/SCP rules. FTP/FTPS rules are more complex, and it is more clear to manage an SG named "FTP", with all of the related FTP rules, and separately to have an SG named "SFTP", with the SFTP/SCP related rules.
If you are only allowing SFTP/SCP access, that should suffice for the security group configuration for your instance. Allowing FTP/FTPS connections requires more security group tweaks.
FTP uses multiple TCP connections: one for the control connection, and separate other connections for data transfers (directory listings and file uploads/downloads). The ports used for these data connections are dynamically negotiated over the control connection; it is this dynamic nature of the data connections which causes complexity with network access rules. This site does a great job of describing these issues more in detail:
http://slacksite.com/other/ftp.html
Remember how I said that SGs are similar to NAT rules? This similarity is one of the reasons why the ProFTPD NAT howto is relevant here as well.
We want to configure ProFTPD to use a known range of ports for its passive
data transfers, and then we want to configure our FTP SG to allow access to
that known port range. Thus we would use something like this in the
proftpd.conf:

PassivePorts 60000 65535

And then, to configure the SG to allow that same range of ports:

$ aws ec2 authorize-security-group-ingress \
    --group-id sg-XXXX \
    --protocol tcp \
    --port 60000-65535 \
    --cidr 0.0.0.0/0

The SFTP/SCP protocols only use a single TCP connection, and thus they do not require any other special configuration/access rules.
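One way to keep the security group rule from drifting out of sync with the configuration is to derive the --port value from the PassivePorts line itself. The following is a minimal sketch, assuming a single, uncommented PassivePorts directive. The function name passive_port_range and the sample configuration written to a temporary file are made up for illustration, and the resulting command is only printed, not executed.

```shell
#!/bin/sh
# Print the PassivePorts range found in a proftpd.conf as "MIN-MAX", the
# form taken by the --port option of authorize-security-group-ingress.
# Only the first uncommented PassivePorts line is used.
passive_port_range() {
  awk 'tolower($1) == "passiveports" { print $2 "-" $3; exit }' "$1"
}

# Illustrative config; in practice, point this at your real proftpd.conf.
conf=$(mktemp)
printf 'Port 21\nPassivePorts 60000 65535\n' > "$conf"

range=$(passive_port_range "$conf")
rm -f "$conf"

# Dry run: print the command rather than executing it.
echo "aws ec2 authorize-security-group-ingress --group-id sg-XXXX" \
     "--protocol tcp --port $range --cidr 0.0.0.0/0"
```

Removing the echo wrapper would run the command for real; sg-XXXX is the same placeholder group ID used above.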
Public vs Private Instance Addresses
Every EC2 instance will have its own local/private IP address and DNS name,
automatically assigned by AWS. Instances may also be automatically
assigned public IP addresses/DNS names as well, depending on various
factors. The AWS docs on instance addressing discuss those
factors in greater detail.
If your EC2 instance will be supporting FTP/FTPS sessions, then you will need
to determine whether your instance has a public address. If so, that address
needs to be configured using the MasqueradeAddress directive.
Why? When an FTP client negotiates a passive data transfer,
ProFTPD tells that FTP client an address, and a port, to which to connect to
transfer the data. For EC2 instances with a public address, that public
address is what ProFTPD needs to convey to the FTP client, and
MasqueradeAddress is the directive that does so.
So how can you tell what the public address of your EC2 instance is, if it
even has one? You can use the EC2 instance metadata, via curl, like so:

$ curl http://169.254.169.254/latest/meta-data/public-hostname

If your instance has a public address, the DNS name to use would be returned. Otherwise, you might see something like this:

$ curl http://169.254.169.254/latest/meta-data/public-hostname
<?xml version="1.0" encoding="iso-8859-1"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
 <head>
  <title>404 - Not Found</title>
 </head>
 <body>
  <h1>404 - Not Found</h1>
 </body>
</html>

which indicates that your EC2 instance does not have a public address. And if your instance does not have a public address, then you do not need to use the
MasqueradeAddress directive.
Here's one solution for handling this situation: obtain the public hostname
for your instance, store it in an environment variable, and then use that
environment variable in your proftpd.conf:

$ export EC2_PUBLIC_HOSTNAME=`curl -f -s http://169.254.169.254/latest/meta-data/public-hostname`

The -f option is necessary, in case the instance does not
have a public address: with it, curl prints nothing for the 404 response,
rather than the error page shown above. The -s option simply makes for
quieter shell scripts. Then, in your proftpd.conf, you might
use:

MasqueradeAddress %{env:EC2_PUBLIC_HOSTNAME}

If the instance does not have a public address, though, that environment variable will be the empty string, and proftpd will fail to start
up because of that. Better would be to automatically handle the
"no public address" case, if we can. Assume you have a shell script for
starting proftpd which does something like this, using our
EC2_PUBLIC_HOSTNAME environment variable:

PROFTPD_ARGS=""

# If we have a public hostname, then the string will not be
# zero length, and we define a property for ProFTPD's use.
if [ ! -z "$EC2_PUBLIC_HOSTNAME" ]; then
  PROFTPD_ARGS="$PROFTPD_ARGS -DUSE_MASQ_ADDR"
fi

Then, in your proftpd.conf, you use both that property
and the environment variable notation:

<IfDefine USE_MASQ_ADDR>
  MasqueradeAddress %{env:EC2_PUBLIC_HOSTNAME}
</IfDefine>

Fortunately the EC2 instance addressing does not require any additional changes/tweaks to the AWS Security Groups.
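Putting the pieces of this section together, a start-up wrapper might look roughly like the following sketch. The function name masq_args is hypothetical, the hostname value is made up, and both the metadata lookup and the actual proftpd invocation are left as comments so that the logic can be tried on any machine.

```shell
#!/bin/sh
# Echo the extra proftpd command-line arguments for a given public
# hostname: "-DUSE_MASQ_ADDR" when it is non-empty, nothing otherwise.
masq_args() {
  if [ -n "$1" ]; then
    echo "-DUSE_MASQ_ADDR"
  fi
}

# On a real instance the value would come from the instance metadata:
#   EC2_PUBLIC_HOSTNAME=$(curl -f -s http://169.254.169.254/latest/meta-data/public-hostname)
# An illustrative value is used here; set it to "" to exercise the
# "no public address" case.
EC2_PUBLIC_HOSTNAME="ec2-203-0-113-10.compute-1.amazonaws.com"
export EC2_PUBLIC_HOSTNAME

PROFTPD_ARGS=$(masq_args "$EC2_PUBLIC_HOSTNAME")
echo "would run: proftpd $PROFTPD_ARGS"
# exec proftpd $PROFTPD_ARGS
```

Exporting the variable matters: proftpd resolves %{env:EC2_PUBLIC_HOSTNAME} from its own environment, so the value has to be visible to the process the wrapper starts.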
Elastic Load Balancing
Now that you have ProFTPD up and running on your EC2 instance, and you can
connect using FTP/FTPS and SFTP/SCP, and browse directories and upload and
download files, you are probably thinking about how to have more than one
instance for your FTP service. After all, you want redundancy for your FTP
servers just like you have for your HTTP servers, right? And for HTTP servers,
you would use an AWS Elastic Load Balancer
(often called an "ELB"). Why not use the same technique for FTP? Can
you configure an ELB for FTP?
Yes, ELBs can be used for FTP. Like SGs, though, it's complicated by FTP's use of multiple TCP connections; for SFTP/SCP, ELBs are simpler to configure.
The first thing to keep in mind is that ELBs only distribute (i.e. "balance") connections in a round-robin fashion among the backend TCP servers; they do not distribute connections based on the load of those backend servers. (The balancing algorithm is slightly different for HTTP servers, but that does not apply to ProFTPD.) This means that any user might connect to any of your ProFTPD instances; this, in turn, means that users must be able to login on all instances, and that the files for all users should be available on all instances.
These requirements lead to the need for centralized/shared authentication data, and for shared filesystems. The centralized/shared authentication data can be handled by using e.g. SQL databases, LDAP directories, or even synchronized password files. For shared filesystems, the popular approaches are:
  - s3fs
  - NFS
  - Samba
  - gluster
There are probably other solutions as well; the key is to have the users' files available on any/every instance.
The next thing to keep in mind is whether you have an EC2 Classic account, or whether you are using AWS VPC. Chances are that you are using a VPC. ELBs for an EC2 Classic account can only be configured to listen on a restricted list of ports, i.e.:
  - 25
  - 80
  - 443
  - 465
  - 587
  - 1024-65535
Let's assume that you are using a VPC, and thus you configure a TCP listener on your ELB for port 21, which uses the instance port 21. And for SFTP/SCP, it would be a TCP listener for port 22, using instance port 22. Obviously you would not use HTTP or HTTPS listeners, but what about an SSL listener, for FTPS? No. An SSL listener performs the SSL/TLS handshake first, then forwards the plaintext messages to the backend instance. But FTPS is a "STARTTLS" protocol, which means the connection is first unencrypted, and then feature negotiation happens on that connection, and then the SSL/TLS handshake happens. ELBs do not support STARTTLS protocols, thus you cannot use them for terminating SSL/TLS sessions for FTP servers.
Your ProFTPD configuration might use multiple different ports, for different
<VirtualHost>s. Your ELB would need a different TCP
listener for each of those separate ports. However, now that ProFTPD supports
the FTP HOST command (which allows for proper name-based virtual
hosts in FTP, just like HTTP 1.1 has via its Host header), you
should only need one TCP listener now.
An ELB wants to perform health checks on its backend instances, to know that each instance is up, running, and available to handle connections. ELBs can perform HTTP requests as health checks, or make TCP connections. ProFTPD is not an HTTP server, so using TCP health checks is necessary. You would configure the ELB to make TCP connections to the ProFTPD port, e.g. port 21 for FTP/FTPS, and/or port 22 for SFTP/SCP.
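As a rough sketch of how the TCP listeners and the TCP health check might be set up with the AWS CLI for a Classic ELB: the helpers below build the listener and health check descriptions in the shorthand accepted by "aws elb create-load-balancer" and "aws elb configure-health-check". The function names, the load balancer name my-ftp-elb, the subnet/SG placeholders and the health check intervals are all assumptions for illustration, and the commands are only printed.

```shell
#!/bin/sh
# Build one TCP listener spec per port given; the ELB port and the
# instance port are the same in each spec.
tcp_listeners() {
  specs=""
  for port in "$@"; do
    specs="$specs Protocol=TCP,LoadBalancerPort=$port,InstanceProtocol=TCP,InstancePort=$port"
  done
  # Trim the leading space.
  echo "${specs# }"
}

# Build a TCP health check spec for one instance port. The interval,
# timeout and threshold values are arbitrary example values.
tcp_health_check() {
  echo "Target=TCP:$1,Interval=30,Timeout=5,UnhealthyThreshold=2,HealthyThreshold=2"
}

LB_NAME="my-ftp-elb"

# Dry run: print the commands rather than executing them.
# Port 21 carries FTP/FTPS control connections, port 22 SFTP/SCP.
echo "aws elb create-load-balancer --load-balancer-name $LB_NAME" \
     "--listeners $(tcp_listeners 21 22)" \
     "--subnets subnet-XXXX --security-groups sg-XXXX"
echo "aws elb configure-health-check --load-balancer-name $LB_NAME" \
     "--health-check $(tcp_health_check 21)"
```

A Classic ELB has a single health check, so it can only probe one of the two ports; which one to pick depends on which service matters more for taking an instance out of rotation.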
What about the range of ports defined via PassivePorts, that
you had to allow in your SG? Does your ELB need TCP listeners for all of
those ports, too? No. To understand why, we need to examine in detail just
how passive data transfers work in FTP. An FTP client connects to your
FTP server, through the ELB, like this, for its control connection:

client --- ctrl ---> ELB:21 --- ctrl ---> instance:21

The client and server negotiate a passive data transfer; the FTP server tells the client, over the control connection, an address and port to which to connect. Now, let's assume that ProFTPD gives the address of the ELB, and one of the PassivePorts; we'll use port 65000 for this example.
The FTP client connects to the address/port on the ELB, like this:

client --- data ---> ELB:65000 --- data ---> instance:65000

This would mean that the ELB would need TCP listeners for the
PassivePorts, and that MasqueradeAddress would
need to point to the ELB DNS name. So why did I say that the ELB did not
need those extra TCP listeners?
If your ELB will only ever have just one backend instance, then the above configuration would work. Your EC2 instance might be in a VPC, with no public address, and thus perhaps the only way to make your FTP server there reachable is using an ELB. Where forcing passive data connections through an ELB starts to fail is when there are multiple backend instances. Consider the case where your ELB might have 3 instances:

          +--> instance1:21
ELB:21 --|--> instance2:21
          +--> instance3:21

An FTP client connects to the ELB, and the ELB selects instance #2:

client --- ctrl ---> ELB:21 --- ctrl ---> instance2:21

So far, so good. The client requests a passive data transfer; the FTP server tells the client to connect to the ELB address, port 65000, but the ELB sends that connection to instance #3, not instance #2:

client --- data ---> ELB:65000 --- data ---> instance3:65000

This can happen because the ELB does not understand FTP; it does not know that the data connection is related, in any way, to any other connections. To the ELB, all TCP connections are independent, and thus any connection will be routed, round-robin, to any backend instance. There is no guarantee that the data connections, going through the ELB, will connect to the proper backend instance. If there is only one backend instance, though, everything will work as expected.
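The failure mode can be illustrated with a toy simulation. This is a sketch only: the strict round-robin selection below is a simplification of whatever the ELB does internally, and the names route and BACKENDS are made up for the example.

```shell
#!/bin/sh
# Three backend instances behind a hypothetical load balancer.
BACKENDS="instance1 instance2 instance3"

# Index of the backend that gets the next connection; starting at 1 so
# that the first pick is instance2, as in the example above.
next=1

# Pick the backend for the next TCP connection, round-robin, and store it
# in $picked. The balancer only counts connections; it never looks at
# which FTP session a connection belongs to.
route() {
  set -- $BACKENDS
  count=$#
  shift "$next"
  picked=$1
  next=$(( (next + 1) % count ))
}

route; ctrl=$picked   # control connection, to ELB:21
route; data=$picked   # passive data connection, to ELB:65000

echo "control connection -> $ctrl:21"
echo "data connection    -> $data:65000"
if [ "$ctrl" != "$data" ]; then
  echo "mismatch: $data has no session expecting this data connection"
fi
```

With a single entry in BACKENDS, both connections land on the same instance, which is why the one-backend case works.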
In order to properly support multiple backend instances (which is one of the
goals/benefits of using an ELB in the first place) for FTP, then, the trick
is to not force data connections through the ELB. Instead, the
MasqueradeAddress directive points to each backend instance's
respective public hostname. With this configuration, the FTP client connects
to the ELB for its control connection, like usual:

client --- ctrl ---> ELB:21 --- ctrl ---> instance2:21

And for the data transfer, ProFTPD tells the client the instance public hostname, and port 65000:

client -------------- data -------------> instance2:65000

Notice how, with this configuration, the TCP connection for the data transfer bypasses the ELB completely. This is why you do not need to configure any TCP listeners on the ELB for those
PassivePorts, and why
you do not want MasqueradeAddress using the ELB DNS name;
you do not want passive data connections going through the ELB.
Now you have an ELB with multiple backend FTP servers. Success, right? Maybe.
There are some caveats. FTP clients might notice that they connect to
one name (the ELB DNS name), but for data transfers, they are being told
(by the FTP server) to connect to a different name; some FTP clients
might warn/complain about this mismatch. ProFTPD would definitely complain
about this mismatch, for it would see the control connection as originating
from the ELB, but the data connection originating from a different address,
and would refuse the data transfer. To allow data transfers to work, then,
you would need to add the following to your proftpd.conf:

# Allow "site-to-site" transfers, since that is what FTP traffic with
# an ELB looks like.
AllowForeignAddress on

which has its own security implications.
Next, there is the ELB idle timeout setting to adjust. The default is 60 seconds. During a data transfer, most FTP clients will be handling the data connection, and the control connection is idle. Thus if the data transfer lasts longer than 60 seconds, the ELB might terminate the idle control connection, and the FTP session is lost. Unfortunately the maximum allowed idle timeout for ELBs is 1 hour (3600 seconds); for large (or slow) data transfers, even that timeout could be a problem. There are ways of keeping the control connection from being idle for too long, using keepalives. Note that this idle timeout is not really an issue for SFTP/SCP sessions, as all data transfers for them use the same single TCP connection.
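For a Classic ELB, the idle timeout can be raised with "aws elb modify-load-balancer-attributes". The sketch below builds the attributes JSON for a requested timeout, clamping it to the 1 to 3600 second range mentioned above. The function name idle_timeout_attrs and the load balancer name my-ftp-elb are assumptions for illustration, and the command is only printed.

```shell
#!/bin/sh
# Print the load balancer attributes JSON for an idle timeout given in
# seconds, clamped to the 1..3600 range.
idle_timeout_attrs() {
  secs=$1
  if [ "$secs" -lt 1 ]; then
    secs=1
  fi
  if [ "$secs" -gt 3600 ]; then
    secs=3600
  fi
  printf '{"ConnectionSettings":{"IdleTimeout":%d}}\n' "$secs"
}

# Ask for two hours; the helper caps the request at one hour.
attrs=$(idle_timeout_attrs 7200)

# Dry run: print the command rather than executing it.
echo "aws elb modify-load-balancer-attributes --load-balancer-name my-ftp-elb" \
     "--load-balancer-attributes '$attrs'"
```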
Last, using an ELB only for FTP control connections, and using direct
connections for the FTP data transfers only works if your backend EC2 instances
have public hostnames; for instances in a VPC, that may not be true.
So how can we use an ELB for multiple backend instances that only have private
addresses? Sadly, the answer is: you can't. For load balancing FTP sessions
among multiple backend EC2 instances with private addresses, you need an
FTP-aware proxy, such as ProFTPD with the mod_proxy module. This means running your
own instance for doing that load balancing, rather than having AWS manage it.
Of course, if the clients using your ELB for FTP services are also
within your VPC, then the lack of public hostnames for your EC2 instances
is not an issue, and using an ELB as described above will work.
DNS and AWS Route53
Using an ELB for balancing connections across your pool of FTP servers is
rather complex. Are there alternatives? Yes: "DNS load balancing".
Instead of using an AWS ELB for balancing/distributing connections across
your pool of ProFTPD-running instances, you can use DNS tricks to implement
the same functionality. Note, however, these DNS tricks still assume that
your EC2 instances are publicly reachable, i.e. have public hostnames.
With DNS load balancing, the client resolves a DNS name to an IP address,
and connects to that IP address:

client1 ----------------- ctrl ----------------> instance2:21
client1 ----------------- data ----------------> instance2:65000

But the DNS server might be configured with several IP addresses for
the same DNS name; the client then chooses one IP address from the given
list (usually the first address), and connects to that. Some DNS servers will
shuffle the list of returned addresses for a name, so that clients will
choose different addresses, and thus distribute/balance their connections
across all of the addresses:

client1 ----------------- ctrl ----------------> instance2:21
client1 ----------------- data ----------------> instance2:65000
client2 ----------------- ctrl ----------------> instance1:21
client2 ----------------- data ----------------> instance1:65000
client3 ----------------- ctrl ----------------> instance3:21
client3 ----------------- data ----------------> instance3:65000

Within AWS, the Route53 service
can be used as the DNS service for your domain names. AWS Route53 calls this round robin of addresses a weighted routing
policy, as each address associated with a name can be given a "weight",
affecting the probability that that address will be returned, by Route53,
when the DNS name is resolved to an IP address. Other routing policies are
supported, e.g. latency-based routing
(so that the instance with the fastest response time is chosen), and
geolocation-based routing (the instance address
chosen is based on the location of the resolving client).
If you are using AWS Route53, then you will need to configure health checks,
just as you would for an ELB. Route53 supports TCP health checks, which
you would point at your FTP/FTPS port (21) or SFTP/SCP port (22) on your
instances.
Since any/all clients could connect to any/all of the EC2 instances associated
with your DNS name, all of the users would need to be able to login on any
instance, and have their files/data available. Thus using a shared filesystem
for the files (such as s3fs, NFS, Samba, gluster, etc) and a centralized/shared authentication
mechanism (e.g. SQL database, LDAP directory, etc) would be
needed.
Future Work
In order to automate many of the manual steps above, work is progressing on
a mod_aws module for ProFTPD, which will eventually handle:
  - PassivePorts for FTP/FTPS vhost, if needed
  - MasqueradeAddress if needed
in addition to other interactions with AWS services.
Frequently Asked Questions
Question: I need to send particular users only to
a particular instance/set of instances. How do I configure AWS to do this?
Answer: Short answer: you cannot, at least not with the AWS services alone.
But it can be done!
The AWS services like ELBs and Route53 understand TCP connections, and the
HTTP protocol, but they do not understand FTP. And understanding of the
protocol is necessary, so that you know how/when to expect the user name, and
how to redirect/proxy the backend connection. This is why you cannot use
AWS to do per-user balancing. However, you can use the
mod_proxy module for ProFTPD, which is protocol-aware, and thus can balance
FTP/FTPS connections in multiple ways, including per-user.
Question: I am using ELBs for my pool of ProFTPD
servers. I would like my logs to show the IP address of the connecting
clients, but all I get is the IP address of the ELB. Is there a way to get
the original IP address, an equivalent to the X-Forwarded-For HTTP header?
Answer: Yes, there is an equivalent mechanism
that is supported by ELBs for TCP listeners: the PROXY protocol.
To enable use of the PROXY protocol by your ELB, see the AWS ELB
documentation. You will also need to tell ProFTPD to expect
the PROXY protocol, which means using the mod_proxy_protocol module.
The PROXY protocol, and the mod_proxy_protocol module,
work equally well for FTP/FTPS and SFTP/SCP sessions.
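As a sketch of what enabling the PROXY protocol might involve for a Classic ELB: a policy of type ProxyProtocolPolicyType is created and then attached to the backend instance port, and ProFTPD is told to expect the PROXY header via the ProxyProtocolEngine directive of mod_proxy_protocol. The function name proxy_protocol_cmds, the policy name EnableProxyProtocol and the load balancer name my-ftp-elb are assumptions for illustration; the commands and the configuration fragment are only printed.

```shell
#!/bin/sh
# Print the two AWS CLI commands that enable the PROXY protocol on a
# Classic ELB for one backend instance port.
#   $1 = load balancer name, $2 = instance port
proxy_protocol_cmds() {
  lb=$1
  port=$2
  echo "aws elb create-load-balancer-policy --load-balancer-name $lb" \
       "--policy-name EnableProxyProtocol" \
       "--policy-type-name ProxyProtocolPolicyType" \
       "--policy-attributes AttributeName=ProxyProtocol,AttributeValue=true"
  echo "aws elb set-load-balancer-policies-for-backend-server" \
       "--load-balancer-name $lb --instance-port $port" \
       "--policy-names EnableProxyProtocol"
}

# Dry run for the FTP control port.
proxy_protocol_cmds my-ftp-elb 21

# Matching proftpd.conf fragment, printed here for reference.
cat <<'EOF'
<IfModule mod_proxy_protocol.c>
  ProxyProtocolEngine on
</IfModule>
EOF
```

The policy is attached per instance port, so an SFTP/SCP listener on port 22 would need the second command repeated with --instance-port 22.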