Thursday 18 October 2018

Cloud Usage Guidance



Cloud environments are great: they enable us to move faster, experiment and work collaboratively. However, without care, cloud environments can also become security risks and burn through cash quickly.


Two of the most important principles are:
Security: Apply a least-privilege approach to cloud resources, i.e. only allow access to the people who need it. This means considering which ports, IP addresses and user permissions actually need to be granted.
Cost: Understand the cost of the resources you're creating, ensure they are destroyed when they're no longer required and, ideally, turn them off or scale them down (vertically and horizontally) when not in use.
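
To make least privilege concrete for user permissions, here is a minimal Python sketch that builds an AWS IAM policy document granting read-only access to a single S3 bucket, optionally limited to one key prefix. The helper name and the bucket/prefix values are hypothetical; the policy elements (`Version`, `Statement`, `Effect`, `Action`, `Resource`, `Condition`) are standard IAM JSON, but check the output against the IAM documentation before relying on it.

```python
import json


def read_only_bucket_policy(bucket_name, prefix=""):
    """Build an IAM policy document allowing read-only access to one S3 bucket.

    Hypothetical helper: grants only s3:ListBucket and s3:GetObject, scoped to a
    single bucket and, optionally, a single key prefix (e.g. "reports/").
    """
    list_statement = {
        "Sid": "ListOneBucket",
        "Effect": "Allow",
        "Action": ["s3:ListBucket"],
        "Resource": [f"arn:aws:s3:::{bucket_name}"],
    }
    if prefix:
        # Restrict listing to keys under the prefix as well.
        list_statement["Condition"] = {"StringLike": {"s3:prefix": [f"{prefix}*"]}}
    read_statement = {
        "Sid": "ReadObjects",
        "Effect": "Allow",
        "Action": ["s3:GetObject"],
        "Resource": [f"arn:aws:s3:::{bucket_name}/{prefix}*"],
    }
    return {"Version": "2012-10-17", "Statement": [list_statement, read_statement]}


if __name__ == "__main__":
    # "example-analysis-bucket" is a placeholder bucket name.
    print(json.dumps(read_only_bucket_policy("example-analysis-bucket", "reports/"), indent=2))
```

Anything the policy does not allow (writes, deletes, other buckets) is implicitly denied by IAM, which is exactly the behaviour least privilege asks for.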

The following sections provide some guidance to consider when creating and using cloud resources. The principles behind the recommendations are agnostic of the cloud provider, although the implementation may differ slightly for each, and the examples given focus mostly on AWS.
The intention is for everyone to be aware of the various considerations that apply to the use of cloud environments, even if you're just spinning up a single VM to learn about the cloud.

Security

There is no need to reinvent the wheel, as there are many established security best practices for AWS and Azure. For AWS, for example, these are worth checking out:
https://d0.awsstatic.com/whitepapers/aws-security-best-practices.pdf
http://docs.aws.amazon.com/IAM/latest/UserGuide/best-practices.html

Network Security Considerations

It is critical that network security controls, especially Security Groups (in AWS) and Network Security Groups (in Azure), are implemented correctly and consistently to avoid creating holes, or vulnerabilities, in the security posture of your environments. Doing so also ensures that, in the nightmare scenario of a VM being breached, the damage is limited as far as possible.

A few considerations, which mostly relate to IaaS but should be applied to all environments, are:
  • Traffic should only be permitted to travel within and between networks on a least-privilege basis, particularly in environments that contain client data or are connected to a client network. Only grant access to those security groups, IPs and protocols that are known, understood and trusted.
  • No Security Group should keep the default outbound rule that allows ALL protocols to 0.0.0.0/0, in public or private subnets. If a VM is compromised and non-essential outbound ports are blocked, it is harder for the attacker to jump from that VM to another or to use it for other nefarious activities (this will at least slow them down, if not stop them). Because SG rules are stateful, replies to permitted inbound connections are allowed automatically, so it's unlikely that you'll need outbound SSH (port 22) on any server other than a jump/bastion server.
  • Routing tables should ensure that traffic is only routed between subnets and VPCs that require it. Where possible, Security Groups should reference other Security Groups, rather than IP addresses, when setting rules for the same or peered VPCs.
  • Inbound security group rules should whitelist specific IP addresses for all traffic unless there is a very strong reason not to, AND each rule should have a comment describing exactly what that IP refers to. Usually, all traffic originating from a client's office network or the client VPN will have the same IP address, so this should be used on all externally accessible ports/endpoints.
  • In a more mature environment, security group rules should preferably be scripted and regularly reapplied/enforced via automation. This forces people to formalise and document network access in scripts and avoids manually added rules that could weaken the security posture of the environment.
  • When designing environments, or even doing PoCs with any kind of sensitive data or information, use subnets to segregate public and private networks and consider how the servers and services are accessed, e.g. via jump/bastion servers rather than connecting every server directly to the internet. (In an ideal world, automating deployment, configuration, log access, etc. would eliminate virtually all need for a user to log on to a server.)
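
The outbound and whitelisting rules above lend themselves to an automated check. The sketch below assumes the security groups have already been fetched (for example via boto3's EC2 `describe_security_groups`) and that each group follows that response's shape: `GroupId`, `IpPermissions`, `IpPermissionsEgress`, and `IpRanges` entries with `CidrIp` and an optional `Description`. It only inspects IPv4 ranges (`Ipv6Ranges`, e.g. `::/0`, would need the same treatment), and the function name is hypothetical.

```python
OPEN_WORLD = "0.0.0.0/0"


def audit_security_groups(groups):
    """Return human-readable findings for rules that break the guidance above.

    groups: list of dicts assumed to match the "SecurityGroups" list returned by
    boto3, e.g. boto3.client("ec2").describe_security_groups()["SecurityGroups"].
    """
    findings = []
    for sg in groups:
        gid = sg.get("GroupId", "<unknown>")
        # Outbound: flag any all-protocol ("-1") rule open to the whole Internet.
        for perm in sg.get("IpPermissionsEgress", []):
            cidrs = [r.get("CidrIp") for r in perm.get("IpRanges", [])]
            if perm.get("IpProtocol") == "-1" and OPEN_WORLD in cidrs:
                findings.append(f"{gid}: egress allows ALL protocols to {OPEN_WORLD}")
        # Inbound: flag rules open to anyone, and IP rules without a description.
        for perm in sg.get("IpPermissions", []):
            port = perm.get("FromPort", "all")  # absent when IpProtocol is "-1"
            for ip_range in perm.get("IpRanges", []):
                cidr = ip_range.get("CidrIp")
                if cidr == OPEN_WORLD:
                    findings.append(f"{gid}: inbound port {port} is open to {OPEN_WORLD}")
                if not ip_range.get("Description"):
                    findings.append(f"{gid}: inbound rule for {cidr} has no description")
    return findings
```

Running a check like this on a schedule, and failing a pipeline when it returns findings, is one way to keep the rules from drifting.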

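Scripting and regularly re-applying security group rules essentially means comparing the documented rules (the desired state) with the rules that actually exist, then adding what is missing and revoking anything unexpected. This is a minimal Python sketch of that planning step under simplifying assumptions: a hypothetical `Rule` type covering IPv4 inbound rules only, with the description ignored when matching rules. Applying the plan would be a separate step, e.g. via boto3's `authorize_security_group_ingress` / `revoke_security_group_ingress` or an infrastructure-as-code tool.

```python
from typing import List, NamedTuple, Tuple


class Rule(NamedTuple):
    """Simplified, hypothetical model of one inbound IPv4 security group rule."""
    protocol: str
    from_port: int
    to_port: int
    cidr: str
    description: str = ""


def _key(rule: Rule) -> Tuple[str, int, int, str]:
    # Rules are matched on what they allow; the description is documentation only.
    return (rule.protocol, rule.from_port, rule.to_port, rule.cidr)


def plan_reconciliation(desired: List[Rule], actual: List[Rule]) -> Tuple[List[Rule], List[Rule]]:
    """Return (rules_to_add, rules_to_revoke) so that actual ends up matching desired."""
    desired_keys = {_key(r) for r in desired}
    actual_keys = {_key(r) for r in actual}
    to_add = [r for r in desired if _key(r) not in actual_keys]
    # Anything absent from the scripted definition is treated as a manual change to remove.
    to_revoke = [r for r in actual if _key(r) not in desired_keys]
    return to_add, to_revoke


if __name__ == "__main__":
    desired = [Rule("tcp", 443, 443, "203.0.113.10/32", "Client office")]
    actual = [Rule("tcp", 443, 443, "203.0.113.10/32"), Rule("tcp", 3389, 3389, "0.0.0.0/0")]
    add, revoke = plan_reconciliation(desired, actual)
    print("add:", add)
    print("revoke:", revoke)
```

Treating the script as the single source of truth is what makes a manually added "temporary" rule disappear on the next run.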
IAM Best Practices

VPC Best Practices

  • Always assign security groups to instances
  • Consider reusing existing security groups before creating new ones. However, only reuse SGs across similar instances/services, to avoid confusion and to avoid inadvertently weakening the security of another instance by changing a shared SG
  • For Security Groups, open only the ports you really need and restrict access to them
  • Avoid exposing instances to the Internet whenever possible
  • Consider enabling Flow Logs (http://docs.aws.amazon.com/AmazonVPC/latest/UserGuide/flow-logs.html)
  • Restrict egress traffic: avoid allowing outbound connections on any port to anywhere
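
To check the "open only the ports you really need" rule, a short sketch can compare each inbound rule's port range with an explicit set of required `(protocol, port)` pairs. It assumes rule dicts shaped like boto3's `IpPermissions` entries, with protocol names such as `"tcp"`/`"udp"` and `"-1"` meaning all traffic; the function name and the required-port set are hypothetical.

```python
def unexpected_open_ports(permissions, required):
    """List inbound rules that open more than the required (protocol, port) pairs.

    permissions: dicts shaped like boto3 IpPermissions entries (assumption).
    required: set of (protocol, port) tuples, e.g. {("tcp", 443)}.
    """
    findings = []
    for perm in permissions:
        protocol = perm.get("IpProtocol")
        if protocol == "-1":
            findings.append("all traffic (protocol -1)")
            continue
        if protocol not in ("tcp", "udp"):
            # e.g. ICMP, where FromPort/ToPort mean type/code rather than ports.
            findings.append(f"protocol {protocol}: not port-based, review manually")
            continue
        first, last = perm["FromPort"], perm["ToPort"]
        extra = [p for p in range(first, last + 1) if (protocol, p) not in required]
        if extra:
            findings.append(f"{protocol} {first}-{last}: {len(extra)} unexpected port(s), e.g. {extra[0]}")
    return findings


if __name__ == "__main__":
    rules = [
        {"IpProtocol": "tcp", "FromPort": 443, "ToPort": 443},
        {"IpProtocol": "tcp", "FromPort": 0, "ToPort": 65535},  # an overly broad rule
    ]
    for finding in unexpected_open_ports(rules, {("tcp", 443)}):
        print(finding)
```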

EC2 Best Practices

  • When an instance must be accessible from the Internet, expose it via an Elastic Load Balancer
  • Always use HTTPS for publicly exposed HTTP-type services
  • Enable access logs collection for your ELBs (https://aws.amazon.com/blogs/aws/access-logs-for-elastic-load-balancers/)
  • For inter-service communication, always route traffic via a private network only (i.e. either instanceA → internal ELB → instanceB or instanceA → instanceB, but never instanceA → public ELB → instanceB)
  • Use bastion servers to access instances via SSH/RDP
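
One way to sanity-check the private-routing rule is to confirm that a service endpoint resolves only to private addresses before other services are configured to call it: an internal ELB resolves to private IPs, whereas a public ELB resolves to public ones. The sketch uses Python's standard `ipaddress` and `socket` modules; the function names and the example hostname are hypothetical, and note that `is_private` also treats a few reserved non-RFC 1918 ranges as private.

```python
import ipaddress
import socket


def resolve(hostname, port=443):
    """Return the sorted, de-duplicated IP addresses a hostname resolves to (needs DNS)."""
    infos = socket.getaddrinfo(hostname, port, proto=socket.IPPROTO_TCP)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})


def is_private_endpoint(addresses):
    """True only if there is at least one address and every address is private."""
    return bool(addresses) and all(ipaddress.ip_address(a).is_private for a in addresses)


if __name__ == "__main__":
    # Hypothetical internal load balancer name; replace with a real endpoint.
    host = "internal-my-service-123456.eu-west-1.elb.amazonaws.com"
    try:
        addrs = resolve(host)
        print(host, addrs, "private" if is_private_endpoint(addrs) else "NOT private")
    except socket.gaierror as err:
        print(f"could not resolve {host}: {err}")
```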

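For SSH, OpenSSH's `ProxyJump` option (OpenSSH 7.3 and later) lets users reach private instances through a bastion while their private keys stay on their own machine. The sketch below renders `~/.ssh/config` entries for that pattern; the helper name, host aliases, addresses and user names are all hypothetical placeholders, and RDP is not covered.

```python
def ssh_config_via_bastion(bastion_host, bastion_user, targets):
    """Render ~/.ssh/config entries that reach private hosts only via a bastion.

    targets maps a host alias to (private_ip, user); all values are placeholders.
    """
    lines = [
        "Host bastion",
        f"    HostName {bastion_host}",
        f"    User {bastion_user}",
        "",
    ]
    for alias, (private_ip, user) in targets.items():
        lines += [
            f"Host {alias}",
            f"    HostName {private_ip}",
            f"    User {user}",
            "    ProxyJump bastion",  # hop through the bastion; no direct route needed
            "",
        ]
    return "\n".join(lines)


if __name__ == "__main__":
    print(ssh_config_via_bastion(
        "bastion.example.com", "ec2-user",
        {"app1": ("10.0.2.15", "ec2-user"), "db-admin": ("10.0.3.20", "ubuntu")},
    ))
```

With entries like these, `ssh app1` connects through the bastion, so only the bastion needs inbound SSH, and only from whitelisted IPs.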
S3 Best Practices

Cost

Some cloud services, such as AWS Lambda and Azure Functions, are effectively free at low usage, but others aren't. Therefore, please be mindful of the cost of the resources you create. The considerations should include:
  • The size of the resource (e.g. for a VM, it's the number of CPUs and size of RAM; for disks, it's the type/speed and capacity)
  • Whether the resource is charged all the time (see below)
  • Whether the resource can be scaled down / turned off, when not in use
  • Whether there are more cost-effective ways of achieving the same thing
This does not mean you shouldn't try or use certain services; it just means being mindful of the consequences.
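
A quick back-of-the-envelope calculation shows why scaling down or turning off matters. This sketch assumes a VM whose compute is billed per running hour and whose disk is billed per GB-month whether or not the VM is running; the rates are made-up placeholders, not real AWS or Azure prices, so take real figures from the providers' pricing pages.

```python
HOURS_PER_MONTH = 730  # common approximation: 24 * 365 / 12 = 730


def monthly_vm_cost(hourly_rate, disk_gb, disk_rate_per_gb_month, running_hours=HOURS_PER_MONTH):
    """Estimate one month's cost: compute only while running, storage all month."""
    compute = hourly_rate * running_hours
    storage = disk_gb * disk_rate_per_gb_month  # billed even when the VM is stopped
    return round(compute + storage, 2)


if __name__ == "__main__":
    # Placeholder rates: 0.10/hour compute, 100 GB disk at 0.10/GB-month.
    always_on = monthly_vm_cost(0.10, 100, 0.10)
    # Working hours only: 10 hours/day on 22 working days = 220 hours.
    office_hours = monthly_vm_cost(0.10, 100, 0.10, running_hours=10 * 22)
    print(f"always on: {always_on:.2f}, office hours only: {office_hours:.2f}")
```

With these placeholder numbers the VM costs 83.00 a month if left running but 32.00 if it only runs during office hours, and the 10.00 storage charge remains even when it is switched off entirely.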

AWS and Azure provide comprehensive pricing pages and calculators to help with this, for example for VMs.
As mentioned above, the pricing models for cloud resources vary. Here are a few different types:
  • Charged whilst running - VMs are a good example of this (but their storage is usually billable even when the VM is shut down)
  • Charged when used - Lambdas / Functions are a good example
  • Charged until destroyed - RDS is a good example
It is also worth checking whether any free credits are available.
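
The three charging models listed above can be compared with a small sketch. The enum, function and rates are hypothetical and deliberately simplified (real bills often combine several models, e.g. a VM's compute plus its always-billed storage); the point is which quantity drives the cost in each model.

```python
from enum import Enum


class Billing(Enum):
    WHILST_RUNNING = "charged whilst running"    # e.g. VM compute
    WHEN_USED = "charged when used"              # e.g. Lambda / Functions
    UNTIL_DESTROYED = "charged until destroyed"  # e.g. RDS, per the list above


def period_cost(model, rate, hours_existing, hours_running=0.0, units_used=0.0):
    """Cost for one period under a simplified billing model.

    rate is per hour for the time-based models and per unit of usage
    (e.g. per batch of requests) for WHEN_USED.
    """
    if model is Billing.WHILST_RUNNING:
        return rate * hours_running
    if model is Billing.WHEN_USED:
        return rate * units_used
    if model is Billing.UNTIL_DESTROYED:
        return rate * hours_existing
    raise ValueError(f"unknown billing model: {model}")


if __name__ == "__main__":
    # A resource that exists all month (730 h) but is only in use for 220 h.
    for model in Billing:
        cost = period_cost(model, rate=0.10, hours_existing=730, hours_running=220, units_used=5)
        print(f"{model.value}: {cost:.2f}")
```

For a resource billed until destroyed, stopping it is not enough: the only way to stop the charges is to delete it once it is no longer required.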