Make sure your cloud application is not brittle: make its components more resistant to failure. Engineers design bridges to withstand more traffic than their largest anticipated load, and your application deserves a similar margin of safety. Since you can add and remove resources in a cloud computing environment, your margin of safety can expand or contract as your load expands or contracts. Nonetheless, adding and removing resources is not instantaneous, so you have to make sure that your system can handle a "normal load" with the resources it already has.
How do you determine your margin of safety?
Look at every resource you use in the system: database sizes, bandwidth, virtual memory, CPU, network latencies, and the response times of your software and your third-party components. Measure how they respond over time under various types of commands, reports, and queries.
Because handling failure carries economic costs and possible performance hits, you want to ensure that your application can handle the load in its normal state of operations. You might want to factor in some likely scenarios and provision more resources than would ordinarily be needed.
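As a rough illustration of the sizing arithmetic described above, here is a minimal Python sketch. The function names, the idea of expressing the margin as a fraction of the observed peak, and the numbers in the usage note are all assumptions made for this example; the right margin for a real system depends on your own measurements.

```python
import math


def provisioned_capacity(peak_load: float, safety_margin: float) -> int:
    """Capacity units to provision for an observed peak load.

    safety_margin is a fraction of the peak (0.5 means 50% extra).
    Both names are hypothetical, chosen only for this sketch.
    """
    if peak_load < 0 or safety_margin < 0:
        raise ValueError("peak_load and safety_margin must be non-negative")
    # Round up: a fractional unit of capacity still requires a whole unit.
    return math.ceil(peak_load * (1.0 + safety_margin))


def headroom(capacity: float, current_load: float) -> float:
    """Fraction of capacity still unused; a negative value means overload."""
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    return (capacity - current_load) / capacity
```

For instance, if the measured peak is 1000 requests per second and you choose a 50% margin, `provisioned_capacity(1000, 0.5)` gives 1500. Watching `headroom` shrink over time is one way to notice that new resources should be requested before the margin is gone, since adding them is not instantaneous.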
Make sure all errors are handled, even unlikely ones. Return clear error codes that indicate, to the best of your ability, what the problem is. When problems occur, you might degrade performance or turn off nonessential features rather than eliminate functionality altogether. Determine what functionality is essential and what is not. During the Amazon outage last year, Netflix turned off personalized movie lists, but you could still get lists of movies and play them.
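The pattern of keeping essential functionality while dropping a nonessential feature can be sketched in Python as follows. This is only an illustration under stated assumptions: the service functions, the page structure, and the error code string are hypothetical and do not describe how Netflix or any other company actually implements it.

```python
def fetch_personalized(user_id):
    # Hypothetical nonessential service; it always fails here to simulate an outage.
    raise ConnectionError("recommendation service unavailable")


def fetch_catalog():
    # Hypothetical essential service: the plain list of movies.
    return ["Movie A", "Movie B"]


def home_page(user_id, personalized=fetch_personalized, catalog=fetch_catalog):
    """Build the page, degrading instead of failing when a nonessential part is down."""
    # Essential content is fetched first; if this fails, the error propagates.
    page = {"catalog": catalog(), "personalized": [], "degraded": False}
    try:
        page["personalized"] = personalized(user_id)
    except (ConnectionError, TimeoutError):
        # Nonessential feature failed: keep serving, and report a clear error code.
        page["degraded"] = True
        page["error_code"] = "RECOMMENDATIONS_UNAVAILABLE"
    return page
```

Calling `home_page("user-1")` with the failing default still returns the catalog, with `degraded` set and an explicit error code, so the caller can tell exactly which feature was switched off.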
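Make reasonable SLA promises to your customers. Not every operation has to finish while the user waits. Amazon, for example, sends confirmation emails for book orders, so the UI does not have to wait for order processing to complete and can scale properly.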
A chain is only as strong as its weakest link. If your web front end has limited capacity, or you run out of TCP/IP ports, it does not matter how strong your database server is.
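To make the weakest-link point concrete, here is a small Python sketch that finds the limiting component in a request path. It assumes every component's capacity has been measured in the same unit (for example, requests per second); the function name and the component names and figures in the usage note are made up for illustration.

```python
from typing import Dict, Tuple


def bottleneck(capacities: Dict[str, float]) -> Tuple[str, float]:
    """Return the (name, capacity) of the component that limits the whole system.

    Assumes all capacities are expressed in the same unit.
    """
    if not capacities:
        raise ValueError("at least one component is required")
    # The system as a whole can sustain no more than its smallest capacity.
    name = min(capacities, key=capacities.get)
    return name, capacities[name]
```

With hypothetical figures such as `{"web_front_end": 500, "tcp_ports": 2000, "database": 10000}`, the result is `("web_front_end", 500)`: the database's extra strength does not raise the system's limit, so the margin of safety has to be applied to the front end first.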
Use a Margin of Safety when determining the resources needed for your application.