Our online education site edu2.0 currently has 41,000 users, and we're projecting at least 100,000 users by the end of 2009. As a result, we've decided to migrate our user file storage to Amazon S3. Structured data such as user account information, lessons, quizzes, and quiz results is still stored in MySQL, but user files such as videos, audio files, and PowerPoint presentations are moving to S3.
We currently have 80,000 user files which take up a total of 30GB. The S3 pricing model means that we'll be paying a total of about $5 a month for storage and a few more dollars a month for serving up those files. So the cost to us of moving to S3 is minimal.
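As a rough check on the storage figure, the arithmetic can be sketched as below. The per-GB rate of $0.15 a month is an assumption for illustration (the rate isn't stated here), and the method name is made up, so substitute the current published S3 rate before relying on the result.

```ruby
# Rough monthly S3 storage cost estimate.
# ASSUMPTION: the default rate of $0.15 per GB-month is illustrative only;
# check the current S3 price list for the real figure.
def monthly_storage_cost(total_gb, rate_per_gb = 0.15)
  (total_gb * rate_per_gb).round(2)
end

total_gb   = 30      # total size of our user files
file_count = 80_000  # number of user files

puts "Estimated storage cost: $#{'%.2f' % monthly_storage_cost(total_gb)} per month"
puts "Average file size: #{(total_gb * 1024.0 / file_count).round(2)} MB"
```

Transfer and request charges are billed separately and are not included in this estimate.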
Here's a synopsis of what we've done so far to move to S3.
1. We signed up with Amazon AWS.
2. We use the free Cloudberry Explorer as a user interface into S3.
3. We use the free AWS::S3 Ruby gem to access the S3 API from Ruby.
4. We built a "StorageSystem" abstraction with two concrete implementations, "FileSystem" and "AmazonS3". We migrated our code base over to use the FileSystem implementation of this abstraction in preparation for moving to the AmazonS3 implementation.
5. Our system already maintains a table that stores information for every user file, including name, size, owner, and access permissions. We added a couple of new boolean fields to record whether the file was stored via FileSystem, AmazonS3, or both. We set FileSystem=true and AmazonS3=false for all existing files.
6. We wrote a script that periodically copies a batch of 1,000 user files from our file system to S3. For each successful copy, we set AmazonS3=true in the database. This approach allowed us to incrementally copy all 80,000 files from the file system to S3 in a couple of days.
7. Before moving 100% to S3, we added a little code so that when a user uploads a file, it is stored in both the file system *and* Amazon S3. This allowed us to make sure that uploading to S3 was working well while still relying on the existing file system for retrieval.
8. All user files were already retrieved via an HTTP GET to our Rails servers. In our original file-based implementation, we look up information about the file in our database and check access permissions. If access is granted, we use the Rails send_file method to stream the file to the user. In the S3 implementation, we check the permissions and if access is granted, we obtain a signed URL for the S3 copy of the file and then redirect the HTTP GET request to the Amazon S3 URL. By default, the signed URL expires after 5 minutes. One nice thing about the S3 implementation is that our servers no longer have to serve up files; all they do is check access permissions and redirect to S3.
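Steps 4 through 8 above can be sketched roughly as follows. This is not our production code: every class, method, and field name is hypothetical, both backends are in-memory stand-ins so the sketch runs without an S3 account, and the URL signer is passed in as a plain callable rather than a real call into the S3 library.

```ruby
# Illustrative sketch only: every name here is hypothetical, and both
# backends are in-memory stand-ins so the example runs without S3 access.

# Step 4: the shared interface that each storage backend implements.
class StorageSystem
  def store(name, data)
    raise NotImplementedError
  end

  def fetch(name)
    raise NotImplementedError
  end
end

# A Hash-backed backend used as a stand-in for real storage.
class InMemoryStorage < StorageSystem
  def initialize
    @files = {}
  end

  def store(name, data)
    @files[name] = data
  end

  def fetch(name)
    @files.fetch(name) # raises KeyError if the file is missing
  end
end

class FileSystem < InMemoryStorage; end # a real version would use local disk
class AmazonS3 < InMemoryStorage; end   # a real version would call the S3 API

# Step 5: one row per user file, with a boolean flag per backend.
UserFile = Struct.new(:name, :in_file_system, :in_amazon_s3)

# Step 6: copy up to batch_size not-yet-migrated files to S3 and flag
# each success. Returns the number of files copied in this run.
def migrate_batch(records, file_system, s3, batch_size = 1000)
  pending = records.select { |r| r.in_file_system && !r.in_amazon_s3 }
  copied = 0
  pending.first(batch_size).each do |record|
    begin
      s3.store(record.name, file_system.fetch(record.name))
      record.in_amazon_s3 = true
      copied += 1
    rescue StandardError => e
      warn "Copy failed for #{record.name}: #{e.message}" # picked up next run
    end
  end
  copied
end

# Step 7: during the transition, write each upload to both backends.
def upload(records, name, data, file_system, s3)
  file_system.store(name, data)
  record = UserFile.new(name, true, false)
  begin
    s3.store(name, data)
    record.in_amazon_s3 = true
  rescue StandardError => e
    warn "S3 upload failed for #{name}: #{e.message}" # file system copy remains
  end
  records << record
  record
end

# Step 8: decide how to answer a GET once permissions have been checked.
# url_signer is any callable that returns a signed, expiring URL.
def retrieval_action(record, access_granted, url_signer)
  return [:forbidden, nil] unless access_granted
  return [:redirect, url_signer.call(record.name)] if record.in_amazon_s3
  [:send_file, record.name]
end
```

Because a file is only flagged after its copy succeeds, the batch script can be rerun safely: a failed copy stays pending and is picked up by a later run, and retrieval keeps falling back to the file system until the S3 flag is set.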
So far, everything is going great. It took a total of about 10 hours to go through the learning curve, add the file system abstraction, write the scripts, and add the migration code. If we don't find any glitches, we'll switch over to S3 for all operations in a week or so. I'll follow up with a final report.