Migrate a Rails Project from Paperclip to ActiveStorage
Intro
Paperclip was a popular gem in the Rails world that provided file upload capabilities for Rails projects. The gem was deprecated shortly after the release of Rails 5.2, which included a similar built-in capability known as ActiveStorage. This guide is intended to help with migrating a Paperclip-enabled Rails project to ActiveStorage (specifically with Amazon S3 storage).
Summary
The Paperclip to ActiveStorage migration requires two Git branches, which need to be deployed in separate phases to your production/staging environment. Here is a summary of what this guide covers:
Branch 1
- Enable Active Storage
- Migrate Paperclip Data
Branch 2
- Update Config to use ActiveStorage
- Update Models, Views, and Controllers
- Migrate Attachment Files
- Code Cleanup
Deploy the Migration
- Test!
- Deploy Branch 1
- Deploy Branch 2
Assumptions
This guide assumes the following:
- Your project is running on or can easily be upgraded to Rails 5.2 (Not tested in Rails 6)
- Your project uses Postgres as the database (Not tested on other databases)
- Your project uses Paperclip 6.0.0
- Your project uses Amazon S3 for Paperclip storage (although the steps can easily be modified for other services or local storage)
- You are responsible for appropriately testing your codebase before, during, and after going down this migration path :)
Let's Migrate!
This migration will require 2 branches and a phased deployment to production/staging. In the first branch, ActiveStorage will be enabled and the existing Paperclip data will be migrated into ActiveStorage format. Because the migration script relies on Paperclip functions, this branch will need to be kept separate and deployed first before Paperclip is removed from the project.
In the second branch, ActiveStorage will be activated, models and views will be updated, and the actual attachments will be copied to the ActiveStorage directory structure.
Create Branch 1
1. Enable ActiveStorage
The first step to migrating to ActiveStorage is to enable it in the project. This consists of making sure it's included in the application config, as well as running a rake task to generate a migration which will create the tables necessary for ActiveStorage.
Include ActiveStorage in Application Config
# config/application.rb
require 'active_storage/engine'
Alternatively, if your application config contains:
# config/application.rb
require 'rails/all'
... then there's nothing you need to change here, you've already included all Rails components :)
Generate and Run ActiveStorage Migration
$ rails active_storage:install
This will generate a migration for the two tables needed by ActiveStorage:
- active_storage_attachments - holds information about how the attachment relates to a model
- active_storage_blobs - holds information about the attachment itself, such as file type, filename, checksum, etc.
Now, run this migration to create the tables.
$ rails db:migrate
Now that the structure for ActiveStorage is in place, the next step will be to move the data produced by Paperclip file uploads to the new ActiveStorage structure.
2. Migrate Paperclip Data
This step will consist of creating and running a Rake task which will copy all of the data generated from Paperclip file uploads into the new format required by ActiveStorage.
While Paperclip adds attachment columns directly to the model tables (such as avatar_file_name, avatar_content_type, etc.), ActiveStorage stores this information in the two dedicated tables generated in step 1.
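To make that column-to-table mapping concrete, here is a minimal plain-Ruby sketch (no Rails needed) of how one record's Paperclip columns turn into one active_storage_blobs row and one active_storage_attachments row, using the same Rails 5.2 column names as the rake task below. The helper name paperclip_to_active_storage_rows, the sample record, and the checksum value are made up for illustration.

```ruby
require 'securerandom'
require 'time'

# Hypothetical helper: turn one record's Paperclip columns (given as a Hash)
# into the two rows the data-migration task inserts. blob_id is left out of
# the attachment row because the database assigns it on insert.
def paperclip_to_active_storage_rows(record_type:, record_id:, name:, columns:, checksum:)
  created_at = columns.fetch('updated_at').iso8601
  blob = {
    'key' => SecureRandom.uuid, # same key strategy as the rake task's key helper
    'filename' => columns.fetch("#{name}_file_name"),
    'content_type' => columns.fetch("#{name}_content_type"),
    'metadata' => '{}',
    'byte_size' => columns.fetch("#{name}_file_size"),
    'checksum' => checksum,
    'created_at' => created_at
  }
  attachment = {
    'name' => name,
    'record_type' => record_type,
    'record_id' => record_id,
    'created_at' => created_at
  }
  [blob, attachment]
end

row = {
  'id' => 102,
  'receipt_file_name' => 'receipt.png',
  'receipt_content_type' => 'image/png',
  'receipt_file_size' => 2048,
  'updated_at' => Time.utc(2019, 7, 2, 12, 0, 0)
}
blob, attachment = paperclip_to_active_storage_rows(
  record_type: 'Transaction', record_id: row['id'], name: 'receipt',
  columns: row, checksum: 'XUFAKrxLKna5cZ2REBfFkg=='
)
puts blob['filename']          # receipt.png
puts attachment['record_type'] # Transaction
```

The Paperclip attachment name ("receipt") becomes the name column, and the model's class name becomes record_type, which is how ActiveStorage finds the attachment again later.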
Create a Rake Task
In the lib/tasks folder, create a new file named migrate_paperclip_data.rake.
Paste the following code into the file:
# lib/tasks/migrate_paperclip_data.rake
require 'open-uri'
require 'net/http'
require 'digest'
namespace :migrate_paperclip do
  desc 'Migrate the paperclip data'
  task move_data: :environment do
    # Prepare the insert statements
    prepare_statements
    # Eager load the application so that all Models are available
    Rails.application.eager_load!
    # Get a list of all the models in the application
    models = ActiveRecord::Base.descendants.reject(&:abstract_class?)
    # Loop through all the models found
    models.each do |model|
      puts 'Checking Model [' + model.to_s + '] for Paperclip attachment columns ...'
      # If the model has a column or columns named *_file_name,
      # we are assuming this is a column added by Paperclip.
      # Store the name of the attachment(s) found (e.g. "avatar") in an array named attachments
      attachments = model.column_names.map do |c|
        Regexp.last_match(1) if c =~ /(.+)_file_name$/
      end.compact
      # If no Paperclip columns were found in this model, go to the next model
      if attachments.blank?
        puts '  No Paperclip attachment columns found for [' + model.to_s + '].'
        puts ''
        next
      end
      puts '  Paperclip attachment columns found for [' + model.to_s + ']: ' + attachments.to_s
      # Loop through the records of the model, and then through each attachment definition within the model
      model.find_each do |instance|
        attachments.each do |attachment|
          # If the model record doesn't have an uploaded attachment, skip to the next record
          next if instance.send(attachment).path.blank?
          # Otherwise, we will convert the Paperclip data to ActiveStorage records
          create_active_storage_records(instance, attachment, model)
        end
      end
      puts ''
    end
  end
end
private
def prepare_statements
  # Get the id of the last record inserted into active_storage_blobs
  # This will be used in the insert to active_storage_attachments
  # Postgres
  get_blob_id = 'LASTVAL()'
  # mariadb
  # get_blob_id = 'LAST_INSERT_ID()'
  # sqlite
  # get_blob_id = 'LAST_INSERT_ROWID()'
  # Prepare two insert statements for the new ActiveStorage tables
  # (raw_connection.prepare with named statements and $1 placeholders is specific
  # to the pg gem, so other databases need more changes than the line above)
  ActiveRecord::Base.connection.raw_connection.prepare('active_storage_blob_statement', <<-SQL)
    INSERT INTO active_storage_blobs (
      key, filename, content_type, metadata, byte_size, checksum, created_at
    ) VALUES ($1, $2, $3, '{}', $4, $5, $6)
  SQL
  ActiveRecord::Base.connection.raw_connection.prepare('active_storage_attachment_statement', <<-SQL)
    INSERT INTO active_storage_attachments (
      name, record_type, record_id, blob_id, created_at
    ) VALUES ($1, $2, $3, #{get_blob_id}, $4)
  SQL
end
def create_active_storage_records(instance, attachment, model)
  # String interpolation (rather than +) avoids a TypeError if a column is nil
  puts "  Creating ActiveStorage records for [#{model.name} (ID: #{instance.id})] " \
       "#{instance.send("#{attachment}_file_name")} (#{instance.send("#{attachment}_content_type")})"
  build_active_storage_blob(instance, attachment)
  build_active_storage_attachment(instance, attachment, model)
end
def build_active_storage_blob(instance, attachment)
  # Set the values for the new ActiveStorage records based on the data from Paperclip's fields
  # for active_storage_blobs
  created_at = instance.updated_at.iso8601
  blob_key = key(instance, attachment)
  filename = instance.send("#{attachment}_file_name")
  content_type = instance.send("#{attachment}_content_type")
  file_size = instance.send("#{attachment}_file_size")
  file_checksum = checksum(instance.send(attachment))
  blob_values = [blob_key, filename, content_type, file_size, file_checksum, created_at]
  # Insert the converted blob record into active_storage_blobs
  insert_record('active_storage_blob_statement', blob_values)
end
def build_active_storage_attachment(instance, attachment, model)
  # Set the values for the new ActiveStorage records based on the data from Paperclip's fields
  # for active_storage_attachments
  created_at = instance.updated_at.iso8601
  blob_name = attachment
  record_type = model.name
  record_id = instance.id
  attachment_values = [blob_name, record_type, record_id, created_at]
  # Insert the converted attachment record into active_storage_attachments
  insert_record('active_storage_attachment_statement', attachment_values)
end
def insert_record(statement, values)
  ActiveRecord::Base.connection.raw_connection.exec_prepared(
    statement,
    values
  )
end
def key(_instance, _attachment)
  # Get a new key
  SecureRandom.uuid
  # Alternatively:
  # instance.send("#{attachment}").path
end
def checksum(attachment)
  # Get a checksum for the file (required for ActiveStorage)
  # local files stored on disk:
  # url = "#{Rails.root}/public/#{attachment.path}"
  # Digest::MD5.base64digest(File.read(url))
  # remote files stored on a cloud service:
  url = attachment.url
  Digest::MD5.base64digest(Net::HTTP.get(URI(url)))
end
The rake task code is quite long, but I've commented it thoroughly so it is clear how it all works. I have also added some logging messages so that useful information is displayed while the task runs.
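The checksum helper in the task reads each whole file into memory before hashing it with Digest::MD5.base64digest. That is fine for small attachments, but if some of yours are large, here is a minimal sketch of a chunked variant that produces the same Base64-encoded MD5 value. The helper name streaming_checksum and the 5 MB default chunk size are my own choices, not part of the task above.

```ruby
require 'digest'
require 'stringio'

# Hypothetical helper: compute the Base64-encoded MD5 digest that
# active_storage_blobs.checksum expects, reading any IO (File, Tempfile,
# StringIO, ...) in fixed-size chunks instead of all at once.
def streaming_checksum(io, chunk_size: 5 * 1024 * 1024)
  digest = Digest::MD5.new
  # IO#read(n) returns nil at end of file, which ends the loop
  while (chunk = io.read(chunk_size))
    digest.update(chunk)
  end
  digest.base64digest
end

data = 'hello' * 1000
puts streaming_checksum(StringIO.new(data), chunk_size: 7) == Digest::MD5.base64digest(data) # true
puts streaming_checksum(StringIO.new('hello')) # XUFAKrxLKna5cZ2REBfFkg==
```

For remote files you would still need to download the file first (for example to a Tempfile) and then pass that IO in.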
Run the Rake Task
Next, run the rake task from a terminal window:
$ rails migrate_paperclip:move_data
Checking Model [User] for Paperclip attachment columns ...
No Paperclip attachment columns found for [User].
Checking Model [Account] for Paperclip attachment columns ...
No Paperclip attachment columns found for [Account].
Checking Model [Transaction] for Paperclip attachment columns ...
Paperclip attachment columns found for [Transaction]: ["receipt"]
Creating ActiveStorage records for [Transaction (ID: 102)] 2019-07-02_Dusty_Lindgren.png (image/png)
Creating ActiveStorage records for [Transaction (ID: 103)] 2019-07-08_Mac_Yost.png (image/png)
Creating ActiveStorage records for [Transaction (ID: 104)] 2019-06-22_Mrs._Kerrie_Anderson.png (image/png)
Creating ActiveStorage records for [Transaction (ID: 105)] 2019-07-08_Elvis_O'Conner.png (image/png)
Creating ActiveStorage records for [Transaction (ID: 106)] 2019-07-19_Jessica_Kunde.png (image/png)
As mentioned above, the rake task logs its progress, as seen here. It goes through each model of the application, looks for Paperclip attachment columns, and copies any attachment information it finds to the new ActiveStorage tables. If you check the two ActiveStorage tables using a DB client, you should now see the Paperclip data in ActiveStorage format.
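The "look for a Paperclip attachment" step is only a naming heuristic: any column ending in _file_name counts. Here is a minimal sketch of that check as a standalone function, plus an optional stricter mode that also requires the matching _content_type column that Paperclip's migration adds. The function name paperclip_attachment_names and the strict option are hypothetical, not something the rake task uses.

```ruby
# Hypothetical helper mirroring the column check in both rake tasks: a column
# ending in _file_name is taken to be a Paperclip attachment column. With
# strict: true, the matching _content_type column must also exist, which
# filters out unrelated columns that happen to end in _file_name.
def paperclip_attachment_names(column_names, strict: false)
  names = column_names.map { |c| c[/\A(.+)_file_name\z/, 1] }.compact
  return names unless strict

  names.select { |name| column_names.include?("#{name}_content_type") }
end

columns = %w[id amount receipt_file_name receipt_content_type receipt_file_size
             receipt_updated_at export_file_name created_at]
p paperclip_attachment_names(columns)               # ["receipt", "export"]
p paperclip_attachment_names(columns, strict: true) # ["receipt"]
```

In a Rails console you could run it against a real model, e.g. paperclip_attachment_names(Transaction.column_names), to preview which attachments the task will pick up.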
Now that the ActiveStorage data is in place, the next step is to create a new branch (from the current branch), and update the application config, models, and views to use ActiveStorage.
Create Branch 2
In the second branch, the config files, models, and views will be updated to use ActiveStorage. Additionally, the attachments will need to be relocated to a new directory path structure on the Amazon S3 bucket.
1. Update Config Files
To make the Rails app use ActiveStorage, the config files need to be updated as follows. As stated before, this guide focuses specifically on S3 storage, so adjust to your needs.
Add Storage Definitions to storage.yml
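Replace the values to match however you store your application secrets. I have a separate S3 bucket set up for development and test instances, which is why there are two storage definitions here. ActiveStorage's S3 service also needs the aws-sdk-s3 gem in your Gemfile; Paperclip 6 uses the same gem for S3, so it is probably there already, but make sure it stays when Paperclip is removed.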
# config/storage.yml
amazondev:
  service: S3
  access_key_id: <%= ENV['AWS_ACCESS_KEY_ID'] %>
  secret_access_key: <%= ENV['AWS_SECRET_ACCESS_KEY'] %>
  region: <%= ENV['S3_REGION'] %>
  bucket: <%= ENV['S3_BUCKET_NAME_DEV'] %>
amazon:
  service: S3
  access_key_id: <%= ENV['AWS_ACCESS_KEY_ID'] %>
  secret_access_key: <%= ENV['AWS_SECRET_ACCESS_KEY'] %>
  region: <%= ENV['S3_REGION'] %>
  bucket: <%= ENV['S3_BUCKET_NAME'] %>
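Because every value in this file comes from an environment variable, a missing variable silently becomes a blank setting. Here is a minimal pre-deploy check, assuming the file only uses plain ENV lookups in its ERB tags: it renders the template with ERB, parses it as YAML, and lists any required S3 setting that came out empty. The constant names and the blank_s3_settings helper are hypothetical.

```ruby
require 'erb'
require 'yaml'

# Hypothetical pre-deploy check: render a storage.yml-style template through
# ERB (filling in the <%= ENV[...] %> tags from the current environment),
# parse the result as YAML, and list any S3 setting that came out blank.
REQUIRED_S3_KEYS = %w[access_key_id secret_access_key region bucket].freeze

STORAGE_YML = <<~'STORAGE'
  amazondev:
    service: S3
    access_key_id: <%= ENV['AWS_ACCESS_KEY_ID'] %>
    secret_access_key: <%= ENV['AWS_SECRET_ACCESS_KEY'] %>
    region: <%= ENV['S3_REGION'] %>
    bucket: <%= ENV['S3_BUCKET_NAME_DEV'] %>
  amazon:
    service: S3
    access_key_id: <%= ENV['AWS_ACCESS_KEY_ID'] %>
    secret_access_key: <%= ENV['AWS_SECRET_ACCESS_KEY'] %>
    region: <%= ENV['S3_REGION'] %>
    bucket: <%= ENV['S3_BUCKET_NAME'] %>
STORAGE

def blank_s3_settings(template, service_name)
  config = YAML.safe_load(ERB.new(template).result)
  service = config.fetch(service_name)
  # A missing ENV variable renders as an empty value, which YAML parses as nil
  REQUIRED_S3_KEYS.select { |key| service[key].to_s.strip.empty? }
end

%w[amazondev amazon].each do |name|
  missing = blank_s3_settings(STORAGE_YML, name)
  puts missing.empty? ? "#{name}: OK" : "#{name}: missing #{missing.join(', ')}"
end
```

On a server you could pass File.read('config/storage.yml') instead of the inline copy. If your storage.yml also contains Rails-specific ERB (such as Rails.root in a local disk service), this plain-Ruby check won't work as-is.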
Update Environment Config Files
Next, the environment config files need to be updated to use the storage definitions from storage.yml. In this example, test.rb is identical to development.rb, so it's omitted from the guide.
# config/environments/development.rb
config.active_storage.service = :amazondev
# # Paperclip settings
# config.paperclip_defaults = {
#   storage: :s3,
#   s3_credentials: {
#     bucket: ENV['S3_BUCKET_NAME_DEV'],
#     access_key_id: ENV['AWS_ACCESS_KEY_ID'],
#     secret_access_key: ENV['AWS_SECRET_ACCESS_KEY'],
#     s3_region: ENV['S3_REGION'],
#     s3_host_name: ENV['S3_HOST_NAME']
#   },
#   s3_protocol: 'https'
# }
# config/environments/production.rb
config.active_storage.service = :amazon
# # Paperclip settings
# config.paperclip_defaults = {
#   storage: :s3,
#   s3_credentials: {
#     bucket: ENV['S3_BUCKET_NAME'],
#     access_key_id: ENV['AWS_ACCESS_KEY_ID'],
#     secret_access_key: ENV['AWS_SECRET_ACCESS_KEY'],
#     s3_region: ENV['S3_REGION'],
#     s3_host_name: ENV['S3_HOST_NAME']
#   },
#   s3_protocol: 'https'
# }
Note that the Paperclip config settings have been commented out as they are being replaced by the ActiveStorage config. They can be safely removed later after testing.
Now that ActiveStorage has been activated, it's time to update the application's Models, Views, and Controllers.
2. Update Models, Views, and Controllers
Please refer to the ActiveStorage documentation for full details, but generally speaking, this is a straightforward process in which you should be changing the following pieces of code:
Models
has_attached_file becomes has_one_attached
has_attached_file :attachment
has_one_attached :attachment
ActiveStorage does not presently support native validations, though workarounds can be found with some simple searching. If you are using Paperclip's validates_attachment_content_type, you will want to either disable it for now or look into a workaround before proceeding.
If you are using multiple attachments in Paperclip via a helper Model/Controller, ActiveStorage provides an alternative that eliminates the need for a helper Model:
has_many_attached :attachments
Views
exists? becomes attached?
@model.attachment.exists?
@model.attachment.attached?
url(:size) becomes variant(resize: "dimensions")
@model.attachment.url(:thumb)
@model.attachment.variant(resize: "100x100")
.url becomes url_for()
@model.attachment.url
url_for(@model.attachment)
There are some other handy fields you can reference, such as filename and byte_size:
@model.attachment.filename
@model.attachment.byte_size
Controllers
There shouldn't be anything to update in a basic Controller as long as the attachment is listed in the strong parameters:
def user_params
  params.require(:user).permit(:username, :email, :avatar)
end
For a has_many_attached attachment, permit it as an array instead, e.g. permit(attachments: []).
After updating the models, views, and controllers to use ActiveStorage, the Paperclip attachments need to be copied to a directory structure that ActiveStorage understands.
3. Migrate Attachments
A new rake task is needed to move the attachments from Paperclip's directory structure to ActiveStorage's. It reads the attachment information from the Paperclip fields, generates the Amazon S3 URL for each record's attachment, and then copies the file to its new ActiveStorage location on S3.
In the lib/tasks folder, create a new file named migrate_paperclip_attachments.rake.
Paste the following code into this file:
# frozen_string_literal: true
# lib/tasks/migrate_paperclip_attachments.rake
require 'open-uri'
namespace :migrate_paperclip do
  desc 'Migrate the paperclip attachments'
  task move_attachments: :environment do
    # Eager load the application so that all Models are available
    Rails.application.eager_load!
    # Get a list of all the models in the application
    models = ActiveRecord::Base.descendants.reject(&:abstract_class?)
    # Loop through all the models found
    models.each do |model|
      puts 'Checking Model [' + model.to_s + '] for Paperclip attachment columns ...'
      errs = []
      err_ids = []
      # If the model has a column or columns named *_file_name,
      # we are assuming this is a column added by Paperclip.
      # Store the name of the attachment(s) found (e.g. "avatar") in an array named attachments
      attachments = model.column_names.map do |c|
        Regexp.last_match(1) if c =~ /(.+)_file_name$/
      end.compact
      # For each attachment on the model, migrate the attachments
      attachments.each do |attachment|
        migrate_attachment(attachment, model, errs, err_ids)
      end
      next if errs.empty?
      # Display records that have errors
      puts ''
      puts 'Errored attachments:'
      puts ''
      errs.each do |err|
        puts err
      end
      # Display list of errored attachment IDs
      puts ''
      puts 'Errored attachments list of IDs (use for SQL statements)'
      puts err_ids.join(',')
      puts ''
    end
  end
end
private
def migrate_attachment(attachment, model, errs, err_ids)
  # Set the S3 Bucket based on environment
  bucket = Rails.env.production? ? ENV['S3_BUCKET_NAME'] : ENV['S3_BUCKET_NAME_DEV']
  region = ENV['S3_REGION']
  model.where.not("#{attachment}_file_name": nil).find_each do |instance|
    # Set attachment details
    instance_id = instance.id
    filename = instance.send("#{attachment}_file_name")
    extension = File.extname(filename)
    content_type = instance.send("#{attachment}_content_type")
    # File.basename avoids mangling names that have no extension (gsub('') would)
    original = CGI.unescape("#{File.basename(filename, extension)}_original#{extension}")
    puts "  [#{model.name} (ID: #{instance_id})] Copying to ActiveStorage location: #{original}"
    # Paperclip stores attachments in a directory structure such as:
    # 000/000/001 = Instance ID 1
    # 000/050/250 = Instance ID 50250
    # 999/999/999 = Instance ID 999999999
    # We need to build the appropriate path to get the correct URL for the attachment
    instance_path = instance_id.to_s.rjust(9, '0')
    instance_path = instance_path.scan(/.{1,3}/).join('/')
    # Build the S3 URL (Paperclip's :class segment is the underscored, pluralized class name)
    url = "https://#{bucket}.s3.#{region}.amazonaws.com/#{model.name.underscore.pluralize}/#{attachment.pluralize}/#{instance_path}/original/#{filename}"
    # puts '  ' + url
    # Copy the original Paperclip attachment to its new ActiveStorage location
    # For debugging/testing purposes, comment out this section and print the URL to log to verify the correctness
    begin
      instance.send(attachment.to_sym).attach(
        # URI#open (from open-uri) works on Ruby versions where Kernel#open no longer opens URLs
        io: URI.parse(url).open,
        filename: filename,
        content_type: content_type
      )
    rescue StandardError => e
      puts '  ... error! ...'
      errs.push("[#{model.name}][#{attachment}] - #{instance_id} - #{e}")
      err_ids.push(instance_id)
    end
  end
end
Again, I have commented this code so it's easier to understand. This rake task loops through all models in the application and, for each model containing Paperclip attachment columns, searches the model's records for those with attachments. For each such record, it generates the URL of the original attachment and re-attaches that file to the record using the now-enabled ActiveStorage methods. This downloads the file from its Paperclip path and uploads it again under a new ActiveStorage key; the Paperclip copy is left in place.
You may need to modify the URL portion depending on your application environment and S3 settings.
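If you'd like to check the URL logic before copying anything, here is a standalone sketch of just the path-building part of the task, runnable in plain Ruby. The helper names id_partition and paperclip_original_url are hypothetical, and the sketch assumes Paperclip's default S3 path (:class/:attachment/:id_partition/:style/:filename), integer IDs below 1,000,000,000, and the same virtual-hosted S3 URL format as the task. Inside the app, class_path and attachment_path would come from model.name.underscore.pluralize and attachment.pluralize.

```ruby
# Hypothetical helpers reproducing the URL logic of the move_attachments task
# so it can be checked in isolation (integer IDs below 1,000,000,000 only).
def id_partition(id)
  # 50250 -> "000050250" -> ["000", "050", "250"] -> "000/050/250"
  format('%09d', id).scan(/\d{3}/).join('/')
end

def paperclip_original_url(bucket:, region:, class_path:, attachment_path:, id:, filename:)
  "https://#{bucket}.s3.#{region}.amazonaws.com/" \
    "#{class_path}/#{attachment_path}/#{id_partition(id)}/original/#{filename}"
end

puts id_partition(1)     # 000/000/001
puts id_partition(50250) # 000/050/250
puts paperclip_original_url(bucket: 'my-bucket', region: 'us-east-1',
                            class_path: 'transactions', attachment_path: 'attachments',
                            id: 1107, filename: 'receipt.png')
# https://my-bucket.s3.us-east-1.amazonaws.com/transactions/attachments/000/001/107/original/receipt.png
```

Comparing a few generated URLs against the objects you see in the S3 console is a cheap way to catch a wrong bucket, region, or path segment before running the task. Filenames containing spaces or other reserved characters would additionally need URL-encoding.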
When ready to test, simply execute:
$ rails migrate_paperclip:move_attachments
You will see some output such as:
Checking Model [User] for Paperclip attachment columns ...
Checking Model [Account] for Paperclip attachment columns ...
Checking Model [Transaction] for Paperclip attachment columns ...
[Transaction (ID: 102)] Copying to ActiveStorage location: 2019-06-27_Janett_Stamm_II_original.png
[Transaction (ID: 103)] Copying to ActiveStorage location: 2019-07-05_Douglass_Waters_original.png
[Transaction (ID: 104)] Copying to ActiveStorage location: 2019-06-27_Lore_O'Reilly_original.png
[Transaction (ID: 105)] Copying to ActiveStorage location: 2019-07-09_Rosamaria_Wehner_original.png
[Transaction (ID: 106)] Copying to ActiveStorage location: 2019-06-25_Jacquelyne_Crona_original.png
[Transaction (ID: 1107)] Copying to ActiveStorage location: 2019-06-27_Mr._Lacy_Olson_original.png
[Transaction (ID: 1108)] Copying to ActiveStorage location: 2019-07-14_Adria_Cormier_original.png
[Transaction (ID: 1109)] Copying to ActiveStorage location: 2019-06-27_Daron_Considine_original.png
[Transaction (ID: 1110)] Copying to ActiveStorage location: 2019-07-10_Blair_Walsh_PhD_original.png
[Transaction (ID: 1111)] Copying to ActiveStorage location: 2019-06-29_Keila_Nicolas_original.png
For reference, the URLs generated by this script for my application look like this:
https://#{bucket}.s3.#{region}.amazonaws.com/transactions/attachments/000/000/102/original/2019-06-27_Janett_Stamm_II.png
https://#{bucket}.s3.#{region}.amazonaws.com/transactions/attachments/000/000/103/original/2019-07-05_Douglass_Waters.png
https://#{bucket}.s3.#{region}.amazonaws.com/transactions/attachments/000/000/104/original/2019-06-27_Lore_O'Reilly.png
https://#{bucket}.s3.#{region}.amazonaws.com/transactions/attachments/000/000/105/original/2019-07-09_Rosamaria_Wehner.png
https://#{bucket}.s3.#{region}.amazonaws.com/transactions/attachments/000/000/106/original/2019-06-25_Jacquelyne_Crona.png
https://#{bucket}.s3.#{region}.amazonaws.com/transactions/attachments/000/001/107/original/2019-06-27_Mr._Lacy_Olson.png
https://#{bucket}.s3.#{region}.amazonaws.com/transactions/attachments/000/001/108/original/2019-07-14_Adria_Cormier.png
https://#{bucket}.s3.#{region}.amazonaws.com/transactions/attachments/000/001/109/original/2019-06-27_Daron_Considine.png
https://#{bucket}.s3.#{region}.amazonaws.com/transactions/attachments/000/001/110/original/2019-07-10_Blair_Walsh_PhD.png
https://#{bucket}.s3.#{region}.amazonaws.com/transactions/attachments/000/001/111/original/2019-06-29_Keila_Nicolas.png
Note on Error Handling
You will notice that the rake task above includes some error handling, which captures any record whose attachment file fails to copy. After each model is processed, the task prints each failed record's ID along with the error message, followed by a comma-separated list of the failed IDs that can easily be copied for use in SQL statements.
When running the rake task for my application, I had many errors because the actual file was missing from the S3 bucket. I cannot explain why this happened, but the database indicated an attachment for records whose attachment file did not exist.
The workaround for this (at least for me since the attachments were not critical to recover) was to do some manual data cleanup. I copied the list of erroneous record IDs, and opened up pgAdmin.
I first queried active_storage_attachments, filtering on the list of erroneous record IDs and on the record type:
SELECT blob_id
FROM active_storage_attachments
WHERE record_type = 'Transaction'
AND record_id IN ( ... comma-separated list of erroneous record IDs ... )
I saved this list of blob_id values for later use. Those blobs must also be deleted, but not until the active_storage_attachments records that reference them are gone, due to an FK constraint.
Next, I deleted the records in active_storage_attachments from the first query:
DELETE
FROM active_storage_attachments
WHERE record_type = 'Transaction'
AND record_id IN ( ... comma-separated list of erroneous record IDs ... )
Then, I was able to remove the records in active_storage_blobs:
DELETE
FROM active_storage_blobs
WHERE id IN ( ... comma-separated list of blob_id values from the first query ... )
Finally, as an optional step, I set the original Paperclip fields for these records to NULL:
UPDATE transactions
SET attachment_file_name = NULL,
attachment_content_type = NULL,
attachment_file_size = NULL,
attachment_updated_at = NULL
WHERE id IN ( ... comma-separated list of erroneous record IDs ... )
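If you'd rather not assemble these statements by hand, here is a minimal sketch that builds the same four statements from the ID lists the rake task prints, in the order they have to run. The helpers id_list and cleanup_sql are hypothetical; they only generate SQL text for you to review and paste into pgAdmin, and they assume the table and attachment names you pass in are trusted.

```ruby
# Hypothetical helpers: build the manual cleanup statements from the ID lists
# printed by the move_attachments task. Integer() raises ArgumentError for
# anything that is not a plain integer, so stray text cannot end up in the SQL.
# The table and attachment names are interpolated as-is and must be trusted.
def id_list(ids)
  raise ArgumentError, 'no IDs given' if ids.empty?

  ids.map { |id| Integer(id) }.join(', ')
end

def cleanup_sql(record_type:, table:, attachment:, record_ids:, blob_ids: [])
  records = id_list(record_ids)
  type = record_type.gsub("'", "''") # escape quotes inside the SQL string literal
  where = "WHERE record_type = '#{type}' AND record_id IN (#{records})"
  null_columns = %w[file_name content_type file_size updated_at]
                 .map { |col| "#{attachment}_#{col} = NULL" }
                 .join(', ')

  statements = [
    "SELECT blob_id FROM active_storage_attachments #{where};",
    "DELETE FROM active_storage_attachments #{where};"
  ]
  # Blob IDs come from the SELECT above; blobs can only be deleted after the
  # attachment rows that reference them are gone.
  unless blob_ids.empty?
    statements << "DELETE FROM active_storage_blobs WHERE id IN (#{id_list(blob_ids)});"
  end
  statements << "UPDATE #{table} SET #{null_columns} WHERE id IN (#{records});"
end

# The rake task prints the failed IDs as a comma-separated line, e.g. "102,103"
puts cleanup_sql(record_type: 'Transaction', table: 'transactions', attachment: 'attachment',
                 record_ids: '102,103'.split(','), blob_ids: [7, 8])
```

In practice you would run it once without blob_ids, execute the SELECT, and then run it again with the blob IDs that query returned.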
Hopefully you will not run into any issues, but I wanted to share this step for real-world expectations :)
If the rake task was successful, our work here is almost finished. It's time to remove the last bits of Paperclip from the application.
4. Code Cleanup
In this final step, the Paperclip gem can be removed from the application's Gemfile, and any commented out Paperclip code can also be removed. I'm assuming you've at least made sure everything is working in Development by this point, right? Go ahead and remove any Paperclip references! :)
Deploy the Migration
As described before, this migration needs to be done in phases due to the reliance on Paperclip functions in the first rake task. Much of this will be left to you depending on your deployment methodologies and CI setup, but these steps should be adaptable to any situation.
1. Test!
Testing is important for any application change, especially one as large as this. Before migrating production, make sure to do plenty of unit testing, update your RSpec tests, and deploy to a staging/test server first. I would also recommend backing up/duplicating your production S3 bucket before performing the deployment.
2. Deploy Branch 1
The first step in deployment will be to deploy branch 1 to production, run the migrations, and then run the rake task:
$ rails migrate_paperclip:move_data
3. Deploy Branch 2
Next, the second branch needs to be deployed to production, and then the second rake task can be run:
$ rails migrate_paperclip:move_attachments
Once it finishes and you have verified the migrated attachments, the old Paperclip S3 folders can be removed. Your application is now on ActiveStorage!
References
- ActiveStorage, Rails Guides
- Migrating from Paperclip to ActiveStorage, thoughtbot (Mike Burns)
- Migrating from Paperclip to ActiveStorage, Leonardo Negreiros