RoR Skill Tree: Generating Test Data in Bulk

Generate test data with seeds and rake tasks.

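Preface

When you build a website with Ruby on Rails, you inevitably need test data. Two approaches are common:

Using seeds.rb
Using a rake task

Taking the Job model as an example, let's see how to generate test data in bulk with seeds and with a rake task, then compare the strengths and weaknesses of each.

Fields of Job: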

string   "title"
text     "description"
datetime "created_at"
datetime "updated_at"
integer  "wage_upper_bound"
integer  "wage_lower_bound"
string   "contact_email"
boolean  "is_hidden"

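Generating test data

seeds.rb

In seeds.rb, put the values for the different fields into arrays, then loop over those arrays to generate the test data.

For example, put each Job's title and description into an array named jobs_info and assign them in a loop. The steps are:

1. Paste the following code into db/seeds.rb (for readability, jobs_info holds only three seeds here):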

jobs_info = [["Web Application Developer","Creates, maintains and implements web-based application systems. Resolves issues and recommends enhancements, when necessary. Has knowledge of HTML, Java and related concepts. Relies on knowledge and professional discretion to plan and accomplish goals. Usually reports to a department head. Significant ingenuity and flexibility is expected. May require a bachelor’s degree in a related area and at least 2-4 years of relevant experience."
],[ "Android Developer","Designs and builds applications for the Android platform. Works with outside data sources and API’s. Fixes bugs and improves application performance. Collaborates with cross-functional teams to determine and launch new features. Should have knowledge of core web technologies (HTML5, CSS3, JavaScript). Requires a bachelor’s degree in area of specialty and 2 years of relevant experience."
],["iOS Developer","Designs and builds applications for the iOS platform. Works with outside data sources and API’s. Fixes bugs and improves application performance. Collaborates with cross-functional teams to determine and launch new features. Should have knowledge of core web technologies (HTML5, CSS3, JavaScript). Requires a bachelor’s degree in area of specialty and 2 years of relevant experience."
]]

# Ten visible jobs: pick a random seed entry and randomize the wage range.
10.times do
  job_test = jobs_info.sample
  Job.create!(title: job_test[0], description: job_test[1],
              wage_upper_bound: rand(50..99) * 100,
              wage_lower_bound: rand(10..49) * 100,
              is_hidden: false)
end

# Ten hidden jobs, built the same way.
10.times do
  job_test = jobs_info.sample
  Job.create!(title: job_test[0], description: job_test[1],
              wage_upper_bound: rand(50..99) * 100,
              wage_lower_bound: rand(10..49) * 100,
              is_hidden: true)
end

2. Run in the terminal:

rake db:migrate
rake db:seed

3. Restart the server with rails s.

All done!
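To make the seed loop easier to reason about, here is a minimal plain-Ruby sketch of the attribute hashes it builds, runnable without Rails. The helper name build_seed_attrs, the constant JOBS_INFO, and the shortened descriptions are assumptions made for illustration; they are not part of the original seeds.rb.

```ruby
# Shortened stand-ins for the jobs_info entries (descriptions truncated for this sketch).
JOBS_INFO = [
  ["Web Application Developer", "Creates, maintains and implements web-based application systems."],
  ["Android Developer", "Designs and builds applications for the Android platform."],
  ["iOS Developer", "Designs and builds applications for the iOS platform."]
].freeze

# Hypothetical helper: builds the attribute hash that the seed loop would pass to Job.create!.
# The title and description are drawn together, so they always belong to the same entry.
def build_seed_attrs(jobs_info, hidden:, rng: Random.new)
  title, description = jobs_info.sample(random: rng)
  {
    title: title,
    description: description,
    wage_upper_bound: rng.rand(50..99) * 100,
    wage_lower_bound: rng.rand(10..49) * 100,
    is_hidden: hidden
  }
end

rng = Random.new(42) # fixed seed so repeated runs produce the same data
visible = Array.new(10) { build_seed_attrs(JOBS_INFO, hidden: false, rng: rng) }
hidden  = Array.new(10) { build_seed_attrs(JOBS_INFO, hidden: true, rng: rng) }

all_attrs = visible + hidden
puts all_attrs.size
```

In db/seeds.rb, each of these hashes would simply be handed to Job.create!. Because the wage ranges do not overlap (1000 to 4900 for the lower bound, 5000 to 9900 for the upper), the lower bound can never exceed the upper one.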

rake task

Here we rely on the powerful faker gem. See the faker documentation for details.

Similar gems include forgery and random_data.

1. Add faker to the Gemfile:

gem 'faker'

Then run bundle install in the terminal.

2. Create a rake task. Here we create a task named fake_jobs; run in the terminal:

rails g task dev fake_jobs

This generates lib/tasks/dev.rake with the following content:

namespace :dev do
  desc "TODO"
  task fake_jobs: :environment do
  end
end

3. Add Faker data for each Job field, for example:

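namespace :dev do
  desc "Delete all jobs and create 100 fake ones"
  task fake_jobs: :environment do
    # Start from an empty table so repeated runs do not pile up records.
    Job.delete_all

    jobs = []

    100.times do
      # Faker 2.0 and later take keyword arguments for Number.between;
      # older versions used Faker::Number.between(10000, 30000).
      jobs << Job.create!(title: Faker::Job.title,
                          description: Faker::Job.key_skill,
                          wage_upper_bound: Faker::Number.between(from: 10000, to: 30000),
                          wage_lower_bound: Faker::Number.between(from: 1000, to: 10000),
                          contact_email: Faker::Internet.email,
                          is_hidden: [true, false].sample,
                          created_at: Time.now - rand(10).days - rand(24).hours)
    end
  end
end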

4. Run in the terminal: rake dev:fake_jobs

5. Restart the server with rails s, and you're done!
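The name you pass to rake is the namespace joined to the task name, so a task fake_jobs inside namespace :dev runs as dev:fake_jobs. The following is a minimal sketch using plain Rake outside Rails, so it has no :environment dependency. The FAKE_JOBS array is a hypothetical stand-in for the jobs table, and the three placeholder rows are invented for illustration.

```ruby
require "rake"

# Make namespace/task/desc available at the top level of a plain script.
extend Rake::DSL

# Hypothetical stand-in for the jobs table so the sketch runs without a database.
FAKE_JOBS = []

namespace :dev do
  desc "Fill FAKE_JOBS with placeholder rows"
  task :fake_jobs do
    FAKE_JOBS.clear
    3.times { |i| FAKE_JOBS << { title: "Job #{i}" } }
  end
end

# Equivalent to running `rake dev:fake_jobs` from the terminal.
Rake::Task["dev:fake_jobs"].invoke
puts FAKE_JOBS.size
```

Running rake -T in a Rails project lists every task that has a desc, which is a quick way to confirm the exact name before invoking it.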

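Pros and cons

How do the two differ? Which is better?

[What follows is purely my own understanding; corrections are very welcome.]

In my experience, you pick whichever approach fits the need at hand.

seeds: The advantage of seeds is that the values of one or more model fields are fixed in advance, so every run draws from a known set. This matters most when the model has fields that must correspond one to one. In a shop site, for example, each product has to match its product image; generating them randomly with faker would be a disaster, so seeds are the only option. The drawback is just as obvious: you have to put those field values in first, whether by copying or by typing them in (surely nobody types them in? :P), which is a bit of a waste if the data is only for testing.
rake task: The advantage is clean, simple, direct code. The drawback is that random generation with Faker, random_data, or forgery ignores the relationships between fields. If the fields are related to each other, using these gems stops being refreshing and starts being painful.

Compared with seeds, I personally prefer rake tasks: the data is only for testing, most fields have no direct logical relationship to each other, and of course less code means fewer bugs (facepalm).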
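One way to soften the related-fields problem inside a random generator is to derive the dependent field from the one it depends on instead of drawing both independently. Here is a minimal plain-Ruby sketch for the two wage bounds; the helper name random_wage_range and its min, max, and step parameters are assumptions made for illustration, not something the gems above provide.

```ruby
# Hypothetical helper: draws a wage range whose upper bound always exceeds its lower bound.
# The lower bound is drawn first; the upper bound is then drawn from the values above it.
def random_wage_range(rng, min: 1_000, max: 30_000, step: 100)
  lower = rng.rand((min / step)..(max / step - 1)) * step
  upper = rng.rand((lower / step + 1)..(max / step)) * step
  { wage_lower_bound: lower, wage_upper_bound: upper }
end

rng = Random.new(2024) # fixed seed so the sketch is reproducible
ranges = Array.new(100) { random_wage_range(rng) }
puts ranges.first(3).inspect
```

The same idea applies to one-to-one pairs such as a product and its image: sample the pair as a single unit and randomize only the fields that are truly independent.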