RailsCarma
Posted on September 7, 2020
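This Ruby on Rails application scrapes the pages listed in an uploaded CSV file and
counts how often a given link occurs on each page.
The user uploads a CSV file and provides a list of email addresses to which the parsed CSV will be sent.
The CSV has two columns:
• refferal_link
• home_link
Each row holds the values for these two columns: the page to scrape (refferal_link) and the link to look for on that page (home_link). The header must be spelled refferal_link, because the code reads that exact column name.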
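To make the expected input concrete, here is a minimal sketch of such a file and of how it is read with symbol headers, the same CSV options the job uses later. The URLs are placeholders and the parse_link_rows helper is a hypothetical name used only for illustration.

```ruby
require 'csv'

# Hypothetical helper: parse CSV text into an array of hashes keyed by
# symbolized header names (:refferal_link, :home_link).
def parse_link_rows(csv_text)
  CSV.parse(csv_text, headers: true, header_converters: :symbol).map(&:to_h)
end

# Placeholder data, not taken from a real upload.
sample_csv = <<~DATA
  refferal_link,home_link
  https://example.com/blog,https://example.org/
  https://example.com/about,https://example.org/
DATA

parse_link_rows(sample_csv).each do |row|
  puts "scrape #{row[:refferal_link]} and look for #{row[:home_link]}"
end
```

Each row becomes a hash, so the values can be read as row[:refferal_link] and row[:home_link].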
First, create the Rails application:
$ rails new scrape_data
$ cd scrape_data
Then generate the UploadCsv scaffold by running the command below:
$ rails g scaffold UploadCsv generated_csv:string csv_file:string
This creates the model, controller, views, and migration for UploadCsv.
We start by uploading the file and storing a reference to it in the database.
Replace the contents of app/views/upload_csvs/_form.html.erb with the code below,
which adds a file field for the upload:
<%= form_with(model: upload_csv, local: true) do |form| %>
  <% if upload_csv.errors.any? %>
    <div id="error_explanation">
      <h2><%= pluralize(upload_csv.errors.count, "error") %> prohibited this upload_csv from being saved:</h2>
      <ul>
        <% upload_csv.errors.full_messages.each do |message| %>
          <li><%= message %></li>
        <% end %>
      </ul>
    </div>
  <% end %>
  <div class="field">
    <%= form.label :csv_file %>
    <%= form.file_field :csv_file %>
  </div>
  <div class="actions">
    <%= form.submit %>
  </div>
<% end %>
Next, add the gem used to upload the CSV file.
Add the line below to the Gemfile:
gem 'carrierwave', '~> 2.0'
$ bundle install
Then generate a CarrierWave uploader:
$ rails generate uploader Avatar
Attach the uploader to the model in
app/models/upload_csv.rb
class UploadCsv < ApplicationRecord
  mount_uploader :csv_file, AvatarUploader
end
Before moving further, check that the application works.
Run the command below:
$ rake db:create db:migrate
Update the routes in config/routes.rb:
Rails.application.routes.draw do
  resources :upload_csvs
  root 'upload_csvs#index'
end
$ rails s
Next, create a job that reads the CSV file and scrapes each page listed in it.
The path of the generated file is saved in the generated_csv column of that record.
Generate the job as follows:
$ rails generate job genrate_csv
Add the gems below to the Gemfile and run bundle install:
gem 'httparty'
gem 'nokogiri'
Then replace the contents of app/jobs/genrate_csv_job.rb with the code below:
require 'csv'

class GenrateCsvJob < ApplicationJob
  queue_as :default

  HEADERS = %w[referal_link home_link count].freeze

  def perform(upload_csv)
    rows = processed_csv(upload_csv)

    # Write the results to public/product_data.csv and store the path on the record.
    file = Rails.root.join('public', 'product_data.csv').to_s
    CSV.open(file, 'w', write_headers: true, headers: HEADERS) do |writer|
      rows.each { |row| writer << row }
    end
    upload_csv.update(generated_csv: file)

    # The mailer must be generated first; see the mailer steps in this post.
    NotificationMailer.send_csv(upload_csv).deliver_now! if rows.present?
  end

  private

  # Returns one [refferal_link, home_link, count] array per row of the uploaded CSV.
  def processed_csv(upload_csv)
    rows = []
    CSV.foreach(upload_csv.csv_file.path, headers: true, header_converters: :symbol) do |row|
      page = HTTParty.get(row[:refferal_link])
      # Collect the href of every <a> tag on the fetched page.
      hrefs = Nokogiri::HTML(page.body).css('a').map { |link| link['href'] }
      link_counts = hrefs.group_by(&:itself).transform_values(&:length)
      # The count is 0 when home_link does not appear on the page.
      rows << [row[:refferal_link], row[:home_link], link_counts[row[:home_link]].to_i]
    end
    rows
  end
end
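The counting step in the job can be tried on its own, without fetching any page. The sketch below assumes the href values have already been extracted into an array; count_links is a hypothetical helper name and the hrefs are placeholders.

```ruby
# Hypothetical helper: count how often each href occurs in a list.
# nil entries (anchors without an href attribute) are ignored.
def count_links(hrefs)
  hrefs.compact.group_by(&:itself).transform_values(&:length)
end

# Placeholder hrefs, as they might come from the <a> tags of a page.
hrefs = ['https://example.org/', '/about', 'https://example.org/', nil]
counts = count_links(hrefs)

puts counts.fetch('https://example.org/', 0)      # prints 2
puts counts.fetch('https://missing.example/', 0)  # prints 0
```

The match is an exact string comparison, so https://example.org and https://example.org/ are counted as different links.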
Next, enqueue the job from an after_create_commit callback on UploadCsv, and add a presence validation for csv_file.
Update app/models/upload_csv.rb:
class UploadCsv < ApplicationRecord
  mount_uploader :csv_file, AvatarUploader
  validates :csv_file, presence: true
  # Enqueue only after the record is committed, so the job can load it.
  after_create_commit :processed_csv

  def processed_csv
    GenrateCsvJob.perform_later(self)
  end
end
After uploading a file, check that the scraped results were generated. The generated CSV is written to
/scrape_data/public/product_data.csv
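For reference, the layout of the generated file can be previewed in memory. This is a minimal sketch that uses CSV.generate instead of writing to disk; build_report is a hypothetical helper name and the row is placeholder data.

```ruby
require 'csv'

# Hypothetical helper: build the report as a CSV string, using the same
# header names the job writes to product_data.csv.
def build_report(rows)
  CSV.generate(write_headers: true, headers: %w[referal_link home_link count]) do |csv|
    rows.each { |row| csv << row }
  end
end

# Placeholder result row: page scraped, link searched for, occurrences found.
report = build_report([['https://example.com/blog', 'https://example.org/', 2]])
puts report
```

The first line of the output is the header row, followed by one line per row of the uploaded CSV.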
To send the generated CSV by email, follow the steps below.
First, generate the mailer:
$ rails generate mailer NotificationMailer
Update app/mailers/notification_mailer.rb:
class NotificationMailer < ApplicationMailer
  def send_csv(upload_csv)
    @greeting = 'Hi'
    # Attach the generated CSV stored at the path saved on the record.
    attachments['parsed.csv'] = File.read(upload_csv.generated_csv)
    mail(to: 'sample@gmail.com', subject: 'CSV is parsed successfully.')
  end
end
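The introduction mentions a list of recipient addresses, while send_csv mails a single fixed address. One possible way to handle a list, sketched here under the assumption that the addresses arrive as one comma-separated string, is to split it into an array first. parse_recipients is a hypothetical helper and is not part of the application described in this post.

```ruby
# Hypothetical helper: turn a comma-, semicolon-, or whitespace-separated
# string into a list of unique, non-empty addresses.
def parse_recipients(raw)
  raw.to_s.split(/[\s,;]+/).reject(&:empty?).uniq
end

# Placeholder addresses.
recipients = parse_recipients('a@example.com, b@example.com,, a@example.com')
puts recipients.inspect
```

Action Mailer accepts an array for the to: option, so a list like this could be passed to mail(to: recipients, ...) in place of the fixed address.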
Also configure mail delivery in config/environments/development.rb or production.rb
by adding the lines below:
# The host is given without a scheme; the protocol is set separately.
config.action_mailer.default_url_options = { host: 'sample-scrape.herokuapp.com', protocol: 'https' }
config.action_mailer.delivery_method = :smtp
config.action_mailer.smtp_settings = {
  user_name: 'sample@gmail.com',
  password: '*******123456',
  domain: 'gmail.com',
  address: 'smtp.gmail.com',
  port: 587,
  authentication: :plain
}
config.action_mailer.raise_delivery_errors = false
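Hardcoding the mailbox password in an environment file is easy to leak through version control. A minimal sketch of reading the credentials from environment variables instead is shown below; the variable names SMTP_USERNAME and SMTP_PASSWORD and the helper smtp_settings_from are assumptions made for this example.

```ruby
# Hypothetical helper: build the SMTP settings hash from an environment-like
# object (ENV or a plain Hash). fetch raises KeyError if a variable is missing.
def smtp_settings_from(env)
  {
    user_name: env.fetch('SMTP_USERNAME'),
    password: env.fetch('SMTP_PASSWORD'),
    domain: 'gmail.com',
    address: 'smtp.gmail.com',
    port: 587,
    authentication: :plain
  }
end

# Demonstration with a plain Hash standing in for ENV (placeholder values).
settings = smtp_settings_from('SMTP_USERNAME' => 'sample@gmail.com', 'SMTP_PASSWORD' => 'secret')
puts settings[:user_name]
```

In the environment file this could be used as config.action_mailer.smtp_settings = smtp_settings_from(ENV), provided the helper is defined somewhere that is loaded before the configuration runs.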
Finally, update the mailer view app/views/notification_mailer/send_csv.html.erb:
CSV has been processed, Thanks!
<%= @greeting %>, please check the attachment for the parsed CSV.
Thanks!