Thoughts on technology and innovation
Ted Husted, Release Engineer
If you have a way to pull and format data from another system, the Apex Data Loader is a great, on-demand way to push data into Salesforce CRM. Without much effort, you can automate the process and create a hands-free data sync. The tricky part is that while the Data Loader creates excellent logs, there's no obvious way to alert a person when a data sync fails. Once a system is in place, failures are rare, but, in real life, stuff happens, and it's helpful if failsafes are in place.
A key problem is that the Apex Data Loader writes an error log whether there are errors or not. If the pass is 100% successful, the error log will have a header line and no rows.
If the error log was only written when errors occurred, we could just look for the error logs, and process the files whenever they crop up. But, in this case, we first need to look inside the log and count the lines before we can be sure there was a failure.
On a Windows-based system, one solution is to use a VB script, like COUNTLN.VBS, to count the number of lines in a file.
' Description: Count the number of lines in a file and return as error level (up to 255)
' Usage: cscript countln.vbs errors.csv
' Concept: http://blogs.technet.com/b/heyscriptingguy/archive/2004/10/12/how-can-i-count-the-number-of-lines-in-a-text-file.aspx
Const ForReading = 1
Set objFSO = CreateObject("Scripting.FileSystemObject")
Set objTextFile = objFSO.OpenTextFile (WScript.Arguments.Item(0), ForReading)
' Wscript.Echo "Number of lines: " & objTextFile.Line
The COUNTLN.VBS script sets the exit condition, which can then be tested in a DOS Batch file, like CHECKLOG.BAT.
REM Count the lines and send the errors, if any
ECHO CountLn: %1
CSCRIPT ..\countln.vbs %1
IF ERRORLEVEL 1 ECHO Line Count: %errorlevel%
IF ERRORLEVEL 3 ..\7z.exe a %2 %1
ECHO Checklogs process complete.
The Apex Data Loader takes as input a common-separated-value (CSV) or a tab-delimited file. The first row of the input file is a header detailing the input columns, with each subsequent row representing a record to be sent to the target Salesforce CRM object.
When the Data Loader runs, it creates an success(ID).log and an error(ID).log. The (ID) is replaced with a timestamp shared by the companion success and error logs. Both logs contain a header row, and a row for each input row, detailing the data processed for each column. The success log contains all the rows that made it into Salesforce. The error logs contains all the rows that did not cross over.
Conveniently, the logs can also be used as input files. If the errors can be easily fixed, you can resubmit the error log to the Apex Data Loader, without going back to the external system.
The CHECKLOG.BAT file is designed to add the matching error files to a compressed archive file in the widely-supported ZIP format. Here we're using the open source 7z utility to create the archive, but others would work as well. In this way, if CHECKLOG is is not run as often as the DataLoader itself, the process can support a single file or multiple files. And, if the data is sensitive, most ZIP utilities support encrypting the archive.
Given a secure ZIP archive, email can be a great way to get the error logs to an administrator who can address the problems. Staying with our Windows platform theme, here's another VB script that can send an email alert, with an attachment, to a Gmail account.
'Usage: cscript sendmail.vbs <email@example.com> "" "" ""
'Ex. No attach: cscript sendmail.vbs firstname.lastname@example.org "test subject line" "test email body"
'Ex. W/ attach: cscript sendmail.vbs email@example.com "test subject line" "test email body" "c:\scripts\log.txt"
'Attachment path cannot be relative
'*** CONFIGURE THE FROM EMAIL ADDRESS AND PASSWORD ***
Const fromEmail = "(TBD)@gmail.com"
Const password = "(TBD)"
'*** END OF CONFIGURATION ***
Dim emailObj, emailConfig
Set emailObj = CreateObject("CDO.Message")
emailObj.From = fromEmail
emailObj.To = WScript.Arguments.Item(0)
emailObj.Subject = WScript.Arguments.Item(1)
emailObj.TextBody = WScript.Arguments.Item(2)
If WScript.Arguments.Count > 3 Then
Set emailConfig = emailObj.Configuration
emailConfig.Fields("http://schemas.microsoft.com/cdo/configuration/smtpserver") = "smtp.gmail.com"
emailConfig.Fields("http://schemas.microsoft.com/cdo/configuration/smtpserverport") = 465
emailConfig.Fields("http://schemas.microsoft.com/cdo/configuration/sendusing") = 2
emailConfig.Fields("http://schemas.microsoft.com/cdo/configuration/smtpauthenticate") = 1
emailConfig.Fields("http://schemas.microsoft.com/cdo/configuration/smtpusessl") = true
emailConfig.Fields("http://schemas.microsoft.com/cdo/configuration/sendusername") = fromEmail
emailConfig.Fields("http://schemas.microsoft.com/cdo/configuration/sendpassword") = password
Set emailobj = nothing
Set emailConfig = nothing
Of course, the notion is that you create a special Gmail account just to receive the alerts and then forward to an administrator from there.
While we have the pieces in place for handling the error logs, there is also the problem of what to do with the success logs. The Apex Data Loader kindly generates both logs, that between them, contain all of the detail found in the original input file, for each and every pass.
While this level of detail is great (really-really-great), if you are running a sync every five minutes, retrieving a log later can be a real "cant-see-the-forest--for-the-trees" headache.
Long story short, in addition to error alerts, an automated Apex Data Loader sync also needs help with general log management. The ARCHIVE.BAT file is designed to be run at least once a day, but could also be run every hour. It sweeps all of the success logs into an archive for the current date, and, as discussed, emails the error logs to a Gmail account.
To deploy the system, beneath the current logging folder (usually "Status"), create an Archive folder, and beneath that create Success and Error folders. Put the ARCHIVE.BAT and VBS files under the Archive folder (along with the 7z utility), and the CHECKLOG.BAT under the Error folder.
A call to ARCHIVE.BAT can be queued as a Windows Scheduled Task to run every hour, or once a day, depending on how often the Apex Data Loader runs. The system moves the logs out of .\Status and creates a ZIP for each day under .\Status\Archive in the format "LOGS-YYMMDD.ZIP" (while emailing the error logs to the Gmail account). Temporary copies of the Success and Error logs are also put into the Success and Error folders, for processing by CHECKLOG.BAT.
REM Email non-empty Data Loader error logs to a GMail account and
REM archive to a zip file all other log files (*.csv).
REM Main file to call from a scheduled task.
REM Intended to be launched from a subdirectory directly beneath the log folder.
REM Expects an error\ subdirectory to exist (status\archive\error).
REM Temporarily stores error logs being emailed in errorlog.zip and
REM calls archivelogs.bat to store logs by date in LOGS-20??-??-??.ZIP
REM Uses as dependencies: checklog.bat, countln.vbs, sendmail.vbs, 7z*.*
REM Checklog.bat is stored in subdirectories .\error and .\success.
REM All others in status\archive next to this file.
REM Bring logs down to a working folder, to avoid gaps in process
IF EXIST ..\*.csv MOVE ..\*.csv .
IF NOT EXIST *.csv GOTO Done
REM Buffer errors and archive latest logs (\error must exist)
IF EXIST error*.csv COPY error*.csv error\
IF EXIST success*.csv COPY success*.csv success\
.\7z.exe a LOGS-%TDate%.zip *.csv
IF ERRORLEVEL 1 GOTO Done
IF NOT EXIST LOGS-%TDate%.zip GOTO Done
REM Cleanup errorlog.zip before calling errorlogs.bat
SET ZIP=(TBD: Full path to a local working directory)\errorlog.zip
IF EXIST %ZIP% ERASE %ZIP%
REM Call checklog for each error log.
FOR %%x IN ( error*.csv ) DO CMD.EXE /C checklog.bat %%x %%ZIP%%
ECHO Emailing: %ZIP%
REM Attachment path cannot be relative
CSCRIPT ..\sendmail.vbs firstname.lastname@example.org "Alert! Data Load Failed." "The Data Loader was unable to push records to Salesforce." "%ZIP%"
REM Cleanup successlog.zip before calling checklogs.bat
SET ZIP=(TBD: Full path to a local working directory)\successlog.zip
IF EXIST %ZIP% ERASE %ZIP%
REM Call checklog for each success log.
FOR %%x IN (success*.csv ) DO CMD.EXE /C checklog.bat %%x %%ZIP%%
ECHO Emailing: %ZIP%
REM Attachment path (%ZIP%) cannot be relative
CSCRIPT ..\sendmail.vbs (TBD@TBD.COM) "Alert! Data Load Succeeded." "The Data Loader was able to push records to Salesforce." "%ZIP%"
ECHO Archive process complete.
The second CHECKLOG call to the success folder is optional. When left in place, the system will email the success and error logs separately, in case the administrator wants assurance that the system is running successfully on schedule.
Another option would be to check for certain types of fatal errors. One fatal case would be someone renaming a target field in Salesforce without updating the Apex Data Loader configuration. If the mapping file refers to a field that no longer exists, the Apex Data Loader generates a zero-length success log (with not even a header row). The CHECKFATAL.BAT alternative only ZIPs for emailing any success logs that are totally empty.
REM Count the lines and send only fatal errors, if any. Only use in Success folder.
ECHO CountLn: %1
CSCRIPT ..\countln.vbs %1
IF ERRORLEVEL 2 ERASE %1
IF EXIST %1 ..\7z.exe a %2 %1
ECHO Check fatal process complete.
The final touch is to prune the logs every so often. The PRUNELOGS.BAT batch file can be run once a month, on the first of the month. Based on the date represented by the file name, PRUNELOGS.BAT retains 2-3 months of logs, based on the current month, and deletes the rest.
REM ZIP delete the archive for month before last (retain 2-3 months of logs)
REM Intended to be called on the 1st day of each month
SET MM %date:~-10,2
IF “%MM%“ == “01” ERASE LOGS-20??-11-??.ZIP
IF “%MM%“ == “02” ERASE LOGS-20??-12-??.ZIP
IF “%MM%“ == “03” ERASE LOGS-20??-01-??.ZIP
IF “%MM%“ == “04” ERASE LOGS-20??-02-??.ZIP
IF “%MM%“ == “05” ERASE LOGS-20??-03-??.ZIP
IF “%MM%“ == “06” ERASE LOGS-20??-04-??.ZIP
IF “%MM%“ == “07” ERASE LOGS-20??-05-??.ZIP
IF “%MM%“ == “08” ERASE LOGS-20??-06-??.ZIP
IF “%MM%“ == “09” ERASE LOGS-20??-07-??.ZIP
IF “%MM%“ == “10” ERASE LOGS-20??-08-??.ZIP
IF “%MM%“ == “11” ERASE LOGS-20??-09-??.ZIP
IF “%MM%“ == “12” ERASE LOGS-20??-10-??.ZIP
With these patches in place, the Apex Data Loader can be a safe and reliable way to automatically import data files generated by another system. Though, no data integration system is foolproof, and you should still devise a custom validation report that runs every day to confirm that data is flowing into the system as expected.
While it would be wonderful if we could originate and retain all of our enterprise data only in Salesforce CRM, in real life, we still need to integrate with other systems, and, when we do, tools like the DataLoader, with a pinch of salt, can help us get the job done.
For more Apex Data Loader tips, drop by the blog Using the Apex Data Loader with Salesforce.