Automated Distributed Document Production

Mark Hayes, 12th Feb 2020.

Fast, large-scale document production can be achieved by spreading the processing across as many machines as are available. The set-up is simple and requires little more than a database, some code and networked computers.

Requirements

  • Database for handling workload with tables for:
    • Batch Header: Start Date/Time, End Date/Time, Status.
    • Batch Details: Document ID, Host ID, Status, File Name.
    • Participants: User ID, Machine ID.
  • Network storage available to all participants.
  • Procedures to create a batch, carve up the workload, add participants and start the batch.
  • Document template stored on the network.
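
As a rough sketch, the three workload tables could be set up from VBA via ADO along the lines below. The connection string, data types and the linking BatchID columns are assumptions for illustration, not a prescribed schema.

    ' Sketch only: creates the workload tables listed above via ADO.
    ' Connection string, types and keys are illustrative assumptions.
    Sub CreateBatchTables()
        Dim cn As Object
        Set cn = CreateObject("ADODB.Connection")
        cn.Open "Provider=SQLOLEDB;Data Source=BATCHSRV;" & _
                "Initial Catalog=DocBatch;Integrated Security=SSPI;"

        cn.Execute "CREATE TABLE BatchHeader (BatchID INT PRIMARY KEY, " & _
                   "StartDateTime DATETIME, EndDateTime DATETIME, Status VARCHAR(20))"

        cn.Execute "CREATE TABLE BatchDetails (BatchID INT, DocumentID INT, " & _
                   "HostID VARCHAR(50), Status VARCHAR(20), FileName VARCHAR(255))"

        cn.Execute "CREATE TABLE Participants (BatchID INT, " & _
                   "UserID VARCHAR(50), MachineID VARCHAR(50))"

        cn.Close
    End Sub
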
Distributed Processing

Important: The requirement in this case was not just to produce the calculated results. For various reasons, fully functional Excel models had to be created for each claim and sent, with each batch, to the client.

If only results were required, calculator document production would not have been necessary.

Process Summary

  1. Host creates a batch header.
  2. Participants connect and are logged against the batch.
  3. Host starts batch once happy all machines have connected.
  4. Server procedure splits and allocates work to participants.
  5. Participant machines:
    • Copy down the self-populating Excel template.
    • Create local copies of the template and populate each with a case reference.
    • Open the first ten spreadsheets; each automatically connects to the backend database, grabs the data for its case reference, runs the population process, then saves and returns the results to the server.
    • The application monitors the number of files being processed and, as soon as the number drops below ten, spawns another until the workload is complete (see the sketch after this list).
    • Once the machine's allocated workload is complete, all files are copied back to the network and the machine exits the batch.
  6. Once all machines have finished, the host machine closes the batch, generates a batch report listing any failures etc., and produces a random sample set for quality control.
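
To make step 5 more concrete, here is a rough VBA sketch of the monitoring loop on a participant machine. The folder names, the use of Shell to start each copy in its own Excel process, and the assumption that each workbook saves its populated result into a "Done" folder and quits when finished are all illustrative; the original application worked along these lines but is not reproduced here.

    ' Sketch of the participant loop: keep roughly ten template copies
    ' processing at once, topping the pool back up as copies finish.
    Sub RunWorkload()
        Const MAX_OPEN As Long = 10
        Const WORK_DIR As String = "C:\BatchWork\Pending\"   ' assumed local folders
        Const RESULT_DIR As String = "C:\BatchWork\Done\"
        Dim pending As New Collection, launched As New Collection
        Dim f As String, i As Long

        ' Gather the locally created template copies (one per case reference).
        f = Dir(WORK_DIR & "*.xls*")
        Do While f <> ""
            pending.Add f
            f = Dir()
        Loop

        Do While pending.Count > 0 Or launched.Count > 0
            ' A copy counts as finished once its result appears in the Done folder.
            For i = launched.Count To 1 Step -1
                If Dir(RESULT_DIR & launched(i)) <> "" Then launched.Remove i
            Next i
            ' As soon as fewer than ten are in flight, spawn the next one in its
            ' own Excel process (assumes excel.exe can be found on the PATH).
            Do While launched.Count < MAX_OPEN And pending.Count > 0
                Shell "excel.exe """ & WORK_DIR & pending(1) & """", vbMinimizedNoFocus
                launched.Add pending(1)
                pending.Remove 1
            Loop
            Application.Wait Now + TimeSerial(0, 0, 2)   ' poll every couple of seconds
        Loop
    End Sub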

The Benefits

In this particular example, the process generated over five thousand complex MS Excel spreadsheets per batch, calculating PPI refunds and writing the model results back to the database. Processing time was reduced from around ten hours to a matter of minutes, depending on the number of available machines. Due to regulatory deadlines, each batch had to be completed by a specific date to avoid large fines.

Timing wasn't the only benefit brought by distributing the processing and having the spreadsheets self-process. The end-to-end process and calculator template changed frequently. Being able to run multiple test batches in a day made testing and debugging a breeze.

In the original process one machine generated one spreadsheet at a time, populating it and writing back the results to the server. In the new process many machines each generated ten spreadsheets at a time in parallel. VBA in the spreadsheets handled grabbing the data from the server and writing back the results.
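
For illustration, the kind of VBA held in each spreadsheet might look something like the sketch below. The sheet names, ranges, connection string and the ModelResults table are placeholders rather than the production code.

    ' Sketch of the self-population code inside each workbook: read the planted
    ' case reference, pull that case's data from the backend database, let the
    ' model recalculate, then write the headline result back to the server.
    Sub PopulateAndReturnResults()
        Dim cn As Object, rs As Object, caseRef As String

        caseRef = ThisWorkbook.Sheets("Input").Range("A1").Value

        Set cn = CreateObject("ADODB.Connection")
        cn.Open "Provider=SQLOLEDB;Data Source=BATCHSRV;" & _
                "Initial Catalog=Claims;Integrated Security=SSPI;"

        ' Pull this case's data and drop it into the model's input area.
        Set rs = cn.Execute("SELECT * FROM ClaimData WHERE CaseRef = '" & caseRef & "'")
        ThisWorkbook.Sheets("Input").Range("A3").CopyFromRecordset rs
        rs.Close

        Application.Calculate

        ' Write the calculated result back (ModelResults is a hypothetical table).
        cn.Execute "INSERT INTO ModelResults (CaseRef, RefundAmount) VALUES ('" & _
                   caseRef & "', " & ThisWorkbook.Sheets("Model").Range("B10").Value & ")"
        cn.Close

        ThisWorkbook.Save
    End Sub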

One Small Problem...

The spreadsheets could not be sent back to the client containing the VBA that was used to build them; the IPR needed protecting. To resolve this I structured the VBA as follows (a sketch follows the list):

  • The code that handled the population and processing was attached to a blank worksheet in the workbook.
  • The auto-open code was held in the workbook module and had an "On Error Resume Next" just before the call to the processing function held in the worksheet.
  • Once execution returned to the workbook module, the worksheet containing the process code was deleted.
  • When the workbook is opened at a later date, the "On Error Resume Next" lets it open without issue, even though the processing code is no longer there.
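
In outline, and with illustrative names (an "Engine" sheet holding a "ProcessCase" routine), the structure looks something like this:

    ' Code module of the blank "Engine" worksheet (deleted before delivery):
    Public Sub ProcessCase()
        ' ...population and processing, as sketched earlier...
    End Sub

    ' ThisWorkbook module (the only code left in the delivered workbook):
    Private Sub Workbook_Open()
        On Error Resume Next    ' lets later opens succeed once Engine has gone
        ThisWorkbook.Worksheets("Engine").ProcessCase
        ' Execution has returned here, so the sheet, and its code, can go.
        Application.DisplayAlerts = False
        ThisWorkbook.Worksheets("Engine").Delete
        Application.DisplayAlerts = True
    End Sub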

Conclusion

The process described here covers a particular scenario with very specific requirements. It can, however, be modified to suit other requirements.

In this particular case there was no dedicated hardware or software product; the process was live almost from the moment development started, and all we had to work with were desktop PCs and MS Office.

The point is, if you need to produce thousands of unique, template-based documents quickly and have multiple computers that can be utilised for the time it takes, you can do so without having to invest in additional hardware. For example, a process that might take 3 hours on one machine can be reduced to around 30 minutes with 6 computers involved.

The Bottom Line

Thousands of complex Excel workbooks can be produced, logged and packaged in just a few minutes by sharing the process across multiple machines.

In this case a process that originally took many hours could be completed in just a few minutes. More importantly, it could be repeated many times very quickly when testing new spreadsheet templates.
