
4.6.1 Generic Hospital Integration

A generic method to have hospital reports automatically downloaded and integrated into Oscar labs


Any production use of OSCAR will quickly run into scanning logjams. There are too many pieces of paper in the world, and you want them in OSCAR automatically matched to the patient. The following describes a generic method of importing hospital reports into OSCAR.

Document Version History

  • v1.0 – initial public release, Oct 14, 2009
  • v1.1 – Revised and ported to July 4, 2010
This document is copyright © 2010 Peter Hutten-Czapski and is licensed under the Creative Commons Attribution-Share Alike 3.0 Unported License.



  1. Preface
    1. Document Version History
  2. Prerequisites
  3. How to connect to Any Hospital System



Prerequisites

It is assumed that you or the implementation person(s) will have or need:
  1. some knowledge of Mirth (fairly easy)
  2. a good knowledge of JavaScript and regular expressions to parse the input
  3. a thorough understanding of HL7 messaging
  4. some understanding of how to set up the Mule transport that OSCAR uses to pick up incoming HL7 (fairly easy)
  5. significant time to tune the parsing to your particular report formats
  6. Mirth installed at both the hospital and the clinic ends
  7. either plain-text output from the hospital system, or PDFCreator and a text-mining tool installed at the hospital


How to connect to Any Hospital System

This section describes how to transfer any hospital system report that contains enough information to identify the patient and the doctor into OSCAR through its Lab function (or into any other EMR that accepts HL7). These instructions assume the hospital system runs on Windows(R); some of the steps may be even easier on Linux or Unix.

All software used is open source unless otherwise noted.

Step 1. Get a file of output

If you can get text output from the hospital system, go to Step 4. If no automated export works, every hospital system at least allows printing, so have the hospital set up a PDF printer. This is dead easy on a Linux system and not hard on Windows: run PDFCreator as a service.

We currently use PDFCreator 0.9.7, set to automatically timestamp each file and save it into C:\HL7.

Step 2. Print hospital reports to PDF as well as the regular paper jobbie (for now)

How to do this will vary by hospital system.

Step 3. Strip the text out of the PDF

Again, this is easy out of the box on Linux. For Windows I use a text-mining tool, currently version 1.1.42; it's free but closed source.

Step 4. Run a cron job (a scheduled "task" on Windows) to run your script

The following script stuffs the text into an HL7 "wrapper" after first removing the carriage returns and line feeds (line breaks are not tolerated inside |-delimited HL7, because a carriage return ends a segment).

In Windows, Task Scheduler runs C:\HL7\pdf2txt2hl7.bat hourly, which in turn:

  1. Moves all PDFs in C:\HL7\ to C:\HL7\temp for processing (keeping a copy in C:\HL7\Testing)
  2. Uses the text-mining tool (minetext) to extract the ASCII text from each PDF
  3. Uses tr to tokenize carriage returns and line feeds and to clean up some characters. Use it natively on Linux, or the GnuWin32 port of the Unix tool on Windows
  4. Prepends start.bnk and appends end.bnk, a nominal HL7 wrapper for the text
  5. Moves the PDFs to C:\HL7\processed and the newly created HL7 files to C:\HL7
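If you would rather not depend on the GnuWin32 tools, the tr clean-up can be sketched in Python. This is a minimal sketch, not the author's script, and it assumes the extracted text is UTF-8. It is deliberately a little safer than the two tr passes: tr -d deletes every \342 and \200 byte anywhere in the file, whereas this only replaces the complete three-byte sequence for ’. The name clean_report is made up for illustration.

```python
# Minimal sketch of the tr clean-up steps (hypothetical helper, not the author's script).
RIGHT_QUOTE_UTF8 = b"\xe2\x80\x99"  # UTF-8 bytes for the ’ character (octal \342\200\231)


def clean_report(raw: bytes) -> bytes:
    """Make extracted report text safe to use as a single-line HL7 payload."""
    # Swap the curly right single quote for a plain ASCII apostrophe.
    # Only the full three-byte sequence is replaced; other bytes are left alone.
    text = raw.replace(RIGHT_QUOTE_UTF8, b"'")
    # Turn every carriage return and line feed into a % token, as the tr step does,
    # so that the text cannot end the OBX segment early.
    return text.replace(b"\r", b"%").replace(b"\n", b"%")


if __name__ == "__main__":
    sample = "Patient\u2019s chest X-ray\r\nNo acute findings\n".encode("utf-8")
    print(clean_report(sample))  # b"Patient's chest X-ray%%No acute findings%"
```

Note that a Windows <CR><LF> pair becomes two % tokens, exactly as with tr.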

Start.bnk is just an HL7 fragment, as is end.bnk; together they sandwich the payload, which presto becomes an OBX segment.
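The actual contents of start.bnk and end.bnk are site-specific and are not reproduced here. As a purely hypothetical illustration, assuming an ORU^R01 message in HL7 2.3 with XXX placeholders for Mirth to fill in later, the sandwich might look like this in Python. Every field value below, and the name wrap_payload, is invented for the example.

```python
# Hypothetical stand-ins for start.bnk and end.bnk; the real fragments are site-specific.
# XXX marks placeholders that Mirth would later overwrite with data parsed from the payload.
START_BNK = (
    b"MSH|^~\\&|HOSPITAL|HOSP|OSCAR|CLINIC|XXX||ORU^R01|XXX|P|2.3\r"
    b"PID|||XXX||XXX\r"
    b"OBR|1|||REPORT\r"
    b"OBX|1|FT|REPORT||"  # the payload becomes OBX-5 (observation value)
)
END_BNK = b"||||||F\r"  # closes OBX-5 and sets OBX-11 (result status) to F


def wrap_payload(payload: bytes, start: bytes = START_BNK, end: bytes = END_BNK) -> bytes:
    """Sandwich a cleaned payload between the fragments, like copy /B start + text + end."""
    if b"\r" in payload or b"\n" in payload:
        raise ValueError("payload must be a single line; remove CR/LF first")
    return start + payload + end


if __name__ == "__main__":
    message = wrap_payload(b"CHEST X-RAY%%No acute findings%")
    for segment in message.split(b"\r"):
        if segment:
            print(segment.decode("ascii"))
```

Because the start fragment stops just before OBX-5, whatever text is sandwiched in lands in the observation value field, and the end fragment supplies the remaining OBX fields.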


The following is pdf2txt2hl7.bat

@ECHO OFF
ECHO. PDF2TXT2HL7.BAT 2009 (C) Peter Hutten-Czapski
IF NOT (%1)==(/?) GOTO :START
ECHO.::  This batch file extracts text from hospital pdf     ::
ECHO.::  and preprocesses it removing illegal characters     ::
ECHO.::  and carriage return line feeds ^<CR^>^<LF^>             ::
ECHO.::  And then wraps it in a HL7 ish wrapper for Mirth    ::
ECHO. usage
ECHO.      /? gives this text
GOTO :EOF

:START
REM temporarily add path to GNUwin32 toolset and Text_Mining
PATH=%PATH%;"C:\Program Files (x86)\GnuWin32\bin";C:\Text_Mining;

REM work from the drop folder, wherever Task Scheduler starts the script
CD /D C:\HL7
COPY *.pdf C:\HL7\Testing\
MOVE *.pdf C:\HL7\temp\
CD /D C:\HL7\temp

REM extract the text of each pdf
for %%I in (*.pdf) do minetext "%%I" "%%~nI.tx1"

REM the UTF-8 right quote is the byte sequence \342\200\231; first delete \342 and \200
REM use double quotes: cmd passes single quotes through to tr as literal characters
for %%I in (*.tx1) do tr -d "\342\200" < "%%I" > "%%~nI.tx2"

REM then convert the remaining \231 byte into a plain apostrophe \047
for %%I in (*.tx2) do tr "\231" "\047" < "%%I" > "%%~nI.tx3"

REM replace every CR and LF with a percent sign (four percent signs in a batch file become two)
for %%I in (*.tx3) do tr "\r\n" "%%%%" < "%%I" > "%%~nI.tx4"

REM move the pdfs to processed
MOVE *.pdf C:\HL7\processed\

REM sandwich each payload between the prototype files to form a validish HL7 with placeholders
for %%I in (*.tx4) do copy /B start.bnk + "%%I" + end.bnk "%%~nI.HL7"

MOVE *.HL7 C:\HL7\

REM Test for exact ERRORLEVEL 0 from the MOVE above; anything else is a failure
IF ERRORLEVEL 1 (
  ECHO. *********************************
  ECHO. ***Failure***Failure***Failure***
  ECHO. *********************************
)

REM insert delay of about 1 hour: 3500 pings to localhost, roughly one second apart
ECHO. Control C to quit
ping -n 3500 127.0.0.1 > NUL

REM if not interrupted with Control C, clean up the intermediate files
del *.tx?
del *.pdf
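On Linux or Unix the same file shuffling can be done without a batch file. The sketch below is one way to mirror pdf2txt2hl7.bat in Python, under these assumptions: the folder layout (Testing, temp, processed) matches the batch file, text extraction uses poppler's pdftotext command, and the clean and wrap steps are passed in as functions. The helper names are hypothetical. Each .HL7 file is written in temp and only then moved into the drop folder, to reduce the chance that the transport picks up a half-written file.

```python
import shutil
import subprocess
from pathlib import Path
from typing import Callable, List


def pdftotext_extract(pdf: Path) -> bytes:
    """Extract text with poppler's pdftotext; "-" sends the text to stdout."""
    result = subprocess.run(["pdftotext", str(pdf), "-"], check=True, capture_output=True)
    return result.stdout


def process_drop_folder(
    drop: Path,
    extract: Callable[[Path], bytes],
    clean: Callable[[bytes], bytes],
    wrap: Callable[[bytes], bytes],
) -> List[Path]:
    """Mirror pdf2txt2hl7.bat: archive each PDF and turn its text into an .HL7 file."""
    for sub in ("Testing", "temp", "processed"):
        (drop / sub).mkdir(exist_ok=True)
    work = drop / "temp"
    created = []
    for pdf in sorted(drop.glob("*.pdf")):  # list first, since files move inside the loop
        shutil.copy2(pdf, drop / "Testing" / pdf.name)  # keep a copy for testing
        staged = work / pdf.name
        shutil.move(str(pdf), str(staged))  # stage the PDF for processing
        tmp_hl7 = work / (pdf.stem + ".HL7")
        tmp_hl7.write_bytes(wrap(clean(extract(staged))))
        shutil.move(str(staged), str(drop / "processed" / pdf.name))
        final = drop / tmp_hl7.name
        shutil.move(str(tmp_hl7), str(final))  # publish only the finished file
        created.append(final)
    return created
```

A cron entry could then call something like process_drop_folder(Path("/var/hl7"), pdftotext_extract, clean, wrap) hourly, with your own clean and wrap functions; the path is only an example. Unlike the Windows wildcard, glob("*.pdf") is case-sensitive on Linux, so files named *.PDF would be skipped.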


Step 5. Install Mirth (open source multiplatform) at the hospital end.

Obviously the HL7 segments are not correctly populated, but the information needed to populate those segments is within the payload you sandwiched in the OBX segment. Mirth can extract such data and populate the segments to match whatever flavor of HL7 you like, including OSCAR "CML" standard input.
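Mirth itself does this in JavaScript, but the regular-expression logic can be sketched in Python for clarity. This assumes the payload came through the tr step, so each % marks a former line break, and it assumes report labels such as "Patient:", "HCN:" and "DOB:"; real hospital layouts differ, and the patterns are where most of the tuning time goes. All names and labels here are hypothetical.

```python
import re
from typing import List

# Hypothetical report labels; every hospital layout needs its own patterns.
NAME_RE = re.compile(r"Patient:\s*([A-Z'\-]+),\s*([A-Z'\-]+)")
HIN_RE = re.compile(r"HCN:\s*(\d{10})")
DOB_RE = re.compile(r"DOB:\s*(\d{4})-(\d{2})-(\d{2})")


def report_lines(payload: str) -> List[str]:
    """Undo the tr step: each % token marks a former CR or LF."""
    return [line.strip() for line in payload.split("%") if line.strip()]


def build_pid(payload: str) -> str:
    """Fill PID-3 (identifier), PID-5 (family^given) and PID-7 (birth date)."""
    text = " ".join(report_lines(payload))
    name, hin, dob = NAME_RE.search(text), HIN_RE.search(text), DOB_RE.search(text)
    if not (name and hin and dob):
        raise ValueError("could not identify the patient; route for manual review")
    family, given = name.groups()
    birth_date = "".join(dob.groups())  # YYYYMMDD, as HL7 expects
    return f"PID|||{hin.group(1)}||{family}^{given}||{birth_date}"


if __name__ == "__main__":
    sample = ("DIAGNOSTIC IMAGING REPORT%%Patient: SMITH, JOHN%"
              "HCN: 1234567890  DOB: 1950-03-07%%Findings: no acute disease%")
    print(build_pid(sample))  # PID|||1234567890||SMITH^JOHN||19500307
```

One side effect of using % as the line-break token is that a literal % in the report (for example "50%") also looks like a line break; that is usually harmless for field extraction but worth remembering when writing patterns.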

Once the hospital IT people realize what Mirth is, it becomes very interesting to them, as they will find all sorts of uses for it as they struggle with HL7 transport for their own internal purposes. I currently use Mirth with Sun Java 1.6.0_11.


Mirth transforms the HL7 using JavaScript regular expressions you supply, which pull data out of the text to populate the HL7 fields, and then routes the fully formed HL7 messages as appropriate:

a) to me by email, for blank PDFs and unassigned reports (DR.DOCTOR)

c) to the Haileybury clinic, via the unencrypted low-level protocol through an SSH tunnel to another Mirth instance
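The routing rule can be sketched the same way. This assumes the doctor appears after a hypothetical "Attending:" label and that the clinic keeps a roster of its doctors; the roster names, label and destination strings are invented, and treating an off-roster doctor as unassigned is a simplification. In Mirth the equivalent logic lives in the channel's JavaScript, not in Python.

```python
import re

# Hypothetical label and clinic roster; adjust to the real report layout.
DOCTOR_RE = re.compile(r"Attending:\s*DR\.?\s*([A-Z'\-]+)")
CLINIC_DOCTORS = {"WELBY", "KILDARE"}


def route(payload: str) -> str:
    """Pick a destination for one report payload (% tokens are former line breaks)."""
    text = payload.replace("%", " ")
    if not text.strip():
        return "email"  # blank PDF: send to a person to look at
    match = DOCTOR_RE.search(text)
    if match is None or match.group(1) not in CLINIC_DOCTORS:
        return "email"  # unassigned report (DR.DOCTOR)
    return "clinic"  # forward to the clinic's Mirth instance


if __name__ == "__main__":
    for p in ("%%%", "Attending: DR. WELBY%Findings: normal", "Attending: DR. JONES%"):
        print(repr(p), "->", route(p))
```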

The JavaScript invoked by Mirth at both ends (hospital and clinic) is saved as XML configuration, which is even more obtuse to list as plain text and is sensitive because it exposes local settings, identifiers, ports and protocols; however, I can forward parts to authorised parties.
