
AI-Based Critical Field Extraction from Medical Reports

Medical reports are an essential part of any medical diagnosis. They contain detailed information about every ailment and test performed, and they provide a comprehensive overview of a patient’s history, including the treatment currently being used to help them recover. They also serve as a compilation of every procedure a patient has undergone at a particular doctor’s office. These compiled reports are used for peer review and medical cross-referencing, which can lead to important breakthroughs in the field.


The most common problem with these reports is that their technical nature and sheer volume (each patient can generate dozens of them each year) make them extremely difficult for doctors to understand. This makes patient monitoring hard work.

New Delhi, India


Healthcare Technology Solution Provider



Embebo is a healthcare services company from India that specializes in healthcare solutions. Embebo provides hospitals, clinics, and healthcare units with modules and tools for organizing medical notes, records, and opinions, offering seamless integration, efficient monitoring, and enhanced insights.

The task was to create a model using state-of-the-art artificial intelligence technology to extract and consolidate data from the medical reports produced by doctors and labs in India.



Our extraction module first had to address these challenges:

  • Detecting critical fields and information - It is critical to identify both the key areas of a report that deserve the most attention and the specific information within those areas that matters most. This can be extremely challenging because the relevant fields and data are often extensive and varied; it is like finding a needle in a haystack.

  • Maintaining the coherence of the reports - When extracting fields of interest and information from a document, it is important to ensure that the extracted content does not differ from the original content and that the scaling standards stay linked to their fields.

  • Simple structuring - The extracted data should be structured so that it is user-friendly, easy to review, and easy to consume in further patient monitoring use cases.



After preliminary research & experimentation we identified that:

  • The way to address these ever-present challenges is to use an AI-based OCR + NLP tool that can be trained to differentiate uncommon, common, and aberrant critical fields within the framework of medical reports.

  • The critical information presented tends to focus on chronic and fatal illnesses and on epidemics, since these need the utmost care.

  • As a development step, we used positional extraction and a table detection technique, followed by an NLP data validation step, since most of a report's critical fields appear within table structures.

  • To start, table detection and OCR run in parallel. The raw OCR responses (text and bounding boxes) that fall within the detected table area are passed to a positional extraction flow, implemented using regex configurations and lexicon matching.

  • If no critical field is detected within the table bounding box, the positional extraction flow is repeated on the entire raw text (after pre-processing to remove header and footer noise).

  • Once the critical fields are extracted, they are passed to an NLP language model (trained on report tokens) to verify whether the report entities are valid. If the NLP verification confidence score is lower than the fixed threshold, the fields are re-verified against a report lookup (containing all collected report keywords and key phrases).

  • The final verified critical fields, along with their meta details, are stored in the corresponding report table.

  • This method also facilitates medical monitoring, since monitoring uses the same parameters as the critical fields and information. It lets us maintain not only a repository of medical information but also a rich informational resource that can help detect disease long before it becomes serious.
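
The flow described in the points above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the production implementation: the OCR output is assumed to arrive as (text, bounding box) pairs, the regex lexicon `FIELD_PATTERNS`, the field names, the 0.8 threshold, and the `score_fn` stand-in for the NLP language model are all hypothetical, and header/footer noise removal is left out.

```python
import re

# Hypothetical regex + lexicon configuration: canonical field name -> pattern.
# Each pattern captures the first numeric value that follows the field label.
FIELD_PATTERNS = {
    "hemoglobin": re.compile(
        r"\b(?:hemoglobin|haemoglobin|hb)\b\D*?(\d+(?:\.\d+)?)", re.I),
    "glucose_fasting": re.compile(
        r"\bfasting\s+(?:blood\s+)?(?:glucose|sugar)\b\D*?(\d+(?:\.\d+)?)", re.I),
}


def inside(box, region):
    """True if the centre of box (x0, y0, x1, y1) lies within region."""
    cx = (box[0] + box[2]) / 2
    cy = (box[1] + box[3]) / 2
    return region[0] <= cx <= region[2] and region[1] <= cy <= region[3]


def positional_extract(lines):
    """Run every lexicon pattern over the OCR lines; keep the first hit per field."""
    found = {}
    for text, _box in lines:
        for name, pattern in FIELD_PATTERNS.items():
            match = pattern.search(text)
            if match and name not in found:
                found[name] = float(match.group(1))
    return found


def extract_fields(lines, table_box):
    """Try the detected table area first, then fall back to the whole text."""
    in_table = [ln for ln in lines if table_box and inside(ln[1], table_box)]
    fields = positional_extract(in_table)
    if not fields:
        # Fallback on the full raw text (noise removal omitted in this sketch).
        fields = positional_extract(lines)
    return fields


def verify(fields, score_fn, lookup, threshold=0.8):
    """Keep a field if the model score clears the threshold, else if it is in the lookup."""
    return {
        name: value
        for name, value in fields.items()
        if score_fn(name) >= threshold or name in lookup
    }


# Example OCR lines: (text, (x0, y0, x1, y1)).
ocr_lines = [
    ("Hemoglobin 13.5 g/dL", (10, 100, 200, 120)),
    ("Fasting Blood Glucose 98 mg/dL", (10, 130, 200, 150)),
    ("Page 1", (10, 900, 60, 910)),
]
fields = extract_fields(ocr_lines, table_box=(0, 90, 300, 160))
verified = verify(
    fields,
    score_fn=lambda name: 0.9 if name == "hemoglobin" else 0.3,  # stand-in scorer
    lookup={"glucose_fasting"},
)
```

In this sketch a field that scores below the threshold survives only if it also appears in the lookup, which mirrors the two-stage verification described above. A real system would replace `score_fn` with the trained language model and write `verified` to the report table.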



  • A tool like this would be of great help to medical professionals, since it simplifies a serious problem, the handling of medical reports, by extracting areas of interest and building a structured database from them.

  • As mentioned earlier, this method of consolidating data can also promote the early detection of various diseases.

  • This is not only revolutionary from the perspective of the medical industry but could also function as a data-driven Swiss Army knife for doctors.

