Technology advancements enable the inclusion of machine learning in mobile applications. Key use cases for machine learning in healthcare mobile apps include drug discovery, diagnosis in medical imaging, and detecting cancerous tumors on mammograms. A relevant use case involves members taking a picture of their insurance card with their mobile device. Apple's iOS Vision framework and Google's TensorFlow can then be used to extract key information, such as the member ID, the name of the insurance carrier, and the plan, from the image of the insurance card.
The diagram above illustrates the key components of a machine learning solution for data extraction on mobile devices. The three major components in the flow are:
- Logo detector (ML model)
- Text Recognizer
- Character Recognizer
- The Logo Detector is responsible for identifying the user’s insurance provider from the logo printed on the insurance card. This component leverages a TensorFlow machine learning model built with transfer learning: it starts from a pre-trained Inception v3 network rather than creating a model from scratch, which saves significant training time.
- Inception v3 comprises multiple layers stacked on top of each other. These layers are pre-trained and extract and summarize the information needed to classify images. Only the last layer of this pre-trained model is retrained on one’s own dataset to create a distinct model.
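A rough sketch of this transfer-learning setup using TensorFlow's Keras API (the class count here is illustrative, and `weights=None` is used only to avoid the ImageNet download; in practice one would pass `weights="imagenet"` to get the pre-trained layers):

```python
import tensorflow as tf

NUM_PROVIDERS = 5  # illustrative: number of insurance carriers to recognize

# Load Inception v3 without its original classification layer.
# In practice weights="imagenet" loads the pre-trained feature layers.
base = tf.keras.applications.InceptionV3(
    weights=None, include_top=False, input_shape=(299, 299, 3)
)
base.trainable = False  # freeze the pre-trained feature-extraction layers

# Replace the final layer with a new head trained on one's own logo dataset.
model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(NUM_PROVIDERS, activation="softmax"),
])
model.compile(optimizer="adam", loss="categorical_crossentropy")
```

Only the new classification head has trainable weights, so training on a small logo dataset is fast compared with training the whole network.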
Training the Logo Detector Model
- The Inception v3 image classifier takes in an image scaled to 299×299 pixels, identifies the company logo, and outputs the insurance provider name.
- A key challenge in building the logo detector was training the Inception v3 model. To generate additional training data and improve generalization on insurance provider detection, one should augment the dataset by transforming the images with small rotations, 180-degree flips, and small changes to the color channels.
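Two of these transformations can be sketched with a minimal stdlib-only example (a real pipeline would use TensorFlow's image ops on full-size images; here an image is a nested list of RGB pixel tuples):

```python
import random

def flip_180(image):
    """Rotate an image (rows of RGB pixel tuples) by 180 degrees."""
    return [list(reversed(row)) for row in reversed(image)]

def jitter_colors(image, max_shift=10, seed=0):
    """Apply a small random shift to each color channel, clamped to 0-255."""
    rng = random.Random(seed)
    shifts = [rng.randint(-max_shift, max_shift) for _ in range(3)]
    return [
        [tuple(min(255, max(0, c + s)) for c, s in zip(px, shifts)) for px in row]
        for row in image
    ]

# A tiny 2x2 "image": each pixel is an (R, G, B) tuple.
img = [[(0, 0, 0), (255, 0, 0)],
       [(0, 255, 0), (0, 0, 255)]]
augmented = [flip_180(img), jitter_colors(img)]
```

Each transformed copy is a new, labeled training example, which is what improves generalization on a small dataset.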
- The Text Recognizer is responsible for fetching user information from the card once the insurance provider has been identified by the logo detector model. Apple's Vision framework is recommended for creating bounding boxes around the text on the insurance card image.
- The Vision framework was recently launched by Apple. It applies high-performance image analysis and computer vision techniques to identify faces, detect features, and classify scenes in images and video.
- The Character Recognizer is the last stage, where the actual text is extracted from the image. Tesseract OCR is a good option to use.
- Tesseract is an open-source character recognition engine released under the Apache license.
Running a Machine Learning Model on Mobile Devices
- App footprint: Model size is the limiting factor when using a machine learning model offline on a mobile device, so any pre-processing that reduces the app footprint is worth considering.
- Machine learning frameworks are available for running models locally on mobile devices. These frameworks also perform model optimizations that improve on-device performance.
- Size of model: Machine learning models are large because of their weights, which are stored as large blocks of floating-point values. These values have little repetition, so conventional compression alone has little effect on model size.
Quantization reduces an image's size by reducing the number of distinct colors. Applying an almost identical process to neural network weights, rounding them to a limited set of values, has a similar effect, providing roughly a 70% reduction in size after compression without any changes to the structure of the network. The optimized model can then be integrated into the mobile application without much increase in the app's footprint.
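The idea can be illustrated with a small stdlib-only sketch (the weight values and level count are made up): 32-bit floats are mapped onto 256 evenly spaced levels and stored as single bytes, a 75% raw-size reduction before any compression is applied:

```python
import struct

def quantize(weights, levels=256):
    """Map float weights onto `levels` evenly spaced values in [min, max],
    storing one byte per weight plus the (min, scale) range parameters."""
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / (levels - 1)
    codes = bytes(round((w - lo) / scale) for w in weights)
    return codes, lo, scale

def dequantize(codes, lo, scale):
    """Recover approximate float weights from the byte codes."""
    return [lo + c * scale for c in codes]

weights = [0.1, -0.25, 0.37, 0.9, -0.8, 0.05]
codes, lo, scale = quantize(weights)

original_bytes = len(struct.pack(f"{len(weights)}f", *weights))  # 4 bytes each
quantized_bytes = len(codes)                                     # 1 byte each
recovered = dequantize(codes, lo, scale)
```

The rounding error per weight is at most half a quantization step, which neural networks, being trained to tolerate noisy inputs, generally absorb with little accuracy loss; the repeated byte values also compress far better than raw floats.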
The accessibility of machine learning libraries opens the door to implementing innovative use cases that not only improve efficiency on mundane tasks but also optimize workflows.