80% of Clinical Data Is Buried in Physician Notes: Deep Text Analytics & Top Text Mining Use Cases

Published on Sep 15, 2020

Organizations around the globe have been using machine learning and data science techniques to power interactions between humans and machines for a while now. One of the most talked-about AI technologies today is natural language processing (NLP), the practice of programming computers to process and understand large sets of natural language data. Artificial intelligence applications use NLP to imitate human speech and produce naturally flowing sentences, giving a human touch to interactions between humans and machines.

How is text mining different from NLP? 

People often confuse NLP with text mining (also known as text analytics) because both deal with language and the analysis of words. Text mining is an advanced analytics technique that sifts through large amounts of text to extract valuable insights. Text mining algorithms can identify patterns and recurring concepts, helping businesses pick up the latest trends from a vast body of literature.


In fact, NLP is a fundamental part of text mining. It analyses grammatical and semantic structure to understand the meaning and sentiment behind the words. Various techniques are used in natural language processing to understand human speech. In brief, NLP is the technology powering virtual assistants, AI chatbots, online translators and similar tools. According to a 2019 report by Statista, the NLP market is expected to reach $43.9 billion by 2025 (as shown in the image below).

NLP market

Source: Statista

Below are a few text mining use cases:

NMT (Neural machine translation) 

Google Translate is a leading example of NMT: it uses deep learning to translate text from one language to another. Neural machine translation is an application of deep learning whose algorithms use huge datasets of translated sentences to train a model to translate between language pairs. Research in this field is ongoing, and organizations are studying and adopting various modifications of NMT. Microsoft's Bing laid the groundwork for neural machine translation in 2016. Currently, Amazon and Google are among the top providers of machine translation tools.

Sentiment analysis 

Sentiment analysis fuels brands with the right insights by helping organizations understand how their customers feel about them. Also known as opinion mining, and a part of social media analysis, it analyses articles, blogs and news to help businesses formulate actionable strategies and make efficient decisions. The algorithm works by assigning a value to the text (positive, negative or neutral) to identify the emotions behind it, such as happy, sad, annoyed or angry. Companies such as Sentifi, a Swiss-based company, use NLP to identify key influencers and brand advocates.
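To make the idea of assigning a positive, negative or neutral value more concrete, here is a minimal sketch of a lexicon-based scorer. The word lists and the `sentiment` function are hypothetical and chosen only for illustration; production tools generally rely on much larger lexicons or trained models.

```python
import re

# Hypothetical toy lexicons (an assumption); real lexicons are far larger.
POSITIVE = {"good", "great", "love", "happy", "excellent"}
NEGATIVE = {"bad", "poor", "hate", "angry", "annoyed", "sad"}


def sentiment(text: str) -> str:
    """Label text as positive, negative or neutral by counting lexicon hits."""
    words = re.findall(r"[a-z']+", text.lower())
    # Each positive word adds one point, each negative word removes one.
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    reviews = [
        "I love this phone, the camera is great",
        "Support was poor and I am annoyed",
        "The parcel arrived on Tuesday",
    ]
    for review in reviews:
        print(sentiment(review), "-", review)
```

A scorer like this ignores negation and sarcasm ("not good" still counts as positive), which is one reason real sentiment systems go beyond simple word counting.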

Market intelligence 

Maintaining a competitive edge by keeping up with the latest relevant industry trends is becoming increasingly difficult for businesses. Companies no longer want just social media monitoring; they need the capability to sift through endless websites, articles, blogs and social media posts in order to stay on par with the industry. With market intelligence, deep insights about competitors and their actions help organizations fine-tune their strategies.

Healthcare


According to Peter J. Embi, MD, MS, President and CEO of Regenstrief Institute, 80% of clinical data is buried in unstructured physician notes. This text cannot be read by an EHR (electronic health record) system, so it cannot be used to make advanced decisions. With healthcare databases growing exponentially, NLP and text analytics are being used to transform this data into meaningful insights. This helps healthcare companies streamline operations, improve patient outcomes and manage regulatory compliance.

Recruitment and hiring  

NLP and text mining can enhance recruitment by speeding up candidate search and shortlisting relevant resumes. They can also help draft gender-neutral, bias-free job descriptions. NLP software uses semantic analysis to find relevant synonyms, which helps HR professionals identify candidates that meet their criteria. For example, the Textio app uses semantic categorization to tweak job descriptions so that as many candidates as possible apply for the job.

Limitations of NLP and Text mining 

Text mining uses NLP techniques to extract new, relevant and valuable information by analysing large amounts of literature to enhance decision making. The data is identified, extracted and structured so that machines can process the text. The structured output of text mining is also used in data catalogs, semantic data fabrics and BI (business intelligence) dashboards. Yet text mining and natural language processing have certain limitations:

1. Absence of context: Many text mining models are statistics-based and carry no background knowledge. In short, the algorithm searches for words in a text, counts their occurrences, and analyses their relevance (for example by calculating TF-IDF) and their placement relative to neighbouring words. This method has various limitations, one of which is that machines can’t understand the context in which the words are embedded.

2. Uncertainty of meaning: Because many words share the same spelling but carry different meanings, it can be difficult for a machine to determine the intended one. For example, a machine might not be able to tell whether the word ‘Jaguar’ refers to the animal or the car.

3. No standardization: The structured data from text mining is not based on any standard, so it is difficult to combine it with other data streams.
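The first two limitations can be seen in a small TF-IDF calculation. The sketch below uses one common formulation (term frequency multiplied by the log of the inverse document frequency); the function names and sample sentences are assumptions made for illustration, and libraries often apply additional smoothing.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tf_idf(docs):
    """Return one {term: weight} dict per document.

    tf  = term count / number of tokens in the document
    idf = log(number of documents / number of documents containing the term)
    """
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Count, for each term, how many documents contain it at least once.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Illustrative sentences (assumptions): the first is about the animal,
# the second about the car.
docs = [
    "The jaguar hunts at night in the rainforest",
    "The jaguar accelerates quickly on the open motorway",
    "The rainforest is home to many animals",
]
weights = tf_idf(docs)

if __name__ == "__main__":
    print(weights[0]["jaguar"], weights[1]["jaguar"])
```

In this sketch the word "jaguar" receives exactly the same weight in the animal sentence and in the car sentence, because the calculation only counts occurrences. Nothing in the numbers tells the machine which meaning is intended.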

Overcome limitations of NLP & Text mining with deep text analytics  

Deep text analytics (DTA) extracts information from unstructured data by combining NLP and machine learning techniques so that algorithms can analyse text and understand its main context. This methodology goes beyond traditional text mining: it uses several techniques to understand text automatically, such as text structure analysis, entity extraction and recognition, term or phrase extraction, text classification, and NLP techniques such as lemmatization or stemming.
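Two of the building blocks listed above, stemming and entity extraction, can be sketched in a few lines. This is a deliberately naive illustration: the suffix rules are not the Porter algorithm, and the `GAZETTEER` lookup table is a hypothetical stand-in for a trained entity recognizer.

```python
import re

# Hypothetical lookup table of known entities (an assumption for this sketch).
GAZETTEER = {"google": "ORG", "amazon": "ORG", "textio": "PRODUCT"}

# Naive suffix rules, checked in order; real stemmers use many more rules.
SUFFIXES = ("ing", "ed", "es", "s")


def stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def analyse(text):
    """Return (entities, stems) for the text using the toy rules above."""
    tokens = re.findall(r"[A-Za-z]+", text)
    entities = [(t, GAZETTEER[t.lower()]) for t in tokens if t.lower() in GAZETTEER]
    stems = [stem(t.lower()) for t in tokens if t.lower() not in GAZETTEER]
    return entities, stems


if __name__ == "__main__":
    print(analyse("Google translates documents"))
```

Stemming lets "translates", "translated" and "translating" be treated as one term, while the entity lookup keeps names such as Google out of the general vocabulary so they can be handled as organizations.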

This approach supplies relevant background knowledge, making it easier to interpret words, sentences and paragraphs accurately. Deep text analytics can help avoid misinterpretation, which is a common problem with many virtual assistants: they take information at face value and can’t understand the underlying meaning.
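As a rough illustration of how background knowledge helps, the sketch below resolves the "Jaguar" ambiguity by comparing the words around it with clue words for each sense. The `SENSES` table and the `disambiguate` function are hypothetical; a real deep text analytics system would typically draw on a knowledge graph or a trained model instead of a hand-written list.

```python
import re

# Hypothetical background knowledge: clue words associated with each sense.
SENSES = {
    "animal": {"rainforest", "prey", "hunts", "cat", "wildlife", "spotted"},
    "car": {"engine", "drive", "motorway", "dealer", "accelerates", "sedan"},
}


def disambiguate(sentence, senses=SENSES):
    """Pick the sense whose clue words overlap most with the sentence."""
    words = set(re.findall(r"[a-z]+", sentence.lower()))
    scores = {sense: len(words & clues) for sense, clues in senses.items()}
    best = max(scores, key=scores.get)
    # Without any matching clue word, the sketch refuses to guess.
    return best if scores[best] > 0 else "unknown"


if __name__ == "__main__":
    for sentence in [
        "The jaguar hunts at night in the rainforest",
        "The jaguar accelerates quickly on the open motorway",
        "I saw a jaguar yesterday",
    ]:
        print(disambiguate(sentence), "-", sentence)
```

The same word now resolves to different meanings depending on its neighbours, which is something a pure word-counting approach cannot do.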


The applications of text mining and deep text analytics are growing rapidly. With the technology expected to become more sophisticated in the coming years, Gartner predicted that by 2020 people would have more interactions with virtual assistants and chatbots than with their partners or spouses. Organizations are using text mining and deep text analytics to enhance their productivity, know their customers better and make data-driven decisions.