The Boring Side of NLP
80% of enterprise data is unstructured, coming from emails, statements, contracts, policies, voice call recordings, surveys, call center notes, digital channels and so on.
There are plenty of business use cases where NLP can be applied to this so-called dark data to make business processes more efficient, enhance customer experience, or fuse it with structured data to reveal interesting patterns.
Use cases range from sentiment analysis, topic modeling, knowledge extraction and intent identification to document classification, among others. In fact, we have modeling techniques today that can achieve human-level accuracy on some of these use cases.
That said, I am going to keep it simple and not talk about any of these interesting business cases. Things are going to get a little boring from here on.
Why did I talk about the 80% dark data in the first place?
Unfortunately, most enterprises do not know what is hidden within this pile of historical unstructured data. More specifically, highly regulated industries like finance, healthcare and life sciences need to ensure there is no PII (Personally Identifiable Information) in these data piles. Can NLP techniques solve the problem of identifying PII in data at rest so that it can be redacted later on?
Yes, for sure. Something as simple as Named Entity Recognition (NER), also known as named entity extraction, can be powerful and effective in solving this problem. Let me quickly illustrate it with an example of an email whose PII has been tagged using NER:
Hi, This is <pii_name>John</pii_name> and my ssn is <pii_ssn>111-11-1111</pii_ssn>. I lost my credit card with number <pii_card_nbr>4444-4444-4444-4444</pii_card_nbr>. Can you please send me replacement card to address <pii_address>123 broadway, NY 10154</pii_address>. Reach out to me @<pii_phone>201-201-2011</pii_phone> in case if you need any clarification
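To make the tagging and later redaction a bit more concrete, here is a minimal Python sketch. Note that it is not NER: it swaps in plain regular expressions, which can only catch the rigidly formatted identifiers in the email above (SSN, card number, phone). Names and addresses would still need an NER model. The pattern set and the `tag_pii` and `redact_pii` helpers are hypothetical names made up for illustration, and the patterns assume the exact dash-separated formats shown in the example.

```python
import re

# Hypothetical patterns for illustration only. They assume the exact
# dash-separated formats used in the sample email; real data needs
# broader, validated rules (and an NER model for names and addresses).
PII_PATTERNS = {
    "pii_ssn": r"\b\d{3}-\d{2}-\d{4}\b",
    "pii_card_nbr": r"\b\d{4}-\d{4}-\d{4}-\d{4}\b",
    "pii_phone": r"\b\d{3}-\d{3}-\d{4}\b",
}

# Combine everything into one regex with named groups so the text is
# scanned in a single pass and already-tagged spans are never re-matched.
PII_REGEX = re.compile(
    "|".join(f"(?P<{label}>{pattern})" for label, pattern in PII_PATTERNS.items())
)


def tag_pii(text: str) -> str:
    """Wrap each matched identifier in <label>...</label> tags."""
    def wrap(match: re.Match) -> str:
        label = match.lastgroup  # name of the pattern that matched
        return f"<{label}>{match.group(0)}</{label}>"
    return PII_REGEX.sub(wrap, text)


def redact_pii(text: str) -> str:
    """Replace each matched identifier with a [LABEL] placeholder."""
    return PII_REGEX.sub(lambda m: f"[{m.lastgroup.upper()}]", text)


if __name__ == "__main__":
    email = (
        "Hi, This is John and my ssn is 111-11-1111. "
        "I lost my credit card with number 4444-4444-4444-4444. "
        "Reach out to me @201-201-2011 in case you need any clarification"
    )
    print(tag_pii(email))
    print(redact_pii(email))
```

Tagging and redaction are kept as separate functions on purpose: tagging preserves the original values for review, while redaction removes them once the tags have been verified.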
There are more advanced NLP techniques that can increase the accuracy of identification and tagging in this scenario. For now, that’s it.
By: Srivatsan Srinivasan