Boring side of NLP


February 7, 2019 DATAcated Challenge 0

80% of enterprise data is unstructured, coming from emails, statements, contracts, policies, voice call records, surveys, call center notes, digital channels and so on.

There are plenty of business use cases where NLP can be applied to this dark data to make business processes more efficient, to enhance customer experience, or to fuse it with structured data and reveal interesting patterns.

Use cases include sentiment analysis, topic modeling, knowledge extraction, intent identification and document classification, among others. In fact, we have modeling techniques today that can achieve human-level accuracy on some of these use cases.

That said, I am going to keep it simple and not talk about any of these interesting business cases. I am going to make things a little boring from here on.

Why did I talk about 80% dark data in the first place?

Most enterprises unfortunately do not know what is hidden within this pile of historical unstructured data. More specifically, highly regulated industries like finance, healthcare and life sciences need to ensure there is no PII (Personally Identifiable Information) in these data piles. Can NLP techniques solve this problem of identifying PII in data at rest so that it can be redacted later on?

Yes, for sure. Something as simple as Named Entity Extraction, also known as Named Entity Recognition (NER), can be powerful and effective for this problem. Let me quickly illustrate it with an example of an email tagged using NER:

Hi, This is <pii_name>John</pii_name> and my ssn is <pii_ssn>111-11-1111</pii_ssn>. I lost my credit card with number <pii_card_nbr>4444-4444-4444-4444</pii_card_nbr>. Can you please send me replacement card to address <pii_address>123 broadway, NY 10154</pii_address>. Reach out to me @<pii_phone>201-201-2011</pii_phone> in case if you need any clarification

Thanks

<pii_name>John Doe</pii_name>

<pii_company>XYZ Enterprise</pii_company>
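To give a rough idea of how the fixed-format identifiers in an email like this could be tagged, here is a minimal Python sketch. It uses plain regular expressions rather than a trained NER model, so it only catches values with a predictable shape (SSN, card number, phone number); names, addresses and company names would need an actual NER model. The patterns, the `PII_PATTERNS` table and the `tag_pii` function are assumptions made up for illustration, and real data would need far more formats than these.

```python
import re

# Hypothetical patterns for illustration only; they cover just the
# hyphen-separated formats used in the sample email above.
PII_PATTERNS = {
    "pii_card_nbr": r"\b\d{4}-\d{4}-\d{4}-\d{4}\b",
    "pii_ssn": r"\b\d{3}-\d{2}-\d{4}\b",
    "pii_phone": r"\b\d{3}-\d{3}-\d{4}\b",
}

# Combine everything into one regex with named groups so the text is
# scanned in a single pass and already-tagged spans are never re-matched.
PII_REGEX = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in PII_PATTERNS.items())
)


def tag_pii(text):
    """Wrap each matched value in a tag named after the pattern that matched."""
    def wrap(match):
        tag = match.lastgroup  # name of the alternative that matched
        return f"<{tag}>{match.group(0)}</{tag}>"

    return PII_REGEX.sub(wrap, text)


if __name__ == "__main__":
    email = (
        "Hi, This is John and my ssn is 111-11-1111. "
        "I lost my credit card with number 4444-4444-4444-4444. "
        "Reach out to me @201-201-2011"
    )
    print(tag_pii(email))
```

Once values are tagged this way, redaction becomes a simple follow-up step: replace everything between a `pii_` tag pair with a placeholder before the data is stored or shared.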

There are more advanced NLP techniques that can increase the accuracy of identification and tagging in this scenario. For now, that’s it.

By: Srivatsan Srinivasan

 
