Insights from 7 Sessions of the DATAx Conference in NYC (Dec 2018)
The DATAx conference in NY (Innovation Enterprise Summits) took place on December 12th and 13th of 2018. The event was run as five parallel tracks that focused on different areas and allowed participants to flow from one track to another. Though I couldn’t attend every session, here is a glimpse into some of the awesome presentations that I listened to.
Making Data Useful
Cassie Kozyrkov, Chief Decision Scientist at Google
Cassie Kozyrkov provided actionable advice around making data scientists more useful in making a business impact. A few key points from the talk include:
It is important to incentivize your entire workforce to look at information; make it easy to access and break the silos.
Rigor should begin with the decision maker. She discussed how academia teaches the skill of using the most complex tool in the shed, which becomes a hard habit to break after years of practice, so when these individuals are hired into commercial businesses, they don’t magically change their mindset (and they don’t yet know what’s important in terms of business value). This means they may invest rigor in the wrong projects. She recommends making a broad-shallow sweep of the data first and only digging deeper if decision makers identify the pursuit as valuable.
Do things in the right order. (You can read about this more in her post: “The first step in AI might surprise you”)
The right way to approach an applied project is to flip the algorithms-inputs-outputs order on its head, like so: think about outputs, then inputs, then algorithms/models. Make sure you have the right leadership involvement at the right stage of each project. If they are separate individuals (in many organizations, a single human owns several or all of these roles), then you would pass the baton as follows:
- Decision maker (focus on outputs)
- Lead analyst (focus on inputs)
- Lead machine learning engineer (focus on algorithms/models)
- Lead statistician (focus on performance)
Another key message that was heard throughout her talk was “split your damn data” – which Cassie calls the best idea to live by, so she hopes they’ll be the words on her tombstone 😊
Customized Regression Model for Airbnb Dynamic Pricing
Hangjun Xu, Senior Machine Learning Software Engineer, Airbnb
Hangjun described the pricing strategy model deployed at Airbnb, an online marketplace for sharing homes and experiences. The goal of price optimization is to help hosts who share their homes on Airbnb set the optimal price for their listings.
In contrast to conventional pricing problems, where pricing strategies are applied to a large quantity of identical products, there are no “identical” products on Airbnb, because each listing on their platform offers unique values and experiences to their guests. The unique nature of Airbnb listings makes it very difficult to estimate an accurate demand curve that’s required to apply conventional revenue maximization pricing strategies.
They use a supervised regression model with optimal price as the label. If a listing didn’t get booked at $140, they shouldn’t suggest a higher price (say $160). On the other hand, if the listing was booked at $120, they shouldn’t suggest a price below that (say $110). This creates a booking range with a minimum and a maximum that they try to balance.
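The booking-range idea can be sketched as a simple asymmetric penalty: a booked price becomes a lower bound on the suggestion, and an unbooked price becomes an upper bound. The sketch below is my own illustration of that intuition (the function name and the hinge-style penalty are my invention, not Airbnb’s actual custom loss):

```python
def booking_range_loss(suggested, listed_price, was_booked):
    """Penalize price suggestions that fall outside the inferred booking range.

    If the listing was booked at listed_price, the optimal price is assumed to
    be at least that high, so suggestions below it are penalized. If it went
    unbooked, the optimal price is assumed to be lower, so suggestions above
    it are penalized.
    """
    if was_booked:
        return max(0.0, listed_price - suggested)  # suggested price too low
    return max(0.0, suggested - listed_price)      # suggested price too high


# Booked at $120: suggesting $110 violates the lower bound
low = booking_range_loss(110, 120, was_booked=True)   # penalty of 10
# Unbooked at $140: suggesting $160 violates the upper bound
high = booking_range_loss(160, 140, was_booked=False)  # penalty of 20
```

In a real model this kind of per-listing penalty would be summed over the training data and minimized, so the regressor learns to land inside each listing’s booking range.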
Some of the challenges faced by the company include dealing with very unique listings and sparse data.
Data-Driven Product Management: How to Build a Winning Customer Experience in the Amazon Era
Rebecca Greene, Chief Product Officer, Handy
In today’s retail environment, two-day shipping, single-click ordering, and easy returns have become table stakes. To stay competitive in the Amazon-era, a company must adapt to meet and exceed expectations for an immediate, seamless, and end-to-end customer experience.
Rebecca spoke about how companies can meet growing customer expectations through data-driven product management. She also shared some of the design decisions and customer considerations that go into building a successful product in today’s business environment.
The main takeaway from her talk for me was that “convenience is king” – making things easier for your buyers will make things easier for you! A few examples she provided of companies making customers’ lives easier: Lowes, with their virtual reality that allows you to navigate to where things are located in the store more quickly, and Sephora, with their surveys, color matches, etc. Of course, Amazon raised the stakes with their free returns and 2-day shipping.
One way that her company, Handy, makes life easier for customers is by seamlessly integrating its services with other companies (such as Wayfair, which sells furniture). Handy provides home services (one of which is putting together furniture). By working together, they streamline the process and provide the furniture plus the assembly work all on one screen (without a separate checkout) and without extra scheduling, since Handy simply comes over on the same day the delivery truck arrives.
They also use plenty of A/B testing to see what works well for their clients. For example, they tested two ways of explaining how Handy works – a 2-minute video and an FAQ text section – and discovered that in that purchasing moment, buyers preferred to read the FAQ rather than watch a video.
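An A/B test like the video-vs.-FAQ comparison typically comes down to comparing conversion rates between the two variants. As a minimal sketch (the counts below are hypothetical numbers I made up, not Handy’s data), here is a standard two-proportion z-test using only the standard library:

```python
import math


def two_proportion_ztest(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference in conversion rates between variants.

    conv_a/n_a are conversions and visitors for variant A; likewise for B.
    Returns the z statistic and a two-sided p-value.
    """
    p_a, p_b = conv_a / n_a, conv_b / n_b
    # Pooled rate under the null hypothesis that both variants convert equally
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal CDF (via the error function)
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value


# Hypothetical: 120/2000 purchases shown the video vs. 168/2000 shown the FAQ
z, p = two_proportion_ztest(120, 2000, 168, 2000)
```

With these made-up numbers the FAQ variant’s lift is statistically significant at the usual 5% level, which is the kind of evidence that would justify switching the purchase page to the FAQ.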
Lightning Introduction to Deep Learning
Alfred Essa, Vice President, Analytics and R&D, McGraw-Hill Education
Deep learning is the fastest growing area in machine learning. In his presentation, Alfred covered what deep learning is, why it is important, and how it compares to other ML techniques. He then concluded by building a simple deep learning model for a marketing use case.
Alfred started with a discussion about what it means to be “extreme”, providing examples such as running a marathon (which I could totally relate to 😊). He then went on to describe deep learning as an extreme case of machine learning.
We went over the various aspects of deep learning, including neurons, layers, forward propagation, loss & cost functions, and back propagation. The end goal is to find a set of network parameters (weights and biases) that achieves the lowest loss or error.
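Those pieces fit together in just a few lines of NumPy. The sketch below is my own minimal illustration (not Alfred’s actual demo): a one-hidden-layer network trained on the classic XOR problem, showing forward propagation, a mean-squared-error loss, back propagation, and gradient descent on the weights and biases:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: XOR, the classic problem that needs a hidden layer
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = np.array([[0.], [1.], [1.], [0.]])

# Network parameters: weights and biases for hidden and output layers
W1 = rng.normal(size=(2, 4)); b1 = np.zeros(4)
W2 = rng.normal(size=(4, 1)); b2 = np.zeros(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

losses = []
lr = 1.0
for _ in range(5000):
    # Forward propagation: inputs -> hidden layer -> output
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)

    # Loss/cost function: mean squared error over the four examples
    losses.append(np.mean((out - y) ** 2))

    # Back propagation: gradients of the loss w.r.t. each parameter,
    # applying the chain rule through the sigmoid activations
    d_out = 2 * (out - y) / len(X) * out * (1 - out)
    dW2, db2 = h.T @ d_out, d_out.sum(axis=0)
    d_h = (d_out @ W2.T) * h * (1 - h)
    dW1, db1 = X.T @ d_h, d_h.sum(axis=0)

    # Gradient descent step: nudge each parameter toward lower loss
    W2 -= lr * dW2; b2 -= lr * db2
    W1 -= lr * dW1; b1 -= lr * db1
```

After training, the loss is far lower than where it started, which is exactly the end goal described above: parameters that minimize the loss.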
10 Lessons Learned from AI Initiatives in the Financial Services Sector
Andy Price, Financial Services Lead, Pure Storage
Data-driven use cases are paving the way for next-generation work streams like artificial intelligence (AI) across the business landscape, and the Financial Services industry is no exception.
To deliver AI at scale, organizations must consider several major dependencies and challenges, which require them to have a high-level understanding of the technical requirements that an AI project will place on the infrastructure within their organization. Andy shared the 10 lessons learned from AI initiatives in the financial services sector.
- Don’t underestimate the information security challenge – getting access to the internal data you need is difficult, and putting data into the public cloud raises further challenges. Plans for data sourcing take 2-3 times longer than anyone predicts
- Tooling and frameworks are evolving quickly – don’t get religious about tools. You don’t want to be right for 6 months and wrong forever. The principle to follow here is to build upon open standards and avoid vendor lock-in, so you can swap tools in and out as needed
- Is your initiative strategic or tactical? Discuss this upfront with customers and stakeholders, as it may drive a different thought process. Think about the implications of having to conduct a complete re-design of solutions. Technology investments might be thrown away if you don’t think about strategic direction upfront. If it is strategic, treat it as such and build on foundations that allow you to scale without disruption
- AI is often challenging to POC – test the hard stuff and try to break it; don’t waste time on the easy stuff. Conduct a real-world test when possible to understand how far your investments can take you (and avoid surprises down the line). Toy data sets yield toy results
- Document your critical capabilities – what is the realistic level of scale you want to architect for the solution (how many GPUs, etc.)? You might want to reinvest in the future, but it’s worthwhile to document this upfront
- Infrastructure key priorities – keep the GPUs busy, and keep the data scientists busy doing data science. Avoid having data scientists do systems integration and system optimization/tuning. Ensure that the infrastructure can handle the chaos factor: tuned for everything and able to cope with different data types concurrently
- Recognize the cost of data science – you don’t want data scientists to be unproductive. The average cost of a data scientist in NYC is $150k per year. Don’t skimp on infrastructure investment. If we buy into software eating the world, data being the world’s most valuable resource, and AI being the 4th industrial revolution, then let’s act and invest accordingly
- AI is a pipeline – not just training. Between 60-80% of AI work is data preparation and only 20-40% is training. Most organizations focus on the training portion, but the earlier stages include collecting various data sets from data scientists’ laptops across the organization. You need to consolidate and be efficient and effective across the end-to-end platform
- Few people understand the end-to-end solution. Data scientists understand the tools and frameworks, but the infrastructure has to be up to the challenge. Very few data scientists understand the infrastructure, and few infrastructure people understand data science; we need to bridge that gap
- In big organizations this becomes increasingly challenging – close collaboration is needed, yet companies work in silos. He also mentioned Pure Storage’s AIRI (AI-Ready Infrastructure) and FlashBlade products: once your GPU count goes up, traditional storage can’t keep up
5 Lessons Learned from Teaching Machine Learning (ML) in Finance
Meninder (Mike) Purewal, Director, Data Scientist & Adjunct Professor, New York University
Mike spoke about the five lessons he learned from teaching ML in Finance.
- Teaching ML in finance is hard – finding the unicorn that understands finance, math, and computer science is close to impossible. The best bet is to find people that cover a few of the areas and team them up with people that cover other areas of expertise. Education supply is just beginning to meet demand
- Teaching ML in finance is easy – with the ease of Google search, you can quickly find code, copy-paste it into your console, and call yourself a data scientist. There is some appeal and a lot of danger in this, because most people don’t have a full understanding of how the stuff works
- Expectations vs. reality – in reality the work is hard and the details are a grind. Active products in the market are mainly coded rules rather than AI, and it is a grind to make these things work. Expectations of senior leaders don’t match the reality
- Not many great examples to teach with – finance culture tends to be secretive (information hoarding). There’s a dearth of textbooks so he recommends people use blogs/ LinkedIn posts to learn from. He is forced to use the Iris data set because finance data is scarce. This results in people not being able to learn until they are actually on the job
- Deep & few vs. shallow & many – there are two directions people can take in machine learning. They can become specialists that focus on research (deep & few), or go into consulting/managerial roles (shallow & many). There is value in both, but they go down different paths. You need to advertise yourself appropriately and set your learning goals and expectations accordingly
Accelerating Innovation through Responsible AI
Jay Chakraborty, Director & Adjunct Professor, PwC
I was really looking forward to this session because “ethics” in the context of AI & data science has really been top of mind for me lately. Jay did a great job of highlighting examples of why we need the human element when it comes to working with machines and data. One example he provided was Stanislav Petrov, who recognized that there was an issue in the data showing that Russia was under attack (he realized that the US wouldn’t send only 5 missiles to attack Russia, since that wouldn’t be enough) – he basically uncovered a system glitch; read more about that here.
Another example was the predictions of beauty pageant winners (which contained bias towards selecting mainly white females as potential winners). Additionally, he touched on an example that I plan to include in the data science ethics course that I’m developing- this was around an algorithm that decides which neighborhoods police should be patrolling – which usually results in them patrolling poorer neighborhoods and finding more “crime” there by the mere chance of being in that neighborhood.
In order to unleash the full potential of AI to transform your business, there needs to be a deep understanding of the customer, the embedding of societal and ethical implications in AI design, and a structured approach to model development. He introduced the GREAT model for responsible AI, which stands for Governed, Reliable, Ethical, Accountable, & Transparent.
Jay also highlighted the fact that we need to move away from using the “black box” excuse and start opening it up and documenting everything that goes into these models/ algorithms.
In summary, the DATAx conference provided me with several learning opportunities and enabled me to meet some really interesting people!