ODSC WEST 2019 RECAP
Brief recap of the Open Data Science Conference – San Francisco, October 2019
On AI ROI: the questions you need to be asking – Kerstin Frailey, Metis
Success is unpredictable in AI – feasibility is often unknown before a project begins. Projects are esoteric – they require highly specialized training. Application is new – methods for tracking ROI haven't been adapted to AI, so managing AI is a challenge.
Performance is volatile – and the lifecycle is iterative. Feedback loops and responses to AI interventions speed up the expiration of data and of the models that depend on it.
Targets are fuzzy – executives don't have experience with AI projects, so they can't set clear expectations, and data scientists inevitably miss the intangible goals.
Data science teams strive to do good data science, which doesn't always translate into achieving business goals.
Data scientists are not informed of the business's strategy.
AI ROI is urgent – companies continue to double down on AI investments even without seeing a return on them, but without ROI those investments will eventually dry up.
Planning – what type of ROI should we expect? Some projects are explorations for gaining knowledge, some make recommendations or inform decisions, and some are automated interventions, e.g. a fraud prevention project that intervenes to stop fraudulent transactions from going through. Identifying fraudulent transactions isn't the impact in itself – reducing the costs due to fraud is the desired impact. Sanity check – is this a pure automation play? Could descriptive statistics do the job without AI? Does it just seem fun while promising no clear impact? Is it more cost-effective than pre-existing or vendor-based solutions? Ensure that the project is strategically positioned, and check whether the system can be leveraged elsewhere.
Development – what kind of errors do we prefer? When we compare models and present them to the business, we have to show we've thought about this. What kind of volatility can we handle? Volatility in performance is expected; ask your stakeholders what they can accept. Benchmark – simulate, and use ghost mode and a control group to see how performance compares. Sanity check – is the solution user-friendly? Can the solution be scaled to address the entirety of the problem? Are we building feedback loops? Will the data become stale quickly? Will we need to update this model often?
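To make "what kind of errors do we prefer?" concrete, one common approach (not necessarily the one presented in the talk) is to weight each model's false positives and false negatives by an assumed business cost and compare the totals. The sketch below reuses the fraud example from the planning step; the confusion-matrix counts and the per-error costs (`COST_FP`, `COST_FN`) are hypothetical numbers chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    """Confusion-matrix counts for one model on a shared validation set."""
    tp: int
    fp: int
    fn: int
    tn: int


def expected_cost(c: ConfusionCounts, cost_fp: float, cost_fn: float) -> float:
    """Total cost of a model's mistakes, weighting each error type by its business cost."""
    return c.fp * cost_fp + c.fn * cost_fn


# Hypothetical numbers: model_a catches more fraud but blocks more legitimate transactions.
model_a = ConfusionCounts(tp=90, fp=200, fn=10, tn=9700)
model_b = ConfusionCounts(tp=70, fp=40, fn=30, tn=9860)

# Hypothetical costs: a missed fraud costs 500, a wrongly blocked customer costs 20.
COST_FP, COST_FN = 20.0, 500.0

for name, counts in [("model_a", model_a), ("model_b", model_b)]:
    total = counts.tp + counts.fp + counts.fn + counts.tn
    accuracy = (counts.tp + counts.tn) / total
    cost = expected_cost(counts, COST_FP, COST_FN)
    print(name, f"accuracy={accuracy:.4f}", f"cost={cost:.0f}")
```

With these made-up numbers, model_b has the higher accuracy (0.993 vs. 0.979) but also the higher error cost (15,800 vs. 9,000), because it misses more fraud. Accuracy alone would have picked the wrong model for this business goal.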
Deployment – is it performing close to expectations? Are business metrics moving as expected in response to the data science metrics? Is variability close to expectations?
Governance – sanity check: are we monitoring feedback loops? Are we iterating to the point of over-optimizing? Are we duct-taping updates and iterations?
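Monitoring whether data is going stale can start with a simple drift statistic. The sketch below swaps in the Population Stability Index (PSI), a common choice for comparing a feature's distribution at training time with its distribution in production; it was not necessarily covered in the talk. The equal-width binning over the reference range, the `eps` floor for empty bins, and the sample data are simplifying assumptions.

```python
import math


def population_stability_index(expected, actual, bins=10, eps=1e-6):
    """PSI between a reference sample (e.g. training data) and a recent sample of one feature.

    Bin edges are equal-width over the reference sample's range; eps avoids log(0)
    for bins that are empty in one of the samples.
    """
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 if the reference is constant

    def proportions(values):
        counts = [0] * bins
        for v in values:
            # Clamp out-of-range values into the first or last bin.
            idx = min(max(int((v - lo) / width), 0), bins - 1)
            counts[idx] += 1
        return [max(c / len(values), eps) for c in counts]

    p_exp, p_act = proportions(expected), proportions(actual)
    return sum((a - e) * math.log(a / e) for e, a in zip(p_exp, p_act))


reference = [i / 100 for i in range(100)]      # hypothetical feature values at training time
drifted = [0.5 + i / 200 for i in range(100)]  # same feature, shifted upward in production
print(population_stability_index(reference, reference))  # identical samples give 0.0
print(population_stability_index(reference, drifted))
```

A frequently quoted rule of thumb treats PSI below 0.1 as stable and above 0.25 as a significant shift, but the alert threshold should be agreed with stakeholders like any other volatility tolerance.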
Peter Welinder, OpenAI – Learning Robot Dexterity
To interact with the world, we have to make contact with it. OpenAI wants to combine learning and manipulation so that robots can carry out useful tasks. They chose a Rubik's Cube as the test task – they can do this in a fraction of a second.
Past – high robotics expertise
Future – all you need is learning
- Deep reinforcement learning – learning by trial and error, like teaching a dog. The drawback is that it takes a long time to get results.
- Simulation to reality
- Cool results
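Deep reinforcement learning is far beyond a short snippet, but the trial-and-error idea can be sketched with a much simpler learner. The example below is an epsilon-greedy multi-armed bandit, not deep RL and not OpenAI's method: the learner never sees the hidden payout probabilities (`true_payout_probs`, hypothetical values), only the rewards of the actions it tries.

```python
import random


def epsilon_greedy(true_payout_probs, steps=5000, epsilon=0.1, seed=0):
    """Learn which action pays off best purely by trial and error.

    The learner only observes 0/1 rewards from the actions it tries and keeps a
    running average reward per action.
    """
    rng = random.Random(seed)
    n_actions = len(true_payout_probs)
    counts = [0] * n_actions
    estimates = [0.0] * n_actions
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.randrange(n_actions)  # explore: try a random action
        else:
            action = max(range(n_actions), key=lambda a: estimates[a])  # exploit best guess
        reward = 1.0 if rng.random() < true_payout_probs[action] else 0.0
        counts[action] += 1
        # Incremental mean update: estimate += (reward - estimate) / n
        estimates[action] += (reward - estimates[action]) / counts[action]
    return estimates, counts


estimates, counts = epsilon_greedy([0.2, 0.5, 0.8])
best = max(range(len(estimates)), key=lambda a: estimates[a])
print("best action:", best, "estimates:", [round(e, 2) for e in estimates])
```

Even in this tiny problem the learner needs thousands of trials to settle on the best action, a small-scale echo of the drawback of trial-and-error learning.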
Harry Glaser, Sisense – Sources of bias: strategies for tackling inherent bias in AI
AI judge developed by UCL computer scientists
AI could identify gang crimes and ignite an ethical firestorm
False positives are a societal issue here: a machine treats a specific geography as “gang-related”, which results in severe punishments.
Another example is using facial recognition during TSA security checks, which was an issue for Asian travelers. You have to build a team that represents the wider world you plan to apply your model to; the broader and more diverse the team you can get, the better the outcomes of the models.
AI unchallenged runs a strong risk of delivering immoral outcomes. It is your job and responsibility as a data professional to use your skills to be the moral compass of your organization and make it right.
Turning data teams into superheroes – Sisense – think about human outcomes
Google employees quit over controversial Pentagon work (targeting ads vs. targeting drone strikes); the company got to the right outcome because the engineers who developed the system took ownership and knew the difference between the outcomes.
How can you incorporate data ethically?
Holistic metrics – grade yourself on metrics you think match well with societal outcomes
Representative & diverse teams – key to building models with a positive outcome on society
Check your sources – get diverse sources and consider the bias in your data
Who do you report to?
Data professionals reporting directly to sales lean towards the biases of sales objectives. Centralized data teams are more likely to remove bias.
You need a Chief Data Officer who thinks holistically about the ethical use of data; the CDO is the conscience of the organization.
You are the conscience of AI – this is your responsibility
Building AI Products: delivery vs. discovery
Companies face challenges getting data science to work for them:
Information technology – integrate, deploy and manage finished products in production
Software engineering – design and code new products using best practices
Data engineering – build data pipelines that collect, organize and validate data
Data science – discover the unknown patterns in data and algorithms that add business value
Data science is different. It is cross-functional (engineering, product, marketing, finance) and must work autonomously: separate from the traditional engineering product lifecycle, self-organizing and self-managing. It's also experimental – form a hypothesis, analyze data, make predictions, run backtests and A/B tests. And it's self-sustaining – not a cost center, but a generator of revenue.
The problem is that companies hoard data, and data stores are a cost sink.
Rapid prototyping is key for data science: back-of-the-envelope calculations and simple experiments. Don't make plans – make tests. Repeat until it works.
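The A/B testing step of that experimental loop often comes down to comparing two conversion rates. A minimal sketch, assuming a binary outcome and samples large enough for the normal approximation, is the pooled two-proportion z-test below; the conversion counts are hypothetical.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Compare conversion rates of control (A) and treatment (B).

    Returns (lift, z, two_sided_p) using the pooled-variance normal approximation.
    """
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # erfc(|z| / sqrt(2)) equals 2 * (1 - Phi(|z|)), the two-sided p-value.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return p_b - p_a, z, p_value


# Hypothetical experiment: 10,000 users per arm.
lift, z, p = two_proportion_z_test(conv_a=500, n_a=10_000, conv_b=580, n_b=10_000)
print(f"lift={lift:.3%} z={z:.2f} p={p:.4f}")
```

With 500 vs. 580 conversions out of 10,000 users per arm, z is about 2.5 and the two-sided p-value is about 0.01, so the lift would usually be called significant at the 5% level.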
Kirk Borne – Adapting Machine Learning Algorithms to Novel Use Cases
It’s not about telling the business about the coolness of your algorithm – it’s about connecting it to their needs and using storytelling to do this
Innovations are inspired by data, informed by data, enabled by data, and create value from data
Confucius says “study your past to know your future” – machine learning
Travel sites raise prices for Mac users because they assume those users earn more and would be willing to pay more.
The most important thing in your data is metadata.
Hilary Mason – Getting specific about algorithm bias
Facial recognition products do a poor job with darker skin tones: 99.7% accuracy for white males vs. 65.3% for darker-skinned females.
Bias enters at different stages, and machine learning can amplify it.
People are more likely to assume algorithms are objective or error-free, even if they're given the option of a human override.
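Gaps like these are easy to miss when a model is evaluated with a single overall accuracy. A minimal sketch of a disaggregated audit, assuming each test record carries a group label, is to compute accuracy per group and report the largest gap; the groups and predictions below are a hypothetical toy set, not data from the talk.

```python
from collections import defaultdict


def accuracy_by_group(records):
    """Compute accuracy separately for each demographic group.

    records: iterable of (group, predicted_label, true_label) tuples.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for group, predicted, actual in records:
        total[group] += 1
        correct[group] += int(predicted == actual)
    return {g: correct[g] / total[g] for g in total}


# Hypothetical toy predictions; a real audit needs a labelled, demographically balanced test set.
records = [
    ("group_a", "f", "f"), ("group_a", "m", "m"), ("group_a", "m", "m"), ("group_a", "f", "f"),
    ("group_b", "m", "f"), ("group_b", "f", "f"), ("group_b", "m", "f"), ("group_b", "m", "m"),
]
per_group = accuracy_by_group(records)
overall = sum(p == a for _, p, a in records) / len(records)
gap = max(per_group.values()) - min(per_group.values())
print(per_group, f"overall={overall:.2f}", f"gap={gap:.2f}")
```

Here the overall accuracy of 0.75 hides a perfect score for one group and a coin-flip score for the other.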
Algorithms are more likely to be implemented with no appeals process in place
Algorithms are often used at scale
The privileged are processed by people; the poor are processed by algorithms (Cathy O’Neil)