PAPIs 2018 has ended


Monday, October 15
 

9:00am

Deep Learning Kickstart with Keras — Training Workshop
This training workshop will take place before the main conference. It will be given in a classroom of at most 20 people, to maximize interaction and so you can ask even more questions than in a conference setting.

IMPORTANT:
  • A specific ticket is required to get access to the workshop ("Training (10/15) + Whole Conference (10/16-17)").
  • The venue is different from that of the main conference. The workshop will be held at Nanigans — many thanks to them for providing the space!
  • Students should bring their own laptops, for practical work. They will be given access to GPU-equipped machines in the cloud, for hands-on experiments with deep learning.


LEARNING OBJECTIVES
  • Understand the possibilities and limitations of Deep Learning
  • Understand how single and multi-layered Neural Networks are trained on data
  • Create, evaluate and optimize Neural Networks with Keras
  • Tackle image recognition tasks with Convolutional Neural Networks
  • Leverage Transfer Learning to speed up training and increase accuracy


PROGRAM
  • Introduction to Machine/Deep Learning and its possibilities:
    • Fundamental concepts
    • Formalizing supervised learning problems: classification and regression
    • Example use cases
    • Review of Python basics; usage of Jupyter notebooks
  • Linear and logistic regression:
    • Performance metrics: MSE (regression), accuracy and log-loss (classification)
    • Creating a single-layer network with Keras: defining input and output layers, optimizer, compilation, training
    • Logistic and softmax functions for classification
    • Data preparation
  • Multi-layered neural networks:
    • Structure of fully-connected, multi-layered networks
    • Activation functions
    • Adding layers in Keras
    • Exporting trained networks/models for deployment
  • Evaluating, optimizing and comparing models:
    • Evaluation procedure
    • Plotting and interpreting learning curves
    • Detecting overfitting
    • Reducing training time via efficient GPU utilization
    • Application to structured datasets
  • Convolutional Neural Networks and their application to image recognition:
    • Convolution layers, pooling layers, and “dropout” regularization
    • Application to MNIST (handwritten digit recognition)
  • Introduction to Transfer Learning:
    • Reusing trained deep nets to extract high-level features and tackle new problems efficiently
    • Application to an image classification challenge on Kaggle
  • Going further with Deep Learning:
    • Recap
    • Limitations of Deep Learning
    • Practical tips for using Deep Learning in your applications
    • Other types of Neural Networks
    • Resources
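The performance metrics and classification functions named in the program (MSE, accuracy, log-loss, logistic, softmax) can be sketched in plain Python. This is only an illustrative sketch, not workshop material: the function names are assumptions, and in practice Keras provides its own implementations.

```python
import math


def mse(y_true, y_pred):
    """Mean squared error, the usual regression metric."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def accuracy(y_true, y_pred):
    """Fraction of predicted labels that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def log_loss(y_true, p_pred, eps=1e-15):
    """Binary log-loss; p_pred holds predicted probabilities of class 1."""
    total = 0.0
    for t, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total -= t * math.log(p) + (1 - t) * math.log(1 - p)
    return total / len(y_true)


def logistic(z):
    """Logistic (sigmoid) function, mapping a score to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def softmax(scores):
    """Softmax over a list of scores; subtracting the max keeps exp() stable."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]
```

Log-loss penalizes confident wrong probabilities much more heavily than accuracy does, which is one reason it is used as a training objective for classification.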


STUDENT REQUIREMENTS
  • Programming experience and basic knowledge of Python syntax. Code will be provided so students can replicate what is shown during hands-on demos. Please consult Codecademy's Learn Python and Robert Johansson's Introduction to Python programming (in particular the following sections: Python program files, Modules, Assignment, Fundamental types, Control Flow and Functions) to learn or review Python's basics.
  • Basic maths knowledge (undergraduate level) will be useful to better understand some of the theory behind learning algorithms, but it isn’t a hard requirement.
  • Own laptop to bring for hands-on practical work.

Speakers

Louis Dorard

General Chair, PAPIs



Monday October 15, 2018 9:00am - 5:30pm
Nanigans 100 Summer Street, 31st Floor, Boston MA, 02110
 
Tuesday, October 16
 

8:15am

Welcome & Registration
Doors open at 8:15am. We strongly recommend arriving early so that you have time to go through security, get your PAPIs badge, choose your favorite t-shirt (we have some really cool and exclusive designs, but limited sizes), meet fellow attendees and exhibitors, and find a good seat in the auditorium. We’ll also have breakfast snacks, fruit, tea, coffee and refreshments!

Tuesday October 16, 2018 8:15am - 9:00am
Expo area

9:00am

Opening remarks

Tuesday October 16, 2018 9:00am - 9:05am
Horace Mann

9:05am

What computers can teach us about humans: Machine Learning in marketing
Speakers

Melinda Han Williams

VP of Data Science and Analytics, Dstillery
Melinda Han Williams is the VP of Data Science and Analytics at Dstillery. Before joining the ad tech industry, Melinda worked as a physicist developing third generation photovoltaics and studying electronic transport in nanostructured graphene devices. Her peer-reviewed journal publications...


Tuesday October 16, 2018 9:05am - 9:25am
Horace Mann

9:30am

Keep It Simple, Stupid: Driving Model Adoption Through Tiers
Tech-savvy companies like Google, Amazon, and Wayfair have bought into the idea that Machine Learning can drive their product strategy and pricing. However, many data scientists work in companies with less technological expertise and struggle to get approval for the resources to build and implement ML projects. This session covers a tiered approach to model introduction and implementation that focuses on building stakeholder buy-in without abandoning advanced techniques, and a case study to illustrate this approach in practice.

Speakers

Jamie Warner

Director of Advanced Analytics and Data Science, Lincoln Financial
Jamie Warner is a data scientist at Lincoln Financial, where she seeks to revolutionize the way heavily regulated industries understand and adopt data science. She’s led the creation of large-scale pricing and reserving models for insurers, and teaches courses to empower stakeholders...


Tuesday October 16, 2018 9:30am - 9:50am
Horace Mann

10:00am

Open AI for Advertisers: Discover Your Audience
Advertisers are looking for ways to economically use third-party audience segments for customer acquisition rather than audience buying based on intuition. To date, this has not been effective because of the incremental cost of buying audience segments. This talk discusses the Open AI for Advertisers algorithm, which is a scalable, ROI-positive third-party audience discovery algorithm that improves customer acquisition effectiveness by more than 30%. The talk provides an insight into our experiences when designing this algorithm, and how AI helped us solve a classic advertising problem.

Speakers

Saket Mengle

Senior Principal Data Scientist, dataxu
Saket holds a Ph.D. in text mining from Illinois Institute of Technology, Chicago. He has worked in a variety of fields including text classification, information retrieval, large-scale machine learning and linear optimization. He currently works as Senior Principal Data Scientist...


Tuesday October 16, 2018 10:00am - 10:10am
Horace Mann

10:10am

Style Driven Recommendations @ Wayfair
In this talk, I will explore how we can leverage customer behavior, product imagery, and product attributes to learn a metric space for style.

Speakers

Vinny DeGenova

Data Science Manager, Wayfair
I'm a Data Science Manager at Wayfair leading our Product Recommendations team. Our job is to leverage customer behavior and product information to create a curated, personalized experience for each user on site.


Tuesday October 16, 2018 10:10am - 10:20am
Horace Mann

10:30am

Expo - Networking - Coffee
Meet our exhibitors and connect with fellow attendees over coffee, tea and refreshments.

Tuesday October 16, 2018 10:30am - 11:00am
Expo area

11:00am

AI for Software Testing with Deep Learning: Is it possible?
Using AI for testing software is an emerging field in software engineering. How to do it effectively is still a big open problem. In this presentation, we will describe how Convolutional Neural Networks (CNN) and Deep Reinforcement Learning can be used for this new endeavor by outlining the challenges, mistakes, and workarounds that we faced on the way to successfully using AI models to build systems that can really learn from the software they are testing. We will discuss some lessons learned when using pre-trained CNN models, Image Detection APIs and CNNs built from scratch for this purpose.

Speakers

Emerson Bertolo

Data Scientist, Stefanini
Data Scientist at Stefanini Rafael Security & Defense Company, with in-depth expertise in data modeling and data extraction in a variety of business scenarios and constraints. For the last 2 years, I have dived deep into Machine Learning and Deep Learning by building AI models using a...


Tuesday October 16, 2018 11:00am - 11:20am
Horace Mann

11:00am

Production Machine Learning with a Domain Specific Language
Many currently available machine learning (ML) platforms focus on algorithms, but gloss over many of the other difficult parts of operating a scalable, production-quality ML training and prediction system. We describe a machine learning platform that focuses on abstracting away the most difficult parts of operationalizing an ML system, including flexible yet performant feature extraction via a custom-designed domain-specific language (DSL), a low-latency model prediction service using ensembles of models in a single prediction, and a model management system for tracking versions of a model over time.

Speakers

Zachary Kozick

Sr Software Engineer, ML Platform, Nanigans
Zach Kozick is a Sr Software Engineer who has worked at Nanigans for over 5 years. He's made significant contributions to Nanigans' data ingestion and processing infrastructure, and has led development of their in-house machine learning framework NanML, the subject of his talk...


Tuesday October 16, 2018 11:00am - 11:20am
Deborah Sampson

11:30am

Genetic Programming in the Real World: A Short Overview
ML is now a commercial and industrial technology, and while many successful algorithms exist, there are still areas that require new developments. For instance, one issue with many ML methods is that they generate black-box models. Genetic Programming (GP) generates symbolic models and expressions that can be used in many different domains. However, GP is not widely used by ML practitioners; it is still mostly an academic tool, but this is changing. This talk will present a short overview of how GP can be used to solve ML tasks, intended as a starting point for applied researchers and developers.

Speakers

Leonardo Trujillo

Research Professor, Instituto Tecnologico de Tijuana
Received a degree in Electronic Engineering (2002) and a Masters in Computer Science (2004) from the Instituto Tecnológico de Tijuana, Mexico. He also received a doctorate in Computer Science from CICESE research center, in Ensenada, Mexico (2008), developing Genetic Programming...


Tuesday October 16, 2018 11:30am - 11:50am
Horace Mann

11:30am

DevOps for AI Applications
With the booming adoption of AI applications, there is a need to better integrate the process of creating, updating and maintaining ML models into a standard Continuous Integration/Continuous Deployment (CI/CD) pipeline. A CI/CD pipeline in software development provides control over releasing the right version to the right environment, the ability to roll back in case of an error, and the ability to manage the process. In this talk, we will walk through the process of automating model operationalization and deployment across different environments, drawing on what we have learned from customers and internal products.

Speakers

Richin Jain

Software Engineer, Microsoft
Richin is a Software Engineer in the Cloud AI team at Microsoft. He focuses on building AI and ML solutions to solve real business problems for enterprise customers in multiple domains. Prior to Microsoft, he worked on data analytics and identity management at Nokia/HERE Technolo...


Tuesday October 16, 2018 11:30am - 11:50am
Deborah Sampson

12:00pm

Machine Learning Interpretability in the GDPR Era
Despite breakthroughs in statistical performance, the widespread adoption of algorithmic decision making has also led to a rise in “black box” machine learning with unintended negative consequences. In response, attention towards methods that improve the interpretability and understanding of machine learning has also increased. These methods are useful not only for explaining how decisions are made, but also for improving models and ultimately gaining trust in adopting machine learning systems. While using interpretable machine learning methods has innate advantages, the explicit requirement of interpretability has only recently been formalized as part of the General Data Protection Regulation, implemented by the EU as of 25 May 2018. This regulation, which protects the privacy and usage of data of EU citizens, specifically outlines a “right to explanation” in regards to algorithmic decision making. This talk explores the definition of interpretability in machine learning, the trade-offs with complexity and performance, and surveys the major methods used to interpret and explain machine learning models in the context of GDPR compliance.

Speakers

Gregory Antell

Product Manager & Machine Learning Scientist, BigML


Tuesday October 16, 2018 12:00pm - 12:20pm
Horace Mann

12:00pm

Designing Services for Recommending Jobs to College Students
At WayUp, the leading platform for connecting college students and young professionals to internships, part-time jobs, and entry-level roles, our technology recommends job listings and other content to users immediately after they join the site and create a profile. In this talk, I discuss how constraints from this business model and site design are reflected in our technical design for job and content recommendation systems. I'll cover separation of concerns, API design, data flow, batch and real-time processes, DevOps, metrics, recommender algorithms, and the impact on user engagement.

Speakers

Harlan D. Harris

Director of Data Science, WayUp
Harlan Harris has a PhD in Computer Science/Machine Learning from the University of Illinois, and worked as a Cognitive Psychology researcher before turning to industry. He is currently Director of Data Science at WayUp, has worked at Kaplan Test Prep, the Advisory Board Company...


Tuesday October 16, 2018 12:00pm - 12:20pm
Deborah Sampson

12:30pm

Expo - Networking - Lunch
Lunch won't be provided. You can grab something to eat with fellow attendees at the Microsoft cafeteria, next door at the MIT Sloan School cafeteria, or near Kendall Square.

Tuesday October 16, 2018 12:30pm - 2:00pm
Expo area

2:00pm

Facial Recognition Adversarial Attacks, Policy and Choice
What are the policy and societal implications of the unprecedented capability for automated, real-time identification and tracking of individuals? What tools exist or could exist for registering and enforcing user choice? What should public policy be around government and private use of biometric data? We demonstrate the technical feasibility of facial recognition adversarial attacks, describe our FOIA request about federal use of facial recognition at airports and borders, and invite community discussion and technical contributions to our open-sourced prototype.

Speakers

Gretchen Greene

Greene Strategy and Analytics/ MIT Media Lab
Gretchen Greene, founder and CEO of Greene Strategy and Analytics, is a computer vision scientist, machine learning engineer and lawyer advising governments and private companies on AI use, strategy and policy. Greene has been interviewed by Forbes China, the Economist and the BBC...


Tuesday October 16, 2018 2:00pm - 2:20pm
Horace Mann

2:00pm

How to Go From Data Science to Data Operations
According to Google developers, "Only a small fraction of real-world ML systems are composed of the ML code. The required surrounding infrastructure is vast and complex." By focusing on DataOps, your teams will be able to deliver faster, with higher quality, using the tools that they love. The topics covered will include:
  • Data science challenges and DataOps definitions.
  • The four As of DataOps
    • Automate and monitor pipelines
    • Automate deployments
    • Automate and monitor quality
    • Automate sandboxes

Speakers

Gil Benghiat

Founder, VP Products, DataKitchen
Gil Benghiat is a co-founder of DataKitchen, a company on a mission to enable analytic teams to deliver value quickly and with high quality using the tools they love. Gil's career has always been data-oriented, starting with network data at AT&T Bell Laboratories, any data at Sybase...


Tuesday October 16, 2018 2:00pm - 2:20pm
Deborah Sampson

2:30pm

Developing and Deploying ML Algorithms in a Clinical Setting
The development and deployment of machine learning models is fraught with complexity exceeding that which is typically found in traditional software. Operating in a clinical environment introduces further difficulties that must be resolved to produce a successful product. In this talk, we will discuss the challenges of applying machine learning to medical imaging and highlight potential solutions to these problems.

Speakers

Neil Tenenholtz

Director of Machine Learning, MGH & BWH Center for Clinical Data Science
Neil Tenenholtz is the Director of Machine Learning at the MGH & BWH Center for Clinical Data Science, where his responsibilities include the training of novel deep learning models for clinical diagnosis and the development of robust infrastructure for their deployment in the clinical...


Tuesday October 16, 2018 2:30pm - 2:50pm
Horace Mann

2:30pm

A Config-based Framework for Productionized Machine Learning
For a data scientist, building correct models quickly and moving them to production safely can be challenging. Especially with messy data, catching and debugging data quality issues in model-building is also challenging. This talk will discuss the "Clover transform framework", a config-based production machine learning framework to allow easy and fast generation of features and models, for fast iteration speeds, easy parameter tuning, algorithm choice, monitoring, and auditability, using the same config for both development iteration and running in production.

Speakers

Melanie Goetz

Head of Machine Learning, Clover Health
Melanie runs the Machine Learning team at Clover Health, a health insurance startup in San Francisco that uses machine learning to keep members healthy and out of the hospital. She studied Linguistics and Math/CS at MIT and ML at UPenn. Previously, she worked in Japan on machine translation...


Tuesday October 16, 2018 2:30pm - 2:50pm
Deborah Sampson

3:00pm

The Right Amount of Trust for AI
The key to building systems that are integrated into people’s lives is trust. If you don’t have the right amount of trust, you open the system up to disuse and misuse. We will discuss the building blocks of AI from a product/design perspective, what trust is, how trust is gained, and maybe more importantly lost, and techniques you can use day-to-day to build trusted AI products. We will reference real examples from academia, industry, and my work at Philosophie.

Find the presentation here:

https://goo.gl/B2TQt6

Speakers

Chris Butler

Chief Product Architect, IPsoft
Chris Butler is IPsoft's Chief Product Architect. Chris has over 19 years of product and business development experience at companies like Microsoft, KAYAK, and Waze. He was first introduced to AI through graph theory and genetic algorithms during his Computer Systems Engineering...


Tuesday October 16, 2018 3:00pm - 3:20pm
Horace Mann

3:00pm

Unpredictable predictions of self-driving cars AI
No matter how well your Machine Learning model is trained, its inference output space leaves plenty of room for irrelevant and unexpected results when the real world presents the model with an unforeseen challenge. Such erroneous inferences can lead to accidents, and there are notorious cases we all know.
The solution is robust monitoring for edge cases, together with introducing Active Learning into a business's AI/ML operations so that those cases are handled and learned from.
The talk will be dedicated to practical solutions and their implementation in business operations.

Speakers

Iskandar Sitdikov

ML/Software engineer, Hydrosphere.io
Iskandar Sitdikov is an ML engineer at Hydrosphere.io with a rich practical background in both the Machine Learning and Big Data fields. His latest work is in researching and prototyping methods for detecting data anomalies and concept drift in production ML.


Tuesday October 16, 2018 3:00pm - 3:20pm
Deborah Sampson

3:30pm

Expo - Networking - Coffee
Meet our exhibitors and connect with fellow attendees over coffee, tea and refreshments.

Tuesday October 16, 2018 3:30pm - 4:00pm
Expo area

4:00pm

The secret life of predictive models
In the context of building predictive models, predictability is usually considered a blessing. After all, that is the goal: build the model that has the highest predictive performance. The rise of ‘big data’ has in fact vastly improved our ability to predict human behavior thanks to the introduction of much more informative features. However, in practice things are more differentiated than that. For many applications, the relevant outcome is observed for very different reasons. In such mixed scenarios, the model will automatically gravitate to the one that is easiest to predict at the expense of the others. This holds even if the predictable scenario is by far the less common or relevant one. We present a number of applications where this happens: clicks on ads being performed ‘intentionally’ vs. ‘accidentally’, consumers visiting store locations vs. their phones pretending to be there, and finally customers filling out online forms vs. bots defrauding the advertising industry. In conclusion, the combination of different and highly informative features can have a significantly negative impact on the usefulness of predictive modeling and potentially create second-order biases in the predictions.

Speakers

Claudia Perlich

SVP & Senior Data Scientist, Two Sigma
Claudia Perlich is a Senior Data Scientist at Two Sigma in New York City. Prior to her role at Two Sigma, she was the Chief Scientist at Dstillery where she designed, developed, analyzed, and optimized machine learning that drives digital advertising. She started her career in Data...


Tuesday October 16, 2018 4:00pm - 4:40pm
Horace Mann

4:50pm

Startup Pitches
Find out about the most promising startups in AI and ML: PAPIs hosts the world's 1st startup competition (powered by PreSeries), where participants are judged by an AI on stage—not a human jury!

  • Acciyo - Anum Hussain (anum@acciyo.com) & Vivian Diep (vivian@acciyo.com)
  • WorkAround - Jennie Kelly (jkelly@workaround.online)


Tuesday October 16, 2018 4:50pm - 5:00pm
Horace Mann

5:00pm

AI Startup Battle
Find out about the most promising startups in AI and ML: PAPIs hosts the world's 1st startup competition (powered by PreSeries), where participants are judged by an AI on stage—not a human jury!

Tuesday October 16, 2018 5:00pm - 5:30pm
Horace Mann

5:30pm

Drinks!
Stay around in the evening — drinks are on us!

Tuesday October 16, 2018 5:30pm - 7:30pm
Expo area
 
Wednesday, October 17
 

8:15am

Welcome & Registration
Doors open at 8:15am. We strongly recommend arriving early so that you have time to go through security, get your PAPIs badge, choose your favorite t-shirt (we have some really cool and exclusive designs, but limited sizes), meet fellow attendees and exhibitors, and find a good seat in the auditorium. We’ll also have breakfast snacks, fruit, tea, coffee and refreshments!

Wednesday October 17, 2018 8:15am - 9:00am
Expo area

9:00am

Designing automated pipelines for unseen custom data
Machine Learning applications at Salesforce use a wide variety of customer data that is highly customizable. In this talk, I discuss some challenges of designing automated machine learning pipelines that can deal with custom user data that they have never seen before, as well as some of our solutions. Examples include statistical tests between training and scoring data sets to help with the cold start problem, algorithms to throw out features that are "too good" because they are derived from the label we're trying to predict, and data-dependent feature engineering steps like automatically determining buckets for numeric variables and detecting categorical variables encoded as other data types.
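Two of the ideas in this abstract, discarding features that are "too good" and automatically choosing buckets for numeric variables, can be illustrated with a minimal Python sketch. It is not the speaker's method: the correlation-based leakage check, the 0.95 threshold, and the function names are all assumptions made for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def suspicious_features(columns, label, threshold=0.95):
    """Names of features whose |correlation| with the label is suspiciously
    high, hinting that they may be derived from the label (assumed heuristic)."""
    return sorted(
        name for name, values in columns.items()
        if abs(pearson(values, label)) >= threshold
    )


def quantile_edges(values, n_buckets):
    """Interior bucket edges that split the values into roughly equal-sized buckets."""
    ordered = sorted(values)
    return [ordered[(k * len(ordered)) // n_buckets] for k in range(1, n_buckets)]
```

For example, a column that simply rescales the label is flagged, while an unrelated numeric column is kept and can then be bucketed with `quantile_edges`.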

Speakers

Kevin Moore

Sr. Data Scientist, Salesforce
Kevin is a senior data scientist at Salesforce where he works on automated machine learning pipelines to generate and deploy customized models for a wide variety of customers and use cases. He has a PhD in astrophysics and prior to becoming a data scientist he worked on modeling how...


Wednesday October 17, 2018 9:00am - 9:20am
Horace Mann

9:30am

Migrating ML from research to production
As the field of machine learning matures, the industry is looking to adopt it in production software. This poses several challenges for a research-focused field in which little expertise is available. In this talk, we'll share our experience bringing research to production settings, including our Autonomous Vehicle efforts. We'll present the major issues faced by developers and the main techniques that helped establish stable production for research.

Speakers

Conrado Silva Miranda

Eng. Manager, NVIDIA
Conrado is an Eng. Manager for Deep Learning Engineering at NVIDIA, where he focuses on creating a platform to transfer research to production. Before NVIDIA, he was the tech lead and architect for DeepBird v2, Twitter's deep learning platform. Before making a permanent jump to industry...


Wednesday October 17, 2018 9:30am - 9:50am
Horace Mann

10:00am

Lightning Talks
Surface and Interface — Alan Laidlaw, Home Depot
When the USAF discovered that their planes were crashing because the cockpits weren't shaped for pilots, they went back to the data. They found that the initial specs for pilot dimensions didn't fit a single pilot. The data wasn't wrong, there just wasn't an average pilot. So they told the plane manufacturers to fix it.
The manufacturers came back with adjustable seats. They turned what was previously a surface into an interface.
In order to apply ML, data science and UX need to overlap. This talk explores how to do that in the enterprise.

Gallery Wall Synthesis and Visualization in AR — Tomer Weiss, Wayfair
We present a design tool which allows users to create a gallery wall design, and visualize it in AR. To use our tool, a user selects the wall area to decorate, and an art piece. Then, our system automatically generates a recommendation based on art sampled from a database. The system takes into account common design criteria. With a mobile phone, the user can instantly visualize the resulting layout placed on a wall, browse wall art pieces, and generate new layout suggestions. We demonstrate how our tool can be applied for creating different wall art layouts for different decoration scenarios.

A fashion-aware fashion recommender — Amy Winecoff, Data Scientist at True Fit
This presentation introduces a style recommender for clothing and footwear. Fashion recommendation problems are characterized by sparse user/product interactions, large catalogs of styles, and many new users and products, i.e., the cold start problem. We present a hybrid recommender system that leverages a collaborative filter to decrease data sparsity prior to training a supervised learning algorithm based on user and product features. This addresses data sparsity issues and produces recommendations that are personal and do not require previous user product interactions.

State of AI in Africa — Matt Grasser, Director of Inclusive FinTech at Bankable Frontier Associates
Earlier this year, BFA launched the first-ever report on artificial intelligence as it relates to financial services in Africa and the low-income customer. For financial services providers in Africa, the real value of AI and its forms, such as machine learning, lies in practical applications that can reduce the cost to serve and acquire customers -- creating more viable business models for broader segments of the market. Several financial services providers are already using AI to eliminate business inefficiencies, manage business and customer risk and create more seamless customer experiences. More and more providers, not just the large players, have the opportunity to do the same.

Wednesday October 17, 2018 10:00am - 10:30am
Horace Mann

10:30am

Expo - Networking - Coffee
Meet our exhibitors and connect with fellow attendees over coffee, tea and refreshments.

Wednesday October 17, 2018 10:30am - 11:00am
Expo area

11:00am

Putting five ML models to production in five minutes
They say that model deployment is the most challenging step in a data science workflow.

The aim of this presentation is to prove them wrong. We will be training five ML models using five different ML frameworks (R, Scikit-Learn, Apache Spark, H2O.ai and XGBoost) and making a unified production-grade cloud deployment - everything in under five minutes!

The workflow is based on free and open-source software. Attendees will be encouraged to replicate the workflow and join the model deployment-fest while the clock is ticking.

The presentation's homepage (README and all scripts) is located at https://github.com/openscoring/papis.io

Speakers

Villu Ruusmann

CTO, Openscoring OÜ
Founder and CTO of Openscoring.IO


Wednesday October 17, 2018 11:00am - 11:10am
Deborah Sampson

11:00am

Monitoring AI with AI
Environment misconfiguration or upstream data pipeline inconsistency can silently kill model performance.
Common production incidents include:
- Data drifts, new data, wrong features
- Vulnerability issues, adversarial attacks
- Concept drifts, new concepts, expected model degradation
- Dramatic unexpected drifts
- Biased Training set / training issue
- Performance issue
In this talk we'll discuss a solution, tooling and architecture that allow machine learning engineers to be involved in the delivery phase and take ownership of the deployment and monitoring of machine learning pipelines.
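As an illustration of the first incident type listed above (data drift), here is a minimal Python sketch that compares a feature's live values against a training-time reference using the two-sample Kolmogorov-Smirnov statistic. It is not the tooling presented in the talk; the function names and the 0.3 alert threshold are assumptions.

```python
import bisect


def ks_statistic(reference, live):
    """Two-sample Kolmogorov-Smirnov statistic: the largest gap between the
    empirical CDFs of the two samples (0.0 = identical, 1.0 = disjoint)."""
    ref, cur = sorted(reference), sorted(live)
    gap = 0.0
    for v in ref + cur:
        f_ref = bisect.bisect_right(ref, v) / len(ref)
        f_cur = bisect.bisect_right(cur, v) / len(cur)
        gap = max(gap, abs(f_ref - f_cur))
    return gap


def has_drifted(reference, live, threshold=0.3):
    """Flag drift when the KS statistic exceeds an assumed alert threshold."""
    return ks_statistic(reference, live) > threshold
```

A monitoring job could run `has_drifted` per feature on a sliding window of production inputs and raise an alert before model metrics visibly degrade.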

Speakers

Iskandar Sitdikov

ML/Software engineer, Hydrosphere.io
Iskandar Sitdikov is an ML engineer at Hydrosphere.io with a rich practical background in both the Machine Learning and Big Data fields. His latest work is in researching and prototyping methods for detecting data anomalies and concept drift in production ML.


Wednesday October 17, 2018 11:00am - 11:20am
Horace Mann

11:10am

Crowdsourced Human Intelligence in AI
WorkAround is an online platform for Human Intelligence Tasks similar to Amazon's Mechanical Turk, but providing these jobs to refugees and displaced people. The demo will include a tour of the platform and the services offered, but will also attempt to draw attention to the limitations and possibilities of crowdsourced, human-tagged data and to why it is important to consider the social impact that outsourcing this kind of work can have.

Speakers

Jennie Kelly

COO, WorkAround
Jennie Kelly is the Director of Operations at WorkAround, a crowdsourcing platform for gathering, tagging, and scrubbing data for machine learning and AI training needs. She has worked on four different continents in 6 different industries and specializes in helping non-dataphiles...


Wednesday October 17, 2018 11:10am - 11:20am
Deborah Sampson

11:20am

Beyond the Model: Operationalizing 4,586 Bigfoot Sightings
Bigfoot has been a staple of American folklore since the 19th century. Many people are convinced that Bigfoot is real. Others suggest that he is a cultural phenomenon. Some just want to believe. There is even a group, the Bigfoot Field Researchers Organization, that tracks Bigfoot sightings. And they have thousands of reports available online that date back to the late 19th century.

The Internet, it seems, has everything.

So, I took this data, all 4,586 records of it, and used it to build a classifier. It was a good model with pleasing metrics. I liked it. But then what? For some folks, the model is where the work ends. But I'm a developer and that's only half the solution. I've got a model but how do I use it? How do I put it in an application so that a user can, well, use it?

I'm going to answer that question in this talk, and a bit more. I'll show you how I exposed my Bigfoot classifier to the Internet as a REST-based API written in Python. And we'll tour a couple of applications I wrote to use that API: a web-based application written in JavaScript and an iOS application written in Swift. For the model itself, I'll use DataRobot since it's quick and easy. And, I work there!

When we're done, you'll know how to incorporate a model into an API of your own and how to use that API from your application. And, since all my code is on GitHub, you'll have some examples you can use for your own projects. As a bonus, you'll have 4,586 Bigfoot sightings to play with. And who doesn't want that?


Speakers

Guy Royse

Developer Evangelist, DataRobot


Wednesday October 17, 2018 11:20am - 11:40am
Deborah Sampson

11:30am

Reasoning About Uncertainty at Scale
Freebird models US domestic flights in a way that captures uncertainty at every step. We present a case study of using Bayesian modelling and inference to directly model behavior of aircraft arrivals and departures, focusing on the uncertainty in those predictions. Along the way we will discuss theoretical considerations, highlighting what can go wrong, while emphasizing practical implications around scaling to large data sets.

Speakers

Max Livingston

Data Scientist, Freebird
Max Livingston is a data scientist at Freebird, where he uses Bayesian machine learning techniques to model flight disruptions and last-minute prices. He graduated from Wesleyan University with high honors in Economics and worked in the Research group of the New York Fed before making...


Wednesday October 17, 2018 11:30am - 11:50am
Horace Mann

11:45am

QuSandbox: A lifecycle based approach to model governance
We present QuSandbox, an end-to-end, workflow-based system to enable the creation and deployment of data science workflows within the enterprise. Our environment supports AWS and Google Cloud and incorporates model and data provenance throughout the model development lifecycle. As cloud computing and ML adoption become the norm in the enterprise, we envision AI and MLOps becoming important to ensure robust adoption and deployment.
In this talk we will illustrate a ten-step process to enable replicable research within the enterprise using QuSandbox.

Speakers

Sri Krishnamurthy

Chief Data Scientist, QuantUniversity
Sri Krishnamurthy is the founder of QuantUniversity.com, a data and quantitative analysis company, and the creator of the Analytics Certificate program (www.analyticscertificate.com). He has more than 15 years of experience in analytics, quantitative analysis, statistical modeling and...


Wednesday October 17, 2018 11:45am - 11:55am
Deborah Sampson

11:55am

Automated Data Preparation and Machine Learning
Automated Data Preparation and Machine Learning is a paradigm shift that accelerates data scientists’ productivity. Tasks like data pre-processing, cleansing, feature selection, model selection and hyper-parameter tuning can be automated to a certain extent with an Automated Machine Learning framework. The goal is to improve the productivity of data scientists and to enable business analysts or citizen data scientists to evaluate a complete business case and use predictive modeling to answer their business problems within a few hours.

Speakers

Lars Bauerle

Chief Product Officer, RapidMiner

Pavithra Rao

Pre-Sales Data Scientist, RapidMiner
Pavithra Rao, Pre-Sales Engineer at RapidMiner Inc. | Pavithra has a Master's degree, majoring in Data Science, from the UConn School of Business. Prior to this, she worked as a Solution Advisor at Deloitte and Touche for 2 years, helping clients in major industrial sectors identify and address...


Wednesday October 17, 2018 11:55am - 12:05pm
Deborah Sampson

12:00pm

Creating Robust Interpretable NLP Systems with Attention
To build robust NLP models that can reason well from language, architectures should function more like the human brain than like pure pattern recognizers. Attention is an interpretable type of neural network layer that is loosely based on attention in humans, and it has recently enabled a powerful alternative to RNNs. Attention-based models have produced new techniques and state-of-the-art performance on many language modeling tasks. This presentation introduces Attention layers and explains why and how they have been used to revolutionize NLP.
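
For readers new to the idea, the sketch below shows scaled dot-product attention for a single query in plain Python. It is one common formulation, chosen here for illustration; the abstract does not state which variant the talk covers, and the function names are assumptions. The returned weights are what makes the layer inspectable: they show how much each input position contributed to the output.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(query, keys, values):
    """Scaled dot-product attention for one query vector.

    Returns the weighted average of `values` and the attention weights.
    """
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    output = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return output, weights
```

Calling `attention([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], [[10.0], [20.0]])` gives the first key the larger weight, because it is the one aligned with the query.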

Speakers

Alexander Wolf

Data Scientist, Dataiku
Alex is a Data Scientist at Dataiku, working with clients around the world to organize their data infrastructures and deploy data-driven products into production. Prior to that, he worked on software and business development in the tech industry and studied Computer Science and Statistics...


Wednesday October 17, 2018 12:00pm - 12:20pm
Horace Mann

12:05pm

QuickCode: Label your training data fast and transparently
The uses for machine learning are growing at an unprecedented rate. Yet as machine learning applications advance, the lack of good, labeled training data is inhibiting their large-scale adoption. We teach how to generate labeled training data quickly and transparently using QuickCode, a system that recommends labels for text data. Each time the user provides more information, QuickCode rapidly iterates with the user and provides better recommendations. In tests against hand-labeled data, QuickCode matched the hand labels with 95% accuracy in a fraction of the time the hand-coding took.

Speakers

Patrick Lam

Lead Data Scientist, Thresher
Patrick Lam is Lead Data Scientist at Thresher and a Visiting Fellow at the Harvard Institute for Quantitative Social Science. He received his Ph.D. in Political Science and Masters in Statistics from Harvard University in 2013. He has worked on problems involving machine learning...


Wednesday October 17, 2018 12:05pm - 12:15pm
Deborah Sampson

12:30pm

Expo - Networking - Lunch
Lunch won't be provided. You can grab something to eat with fellow attendees at the Microsoft cafeteria, next door at the MIT Sloan School cafeteria, or near Kendall Square.

Wednesday October 17, 2018 12:30pm - 2:00pm
Expo area

2:00pm

Architectures for big scale 2D imagery
I will present research that I conducted during my Ph.D. at University College London and in collaboration with Google. My primary interest lies in the development of neural architectures for 2D imagery problems at large scale. I will present a recently published analysis of different upsampling methods in the decoder part of visual architectures, together with its ongoing extension to GANs, begun last week. I will also discuss the attention mechanism for text recognition and review the kinds of applications where it can be useful (for example, automatically updating Google Maps based on Google Street View imagery).

Speakers

Zbigniew Wojna

Founder, Tensorflight
Zbigniew Wojna is a deep learning researcher and founder of TensorFlight Inc., a company providing instant remote commercial property inspection (for risk factors for reinsurance enterprises) based on satellite and street-view imagery. Zbigniew is currently in the final stage of his...


Wednesday October 17, 2018 2:00pm - 2:20pm
Horace Mann

2:00pm

What's inside the box: comparing data across ML platforms
Protocol Buffer. HDF5. TFLite. Pickle. They all hold model weights and/or code, and in many cases they can and do hold the same ones. Understanding how they work and how to unpack them outside of our tools gives us insights into how the tools themselves work, and how we can pipe our models around, and make them more useful for our customers - often app developers!

Speakers

Ray Deck

Co-founder, Privilege Inc
Ray is CTO of Element55 in Cambridge, MA. He has been a quantitative analyst since 1995, a software developer since 1998, and a founder since 2003. He cannot write backwards on a whiteboard, disqualifying him as a true data scientist. Nevertheless, he has been learning and teaching...


Wednesday October 17, 2018 2:00pm - 2:30pm
Deborah Sampson

2:30pm

Would you have clicked on what we would have recommended?
In this talk, we describe recent work on the offline estimation of recommender system A/B tests using counterfactual reasoning techniques. We can determine whether our customers would have clicked on what we would have recommended by adding stochasticity to our recommendations. This ensures a non-zero probability of having shown our new recommendations at some point in the past, which we can leverage using a technique known as Pareto-smoothed importance sampling. This allows us to create a low-bias, low-variance estimator of how our recommender systems would have performed had they been deployed.
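
The idea can be illustrated with a small simulation. The sketch below uses plain importance sampling with clipped weights, which is a simpler stand-in for the Pareto-smoothed variant named in the abstract, not that technique itself. The item names, click rates and policy probabilities are all made up for this example.

```python
import random


def ips_estimate(logs, new_policy_prob, clip=10.0):
    """Estimate the click rate a new policy would have achieved.

    logs: list of (item, logging_prob, clicked) tuples, where logging_prob is
        the probability with which the deployed stochastic policy showed item.
    new_policy_prob: function mapping an item to the probability that the
        new policy would show it.
    clip: cap on importance weights (a crude substitute for Pareto smoothing).
    """
    total = 0.0
    for item, logging_prob, clicked in logs:
        weight = min(new_policy_prob(item) / logging_prob, clip)
        total += weight * clicked
    return total / len(logs)


# Hypothetical setup: two items with different true click rates.
rng = random.Random(0)
click_rate = {"A": 0.1, "B": 0.3}
logging_prob = {"A": 0.8, "B": 0.2}  # deployed policy, deliberately stochastic
new_prob = {"A": 0.1, "B": 0.9}      # candidate policy, never deployed

logs = []
for _ in range(50_000):
    item = "A" if rng.random() < logging_prob["A"] else "B"
    clicked = 1 if rng.random() < click_rate[item] else 0
    logs.append((item, logging_prob[item], clicked))

estimate = ips_estimate(logs, lambda item: new_prob[item])
# True value of the new policy: 0.1 * 0.1 + 0.9 * 0.3 = 0.28
```

Because the deployed policy showed item B with non-zero probability, the logged clicks on B can be reweighted to approximate what the new policy would have earned, even though it was never run.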

Speakers

Peter B. Golbus

Senior Data Scientist, Wayfair
Peter B. Golbus is a Senior Data Scientist at Wayfair. Peter joined Wayfair directly from his Ph.D. program at Northeastern University where he studied the offline evaluation of search engines with Javed A. Aslam. Four years later, he is still at Wayfair, and is now studying the offline...


Wednesday October 17, 2018 2:30pm - 2:50pm
Horace Mann

2:40pm

BigML demo
Speakers

Poul Petersen

CIO, BigML
Poul Petersen is the Chief Infrastructure Officer at BigML. He has an MS degree in Mathematics as well as BS degrees in Mathematics, Physics and Engineering Physics. With 20 plus years of experience building scalable and fault tolerant systems in data centers, Poul currently enjoys the benefits of programmatic infrastructure, hacking in...


Wednesday October 17, 2018 2:40pm - 2:50pm
Deborah Sampson

2:50pm

Unifying Microsoft's ML Ecosystems at Massive Scales with MMLSpark
In this talk we will explore Microsoft Machine Learning for Apache Spark (MMLSpark), an ecosystem of enhancements that expand the Apache Spark distributed computing library to tackle problems in Deep Learning, Micro-Service Orchestration, Gradient Boosting, Model Interpretability, and other areas of modern computation. We also present a novel system called Spark Serving that allows users to run any Apache Spark program as a distributed, sub-millisecond latency web service backed by their existing Spark Cluster. We apply this ecosystem to create deep object detectors capable of learning without human-labeled data and demonstrate its effectiveness for Snow Leopard conservation.

Speakers

Mark Hamilton

Microsoft


Wednesday October 17, 2018 2:50pm - 3:00pm
Deborah Sampson

3:00pm

Expo - Networking - Coffee
Meet our exhibitors and connect with fellow attendees over coffee, tea and refreshments.

Wednesday October 17, 2018 3:00pm - 3:30pm
Expo area

3:30pm

Unintended consequences of AI — panel discussion
Moderators
Speakers

Gretchen Greene

Greene Strategy and Analytics/ MIT Media Lab
Gretchen Greene, founder and CEO of Greene Strategy and Analytics, is a computer vision scientist, machine learning engineer and lawyer advising governments and private companies on AI use, strategy and policy. Greene has been interviewed by Forbes China, the Economist and the BBC...

Iskandar Sitdikov

ML/Software engineer, Hydrosphere.io
Iskandar Sitdikov is an ML engineer at Hydrosphere.io with a rich practical background in both Machine Learning and Big Data. His latest work is on researching and prototyping methods for detecting data anomalies and concept drift in ML production.

Jamie Warner

Director of Advanced Analytics and Data Science, Lincoln Financial
Jamie Warner is a data scientist at Lincoln Financial, where she seeks to revolutionize the way heavily regulated industries understand and adopt data science. She’s led the creation of large-scale pricing and reserving models for insurers, and teaches courses to empower stakeholders...


Wednesday October 17, 2018 3:30pm - 4:00pm
Horace Mann

4:00pm

Solving the Data Science Time Series Forecasting Challenge
Most machine learning algorithms today are neither time-aware nor easily applied to time series and forecasting problems. Leveraging advanced algorithms like XGBoost or linear models typically requires substantial data preparation and feature engineering.

This presentation will cover the best practices for solving this challenge by introducing a general framework for developing time series models, generating features and preprocessing the data, and exploring the potential to automate this process in order to apply advanced machine learning algorithms to almost any time series problem.
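
As an illustration of the kind of feature engineering involved, the sketch below turns a univariate series into a supervised learning table using lagged values. This is one common approach and an assumption on our part; the abstract does not describe the framework's actual features, and the function name and default lags are hypothetical.

```python
def make_lag_features(series, lags=(1, 2, 7)):
    """Build (features, target) rows from a list of numeric observations.

    Each row predicts series[t] from the values observed `lag` steps earlier.
    The first max(lags) observations are skipped because they lack history.
    """
    max_lag = max(lags)
    rows = []
    for t in range(max_lag, len(series)):
        features = {f"lag_{k}": series[t - k] for k in lags}
        rows.append((features, series[t]))
    return rows
```

Rows built this way can be fed to any tabular learner, which is how models that are not time-aware end up being applied to forecasting problems.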

Speakers

Michael Schmidt

Chief Data Scientist, DataRobot
Michael is chief data scientist at DataRobot where he works on algorithms and techniques to automate machine learning processes. His research has appeared in the New York Times, NPR's RadioLab, and Communications of the ACM. Michael also created the Eureqa project, a software program...


Wednesday October 17, 2018 4:00pm - 4:20pm
Horace Mann