PAPIs 2018 has ended

Tools
Wednesday, October 17

11:00am EDT

Putting five ML models to production in five minutes
They say that model deployment is the most challenging step in a data science workflow.

The aim of this presentation is to prove them wrong. We will train five ML models using five different ML frameworks (R, Scikit-Learn, Apache Spark, H2O.ai and XGBoost) and serve them from a single, unified, production-grade cloud deployment, all in under five minutes!

The workflow is based on free and open-source software. Attendees are encouraged to replicate it and join the model deployment-fest while the clock is ticking.

The presentation's homepage (README and all scripts) is at https://github.com/openscoring/papis.io

Villu Ruusmann

CTO, Openscoring OÜ
Founder and CTO of Openscoring.IO

Wednesday October 17, 2018 11:00am - 11:10am EDT
Deborah Sampson

11:10am EDT

Crowdsourced Human Intelligence in AI
WorkAround is an online platform for Human Intelligence Tasks, similar to Amazon's Mechanical Turk, that offers these jobs to refugees and displaced people. The demo will include a tour of the platform and the services it offers, and will also draw attention to the limitations and possibilities of crowdsourced, human-tagged data and to why it is important to consider the social impact that outsourcing this kind of work can have.

Jennie Kelly

COO, WorkAround
Jennie Kelly is the Director of Operations at WorkAround, a crowdsourcing platform for gathering, tagging, and scrubbing data for machine learning and AI training needs. She has worked on four different continents in six different industries and specializes in helping non-dataphiles...

Wednesday October 17, 2018 11:10am - 11:20am EDT
Deborah Sampson

11:20am EDT

Beyond the Model: Operationalizing 4,586 Bigfoot Sightings
Bigfoot has been a staple of American folklore since the 19th century. Many people are convinced that Bigfoot is real. Others suggest that he is a cultural phenomenon. Some just want to believe. There is even a group, the Bigfoot Field Researchers Organization, that tracks Bigfoot sightings. And they have thousands of reports available online that date back to the late 19th century.

The Internet, it seems, has everything.

So, I took this data, all 4,586 records of it, and used it to build a classifier. It was a good model with pleasing metrics. I liked it. But then what? For some folks, the model is where the work ends. But I'm a developer and that's only half the solution. I've got a model but how do I use it? How do I put it in an application so that a user can, well, use it?

I'm going to answer that question in this talk, and a bit more. I'll show you how I exposed my Bigfoot classifier to the Internet as a REST-based API written in Python. And we'll tour a couple of applications I wrote to use that API: a web-based application written in JavaScript and an iOS application written in Swift. For the model itself, I'll use DataRobot since it's quick and easy. And, I work there!

When we're done, you'll know how to incorporate a model into an API of your own and how to use that API from your application. And, since all my code is on GitHub, you'll have some examples you can use for your own projects. As a bonus, you'll have 4,586 Bigfoot sightings to play with. And who doesn't want that?
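
As a rough illustration of the "model behind a REST API" pattern described above, here is a minimal sketch using only the Python standard library. It is not the speaker's code: the talk uses a DataRobot model, so `classify` is a hypothetical stand-in with a toy keyword rule, and the `/predict` route, the `observed` input field, and the `label` output field are illustrative assumptions.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def classify(sighting: dict) -> dict:
    """Hypothetical stand-in for the trained model (the talk uses DataRobot).

    A toy keyword rule keeps the sketch self-contained; a real service
    would call the trained model here instead.
    """
    text = str(sighting.get("observed", "")).lower()
    label = "likely" if ("footprint" in text or "howl" in text) else "unlikely"
    return {"label": label}


class PredictHandler(BaseHTTPRequestHandler):
    """Serves POST /predict: a JSON object in, a JSON prediction out."""

    def do_POST(self):
        if self.path != "/predict":
            self.send_error(404, "unknown route")
            return
        length = int(self.headers.get("Content-Length", 0))
        try:
            payload = json.loads(self.rfile.read(length))
        except ValueError:  # covers malformed JSON and undecodable bytes
            self.send_error(400, "body must be valid JSON")
            return
        if not isinstance(payload, dict):
            self.send_error(400, "body must be a JSON object")
            return
        body = json.dumps(classify(payload)).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(host: str = "127.0.0.1", port: int = 8000) -> None:
    """Block and serve predictions until interrupted (e.g. Ctrl+C)."""
    HTTPServer((host, port), PredictHandler).serve_forever()
```

After calling `serve()`, a client such as a web or iOS app would send something like `curl -X POST -d '{"observed": "a long howl at night"}' http://127.0.0.1:8000/predict`. Keeping the model call in its own function means the HTTP layer stays unchanged when the stand-in is swapped for a real model.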

Guy Royse

Developer Evangelist, DataRobot

Wednesday October 17, 2018 11:20am - 11:40am EDT
Deborah Sampson

11:45am EDT

QuSandbox: A lifecycle based approach to model governance
We present QuSandbox, an end-to-end, workflow-based system for creating and deploying data science workflows within the enterprise. Our environment supports AWS and Google Cloud and incorporates model and data provenance throughout the model development lifecycle. As cloud computing and ML adoption become the norm in the enterprise, we envision AI and MLOps becoming important to ensuring robust adoption and deployment within the enterprise.
In this talk we will illustrate a ten-step process to enable replicable research within the enterprise using QuSandbox.

Sri Krishnamurthy

Chief Data Scientist, QuantUniversity
Sri Krishnamurthy is the founder of QuantUniversity.com, a data and quantitative analysis company, and the creator of the Analytics Certificate program (www.analyticscertificate.com). He has more than 15 years of experience in analytics, quantitative analysis, statistical modeling and...

Wednesday October 17, 2018 11:45am - 11:55am EDT
Deborah Sampson

11:55am EDT

Automated Data Preparation and Machine Learning
Automated data preparation and machine learning is a paradigm shift that accelerates data scientists’ productivity. Tasks such as data pre-processing, cleansing, feature selection, model selection and hyper-parameter tuning can, to a certain extent, be automated with an automated machine learning framework. The goal is to make data scientists more productive and to enable business analysts or citizen data scientists to evaluate a complete business case and use predictive modeling to answer their business questions within a few hours.

Lars Bauerle

Chief Product Officer, RapidMiner

Pavithra Rao

Pre-Sales Data Scientist, RapidMiner
Pavithra Rao is a Pre-Sales Engineer at RapidMiner Inc. She holds a Master's degree in Data Science from the UConn School of Business. Prior to this, she worked as a Solution Advisor at Deloitte and Touche for two years, helping clients in major industrial sectors identify and address Technology...

Wednesday October 17, 2018 11:55am - 12:05pm EDT
Deborah Sampson

12:05pm EDT

QuickCode: Label your training data fast and transparently
The uses for machine learning are growing at an unprecedented rate. Yet as machine learning applications advance, the lack of good, labeled training data is inhibiting their large-scale adoption. We show how to generate labeled training data quickly and transparently using QuickCode, a system that recommends labels for text data. Each time the user provides more information, QuickCode rapidly iterates with the user and provides better recommendations. In tests against hand-labeled data, QuickCode matched the hand labels with 95% accuracy in a fraction of the time that hand-coding took.

Patrick Lam

Lead Data Scientist, Thresher
Patrick Lam is Lead Data Scientist at Thresher and a Visiting Fellow at the Harvard Institute for Quantitative Social Science. He received his Ph.D. in Political Science and Masters in Statistics from Harvard University in 2013. He has worked on problems involving machine learning...

Wednesday October 17, 2018 12:05pm - 12:15pm EDT
Deborah Sampson

2:00pm EDT

What's inside the box: comparing data across ML platforms
Protocol Buffer. HDF5. TFLite. Pickle. They all hold model weights and/or code, and in many cases they can and do hold the same ones. Understanding how these formats work, and how to unpack them outside of our tools, gives us insight into how the tools themselves work, how we can move our models between tools, and how to make those models more useful for our customers, who are often app developers!
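
As one small example of looking inside a model file without loading it through the usual tool, the sketch below uses Python's standard `pickletools` module to list which `module.name` objects a pickle would import. This matters because unpickling can execute code, so inspecting the opcode stream first is a safer way to see what a Pickle model depends on. It is an illustration, not material from the talk: the `referenced_globals` helper and the toy `weights` "model" are hypothetical, and the sketch only inspects the `GLOBAL`/`INST` opcodes used by pickle protocols 0-3 (protocol 4 and later use `STACK_GLOBAL`, which it does not handle).

```python
import collections
import pickle
import pickletools


def referenced_globals(data: bytes) -> set:
    """Return the "module.name" objects a pickle would import, without loading it.

    Assumption: only GLOBAL and INST opcodes (pickle protocols 0-3) are
    inspected. Protocol 4+ pickles use STACK_GLOBAL, which takes its names
    from the stack and is not handled by this sketch.
    """
    names = set()
    # genops only walks the opcode stream; nothing is executed or imported.
    for opcode, arg, _pos in pickletools.genops(data):
        if opcode.name in ("GLOBAL", "INST"):
            # pickletools reports these arguments as "module name".
            module, attr = arg.split(" ", 1)
            names.add(f"{module}.{attr}")
    return names


# Hypothetical "model": named weights pickled with protocol 2.
weights = collections.OrderedDict(w0=0.5, w1=-1.25)
blob = pickle.dumps(weights, protocol=2)
print(sorted(referenced_globals(blob)))
```

For a full human-readable listing of the same opcode stream, `pickletools.dis(blob)` prints every instruction.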

Ray Deck

Co-founder, Privilege Inc
Ray is CTO of Element55 in Cambridge, MA. He has been a quantitative analyst since 1995, a software developer since 1998, and a founder since 2003. He cannot write backwards on a whiteboard, disqualifying him as a true data scientist. Nevertheless, he has been learning and teaching...

Wednesday October 17, 2018 2:00pm - 2:30pm EDT
Deborah Sampson

2:40pm EDT

BigML demo

Poul Petersen

Poul Petersen is the Chief Infrastructure Officer at BigML. He has an MS degree in Mathematics as well as BS degrees in Mathematics, Physics and Engineering Physics. With 20 plus years of experience building scalable and fault tolerant systems in data centers, Poul currently enjoys the benefits of programmatic infrastructure, hacking in...

Wednesday October 17, 2018 2:40pm - 2:50pm EDT
Deborah Sampson

2:50pm EDT

Unifying Microsoft's ML Ecosystems at Massive Scales with MMLSpark
In this talk we will explore Microsoft Machine Learning for Apache Spark (MMLSpark), an ecosystem of enhancements that extend the Apache Spark distributed computing library to tackle problems in deep learning, microservice orchestration, gradient boosting, model interpretability, and other areas of modern computation. We also present Spark Serving, a novel system that allows users to run any Apache Spark program as a distributed web service with sub-millisecond latency, backed by their existing Spark cluster. We apply this ecosystem to create deep object detectors capable of learning without human-labeled data and demonstrate their effectiveness for snow leopard conservation.

Mark Hamilton


Wednesday October 17, 2018 2:50pm - 3:00pm EDT
Deborah Sampson