Data Science Challenge: Predict Restaurant Health Scores with Yelp Data



Yelp connects people with local businesses, and along the way we’ve gathered rich data about customers’ experiences at those businesses via reviews, tips, check-ins and business attributes. We are constantly asking ourselves how the collective wisdom of Yelpers can be used to help society. A couple of years ago we began working with cities to share restaurant health scores on Yelp, but that’s just the beginning. Could we use Yelp’s reviews and business information to make the process of sending health inspectors to restaurants more efficient? We think so, and we are challenging data scientists worldwide to design a health inspection prediction algorithm using Yelp data.

Yelp is co-sponsoring a new data science contest, “Keeping it Fresh”, in collaboration with the City of Boston, DrivenData.org and Harvard University economists (Ed, Andrew, Scott, and Mike). Using Yelp’s data for restaurants, food and nightlife businesses in Boston, as well as the history of past health inspections, we are asking contestants to predict the health score that will be assigned to a business at its next health inspection.

According to the Centers for Disease Control, more than 48 million Americans per year become sick from food, and an estimated 75% of the outbreaks came from food prepared by caterers, delis, and restaurants. Currently, inspectors are sent to restaurants in a mostly random fashion. Because cities have only a limited number of health inspectors, their time is often wasted on spot checks at clean, rule-abiding restaurants. It also means that restaurants with poor health and safety records are sometimes discovered too late.

It turns out that with Yelp’s data, cities can drastically improve how they assign health inspectors. A research study by Professor Michael Luca from Harvard Business School, Professor Yejin Choi from Stony Brook University, and their graduate students found that a model built using Yelp’s review data and past health inspection records is able to successfully predict future inspection scores for restaurants 82 percent of the time.
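To make the idea of combining review text with past inspection records concrete, here is a minimal toy sketch in Python. It is not the model from the study or the contest baseline: the keyword list, the function names `review_flag_rate` and `predict_next_score`, the `penalty_weight` parameter, and the convention that a higher score means more violations are all assumptions made up for illustration.

```python
import re
from statistics import mean

# Hypothetical keyword list, chosen only for illustration.
HYGIENE_TERMS = {"dirty", "sick", "roach", "filthy", "poisoning"}


def review_flag_rate(reviews):
    """Return the fraction of reviews that mention a hygiene-related term."""
    if not reviews:
        return 0.0
    flagged = 0
    for text in reviews:
        # Lowercase and keep alphabetic words only, so "sick." matches "sick".
        words = set(re.findall(r"[a-z]+", text.lower()))
        if words & HYGIENE_TERMS:
            flagged += 1
    return flagged / len(reviews)


def predict_next_score(past_scores, reviews, penalty_weight=10.0):
    """Toy prediction: mean of past scores plus a review-based penalty.

    Assumes (hypothetically) that a higher score means more violations.
    """
    baseline = mean(past_scores) if past_scores else 0.0
    return baseline + penalty_weight * review_flag_rate(reviews)


if __name__ == "__main__":
    past = [4, 6]
    reviews = ["Great food", "I got sick after eating here"]
    print(predict_next_score(past, reviews))
```

A real entry would presumably learn its weights from the historical inspection data rather than hard-coding them, but the shape is the same: one signal from past inspections, one signal from what reviewers wrote.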

So the gauntlet has been thrown. Data scientists of the world – can you beat 82 percent?

Winning algorithms will be awarded financial prizes, but the real prize is the opportunity to help the City of Boston, which is committed to examining ways to integrate the winning algorithm into its day-to-day inspection operations.

Read about how Yelp engineers have tried to crack this case on the Yelp Engineering Blog, and check out the contest page for all the rules and juicy details. Submissions close on July 7, 2015. 
