You have been tasked with booking your team’s Christmas party. You are given a budget of £1,500 and there are 25 people in the team. You visit an event marketplace that has thousands of venues, confident you can find the perfect Space for your event. After a quick search, you are faced with an unusual problem.
£1,000 minimum spend, £50 per delegate or £1,500 per evening: which of these is actually going to be within my budget?
This is a question faced by users of every venue marketplace. Venues will have a ‘display price’ on their listing, but the total cost of booking an event at that venue can be anywhere from 50% to 1000% of that price. A realistic cost comparison simply isn’t possible without speaking to each venue to find out whether the minimum spend includes drinks or whether dinner is included in the delegate rate. The result of all this? Guests have to invest more time to get visibility of their event spend, and venues are inundated with event requests that they are never realistically going to service.
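To see why the question above is so hard to answer, here is the arithmetic a Guest can do from display prices alone. This is a minimal sketch under stated assumptions: a minimum spend and a per-evening price are treated as flat amounts, a per-delegate rate scales with headcount, and the 50% to 1000% range quoted above is applied to each naive figure.

```python
# Figures from the opening example; the 0.5x to 10x range is the one quoted in the text.
GUESTS = 25
BUDGET = 1_500

def naive_cost(value, category, guests):
    """Cost implied by the display price alone, ignoring food, drink and extras."""
    if category == "per delegate":
        return value * guests
    # Assumption: a minimum spend or a per-evening hire is a flat amount for the event.
    return value

offers = [(1_000, "minimum spend"), (50, "per delegate"), (1_500, "per evening")]

for value, category in offers:
    cost = naive_cost(value, category, GUESTS)
    low, high = 0.5 * cost, 10 * cost
    verdict = "within" if cost <= BUDGET else "over"
    print(f"£{value:,} {category}: looks like £{cost:,} ({verdict} budget), "
          f"but could total anywhere from £{low:,.0f} to £{high:,.0f}")
```

All three offers look affordable for 25 people on display price alone, yet the plausible totals stretch far beyond the £1,500 budget.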
At HeadBox, we are confident that we can improve this situation. When a Guest asks us what they can get with their budget, we want to give them a straight answer, and we want venues to be excited to service the enquiries they receive. To achieve this, we are using data science and machine learning to estimate the total event cost at each venue, bringing clarity and efficiency to both our Guests and Hosts.
Framing the problem
Given HeadBox’s 4 years of experience booking and facilitating events of all scales, we have a dataset of almost 15,000 instances of a ‘total event cost’ being assigned to a venue on HeadBox. This data was generated by our in-house account management team when creating venue proposals, and their industry expertise makes each data point a realistic representation of the ‘total event cost’ at that venue.
Total Event Cost: the total cost of hosting an event at a venue, including venue hire, food and beverages, and other extras that are often not included in the display price.
In addition to this, HeadBox operates a network of over 7,000 listed venues. Each venue’s listing is rich with data, from location to capacity to ideal event types.
By combining HeadBox’s venue listings with the total event costs assigned to those venues, we have a dataset that looks suitable for training a machine learning model to predict the ‘total event cost’ at a venue from the information in its listing.
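A minimal sketch of how the two sources might be joined into training rows: one row per proposal, so a venue with several proposals contributes several examples. The field names (`venue_id`, `capacity`, `display_value`, `display_category`, `total_event_cost`) and all records are hypothetical, not HeadBox's actual schema.

```python
# Hypothetical listing features, keyed by venue id.
listings = {
    "v1": {"capacity": 30, "display_value": 50, "display_category": "per delegate"},
    "v2": {"capacity": 120, "display_value": 1500, "display_category": "per evening"},
}

# Hypothetical proposals: each assigns a total event cost to one venue.
proposals = [
    {"venue_id": "v1", "total_event_cost": 1800.0},
    {"venue_id": "v1", "total_event_cost": 2400.0},
    {"venue_id": "v2", "total_event_cost": 9000.0},
    {"venue_id": "v3", "total_event_cost": 500.0},  # venue with no current listing
]

def build_training_rows(listings, proposals):
    """Inner join: attach each proposal's cost (the label) to its venue's features."""
    rows = []
    for proposal in proposals:
        features = listings.get(proposal["venue_id"])
        if features is None:
            continue  # nothing to learn from without listing features
        rows.append({**features, "total_event_cost": proposal["total_event_cost"]})
    return rows

training_rows = build_training_rows(listings, proposals)
print(len(training_rows))  # 3: the proposal for v3 is dropped
```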
What does the data tell us?
A good starting point is to look at the relationship between display price and total event cost. For this purpose, we break display price down into a display value (for example, £50) and a display category (for example, per delegate).
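A minimal sketch of that split, assuming display prices are stored as strings of the form used in the opening example ("£50 per delegate"). Real listings may store them differently, and the regular expression here is illustrative only.

```python
import re

# A pound amount (optional thousands separators and pence), then the pricing unit.
PRICE_PATTERN = re.compile(r"£\s*(\d[\d,]*(?:\.\d+)?)\s*(.*)")

def split_display_price(text):
    """Split e.g. '£50 per delegate' into (50.0, 'per delegate')."""
    match = PRICE_PATTERN.search(text)
    if match is None:
        raise ValueError(f"unrecognised display price: {text!r}")
    value = float(match.group(1).replace(",", ""))
    category = match.group(2).strip().lower() or "unspecified"
    return value, category

for price in ["£1,000 minimum spend", "£50 per delegate", "£1,500 per evening"]:
    print(split_display_price(price))
```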
Plotting display value against total event cost shows no obvious relationship between the two. What’s more, the graph contains vertical lines, showing that a single display value can correspond to a wide range of total event costs. We are definitely going to need more features to make reliable predictions here.
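The vertical lines can be quantified by grouping proposals by display value and looking at the range of total costs each one covers, while a correlation coefficient gives a single-number check on the linear relationship. A sketch on hypothetical data, with Pearson's correlation written out by hand to stay dependency-free.

```python
from collections import defaultdict

# Hypothetical (display_value, total_event_cost) pairs in £; not real HeadBox data.
pairs = [(50, 900), (50, 4200), (50, 12500), (1000, 7000), (1500, 1500), (1500, 2100)]

def spread_by_display_value(pairs):
    """Lowest and highest total event cost seen for each display value.

    A wide range for a single display value is what appears as a vertical
    line on a scatter plot of display value against total event cost.
    """
    groups = defaultdict(list)
    for display_value, total in pairs:
        groups[display_value].append(total)
    return {value: (min(totals), max(totals)) for value, totals in sorted(groups.items())}

def pearson(xs, ys):
    """Pearson correlation coefficient (linear association, from -1 to 1)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sx = sum((x - mean_x) ** 2 for x in xs) ** 0.5
    sy = sum((y - mean_y) ** 2 for y in ys) ** 0.5
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sx * sy)

for value, (low, high) in spread_by_display_value(pairs).items():
    print(f"display value £{value:,}: total event cost £{low:,} to £{high:,}")
print(round(pearson([p[0] for p in pairs], [p[1] for p in pairs]), 2))
```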
Location also shows that assigning total event cost is a complex problem. Rather than inner-city venues simply being higher value, visualising total event cost by location reveals a more regional distribution. In a city such as London, this is likely because low-value meeting rooms are prevalent in some areas (shown below by the tall, blue columns), whilst private dining and high-value bookings dominate others.
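A sketch of the aggregation behind a location view, assuming each venue carries a hypothetical postcode district and primary use. Counting the mix of venue types per area alongside the median cost is one way to check the explanation above. All records are made up.

```python
from collections import defaultdict
from statistics import median

# Hypothetical venues: (postcode_district, primary_use, total_event_cost in £).
venues = [
    ("EC2", "meeting room", 400), ("EC2", "meeting room", 650),
    ("EC2", "private dining", 5200), ("W1", "private dining", 6800),
    ("W1", "private dining", 4100), ("SE1", "meeting room", 300),
]

def median_cost_by_area(venues):
    """Median total event cost per area; medians resist a few extreme events."""
    by_area = defaultdict(list)
    for area, _use, cost in venues:
        by_area[area].append(cost)
    return {area: median(costs) for area, costs in sorted(by_area.items())}

def use_mix_by_area(venues):
    """How many venues of each type sit in each area."""
    mix = defaultdict(lambda: defaultdict(int))
    for area, use, _cost in venues:
        mix[area][use] += 1
    return {area: dict(counts) for area, counts in sorted(mix.items())}

print(median_cost_by_area(venues))
print(use_mix_by_area(venues))
```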
To train an accurate machine learning model, we will combine the above data points with features such as type of event hosted, capacity and Space description.
Selecting and Building a Model
The role of a machine learning model in this context is to learn from labelled training data so that it can assign labels to unlabelled data in the future. In this case, the training data is the venue information found in each listing, and the label is the total event cost. Our model learns from this training data and can then assign a total event cost to every venue in the HeadBox marketplace.
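In code, that set-up amounts to separating each labelled row into features (X) and label (y) and holding some rows back for testing. A minimal, dependency-free sketch; the helper names `split_features_and_label` and `holdout_split` are hypothetical, and libraries such as scikit-learn provide equivalent utilities.

```python
import random

def split_features_and_label(rows, label="total_event_cost"):
    """Separate listing features (X) from the label the model must learn (y)."""
    X = [{k: v for k, v in row.items() if k != label} for row in rows]
    y = [row[label] for row in rows]
    return X, y

def holdout_split(rows, test_fraction=0.2, seed=42):
    """Shuffle labelled rows reproducibly and reserve a fraction as a test set."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]  # (train, test)

# Hypothetical labelled rows.
rows = [{"capacity": 10 * i, "total_event_cost": 100.0 * i} for i in range(1, 11)]
train, test = holdout_split(rows)
X_train, y_train = split_features_and_label(train)
print(len(train), len(test))  # 8 2
```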
Our dataset is a combination of numerical and categorical data. Most machine learning models expect numerical inputs, so categories such as event type cannot be used directly. To solve this, we convert each category into its own column (a technique known as one-hot encoding), which is 1 if the venue is in that category and 0 if it is not.
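A minimal sketch of one-hot encoding for a single categorical column. Passing in the category list learned from the training data means a category never seen in training encodes as all zeros instead of creating new columns. The `event_type` field and its values are hypothetical.

```python
def one_hot_encode(rows, column, categories=None):
    """Replace a categorical column with one 0/1 column per category.

    Pass the `categories` learned from the training data when encoding new
    rows, so unseen categories become all zeros rather than new columns.
    """
    if categories is None:
        categories = sorted({row[column] for row in rows})
    encoded = []
    for row in rows:
        new_row = {k: v for k, v in row.items() if k != column}
        for category in categories:
            new_row[f"{column}={category}"] = 1 if row[column] == category else 0
        encoded.append(new_row)
    return encoded, categories

# Hypothetical training rows.
rows = [
    {"capacity": 30, "event_type": "meeting"},
    {"capacity": 120, "event_type": "dinner"},
    {"capacity": 80, "event_type": "meeting"},
]
encoded, categories = one_hot_encode(rows, "event_type")
print(encoded[0])  # {'capacity': 30, 'event_type=dinner': 0, 'event_type=meeting': 1}
```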
With that done, we were satisfied that our dataset was ready for training a machine learning model. After sampling a number of different models and evaluating their performance on test data, we concluded that a decision tree model was best suited to handling our complex dataset. A decision tree is essentially a series of ‘nodes’, each of which asks a question of the data; the answer (true or false) decides which branch we follow through the tree. Each node in the final layer has a ‘value’ associated with it, which is the model’s prediction of total event cost.
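The post does not say which decision tree implementation was used, and a library such as scikit-learn would be the usual choice in practice. To make the node-and-leaf description concrete, here is a minimal from-scratch regression tree in the CART style: each node asks "is feature ≤ threshold?", the split is chosen greedily to minimise squared error, and leaves store the mean label. The features and data are hypothetical, and model comparison is not shown.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    feature: Optional[int] = None      # index of the feature this node asks about
    threshold: Optional[float] = None  # question: is x[feature] <= threshold?
    left: Optional["Node"] = None      # followed when the answer is true
    right: Optional["Node"] = None     # followed when the answer is false
    value: Optional[float] = None      # prediction, set only on leaf nodes

def _sse(ys):
    """Sum of squared deviations from the mean."""
    mean = sum(ys) / len(ys)
    return sum((y - mean) ** 2 for y in ys)

def fit_tree(X, y, max_depth=3, min_samples=2):
    """Grow a regression tree, picking at each node the split that most
    reduces the squared error of the labels."""
    if max_depth == 0 or len(y) < 2 * min_samples or len(set(y)) == 1:
        return Node(value=sum(y) / len(y))
    best = None
    for f in range(len(X[0])):
        # Candidate thresholds: every observed value except the largest.
        for t in sorted({row[f] for row in X})[:-1]:
            left = [i for i, row in enumerate(X) if row[f] <= t]
            right = [i for i, row in enumerate(X) if row[f] > t]
            if len(left) < min_samples or len(right) < min_samples:
                continue
            cost = _sse([y[i] for i in left]) + _sse([y[i] for i in right])
            if best is None or cost < best[0]:
                best = (cost, f, t, left, right)
    if best is None:
        return Node(value=sum(y) / len(y))
    _, f, t, left, right = best
    return Node(
        feature=f,
        threshold=t,
        left=fit_tree([X[i] for i in left], [y[i] for i in left], max_depth - 1, min_samples),
        right=fit_tree([X[i] for i in right], [y[i] for i in right], max_depth - 1, min_samples),
    )

def predict(node, x):
    """Walk from the root, answering each node's true/false question."""
    while node.value is None:
        node = node.left if x[node.feature] <= node.threshold else node.right
    return node.value

# Hypothetical features: [capacity, offers_dining (0/1)]; labels in £.
X = [[20, 0], [25, 0], [30, 0], [100, 1], [120, 1], [150, 1]]
y = [500, 600, 700, 8000, 9000, 10000]
tree = fit_tree(X, y, max_depth=2)
print(predict(tree, [28, 0]), predict(tree, [110, 1]))  # 600.0 9000.0
```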
Using this method, we are able to generate a model that predicts total event cost with a good degree of accuracy. The performance of the model on a reserved test set of data is shown below.
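The post does not say which metrics were used to judge test-set performance. As an assumption, here is a sketch of three standard regression metrics computed on held-out predictions: mean absolute error (in pounds), root mean squared error (which penalises large misses more heavily), and R² (the share of variance in total event cost explained by the model).

```python
import math

def regression_metrics(y_true, y_pred):
    """MAE, RMSE and R² for predictions on a held-out test set."""
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    squared = sum(e * e for e in errors)
    mean_true = sum(y_true) / n
    total_variance = sum((t - mean_true) ** 2 for t in y_true)
    return {
        "mae": sum(abs(e) for e in errors) / n,
        "rmse": math.sqrt(squared / n),
        "r2": 1 - squared / total_variance,
    }

# Hypothetical test-set labels and predictions, in £.
print(regression_metrics([1000, 2500, 8000], [1200, 2300, 6500]))
```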
We can also see some error in our predictions, particularly at higher values. We believe this reflects the nature of the events industry, in which a single venue can service events across a wide range of values. This complexity also underlines the need to include richer features in our models, such as Space descriptions.
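One way to check where the error concentrates is to bucket the test set by actual cost and compare absolute and percentage errors per bucket. The band edges and data in this sketch are hypothetical.

```python
def error_by_band(y_true, y_pred, edges):
    """Group test examples into cost bands by their true cost and report the
    mean absolute error (£) and mean absolute percentage error per band.

    `edges` are ascending band boundaries: with [1000, 5000], band 0 is
    below £1,000, band 1 runs from £1,000 up to £5,000, band 2 is £5,000 and up.
    """
    bands = {}
    for t, p in zip(y_true, y_pred):
        band = sum(t >= edge for edge in edges)
        bands.setdefault(band, []).append((t, p))
    report = {}
    for band, items in sorted(bands.items()):
        abs_errors = [abs(p - t) for t, p in items]
        report[band] = {
            "count": len(items),
            "mae": sum(abs_errors) / len(items),
            "mape": sum(e / t for e, (t, _) in zip(abs_errors, items)) / len(items),
        }
    return report

# Hypothetical test-set labels and predictions, in £.
y_true = [500, 800, 6000, 9000]
y_pred = [550, 760, 4500, 12000]
for band, stats in error_by_band(y_true, y_pred, edges=[1000, 5000]).items():
    print(band, stats)
```

Comparing the two measures separates errors that are large only because the events are large (similar percentage error across bands) from errors that genuinely worsen at the top end (rising percentage error).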
Including Space Descriptions
One extremely rich data source that we have ignored so far is the Space description on each listing. Using our models, we can now start analysing the correlations between these descriptions and total event cost. This matters to us because a Space description is free text written by the Host themselves, designed to capture the essence of that venue’s offering. Understanding the keywords and phrases that denote venues in a certain value group will enable us to return better recommendations to Guests and give truly unique feedback to venues on their listings.
We can first visualise the correlation between terms and total event cost, regardless of the context in which each word is used. This gives us a bank of keywords that can be used to associate a venue with a value grouping. Good examples from the plot below include ‘auditorium’, ‘national’ and ‘ballroom’, which are strongly associated with high-value venues, and ‘boardroom’, ‘training’ and ‘city centre’, which suggest a lower-value venue.
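A context-free keyword score can be sketched as the correlation between "this description contains the term" (0 or 1) and the venue's cost. This is a minimal sketch, not the method behind the plot: using log cost to damp very large events and including two-word phrases (so ‘city centre’ counts as a term) are our assumptions, and the descriptions and costs are made up.

```python
import math
import re

def tokenise(text):
    """Lower-cased words plus adjacent word pairs, so 'city centre' is a term."""
    words = re.findall(r"[a-z]+", text.lower())
    return set(words) | {f"{a} {b}" for a, b in zip(words, words[1:])}

def term_correlations(descriptions, costs, min_count=2):
    """Pearson correlation between 'description contains term' (0/1) and
    log total event cost, for each term that appears in at least min_count
    descriptions. Positive scores lean high-value, negative lean low-value."""
    docs = [tokenise(d) for d in descriptions]
    ys = [math.log(c) for c in costs]  # log scale tames very large events
    n = len(docs)
    mean_y = sum(ys) / n
    sy = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sy == 0:
        return {}  # every venue has the same cost, so nothing to correlate with
    scores = {}
    for term in {t for doc in docs for t in doc}:
        xs = [1.0 if term in doc else 0.0 for doc in docs]
        count = sum(xs)
        if count < min_count or count == n:
            continue  # too rare to trust, or in every description (no variance)
        mean_x = count / n
        sx = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
        cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        scores[term] = cov / (sx * sy)
    return dict(sorted(scores.items(), key=lambda kv: kv[1], reverse=True))

# Hypothetical descriptions and total event costs in £.
descriptions = [
    "Grand ballroom with auditorium seating",
    "Historic ballroom for gala dinners",
    "Boardroom in the city centre",
    "Training room near the city centre",
]
costs = [20000, 15000, 800, 600]
scores = term_correlations(descriptions, costs)
print(scores["ballroom"] > 0, scores["city centre"] < 0)  # True True
```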
On top of this, we can bring in the context in which a word is used, visualising clusters of words that are used in similar ways in sentences. When this is combined with the total event cost of the venue (from red for low value to blue for high value), we have a dataset that shows how venues targeting different value groups select their words differently. Some of our favourites are groupings that highlight adjective use and descriptions of materials within the venue, with ‘marble’ and ‘brickwork’ suggesting higher value, as well as ‘ping’ and ‘pong’.
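The post does not say how word context was modelled; learned word embeddings projected into two dimensions are a common choice. This sketch swaps in a simpler, count-based stand-in: each word is represented by the words that appear near it, words with similar neighbourhoods score a high cosine similarity, and each word is coloured by the average cost of the venues that use it. The descriptions, costs and window size are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict

def words_of(text):
    return re.findall(r"[a-z]+", text.lower())

def context_vectors(descriptions, window=2):
    """Represent each word by counts of the words within `window` positions of it."""
    vectors = defaultdict(Counter)
    for text in descriptions:
        words = words_of(text)
        for i, word in enumerate(words):
            for j in range(max(0, i - window), min(len(words), i + window + 1)):
                if j != i:
                    vectors[word][words[j]] += 1
    return vectors

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def most_similar(word, vectors, top=3):
    """Words whose contexts look most like `word`'s contexts."""
    target = vectors.get(word, Counter())
    scored = [(other, cosine(target, vec)) for other, vec in vectors.items() if other != word]
    return sorted(scored, key=lambda kv: kv[1], reverse=True)[:top]

def mean_cost_per_word(descriptions, costs):
    """Average total event cost of the venues whose description uses each word,
    i.e. the value used to colour each word in a cluster plot."""
    totals, counts = defaultdict(float), Counter()
    for text, cost in zip(descriptions, costs):
        for word in set(words_of(text)):
            totals[word] += cost
            counts[word] += 1
    return {w: totals[w] / counts[w] for w in counts}

# Hypothetical descriptions and total event costs in £.
descriptions = [
    "exposed brickwork and marble floors",
    "original marble and brickwork details",
    "ping pong table in the bar",
    "play ping pong at the bar",
]
costs = [12000, 9000, 2500, 3000]
vectors = context_vectors(descriptions)
print(most_similar("ping", vectors))
print(mean_cost_per_word(descriptions, costs)["marble"])  # 10500.0
```

Because ‘ping’ and ‘pong’ almost always appear side by side, their neighbourhoods overlap heavily, which is why such pairs tend to surface together in a context-based view.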
For us, the key takeaway from this project is that we can now assign a ‘total event cost’ to an event Space more accurately. This means a few things:
- Guests coming to HeadBox.com are able to submit their event budget as part of their search for the perfect event Space. We will then be able to give them straight, real-time answers as to what they can realistically afford, saving them a great deal of time and disappointment down the road.
- Hosts will receive fewer requests to host events that just aren’t the right fit for them. This will save them time responding to these enquiries and reassure them that an enquiry from HeadBox is well qualified and likely to result in an event.
- HeadBox’s in-house account management team can use the assigned ‘total event cost’ to save time contacting venues for our large corporate clients, serving their needs faster and more efficiently.
- Finally, we have a proven model showing that machine learning and data science can revolutionise the events industry, one which we can take forward into our future challenges.
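The Guest-facing use in the first point reduces, at query time, to filtering venues by their predicted total event cost. A minimal sketch, assuming the model's predictions are already stored per venue; the venue ids, predicted values and the optional `tolerance` parameter are hypothetical.

```python
def affordable_venues(predicted_costs, budget, tolerance=0.0):
    """Return venue ids whose predicted total event cost fits the budget,
    cheapest first. `tolerance` lets a venue exceed the budget by a fraction."""
    limit = budget * (1 + tolerance)
    fits = [(cost, venue) for venue, cost in predicted_costs.items() if cost <= limit]
    return [venue for _cost, venue in sorted(fits)]

# Hypothetical model predictions, in £.
predicted = {"v1": 1200.0, "v2": 4800.0, "v3": 1450.0, "v4": 1600.0}
print(affordable_venues(predicted, budget=1500))                  # ['v1', 'v3']
print(affordable_venues(predicted, budget=1500, tolerance=0.1))   # ['v1', 'v3', 'v4']
```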