Our team was excited to be back again for this year’s RapidMiner user conference, Wisdom 2020. We saw some familiar faces, heard interesting analytics use cases from both the business and technology perspectives, and enjoyed the general buzz you find at an event filled with data scientists and analytics enthusiasts.
A few themes emerged throughout the conference, starting with RapidMiner co-founder Ingo Mierswa’s ever-engaging keynote at the start of the agenda. He spoke on the importance of building models that are resilient, meaning that the most accurate model is not always the best model for an analytics use case. The same is true in training and testing machine learning models: depending on how you split your available data into training and test sets, you could end up with different accuracy measures on the test set. Some test sets may happen to suit your model particularly well, leading you to believe you’ve built a winning model. That may still be true, but the goal for data scientists should be to build resilient models, demonstrate that resilience with proper testing methodologies, and then monitor models once deployed to track their accuracy over time. After all, the value of a machine learning model lies in how well it performs over time and how well the business can take action on its results.
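To make the point about test sets concrete, here is a minimal sketch of our own (not something shown in the keynote) that scores the same simple model on many random train/test splits. The synthetic data, the nearest-centroid threshold rule, and every function name are assumptions chosen for illustration; the spread between the lowest and highest accuracy is what a single lucky split would hide.

```python
import random
import statistics


def make_data(n, rng):
    """Synthetic 1-D data (an assumption): class 0 centred at 0.0, class 1 at 1.0."""
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)
        data.append((rng.gauss(float(label), 1.0), label))
    return data


def fit_threshold(train):
    """Nearest-centroid rule: put the threshold halfway between the class means."""
    mean0 = statistics.mean(x for x, y in train if y == 0)
    mean1 = statistics.mean(x for x, y in train if y == 1)
    return (mean0 + mean1) / 2.0


def accuracy(threshold, test):
    """Share of test rows where 'x above threshold' agrees with the label."""
    correct = sum((x > threshold) == (y == 1) for x, y in test)
    return correct / len(test)


def repeated_holdout(data, n_splits=20, test_fraction=0.3, seed=0):
    """Re-split the same data n_splits times and collect the test accuracies."""
    rng = random.Random(seed)
    n_test = int(len(data) * test_fraction)
    results = []
    for _ in range(n_splits):
        shuffled = data[:]
        rng.shuffle(shuffled)
        test, train = shuffled[:n_test], shuffled[n_test:]
        results.append(accuracy(fit_threshold(train), test))
    return results


data = make_data(200, random.Random(42))
split_scores = repeated_holdout(data)
print(
    f"min={min(split_scores):.3f} "
    f"mean={statistics.mean(split_scores):.3f} "
    f"max={max(split_scores):.3f}"
)
```

Reporting the range across splits, rather than one number from one split, is a simple way to show how much of a model’s apparent accuracy depends on which rows landed in the test set.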
Another theme from the conference was the importance of humans in the data science process. While AutoML makes it easier to prototype rapidly and build predictive models quickly, our team doesn’t see the role of humans in analytics going away any time soon. Even organizations with fully integrated, real-time data still have data gaps where people hold the knowledge and it isn’t stored digitally. Certain applications work best when machine learning is used to capitalize on the data and the resulting actionable insights are then passed to the business through an approval mechanism. Subject matter expertise is integral to creating and prioritizing analytics use cases that will drive value in the business, and to mapping out the steps that will be taken with the results to automate or drive change. Working with the business to understand the data going into the model is critical to avoiding common pitfalls, such as leaking information into the training data that the model will not have when it runs in real life.
The RapidMiner team announced their new product, RapidMiner Go, which lets business users build predictive models with RapidMiner’s AutoModel tool through a web link. This means that business users can begin taking part in data science without having to download anything or learn a coding language or platform, and they can bring their knowledge about the problem directly to the solution. The team demonstrated the lifecycle of a data science project on stage by showing their data scientist collaborating with both a coder and a business analyst. The coder pulled in some Python code, using the new integration with Jupyter notebooks, to add columns for city and state based on a zip code. Next, the business analyst reviewed the dataset in RapidMiner Go and attributed costs to false positives and false negatives in order to estimate and optimize the financial impact of the model. He then built a model to share back with the data scientist, who was able to customize it and deploy it to production in a few clicks. This example supports our team’s belief that analytics is a team sport: it requires multiple players from the business, along with data availability, change management to foster adoption across the organization, and technical support for application development, model deployment, and monitoring.
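The cost-weighting step in that demo can be sketched in a few lines. This is our own illustration of the general idea, not RapidMiner Go’s implementation: the scores, labels, cost values, and function names below are all made up for the example. Given a model’s predicted scores, we total the cost of false positives and false negatives at each candidate cutoff and keep the cheapest one.

```python
def total_cost(scores, labels, threshold, cost_fp, cost_fn):
    """Total cost of the errors made when predicting positive for score >= threshold."""
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    return fp * cost_fp + fn * cost_fn


def best_threshold(scores, labels, cost_fp, cost_fn):
    """Try every observed score (plus 'predict nothing positive') as the cutoff."""
    candidates = sorted(set(scores)) + [float("inf")]
    return min(
        candidates,
        key=lambda t: total_cost(scores, labels, t, cost_fp, cost_fn),
    )


# Hypothetical model scores and true labels, for illustration only.
scores = [0.1, 0.3, 0.4, 0.6, 0.8, 0.9]
labels = [0, 0, 1, 0, 1, 1]

# When a missed positive costs five times a false alarm, a low cutoff wins.
print(best_threshold(scores, labels, cost_fp=1, cost_fn=5))  # 0.4

# When a false alarm costs five times a missed positive, a high cutoff wins.
print(best_threshold(scores, labels, cost_fp=5, cost_fn=1))  # 0.8
```

The model is identical in both calls; only the business costs change, which is exactly the kind of knowledge a business analyst brings to the project.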
The analytics use cases at the RapidMiner Wisdom 2020 conference that seemed to garner the most interest were around targeted marketing, time series forecasting, and integration with visualization tools. Our very own Elise Watson presented on improving sales and marketing for the customer journey with a focus on health care providers. We see a lot of interest in the market for orchestrating various data sources to better reach customers through targeted marketing and sales efforts. There are interesting applications for understanding next best actions, using natural language processing to optimize the language used in communications, and identifying influencers in the customer base using graph databases.
Several sessions covered use cases that apply machine learning to time series forecasting. Integration with visualization tools continues to be a way to grow analytics in organizations. Business intelligence and visualization are good stepping stones for analysts to become more data-driven and to do high-impact analysis, and combining dashboards with underlying predictive models is a great way to augment analytics.