Over the past five years, Business & Decision’s Data Science projects have grown strongly across a wide range of sectors, including the oil industry, telephony, retail and services. Implementing such projects effectively, however, means overcoming a number of difficulties.
First, remember that Data Science draws on several disciplines, each of which must be mastered for a project to run smoothly and succeed:
• Data preparation, including the challenge of gathering all the data in one place, recoding it and shaping it so that it is usable;
• Statistics, whose principles must be understood in order to manipulate data correctly;
• Machine Learning, the essential tool for handling massive, evolving, streaming or incomplete data;
• AI, which enables intensive learning and automation.
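To make the data preparation step more concrete, here is a minimal sketch in plain Python. It assumes two small extracts from separate systems; the field names, the gender codes and the sample values are all invented for illustration and do not come from any Business & Decision project. It shows the three operations mentioned above: gathering records under one key, recoding values, and shaping them into a single usable table.

```python
# Hypothetical extracts from two separate systems; all field names are invented.
crm_rows = [
    {"customer_id": "001", "gender": "F", "city": " Paris "},
    {"customer_id": "002", "gender": "m", "city": "Lyon"},
]
billing_rows = [
    {"cust": 1, "monthly_spend": "42,50"},
    {"cust": 3, "monthly_spend": "17,00"},
]

GENDER_CODES = {"f": 0, "m": 1}  # arbitrary recoding chosen for this example


def prepare(crm, billing):
    """Gather both sources under one key, recode values, keep gaps explicit."""
    merged = {}
    for row in crm:
        key = int(row["customer_id"])  # normalise "001" -> 1
        merged[key] = {
            "customer_id": key,
            "gender": GENDER_CODES.get(row["gender"].strip().lower()),
            "city": row["city"].strip(),
            "monthly_spend": None,  # unknown until billing data is joined
        }
    for row in billing:
        key = int(row["cust"])
        # Customers known only to billing still get a record, with gaps as None.
        record = merged.setdefault(
            key,
            {"customer_id": key, "gender": None, "city": None, "monthly_spend": None},
        )
        # Convert the decimal comma to a decimal point before parsing.
        record["monthly_spend"] = float(row["monthly_spend"].replace(",", "."))
    return [merged[k] for k in sorted(merged)]


table = prepare(crm_rows, billing_rows)
for record in table:
    print(record)
```

Missing values are kept as explicit `None` entries rather than dropped, so that the later statistical and modeling steps can decide how to treat incomplete data.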
Didier Gaultier, Director of Data Science & AI at Business & Decision, identifies four main challenges that Data Science projects often face, along with concrete steps to overcome them.
- The challenge of data “in silos”
It is very common today for business data to be “siloed”: each business unit has its own information system (IS). Since data is the foundation of the project, it is crucial for companies to adopt a data-centric approach by:
• Placing data at the center of the IS by building a data lake or data hub;
• Having a dedicated team;
• Putting data governance in place.
- Prerequisites and organization of the project
Before the project can be scoped and a possible pilot launched, two prerequisites are essential.
Understanding business issues
A sound understanding of the business domain and its problems is required. It determines the success of the approach and its adoption by the internal teams. Any Data Science project should therefore be launched with the business teams through workshops.
Diagnosing the data and the IS architecture
To identify the opportunities and constraints related to the data, it is preferable to organize “data” workshops with the internal teams and the IT department. In particular, these help anticipate constraints that may arise during the industrialization phase: choice of architecture, tools or programming language.
- Managing the complexity of algorithms
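Algorithm complexity must be managed well in order to control the bias/variance tradeoff, which is governed by the training data: a model that is too simple misses the underlying pattern (high bias), while one that is too complex fits the noise in the training data (high variance). Some industries impose additional constraints. In banking, for example, algorithms are subject to a traceability requirement.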
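The bias/variance tradeoff can be illustrated with a small, self-contained sketch. This is an illustration chosen for this article, not the method used in the projects described here: a k-nearest-neighbor regressor on synthetic noisy sine data, where the number of neighbors `k` plays the role of the complexity setting. A small `k` gives a very flexible model (low bias, high variance); a large `k` gives a very rigid one (high bias, low variance). The data, the noise level and the values of `k` are assumptions.

```python
import math
import random


def knn_predict(train, x, k):
    """Average the targets of the k training points closest to x."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return sum(y for _, y in nearest) / k


def mse(train, points, k):
    """Mean squared error of the k-NN model fitted on `train`, scored on `points`."""
    errors = [(knn_predict(train, x, k) - y) ** 2 for x, y in points]
    return sum(errors) / len(errors)


def make_data(n, rng):
    """Noisy samples of sin(x) on [0, 2*pi]; synthetic data for illustration only."""
    data = []
    for _ in range(n):
        x = rng.uniform(0, 2 * math.pi)
        data.append((x, math.sin(x) + rng.gauss(0, 0.3)))
    return data


rng = random.Random(0)  # fixed seed so the run is reproducible
train = make_data(60, rng)
valid = make_data(60, rng)

# k=1 memorises the training set; k=60 always predicts the training mean.
results = {k: (mse(train, train, k), mse(train, valid, k)) for k in (1, 5, 15, 60)}
for k, (train_err, valid_err) in results.items():
    print(f"k={k:2d}  train MSE={train_err:.3f}  validation MSE={valid_err:.3f}")
```

Comparing the training error with the validation error for each `k` is the usual way to spot the two failure modes: a training error of zero paired with a much larger validation error points to overfitting, while similar but high errors on both sets point to underfitting.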
- The difficulties of industrializing models
The industrialization phase is when the models are moved into production. It can be difficult, however, especially in the following cases:
• The data has not been “de-siloed”;
• The chosen programming language does not lend itself to industrialization (prefer Python to R, for example);
• The maintenance tools are unsuitable, even though specialized tools exist (Dataiku, Knime, Azure Machine Learning, SAS).
4 examples of Data Science projects
At Business & Decision, experts rely on three pillars of Data Science, “explain, predict and prescribe”, to help clients extract value from their data. Today, Data Science can be applied in every sector. Projects carried out by the company include:
• Oil industry: development of a predictive analysis platform covering consumption, extraction levels and crude oil refining capacity for a player in the oil sector;
• Telephony: improving a telecommunications company’s customer service by intelligently managing support tickets with a “bot”;
• Retail: setting up an “anti-churn” (customer retention) scheme for the customers of a distributor of French electrical products;
• Services: improving the efficiency of La Poste Group’s mail distribution through a dynamic routing algorithm for mail carriers, based on predicted deliveries per address. This project has led to the creation of new services: “Shipping in mailbox” and “Watching over my parents”.
This article was written by Mathieu Bruniquel, a student in the Master in Big Data at Telecom ParisTech, class of 2019. It follows a talk given by Didier Gaultier to students of the MS Big Data at Telecom ParisTech, in which he shared his vision of the Data Scientist / Engineer profession and his field experience.