
Data fuels decision-making. Banks are well-equipped with the financial data of their customers. Experts often point out that consolidating internal financial data with other data sources (e.g. behavioral data, macro-economic data) will unlock data’s full potential. Yet banks’ rich internal data is regularly overlooked as a resource that can fuel decision-making in its own right. Banks need a solid data-gathering strategy and advanced data-analytics skills to leverage their internal data.
How should banks approach internal data?
Data needs to be gathered with a clear purpose. The journey towards a data-fueled operating model therefore starts with defining clear use cases, which then have to be checked against reality: banks’ internal data should first be inventoried and categorized. It is crucial to define the timeframe over which data is collected (depending on the use case, the last three to ten years may be most suitable). The data can then be put to work, for example through model-building. While harvesting data to implement specific use cases is crucial, the strategy should also cover how data will be managed in the future. Harvesting data from legacy architectures demonstrates the potential of data in general but is very inefficient for future endeavors; breaking down data silos and building data lakes represents a more robust solution. Currently, banks still struggle with small projects that never move beyond the proof-of-concept stage and large projects that are abandoned due to overwhelming complexity. Incremental progress on projects of medium complexity offers the largest potential to thrive.
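As a rough illustration of the inventory-and-timeframe step, the sketch below assumes pandas as the working format; the source names, categories, and lookback values are purely hypothetical and not a prescription for any particular architecture.

```python
# A rough sketch of a use-case-driven data inventory, assuming pandas as the
# working format; source names and lookback values are purely illustrative.
from dataclasses import dataclass
from datetime import date, timedelta

import pandas as pd


@dataclass
class DataSource:
    name: str            # e.g. "core_banking.transactions" (hypothetical)
    category: str        # e.g. "transactional", "behavioral", "macro-economic"
    lookback_years: int  # how far back this use case needs data (three to ten years)


def restrict_to_window(source: DataSource, extract: pd.DataFrame) -> pd.DataFrame:
    """Keep only the rows of a raw extract that fall inside the use case's window."""
    cutoff = pd.Timestamp(date.today() - timedelta(days=365 * source.lookback_years))
    return extract[extract["booking_date"] >= cutoff]


# Inventory for a hypothetical churn-prediction use case.
inventory = [
    DataSource("core_banking.transactions", "transactional", lookback_years=5),
    DataSource("crm.contact_history", "behavioral", lookback_years=3),
]
```

Even a lightweight inventory like this forces the two decisions the strategy calls for: which categories of internal data a use case actually needs, and for how many years it needs them.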
Too much of a good thing: why data frugality is important
Occam’s razor is the idea that in problem-solving, the simplest solution is usually the right one. This principle is widely adopted in data science for several reasons. Firstly, the more data a model requires, the higher the risk of unobserved data points, which negatively affect the model’s predictive power. Secondly, more data increases the training time of models; more training time means more energy and consequently higher costs. Thirdly, more data can impair the explainability of a model, as the results of a complex model are harder to interpret. This is especially the case if deep learning methods are applied, which remain to a large extent black boxes. Low explainability prevents a model’s application in automated decision-making under the GDPR. Moreover, it can make the model unstable when confronted with new, hitherto unseen data, and users will have difficulty explaining why and with what accuracy the model adapts to the new circumstances. In general, parsimony is an important criterion that banks have to optimize for when using their data.
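To make the point about parsimony concrete, here is a minimal sketch assuming scikit-learn and synthetic data: an L1-penalized logistic regression keeps only a handful of informative features, which reduces training cost and keeps the resulting model easier to explain. The dataset and parameter values are illustrative only.

```python
# A minimal sketch of model parsimony, assuming scikit-learn and synthetic data.
# The L1 penalty drives uninformative coefficients to zero, so the fitted model
# uses few features: cheaper to train and easier to interpret.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for an internal data set: 50 candidate features, 5 informative.
X, y = make_classification(n_samples=2000, n_features=50, n_informative=5,
                           random_state=0)

sparse_model = LogisticRegression(penalty="l1", solver="liblinear", C=0.1)
print("cross-validated accuracy:", cross_val_score(sparse_model, X, y, cv=5).mean())

sparse_model.fit(X, y)
kept = int(np.sum(sparse_model.coef_ != 0))
print(f"features kept: {kept} of {X.shape[1]}")  # the parsimonious subset
```

If the sparse model performs on par with a larger one, Occam’s razor argues for keeping the sparse one.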
Keeping data in the loop
It is often argued that data evolves from raw data to information to knowledge. While that is true for many use cases, data does not always require intense computation to become knowledge that fuels decision-making: depending on the level of human-in-the-loop involvement or the affordances of a decision, very simple data points can be highly informative. However, if data is processed in a time-consuming and complicated manner to derive knowledge (e.g. in the form of a report), this knowledge should be kept in the loop. The results of data processing should themselves become part of the data storage.
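As a minimal sketch of keeping derived knowledge in the loop, the example below assumes pandas and a local Parquet file as a stand-in for the bank’s data storage: a report-style monthly aggregate is written back next to the raw data so later analyses can reuse it instead of recomputing it. File and column names are hypothetical.

```python
# A minimal sketch: derive a report-style aggregate and write it back to storage.
# pandas and a local Parquet file stand in for the bank's actual data platform;
# column and file names are hypothetical.
import pandas as pd

raw = pd.DataFrame({
    "booking_date": pd.to_datetime(["2024-01-03", "2024-01-17", "2024-02-02"]),
    "amount": [120.0, 75.5, 230.0],
})

# Derive the "knowledge": a monthly volume figure that might back a report.
monthly = (
    raw.assign(month=raw["booking_date"].dt.to_period("M"))
       .groupby("month")["amount"].sum()
       .rename("monthly_volume")
       .reset_index()
)
monthly["month"] = monthly["month"].dt.to_timestamp()  # Parquet-friendly dtype

# Keep the result in the loop: persist it alongside the raw data for reuse.
monthly.to_parquet("derived_monthly_volume.parquet", index=False)
```

Once the derived table sits in the same storage layer as the raw data, the knowledge it encodes is available to the next model or report without repeating the costly processing.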
– Jonas Röttger, ESR