Data are the facts from which information is derived. A white paper published by IDC, The Digitization of the World – From Edge to Core, predicts that the global datasphere will grow from 33 zettabytes in 2018 to 175 zettabytes by 2025.

With so much available data, the need for data visualization to facilitate decision-making processes has been increasing, bringing forth the demand for Data Scientists to translate the ginormous piles of data into useful information.

The Nagaoka Review spoke to Matthew Sia about his thoughts on the use of data in Human Resources Analytics and Management, and what goes on in the process of translating data into useful information.

Digital Breadcrumbs

People in general love snacks, be it a healthy or an unhealthy option. Imagine a colleague getting their hands on that cookie or a handful of potato chips, munching away happily while walking from the pantry back to the workstation in the office. While it might sound like a normal thing to do, would one notice the trail being left behind on their journey back?

Breadcrumbs: something that we all take for granted. Under normal circumstances, besides being an annoyance to the cleaner, these breadcrumbs do not share intimate information about a person. But what if these breadcrumbs could talk? Who would be collecting them? And what would they be talking about?

Breadcrumbs in the office. Image via

Our activities in the modern-day organization, be it large or small, leave such breadcrumbs: digital breadcrumbs. More often than not, these breadcrumbs are tucked away in a spreadsheet or a database that few people in the organization look at, so they are taken for granted or even forgotten, unless somebody needs to look up something about a particular someone. Many organizations still do not realize that the data in that tucked-away spreadsheet or database can reveal things that were not obvious but are critical to their human resources, and can go further to prescribe what should be done to maintain their talent pool.

Science in HR?

Now would be a good time to set aside the thought that Physics, Biology and Chemistry are the only branches of science, and drop the assumption that Human Resources is anything but science. Any study of a subject through data analysis, hypothesis testing, inference, and modelling is itself a scientific process. The beauty of scientific processes, coupled with current advances in technology, is that they can be applied to the study of any topic of interest, and Human Resources is no exception.

A typical Human Resources data set

In the current market climate in most parts of the world, organizations compete fiercely to recruit and retain the best talent so that they can stay competitive, efficient, and productive. As such, it would be in these organizations’ best interest to find out what drives the efficiency of the workforce, what motivates their top talent, and what keeps that talent from leaving. Some companies have even taken the drastic step of paying top dollar to outsource this work entirely to external consultants, in the hope that it will point them in the right direction.

“Most people do not realize that some of the best answers to their burning human resources questions can be found within their organization, buried deep in their various human resources or enterprise resource planning databases.”

Most of these bits of data are probably stored in many different files and tables, but a simple exercise to pull them all together and run them through some analytics processes could produce information that casts a bright light on many things that organizations are not even aware of about themselves.

For the purposes of this article, the focus will be on the factors that are probably causing top talent to be underappreciated and underperformers to be over-rewarded, followed by the unavoidable attrition of high performers.

A useful correlation matrix generated in Python, a programming language popular among data scientists
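
For readers curious about what sits behind a correlation matrix like the one pictured, here is a minimal sketch in plain Python. The records, column names, and function names are invented for illustration and are not taken from the data set used in this article. In practice a data library such as pandas would usually do the same job in a line or two.

```python
from math import sqrt

# Hypothetical records, invented for illustration only.
employees = [
    {"monthly_salary": 3000, "years_in_role": 1, "job_satisfaction": 2, "left_company": 1},
    {"monthly_salary": 3200, "years_in_role": 2, "job_satisfaction": 1, "left_company": 1},
    {"monthly_salary": 4500, "years_in_role": 3, "job_satisfaction": 3, "left_company": 0},
    {"monthly_salary": 5200, "years_in_role": 6, "job_satisfaction": 2, "left_company": 1},
    {"monthly_salary": 6100, "years_in_role": 4, "job_satisfaction": 4, "left_company": 0},
    {"monthly_salary": 7000, "years_in_role": 5, "job_satisfaction": 4, "left_company": 0},
]


def pearson(xs, ys):
    """Pearson correlation of two equal-length lists.

    Assumes both lists vary; a constant column would divide by zero.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def correlation_matrix(records, columns):
    """Return a nested dict: matrix[a][b] is the correlation of columns a and b."""
    series = {c: [r[c] for r in records] for c in columns}
    return {a: {b: pearson(series[a], series[b]) for b in columns} for a in columns}


columns = ["monthly_salary", "years_in_role", "job_satisfaction", "left_company"]
matrix = correlation_matrix(employees, columns)

# Print one row per variable, in the same column order.
for row_name in columns:
    cells = "  ".join(f"{matrix[row_name][col]:+.2f}" for col in columns)
    print(f"{row_name:<18}{cells}")
```

Values close to +1 or -1 suggest two variables move together or in opposite directions. A correlation on its own does not say which one drives the other.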

A Standard Process, surprisingly…

Many data science processes could be employed to provide a good level of analysis of the human resources data that the organization possesses. These processes could be as simple as using the data to plot bar charts or histograms, or as complex as running multivariable logistic regressions to discover the correlations between various data variables and make inferences about the underlying situation in the company. However, as stated by Big Sky Associates, the underlying process remains the same, namely:

Step 1: Define your questions
Step 2: Set clear measurement priorities (what and how to measure)
Step 3: Collect Data
Step 4: Analyze Data
Step 5: Interpret Data

Of the five steps described above, the most challenging one is data collection, where the difficulty depends on the number of tables or databases across which the data is stored, as well as on the quality of the data entry. Many clarifications would need to be sought from various departments, many enquiries made with staff, and the collected data would need to be tabulated and cleaned to enable analyses that are as unbiased as possible. If the exercise is conducted by a team of analysts sanctioned or sponsored by top management, the process could be a little easier.
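
To give a feel for the collection and cleaning step, the sketch below combines two small extracts by employee ID and sets aside any record that cannot be used as it stands. The two tables, the field names, and the helper functions are all hypothetical; real extracts will look different and will usually need more checks than this.

```python
# Hypothetical extracts from two separate systems, keyed by employee ID.
payroll = [
    {"employee_id": "E001", "monthly_salary": "4200"},
    {"employee_id": "E002", "monthly_salary": " 5100 "},  # stray spaces
    {"employee_id": "E003", "monthly_salary": ""},        # blank entry
]
reviews = [
    {"employee_id": "E001", "performance_rating": "3"},
    {"employee_id": "E002", "performance_rating": "4"},
    {"employee_id": "E004", "performance_rating": "2"},   # no payroll record
]


def to_number(text):
    """Return the text as a float, or None when it is blank or not numeric."""
    try:
        return float(text.strip())
    except ValueError:
        return None


def combine(payroll_rows, review_rows):
    """Join the two extracts on employee_id.

    Returns (combined, needs_follow_up): complete records, and the IDs
    whose salary or rating is missing in either source.
    """
    salaries = {r["employee_id"]: to_number(r["monthly_salary"]) for r in payroll_rows}
    ratings = {r["employee_id"]: to_number(r["performance_rating"]) for r in review_rows}
    combined, needs_follow_up = [], []
    for emp_id in sorted(set(salaries) | set(ratings)):
        salary = salaries.get(emp_id)
        rating = ratings.get(emp_id)
        if salary is None or rating is None:
            needs_follow_up.append(emp_id)
        else:
            combined.append({
                "employee_id": emp_id,
                "monthly_salary": salary,
                "performance_rating": rating,
            })
    return combined, needs_follow_up


combined, needs_follow_up = combine(payroll, reviews)
print("usable records:", combined)
print("to clarify with departments:", needs_follow_up)
```

Keeping the incomplete records on a separate list, rather than silently dropping them, gives the analysts a concrete set of questions to take back to the departments that own the data.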


The analysis and interpretation of the data, once compiled and cleaned, would depend largely on what the management of the company would like to know. If, for example, management wants to know what causes attrition in the company, the analysis would need to be centered on attrition data, together with other employee data such as monthly salary, performance rating, promotions, level of job satisfaction, the manager’s performance rating, or even tenure in the current role and age.
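
One simple way to center an analysis on attrition is to compare the attrition rate across groups of employees before attempting anything more advanced. The sketch below does this for job satisfaction levels and prints a rough text bar for each group. The records and the field names are made up for illustration.

```python
from collections import defaultdict

# Hypothetical records: satisfaction on a 1-4 scale, left_company is 1 for leavers.
staff = [
    {"job_satisfaction": 1, "left_company": 1},
    {"job_satisfaction": 1, "left_company": 1},
    {"job_satisfaction": 1, "left_company": 0},
    {"job_satisfaction": 2, "left_company": 1},
    {"job_satisfaction": 2, "left_company": 0},
    {"job_satisfaction": 3, "left_company": 0},
    {"job_satisfaction": 3, "left_company": 0},
    {"job_satisfaction": 3, "left_company": 1},
    {"job_satisfaction": 4, "left_company": 0},
    {"job_satisfaction": 4, "left_company": 0},
]


def attrition_rate_by(records, key):
    """Return {group value: share of that group that left}, sorted by group."""
    totals = defaultdict(int)
    leavers = defaultdict(int)
    for record in records:
        totals[record[key]] += 1
        leavers[record[key]] += record["left_company"]
    return {group: leavers[group] / totals[group] for group in sorted(totals)}


rates = attrition_rate_by(staff, "job_satisfaction")

# A rough text bar chart: one '#' per five percentage points.
for group, rate in rates.items():
    print(f"satisfaction {group}: {'#' * round(rate * 20):<20} {rate:.0%}")
```

The same function can be pointed at any other field, such as a salary band or a performance rating, by changing the `key` argument.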

Python data charts and plots

In some instances, a simple bar chart or a histogram would tell all. However, when charts do not state the obvious, more advanced methods such as regression analyses would help to establish statistically significant correlations between the various human resources data points and attrition. This calls for someone with at least an intermediate understanding of mathematical statistics, and it will show us the factors that are significantly correlated with attrition.
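
As a rough illustration of what a regression on attrition involves, the sketch below fits a small logistic regression by plain gradient descent. Everything in it is an assumption made for the example: the data are invented, the two predictors (job satisfaction and overtime) are chosen arbitrarily, and the learning rate and number of passes are untuned. It only estimates coefficients. It does not compute standard errors or p-values, so it cannot by itself show statistical significance; for that, a statistics package such as statsmodels would be the usual route.

```python
from math import exp

# Hypothetical data: (job_satisfaction on a 1-4 scale, works_overtime as 0/1).
features = [
    (1, 1), (1, 0), (2, 1), (2, 0), (3, 1), (3, 0),
    (4, 1), (4, 0), (1, 1), (4, 0), (2, 1), (3, 0),
]
# 1 means the employee left, 0 means the employee stayed.
left_company = [1, 1, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0]


def sigmoid(z):
    """Logistic function, written to avoid overflow for large negative z."""
    if z >= 0:
        return 1.0 / (1.0 + exp(-z))
    ez = exp(z)
    return ez / (1.0 + ez)


def predict_probability(weights, bias, x):
    """Estimated probability of leaving for one feature tuple x."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))


def fit_logistic(rows, labels, learning_rate=0.1, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log-loss."""
    weights = [0.0] * len(rows[0])
    bias = 0.0
    n = len(rows)
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in zip(rows, labels):
            error = predict_probability(weights, bias, x) - y
            grad_b += error
            for j, xi in enumerate(x):
                grad_w[j] += error * xi
        bias -= learning_rate * grad_b / n
        weights = [w - learning_rate * g / n for w, g in zip(weights, grad_w)]
    return weights, bias


weights, bias = fit_logistic(features, left_company)
print("coefficient for job satisfaction:", round(weights[0], 2))
print("coefficient for overtime:", round(weights[1], 2))
print("estimated risk, low satisfaction with overtime:",
      round(predict_probability(weights, bias, (1, 1)), 2))
```

A negative coefficient means that higher values of that factor go with a lower estimated chance of leaving, and a positive coefficient means the opposite.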

Data via IBM HR Analytics dataset from

In a nutshell

Regardless of the complexity of the analytics processes, the end result is what matters most: providing management with some answers on which aspects of running the organization they should zoom in on, and which changes they should make, to prevent or minimize attrition.

High performer attrition is but one of the many aspects that organizations can investigate and mitigate by means of the data analysis above. Many other areas could also be studied, such as gender equality and representation, and even how to control overtime. The underlying data analysis process remains the same; what changes is only the type of data collected and perhaps the form of analysis carried out, whether graphical or statistical. With the correct quantity and type of data coupled with the correct type of analytical procedures, the truth could be squeezed out of these digital human resources breadcrumbs.

About The Author

Trained as an Investment Banker and a Certified Professional Accountant (CPA Australia), Matthew Sia is a seasoned Accounting and Finance Leader with a passion for Mathematics and Statistics. He recently embarked on the Data Science, Analysis and Visualization journey, and is now an expert in leading business stakeholders to achieve goals by utilizing data. He is also dedicated to becoming a leader by example, believing in developing future leaders by empowering them to make critical business decisions independently and maturely.

In his free time, he enjoys playing musical instruments and analyzing his cat’s behavioral patterns. He also enjoys verifying the statistical properties of various quantitative research studies.

He can be reached via his LinkedIn page.

Main image by Pdusit on Adobe Stock Photos via

Michelle Lim

Creative Consultant, JCE Japan Creative Enterprise