WRITTEN BY KLAUDIA KLOCZEWIAK

Big Data and Data Science have both become buzzwords in today’s world. Ever since it was recognised that customer information could be used to target the marketing of products (Tesco’s Clubcard being an early example), people have understood that insightful analysis of data gives an organisation an advantage over its competitors. The proliferation of techniques for doing so, in both the private and public sectors, demonstrates the value of controlling large amounts of data. However, few people without direct experience understand the complexities and nuances involved in tackling these large data sets.

Once the basics have been covered, and the problem and the preferred approach to solving it have been defined, the chosen method still has to be implemented. For example, if a bank wants to prevent fraud and has decided on an approach (such as looking for patterns in trades and then blocking suspicious events), it needs to employ people with the right skill set and analytical tools to find such patterns. The first type of employee that would occur to most people is a Data Scientist, as the term has been popularised in social media and the press. However, this does not mean that this is the best skill set for the job. To choose the right person for the role, it is worth elucidating the differences between data analysts and data scientists.

What is expected of these roles?

The work of both data analysts and data scientists involves deriving insights from data. Both need to be at ease presenting their techniques and results to non-technical stakeholders. This can be challenging for technical professionals, as it requires stepping back from the details to look at the bigger picture. It also means that both need a good grasp of the business and its products. Some keen data analysts and scientists can get lost in their work and lose sight of their stakeholders’ business requirements, which can be an issue on a time-sensitive project.

Data Analysts

Data Analysts come from a wide range of backgrounds and are brought into businesses to derive and present insights. Through their assignments they typically gain a lot of experience in scoping and defining business requirements. Data Analysts should be able to slice and dice information and visualise the results in graphs and tables, which are then used to make decisions based on the high-level insights derived. Data analysts are usually hands-on with widely used Microsoft tools such as Excel, Access and SQL Server Management Studio. They should also be familiar with Business Intelligence (BI) packages (such as Tableau or QlikView), and some will also have experience with statistical packages (such as SPSS or SAS).

Data Analysts normally have little or no programming experience and are limited to what the software they use offers. This means that, although they may help a business decide how to tackle fraud, they would be less useful in building the solution. Working in a fraud department, they might rely on manual spot checks, Excel and SQL queries. These processes may not be the best way to prevent fraud, as they are slow and do not scale to large quantities of data. Furthermore, data analysts have few tools for spotting new indicators that could provide greater insight into the data set.

Data Scientists

Data Scientists provide insights and solutions using programming and statistical knowledge. They should be familiar with standard data manipulation tools (e.g. Excel, SQL) but, more importantly, be able to use programming languages (such as R or Python) and big data frameworks (such as Hadoop), skills an analyst typically would not have. Data scientists typically hold post-graduate degrees in quantitative subjects, meaning they understand how to achieve high levels of accuracy through automated solutions. They bring curiosity to the team, looking beyond the surface and honing statistical and machine learning techniques. The only reservation is that, given their academic focus, they might be detached from the business context.

Working in a fraud department, data scientists would focus on cleaning up the data and then implementing tools for spotting fraud patterns. They would enhance the analysis by applying techniques that let users select which patterns to look for, which assumptions to make in any analysis, and what level of certainty constitutes a ‘match’ for any potential result. For example, a data analyst might have found that one fraud predictor is a counterparty posting several trades with the same notional on the same day. This could indicate that the client is splitting one large transaction into smaller ones to avoid detection. Data scientists could implement a model using this information whilst achieving a higher level of accuracy. Furthermore, they could work on techniques to spot new predictors and patterns in the data, leading to further ways of detecting fraud. These techniques would give a Compliance department a much more comprehensive understanding of the issue. Lastly, by applying Big Data technologies, data scientists could reduce the time taken to process the data from days to minutes.
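To make the same-notional example more concrete, the rule could be sketched in Python roughly as follows. This is a minimal illustration under stated assumptions, not a fraud model: the `Trade` record, the `flag_repeated_notionals` function and its `min_count` parameter are hypothetical names, and notionals are assumed to be whole currency units so that identical amounts compare exactly. The `min_count` threshold stands in for the adjustable level of certainty described above; a real system would combine many such predictors, typically in a statistical or machine learning model.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Trade:
    # Hypothetical minimal trade record; real trade data has many more fields.
    counterparty: str
    trade_date: date
    notional: int  # assumed whole currency units, so equal amounts compare exactly


def flag_repeated_notionals(trades, min_count=3):
    """Flag (counterparty, day, notional) groups with at least `min_count` trades.

    `min_count` acts as the adjustable threshold for what counts as a
    'match': lowering it flags more groups, raising it flags fewer.
    """
    counts = defaultdict(int)
    for t in trades:
        counts[(t.counterparty, t.trade_date, t.notional)] += 1

    flagged = [
        (counterparty, day, notional, n)
        for (counterparty, day, notional), n in counts.items()
        if n >= min_count
    ]
    # Most repeated groups first, as a simple prioritisation for reviewers.
    flagged.sort(key=lambda row: row[3], reverse=True)
    return flagged


if __name__ == "__main__":
    sample = [
        Trade("CP-A", date(2024, 1, 5), 9_500),
        Trade("CP-A", date(2024, 1, 5), 9_500),
        Trade("CP-A", date(2024, 1, 5), 9_500),
        Trade("CP-B", date(2024, 1, 6), 9_500),
        Trade("CP-B", date(2024, 1, 6), 9_500),
        Trade("CP-B", date(2024, 1, 7), 9_500),
    ]
    print(flag_repeated_notionals(sample))               # CP-A only
    print(flag_repeated_notionals(sample, min_count=2))  # CP-A, then CP-B
```

Keeping the threshold as a parameter rather than hard-coding it is what lets a Compliance user trade off missed cases against false alarms without changing the code.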

Conclusion

Data scientists provide a different skill set from data analysts, despite overlaps that mostly relate to business and project management. The two roles share some technical tools but mostly use them for different purposes. In the fraud detection example, data scientists would add more value to a team by bringing the analysis as close as possible to the true nature of such a complex problem. However, with less complex issues and lower data volumes, they could over-complicate the process and implement technologies that are not really necessary. Data analysts and data scientists serve different purposes, and each organisation should resource them according to its needs.