Job roles, tools and technologies

1. Job roles, tools and technologies

Fantastic. Now, let's step back a little and review the data needs pyramid. We'll look at the roles involved in each step, and then review each role's responsibilities, tools, and team structures. A business leader might not want to make all the decisions, but a solid basic understanding of the data roles, the tools, and the different ways of structuring a data organization is a necessary skill set to survive discussions with your data science leader.

2. Data pyramid and roles

Remember the data needs pyramid we reviewed in the first video? Let's see what kind of job roles are involved in each step.

3. Infrastructure owner

Data collection is led by infrastructure owners. These are mostly software and systems engineers who develop and maintain systems such as websites, machinery, applications, electronic and mechanical devices, service platforms, and so on.

4. Data Engineer

Data storage is managed by data engineers, and sometimes software developers. Data engineer is a broad term, and there might be further specializations, for example database administrators and data pipeline engineers. These specialists focus on building data pipelines, storing the data in a reliable and accessible format, and providing data access tools to other teams.

5. Data Analyst

Now, the data preparation and cleaning task is shared by data engineers and data analysts. Some tasks, like data quality assurance, will most likely be owned by data engineers, while others, like preparing usable datasets and aggregating them for reporting and analysis, would be owned by data analysts.

6. Data Scientist

Next, data analysis is a task where both data analysts and data scientists are involved. Data analysts build dashboards and scorecards, own ad hoc and deep-dive analyses to understand the business, and build self-service tools so the business teams can generate their own insights. Data scientists, on the other hand, also analyze data, but they apply additional methodologies. They will analyze data trends and distributions, and on top of that they will apply statistical and machine learning methods to discover patterns in data, statistically significant differences, and signals that are not easily found through simple data aggregation and analysis techniques.
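To make that difference concrete, here is a minimal sketch in Python. It is not part of the course material: the order values are invented for illustration, and the permutation test is just one of many statistical methods a data scientist might choose. The first function does the kind of simple aggregation an analyst would report, and the second asks whether a gap that large could plausibly be due to chance.

```python
import random
from statistics import mean

# Hypothetical order values for two versions of a checkout page (invented data).
orders_a = [20, 22, 19, 24, 21, 23, 20, 22, 21, 25]
orders_b = [26, 28, 25, 27, 30, 26, 29, 27, 28, 31]


def observed_difference(a, b):
    """Simple aggregation: the difference between the two group averages."""
    return mean(b) - mean(a)


def permutation_p_value(a, b, n_shuffles=5000, seed=0):
    """Share of random relabelings whose gap is at least as large as the observed gap."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = abs(observed_difference(a, b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_shuffles):
        # Shuffle the pooled values and split them into two groups of the original sizes.
        rng.shuffle(pooled)
        shuffled_a = pooled[:len(a)]
        shuffled_b = pooled[len(a):]
        if abs(mean(shuffled_b) - mean(shuffled_a)) >= observed:
            hits += 1
    return hits / n_shuffles


diff = observed_difference(orders_a, orders_b)
p_value = permutation_p_value(orders_a, orders_b)
print(f"Average difference: {diff:.2f}")
print(f"Permutation p-value: {p_value:.4f}")
```

A small p-value suggests the gap between the averages is unlikely to come from random variation alone, which is the kind of conclusion a plain average cannot give you.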

7. Machine Learning Engineer

Finally, machine learning engineers are involved in the last two needs, where they work with data scientists to test different models and then deploy them in production systems like CRM or mobile applications. The line between data scientist and machine learning engineer is blurry, but my rule of thumb is that if the model has to be built from scratch and put in production, then it's a machine learning engineer's job. Conversely, if it is a new business question that requires experimentation and prototyping, then it's a data scientist's job.

8. Team structure

Briefly, I want to discuss how a data function can be organized in a company. There are three main operating models: centralized, decentralized, and hybrid.

9. Team structure comparison

In the centralized model, all data functions are placed in one central team. This works well for small companies, startups, and new organizations in general, as it ensures consistency and focus. It doesn't scale well once the business starts growing and the company has more products, departments, and functions that increase complexity. A decentralized model means that each business unit has its own data function. This works well for larger organizations, but it introduces silos, a lack of company-wide data governance, and overlapping efforts. The best approach is a hybrid one that uses the advantages of both models. Here, the critical functions of data governance, methodology, and tooling are centralized, while the application of those tools and methodologies to prototype, analyze the business, build models, and run A/B tests is decentralized.

10. Let's practice!

That was a lot of information for one video. Now let's work through some exercises to test your knowledge!