Managing a Data Platform Team in a Data Mesh: Key Insights

Introduction

In February 2022, BlaBlaCar's Data team transitioned to a Data Mesh structure. We now comprise seven distinct teams:

  • Six interdisciplinary, domain-focused squads, each responsible for its respective business area.
  • One central platform team, which consists solely of Data Engineers.

The establishment of the data platform team aimed to deliver unified infrastructure and tools for data professionals across all squads, including Data Engineers, Analytics Engineers, Data Analysts, ML Engineers, and Data Scientists. We also initiated cross-functional chapters to bring together individuals with similar technical skills from various teams, facilitating knowledge sharing and collaborative leadership in their areas of expertise.

This article outlines the lessons we've gathered over the past couple of years while implementing this model, focusing primarily on the challenges that arise from such a team structure, rather than general management topics.

Generic Products

When collaborating with a specific team on a use case within the mesh, we maintain a focus on reusability. In other words, we assess whether our developments can be reused effortlessly across different scenarios and teams. We envision the platform team as a startup within the organization, dedicated to providing shared tools for data practitioners. Consequently, the platform team develops agnostic software, while domain teams are responsible for its application.

Naturally, creating generic software entails a higher initial investment. For instance, developing a script capable of duplicating any backend table to your data warehouse is inherently more complex than one designed to handle a single, specific table. The generic version must accommodate any potential field type in the input schema, translating it to an output warehouse schema. In contrast, a specific approach allows for hard-coded schemas, simplifying the process by eliminating the need for intricate schema translation logic.
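The schema-translation logic described above can be sketched as follows. This is a minimal illustration, not BlaBlaCar's actual implementation: the type names assume a MySQL-like backend and BigQuery-style warehouse types, and the mapping is invented for the example.

```python
# Hypothetical mapping from backend column types to warehouse types.
# A real generic script must cover every type its backends can produce.
TYPE_MAP = {
    "int": "INT64",
    "bigint": "INT64",
    "varchar": "STRING",
    "text": "STRING",
    "datetime": "TIMESTAMP",
    "decimal": "NUMERIC",
}

def translate_schema(backend_schema):
    """Translate a backend table schema into a warehouse schema.

    backend_schema: list of (column_name, backend_type) tuples.
    Returns a list of (column_name, warehouse_type) tuples.
    Raises on unknown types instead of guessing, so a new backend type
    surfaces immediately rather than producing a silently wrong table.
    """
    warehouse_schema = []
    for name, backend_type in backend_schema:
        key = backend_type.lower()
        if key not in TYPE_MAP:
            raise ValueError(f"No warehouse mapping for backend type {backend_type!r}")
        warehouse_schema.append((name, TYPE_MAP[key]))
    return warehouse_schema
```

The specific (non-generic) alternative would hard-code the output schema for one table, which is simpler to write but has to be rewritten for every new table.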

From our experience, investing in genericity proves advantageous for significant products, as these custom-built tools can be effortlessly reused.

Concrete Case

We needed to create an in-house feature store: a data repository for machine learning purposes. This feature store supplies data to data scientists for generating training datasets and also serves live models in production.

What Didn't Work

In our initial attempt, we developed a feature store tailored to specific data models (such as trip offers, bookings, and users), resulting in heterogeneous input pipelines and specialized code for data storage and retrieval.

Maintainers of the feature store had to grasp the business logic behind each use case, which complicated maintenance. Incorporating new data often meant reworking the app's entire code architecture, and users had to submit a Jira ticket every time they wanted to add data.

What Did Work

While the first iteration enabled production-level machine learning, it was difficult to scale due to the maintenance complexity. We transitioned to a model where the platform team constructed a generic feature store devoid of knowledge about the specific data. This new approach allows for the addition of data pipelines without requiring extensive domain knowledge. Maintainers provide the software solution, while users configure and implement it themselves, thus enhancing autonomy and minimizing the need for maintainers to understand every domain use case.
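As a rough illustration of this split, here is a minimal Python sketch of a domain-agnostic store: domain teams declare their features, while the platform code only handles registration, storage, and retrieval, never the business meaning. All names, fields, and queries below are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical declaration owned by a domain team; the platform code
# below never interprets the feature's business meaning, only its shape.
@dataclass(frozen=True)
class FeatureSpec:
    name: str          # e.g. "booking_count_7d"
    entity: str        # e.g. "user"
    dtype: str         # e.g. "INT64"
    source_query: str  # SQL maintained by the domain team

class FeatureStore:
    """Generic store keyed by (entity, feature name), agnostic to domains."""

    def __init__(self):
        self._specs = {}
        self._values = {}

    def register(self, spec: FeatureSpec):
        # Domain teams self-serve: registering a feature needs no
        # change to the platform code.
        self._specs[(spec.entity, spec.name)] = spec

    def write(self, entity, name, entity_id, value):
        if (entity, name) not in self._specs:
            raise KeyError(f"Feature {entity}/{name} is not registered")
        self._values[(entity, name, entity_id)] = value

    def read(self, entity, name, entity_id):
        return self._values[(entity, name, entity_id)]
```

The point of the design is visible in `register`: adding a new feature is configuration supplied by the domain team, not a code change in the platform.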

Cope with Too Many Stakeholders

On one side, a data mesh grants autonomy to domain-oriented squads, fostering strong ownership of their areas with minimal coupling. Conversely, the data platform team maintains connections with all teams. Stakeholders encompass nodes within the mesh, technical chapters, and other engineering teams, who often rely on the platform team as a liaison to the Data department, especially regarding low-level technical matters like security and compliance.

We found that this central position makes it easier for stakeholders to get a holistic overview of the data stack, to refine their requests before they reach data practitioners, and to build solutions once for everyone rather than once per team.

However, managing numerous stakeholders presents challenges. We must gather information, identify impactful opportunities, drive adoption among various stakeholders, and tailor communication to suit diverse audiences.

Concrete Case

In our scenario, we have six Data teams, four chapters, and five Infrastructure teams as stakeholders.

We invest significant effort into strengthening our connections with them through regular checkpoints, feedback requests, demos, and collaborative roadmap development led by the data platform product owner, who gathers requests and feedback and identifies key impact areas.

We also established a "stack governance" meeting to openly discuss major decisions regarding our stack. Chapters play a crucial role in guiding design choices, with platform team representatives participating actively.

Build the Right Team

Our scope spans a wide range of activities, from supporting data science initiatives to analytics and infrastructure.

To effectively engage with our various stakeholders and fulfill our diverse missions, we needed a team with extensive expertise. All members of our development team are Data Engineers who share equal responsibilities, yet each has a unique background, including experience as Data Scientists, Analytics Engineers, Marketing professionals, Infrastructure specialists, and Backend or Frontend Engineers. This configuration enables us to cover a broad technical spectrum while facilitating internal knowledge sharing.

Concrete Case

Initially, team members were only comfortable with specific areas of the stack, making it challenging to address issues or fulfill requests when the "expert" was unavailable. To address this, we adopted a Skills Matrix, enabling each member to self-evaluate. We dedicate weekly sessions where an expert guides the team through common operations for a specific stack component, allowing "non-experts" to perform the tasks and ask questions. We subsequently reassess their skills.
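To make the idea concrete, a skills matrix can also be queried programmatically, for instance to flag stack components with a bus factor of one. The member names, component names, and scores below are invented for illustration.

```python
# Toy skills matrix: self-assessed scores, 0 = unfamiliar .. 3 = expert.
# Names and numbers are placeholders, not real team data.
SKILLS = {
    "alice": {"airflow": 3, "bigquery": 1, "kubernetes": 0},
    "bob":   {"airflow": 3, "bigquery": 3, "kubernetes": 1},
    "carol": {"airflow": 0, "bigquery": 2, "kubernetes": 3},
}

EXPERT_LEVEL = 3

def single_expert_components(skills):
    """Return components that only one team member masters (bus factor = 1)."""
    components = {c for levels in skills.values() for c in levels}
    risky = []
    for component in sorted(components):
        experts = [m for m, levels in skills.items()
                   if levels.get(component, 0) >= EXPERT_LEVEL]
        if len(experts) == 1:
            risky.append(component)
    return risky
```

Components returned by this check are natural candidates for the weekly knowledge-sharing sessions described above.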

We sometimes assign topics to those least comfortable with them intentionally. While this may seem counterproductive, we view it as a long-term investment in developing a stronger team. Experts support these individuals throughout the process to ensure we deliver quality increments at the right pace.

Pick Your Battles

With numerous stakeholders and an expansive scope, we found ourselves in a potentially overwhelming situation.

To avoid becoming inundated with requests, we employ two key strategies:

  • Selectivity in roadmap development: Each quarter, we focus on a few priority projects, setting aside others.
  • Rotational responsibilities: Each week, a team member is designated to address issues within the stack and provide user support, ensuring continuity of service.
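The weekly rotation itself is simple enough to sketch as a round-robin assignment over team members; the names and dates below are placeholders.

```python
import itertools
from datetime import date, timedelta

def support_rotation(members, start_monday, weeks):
    """Assign one on-duty member per week, round-robin, so everyone
    knows in advance who handles stack issues and user support."""
    cycle = itertools.cycle(members)
    return [(start_monday + timedelta(weeks=i), next(cycle)) for i in range(weeks)]

# Example: a four-week schedule for a three-person team.
schedule = support_rotation(["alice", "bob", "carol"], date(2024, 1, 1), 4)
```

Publishing the schedule ahead of time lets the rest of the team ignore support interruptions and focus on roadmap work.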

Without a clear strategy for selecting our battles, the workload can become unmanageable, especially as adoption increases demand for support.

Concrete Case

A data squad focused on buses sought to ingest data from an external API. Lacking a suitable external solution within budget, we opted for an internal build. We faced two choices:

  • Develop a generic solution from the outset with the data platform team.
  • Allow the domain squad to operate independently without our intervention.

We collectively decided to have the domain squad build and maintain the solution locally, freeing up the platform team to concentrate on other priorities.

Align with Key Stakeholders

From a different perspective, initiating numerous projects simultaneously within the platform team can create pressure and instability for others. Instead of focusing on their domain-specific topics, mesh teams may find themselves preoccupied with the changes we're implementing in the infrastructure.

Even with good intentions, rolling out too many changes at once can hurt the organization and breed resistance, typically because the intended audience lacks bandwidth or does not understand the rationale behind the changes. Therefore, we plan ahead to ensure teams can absorb the changes and are involved in the projects.

Concrete Case

What Didn't Work

One of our initial projects involved rewriting the tracking pipeline for frontend applications, which had become complex and difficult to maintain. Initially, we treated this as a platform-team project. The new version created tables that analytics engineers across all domain squads needed to adopt.

However, analytics engineers had their own priorities and lacked the context for the changes. They were uncertain about the accuracy of the new data and did not have time to engage with it.

This situation proved challenging: the project took far longer than it should have, slowed further by difficult communication and negotiation over time allocation.

What Did Work

A large-scale change currently underway is our transition to dbt. This initiative originated from analytics engineers, who sought to modify their processes, and the platform team is supporting this transition by providing necessary infrastructure.

The dynamics of this initiative are markedly different. Stakeholders are not merely complying with the change; they are actively leading it.

Flexibility over Hard Rules

Our platform team aims to prevent redundancy by offering shared tools and patterns. However, there are instances where domain-oriented squads need to act swiftly and independently to seize business opportunities, meet deadlines, or experiment on their own. We fully support this autonomy.

We position the data platform as a facilitator rather than a directive team. Our goal is to encourage adoption rather than impose mandates.

The Data Mesh inherently favors flexibility. A transversal team should enhance this flexibility rather than hinder it. Rules and patterns emerge organically through the tools we provide. This is particularly true during initial phases; often, we refrain from making things generic until we observe a consistent pattern, at which point we invest in making it more universal.

To foster cohesion in our stack, we also implemented a system for swapping engineers between the platform team and domain squads. This internal mobility promotes harmonization of practices. Those joining the platform team bring domain knowledge, while departing members serve as ambassadors for the common platform.

Concrete Case

We are currently reorganizing our data warehouse. The platform team is taking a supportive role, allowing analytics engineers and data analysts to determine how they wish to structure their GCP projects. This initiative is primarily led through their chapter.

The platform team facilitates project provisioning and establishes an appropriate rights management system to align with the preferences of analytics teams. We only intervene in cases of significant complexity or security risks. Squad members should feel supported and have the flexibility to perform their roles without being constrained by infrastructure requirements or strict deadlines.

Don't Take on All the Roles

At one point, we faced challenges due to the volume and nature of requests we received. Many individuals instinctively contacted the data platform team when issues arose. While flattering, this perception complicated the workload for our team, especially in the early stages of the data mesh reorganization. Data practitioners sought our help with questions beyond our scope, including schema-related inquiries, access to tools managed by the IT team, and issues with infrastructure owned by the SRE teams.

Our initial instinct was to assist as best we could. However, the more helpful we were, the more requests outside our expertise we received. The solution was to refrain from assuming responsibilities that were not ours and redirect inquiries to the appropriate teams. This clarification improved dynamics within each domain squad and strengthened connections between the mesh nodes and their respective backend teams.

Concrete Case

To address the issue of out-of-scope requests, we engaged in several discussions, including:

  • Team discussions to identify the issue and collectively decide to cease assuming those roles.
  • Conversations with engineering managers from domain squads to encourage data experts to consult their multi-disciplinary teams first.
  • Discussions within the chapter where Data Engineers gather to raise awareness of the issue and enlist their support in providing initial assistance within their groups instead of turning to us.

As a result, the volume of out-of-scope requests significantly decreased as the new organization became clearer, improving internal dynamics and clarifying the responsibilities of the data platform team. Nonetheless, this is an ongoing effort, particularly with newcomers.

Build vs. Buy to Reduce Your Footprint

As previously mentioned, the data platform team develops and maintains various in-house products, but we also leverage open-source solutions and third-party products.

Our guiding principle is to build internally only if:

  • There is a genuine need.
  • No viable external alternatives exist, or the cost difference is unreasonable.

We consistently evaluate build versus buy options for each architectural decision and periodically revisit our choices to challenge our stack.

This approach is crucial to maintaining only what needs to be managed in-house. Otherwise, the data platform would become overwhelmed with maintenance tasks, diverting attention from activities that provide genuine value.

Concrete Case

An example of a product we decided to continue building internally is the frontend tracking pipeline. To evaluate this decision, we consulted multiple external service providers, including input from product and marketing teams to discuss our options collectively. Ultimately, we found limited interest from end users regarding features offered by external solutions and identified a significant cost disparity, leading us to retain our existing solution.

Conversely, for ingesting marketing data (e.g., acquiring data from external providers like Google Ads and Facebook Ads for our warehouse), we opted to utilize Rivery, as it proved more reliable and cost-efficient than attempting to build a solution ourselves.

We periodically reassess our existing tools—not all at once, but approximately every six months.

Include the Mesh Nodes in the Tech Watch

While we are not driven by trends, we remain vigilant about developments in the field.

Numerous channels exist for staying informed, including personal tech watches, discussions with solution providers, and interactions within the French MDN (Modern Data Network). We also exchange knowledge with peers from other organizations, sharing our experiences and learning from theirs. Feel free to connect with me on LinkedIn if you're interested!

We maintain an open mindset regarding change, even concerning our most cherished in-house software.

One critical lesson we've learned is that tech watches should involve end users, such as Data Scientists or Data Analysts.

Concrete Case

Recently, we evaluated our Airflow setup (utilizing Composer) against alternative solutions in the market or the possibility of deploying it within our Kubernetes cluster for self-management.

We assembled a group comprising members from various teams and expertise levels, each bringing different experiences with Airflow.

Involving others not only fosters inclusivity but also ensures we leverage their expertise to make collective decisions that impact everyone.

Bring Meaning

Like any infrastructure team, we often hear feedback only when things go awry. How many times does someone send a "Hey! I'm starting work today, and everything is running smoothly. Great job!" note to their infrastructure team? Infrastructure teams operate behind the scenes, and because their mission is reliability, smooth operation becomes an expectation rather than an achievement.

If you manage such a team, celebrating successes, linking achievements to the broader projects of other squads, and communicating these milestones is crucial. Quality business outcomes rely on high-quality infrastructure, and recognizing this boosts team morale while showcasing potential value to stakeholders.

Concrete Case

Starting with the names of projects in our roadmap, we emphasize the business impact of initiatives that may appear purely technical. For instance, labeling a project “support [specific squad] in developing a reverse ETL to calculate the probability of [particular business case]” gives context and significance to the technical endeavor.

As with any team, communication is essential for sharing and celebrating achievements, and we make it a point to acknowledge significant milestones.

Conclusion

Two years after implementing the Data Mesh paradigm at BlaBlaCar, we can confidently say that having a data platform team within this structure has enabled us to execute purely technical projects that would have otherwise been unfeasible. This has positively influenced the productivity of our domain squads and, by extension, our overall business.

Allocating resources to this transversal team encourages sharing, collaboration, and governance.

Maintaining the right balance requires ongoing effort: influencing without authority, selecting the appropriate battles, managing a broad scope, and cultivating shared knowledge within the team.
