by Saulet Mukhamadiyev

Generating insights from data and demystifying data science are crucial but challenging parts of building smart cities. With this in mind, Smart Dubai Data has taken a step back from the discipline of data science itself to consider the wider environment and supporting processes. Our goal is to surround the core data science function with a range of other skilled professionals to maximize our chances of converting policy problems and business requirements into powerful data analysis and software applications.

This post summarizes four connected elements which deliver a more mature operating environment and a stronger data science culture:

  • high performing teams;
  • delivering well-defined use cases;
  • working to strong processes; all supported by a
  • mature operating environment.

1) High performing teams

There is a very human and collaborative element to data science that cannot be ignored. We want competent teams, working to well-defined roles and task attribution, all of whom are committed to communicating openly. Debates on how to precisely define the expertise and skills required to carry out data science projects will be ongoing. That said, we believe in a set of core, necessary roles - data engineer, data scientist, data analyst, visualization expert and UI/UX designer. Their contributions will vary across different projects, but without them projects will invariably fail.

We go nowhere, though, without a product or project manager (here we are still developing our thinking) who has an instinctive understanding of business owners and their preoccupations, and enough technical background to connect these to data requirements and to work alongside the data scientist.

2) Well-defined use cases

These are central to the success of all data science projects. Because data science should focus on the high-impact opportunities to deliver growth, prosperity and happiness to Dubai’s citizens, we have anchored our first set of use cases on city leadership priorities (e.g. the 50 Year Charter). Each use case is made up of concise challenge statements, a description of the proposed solution, outline costs and benefits, and an assessment of data availability and quality. Each will do at least one of these two things:

  • Provide predictive insights to Dubai’s leaders – we are putting the capability in place for prediction, personalization and optimization.
  • Deliver interpretable information, not data – The city leadership needs high quality insights and information to inform their decision making, rather than raw unprocessed data.

3) Strong processes

Strong should not be mistaken for rigid or strictly linear. Organization is definitely needed to prevent data science drift, but agile approaches also require us to review, challenge, and reorganize with desired outcomes in mind. So, we have worked to create five elements of Smart Dubai Data projects.

The Discovery Phase is the hard work of building evidence-based use cases which relate directly to public policy or city service challenges (see part 2 above).  This creates the very real prospect of meaningful data science, connected to need and real business requirements, and often grounded in approaches like design thinking. 

Project management kicks in early. We have found that time spent in the opening stages on identifying stakeholders and preparing a detailed project plan to manage ambiguity and risk saves time and effort later. It is at this stage that we also break down the project into sprints.

It is not news that getting the right data (attributes) from the right entities and putting it in the right format is the most frequently and heavily underestimated element of data science projects. We place great emphasis on the data acquisition and modelling stage, in which analyst and engineer cleanse, transform and normalize the data so that it becomes useful to the algorithms that will extract patterns from it and eventually sit at the center of models that encode these patterns.
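
As a rough illustration of what cleanse, transform and normalize can mean in practice, here is a minimal Python sketch. It is not Smart Dubai Data's actual pipeline: the sample records, the field names (entity, district, footfall) and the three helper functions are all hypothetical, and z-score scaling is just one common way to normalize a numeric attribute.

```python
from statistics import mean, pstdev

# Hypothetical raw records, as they might arrive from two different entities.
# All names and values here are made up for illustration.
raw_records = [
    {"entity": " Entity A ", "district": "Deira", "footfall": "1200"},
    {"entity": "entity b", "district": "deira ", "footfall": "800"},
    {"entity": "Entity A", "district": "Al Barsha", "footfall": None},
    {"entity": "Entity B", "district": "Al Barsha", "footfall": "1000"},
]


def cleanse(records):
    """Drop records with a missing measurement and tidy up text fields."""
    cleaned = []
    for rec in records:
        if rec["footfall"] is None:
            continue  # assumption: incomplete records are discarded
        cleaned.append({
            "entity": rec["entity"].strip().title(),
            "district": rec["district"].strip().title(),
            "footfall": rec["footfall"],
        })
    return cleaned


def transform(records):
    """Convert the measurement from text to an integer."""
    return [{**rec, "footfall": int(rec["footfall"])} for rec in records]


def normalize(records, field):
    """Add a z-score column so values are comparable across sources."""
    values = [rec[field] for rec in records]
    mu, sigma = mean(values), pstdev(values)
    return [
        {**rec, field + "_z": (rec[field] - mu) / sigma if sigma else 0.0}
        for rec in records
    ]


prepared = normalize(transform(cleanse(raw_records)), "footfall")
for row in prepared:
    print(row)
```

Even in a toy case like this, most of the code deals with inconsistent spelling, stray whitespace and missing values rather than with modelling, which is why this stage is so easy to underestimate.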

In the design and development stage we iterate to build the first minimum viable product, which is shared with the project owner for user feedback, testing and further iteration towards the final product and the decision to deploy.

The deployment phase is when models or software are put into the business environment, with the necessary security assessments and other integration tasks.  If all of the above has gone well, this is when we achieve our goal of creating data science that matters. 

4) A mature operating environment

The wider operating environment - infrastructure, collaborative tools (our favorites right now are Asana and Slack), the ‘office space’ itself - is the foundational layer on which sustainable delivery of data science will be achieved. Smart Dubai’s infrastructure is complex, consisting of software, hardware, and data pipelines coming from various entities. We are seeking to create seamless interactions between all Smart Dubai and partner teams involved in the data science process. We are also now looking at the mix of open-source tools and languages, and commercial products, so that the most useful data science tools, products and frameworks are available onsite to develop use cases.

What now?

Smart Dubai Data plans to ‘lead - and learn together - by doing’.  To illustrate the impact of data science, Smart Dubai is working on two initial use cases.  The first is to help business owners to make critical business decisions such as choosing the location for their business. The second will use machine learning models to predict future skills needed to fill the (tech) jobs of tomorrow.  To re-emphasize the key points in this blog though, this is first and foremost about humans, acting in teams, to deliver data science linked to issues that matter to Dubai in all its forms.  This is how Smart Dubai Data will demystify data science and lead its wider adoption across our city. 

Look out for future blog posts in which we will set out in greater (technical) detail how both the use cases themselves and our overall approach are developing.