I recently presented to a local organization on setting up and scaling data science teams. We had a thoughtful one-hour conversation about the pseudo-prerequisites to accelerating the adoption of data science within an organization. Our conversation kept coming back to the same point: data scientists are a magnifying glass for an organization. Whether an organization has a strong analytic culture or its systems are disparate, the current state will only be amplified by data scientists. But when data scientists unearth issues – in the data, in the infrastructure, or even in the culture – it’s an opportunity to improve organizational productivity and culture. Now for some context:
In October of 2012, Harvard Business Review dubbed the Data Scientist the sexiest job of the 21st century. Four years later, the immense demand remains unchanged for individuals who can write code to sift through both known-critical and seemingly superfluous data, correctly develop and apply statistical models, and effectively communicate key insights to improve organizational outcomes. In fact, as success stories continue to pour into the media, organizations – many of which you are very familiar with, but which shall remain nameless – are hastily moving to add individuals with these skillsets to their teams to explore their environments. With big data and analytics raising the eyebrows of most C-suite executives, business units are gearing up for the deep dive into the data science waters. While there are plenty of insights regarding the ideal skillset of a data scientist – hint: it’s the soft skills – very few mention the prerequisites for scaling data science teams. This raises the question: if I am building a data science team, what should I do ASAP to make sure my team is successful? What follows are some of the overlooked aspects of building data science teams that my peers and I have personally experienced – some are even blockers for today’s most highly effective data science teams.
1. Access to data.
This shouldn’t come as a surprise, but data scientists work with data, preferably in the rawest form in which it’s collected. Granting access to data might seem straightforward, but some organizations will limit data scientists to data marts or cubes (pre-aggregated data). With most data scientists spending a big chunk of their time playing data “Go Fish”, making data available allows them to focus on what they do best.
2. Appropriate governance.
With freedom comes responsibility. Providing appropriate access to individuals who most likely have the highest analytical ceilings in an organization also means having strong data governance practices in place. Organizations should be thoughtful about the risk and reward of limiting access. Applying some basic data governance practices will allow data scientists to understand the data, the data flow, and data quality issues.
3. Provide an analytic sandbox.
Data scientists are constantly building models and structuring data for analysis – it’s 80% of the job. Creating a space for teams to build models and share data, so that processes do not have to be re-run from scratch each time, will save valuable business time.
4. Leverage data visualization tools.
This, I believe, is a big one. First, data science is more than just model building; it’s collecting critical insights about an organization and effectively communicating those insights. One of the best ways a data scientist can do this is by building tools that allow key stakeholders to see and interact with the story the data tells. Second, good data tools also build a culture of data fluency by allowing the business to interact with and understand the who, what, and where of the business. They allow an organization to take action. The tools build trust around using data to improve the business and getting to the when, why, and how.
5. Keep your team structures flexible.
If you are introducing the concept of data science teams to your organization for the first time, you’ll probably spend hundreds of hours conceptualizing the ideal organizational structure. That’s fine, but just know that a team’s structure is likely to evolve in the first few years depending on the people, processes, and structures you already have in place. In a nutshell, put more of a premium on collaboration than on team structure.
6. Integrate data scientists with existing business units.
It’s great when data scientists can collaborate with each other. It’s just as important that data scientists work with business units to better understand and solve problems. This elevates data scientists from reactive stats-gatherers to proactive partners.
7. Be sure your data science sponsors are 100% committed to supporting the team.
Data science return on investment can be accelerated – or limited – by leadership support. Sponsors are the evangelists and enforcers for data science work across the organization. In some cases, leaders will leverage the idea of data scientists – hyping the potential work without actually implementing it – just to move up the organizational ranks, which hinders the effectiveness of both the data scientists and the organization.
8. Hire for diverse skillsets.
Prioritize culture and communication, but also make sure that your data scientists bring varying analytic skillsets. If you are trying to build a robust data science team, it should not consist only of data scientists. Consider adding supporting roles such as project managers, business analysts, data visualization specialists, UX designers/front-end developers, and/or ETL developers.
9. Develop internal team skillsets.
This should apply to all individuals in an organization because skillsets should be constantly evolving to match the tools and technologies that improve the business; consider embracing open-source technologies and cloud-based tools.
10. Be willing to fail.
Experimentation is a cornerstone of data science. It allows the business to learn what not to do. Experiments shouldn’t happen on big parts of the business; they should be run in small, controllable chunks. This is the heart of data science, and many organizations might be afraid that they are going to lose. In reality, they are going to learn quickly about what does and doesn’t grow their business, and they should adopt the fail-fast mantra.
11. Iterate quickly.
This is another cornerstone of data science. Data science projects shouldn’t take months to spin out initial insights. The team should use short cycles to learn from the data what works and what doesn’t, and focus on quick wins for the organization. If something works, then move it to a larger production environment.
12. Be comfortable with data not being perfect.
Because you are iterating quickly, data might not be perfect. Data science is about action: avoid slowing down for significant QA on data, extensive data collection, and/or complex modeling, at least to start.
13. Plan now for scaling up.
While you should start small and get some quick wins, start thinking about how you will want to scale and automate insights from the data science practice.
To wrap things up: highly effective data science teams don’t appear overnight. Building a successful team means developing people, culture, processes, science, and technologies. There are a number of things that can be done to get there; many of these steps are rarely outlined for an organization, but I’ve tried to highlight some of them above. Most of the points above are just good business practices, practices whose importance data scientists bring into focus.