This is my writeup of the Shape Up methodology for data science projects. At this point, I've been using it for more than a year, so my thinking about it has shifted quite a bit as the process matured.
Why Try Shape Up #
There’s a real sprint fatigue that comes from running data science projects with regular agile. One of the major reasons is that DS projects just don’t fit into those neat two-week sprint boxes. In data science, you don’t rush into modeling, because there are a lot of steps in between that need to be properly hashed out.
Your first step is acquiring the data, and you need to know what, where, and how often. The next step is building the pipeline, where you’ve got to figure out the data stores for raw and processed data, plus how much preprocessing is needed, which may involve cleaning up and fixing the data. Then there’s the exploratory data analysis, where you actually dig into the data. At each of these steps you also need to work with stakeholders to make sure you’re in alignment, and to check whether the data supports your initial hypothesis. Only after you get an answer does the plug-and-play stage begin, where you try different models and see which one works out.
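To make that flow concrete, here’s a minimal sketch of one pass through it. It assumes a hypothetical tabular dataset in `raw_data.csv` with a binary `target` column; the file name, columns, and model choices are placeholders for illustration, not a prescribed setup.

```python
# Minimal sketch: acquire -> preprocess -> quick EDA -> compare baseline models.
import pandas as pd
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier

# Acquire: in practice this is a scheduled query or API pull, not a local CSV.
raw = pd.read_csv("raw_data.csv")  # hypothetical file

# Preprocess: keep raw and processed data separate so reruns stay cheap.
processed = raw.dropna(subset=["target"]).copy()
numeric_cols = processed.select_dtypes("number").columns.drop("target")
processed[numeric_cols] = processed[numeric_cols].fillna(processed[numeric_cols].median())

# EDA: a first look at class balance and feature-target correlation.
print(processed["target"].value_counts(normalize=True))
print(processed[numeric_cols].corrwith(processed["target"]).sort_values())

# Only once the data supports the hypothesis do we plug in candidate models.
X, y = processed[numeric_cols], processed["target"]
for model in (LogisticRegression(max_iter=1000), RandomForestClassifier(n_estimators=100)):
    scores = cross_val_score(model, X, y, cv=5, scoring="roc_auc")
    print(type(model).__name__, round(scores.mean(), 3))
```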
There’s a real need for deeper work blocks; skipping them and rushing to production usually means shipping a much worse product. There also needs to be a balance between planning, research, and delivery. In the planning phase, you make sure you set up some foundations and model the risk properly.
Sometimes, for smaller projects, it’s possible to get away with that, but for the vast majority of projects that involve modeling, Shape Up brings much-needed clarity.
Core Concepts Adapted #
We tried out two-week and four-week cycles, and the longer ones involve more upfront planning and a proper series of work items. Regular updates, even within longer cycles, reduce the risk quite a bit. With stakeholders, it’s very important to understand their appetite early, even in a preliminary way, so that you can give some estimates.
We do pitches for analysis projects and work towards building up MVPs. I really enjoy the fixed-time, variable-scope mindset, where we commit to delivering something within the time frame rather than committing to a fixed scope.
Shaping Data Science Work #
There’s a problem definition phase where we do framing and shaping of the project together with the stakeholders. Everyone who delivers should be a part of this.
Framing is basically defining what problem we’re actually trying to solve. It’s about getting everyone on the same page about the business question, the success criteria, and what good enough looks like. You’d be surprised how often teams jump into data work without really nailing down whether they’re trying to predict something, classify something, or just understand patterns better.
Shaping takes that framed problem and turns it into something we can actually work on. This means figuring out the rough approach, identifying what data we’ll need, sketching out the solution path, and setting boundaries on what’s in scope and what’s not. It’s not about planning every detail but getting enough clarity that the team knows what hill they’re climbing.
We also do data availability checks and rough solution sketches, and we try to put as much of the risk identification upfront as possible. This clarity keeps us from going after things that won’t work.
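As an illustration, here’s a minimal sketch of the kind of availability check that feeds the risk list. The column names, staleness window, and null-rate threshold are assumptions made up for this example.

```python
# Minimal sketch of an upfront data availability check that surfaces risks before shaping ends.
import pandas as pd

REQUIRED_COLUMNS = {"user_id", "event_ts", "amount"}  # assumed schema, for illustration only
MAX_STALENESS_DAYS = 7                                # assumed freshness requirement

def check_availability(df: pd.DataFrame) -> list[str]:
    """Return a list of risk flags to discuss at the betting table."""
    risks = []
    missing = REQUIRED_COLUMNS - set(df.columns)
    if missing:
        risks.append(f"missing columns: {sorted(missing)}")
    for col, rate in df.isna().mean().items():
        if rate > 0.2:  # arbitrary threshold for this sketch
            risks.append(f"{col} is {rate:.0%} null")
    if "event_ts" in df.columns:
        staleness = pd.Timestamp.now() - pd.to_datetime(df["event_ts"]).max()
        if staleness.days > MAX_STALENESS_DAYS:
            risks.append(f"data is {staleness.days} days stale")
    return risks
```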
Betting Table Adaptations #
This is where we handle the stakeholder buy-in process and choose between competing experiments. We balance research against production work and take technical debt into account.
Implementation #
Once we’ve shaped the work and got buy-in at the betting table, it’s time to actually build the thing. The implementation phase is where the rubber meets the road and the team gets uninterrupted time to dive deep into the work.
Unlike traditional sprints where you’re constantly getting pulled into meetings and status updates, Shape Up gives you longer stretches to really focus. We usually work in 2-4 week cycles which gives enough time to get through the messier parts of data science work without feeling rushed. You can spend a week just understanding the data weirdness without someone asking why the model isn’t trained yet.
The key is that during implementation, the scope can flex but the time is fixed. If you discover the data is messier than expected or the initial approach hits a wall, you adjust what gets delivered rather than asking for more time. This forces you to stay focused on what actually matters and not get lost in perfectionism.
We do light check-ins during the cycle, but they’re more about removing blockers than detailed progress reports. The team owns the work and figures out the best way to tackle it day to day.
Cool Down Periods #
These periods are great for setting up model monitoring and catching up on documentation. We also use them for exploring new techniques, knowledge-sharing sessions, and handling tech debt.
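As one example of cool-down monitoring work, here’s a minimal sketch of a population stability index (PSI) check for feature drift. Comparing a training-time sample against recent production data, and the usual 0.1/0.25 thresholds, are generic conventions rather than anything specific to our setup.

```python
# Minimal sketch: population stability index (PSI) for spotting feature drift between cycles.
import numpy as np

def psi(expected: np.ndarray, actual: np.ndarray, bins: int = 10) -> float:
    """PSI between a reference (training-time) sample and a recent production sample."""
    edges = np.unique(np.quantile(expected, np.linspace(0, 1, bins + 1)))
    actual = np.clip(actual, edges[0], edges[-1])  # keep out-of-range values in the end bins
    e_frac = np.histogram(expected, edges)[0] / len(expected)
    a_frac = np.histogram(actual, edges)[0] / len(actual)
    e_frac = np.clip(e_frac, 1e-6, None)  # avoid log(0) for empty bins
    a_frac = np.clip(a_frac, 1e-6, None)
    return float(np.sum((a_frac - e_frac) * np.log(a_frac / e_frac)))

# Common rule of thumb: below 0.1 is stable, 0.1-0.25 is worth watching, above 0.25 investigate.
```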
What Didn’t Translate #
Hill charts for research just don’t work well; research rarely has the clean “figured it out, now just executing” transition they assume. Hard stops on exploration are tough to enforce. Going without estimates is tricky too: even though it sounds good, and I personally use it in my own projects to great effect, it really doesn’t fly with management.
Success Stories and Hybrid Approaches Towards More Pragmatic Data Science Work #
We’ve had projects that really thrived with this approach, though it all depends on your team and the stakeholders in charge. We’ve seen better model outcomes, happier teams, and much better delivery predictability. It also prevents the scope creep where last-minute additions keep getting tacked on. We’ve mixed this with Kanban for certain workflows and still run standups. Emergency work is handled differently, maintenance gets its own cycles, and continuous deployment needs get their own approach too. I feel the true spirit of agile can be maintained, but it all depends on your team and its adaptability.