At #PerfMatters, Jessica Chan, Sarah Dapul-Weberman, and Michelle Vu gave a presentation about building Pinterest’s first dedicated performance team and the challenges involved. Here are my notes.
First Dedicated Performance Team
- Pinterest has over 200 million monthly active users globally, and its infrastructure serves over 1 million requests per second.
- Around 2016, Pinterest migrated from Backbone to React. They saw a 20% improvement in performance and 10-20% improvement in engagement.
- For unauthenticated (logged-out) pages, the same migration brought a 30% improvement in performance, a 15% increase in signups, a 10% increase in SEO traffic, and a 5-7% increase in logins.
- Questions came out of these improvements: Were there bugs in their performance tracking? Were they still performing this well? They realized they needed a dedicated performance team to better understand what was happening.
- Pinterest chose a custom metric called Pinner Wait Time (PWT) which looks at the slowest load time for content they deem to be critical on a page.
- Custom metrics let them find something they can measure that is directly tied to engagement instead of just vanity metrics with no real impact on the actual experience.
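
To make the definition of PWT concrete, here is a minimal sketch that assumes each page view reports one load time per critical element. The function name, element names, and timings are hypothetical and are not taken from the talk.

```python
from typing import Mapping


def pinner_wait_time(load_times_ms: Mapping[str, float]) -> float:
    """Return the slowest load time among a page's critical elements.

    `load_times_ms` maps a (hypothetical) critical element name to the
    time in milliseconds at which it finished loading.
    """
    if not load_times_ms:
        raise ValueError("at least one critical element is required")
    # The user is still waiting until the last critical element is ready,
    # so the metric is the maximum, not the average.
    return max(load_times_ms.values())


# Made-up timings for a single page view.
timings = {"pin_image": 820.0, "pin_title": 310.0, "save_button": 450.0}
print(pinner_wait_time(timings))  # 820.0
```

Which elements count as critical is a per-page product decision, which is what ties the metric to the actual experience rather than to a generic browser event.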
- They identified four steps to build confidence in the data.
- The first step was to set baselines for the different flows across the site. They validated their performance metrics, implemented confidence tests, ensured graphs reflected real user experience and ensured teams understood their metrics.
- The second step was to tie performance metrics to business goals. They run experiments to see which metrics correlate to engagement and which do not. This lets them tie performance to engagement wins, builds trust in performance work, and helps teams budget time for performance improvements based on impact.
- The third step was an internal “PR campaign”. This included all-company demos and custom-built tools to help people get excited and let them know the performance team was there and able to help.
- The fourth step was to fight regressions. Regression protection in some cases could be even more important than the initial optimization.
- They developed Perf Watch, an in-house regression testing framework.
- They run regression tests for each critical page: pages like homefeed, pin closed page and the search page.
- Tests are run for each critical page several hundred times using multiple test runners running in parallel.
- They calculate and monitor the 90th percentile of Pinner Wait Time over time. If the test comes back exceeding a threshold for variance, the build is flagged as a regression.
- Running these tests through their build process helps them identify and address performance issues quickly.
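
The regression check described above can be sketched roughly as follows. This is not Perf Watch's actual implementation: the nearest-rank percentile, the 5% tolerance, and the function names are all assumptions made for illustration.

```python
import math
from typing import Sequence


def percentile(samples: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample such that at least
    `pct` percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in the range (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def is_regression(
    baseline_p90_ms: float,
    samples_ms: Sequence[float],
    tolerance: float = 0.05,  # assumed threshold, not from the talk
) -> bool:
    """Flag a build whose p90 PWT exceeds the baseline by more than `tolerance`."""
    return percentile(samples_ms, 90) > baseline_p90_ms * (1 + tolerance)


# Made-up PWT samples (ms) collected from many runs of one critical page.
build_samples = [700.0, 720.0, 710.0, 690.0, 705.0, 715.0, 698.0, 702.0, 900.0, 950.0]
print(is_regression(baseline_p90_ms=800.0, samples_ms=build_samples))
```

Running each page several hundred times matters here: a high percentile is only stable when there are enough samples in the tail.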
- To help determine what caused the regression, they built Perf Detective, which runs a binary search (similar to git bisect) to determine the offending commit.
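
The binary search idea can be sketched as below. It assumes the commits are ordered oldest to newest, that the first one is known to be good and the last one known to be regressed, and that every commit after the offending one is also regressed. The names are hypothetical and this is not Perf Detective's actual code.

```python
from typing import Callable, Sequence


def find_offending_commit(
    commits: Sequence[str],
    is_regressed: Callable[[str], bool],
) -> str:
    """Return the first regressed commit.

    `commits[0]` must be good and `commits[-1]` must be regressed.
    `is_regressed` stands in for running the performance tests on a commit.
    """
    if len(commits) < 2:
        raise ValueError("need at least one good and one regressed commit")
    good, bad = 0, len(commits) - 1
    # Invariant: commits[good] is good and commits[bad] is regressed.
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_regressed(commits[mid]):
            bad = mid
        else:
            good = mid
    return commits[bad]


# Made-up history in which "c5" introduced the regression.
history = [f"c{i}" for i in range(8)]
print(find_offending_commit(history, lambda c: int(c[1:]) >= 5))  # c5
```

Because each test run is expensive (hundreds of page loads), halving the search range per step keeps the number of builds to test at roughly log2 of the number of commits.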
- When the team was formed, they were presented with an aggressive goal of improving PWT.
- They started by doing some detailed analysis and brainstorming with various teams to see what the current situation was and what potential optimizations they could make.
- These potential optimizations were then listed based on how much work each would take and what the impact could be.
- Prototyping these optimizations gave them a better understanding of the estimated improvement and level of effort.
- Each optimization was run inside an A/B experiment using an in-house experimentation framework that shows performance impact as well as user engagement impact.
- Their experimentation framework lets them drill in based on user type, geography, etc.
- A dedicated performance team isn’t enough to ensure top-down buy-in and a strong performance culture. The ownership couldn’t just be on their small performance team.
- It can’t just be the performance team making optimizations. The experts for each surface in the company need to be the ones making optimizations.
- A centralized knowledge base of performance information and history within the company is an invaluable resource.
- The performance team should build tools to empower teams within the organization to make performance improvements that matter.