How do we ensure the users we test features on are representative of Google Photos users as a whole?

Scope the problem

Google Photos uses several vendors to collect data for machine learning. We did not know how closely the users these vendors recruited matched the Google Photos population as a whole. Through stakeholder interviews, I identified the key dimensions on which our testing population needed to align with Google Photos users.

Process

I conducted a logs analysis to examine how well the users in our machine learning studies represented the broader pool of Google Photos users. Using advanced SQL queries and R code to pull, transform, and analyze our user records, I identified areas where our testing populations closely matched the Google Photos population, as well as areas where we would need to recruit more users to ensure high data quality.
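
The kind of comparison described above can be illustrated with a minimal sketch. This is not the original analysis, which was done in SQL and R against internal logs. It is a small Python example under stated assumptions: the segment names, the population shares, the study counts, and the 5-point threshold are all hypothetical placeholders, and `representation_gaps` is an illustrative helper rather than a function from any real library.

```python
from collections import Counter


def representation_gaps(study_values, population_shares, threshold=0.05):
    """Compare category shares in a study sample against population shares.

    study_values: one category label per study participant (hypothetical).
    population_shares: category -> share of the full population (hypothetical).
    threshold: a gap below -threshold marks a category as under-represented
               (an assumed cutoff, chosen only for illustration).
    """
    counts = Counter(study_values)
    total = sum(counts.values())
    gaps = {}
    for category, pop_share in population_shares.items():
        study_share = counts.get(category, 0) / total if total else 0.0
        # Positive gap: over-represented in the study; negative: under-represented.
        gaps[category] = study_share - pop_share
    under_represented = sorted(c for c, g in gaps.items() if g < -threshold)
    return gaps, under_represented


# Hypothetical data: 100 study participants across three made-up segments.
study = ["segment_a"] * 70 + ["segment_b"] * 28 + ["segment_c"] * 2
population = {"segment_a": 0.50, "segment_b": 0.30, "segment_c": 0.20}

gaps, under_represented = representation_gaps(study, population)
for category in population:
    print(f"{category}: gap = {gaps[category]:+.2f}")
print("Recruit more users from:", under_represented)
```

Running the same comparison once per dimension would give a list of segments where the study pool matches the population and a list where more recruitment is needed.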

Impact

We used these results in strategic planning to identify a new vendor for machine learning evaluation recruitment. In negotiations with prospective vendors, we pointed to the areas where we had struggled to recruit users in the past, so we could make sure our new vendor would be able to recruit the users needed to make our algorithms as representative as possible.
