Building AI-Enabled Products: Malicious Content Detection and VR (Oculus) Development at Facebook

April 09, 2021

Xiang Sheng, Research Scientist Manager @Facebook Research

Jingjing Li, Assistant Professor of Commerce & Associate Director, CBA @McIntire School of Commerce


Facebook’s mission is to give people the power to build a community and bring the world closer together. Xiang’s teams use AI/ML to contribute to the company’s mission through two principles:

  • “Give People a Voice” and “Keep People Safe and Protect Privacy”
  • “Build Connection and Community”

During this webinar, Xiang described two projects he has been heavily involved with at Facebook, then joined CBA Associate Director Jingjing Li in a conversation to answer questions from the audience. Both projects and their key takeaways, as well as the Q&A session, are briefly summarized below.

AI/ML-based integrity solutions – Political campaign ads fraud detection

A. Challenges:

  1. Huge volume – tens of millions of ads created daily
  2. Bad actors are creative
  3. Manual review is not scalable and may not be reliable

B. Solutions: AI/ML solutions

  1. Fraud detection
  2. Auto-enforcement
  3. Adaptive shielding
  4. ML-assisted review

C. Results:

  1. Very low prevalence ratio – less than 1% for all policies
  2. Scalable solution, with more than 80% of ads auto-decisioned
  3. System is dynamic, based on capacity of human reviewers
  4. Measurements are ML-enhanced and adaptive
  5. Risk alarm system alerts to new, emerging, and unknown risks
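
The interplay between auto-decisioning and reviewer capacity described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration rather than Facebook's actual system: the `Ad` type, the `fraud_score` field, the two thresholds, and the fallback rule for ads that exceed reviewer capacity are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Ad:
    ad_id: str
    fraud_score: float  # hypothetical model output in [0, 1]


def route_ads(ads, review_capacity, approve_below=0.1, reject_above=0.9):
    """Split ads into auto-approved, auto-rejected, and human-review lists.

    Ads with confident scores are decided automatically. Uncertain ads go to
    human reviewers, most uncertain first, until capacity runs out. The rest
    fall back to the nearer automatic decision (an illustrative assumption).
    """
    approved, rejected, uncertain = [], [], []
    for ad in ads:
        if ad.fraud_score < approve_below:
            approved.append(ad)
        elif ad.fraud_score > reject_above:
            rejected.append(ad)
        else:
            uncertain.append(ad)

    # Scores closest to 0.5 are treated as the most uncertain.
    uncertain.sort(key=lambda ad: abs(ad.fraud_score - 0.5))
    review = uncertain[:review_capacity]
    for ad in uncertain[review_capacity:]:
        (rejected if ad.fraud_score >= 0.5 else approved).append(ad)
    return approved, rejected, review


def auto_decision_rate(approved, rejected, review):
    """Fraction of ads decided without a human reviewer."""
    total = len(approved) + len(rejected) + len(review)
    return (len(approved) + len(rejected)) / total if total else 0.0


if __name__ == "__main__":
    scores = [0.02, 0.05, 0.95, 0.5, 0.3, 0.8, 0.01, 0.99, 0.07, 0.6]
    ads = [Ad(f"ad-{i}", s) for i, s in enumerate(scores)]
    approved, rejected, review = route_ads(ads, review_capacity=2)
    print(auto_decision_rate(approved, rejected, review))
```

In this sketch, raising `review_capacity` sends more borderline ads to people and lowers the auto-decision rate, which is one simple way a system could adapt to the number of available reviewers.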

AI/ML-enhanced VR co-presence – Connecting people digitally when they can’t gather in person (specifically Oculus™)

A. Motivations and Challenges:

  1. Network bandwidth has significantly increased in recent years
  2. Pandemic reinforced challenges – and demand for – socializing and “gathering” remotely
  3. While videoconferencing can mitigate some of those co-presence challenges, the experience can still be improved

B. Solutions: Connect users through meaningful/relevant VR content

  1. ML-based VR content understanding
  2. ML-based VR social interactions understanding

C. Results: In 2020 there was a significant increase in VR users, much of it from non-gamers

Questions from the audience:

  • Do you anticipate VR and AR will eventually take over our desktop/laptop computers?

A simple analogy: VR is more like a desktop; AR is more like a mobile phone. VR and AR will eventually work together and reach the stage of “mixed reality”; they already interact and communicate with each other.

  • Can you summarize some common challenges you’ve encountered with these projects?

Managing up – it is critical to set the proper expectations for your leaders, especially when they don’t have an in-depth understanding of ML. Gain a good understanding of what they want to optimize, and then define and focus on ML-solvable problems. Set measurements and metrics. Leaders need to embrace the uncertainty of ML.

  • How can you ensure your team members work collaboratively on these highly sophisticated, yet highly uncertain, projects?

There is no standard approach. Having experienced multiple project management processes, I can summarize my own formula in two aspects:

First, we should set up the right culture: Be brave and open. I want to make sure my team members assume good intentions from others and are brave enough to give critical feedback. I always promote career development for each team member, and they trust that everything I do is in their best interest. Another cultural value to build is transparency and openness. At Facebook, we have an internal Facebook Workplace for every employee. Each employee is encouraged to share publicly whatever they are working on and the related takeaways. There are no barriers to knowledge sharing at Facebook.

Second, you have to get to know each team member, match team members to their strengths, and minimize dependencies among them. Unique to my team, we are not like most traditional software companies, which split a project into stages (e.g., data preprocessing, modeling, and evaluation) and let team members pick a module based on their knowledge and bandwidth. I coach my team members to handle end-to-end projects (i.e., to be equipped with both business and technical knowledge), and each member is accountable for one functionality or project. More junior team members can provide lower-level implementations, which have to be well synced with the senior project owners. This reliance on end-to-end expertise minimizes internal dependencies but relies heavily on the senior project owner to understand the entire process inside out and drive it through.

  • If reducing dependencies is a primary goal for the team, does that mean that a highly successful team is composed entirely of generalists?

Yes, at Facebook we highly value generalists, and we hope each person is well equipped with both business and technical knowledge and can complete a project by him- or herself. However, I have to admit that this type of talent is a rare find in the job market. Therefore, we try all possible recruiting channels to find the best candidates. We go to conferences and host regular social dinners in the Bay Area to connect with people with relevant expertise.

Once they have entered the team, we make it very clear that we expect them to be end-to-end contributors. On my team, the more general knowledge a team member has, the higher his/her level and pay will be.

To make sure that they can grow into generalists, at Facebook we limit the number of people each manager supports to foster a deeper understanding between manager and staff. Managers are held accountable for the success of the staff beneath them. Career growth and knowledge expansion are always front of mind for Facebook managers. To get to know staff more deeply, upon hire we usually discuss a variety of topics:

  • Personal and professional highlights and milestones to date
  • What time of day are they most productive?
  • Do they thrive under pressure or need space to think?
  • Communication preferences – email, chat, portals, phone
  • Recognition preferences – private or public?
  • Method of learning – virtual classes, e-learning sources, coaching, on-the-job training
  • Expectations of colleagues – how best to collaborate and thrive

Based on their unique strengths and work styles, we define projects and goals accordingly. We make sure to revisit the same questions periodically to understand their progress and make sure they are on track with their career goals.

  • How do you determine when something is not working versus when it isn’t working YET?

New projects should have measurements and desired outcomes defined at the start, including how much time will be budgeted. Tradeoffs are frequently required when trying to decide when to “cut losses” and set priorities. Senior engineers are usually the best source for reliable information and guidance. Their “maturity framework” helps determine priorities in the system, and what resources and staff should be allocated to the processes. This is a highly unique approach to project management for ML at Facebook.

  • What are the unique challenges associated with AI at scale? How do you define user requirements when Facebook has billions of users?

ML for big data is slightly different from traditional software development or the traditional ML process. Big data is populated as it moves through each stage, presenting a number of new engineering challenges. Due to the large volume of data and the complexity of the ML problems, instead of building a single ML model, we build an ML system composed of hundreds or even thousands of ML models, where one model’s output can be the input to another model. This complex ML system creates greater dependencies and risks, as one tiny mistake can be propagated and amplified to an unimaginable magnitude, greatly impacting the final modeling performance. We therefore need to manage and modify ML models with extra caution, and we have an ML efficiency team that is specifically in charge of analyzing model dependencies and monitoring model performance at every stage of the system.
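
To make the dependency risk concrete, here is a minimal sketch of the kind of analysis such a system invites: given a graph of which model feeds which, list every model affected when one model changes. The model names and the graph are invented for illustration, and `chained_success_rate` assumes stages fail independently, which real systems may not satisfy.

```python
from collections import deque

# Hypothetical dependency graph: each model's output feeds the listed models.
DOWNSTREAM = {
    "text_embedder": ["spam_classifier", "topic_classifier"],
    "spam_classifier": ["ad_risk_scorer"],
    "topic_classifier": ["ad_risk_scorer"],
    "ad_risk_scorer": ["enforcement_policy"],
    "enforcement_policy": [],
}


def affected_models(graph, changed):
    """Return all models that directly or transitively consume `changed`.

    Uses breadth-first search, so nearer dependents come first.
    """
    seen, order = {changed}, []
    queue = deque([changed])
    while queue:
        model = queue.popleft()
        for child in graph.get(model, []):
            if child not in seen:
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order


def chained_success_rate(per_model_rate, depth):
    """Chance that `depth` chained models are all correct.

    Assumes each model is correct independently with the same probability.
    """
    return per_model_rate ** depth


if __name__ == "__main__":
    print(affected_models(DOWNSTREAM, "text_embedder"))
    print(round(chained_success_rate(0.99, 100), 3))
```

Under the independence assumption, a chain of 100 models that are each right 99% of the time is right end to end only about 37% of the time, which is one way to see how small per-model errors can compound.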

Furthermore, with large-scale data, we sometimes find that traditional user research methods, such as surveys and interviews, cannot represent heterogeneous user interests at a billion-user scale, partly due to sampling bias. Therefore, we choose to start with minimal knowledge and assumptions and quickly roll out a minimum viable product (MVP). We then conduct a large number of A/B tests to incrementally create and verify our knowledge about the users. Based on the newly updated knowledge, we roll out another version of the MVP and conduct A/B testing again. The iterations end when we receive satisfactory usage and engagement statistics. Using this iterative approach, we can minimize the sampling bias that comes from being unable to formally survey a large number of users, and we can continuously update our knowledge about user preferences in a more agile way.
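
As an illustration of how a single A/B comparison in such an iteration might be evaluated, the sketch below applies a standard two-proportion z-test to engagement counts from a control group and a variant. The talk did not say which statistical method is used, so the choice of test, the function name, and the sample numbers are assumptions.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Compare engagement rates of control (A) and variant (B).

    Returns the z statistic and the two-sided p-value, using the pooled
    standard error under the null hypothesis of equal rates.
    """
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("standard error is zero; rates are all 0 or all 1")
    z = (p_b - p_a) / se
    # Two-sided p-value: 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


if __name__ == "__main__":
    # Hypothetical counts: 10,000 users per arm, 10.0% vs 11.0% engagement.
    z, p = two_proportion_z_test(1000, 10_000, 1100, 10_000)
    print(f"z = {z:.2f}, p = {p:.4f}")
```

A team iterating on an MVP could run a check like this for each experiment and carry forward only the changes whose effect is unlikely to be noise, though at very large sample sizes even tiny differences become statistically significant, so practical effect size matters as well.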