Podcast: DataTalks.Club
Episode: Large-Scale Entity Resolution - Sonal Goyal

Category: Technology
Duration: 00:53:27
Publish Date: 2022-10-28 17:00:10
Description:

We talked about:

  • Sonal’s background
  • How the idea for Zingg came about
  • What Zingg is
  • The difference between entity resolution and identity resolution
  • How duplicate detection relates to entity resolution
  • How Sonal decided to start working on Zingg
  • How Zingg works
  • What Zingg runs on
  • Switching from consultancy to working on a new open source solution
  • Why Zingg is open source
  • Open source licensing
  • Working on Zingg initially vs now
  • Zingg’s current and future team
  • Sonal’s biggest current challenge
  • Avoiding problems with entity/identity resolution through database design
  • Identity resolution vs basic joins, data fusions, and fuzzy joins
  • Deterministic matching vs probabilistic machine learning
  • Identity and entity resolution applications for fraud detection
  • Graph algorithms vs classic ML in entity resolution
  • Identity resolution success stories
  • What Sonal would do differently given the chance to start over with Zingg
  • Advice for those seeking to realize their own solution to a data problem
  • Reading suggestion from Sonal
  • Conclusion


Links:

  • Open-Source Spotlight demo "Zingg": https://www.youtube.com/watch?v=zOabyZxN9b0
  • Creative Selection: Inside Apple's Design Process During the Golden Age of Steve Jobs book: https://www.amazon.com/Creative-Selection-Inside-Apples-Process/dp/1250194466


ML Zoomcamp: https://github.com/alexeygrigorev/mlbookcamp-code/tree/master/course-zoomcamp

Join DataTalks.Club: https://datatalks.club/slack.html

Our events: https://datatalks.club/events.html
