Why no "grouping features" arg in record_linker?

User 4751 | 4/18/2016, 12:53:44 PM

In GLC's record_linker, why is there no support for grouping/blocking?

nearest_neighbor_deduplication supports it, but record_linker does not. Could anyone explain the reasoning here?

Comments

User 4 | 4/18/2016, 6:00:30 PM

The difference between record_linker and nearest_neighbor_deduplication is the intended audience: record_linker is optimized for users who may not (yet) realize they want (or need) grouping. We thought of grouping as a slightly more advanced feature, and made sure it was available in the lower-level nearest_neighbor_deduplication API. That said, if people keep asking for this feature, we will definitely consider exposing it for record_linker as well.
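
For readers unfamiliar with grouping (blocking), here is a minimal plain-Python sketch of the idea. It is not GraphLab Create's implementation; the function name candidate_pairs, its grouping_features parameter, and the sample records are all made up for illustration. The point is only that grouping restricts comparisons to records that match exactly on the chosen features, which cuts down the number of candidate pairs.

```python
from collections import defaultdict
from itertools import combinations


def candidate_pairs(records, grouping_features=None):
    """Yield index pairs of records that would be compared.

    Hypothetical helper: without grouping, every pair is a candidate;
    with grouping, only pairs that agree exactly on all grouping
    features are candidates.
    """
    if not grouping_features:
        yield from combinations(range(len(records)), 2)
        return

    # Bucket record indices by their exact values on the grouping features.
    blocks = defaultdict(list)
    for i, rec in enumerate(records):
        key = tuple(rec[f] for f in grouping_features)
        blocks[key].append(i)

    # Only records inside the same bucket are paired up.
    for members in blocks.values():
        yield from combinations(members, 2)


# Made-up sample data.
records = [
    {"name": "Ann Lee", "city": "Seattle"},
    {"name": "Anne Lee", "city": "Seattle"},
    {"name": "Ann Lee", "city": "Boston"},
    {"name": "Bob Ray", "city": "Boston"},
]

all_pairs = list(candidate_pairs(records))
blocked_pairs = list(candidate_pairs(records, grouping_features=["city"]))
print(len(all_pairs), len(blocked_pairs))
```

The trade-off is that a pair such as records 0 and 2 (same name, different city) is never compared once you group on city, which is part of why grouping is something you have to opt into knowingly.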


User 4751 | 4/21/2016, 4:01:16 PM

Thank you, @Zach. Knowing the motivation behind an API and design choices like these helps a lot.