HiDi’s clean module exposes functionality for cleaning data.
hidi.clean.DedupeTransform
Bases: hidi.transform.Transform
Deduplicates a tall, skinny link-item DataFrame.
transform
Takes a DataFrame, df, that has link_id and item_id columns, and deduplicates its rows so that each (link_id, item_id) pair is represented at most once.
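The effect described above can be illustrated with plain pandas. This is a minimal sketch, not HiDi's actual implementation: the helper name dedupe_pairs and the sample data are hypothetical, and it assumes that keeping the first occurrence of each pair is acceptable.

```python
import pandas as pd


def dedupe_pairs(df):
    """Hypothetical helper: keep one row per (link_id, item_id) pair.

    Not part of HiDi; it only mirrors the behaviour described above.
    """
    # drop_duplicates keeps the first occurrence of each pair by default
    deduped = df.drop_duplicates(subset=["link_id", "item_id"])
    # Renumber the index so it is contiguous after rows are dropped
    return deduped.reset_index(drop=True)


# Tall, skinny input: one row per observed link-item interaction
df = pd.DataFrame(
    {
        "link_id": ["a", "a", "b", "a"],
        "item_id": [1, 1, 1, 2],
    }
)

deduped = dedupe_pairs(df)
print(deduped)
```

Here the repeated ("a", 1) row is dropped, while ("b", 1) and ("a", 2) are kept because they are distinct pairs.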