Hierarchical Clustering: A 0.585 Revenue Approximation
Noga Alon, Yossi Azar, Danny Vainstein
Subject areas: Approximation algorithms, Clustering, Combinatorial optimization
Presented in: Session 1B, Session 3E
[Zoom link for poster in Session 1B], [Zoom link for poster in Session 3E]
Abstract:
Hierarchical Clustering trees have been widely accepted as a useful form \nof clustering data, resulting in a prevalence of adopting fields\nincluding phylogenetics, image analysis, bioinformatics and more. \nRecently, Dasgupta (STOC 16') initiated the analysis of \nthese types of algorithms through the lenses of approximation. \nLater, the dual problem was considered by Moseley and Wang (NIPS 17') dubbing it the Revenue goal function. \nIn this problem, given a nonnegative weight $w_{ij}$ for each pair \n$i,j \in [n]=\{1,2, \ldots ,n\}$, the objective is to find a tree $T$ whose\nset of leaves is $[n]$ that maximizes the function \n$\sum_{i<j \in [n]} w_{ij} (n -|T_{ij}|)$, where $|T_{ij}|$ is the number\nof leaves in the subtree rooted at the least common ancestor of $i$ and $j$.\n\n \nIn our work we consider the revenue goal function and prove the following \nresults. First, we prove the existence of a bisection \n(i.e., a tree of depth $2$ in which the root has two children, each being\na parent of $n/2$ leaves) \nwhich approximates the general optimal tree solution up to a factor of \n$\frac{1}{2}$ (which is tight). Second, we apply \nthis result in order to prove a $\frac{2}{3}p$ approximation for the general revenue problem, where $p$ is defined as the approximation ratio of the \textsc{Max-Uncut Bisection} problem. \nSince $p$ is known to be at least $0.8776$ \citep{Better_Balance_by_Being_Biased_A_08776_Approximation_for_Max_Bisection} \citep{An_improved_semidefinite_programming_hierarchies_rounding_approximation_algorithm_for_maximum_graph_bisection_problems}, \nwe get a $0.585$ approximation algorithm for the revenue problem. This \nimproves a sequence of earlier results which culminated in an\n$0.4246$-approximation guarantee \citep{Bisect_and_Conquer:_Hierarchical_Clustering_via_Max-Uncut_Bisection}.