Intro to Clustering

DBSCAN

Parameters

  • eps
  • min_samples

Types of Vertices [Points]

Core Vertices: $V \in S : |E_V| \geq \text{minSamples}$

Border Vertices: $V \in S : |E_V| < \text{minSamples}$

Noise Vertices: $V \notin S$
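
These definitions translate directly into code. Below is a minimal sketch, assuming the points live in a NumPy array X (one point per row) with Euclidean distances; X is a hypothetical placeholder, and eps / min_samples are the parameters above.

import numpy as np

def neighbors(v):
    # E_v: all other points within eps of point v
    dists = np.linalg.norm(X - X[v], axis=1)
    return [u for u in np.flatnonzero(dists <= eps) if u != v]

def core(v):
    # a core vertex has at least min_samples points in its eps-neighborhood
    return len(neighbors(v)) >= min_samples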

Algorithm [Black-Boxing]

# core(v) and neighbors(v) are black boxes that depend on eps and min_samples
from collections import deque

clusters = {}          # maps each core vertex to the cluster grown from it
noise = []
unvisited = set(V)     # vertices not yet assigned to any cluster

for v in V:
    if v not in unvisited or not core(v):
        continue
    # start a new cluster at the core vertex v and grow it breadth-first
    unvisited.discard(v)
    clusters[v] = [v]
    q = deque()
    for neighbor in neighbors(v):
        if neighbor in unvisited:
            unvisited.discard(neighbor)
            q.append(neighbor)
    while q:
        curr = q.popleft()
        clusters[v].append(curr)
        # only core vertices expand the cluster; border vertices are only absorbed
        if core(curr):
            for neighbor in neighbors(curr):
                if neighbor in unvisited:
                    unvisited.discard(neighbor)
                    q.append(neighbor)

# anything never reached from a core vertex is noise
noise = list(unvisited)
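
A quick way to sanity-check the hand-rolled version above is scikit-learn's DBSCAN, which uses the same eps / min_samples parameterisation; points labelled -1 are noise.

from sklearn.cluster import DBSCAN

# eps, min_samples and X as above
# note: scikit-learn counts the point itself toward min_samples
labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(X)
num_noise = (labels == -1).sum()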

Consensus-Driven Propagation

Pipeline

An analogy for the concept: the Base Model $B$ stands before the court, and for each proposal that $B$ presents, Committees $C_1$ through $C_N$ judge that proposal and the Mediator decides the final outcome. Each proposal introduces a pair of vertices; once all judgements are made, the accepted proposals from $B$ form our final clusters of vertices.

1) Training on labelled data (Supervised)

$$\mathcal{F}_B : D_l \rightarrow \mathcal{Z}$$
$$\mathcal{F}_{C_i} : D_l \rightarrow \mathcal{Z}, \quad i=1,2,3,...,N$$
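
A minimal sketch of this stage, assuming PyTorch/torchvision, an image dataset for $D_l$, and hypothetical names labelled_loader and num_identities; the architectures are illustrative choices for the Base Model and the committee members.

import torch
import torchvision.models as models

# Base Model B plus N committee members; different architectures give the
# committees diverse "opinions" in their feature spaces.
base = models.resnet50(num_classes=num_identities)
committees = [
    models.resnet18(num_classes=num_identities),
    models.resnet34(num_classes=num_identities),
    models.densenet121(num_classes=num_identities),
]

criterion = torch.nn.CrossEntropyLoss()
for model in [base, *committees]:
    # each model is trained independently with a standard supervised loss on D_l
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
    for images, labels in labelled_loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()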

2) Consensus-Driven Propagation

Given $\mathcal{F}_B, \mathcal{F}_{C_1}, ..., \mathcal{F}_{C_N}$, we extract deep features by running inference on the unlabelled data. Specifically, we compute $\mathcal{F}_B(D_u)$ and $\mathcal{F}_{C_i}(D_u)$, each a vector in the feature space $\mathcal{Z}$ of the Base Model or of a Committee. Each feature space carries the deep-feature information of a sample; to cluster, we build a cosine-similarity K-NN graph for each model, producing $N+1$ graphs, as sketched below.
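
A minimal sketch of the graph construction, assuming features is a hypothetical list of $N+1$ arrays, one (num_samples, dim) feature matrix per model, with index 0 being the Base Model.

import numpy as np
from sklearn.neighbors import NearestNeighbors

def knn_graph(feats, k=20):
    # unit-normalise so cosine distances are well behaved
    feats = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    nn = NearestNeighbors(n_neighbors=k + 1, metric="cosine").fit(feats)
    dist, idx = nn.kneighbors(feats)       # first column is the point itself
    sims = 1.0 - dist[:, 1:]               # cosine distance -> cosine similarity
    return idx[:, 1:], sims                # neighbor indices and edge weights

# one graph per model: index 0 is the Base Model, 1..N are the committees
graphs = [knn_graph(f) for f in features]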

From these graphs we can reach a consensus on each pair that the Base Model generates. Specifically, we use the following metrics:

The relationship $R$ between two nodes $n_0$ and $n_1$ is $1$ if they are neighbors in the view of a particular graph and $0$ otherwise. That is:

$$R^{(n_0,n_1)}_{C_i} = \begin{cases} 1 & \text{if } neighbor(n_0, n_1) \\ 0 & \text{otherwise} \end{cases}, \quad i = 1,2,...,N$$

The affinity between two nodes is the cosine similarity of the mapped features of the two nodes:

$$A_{C_i}^{(n_0,n_1)} = \cos(\langle \mathcal{F}_{C_i}(n_0), \mathcal{F}_{C_i}(n_1) \rangle), \quad i = 0,1,2,...,N$$

The local structure of each node is the distribution of similarities between that node and its $k$ nearest neighbors. By computing cosine similarities between a node $x$ and each of its neighbors $x_k$, we obtain this distribution, that is:

$$D^{x}_{C_i} = \{ \cos(\langle \mathcal{F}_{C_i}(x), \mathcal{F}_{C_i}(x_k) \rangle) \}, \quad k=1,2,...,K$$
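
A hedged sketch of these three metrics for a single model, reusing the outputs of knn_graph above; neighbors_i and sims_i are that model's neighbor-index and similarity arrays, feats_i its feature matrix, and all names are illustrative.

import numpy as np

def relationship(neighbors_i, n0, n1):
    # R = 1 if n1 appears among n0's k nearest neighbors in this model's graph
    return 1 if n1 in neighbors_i[n0] else 0

def affinity(feats_i, n0, n1):
    # cosine similarity between the two mapped feature vectors
    a, b = feats_i[n0], feats_i[n1]
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def local_structure(sims_i, x):
    # D^x: similarities between x and its k nearest neighbors in this graph
    return sims_i[x]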

Based on these metrics, we can now enter the Consensus Phase, where we determine which vertex-vertex pairs to keep. The Mediator gathers the information, combining each metric into a vector and then concatenating everything together. Specifically, for each pair of vertices proposed by the Base Model we have:

Relationship vector $I_R \in \mathbb{R}^N$, where each element is $R^{(n_0,n_1)}_{C_i}$

Affinity vector $I_A \in \mathbb{R}^{N+1}$, where each element is $A_{C_i}^{(n_0,n_1)}$

From the distribution for each vertex generated by each committee, we compute mean and variance vectors covering both vertices of the pair across each committee (including the Base Model), that is:

Mean vector $I_{D_{mean}} \in \mathbb{R}^{2(N+1)}$, with $I_{D_{mean}} = (..., \mu(D^{n_0}_{C_i}), ..., \mu(D^{n_1}_{C_i}), ...)$

Variance vector $I_{D_{var}} \in \mathbb{R}^{2(N+1)}$, with $I_{D_{var}} = (..., \sigma(D^{n_0}_{C_i}), ..., \sigma(D^{n_1}_{C_i}), ...)$
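
A sketch of assembling the Mediator's input for one Base-Model pair $(n_0, n_1)$, assuming the relationship / affinity / local_structure helpers and the graphs / features lists from the sketches above, with index 0 as the Base Model and 1..N as the committees.

import numpy as np

def mediator_input(n0, n1, graphs, features):
    # graphs[i] = (neighbors_i, sims_i), features[i] = that model's feature matrix
    I_R = [relationship(graphs[i][0], n0, n1) for i in range(1, len(graphs))]   # N
    I_A = [affinity(features[i], n0, n1) for i in range(len(graphs))]           # N + 1
    I_D_mean = [np.mean(local_structure(graphs[i][1], n))
                for i in range(len(graphs)) for n in (n0, n1)]                  # 2(N + 1)
    I_D_var = [np.var(local_structure(graphs[i][1], n))
               for i in range(len(graphs)) for n in (n0, n1)]                   # 2(N + 1)
    return np.concatenate([I_R, I_A, I_D_mean, I_D_var])                        # 6N + 5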

Concatenating all of these vectors gives a vector of dimension $6N+5$, which is fed into the Mediator, an MLP trained on the labelled data $D_l$. The MLP re-weights the opinions of the Committees and assigns each edge that the Base Model proposes a probability that the pair shares the same identity; a low probability indicates that the pair belongs to different clusters. Using the trained MLP, we filter out every edge proposed by the Base Model that does not meet a high positive-probability threshold; doing this for every edge yields the clusters we output.
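
A minimal Mediator sketch, assuming PyTorch; the hidden size is an illustrative choice and N stands for the number of committees.

import torch.nn as nn

mediator = nn.Sequential(
    nn.Linear(6 * N + 5, 50),
    nn.ReLU(),
    nn.Linear(50, 2),   # logits for "same identity" vs. "different identity"
)
# Train with cross-entropy on pairs drawn from D_l (where identities are known),
# then score every edge the Base Model proposes and keep only edges whose
# probability of "same identity" exceeds the chosen threshold.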

Recall that the previous stages were trained on labelled data. To assign labels to the unlabelled data, we implement a label-propagation algorithm with the following pseudocode:

# find_components(G) and disconnect(component, max_size) are black boxes here:
# find_components returns the connected components of the threshold-filtered graph,
# disconnect cuts the weakest edges of an oversized component to split it up.
from collections import deque

queue = deque(find_components(G))
final_graph_clusters = []

while queue:
    curr_component = queue.pop()
    if curr_component.size > max_size:
        # component too large: split it and re-queue the pieces
        weaker_components = disconnect(curr_component, max_size)
        queue.extend(weaker_components)
    else:
        final_graph_clusters.append(curr_component)
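
One possible realisation of find_components, assuming the threshold-filtered graph is stored as a SciPy sparse adjacency matrix over the kept edges; disconnect remains a black box that cuts the lowest-probability edges of an oversized component.

import numpy as np
from scipy.sparse.csgraph import connected_components

def find_components(adjacency):
    # connected components of the undirected, threshold-filtered graph;
    # each component is returned as an array of vertex indices
    n_comp, labels = connected_components(adjacency, directed=False)
    return [np.flatnonzero(labels == c) for c in range(n_comp)]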

Finally, we can jointly train on the labelled and the (now pseudo-labelled) unlabelled data using the same CNN architecture and shared weights. The result is a set of graph clusters produced by a model whose weights were trained on labelled data and deep-learnt from unlabelled data, with the clustering controlled by parameters such as the probability threshold and the maximum cluster size.