Adding Nodes in FlexiConc

FlexiConc organizes its analysis workflow using an analysis tree where each node represents either a subset of concordance lines or an arrangement (ordering and grouping) of those lines. This document explains how to add nodes to the analysis tree using FlexiConc’s API.

Overview

There are two main types of nodes:

Subset Nodes: Represent subsets of concordance lines obtained by applying a selection algorithm.
Arrangement Nodes: Represent ordered and/or grouped views of concordance lines generated by applying sorting and grouping algorithms.

Nodes are added to the analysis tree via methods provided on an existing node (usually starting from the root). The most common methods are:

add_subset_node(algo_tuple, keep_arrangement=True)
add_arrangement_node(ordering=[], grouping=None)

Adding Subset Nodes

A subset node is created by applying a selection algorithm to an existing node. The selection algorithm is specified as a tuple:

(algorithm_name, args)

algorithm_name: A string identifying the selection algorithm (e.g., "Random Sample", "Select by Metadata Attribute").
args: A dictionary of arguments required by the algorithm (e.g., {'sample_size': 20, 'seed': 111}).

When you call add_subset_node, FlexiConc:

1. Checks if a sibling node with the same algorithm configuration exists. If so, it returns that node.
2. Otherwise, it executes the selection algorithm on the current node’s concordance subset.
3. Creates a new subset node that contains the selected line IDs.
4. Optionally, it copies and restricts arrangement information from the parent node.
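The four steps above can be sketched with a small stand-in class. This is only an illustration of the described behaviour, not FlexiConc's implementation: ToyNode, toy_random_sample, TOY_ALGORITHMS and the JSON-based configuration key are all hypothetical names and choices made for this sketch.

```python
import json
import random
from dataclasses import dataclass, field
from typing import Optional


def toy_random_sample(line_ids, sample_size, seed=None):
    """Hypothetical selection algorithm: seeded random sample of line IDs."""
    rng = random.Random(seed)
    return sorted(rng.sample(line_ids, sample_size))


# Hypothetical registry mapping algorithm names to functions.
TOY_ALGORITHMS = {"Random Sample": toy_random_sample}


@dataclass
class ToyNode:
    """Stand-in for an analysis-tree node (not FlexiConc's real class)."""
    line_ids: list
    algo_key: Optional[str] = None      # serialized (algorithm_name, args)
    ordering: Optional[list] = None     # line IDs in display order, if arranged
    children: list = field(default_factory=list)

    def add_subset_node(self, algo_tuple, keep_arrangement=True):
        name, args = algo_tuple
        key = json.dumps([name, args], sort_keys=True)

        # Step 1: reuse an existing child with the same configuration.
        for child in self.children:
            if child.algo_key == key:
                return child

        # Step 2: run the selection algorithm on this node's lines.
        selected = TOY_ALGORITHMS[name](self.line_ids, **args)

        # Step 3: create the new subset node.
        child = ToyNode(line_ids=selected, algo_key=key)

        # Step 4: copy the parent's ordering, restricted to the selected lines.
        if keep_arrangement and self.ordering is not None:
            keep = set(selected)
            child.ordering = [i for i in self.ordering if i in keep]

        self.children.append(child)
        return child


root = ToyNode(line_ids=list(range(100)), ordering=list(range(99, -1, -1)))
config = ("Random Sample", {"sample_size": 20, "seed": 111})
first = root.add_subset_node(config)
second = root.add_subset_node(config)
print(first is second)  # True: the second call returns the existing node
```

Serializing the tuple to a string is just one simple way to compare configurations; the comparison FlexiConc actually performs may differ.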

Example:

# Starting from the root node of the analysis tree, add a subset node.
# (c is assumed to be an existing concordance object; c.root is the root of its analysis tree.)
subset_node = c.root.add_subset_node(
    ("Random Sample", {'sample_size': 20, 'seed': 111})
)

This creates a new node that represents a random sample of 20 lines from the current subset.

Another example using metadata attributes:

subset_node_long = c.root.add_subset_node(
    ("Select by Metadata Attribute", {
        'metadata_attribute': "suspension_type", 
        'value': "long",
        'operator': "==",
        'regex': False,
        'case_sensitive': False,
        'negative': False
    })
)

This node will include only those lines where the suspension_type is equal to "long".
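To make the arguments in this example more concrete, here is a rough sketch of how such a filter might behave on toy data. The function toy_select_by_metadata and the dictionary layout are hypothetical, and the meanings given to regex, case_sensitive and negative are inferred from their names rather than taken from FlexiConc's source. Only the "==" operator is handled.

```python
import re


def toy_select_by_metadata(lines, metadata_attribute, value, operator="==",
                           regex=False, case_sensitive=False, negative=False):
    """Return the IDs of lines whose metadata value matches (toy version).

    `lines` maps a line ID to a dict of metadata for that line.
    """
    if operator != "==":
        raise ValueError("this sketch only supports the '==' operator")

    selected = []
    for line_id, metadata in lines.items():
        actual = str(metadata.get(metadata_attribute, ""))
        if regex:
            # Assumed meaning: treat `value` as a regular expression.
            flags = 0 if case_sensitive else re.IGNORECASE
            matched = re.search(value, actual, flags) is not None
        else:
            expected = str(value)
            if not case_sensitive:
                actual, expected = actual.lower(), expected.lower()
            matched = actual == expected
        # Assumed meaning: negative=True inverts the selection.
        if matched != negative:
            selected.append(line_id)
    return selected


toy_lines = {
    0: {"suspension_type": "long"},
    1: {"suspension_type": "short"},
    2: {"suspension_type": "Long"},
}
print(toy_select_by_metadata(toy_lines, "suspension_type", "long"))  # [0, 2]
```

With case_sensitive=False, as in the example above, "Long" and "long" are treated as the same value in this sketch.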

Adding Arrangement Nodes

An arrangement node provides a new view by ordering and/or grouping the concordance lines. This is done via:

add_arrangement_node(ordering=[], grouping=None)

ordering: A list of tuples. Each tuple has the form (algorithm_name, args), where:
algorithm_name is the name of an ordering algorithm (e.g., "Sort by Token-Level Attribute").
args is a dictionary of parameters for that algorithm (e.g., {'tokens_attribute': "word", 'sorting_scope': "right", 'reverse': True}).
grouping: An optional tuple of the form (algorithm_name, args) for a grouping algorithm (e.g., ("Partition by Ngrams", {'tokens_attribute': "pos", 'positions': [-1], 'case_sensitive': True})). If no grouping is needed, this can be set to None.

When you call add_arrangement_node, FlexiConc:

1. Executes each ordering algorithm and combines their sort keys to produce a final ordering.
2. If a grouping algorithm is provided, it applies the algorithm to partition the lines.
3. Creates a new arrangement node that contains the ordering result, grouping result (if any), and associated token span data.
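As a rough illustration of these steps, the sketch below combines the keys of several ordering functions into one tuple per line and then partitions the ordered lines. The function toy_arrange and the toy data are hypothetical, combining keys as tuples is an assumption about what "combines their sort keys" means, and token span data is left out entirely.

```python
from collections import defaultdict


def toy_arrange(line_ids, key_funcs, group_func=None):
    """Order lines by several keys and optionally partition them (toy version)."""
    # Step 1: one combined key per line; earlier key functions take priority.
    ordered = sorted(line_ids, key=lambda i: tuple(f(i) for f in key_funcs))

    # Step 2: partition the ordered lines if a grouping function is given.
    groups = None
    if group_func is not None:
        partition = defaultdict(list)
        for line_id in ordered:
            partition[group_func(line_id)].append(line_id)
        groups = dict(partition)

    # Step 3: bundle the results, as an arrangement node would store them.
    return {"ordering": ordered, "grouping": groups}


# Toy data: first word to the right of the node, and POS tag at position -1.
right_word = {0: "cat", 1: "apple", 2: "cat", 3: "banana"}
left_pos = {0: "DET", 1: "ADJ", 2: "ADJ", 3: "DET"}

result = toy_arrange(
    [0, 1, 2, 3],
    key_funcs=[lambda i: right_word[i], lambda i: i],  # ties broken by line ID
    group_func=lambda i: left_pos[i],
)
print(result["ordering"])  # [1, 3, 0, 2]
print(result["grouping"])  # {'ADJ': [1, 2], 'DET': [3, 0]}
```

Within each group the lines keep the order produced in step 1, which is why line 3 precedes line 0 in the DET group.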

Example:

arrangement_node = c.root.add_arrangement_node(
    ordering=[
        ("Sort by Token-Level Attribute", {
            'tokens_attribute': "word",
            'sorting_scope': "right",
            'reverse': True
        })
    ],
    grouping=("Partition by Ngrams", {
        'tokens_attribute': "pos",
        'positions': [-1],
        'case_sensitive': True
    })
)

This example creates an arrangement node that orders the lines by the word attribute of the tokens in the right context, in reverse order, and partitions them into groups by the part-of-speech (pos) value at position -1, using an n-gram partitioning algorithm.