Selecting Algorithms

Select by Metadata Attribute

Path: flexiconc/algorithms/select_by_metadata_attribute.py

Description:

Selects lines by comparing a specified metadata attribute against a target value. If the target value is a list, a line matches when the attribute equals any element of the list. For a single numeric value, a comparison operator (==, <, <=, >, >=) can be specified. For strings, only equality is supported, optionally with regex matching and case sensitivity.

Arguments:

| Name | Type | Description |
| --- | --- | --- |
| metadata_attribute | string | The metadata attribute to filter on. |
| value | ['string', 'number', 'array'] | The value to compare against, or a list of acceptable values. When a list is provided, only equality is used. |
| operator | string | The comparison operator for numeric comparisons. Only allowed for single numeric values. Default is '=='. |
| regex | boolean | If True, use regex matching for string comparisons (only with equality). Default is False. |
| case_sensitive | boolean | If True, perform case-sensitive matching for strings. Default is False. |
| negative | boolean | If True, invert the selection. Default is False. |
Full JSON schema:
{
  "type": "object",
  "properties": {
    "metadata_attribute": {
      "type": "string",
      "description": "The metadata attribute to filter on.",
      "x-eval": "dict(enum=list(set(conc.metadata.columns) - {'line_id'}))"
    },
    "value": {
      "type": [
        "string",
        "number",
        "array"
      ],
      "description": "The value to compare against, or a list of acceptable values. When a list is provided, only equality is used."
    },
    "operator": {
      "type": "string",
      "enum": [
        "==",
        "<",
        "<=",
        ">",
        ">="
      ],
      "description": "The comparison operator for numeric comparisons. Only allowed for single numeric values. Default is '=='.",
      "default": "=="
    },
    "regex": {
      "type": "boolean",
      "description": "If True, use regex matching for string comparisons (only with equality). Default is False.",
      "default": false
    },
    "case_sensitive": {
      "type": "boolean",
      "description": "If True, perform case-sensitive matching for strings. Default is False.",
      "default": false
    },
    "negative": {
      "type": "boolean",
      "description": "If True, invert the selection. Default is False.",
      "default": false
    }
  },
  "required": [
    "metadata_attribute",
    "value"
  ]
}
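
The matching rules described for this algorithm can be sketched in plain Python. This is an illustration only, not FlexiConc's implementation: the helper `matches` and the toy `metadata` dictionary are hypothetical, and the use of `re.search` (a match anywhere in the string) for regex mode is an assumption.

```python
import operator
import re

# Hypothetical helper: mirrors the documented argument names, not FlexiConc's code.
_OPS = {"==": operator.eq, "<": operator.lt, "<=": operator.le,
        ">": operator.gt, ">=": operator.ge}


def matches(attr_value, value, op="==", regex=False,
            case_sensitive=False, negative=False):
    """Return True if one line's metadata value satisfies the condition."""
    if isinstance(value, list):
        # A list target means membership, tested by equality.
        hit = attr_value in value
    elif isinstance(value, (int, float)) and not isinstance(value, bool):
        # A single number may use any of the five comparison operators.
        hit = _OPS[op](attr_value, value)
    else:
        text, target = str(attr_value), str(value)
        if regex:
            flags = 0 if case_sensitive else re.IGNORECASE
            # Assumption: a regex hit anywhere in the string counts as a match.
            hit = re.search(target, text, flags) is not None
        elif case_sensitive:
            hit = text == target
        else:
            hit = text.lower() == target.lower()
    # `negative` inverts the final decision.
    return hit != negative


# Toy metadata table: line id -> attribute values (made up for this sketch).
metadata = {0: {"speaker": "Alice", "year": 1999},
            1: {"speaker": "bob", "year": 2005},
            2: {"speaker": "Carol", "year": 2012}}

selected = [line_id for line_id, row in metadata.items()
            if matches(row["year"], 2005, op=">=")]
```

With the default `case_sensitive=False`, the string target "BOB" would match the stored value "bob"; setting `case_sensitive=True` would make the same comparison fail.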

Select by Rank

Path: flexiconc/algorithms/select_rank_wrapper.py

Description:

Selects lines based on rank values obtained from a selected 'algo_*' key in the ordering_result['rank_keys'] of the active_node, using a comparison operator and value.

Arguments:

| Name | Type | Description |
| --- | --- | --- |
| algo_key | string | The specific algorithm key from rank_keys available at the current node. Allowed values have the form 'algo_N', and the recommended value is most often 'algo_0', i.e. the top-level ranking. |
| comparison_operator | string | The comparison operator to use for rank values. |
| value | number | The value to compare the rank keys against. |
Full JSON schema:
{
  "type": "object",
  "properties": {
    "algo_key": {
      "type": "string",
      "description": "The specific algorithm key from rank_keys available at the current node. Allowed values have the form 'algo_N', and the recommended value is most often 'algo_0', i.e. the top-level ranking.",
      "x-eval": "dict(enum=sorted([key for key in active_node.ordering_result['rank_keys'].keys() if key.startswith('algo_')], key=lambda k: int(k.split('_')[1])), default=sorted([key for key in active_node.ordering_result['rank_keys'].keys() if key.startswith('algo_')], key=lambda k: int(k.split('_')[1]))[0])"
    },
    "comparison_operator": {
      "type": "string",
      "enum": [
        "==",
        "<=",
        ">=",
        "<",
        ">"
      ],
      "description": "The comparison operator to use for rank values.",
      "default": "=="
    },
    "value": {
      "type": "number",
      "description": "The value to compare the rank keys against.",
      "default": 0
    }
  },
  "required": []
}
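
The x-eval expression in this schema sorts the available 'algo_*' keys by their numeric suffix and uses the first one as the default. The sketch below reproduces that ordering on made-up data and then applies a comparison to the chosen ranks. The shape of `rank_keys` (algorithm key mapped to a dictionary of line id and rank) is an assumption made for illustration.

```python
import operator

# Made-up data for illustration. Assumed shape: algo key -> {line id -> rank}.
rank_keys = {
    "algo_10": {0: 1, 1: 0, 2: 0},
    "algo_2": {0: 0, 1: 3, 2: 1},
    "algo_0": {0: 2, 1: 0, 2: 1},
    "unrelated": {},
}

# Same ordering rule as the schema's x-eval: keep 'algo_*' keys and sort them
# by their integer suffix, so 'algo_2' comes before 'algo_10'.
algo_keys = sorted(
    (key for key in rank_keys if key.startswith("algo_")),
    key=lambda k: int(k.split("_")[1]),
)
default_key = algo_keys[0]  # 'algo_0'

# Keep the lines whose rank under the chosen key satisfies `rank <= 1`.
compare = operator.le
selected = [line_id for line_id, rank in rank_keys[default_key].items()
            if compare(rank, 1)]
```

Sorting by the integer suffix matters once there are ten or more keys: a plain string sort would place 'algo_10' before 'algo_2'.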

Select by Sort Keys

Path: flexiconc/algorithms/select_sort_wrapper.py

Description:

Selects lines based on sort keys obtained from the active node's ordering_result['sort_keys'], using a comparison operator and a specified value.

Arguments:

| Name | Type | Description |
| --- | --- | --- |
| comparison_operator | string | The comparison operator to use for sort keys. |
| value | number | The value to compare the sort keys against. |
Full JSON schema:
{
  "type": "object",
  "properties": {
    "comparison_operator": {
      "type": "string",
      "enum": [
        "==",
        "<=",
        ">=",
        "<",
        ">"
      ],
      "description": "The comparison operator to use for sort keys.",
      "default": "=="
    },
    "value": {
      "type": "number",
      "description": "The value to compare the sort keys against.",
      "default": 0
    }
  },
  "required": []
}
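
Because neither argument is required, calling this algorithm with no arguments falls back to the defaults '==' and 0. A minimal sketch of that behaviour follows; the function `select_by_sort_keys` and the shape of `sort_keys` (line id mapped to a numeric sort key) are hypothetical stand-ins, not the library's code.

```python
import operator

# The five operators listed in the schema's enum.
OPERATORS = {"==": operator.eq, "<=": operator.le, ">=": operator.ge,
             "<": operator.lt, ">": operator.gt}


def select_by_sort_keys(sort_keys, comparison_operator="==", value=0):
    """Return the ids of lines whose sort key satisfies the comparison."""
    compare = OPERATORS[comparison_operator]
    return sorted(line_id for line_id, key in sort_keys.items()
                  if compare(key, value))


# Assumed shape for this sketch: line id -> sort key.
sort_keys = {0: 2, 1: 0, 2: 1, 3: 0}

default_selection = select_by_sort_keys(sort_keys)            # keys equal to 0
low_keys = select_by_sort_keys(sort_keys, "<=", 1)            # keys 0 or 1
```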

Select by a Token-Level Attribute

Path: flexiconc/algorithms/select_by_token_attribute.py

Description:

Selects lines based on the specified token-level attribute at a given offset, with optional case-sensitivity, regex matching, or numeric comparison.

Arguments:

| Name | Type | Description |
| --- | --- | --- |
| value | ['string', 'number'] | The value to match against. |
| tokens_attribute | string | The positional attribute to check (e.g., 'word'). |
| offset | integer | The offset from the concordance node to apply the check. |
| case_sensitive | boolean | If True, performs a case-sensitive match (for string values). |
| regex | boolean | If True, uses regex matching instead of exact matching (for string values). |
| comparison_operator | string | Comparison operator for numeric values. Ignored for string values. |
| negative | boolean | If True, inverts the selection (i.e., selects lines where the match fails). |
Full JSON schema:
{
  "type": "object",
  "properties": {
    "value": {
      "type": [
        "string",
        "number"
      ],
      "description": "The value to match against.",
      "default": ""
    },
    "tokens_attribute": {
      "type": "string",
      "description": "The positional attribute to check (e.g., 'word').",
      "default": "word",
      "x-eval": "dict(enum=list(set(conc.tokens.columns) - {'id_in_line', 'line_id', 'offset'}))"
    },
    "offset": {
      "type": "integer",
      "description": "The offset from the concordance node to apply the check.",
      "default": 0,
      "x-eval": "dict(minimum=min(conc.tokens['offset']), maximum=max(conc.tokens['offset']))"
    },
    "case_sensitive": {
      "type": "boolean",
      "description": "If True, performs a case-sensitive match (for string values).",
      "default": false
    },
    "regex": {
      "type": "boolean",
      "description": "If True, uses regex matching instead of exact matching (for string values).",
      "default": false
    },
    "comparison_operator": {
      "type": "string",
      "enum": [
        "==",
        "<",
        ">",
        "<=",
        ">="
      ],
      "description": "Comparison operator for numeric values. Ignored for string values.",
      "default": "=="
    },
    "negative": {
      "type": "boolean",
      "description": "If True, inverts the selection (i.e., selects lines where the match fails).",
      "default": false
    }
  },
  "required": [
    "value"
  ]
}
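
To make the role of `offset` concrete, here is a small sketch for string values only. The token table is invented, with column names taken from the x-eval expressions above (line_id, offset, and a positional attribute such as 'word'). The function `select_by_token` is hypothetical, and treating non-regex matches as whole-token equality is an assumption.

```python
import re

# Toy token table: one dict per token, offset 0 is the node token.
tokens = [
    {"line_id": 0, "offset": -1, "word": "the"},
    {"line_id": 0, "offset": 0, "word": "Bank"},
    {"line_id": 0, "offset": 1, "word": "of"},
    {"line_id": 1, "offset": -1, "word": "river"},
    {"line_id": 1, "offset": 0, "word": "bank"},
    {"line_id": 1, "offset": 1, "word": "erodes"},
]


def select_by_token(tokens, value, tokens_attribute="word", offset=0,
                    case_sensitive=False, regex=False, negative=False):
    """Return the sorted ids of lines whose token at `offset` matches `value`."""
    flags = 0 if case_sensitive else re.IGNORECASE
    all_lines, selected = set(), set()
    for tok in tokens:
        all_lines.add(tok["line_id"])
        if tok["offset"] != offset:
            continue  # only the token at the requested offset is checked
        text = str(tok[tokens_attribute])
        if regex:
            hit = re.search(value, text, flags) is not None
        else:
            # Assumption: without regex, the whole token must equal the value.
            hit = re.fullmatch(re.escape(value), text, flags) is not None
        if hit:
            selected.add(tok["line_id"])
    # `negative` returns the lines where the match failed.
    return sorted(all_lines - selected if negative else selected)


node_matches = select_by_token(tokens, "bank")
right_context = select_by_token(tokens, "^o", offset=1, regex=True)
```

Numeric values, which the algorithm compares with `comparison_operator`, are left out of this sketch.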

Random Sample

Path: flexiconc/algorithms/select_random.py

Description:

Selects a random sample of lines from the concordance, optionally using a seed.

Arguments:

| Name | Type | Description |
| --- | --- | --- |
| sample_size | integer | The number of lines to sample. |
| seed | ['integer'] | An optional seed for random number generation. |
Full JSON schema:
{
  "type": "object",
  "properties": {
    "sample_size": {
      "type": "integer",
      "description": "The number of lines to sample.",
      "minimum": 1,
      "x-eval": "dict(maximum=node.line_count)"
    },
    "seed": {
      "type": [
        "integer"
      ],
      "description": "An optional seed for random number generation.",
      "default": 42
    }
  },
  "required": [
    "sample_size",
    "seed"
  ]
}
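
The effect of the seed can be shown with Python's standard `random` module. This is a sketch of seeded sampling without replacement in general; whether FlexiConc uses this exact generator is not stated here, and `random_sample` is a hypothetical helper.

```python
import random


def random_sample(line_ids, sample_size, seed=42):
    """Draw `sample_size` distinct line ids; the same seed gives the same sample."""
    rng = random.Random(seed)  # private generator, global random state is untouched
    return sorted(rng.sample(list(line_ids), sample_size))


first_draw = random_sample(range(100), 10)
second_draw = random_sample(range(100), 10)          # identical to first_draw
other_draw = random_sample(range(100), 10, seed=7)   # drawn with a different seed
```

`random.Random.sample` raises `ValueError` when `sample_size` exceeds the number of available lines, which corresponds to the schema's upper bound of `node.line_count`.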

Set Operation

Path: flexiconc/algorithms/select_set_operation.py

Description:

Performs set operations (union, intersection, difference, disjunctive union, complement) on selected lines from specified nodes in the analysis tree.

Arguments:

| Name | Type | Description |
| --- | --- | --- |
| operation_type | string | The type of set operation to perform: 'union', 'intersection', 'difference', 'disjunctive union', or 'complement'. |
| nodes | array | A list of nodes to retrieve selected lines from. |
Full JSON schema:
{
  "type": "object",
  "properties": {
    "operation_type": {
      "type": "string",
      "enum": [
        "union",
        "intersection",
        "difference",
        "disjunctive union",
        "complement"
      ],
      "description": "The type of set operation to perform: 'union', 'intersection', 'difference', 'disjunctive union', or 'complement'."
    },
    "nodes": {
      "type": "array",
      "items": {},
      "description": "A list of nodes to retrieve selected lines from."
    }
  },
  "required": [
    "operation_type",
    "nodes"
  ]
}
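
The five operation types map naturally onto Python's built-in set operations. The sketch below works on plain sets of line ids rather than analysis-tree nodes. The function `set_operation` is hypothetical, and two points are assumptions: 'difference' subtracts all later sets from the first, and 'complement' is taken relative to an explicit universe of all line ids.

```python
def set_operation(operation_type, node_lines, all_lines=None):
    """Combine the selected line ids of several nodes.

    node_lines: list of sets of line ids, one per node (hypothetical input shape).
    all_lines:  universe used for 'complement' (an assumption of this sketch).
    """
    first = set(node_lines[0])
    rest = [set(lines) for lines in node_lines[1:]]
    if operation_type == "union":
        return first.union(*rest)
    if operation_type == "intersection":
        return first.intersection(*rest)
    if operation_type == "difference":
        # Lines of the first node that appear in none of the others.
        return first.difference(*rest)
    if operation_type == "disjunctive union":
        # Symmetric difference, folded left to right over the nodes.
        result = first
        for other in rest:
            result = result ^ other
        return result
    if operation_type == "complement":
        # Everything in the universe that none of the given nodes selected.
        return set(all_lines) - first.union(*rest)
    raise ValueError(f"unknown operation_type: {operation_type!r}")


a, b = {1, 2, 3}, {3, 4}
both = set_operation("intersection", [a, b])
either_not_both = set_operation("disjunctive union", [a, b])
unselected = set_operation("complement", [a, b], all_lines=range(6))
```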

Select Slot

Path: flexiconc/algorithms/select_slot.py

Description:

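Selects the match slot to work with. Valid identifiers are the distinct values found in the 'slot' column of the concordance's matches table.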

Arguments:

| Name | Type | Description |
| --- | --- | --- |
| slot_id | integer | The slot identifier to select. |
Full JSON schema:
{
  "type": "object",
  "properties": {
    "slot_id": {
      "type": "integer",
      "description": "The slot identifier to select.",
      "x-eval": "dict(enum=list(set(conc.matches['slot'])))"
    }
  },
  "required": [
    "slot_id"
  ]
}

Select Weighted Sample by Metadata

Path: flexiconc/algorithms/select_weighted_sample_by_metadata.py

Description:

Selects a weighted sample of lines based on the distribution of a specified metadata attribute.

Arguments:

| Name | Type | Description |
| --- | --- | --- |
| metadata_attribute | string | The metadata attribute to stratify by (e.g., 'text_id', 'speaker'). |
| sample_size | integer | The total number of lines to sample. |
| seed | ['integer'] | Random seed for reproducibility. |
Full JSON schema:
{
  "type": "object",
  "properties": {
    "metadata_attribute": {
      "type": "string",
      "description": "The metadata attribute to stratify by (e.g., 'text_id', 'speaker').",
      "x-eval": "dict(enum=list(set(conc.metadata.columns) - {'line_id'}))"
    },
    "sample_size": {
      "type": "integer",
      "description": "The total number of lines to sample.",
      "minimum": 1,
      "x-eval": "dict(maximum=node.line_count)"
    },
    "seed": {
      "type": [
        "integer"
      ],
      "description": "Random seed for reproducibility.",
      "default": 42
    }
  },
  "required": [
    "metadata_attribute",
    "sample_size",
    "seed"
  ]
}
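
One plausible reading of "weighted by the distribution of a metadata attribute" is proportional stratified sampling: each attribute value contributes lines in proportion to its share of the concordance. The sketch below implements that reading with largest-remainder rounding. It is an assumption about the allocation rule, not a description of FlexiConc's code, and `weighted_sample` and the toy metadata are hypothetical.

```python
import random
from collections import defaultdict


def weighted_sample(metadata, metadata_attribute, sample_size, seed=42):
    """Sample line ids so each attribute value keeps roughly its original share."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for line_id, row in metadata.items():
        groups[row[metadata_attribute]].append(line_id)
    total = sum(len(ids) for ids in groups.values())

    # Ideal fractional quota per group, then its integer part.
    quotas = {g: sample_size * len(ids) / total for g, ids in groups.items()}
    counts = {g: int(q) for g, q in quotas.items()}

    # Hand the remaining slots to the groups with the largest fractional parts.
    leftover = sample_size - sum(counts.values())
    by_remainder = sorted(quotas, key=lambda g: quotas[g] - counts[g], reverse=True)
    for g in by_remainder[:leftover]:
        counts[g] += 1

    sample = []
    for g, ids in groups.items():
        sample.extend(rng.sample(ids, counts[g]))
    return sorted(sample)


# Toy metadata: 6 lines by speaker A, 3 by B, 1 by C.
speakers = ["A"] * 6 + ["B"] * 3 + ["C"]
metadata = {i: {"speaker": s} for i, s in enumerate(speakers)}
sample = weighted_sample(metadata, "speaker", sample_size=5)
```

With this toy data the quotas are 3.0, 1.5 and 0.5, so the sample holds three lines from A and two from B; the tie between B and C for the last slot is broken by group order, which is an arbitrary choice of this sketch.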