Selecting Algorithms
Select by Metadata Attribute
Path: flexiconc/algorithms/select_by_metadata_attribute.py
Description:
Selects lines by comparing a specified metadata attribute against a target value. If the target value is a list, a line matches when its attribute equals any element of the list. For a single numeric value, a comparison operator (==, <, <=, >, >=) can be specified. For strings, only equality is supported, with optional regex matching and case sensitivity.
Arguments:
Name | Type | Description |
---|---|---|
metadata_attribute | string | The metadata attribute to filter on. |
value | ['string', 'number', 'array'] | The value to compare against, or a list of acceptable values. When a list is provided, only equality is used. |
operator | string | The comparison operator for numeric comparisons. Only allowed for single numeric values. Default is '=='. |
regex | boolean | If True, use regex matching for string comparisons (only with equality). Default is False. |
case_sensitive | boolean | If True, perform case-sensitive matching for strings. Default is False. |
negative | boolean | If True, invert the selection. Default is False. |
Full JSON schema:
{
"type": "object",
"properties": {
"metadata_attribute": {
"type": "string",
"description": "The metadata attribute to filter on.",
"x-eval": "dict(enum=list(set(conc.metadata.columns) - {'line_id'}))"
},
"value": {
"type": [
"string",
"number",
"array"
],
"description": "The value to compare against, or a list of acceptable values. When a list is provided, only equality is used."
},
"operator": {
"type": "string",
"enum": [
"==",
"<",
"<=",
">",
">="
],
"description": "The comparison operator for numeric comparisons. Only allowed for single numeric values. Default is '=='.",
"default": "=="
},
"regex": {
"type": "boolean",
"description": "If True, use regex matching for string comparisons (only with equality). Default is False.",
"default": false
},
"case_sensitive": {
"type": "boolean",
"description": "If True, perform case-sensitive matching for strings. Default is False.",
"default": false
},
"negative": {
"type": "boolean",
"description": "If True, invert the selection. Default is False.",
"default": false
}
},
"required": [
"metadata_attribute",
"value"
]
}
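The matching rules described above can be illustrated with a minimal, self-contained sketch. It is not FlexiConc's implementation: the helper names (`matches`, `select_by_metadata`), the `line_id -> row` dictionary layout, and the choice of `re.search` for regex matching are all assumptions made for illustration.

```python
import operator
import re

# Hypothetical helpers for illustration only; not FlexiConc's code.
NUMERIC_OPS = {"==": operator.eq, "<": operator.lt, "<=": operator.le,
               ">": operator.gt, ">=": operator.ge}

def matches(cell, value, op="==", regex=False, case_sensitive=False):
    """Return True if one metadata cell satisfies the condition.

    `op` stands in for the `operator` argument (renamed here to avoid
    shadowing the standard-library module).
    """
    if isinstance(value, list):
        # A list is tested by membership, using equality only.
        return cell in value
    if isinstance(value, (int, float)) and not isinstance(value, bool):
        # A single numeric value may use any of the comparison operators.
        return NUMERIC_OPS[op](cell, value)
    # Strings: equality or regex, ignoring case unless case_sensitive is set.
    if regex:
        flags = 0 if case_sensitive else re.IGNORECASE
        return re.search(value, str(cell), flags) is not None  # assumption: search, not fullmatch
    if case_sensitive:
        return str(cell) == value
    return str(cell).lower() == value.lower()

def select_by_metadata(metadata, attribute, value, negative=False, **kwargs):
    """`metadata` maps line_id -> dict of attribute values (assumed layout)."""
    return [line_id for line_id, row in metadata.items()
            if matches(row[attribute], value, **kwargs) != negative]

metadata = {
    0: {"year": 1999, "genre": "News"},
    1: {"year": 2005, "genre": "fiction"},
    2: {"year": 2012, "genre": "news"},
}

recent = select_by_metadata(metadata, "year", 2005, op=">=")                  # [1, 2]
news = select_by_metadata(metadata, "genre", "news")                          # [0, 2]
not_fiction = select_by_metadata(metadata, "genre", ["fiction"], negative=True)  # [0, 2]
```

Note that with the default `case_sensitive=False`, "News" and "news" are treated as the same value.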
Select by Rank
Path: flexiconc/algorithms/select_rank_wrapper.py
Description:
Selects lines based on rank values obtained from a selected 'algo_*' key in the ordering_result['rank_keys'] of the active_node, using a comparison operator and value.
Arguments:
Name | Type | Description |
---|---|---|
algo_key | string | The specific algorithm key from rank_keys available at the current node. Allowed values have the form 'algo_N'; the recommended value is usually 'algo_0', i.e. the top-level ranking. |
comparison_operator | string | The comparison operator to use for rank values. |
value | number | The value to compare the rank keys against. |
Full JSON schema:
{
"type": "object",
"properties": {
"algo_key": {
"type": "string",
"description": "The specific algorithm key from rank_keys available at the current node.Allowed values have the form 'algo_N', and the recommended value is most often 'algo_0', i.e. the top-level ranking.",
"x-eval": "dict(enum=sorted([key for key in active_node.ordering_result['rank_keys'].keys() if key.startswith('algo_')], key=lambda k: int(k.split('_')[1])), default=sorted([key for key in active_node.ordering_result['rank_keys'].keys() if key.startswith('algo_')], key=lambda k: int(k.split('_')[1]))[0])"
},
"comparison_operator": {
"type": "string",
"enum": [
"==",
"<=",
">=",
"<",
">"
],
"description": "The comparison operator to use for rank values.",
"default": "=="
},
"value": {
"type": "number",
"description": "The value to compare the rank keys against.",
"default": 0
}
},
"required": []
}
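As a rough illustration of the selection logic, the sketch below filters lines by their rank under one `algo_N` key. It assumes that `rank_keys` maps each algorithm key to a `line_id -> rank` dictionary; that layout and the function names are hypothetical. The default key is chosen the same way as in the schema's `x-eval` expression, by sorting on the numeric suffix.

```python
import operator

OPS = {"==": operator.eq, "<=": operator.le, ">=": operator.ge,
       "<": operator.lt, ">": operator.gt}

def default_algo_key(rank_keys):
    """Pick the first 'algo_N' key in numeric order, mirroring the x-eval default."""
    keys = [k for k in rank_keys if k.startswith("algo_")]
    return sorted(keys, key=lambda k: int(k.split("_")[1]))[0]

def select_by_rank(rank_keys, algo_key=None, comparison_operator="==", value=0):
    """Return the line ids whose rank under `algo_key` satisfies the comparison."""
    if algo_key is None:
        algo_key = default_algo_key(rank_keys)
    compare = OPS[comparison_operator]
    return [line_id for line_id, rank in rank_keys[algo_key].items()
            if compare(rank, value)]

# Hypothetical ordering result (assumed layout: algo key -> {line_id: rank}).
rank_keys = {
    "algo_0": {10: 0, 11: 2, 12: 1, 13: 0},
    "algo_1": {10: 5, 11: 3, 12: 4, 13: 1},
}

top_ranked = select_by_rank(rank_keys, "algo_0", "==", 0)   # [10, 13]
lower_ranked = select_by_rank(rank_keys, "algo_0", ">=", 1)  # [11, 12]
```

Sorting on the integer suffix matters once there are ten or more keys: a plain string sort would place 'algo_10' before 'algo_2'.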
Select by Sort Keys
Path: flexiconc/algorithms/select_sort_wrapper.py
Description:
Selects lines based on sort keys obtained from the active node's ordering_result['sort_keys'], using a comparison operator and a specified value.
Arguments:
Name | Type | Description |
---|---|---|
comparison_operator | string | The comparison operator to use for sort keys. |
value | number | The value to compare the sort keys against. |
Full JSON schema:
{
"type": "object",
"properties": {
"comparison_operator": {
"type": "string",
"enum": [
"==",
"<=",
">=",
"<",
">"
],
"description": "The comparison operator to use for sort keys.",
"default": "=="
},
"value": {
"type": "number",
"description": "The value to compare the sort keys against.",
"default": 0
}
},
"required": []
}
Select by a Token-Level Attribute
Path: flexiconc/algorithms/select_by_token_attribute.py
Description:
Selects lines based on the specified token-level attribute at a given offset, with optional case-sensitivity, regex matching, or numeric comparison.
Arguments:
Name | Type | Description |
---|---|---|
value | ['string', 'number'] | The value to match against. |
tokens_attribute | string | The positional attribute to check (e.g., 'word'). |
offset | integer | The offset from the concordance node to apply the check. |
case_sensitive | boolean | If True, performs a case-sensitive match (for string values). |
regex | boolean | If True, uses regex matching instead of exact matching (for string values). |
comparison_operator | string | Comparison operator for numeric values. Ignored for string values. |
negative | boolean | If True, inverts the selection (i.e., selects lines where the match fails). |
Full JSON schema:
{
"type": "object",
"properties": {
"value": {
"type": [
"string",
"number"
],
"description": "The value to match against.",
"default": ""
},
"tokens_attribute": {
"type": "string",
"description": "The positional attribute to check (e.g., 'word').",
"default": "word",
"x-eval": "dict(enum=list(set(conc.tokens.columns) - {'id_in_line', 'line_id', 'offset'}))"
},
"offset": {
"type": "integer",
"description": "The offset from the concordance node to apply the check.",
"default": 0,
"x-eval": "dict(minimum=min(conc.tokens['offset']), maximum=max(conc.tokens['offset']))"
},
"case_sensitive": {
"type": "boolean",
"description": "If True, performs a case-sensitive match (for string values).",
"default": false
},
"regex": {
"type": "boolean",
"description": "If True, uses regex matching instead of exact matching (for string values).",
"default": false
},
"comparison_operator": {
"type": "string",
"enum": [
"==",
"<",
">",
"<=",
">="
],
"description": "Comparison operator for numeric values. Ignored for string values.",
"default": "=="
},
"negative": {
"type": "boolean",
"description": "If True, inverts the selection (i.e., selects lines where the match fails).",
"default": false
}
},
"required": [
"value"
]
}
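The role of `offset` is easiest to see on a small example. The sketch below is an illustration under stated assumptions, not the library's code: tokens are represented as a list of dictionaries with `line_id`, `offset`, and one positional attribute, the function name is hypothetical, regex matching uses `re.search`, and numeric comparison is left out for brevity.

```python
import re

def select_by_token_attribute(tokens, value, tokens_attribute="word", offset=0,
                              case_sensitive=False, regex=False, negative=False):
    """Select line ids whose token at `offset` matches `value` (string values only)."""
    flags = 0 if case_sensitive else re.IGNORECASE
    all_lines = {}   # dict used as an insertion-ordered set of line ids
    matched = set()
    for tok in tokens:
        all_lines[tok["line_id"]] = None
        if tok["offset"] != offset:
            continue  # only the token at the requested offset is checked
        cell = str(tok[tokens_attribute])
        if regex:
            hit = re.search(value, cell, flags) is not None
        elif case_sensitive:
            hit = cell == value
        else:
            hit = cell.lower() == value.lower()
        if hit:
            matched.add(tok["line_id"])
    # With negative=True, keep the lines where the match fails.
    return [lid for lid in all_lines if (lid in matched) != negative]

# Three concordance lines around the node "bank" (offset 0).
tokens = [
    {"line_id": 0, "offset": -1, "word": "the"},
    {"line_id": 0, "offset": 0, "word": "bank"},
    {"line_id": 0, "offset": 1, "word": "of"},
    {"line_id": 1, "offset": -1, "word": "River"},
    {"line_id": 1, "offset": 0, "word": "bank"},
    {"line_id": 1, "offset": 1, "word": "erosion"},
    {"line_id": 2, "offset": -1, "word": "The"},
    {"line_id": 2, "offset": 0, "word": "bank"},
    {"line_id": 2, "offset": 1, "word": "was"},
]

preceded_by_the = select_by_token_attribute(tokens, "the", offset=-1)                 # [0, 2]
not_preceded = select_by_token_attribute(tokens, "the", offset=-1, negative=True)     # [1]
followed_by = select_by_token_attribute(tokens, "^(of|was)$", offset=1, regex=True)   # [0, 2]
```

An offset of -1 inspects the token immediately to the left of the node, and 1 the token immediately to its right.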
Random Sample
Path: flexiconc/algorithms/select_random.py
Description:
Selects a random sample of lines from the concordance, optionally using a seed.
Arguments:
Name | Type | Description |
---|---|---|
sample_size | integer | The number of lines to sample. |
seed | ['integer'] | A seed for random number generation, used to make the sample reproducible. Default is 42; the schema lists it as required. |
Full JSON schema:
{
"type": "object",
"properties": {
"sample_size": {
"type": "integer",
"description": "The number of lines to sample.",
"minimum": 1,
"x-eval": "dict(maximum=node.line_count)"
},
"seed": {
"type": [
"integer"
],
"description": "An optional seed for random number generation.",
"default": 42
}
},
"required": [
"sample_size",
"seed"
]
}
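Seeded sampling can be sketched with the standard library alone. The function name and the decision to return the sample in sorted order are assumptions for illustration; the point is only that a fixed seed yields the same sample on every run.

```python
import random

def select_random(line_ids, sample_size, seed=42):
    """Draw `sample_size` distinct line ids, reproducibly for a given seed."""
    rng = random.Random(seed)  # private generator; leaves the global random state alone
    # random.sample raises ValueError if sample_size exceeds the number of lines,
    # which corresponds to the schema's maximum of node.line_count.
    return sorted(rng.sample(list(line_ids), sample_size))

line_ids = range(100)
first = select_random(line_ids, 10, seed=42)
second = select_random(line_ids, 10, seed=42)
# first and second contain the same ten line ids.
```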
Set Operation
Path: flexiconc/algorithms/select_set_operation.py
Description:
Performs set operations (union, intersection, difference, disjunctive union, complement) on selected lines from specified nodes in the analysis tree.
Arguments:
Name | Type | Description |
---|---|---|
operation_type | string | The type of set operation to perform: 'union', 'intersection', 'difference', 'disjunctive union', or 'complement'. |
nodes | array | A list of nodes to retrieve selected lines from. |
Full JSON schema:
{
"type": "object",
"properties": {
"operation_type": {
"type": "string",
"enum": [
"union",
"intersection",
"difference",
"disjunctive union",
"complement"
],
"description": "The type of set operation to perform: 'union', 'intersection', 'difference', 'disjunctive union', or 'complement'."
},
"nodes": {
"type": "array",
"items": {},
"description": "A list of nodes to retrieve selected lines from."
}
},
"required": [
"operation_type",
"nodes"
]
}
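The five operation types correspond to ordinary set operations on line ids, as in the sketch below. The function name and its inputs are hypothetical: each node is represented simply by the list of line ids it selects. Two points are assumptions rather than documented behavior: 'difference' removes all later nodes from the first one, and 'complement' is taken as all lines not selected by any of the given nodes.

```python
from functools import reduce

def set_operation(operation_type, node_selections, all_lines=None):
    """Combine the selected line ids of several nodes (at least one is expected)."""
    sets = [set(selection) for selection in node_selections]
    if operation_type == "union":
        result = set().union(*sets)
    elif operation_type == "intersection":
        result = set.intersection(*sets)
    elif operation_type == "difference":
        # Assumption: first node minus every following node.
        result = sets[0].difference(*sets[1:])
    elif operation_type == "disjunctive union":
        # Symmetric difference, folded left to right over all nodes.
        result = reduce(set.symmetric_difference, sets)
    elif operation_type == "complement":
        # Assumption: lines of the whole concordance not selected by any node.
        result = set(all_lines) - set().union(*sets)
    else:
        raise ValueError(f"unknown operation_type: {operation_type!r}")
    return sorted(result)

node_a = [1, 2, 3, 4]
node_b = [3, 4, 5]

union = set_operation("union", [node_a, node_b])                      # [1, 2, 3, 4, 5]
intersection = set_operation("intersection", [node_a, node_b])        # [3, 4]
difference = set_operation("difference", [node_a, node_b])            # [1, 2]
disjunctive = set_operation("disjunctive union", [node_a, node_b])    # [1, 2, 5]
complement = set_operation("complement", [node_a, node_b], all_lines=range(1, 8))  # [6, 7]
```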
Select Slot
Path: flexiconc/algorithms/select_slot.py
Description:
Selects the slot to work with.
Arguments:
Name | Type | Description |
---|---|---|
slot_id | integer | The slot identifier to select. |
Full JSON schema:
{
"type": "object",
"properties": {
"slot_id": {
"type": "integer",
"description": "The slot identifier to select.",
"x-eval": "dict(enum=list(set(conc.matches['slot'])))"
}
},
"required": [
"slot_id"
]
}
Select Weighted Sample by Metadata
Path: flexiconc/algorithms/select_weighted_sample_by_metadata.py
Description:
Selects a weighted sample of lines based on the distribution of a specified metadata attribute.
Arguments:
Name | Type | Description |
---|---|---|
metadata_attribute | string | The metadata attribute to stratify by (e.g., 'text_id', 'speaker'). |
sample_size | integer | The total number of lines to sample. |
seed | ['integer'] | Random seed for reproducibility. |
Full JSON schema:
{
"type": "object",
"properties": {
"metadata_attribute": {
"type": "string",
"description": "The metadata attribute to stratify by (e.g., 'text_id', 'speaker').",
"x-eval": "dict(enum=list(set(conc.metadata.columns) - {'line_id'}))"
},
"sample_size": {
"type": "integer",
"description": "The total number of lines to sample.",
"minimum": 1,
"x-eval": "dict(maximum=node.line_count)"
},
"seed": {
"type": [
"integer"
],
"description": "Random seed for reproducibility.",
"default": 42
}
},
"required": [
"metadata_attribute",
"sample_size",
"seed"
]
}
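One plausible reading of "weighted by the distribution of a metadata attribute" is proportional stratified sampling: each attribute value contributes lines in proportion to its share of the concordance. The sketch below implements that reading with largest-remainder rounding. It is an assumption about the intended behavior, not a description of FlexiConc's algorithm, and the function name and `line_id -> row` layout are hypothetical.

```python
import random
from collections import defaultdict

def select_weighted_sample(metadata, metadata_attribute, sample_size, seed=42):
    """Proportional stratified sample; `metadata` maps line_id -> dict of attributes.

    Assumes sample_size does not exceed the number of lines.
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for line_id, row in metadata.items():
        strata[row[metadata_attribute]].append(line_id)
    total = sum(len(ids) for ids in strata.values())

    # Ideal fractional quota per stratum, proportional to its share of lines.
    quotas = {key: sample_size * len(ids) / total for key, ids in strata.items()}
    counts = {key: int(quota) for key, quota in quotas.items()}

    # Largest-remainder rounding: hand out the lines lost by truncation.
    leftover = sample_size - sum(counts.values())
    by_remainder = sorted(quotas, key=lambda k: quotas[k] - counts[k], reverse=True)
    for key in by_remainder[:leftover]:
        counts[key] += 1

    sample = []
    for key, ids in strata.items():
        sample.extend(rng.sample(ids, counts[key]))
    return sorted(sample)

# 20 lines: 12 by speaker A, 6 by speaker B, 2 by speaker C.
metadata = {i: {"speaker": s} for i, s in enumerate(["A"] * 12 + ["B"] * 6 + ["C"] * 2)}
sample = select_weighted_sample(metadata, "speaker", 10)
# The sample holds 6 lines from A, 3 from B and 1 from C.
```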