Selecting Algorithms
Select by Metadata Attribute
Path: flexiconc/algorithms/select_by_metadata_attribute.py
Description:
Selects lines by comparing a specified metadata attribute against a target value. If the target value is a list, a line matches when the attribute equals any element of the list. For a single numeric value, a comparison operator (==, <, <=, >, >=) can be specified. For strings, only equality is supported, optionally as a regex match and optionally case-sensitive.
Arguments:
Name | Type | Description |
---|---|---|
metadata_attribute | string | The metadata attribute to filter on. |
value | ['string', 'number', 'array'] | The value to compare against, or a list of acceptable values. When a list is provided, only equality is used. |
operator | string | The comparison operator for numeric comparisons. Only allowed for single numeric values. Default is '=='. |
regex | boolean | If True, use regex matching for string comparisons (only with equality). Default is False. |
case_sensitive | boolean | If True, perform case-sensitive matching for strings. Default is False. |
negative | boolean | If True, invert the selection. Default is False. |
Full JSON schema:
{
"type": "object",
"properties": {
"metadata_attribute": {
"type": "string",
"description": "The metadata attribute to filter on.",
"x-eval": "dict(enum=list(set(conc.metadata.columns) - {'line_id'}))"
},
"value": {
"type": [
"string",
"number",
"array"
],
"description": "The value to compare against, or a list of acceptable values. When a list is provided, only equality is used."
},
"operator": {
"type": "string",
"enum": [
"==",
"<",
"<=",
">",
">="
],
"description": "The comparison operator for numeric comparisons. Only allowed for single numeric values. Default is '=='.",
"default": "=="
},
"regex": {
"type": "boolean",
"description": "If True, use regex matching for string comparisons (only with equality). Default is False.",
"default": false
},
"case_sensitive": {
"type": "boolean",
"description": "If True, perform case-sensitive matching for strings. Default is False.",
"default": false
},
"negative": {
"type": "boolean",
"description": "If True, invert the selection. Default is False.",
"default": false
}
},
"required": [
"metadata_attribute",
"value"
]
}
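The matching rules above can be sketched in plain Python. This is a minimal illustration of the documented semantics, not the library's implementation: the `METADATA` rows, the helper names `cell_matches` and `select_by_metadata`, and the choice of a full-string regex match are all assumptions made for the example.

```python
import operator as op
import re

# Hypothetical stand-in for the concordance metadata: one dict per line.
METADATA = [
    {"line_id": 0, "speaker": "Alice", "year": 1999},
    {"line_id": 1, "speaker": "bob", "year": 2005},
    {"line_id": 2, "speaker": "Bob", "year": 2012},
]

OPERATORS = {"==": op.eq, "<": op.lt, "<=": op.le, ">": op.gt, ">=": op.ge}


def cell_matches(cell, value, operator, regex, case_sensitive):
    """Return True if a single metadata cell satisfies the condition."""
    if isinstance(value, list):
        # Lists: membership via equality; operator and regex are not applied.
        return any(cell_matches(cell, v, "==", False, case_sensitive) for v in value)
    if isinstance(value, str):
        # Strings: equality only, optionally as a regex (assumed: full match).
        flags = 0 if case_sensitive else re.IGNORECASE
        pattern = value if regex else re.escape(value)
        return re.fullmatch(pattern, str(cell), flags) is not None
    # Single numbers: apply the requested comparison operator.
    return OPERATORS[operator](cell, value)


def select_by_metadata(rows, metadata_attribute, value, operator="==",
                       regex=False, case_sensitive=False, negative=False):
    """Return line_ids that match, or that do not match if negative is True."""
    return [
        row["line_id"]
        for row in rows
        if cell_matches(row[metadata_attribute], value, operator,
                        regex, case_sensitive) != negative
    ]


print(select_by_metadata(METADATA, "speaker", "bob"))             # case-insensitive by default
print(select_by_metadata(METADATA, "year", 2005, operator=">="))  # numeric comparison
print(select_by_metadata(METADATA, "year", [1999, 2012]))         # list membership
```

The keyword arguments mirror the schema, so a call such as `select_by_metadata(METADATA, "speaker", "bob", negative=True)` corresponds to the argument object `{"metadata_attribute": "speaker", "value": "bob", "negative": true}`.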
Select by Rank
Path: flexiconc/algorithms/select_rank_wrapper.py
Description:
Selects lines by comparing their rank values against a numeric value. Rank values are taken from the ranking keys in ordering_result['rank_keys'] of the current node; the first ranking key is used by default.
Arguments:
Name | Type | Description |
---|---|---|
ranking_column | string | The ranking column to use for selection. Defaults to the first ranking key. |
comparison_operator | string | The comparison operator to use for the ranking scores (==, <=, >=, <, >). Default is '=='. |
value | number | The numeric value to compare the ranking scores against. Default is 0. |
Full JSON schema:
{
"type": "object",
"properties": {
"ranking_column": {
"type": "string",
"description": "The ranking column to use for selection.",
"x-eval": "dict(enum=[f'{x}: {node.algorithms[\"ordering\"][x][\"algorithm_name\"]}' for x in list(node.ordering_result['rank_keys'])], default=[f'{x}: {node.algorithms[\"ordering\"][x][\"algorithm_name\"]}' for x in list(node.ordering_result['rank_keys'])][0])"
},
"comparison_operator": {
"type": "string",
"enum": [
"==",
"<=",
">=",
"<",
">"
],
"description": "The comparison operator to use for the ranking scores.",
"default": "=="
},
"value": {
"type": "number",
"description": "The numeric value to compare the ranking scores against.",
"default": 0
}
},
"required": []
}
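As a rough sketch of the comparison described above, the following assumes that the rank keys map each ranking key to a per-line dictionary of rank values. That layout, the sample data, and the function name `select_by_rank` are assumptions for illustration; the actual structure of `ordering_result['rank_keys']` may differ.

```python
import operator as op

OPERATORS = {"==": op.eq, "<=": op.le, ">=": op.ge, "<": op.lt, ">": op.gt}

# Hypothetical rank keys: ranking key -> {line_id: rank value}.
RANK_KEYS = {
    0: {10: 0, 11: 2, 12: 1},
    1: {10: 5, 11: 5, 12: 7},
}


def select_by_rank(rank_keys, ranking_column=None, comparison_operator="==", value=0):
    """Return line_ids whose rank value satisfies the comparison."""
    if ranking_column is None:
        # Default: use the first ranking key.
        ranking_column = next(iter(rank_keys))
    compare = OPERATORS[comparison_operator]
    return [
        line_id
        for line_id, rank in rank_keys[ranking_column].items()
        if compare(rank, value)
    ]


print(select_by_rank(RANK_KEYS))                                    # first key, rank == 0
print(select_by_rank(RANK_KEYS, comparison_operator="<=", value=1)) # first key, rank <= 1
print(select_by_rank(RANK_KEYS, ranking_column=1, comparison_operator=">", value=5))
```

Note that the schema's `ranking_column` values are strings of the form `key: algorithm_name`; the sketch uses the bare key for brevity.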
Select by Token-Level Numeric Attribute
Path: flexiconc/algorithms/select_by_token_numeric_value.py
Description:
Selects lines based on a token-level attribute using numeric comparison at a given offset. If a list is provided for 'value', only equality comparison is performed.
Arguments:
Name | Type | Description |
---|---|---|
value | ['number', 'array'] | The numeric value(s) to compare against. If a list is provided, only equality comparison is supported. Default is 0. |
tokens_attribute | string | The token-level attribute to check. |
offset | integer | The token offset to check. Default is 0. |
comparison_operator | string | The comparison operator to use for numeric values (==, <, >, <=, >=). Ignored if 'value' is a list. Default is '=='. |
negative | boolean | If True, inverts the selection. Default is False. |
Full JSON schema:
{
"type": "object",
"properties": {
"value": {
"type": [
"number",
"array"
],
"items": {
"type": "number"
},
"description": "The numeric value(s) to compare against. If a list is provided, only equality comparison is supported.",
"default": 0
},
"tokens_attribute": {
"type": "string",
"description": "The token-level attribute to check.",
"x-eval": "dict(enum=[col for col in list(conc.tokens.columns) if col not in {'id_in_line', 'line_id', 'offset'} and ('int' in str(conc.tokens[col].dtype) or 'float' in str(conc.tokens[col].dtype))])"
},
"offset": {
"type": "integer",
"description": "The token offset to check.",
"default": 0,
"x-eval": "dict(minimum=min(conc.tokens['offset']), maximum=max(conc.tokens['offset']))"
},
"comparison_operator": {
"type": "string",
"enum": [
"==",
"<",
">",
"<=",
">="
],
"description": "The comparison operator to use for numeric values. Ignored if 'value' is a list.",
"default": "=="
},
"negative": {
"type": "boolean",
"description": "If True, inverts the selection.",
"default": false
}
},
"required": [
"value",
"tokens_attribute"
]
}
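The offset-based numeric test can be illustrated with a small self-contained sketch. The `TOKENS` rows, the `freq` attribute, and the function name `select_by_token_numeric` are hypothetical; only the argument names and the operator set come from the schema.

```python
import operator as op

OPERATORS = {"==": op.eq, "<": op.lt, ">": op.gt, "<=": op.le, ">=": op.ge}

# Hypothetical token table: one dict per token, with a numeric 'freq' attribute.
TOKENS = [
    {"line_id": 0, "offset": -1, "freq": 120},
    {"line_id": 0, "offset": 0, "freq": 15},
    {"line_id": 1, "offset": -1, "freq": 3},
    {"line_id": 1, "offset": 0, "freq": 15},
]


def select_by_token_numeric(tokens, tokens_attribute, value, offset=0,
                            comparison_operator="==", negative=False):
    """Return line_ids whose token at the given offset passes the numeric test."""
    if isinstance(value, list):
        # A list means equality against any element; the operator is ignored.
        def passes(x):
            return x in value
    else:
        def passes(x):
            return OPERATORS[comparison_operator](x, value)

    all_ids = sorted({t["line_id"] for t in tokens})
    hits = {
        t["line_id"]
        for t in tokens
        if t["offset"] == offset and passes(t[tokens_attribute])
    }
    # negative=True keeps exactly the lines that were not hit.
    return [line_id for line_id in all_ids if (line_id in hits) != negative]


print(select_by_token_numeric(TOKENS, "freq", 100, offset=-1, comparison_operator=">"))
print(select_by_token_numeric(TOKENS, "freq", [3, 120], offset=-1))
```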
Select by Token-Level String Attribute
Path: flexiconc/algorithms/select_by_token_string.py
Description:
Selects lines based on a token-level attribute (string matching) at a given offset. Supports regex and case sensitivity. The search_terms argument is a list of strings to match against.
Arguments:
Name | Type | Description |
---|---|---|
search_terms | array | The list of string values to match against. |
tokens_attribute | string | The token attribute to check (e.g., 'word'). Default is 'word'. |
offset | integer | The token offset to check. Default is 0. |
case_sensitive | boolean | If True, performs a case-sensitive match. Default is False. |
regex | boolean | If True, uses regex matching. Default is False. |
negative | boolean | If True, inverts the selection. Default is False. |
Full JSON schema:
{
"type": "object",
"properties": {
"search_terms": {
"type": "array",
"items": {
"type": "string"
},
"description": "The list of string values to match against.",
"default": []
},
"tokens_attribute": {
"type": "string",
"description": "The token attribute to check (e.g., 'word').",
"default": "word",
"x-eval": "dict(enum=list(set(conc.tokens.columns) - {'id_in_line', 'line_id', 'offset'}))"
},
"offset": {
"type": "integer",
"description": "The token offset to check.",
"default": 0,
"x-eval": "dict(minimum=min(conc.tokens['offset']), maximum=max(conc.tokens['offset']))"
},
"case_sensitive": {
"type": "boolean",
"description": "If True, performs a case-sensitive match.",
"default": false
},
"regex": {
"type": "boolean",
"description": "If True, uses regex matching.",
"default": false
},
"negative": {
"type": "boolean",
"description": "If True, inverts the selection.",
"default": false
}
},
"required": [
"search_terms"
]
}
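A minimal sketch of string matching at a token offset follows. The token rows and the function name `select_by_token_string` are hypothetical, and the sketch assumes that a term (literal or regex) must match the whole token; the library may instead use a partial match.

```python
import re

# Hypothetical token table: one dict per token.
TOKENS = [
    {"line_id": 0, "offset": 0, "word": "Run"},
    {"line_id": 0, "offset": 1, "word": "fast"},
    {"line_id": 1, "offset": 0, "word": "running"},
    {"line_id": 1, "offset": 1, "word": "shoes"},
    {"line_id": 2, "offset": 0, "word": "walk"},
    {"line_id": 2, "offset": 1, "word": "away"},
]


def select_by_token_string(tokens, search_terms, tokens_attribute="word", offset=0,
                           case_sensitive=False, regex=False, negative=False):
    """Return line_ids whose token at the given offset matches any search term."""
    flags = 0 if case_sensitive else re.IGNORECASE
    # Literal terms are escaped so that both modes share one matching path.
    patterns = [
        re.compile(term if regex else re.escape(term), flags)
        for term in search_terms
    ]
    all_ids = sorted({t["line_id"] for t in tokens})
    hits = {
        t["line_id"]
        for t in tokens
        if t["offset"] == offset
        and any(p.fullmatch(str(t[tokens_attribute])) for p in patterns)
    }
    return [line_id for line_id in all_ids if (line_id in hits) != negative]


print(select_by_token_string(TOKENS, ["run"]))                # literal, case-insensitive
print(select_by_token_string(TOKENS, ["run.*"], regex=True))  # regex on the node token
print(select_by_token_string(TOKENS, ["fast", "away"], offset=1))
```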
Manual Line Selection
Path: flexiconc/algorithms/select_manual.py
Description:
Manually selects lines into a subset by specifying line IDs or groups (partitions or clusters) from the active node's grouping result. The resulting selection is restricted to allowed lines.
Arguments:
Name | Type | Description |
---|---|---|
line_ids | array | A list of specific line IDs to include in the subset. |
groups | array | A list of group identifiers (by label or number) to include lines from. For clusters, groups may be nested, and all matching groups in the hierarchy will be used. |
Full JSON schema:
{
"type": "object",
"properties": {
"line_ids": {
"type": "array",
"items": {
"type": "integer"
},
"description": "A list of specific line IDs to include in the subset."
},
"groups": {
"type": "array",
"items": {
"type": [
"string",
"integer"
]
},
"description": "A list of group identifiers (by label or number) to include lines from. For clusters, groups may be nested, and all matching groups in the hierarchy will be used."
}
},
"required": []
}
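The following sketch shows one way such a selection could be computed. Several points are assumptions made for illustration: the grouping result is modelled as nested dictionaries with `id`, `label`, `line_ids`, and `children` fields, explicit line IDs and group lines are combined as a union, and "allowed lines" is taken to mean the lines available in the active node. None of these details is confirmed by the description above.

```python
# Hypothetical grouping result: nested groups, each listing its own line IDs.
GROUPING = [
    {
        "id": 0, "label": "A", "line_ids": [0, 1, 2],
        "children": [{"id": 1, "label": "A1", "line_ids": [0, 1], "children": []}],
    },
    {"id": 2, "label": "B", "line_ids": [3, 4], "children": []},
]


def select_manual(allowed_line_ids, grouping=(), line_ids=(), groups=()):
    """Union of explicit line IDs and lines of matching groups, limited to allowed lines."""
    wanted = set(groups)
    selected = set(line_ids)

    def visit(group):
        # A group matches by label or by number, at any depth of the hierarchy.
        if group["label"] in wanted or group["id"] in wanted:
            selected.update(group["line_ids"])
        for child in group.get("children", []):
            visit(child)

    for group in grouping:
        visit(group)
    # Drop anything outside the allowed lines.
    return sorted(selected & set(allowed_line_ids))


print(select_manual(range(5), GROUPING, line_ids=[3], groups=["A1"]))
print(select_manual([0, 1, 3, 4], GROUPING, line_ids=[2, 9], groups=["A1", 2]))
```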
Random Sample
Path: flexiconc/algorithms/select_random.py
Description:
Selects a random sample of lines from the concordance, optionally using a seed.
Arguments:
Name | Type | Description |
---|---|---|
sample_size | integer | The number of lines to sample. |
seed | integer | The seed for random number generation. Default is 42. |
Full JSON schema:
{
"type": "object",
"properties": {
"sample_size": {
"type": "integer",
"description": "The number of lines to sample.",
"minimum": 1,
"x-eval": "dict(maximum=node.line_count)"
},
"seed": {
"type": "integer",
"description": "The seed for random number generation.",
"default": 42
}
},
"required": [
"sample_size"
]
}
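Seeded sampling makes the selection reproducible: the same seed and the same input lines give the same sample. The sketch below uses Python's standard `random` module and assumes sampling without replacement; the function name `select_random` is hypothetical, and the library's random number generator may produce different samples for the same seed.

```python
import random


def select_random(line_ids, sample_size, seed=42):
    """Draw sample_size distinct line IDs using a dedicated, seeded generator."""
    rng = random.Random(seed)  # local generator, so global random state is untouched
    return sorted(rng.sample(list(line_ids), sample_size))


line_ids = list(range(100))
first = select_random(line_ids, sample_size=10)
second = select_random(line_ids, sample_size=10)
print(first == second)  # same seed, same sample
```

As in the schema, `sample_size` must be at least 1 and no larger than the number of available lines; `random.Random.sample` raises `ValueError` if the sample is larger than the population.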
Select Slot
Path: flexiconc/algorithms/select_slot.py
Description:
Selects the match slot to work with. Valid slot identifiers are the values in the 'slot' column of conc.matches.
Arguments:
Name | Type | Description |
---|---|---|
slot_id | integer | The slot identifier to select. |
Full JSON schema:
{
"type": "object",
"properties": {
"slot_id": {
"type": "integer",
"description": "The slot identifier to select.",
"x-eval": "dict(enum=list(set(conc.matches['slot'])))"
}
},
"required": [
"slot_id"
]
}
Select Weighted Sample by Metadata
Path: flexiconc/algorithms/select_weighted_sample_by_metadata.py
Description:
Selects a weighted sample of lines based on the distribution of a specified metadata attribute.
Arguments:
Name | Type | Description |
---|---|---|
metadata_attribute | string | The metadata attribute to stratify by (e.g., 'text_id', 'speaker'). |
sample_size | integer | The total number of lines to sample. |
seed | ['integer'] | An optional seed for generating the pseudo-random order. Default is 42. |
Full JSON schema:
{
"type": "object",
"properties": {
"metadata_attribute": {
"type": "string",
"description": "The metadata attribute to stratify by (e.g., 'text_id', 'speaker').",
"x-eval": "dict(enum=list(set(conc.metadata.columns) - {'line_id'}))"
},
"sample_size": {
"type": "integer",
"description": "The total number of lines to sample.",
"minimum": 1,
"x-eval": "dict(maximum=node.line_count)"
},
"seed": {
"type": [
"integer"
],
"description": "An optional seed for generating the pseudo-random order.",
"default": 42
}
},
"required": [
"metadata_attribute",
"sample_size"
]
}
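The description does not spell out how the weights are derived. One plausible reading, sketched below, is proportional stratified sampling: each value of the metadata attribute contributes lines in proportion to its share of the concordance, with rounding handled by largest-remainder allocation. This is an assumption, not the library's documented method, and the function name `stratified_sample` and the sample rows are hypothetical.

```python
import random
from collections import defaultdict


def stratified_sample(rows, metadata_attribute, sample_size, seed=42):
    """Proportional stratified sample of line_ids (largest-remainder rounding)."""
    rng = random.Random(seed)

    # Group line IDs by the value of the metadata attribute.
    strata = defaultdict(list)
    for row in rows:
        strata[row[metadata_attribute]].append(row["line_id"])

    # Ideal (fractional) share of the sample for each stratum.
    total = len(rows)
    quotas = {key: sample_size * len(ids) / total for key, ids in strata.items()}

    # Round down, then hand the remaining slots to the largest remainders.
    allocation = {key: int(quota) for key, quota in quotas.items()}
    leftover = sample_size - sum(allocation.values())
    by_remainder = sorted(quotas, key=lambda k: quotas[k] - allocation[k], reverse=True)
    for key in by_remainder[:leftover]:
        allocation[key] += 1

    sample = []
    for key, ids in strata.items():
        sample.extend(rng.sample(ids, allocation[key]))
    return sorted(sample)


# Hypothetical metadata: lines 0-5 belong to speaker A, lines 6-9 to speaker B.
ROWS = [{"line_id": i, "speaker": "A" if i < 6 else "B"} for i in range(10)]
print(stratified_sample(ROWS, "speaker", sample_size=5))
```

With six lines from speaker A and four from speaker B, a sample of five contains three A lines and two B lines under this allocation; which specific lines are drawn depends on the seed.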