
Ranking Algorithms

Collocation Ranker

Path: flexiconc/algorithms/rank_by_collocations.py

Description:

Ranks lines by the sum (or count) of association-measure scores within a window.

Arguments:

| Name | Type | Description |
| --- | --- | --- |
| `scores_list` | string | Name of a scores resource registered in `conc.resources`. |
| `token_attribute` | string | Token attribute shared by the scores table and the concordance tokens. |
| `score_column` | string | Numeric column in the scores table to use. |
| `top_n` | integer or null | Number of top collocates to take into account. |
| `method` | string | `'sum'` = add up scores, `'count'` = count top-N collocates. |
| `window_start` | integer or null | Lower bound of the token window (inclusive). |
| `window_end` | integer or null | Upper bound of the token window (inclusive). |
| `positive_filter` | object | Only include tokens matching these `{attribute: [values]}` pairs. |
| `negative_filter` | object | Exclude tokens matching these `{attribute: [values]}` pairs. |
| `include_node` | boolean | Include the node token (offset 0) in the window. |
Full JSON schema:
{
  "type": "object",
  "properties": {
    "scores_list": {
      "type": "string",
      "description": "Name of a *scores* resource registered in `conc.resources`.",
      "x-eval": "dict(enum=conc.resources.list('scores'))"
    },
    "token_attribute": {
      "type": "string",
      "description": "Token attribute shared by the scores table and the concordance tokens.",
      "default": "word",
      "x-eval": "dict(enum=list(set(conc.tokens.columns) - {'id_in_line', 'line_id', 'offset'}))"
    },
    "score_column": {
      "type": "string",
      "description": "Numeric column in the scores table to use.",
      "default": "log_local_MI"
    },
    "top_n": {
      "type": [
        "integer",
        "null"
      ],
      "description": "Number of top collocates to take into account.",
      "default": null
    },
    "method": {
      "type": "string",
      "enum": [
        "sum",
        "count"
      ],
      "description": "'sum' = add up scores, 'count' = count top-N collocates.",
      "default": "sum"
    },
    "window_start": {
      "type": [
        "integer",
        "null"
      ],
      "description": "Lower bound of token window (inclusive).",
      "default": null,
      "x-eval": "dict(minimum=min(conc.tokens['offset']))"
    },
    "window_end": {
      "type": [
        "integer",
        "null"
      ],
      "description": "Upper bound of token window (inclusive).",
      "default": null,
      "x-eval": "dict(maximum=max(conc.tokens['offset']))"
    },
    "positive_filter": {
      "type": "object",
      "description": "Only include tokens matching these {attribute: [values]} pairs."
    },
    "negative_filter": {
      "type": "object",
      "description": "Exclude tokens matching these {attribute: [values]} pairs."
    },
    "include_node": {
      "type": "boolean",
      "description": "Include the node token (offset 0) in the window.",
      "default": false
    }
  },
  "required": [
    "scores_list"
  ]
}
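
For orientation, here is a minimal, self-contained sketch of the scoring idea described above, using pandas and made-up data. It is not the code in `rank_by_collocations.py`; the function name, the toy tables, and everything beyond the `line_id`/`offset` token layout implied by the schema are illustrative assumptions.

```python
# Illustrative sketch only: not the implementation in rank_by_collocations.py.
import pandas as pd

# Hypothetical "scores" resource: one association score per collocate word.
scores = pd.DataFrame({
    "word": ["strong", "black", "hot", "instant"],
    "log_local_MI": [3.2, 1.8, 2.5, 0.9],
})

# Concordance tokens for two lines; offset 0 is the node token.
tokens = pd.DataFrame({
    "line_id": [0, 0, 0, 1, 1, 1],
    "offset":  [-1, 0, 1, -1, 0, 1],
    "word":    ["strong", "coffee", "hot", "a", "coffee", "please"],
})

def collocation_score(tokens, scores, score_column="log_local_MI",
                      token_attribute="word", window_start=-5, window_end=5,
                      include_node=False, method="sum", top_n=None):
    """Score each line by summing (or counting) collocate scores in the window."""
    if top_n is not None:
        scores = scores.nlargest(top_n, score_column)
    in_window = tokens[(tokens["offset"] >= window_start)
                       & (tokens["offset"] <= window_end)]
    if not include_node:
        in_window = in_window[in_window["offset"] != 0]
    merged = in_window.merge(scores, on=token_attribute, how="left")
    if method == "count":
        return merged[score_column].notna().groupby(merged["line_id"]).sum()
    return merged.groupby("line_id")[score_column].sum()

print(collocation_score(tokens, scores, window_start=-1, window_end=1))
# line 0 collects the scores for "strong" and "hot" (5.7); line 1 scores 0.0.
```

With `method='count'` and a `top_n` limit, the same merge simply counts how many window tokens appear among the top-N collocates instead of summing their scores.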

KWIC Grouper Ranker

Path: flexiconc/algorithms/rank_kwic_grouper.py

Description:

Ranks lines by the number of tokens matching the given search terms in a specified token attribute within a window.

Arguments:

| Name | Type | Description |
| --- | --- | --- |
| `search_terms` | array | A list of terms to search for within the tokens. |
| `tokens_attribute` | string | The token attribute to search within (e.g., 'word'). |
| `mode` | string | Matching strategy for `search_terms`: 'literal', 'regex', or 'cqp'. |
| `case_sensitive` | boolean | If True, the search is case-sensitive. |
| `include_node` | boolean | If True, include node-level tokens in the search. |
| `window_start` | integer | The lower bound of the window (offset range). |
| `window_end` | integer | The upper bound of the window (offset range). |
| `count_types` | boolean | If True, count unique types within each line; otherwise, count all matches. |
Full JSON schema:
{
  "type": "object",
  "properties": {
    "search_terms": {
      "type": "array",
      "items": {
        "type": "string"
      },
      "description": "A list of terms to search for within the tokens."
    },
    "tokens_attribute": {
      "type": "string",
      "description": "The token attribute to search within (e.g., 'word').",
      "default": "word",
      "x-eval": "dict(enum=list(set(conc.tokens.columns) - {'id_in_line', 'line_id', 'offset'}))"
    },
    "mode": {
      "type": "string",
      "enum": [
        "literal",
        "regex",
        "cqp"
      ],
      "description": "Matching strategy for search_terms",
      "default": "literal"
    },
    "case_sensitive": {
      "type": "boolean",
      "description": "If True, the search is case-sensitive.",
      "default": false
    },
    "include_node": {
      "type": "boolean",
      "description": "If True, include node-level tokens in the search.",
      "default": false
    },
    "window_start": {
      "type": "integer",
      "description": "The lower bound of the window (offset range).",
      "x-eval": "dict(minimum=min(conc.tokens['offset']))"
    },
    "window_end": {
      "type": "integer",
      "description": "The upper bound of the window (offset range).",
      "x-eval": "dict(maximum=max(conc.tokens['offset']))"
    },
    "count_types": {
      "type": "boolean",
      "description": "If True, count unique types within each line; otherwise, count all matches.",
      "default": true
    }
  },
  "required": [
    "search_terms"
  ]
}
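
The counting logic can be pictured with a similar toy sketch. Again, this is an illustration rather than the module's actual implementation; the DataFrame layout follows the `line_id`/`offset` columns referenced in the schema's `x-eval` hints, and the data is invented.

```python
# Toy illustration of the counting idea: not flexiconc's own code.
import pandas as pd

tokens = pd.DataFrame({
    "line_id": [0, 0, 0, 0, 1, 1, 1],
    "offset":  [-2, -1, 0, 1, -1, 0, 1],
    "word":    ["Really", "really", "good", "idea", "a", "good", "plan"],
})

def kwic_grouper_score(tokens, search_terms, tokens_attribute="word",
                       case_sensitive=False, include_node=False,
                       window_start=-5, window_end=5, count_types=True):
    """Count matching tokens (or distinct matching types) per line in the window."""
    values = tokens[tokens_attribute]
    terms = set(search_terms)
    if not case_sensitive:
        values = values.str.lower()
        terms = {t.lower() for t in terms}
    mask = (tokens["offset"] >= window_start) & (tokens["offset"] <= window_end)
    if not include_node:
        mask &= tokens["offset"] != 0
    mask &= values.isin(terms)
    matched = values[mask].groupby(tokens.loc[mask, "line_id"])
    return matched.nunique() if count_types else matched.size()

print(kwic_grouper_score(tokens, ["really", "good"], count_types=False))
# line 0 has two matches ("Really", "really"); line 1 has none and is dropped,
# because its only match ("good") is the node token at offset 0, which is
# excluded by default.
```

Setting `count_types=False`, as in the call above, counts every matching token; the default counts distinct matching types per line.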

Rank by Number of Rare Words

Path: flexiconc/algorithms/rank_number_of_rare_words.py

Description:

Ranks lines by the number of rare words they contain.

Arguments:

| Name | Type | Description |
| --- | --- | --- |
| `p_attr` | string | Token attribute to look up in the frequency list. |
| `freq_list` | string | Name of a registered frequency list. |
| `frequency_type` | string | Type of frequency to use: raw frequency ('f'), relative frequency ('rel_f'), or instances per million words ('pmw'). |
| `threshold` | number | Frequency threshold below which tokens count as rare. |
| `rank_threshold` | integer | Rank threshold above which tokens count as rare. |
| `window_start` | integer | Lower bound of the token-offset window (inclusive). |
| `window_end` | integer | Upper bound of the token-offset window (inclusive). |
| `case_sensitive` | boolean | Match tokens against the frequency list case-sensitively. |
| `positive` | boolean | If True, the score is the raw count of rare tokens (more rare words → higher score). If False (default), the score is the negative count, so lines with fewer rare words rank higher. |
| `ignore_attrs` | object | Mapping of token attributes → lists of values to ignore. |
Full JSON schema:
{
  "type": "object",
  "properties": {
    "p_attr": {
      "type": "string",
      "description": "Token attribute to look up in the frequency list",
      "default": "word",
      "x-eval": "dict(enum=list(set(conc.tokens.columns) - {'id_in_line','line_id','offset'}))"
    },
    "freq_list": {
      "type": "string",
      "description": "Name of a registered frequency list",
      "x-eval": "dict(enum=conc.resources.list('frequency_list'))"
    },
    "frequency_type": {
      "type": "string",
      "description": "Type of frequency to use: raw frequency ('f'), relative frequency ('rel_f'), or instances per million words ('pmw').",
      "enum": [
        "f",
        "rel_f",
        "pmw"
      ],
      "default": "pmw"
    },
    "threshold": {
      "type": "number",
      "description": "Frequency threshold below which tokens count as rare"
    },
    "rank_threshold": {
      "type": "integer",
      "description": "Rank threshold above which tokens count as rare"
    },
    "window_start": {
      "type": "integer",
      "description": "Lower bound of the token-offset window (inclusive)"
    },
    "window_end": {
      "type": "integer",
      "description": "Upper bound of the token-offset window (inclusive)"
    },
    "case_sensitive": {
      "type": "boolean",
      "description": "Match tokens against the frequency list case-sensitively",
      "default": false
    },
    "positive": {
      "type": "boolean",
      "description": "If True, the score is the raw count of rare tokens (more-rare \u2192 higher score). If False (default), score is the *negative* count so lines with fewer rare words rank higher.",
      "default": false
    },
    "ignore_attrs": {
      "type": "object",
      "description": "Mapping of token attrs \u2192 list of values to ignore",
      "default": {},
      "x-eval": "dict(propertyNames={'enum': list(set(conc.tokens.columns) - {'id_in_line','line_id','offset'})})",
      "additionalProperties": {
        "type": "array",
        "items": {
          "type": [
            "string",
            "number",
            "boolean"
          ]
        }
      }
    }
  },
  "required": [
    "p_attr",
    "freq_list"
  ]
}
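
As a usage hint, an argument set for this ranker might look like the dictionary below, built directly from the schema above. The frequency-list name and the `pos` token attribute are placeholders for whatever is registered in your own `conc.resources` and token table, not defaults shipped with FlexiConc.

```python
# Hypothetical argument set for rank_number_of_rare_words; "bnc_freq" and the
# "pos" token attribute are placeholders, not defaults shipped with flexiconc.
rare_word_args = {
    "p_attr": "word",
    "freq_list": "bnc_freq",             # a registered frequency-list resource
    "frequency_type": "pmw",             # compare per-million-word frequencies
    "threshold": 1.0,                    # tokens below 1 pmw count as rare
    "case_sensitive": False,
    "positive": True,                    # rank lines with MORE rare words higher
    "ignore_attrs": {"pos": ["PUNCT"]},  # skip punctuation tokens, if 'pos' exists
}
```

Leaving `positive` at its default of False negates the count, so lines containing few or no rare words are ranked first.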