WordCruncher Development Site

The Neighborhood (Collocation) Report lets you examine your search term through its significant co-occurring words (friends), key words in context (KWIC), repeated phrases (n-grams), and word families.

How to Use:

Do a search.
This search will be the basis for the calculations of the Neighborhood Report, including any search bounds.
In the WordCruncher toolbar, go to Analyze > Search Results Reports > Neighborhood (Collocation).

Customize

In each tab of the report, use the hamburger menu to sort the table, add filters, copy/export results, and show/hide columns.

Tip: Click on a column header to sort by that column.

Neighbors (Collocates)

Tip: Double-click on a word to see it in context in the Neighborhoods (KWIC) tab.

The Neighbors (Collocates) tab lists every word that occurs near the search term. These words are the neighbors of the search term.

Words highlighted in blue are called friends (also known as collocates). Friends are words that significantly co-occur with the search term. The darker the blue, the stronger the relationship between words. By default, friends are calculated within a range of five words before and after the search term. To change these settings, click the Report Preferences button.
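Gathering neighbors means collecting every word that falls inside the window around each hit. The sketch below assumes simple whitespace tokenization and exact-match hits. `count_neighbors` is a hypothetical helper, not part of WordCruncher, and it only counts co-occurrences; it does not compute the friend rating.

```python
from collections import Counter

def count_neighbors(tokens, term, before=5, after=5):
    """Count each word that falls within the window around every hit of term."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok != term:
            continue
        start = max(0, i - before)
        end = min(len(tokens), i + after + 1)
        for j in range(start, end):
            if j != i:  # skip the hit itself
                counts[tokens[j]] += 1  # overlapping windows count a word once per hit
    return counts

tokens = "people earn money and save money to earn more".split()
counts = count_neighbors(tokens, "money", before=2, after=2)
print(counts.most_common())
```

With a two-word window, earn, and, and save each fall near both hits, while more (three words after the second money) falls outside every window.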

When you first generate the report, it is filtered to show only friends. To see every word that co-occurs with the search term, remove the filter.

A unique feature of WordCruncher’s Neighborhood Report, compared with other text analysis software, is that it accounts for word frequency. Friends are ranked by how often they appear near the search term relative to how often they appear elsewhere in the text. Without this adjustment, friends tend to skew toward high-frequency words.

Example:

When you search for the word money in the TED Corpus (English), you’ll see that launderer (including launderers, laundering, and -laundering) is a strong friend, even though it appears only 20 times in the corpus. In 18 of those 20 occurrences, launderer co-occurs with money, suggesting a very strong relationship.

Earn is another strong friend of money. Earn and money co-occur 41 times, which may initially suggest a stronger relationship than the one between launderer and money (which co-occur only 18 times). However, earn appears 187 times in the text, so fewer than ¼ of its occurrences co-occur with money. Because of this difference in frequency, launderer may have a stronger relationship with money than earn does, although both are friends.
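The arithmetic in this example can be checked with a simple ratio of co-occurrences to total occurrences. This is only an illustration of why frequency matters; it is not the Rating statistic WordCruncher uses.

```python
# Counts from the TED Corpus example: (co-occurrences with "money", total occurrences)
examples = {
    "launderer": (18, 20),
    "earn": (41, 187),
}

# Share of each word's occurrences that fall near "money"
shares = {word: sample / total for word, (sample, total) in examples.items()}

for word, (sample, total) in examples.items():
    print(f"{word}: {sample}/{total} = {shares[word]:.1%}")
# launderer: 18/20 = 90.0%
# earn: 41/187 = 21.9%
```

Earn has the larger raw count, but launderer has the far larger share, which is the frequency effect the report is designed to account for.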

Columns

By default, the report will show the following columns. To show additional columns, use the hamburger menu.

Number: The row number.
Word: The neighbor. If this is a friend (collocate), it is highlighted in blue.
Rating: A score between -10 and 10 (used to identify friends). To qualify as a friend, the rating must be greater than 0.
Sample: The number of times the word co-occurs with the search term.
Total: The total number of times the word occurs in the text.
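The default friends filter can be thought of as a filter on the Rating column followed by a sort. In this sketch, `NeighborRow` and `friends` are hypothetical names, and the rows are invented placeholder values for illustration, not report output.

```python
from dataclasses import dataclass

@dataclass
class NeighborRow:
    word: str
    rating: float  # score between -10 and 10
    sample: int    # co-occurrences with the search term
    total: int     # occurrences in the whole text

def friends(rows):
    """Keep rows that qualify as friends (rating > 0), strongest first."""
    for r in rows:
        if not -10 <= r.rating <= 10:
            raise ValueError(f"rating out of range for {r.word!r}: {r.rating}")
    return sorted((r for r in rows if r.rating > 0), key=lambda r: r.rating, reverse=True)

# Hypothetical rows for illustration only; not real report output.
rows = [
    NeighborRow("alpha", 3.2, 12, 40),
    NeighborRow("beta", -1.5, 30, 900),
    NeighborRow("gamma", 0.0, 5, 60),
    NeighborRow("delta", 7.8, 9, 10),
]
print([r.word for r in friends(rows)])  # ['delta', 'alpha']
```

A rating of exactly 0 does not qualify, because the rule is strictly greater than 0.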

For additional information on columns and statistics, view Neighbor or Collocate Statistics.

Neighborhoods (KWIC)

The Neighborhoods (KWIC) tab shows key words in context (known as KWIC lines or concordance lines), allowing you to find word and phrase patterns.

By default, the table is sorted by the strongest friends (collocates). To sort alphabetically, click on the Before, Hit, or After column headers.

Neighborhoods (KWIC) uses the same parameters as the Neighbors (Collocates) tab: by default, it shows five words before and after the search term. To change this setting, click the Report Preferences button.
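A KWIC line is the hit plus the words on either side of it. This minimal sketch assumes whitespace tokenization; `kwic` is a hypothetical helper, and sorting on the plain After text is a simplification of how the report may sort that column.

```python
def kwic(tokens, term, width=5):
    """Return (before, hit, after) triples for each occurrence of term."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == term:
            before = " ".join(tokens[max(0, i - width):i])
            after = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((before, tok, after))
    return lines

tokens = "the cat sat on the mat while the dog sat by the door".split()
lines = kwic(tokens, "sat", width=2)
by_after = sorted(lines, key=lambda line: line[2])  # alphabetical by the After column

for before, hit, after in by_after:
    print(f"{before:>10} | {hit} | {after}")
```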

Example:

Christ is a strong friend of Jesus in The Scriptures. In the KWIC tab, word-order patterns become visible: most of the time, the words co-occur as the phrase Jesus Christ, but occasionally they appear as Christ Jesus.

For additional information on columns and statistics, view Neighbor or Collocate Statistics.

Phrases (N-Grams)

The Phrases (N-Grams) tab lists all repeated phrases that contain the search hit.

Tip: Double-click on a phrase to see it in the Neighborhoods (KWIC) tab.

Columns

Number: The row number.
Frequency: The number of times the phrase occurs.
Phrases: The n-gram.
Size: The number of words in the phrase.
Position: The search term’s position within the phrase.
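The Frequency, Size, and Position columns follow from enumerating every n-gram that contains the hit. This sketch assumes whitespace tokenization, n-grams of two to four words, and a 1-based Position; `phrases_with_hit` is a hypothetical helper, and WordCruncher's actual size limits are not stated here.

```python
from collections import Counter

def phrases_with_hit(tokens, term, min_size=2, max_size=4):
    """Count repeated n-grams that contain term, keyed by (phrase, size, position)."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok != term:
            continue
        for size in range(min_size, max_size + 1):
            for pos in range(1, size + 1):  # 1-based position of the hit in the n-gram
                start = i - (pos - 1)
                end = start + size
                if start < 0 or end > len(tokens):
                    continue  # n-gram would run off either end of the text
                phrase = " ".join(tokens[start:end])
                counts[(phrase, size, pos)] += 1
    # Keep only phrases that occur more than once
    return {key: n for key, n in counts.items() if n > 1}

tokens = "at the end of the day and at the end of the week".split()
for (phrase, size, pos), freq in sorted(phrases_with_hit(tokens, "end").items()):
    print(freq, phrase, size, pos)
```

Here "at the end of" is kept (frequency 2, size 4, position 3), while "end of the day" occurs only once and is dropped.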

Families

Tip: Double-click on a phrase to see it in the Neighborhoods (KWIC) tab.

The Families tab shows relationships between friends. A family is two or more friends that all co-occur with each other and with the search term. This tab gives insight into how words are used in groups rather than only individually.

A cousin is any instance where the family's words co-occur without the search term. If a family has no cousins, its words co-occur only alongside the search term (indicating a strong family co-occurrence).

Columns

Number: The row number.
Families: Two or more words that all co-occur together with the search term.
Size: The number of words in the family.
Sample: The number of times the family co-occurs with the search term.
Total: The total number of times the family occurs (with or without the search term).
Cousins: The number of times that the family co-occurs without the search term.
To search for cousins, select a family and click Visit Cousins.
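From the column definitions, Cousins equals Total minus Sample. The sketch below assumes each context (for example, one window) is represented as a set of words; how WordCruncher defines a context for family counting is not described here, and `family_counts` and the contexts are hypothetical.

```python
def family_counts(contexts, family, term):
    """Compute Sample, Total, and Cousins for one family.

    contexts: a list of word sets, one per context (hypothetical unit, e.g. one window).
    """
    family = set(family)
    total = sum(1 for ctx in contexts if family <= ctx)                   # family present
    sample = sum(1 for ctx in contexts if family <= ctx and term in ctx)  # with the term
    return {"sample": sample, "total": total, "cousins": total - sample}

# Invented contexts for illustration only
contexts = [
    {"money", "earn", "save"},
    {"money", "earn", "save", "spend"},
    {"earn", "save", "time"},
    {"money", "spend"},
]
result = family_counts(contexts, ["earn", "save"], "money")
print(result)  # {'sample': 2, 'total': 3, 'cousins': 1}
```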

Settings

Click on the Report Preferences icon to customize the report.

Neighborhood

By default, neighbors and neighborhoods analyze five words before and after the search word. You can set the window from 0 to 10 words in either direction.

Do Not Cross Lowest-Level Bound

By default, the Neighborhood Report ignores paragraph and other boundaries, so it can bring in friends from surrounding sections. To prevent this, check Do not cross lowest-level bound with neighborhoods. This option limits collocates to the lowest-level boundary, such as a paragraph or verse.

Example:

If you have a collection of tweets, each tweet is its own entity. If you calculate friends (collocates) from a window of five words before and after the word she, some of those five preceding words may belong to a different tweet. The words in that other tweet are not actually related to she.

Checking Do not cross lowest-level bound means that all friends co-occur within a single tweet. Friends then reflect real co-occurrences, not coincidental ones caused by two tweets happening to sit next to each other.
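The difference between the two settings is whether the window may run past the end of a unit. In this sketch, `window_neighbors` and its `cross_bounds` flag are hypothetical stand-ins for the checkbox, and each tweet is assumed to be a list of whitespace-separated tokens.

```python
def window_neighbors(segments, term, width=5, cross_bounds=True):
    """Collect words within `width` tokens of each hit.

    segments: a list of token lists, one per lowest-level unit (e.g. one per tweet).
    cross_bounds=True joins all segments, mimicking the default behavior;
    False keeps each window inside its own segment.
    """
    units = [[tok for seg in segments for tok in seg]] if cross_bounds else segments
    neighbors = []
    for unit in units:
        for i, tok in enumerate(unit):
            if tok == term:
                neighbors.extend(unit[max(0, i - width):i] + unit[i + 1:i + 1 + width])
    return neighbors

tweets = [
    "coffee is great today".split(),
    "she said hello".split(),
]
print(window_neighbors(tweets, "she"))                      # ['coffee', 'is', 'great', 'today', 'said', 'hello']
print(window_neighbors(tweets, "she", cross_bounds=False))  # ['said', 'hello']
```

With boundaries crossed, words from the previous tweet leak into the window for she; with the bound respected, only words from the same tweet remain.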

Neighbors

Choose to ignore case, diacritics, or part of speech. Use uncorrected statistics, prefer alternate spellings, or include subwords.
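Ignoring case and diacritics amounts to normalizing each word before comparing it. This sketch uses Python's standard `unicodedata` module to illustrate the general technique; it is an assumption, not WordCruncher's own normalization rules, and it does not cover part of speech. The `normalize` function is hypothetical.

```python
import unicodedata

def normalize(word, ignore_case=True, ignore_diacritics=True):
    """Fold a word so variants compare equal when case/diacritics are ignored."""
    if ignore_diacritics:
        # Split accented letters into base letter + combining mark, then drop the marks
        decomposed = unicodedata.normalize("NFD", word)
        word = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    if ignore_case:
        word = word.casefold()
    return word

print(normalize("Café"), normalize("CAFE"))  # cafe cafe
```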

Friends

By default, friends (collocates) are calculated with a statistic called Mutual Information (MI). This statistic tends to give the best results, but you can use other statistics to calculate friends.
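One common formulation of Mutual Information in corpus linguistics compares observed co-occurrences with the number expected by chance: MI = log2(observed / expected). WordCruncher's exact variant, and how it maps MI onto the -10 to 10 Rating, are not described here, so treat this as a sketch; the `span` parameter and the sample counts are assumptions.

```python
import math

def mutual_information(cooccur, node_freq, collocate_freq, corpus_size, span=1):
    """Pointwise MI: log2(observed / expected co-occurrences).

    span is the number of window positions counted per hit (for example, 10 for
    five words on each side); some tools scale the expected count by it, others do not.
    """
    if cooccur == 0:
        return float("-inf")  # never co-occur: no positive association
    expected = node_freq * collocate_freq * span / corpus_size
    return math.log2(cooccur / expected)

# Hypothetical counts: 8 co-occurrences, node seen 100 times, collocate 50 times,
# in a 10,000-word text, with span left at 1.
mi = mutual_information(8, 100, 50, 10_000)
print(mi)  # 4.0
```

Because the collocate's total frequency sits in the denominator of the expected count, a frequent word needs many more co-occurrences to reach the same score, which matches the frequency adjustment described for friends.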

For additional information, view Neighbor or Collocate Statistics.
