earthUI is a graphical user interface for the R earth
package, which implements Multivariate Adaptive Regression Splines
(MARS). It offers three purpose modes—general predictive modeling,
real-estate appraisal, and market-area analysis—and guides the user
through data import, model configuration, fitting, and the
interpretation of earth’s diagnostics and graphical output. This article
documents earthUI’s data-format requirements, modeling workflow, output
displays, and complete feature reference.
earthUI is a graphical user interface for the R earth
package, which implements Multivariate Adaptive Regression Splines
(MARS) (Friedman
1991). The modelling engine is Stephen Milborrow’s
earth package (Milborrow
2024), documented in detail by Milborrow (n.d.-a) and
Milborrow (n.d.-b). earthUI
runs as a local Shiny application: there is no login, no server, and no
accounts. You launch it from R, import a dataset (CSV or Excel),
configure your model, and fit it interactively.
The application provides a complete workflow: data import, variable configuration, model fitting, diagnostic plots, variable importance, model equations, and downloadable reports in HTML, Word, or PDF format.
When you launch earthUI, a Purpose radio button at the top of the sidebar lets you choose one of three modes: General Purpose, For Appraisal, or Market Area Analysis.
In all three modes, the core modeling engine is identical — you are always fitting an Earth (MARS) model. The purpose setting controls which additional tools and interface elements are available.
Switching purposes clears all state. When you change the purpose radio, earthUI resets to a clean default state: imported data, model results, tabs, variable configuration, and earth parameters are all cleared. You must re-import your file after switching. Previously saved settings for that file and purpose combination will then be automatically restored from your last session.
When either For Appraisal or Market Area Analysis is selected, earthUI activates several features designed for real estate analysis:
- Special column types: a predictor column can be designated as contract_date, dom, concessions, latitude, longitude, living_area, lot_size, actual_age, effective_age, area, site_dimensions, or display_only. These designations control how the column is handled during fitting, output, and Sales Grid generation. See Chapter 6 for the complete reference.
- Sale age: when a column is designated contract_date and an Effective Date is provided, earthUI computes a sale_age column (days between sale date and effective date) and substitutes it as a predictor.

To use earthUI:
1. Install the package with install.packages("earthUI") or install from source.
2. Run earthUI::launch() in R, or from the command line run Rscript -e 'earthUI::launch()'. The app opens in your web browser on port 7878. You can also access the app directly by navigating to http://localhost:7878 in your browser.
3. Import your data and configure the model, including the earth() arguments.

Settings are automatically persisted in your browser’s local storage and restored when you reload the same input file.
For real estate appraisal and market analysis workflows, your input data typically comes from a Multiple Listing Service (MLS) export. This chapter describes the expected file structure and the columns that earthUI can use.
earthUI accepts CSV and Excel
(.xlsx, .xls) files. On import, column names
are automatically converted to snake_case — for example,
“Living SqFt” becomes living_sqft, “Contract Date” becomes
contract_date, and “Sale Price” becomes
sale_price. This normalization ensures consistent column
references throughout the workflow. The CSV separator and decimal mark
used during import are determined by the locale settings (see Chapter 3,
“Locale & Regional Settings”).
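The header normalization described above can be approximated in a few lines of base R. This is only a sketch: to_snake_case is a hypothetical helper, not earthUI's own function, and it assumes the rule is simply "lowercase, then replace each run of non-alphanumeric characters with one underscore", which reproduces the three examples given.

```r
# Hypothetical helper (not part of earthUI): lowercase the header,
# collapse runs of non-alphanumeric characters to "_", trim stray "_".
to_snake_case <- function(x) {
  x <- tolower(trimws(x))
  x <- gsub("[^a-z0-9]+", "_", x)
  gsub("^_+|_+$", "", x)
}

headers <- c("Living SqFt", "Contract Date", "Sale Price")
to_snake_case(headers)
```

Note that this simple rule would also strip accented or non-Latin characters, so a production implementation would need a more careful treatment of international headers.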
Your data file should be a flat table with one row per property and one column per attribute. The first row of the file must contain column headers.
While earthUI works with any set of columns, the full appraisal workflow (RCA adjustments + Sales Comparison Grid) benefits from having the following columns in your MLS export:
Spreadsheet column names can be in any language. Only the “special” type names are in English, so that earthUI can give the designated columns special treatment. Otherwise, the given column names show up in the regression models, graphs, and (if doing appraisals) the Intermediate Sales Grid.
Not all columns are required. earthUI adapts — if a column is
missing, the corresponding feature is simply omitted. For example, if no
concessions column is designated, the Net SP row in the
Sales Grid shows Sale Price without a concessions deduction. However,
for real estate pricing models certain columns are highly recommended to
achieve acceptable fit:
Sale Age — the number of days between the
contract sale date and the effective date of the appraisal or analysis.
If multi-year sales history is being used, especially for periods over 5
years, sale_age often plays a central role in estimating
the sale price. In fact it is often so important that without it,
earthUI fails to provide any model at all.
Living Area — also goes by names such as “Living Sqft,” “GLA” (gross living area) and so on. This is another leading determinant of sale price.
Total Bath Count — the total number of full, quarter, half, and 3/4 bathrooms. For example, two full baths and one half-bath would be a value of 2.5.
Garage Bays or Garage Area — the number of garage spaces or the garage square footage.
Lot Size — the land area of the property, typically in square feet or acres.
Longitude, Latitude, and if available Area ID. Adjustments for these will be combined under a single Location adjustment in the Sales Grid.
earthUI identifies columns by their special type designation, not by their column name. You can name your columns anything you like in the MLS export — what matters is that you assign the correct special type in the Variable Configuration table (Chapter 6).
For example, your MLS might export living area as “GLA”, “Living
SqFt”, “liv_area”, or “gross_living_area”. After import (where it
becomes snake_case), you simply designate it as living_area
in the Special dropdown. earthUI will then use it for per-SF residual
calculations and Sales Grid grouping regardless of its original
name.
- Missing values: na.action is always set to na.fail internally, with NA removal handled before the call to earth().
- Dates: common formats are accepted (for example 2025-06-15, 06/15/2025, June 15, 2025). earthUI auto-detects date columns when at least 80% of values parse successfully.

In Appraisal mode, row 1 must be the subject property. All remaining rows are comparable sales. The subject’s sale price can be left blank (NA) or set to any value; earthUI treats it as NA during fitting regardless.
In Market Area Analysis mode, placing the subject in row 1 is optional. If present, check “Skip first row (subject property)” to exclude it from fitting.
In General mode, there is no special row handling — all rows are treated equally.
General Purpose mode is the default when you launch earthUI. It provides the complete MARS modeling workflow for any dataset — not just real estate. You can use earthUI for scientific data, financial analysis, engineering studies, or any regression problem where you want to explore non-linear relationships and interactions between variables.
In General mode, the interface omits the real estate–specific features (special columns, sale age, coordinate rounding, RCA). The sidebar is streamlined to focus on variable selection, parameter configuration, model fitting, and export.
The sidebar is organized into numbered, collapsible sections that guide you from data import through export:
1. Import Data — File upload accepting CSV and Excel files. For Excel files with multiple sheets, a sheet selector appears. Column names are automatically converted to snake_case.
2. Project Output Folder — A text field specifying
where downloads and fit logs are saved (defaults to
~/Downloads).
3. Variable Configuration — Target variable selector (supports multiple targets), predictor table with checkboxes for Include, Factor, and Linear. See Chapter 6 for full details.
4. Earth Call Parameters — All arguments to the
earth() function: degree, penalty, nk, pruning method,
cross-validation, and more. See Chapter 7 for the complete parameter
reference.
5. Fit Earth Model — A single green button that runs the model asynchronously. See Chapter 8 for fitting details.
6. Download Output — Exports predictions, residuals, CQA scores, and per-g-function contributions as an Excel file. See Chapter 9.
7. Download Report — Generates a formatted report (HTML, Word, or PDF) saved to the output folder. See Chapter 12.
After fitting, the main panel provides nine tabs:
earthUI automatically saves your configuration to the browser’s local storage, keyed by both the input filename and the current purpose mode. When you reload the same file under the same purpose, all settings are restored: target selection, predictor checkboxes, data types, earth parameters, and response weights. This means the same file can have different configurations for General, Appraisal, and Market modes. Settings are also backed up to an SQLite database so they persist across browser sessions. A Reset to Defaults button clears all saved settings for the current purpose.
Click the moon/sun icon in the upper-right corner to toggle between light and dark themes. The theme preference is saved in local storage and persists across sessions.
earthUI supports international number, date, and CSV formatting conventions through a country-based locale system. The Country dropdown in Section 1 of the sidebar (below the file upload) selects a preset for 31 supported countries. Each preset configures:
- CSV separator: comma (,) for US/UK/Japan or semicolon (;) for most of Europe, where the comma is used as a decimal mark.
- Decimal mark: period (.) or comma (,).

Below the country selector, four override dropdowns let you change individual settings without switching countries:
When you change the country, all overrides reset to that country’s defaults. Changing an override only affects that one setting.
Click Save as my default to store your locale preferences globally. These defaults apply to all future sessions regardless of which data file you load. Per-file settings (target, predictors, parameters) are saved separately in the browser’s local storage, but locale defaults persist across all files via an SQLite database. This two-level approach is designed for organizations like audit firms that work with data from multiple countries — set your most common country as the default, then override per-file when needed.
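To see why the separator and decimal mark have to be set together, the following sketch reads the same two rows in a European-style and a US-style layout using base R's read.csv(). The sample text and column names are made up for illustration, and this is not earthUI's import code.

```r
# European-style export: semicolon separator, comma decimal mark.
eu_text <- "sale_price;living_area\n250000,50;120,5\n310000,00;142,0\n"
eu <- read.csv(text = eu_text, sep = ";", dec = ",")

# US-style export of the same data: comma separator, period decimal mark.
us_text <- "sale_price,living_area\n250000.50,120.5\n310000.00,142.0\n"
us <- read.csv(text = us_text, sep = ",", dec = ".")

# Both imports should yield the same numeric columns.
all.equal(eu, us)
```

Reading the European text with the US settings would instead produce a single text column, which is the failure the locale presets are meant to prevent.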
When you select For Appraisal as the Purpose, earthUI configures itself for single-property valuation. All features described in Chapter 3 remain available; this chapter covers only the appraisal-specific additions.
In appraisal mode, row 1 of your dataset is the subject property and all remaining rows are comparable sales. Your input file must be organized accordingly (see Chapter 2). The subject’s sale price can be left blank or set to any value — earthUI automatically treats it as NA during fitting.
After importing, the Data tab splits into two sections:
Subject Property (row 1) and Comparable
Sales (rows 2+). Row 1 is always excluded from model fitting —
the notification “Skipping row 1 (subject). Fitting on N rows.” confirms
this. After fitting, the model still generates predictions for the
subject row, shown as est_<target> in the output.
In appraisal and market modes, an Effective Date
field appears in the Variable Configuration section (defaulting to
today’s date). If you designate a column as contract_date
in the Special column dropdown, earthUI computes a sale_age
column: the integer number of days between each sale’s contract date
and the effective date. This column replaces the original date column as
a predictor.
The first time you click Fit after designating a contract date,
earthUI creates the sale_age column and notifies you to
click Fit again to include it.
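The sale_age computation can be sketched in base R as a date difference. The sign convention here (effective date minus contract date, so that past sales have positive ages) is an assumption, and the data frame and dates are illustrative only.

```r
# Assumed convention: sale_age = effective date - contract date, in whole days.
effective_date <- as.Date("2025-06-15")  # illustrative effective date
comps <- data.frame(
  contract_date = as.Date(c("2025-06-01", "2024-06-15", "2020-06-15"))
)

# Subtracting Date objects gives a difftime in days; store it as an integer.
comps$sale_age <- as.integer(effective_date - comps$contract_date)
comps
```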
In appraisal and market modes, a Special dropdown appears for each predictor in the Variable Configuration table. The complete list of special types and their effects:
Only one column per special type is allowed (except
display_only). Assigning a special to a second column
automatically clears it from the first. A small blue badge appears next
to the variable name showing its assigned special type.
The Calculate RCA Adjustments & Download button (sidebar section 7, visible only in appraisal mode after fitting) computes market-derived adjustments for each comparable relative to the subject. The full RCA workflow is described in Chapter 10.
After computing RCA adjustments, the Generate Sales Grid & Download button (sidebar section 8) becomes available. The Sales Grid workflow is described in Chapter 11.
When you select Market Area Analysis as the Purpose, earthUI provides the same real estate–specific features as appraisal mode (special columns, sale age, coordinate rounding) but is oriented toward analyzing a group of properties rather than valuing a single subject.
Market Area Analysis mode is appropriate when you are:
Section 3 of the sidebar — Variable Configuration — is where you choose which columns participate in the model and how they are treated.
The Target (response) variable(s) dropdown at the
top of Section 3 lists every column in your dataset. Select one column
for a standard single-response model, or multiple columns for a
multi-response model. When multiple targets are selected, the model fits
all responses simultaneously using
earth(cbind(y1, y2, ...) ~ .).
Columns selected as targets are automatically excluded from the predictor list.
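As a rough illustration of how a target selection can be turned into a model formula, the sketch below builds either a single-response or a cbind() multi-response formula. The variable names are hypothetical, and the right-hand side lists predictors explicitly rather than using `.`, because only the checked columns are meant to enter the model.

```r
# Hypothetical selections from the Variable Configuration table.
targets <- c("sale_price", "dom")
predictors <- c("living_area", "lot_size", "sale_age")

# One target: plain response. Several targets: wrap them in cbind().
lhs <- if (length(targets) > 1) {
  sprintf("cbind(%s)", paste(targets, collapse = ", "))
} else {
  targets
}
fml <- as.formula(paste(lhs, "~", paste(predictors, collapse = " + ")))
fml
# The formula could then be passed on, e.g. earth::earth(fml, data = my_data)
# (not run here; my_data is a placeholder).
```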
Below the target selector, a table lists every remaining column with the following fields:
| Column | Description |
|---|---|
| Variable | Column name (full name shown in tooltip if truncated). In appraisal/market modes, a blue badge shows the assigned special type. |
| Type | Data type dropdown: numeric, integer, character, logical, factor, Date, POSIXct |
| Inc? | Checkbox — include this column as a predictor in the model |
| Special | Dropdown (appraisal/market only) — see Special Column Types Reference below |
| Factor | Checkbox — treat this column as a categorical variable |
| Linear | Checkbox — force linear entry only (no hinge functions) |
| NAs | Count of missing values. Shown in red when more than 30% of values are missing |
A hint line above the table explains the abbreviations: “Type = column data type, Inc = include as predictor, Factor = treat as categorical, Linear = linear-only (no hinges).”
earthUI automatically detects data types on import. Numeric, integer,
logical, factor, and date columns are recognized. Character columns that
look like dates (at least 80% of values parse against common date
formats) are classified as Date.
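The 80% rule for date detection can be sketched as follows. The function name, the list of candidate formats, and the handling of empty strings are all assumptions made for illustration; earthUI's actual format list is not given here.

```r
# Hypothetical detector: a character column counts as a date column when
# at least `threshold` of its non-empty values parse under some format.
looks_like_date <- function(x,
                            formats = c("%Y-%m-%d", "%m/%d/%Y", "%B %d, %Y"),
                            threshold = 0.8) {
  x <- x[!is.na(x) & nzchar(x)]
  if (length(x) == 0) return(FALSE)
  parsed <- rep(FALSE, length(x))
  for (fmt in formats) {
    # as.Date() returns NA for values that do not match the format.
    parsed <- parsed | !is.na(as.Date(x, format = fmt))
  }
  mean(parsed) >= threshold
}

looks_like_date(c("2025-06-15", "06/15/2025", "2024-01-31",
                  "2023-12-01", "03/02/2022", "unknown"))
```

The month-name format (%B) depends on the R session's locale, which is one reason a real implementation may need locale-aware parsing.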
You can override any detection by changing the Type
dropdown. When you change a column to character or
factor, the Factor checkbox is automatically checked.
Changing types affects how the column is passed to the
earth() function.
Check Factor for columns such as style, area_id, or grade that represent discrete groups rather than continuous measurements. Factor columns enter the model as indicator (dummy) variables.

In appraisal and market modes, the Special dropdown provides the following options. Each type can be assigned to at most one column (except display_only, which allows multiple):
Date & Time Types:
- contract_date: Triggers automatic sale_age computation. The original date column is replaced by an integer column measuring days between the sale date and the Effective Date.
- listing_date: Used as a fallback for computing Days on Market (DOM = contract date \(-\) listing date) when no explicit dom column is designated.
- dom: Identifies the Days on Market column. Displayed in the Sales Grid’s APN row and Date of Sale row.

Monetary Types:
- concessions: Identifies sale concessions (seller credits, buyer incentives, etc.). Used in the Sales Grid to compute Net Sale Price: Net SP = Sale Price \(-\) Concessions.

Size & Location Types:
- latitude: Values are automatically rounded to 3 decimal places to prevent overfitting. Used for Haversine proximity calculations (distance from subject to each comp) and grouped in the Location row of the Sales Grid.
- longitude: Same rounding treatment as latitude. Used for proximity and the Location group.
- area: Typically a neighborhood or area identifier. Grouped with latitude and longitude in the “Loc: Long | Lat | Area” row of the Sales Grid.
- living_area: Enables per-square-foot residual calculations (residual_sf and cqa_sf) in the download output.
- lot_size: Grouped in the “Site Size | Dimensions” row of the Sales Grid.
- site_dimensions: Grouped with lot size in the Sales Grid (e.g., “75x120”).

Age Types:
- actual_age: Grouped in the “Actual Age | Effective Age” row of the Sales Grid.
- effective_age: Grouped with actual age in the Sales Grid.

Display Types:
- display_only: The column is included in Excel exports but excluded from model fitting entirely. Use this for address fields, MLS numbers, parcel IDs, or other reference data that should not be a predictor. Multiple columns can have this designation.

Selecting more than one target variable fits a multi-response Earth model. When multiple targets are selected:
- Variance model options (varmod.method) are disabled (not supported for multi-response)

Section 4 of the sidebar, Earth Call Parameters, provides access to all arguments accepted by the earth() function. Each parameter has a blue help icon (?) with a tooltip explanation. Parameters are organized into collapsible subsections.
| Parameter | Default | Description |
|---|---|---|
| subset | (empty) | Row filter expression. See “Subset Filtering” below. |
| weights | NULL | Column selector for case (row) weights. Only numeric columns are listed. |
| wp | NULL | Response weights for multi-target models. Button opens a dialog with one numeric input per target (default 1.0 each). |
| keepxy | off | Retain x, y, subset, and weights in the model object. |
| trace | 0 | Trace level for fitting output (0 through 5). |
| glm | none | Optional GLM family: gaussian, binomial, or poisson. |
| degree | 1 | Maximum interaction order. Setting degree \(\geq\) 2 auto-enables cross-validation and reveals the Allowed Interactions matrix. |
| penalty | 2 | GCV penalty per knot. Higher values produce simpler models. |
| Parameter | Default | Description |
|---|---|---|
| nk | auto | Maximum terms before pruning. Default: min(200, max(20, 2\(\times\)predictors)) + 1. |
| thresh | 0.001 | Forward-step threshold. Smaller values allow more terms. |
| minspan | 0 (auto) | Minimum span between knots. Negative values set the maximum knots per predictor. |
| endspan | 0 (auto) | End span — minimum distance from a knot to the edge of the data. |
| newvar.penalty | 0 | Penalty for introducing a new variable (encourages reuse of existing predictors). |
| fast.k | 20 | Number of parent terms to consider in the fast MARS algorithm. |
| fast.beta | 1 | Controls the fast MARS aging factor. |
When degree is set to 2 or higher, an interaction matrix appears below the basic parameters. This is an upper-triangular grid of checkboxes, one for each predictor pair. A checked box means the two predictors are allowed to interact; unchecking it forbids that specific interaction.
Clicking a predictor name (row or column label) toggles all interactions for that variable. Allow All and Clear All checkboxes at the top provide bulk control. The matrix uses sticky headers so column and row labels remain visible when scrolling.
An info alert reminds you: “Interaction terms increase the risk of overfitting. Cross-validation has been enabled (10-fold).”
| Parameter | Default | Description |
|---|---|---|
| pmethod | backward | Pruning method: backward, none, exhaustive, forward, seqrep, or cv. |
| nprune | NULL | Maximum terms after pruning. Leave empty for automatic selection. |
| Parameter | Default | Description |
|---|---|---|
| nfold | 10 | Number of cross-validation folds. Set to 0 to disable CV. |
| ncross | 20 | Number of cross-validation repetitions. |
| stratify | on | Stratify CV samples so each fold has a similar response distribution. |
When degree \(\geq\) 2, cross-validation is automatically enabled (nfold set to 10) to help guard against overfitting from interactions.
The subset text input accepts an R expression that
filters which rows are used for model fitting. You can type an
expression directly (e.g.,
sale_age < 365 & area_id == 460) or use the
Build filter… button to construct one visually.
Build Filter Dialog — Click “Build filter…” to open
a guided dialog. Each condition row has a column dropdown, operator
(<, >, <=,
>=, ==, !=), and a value input
that adapts to the column type: numeric input for numbers, date picker
for dates, and dropdown of unique values for character/factor columns.
Conditions are joined with AND (&) or OR
(|) connectors. A preview at the bottom shows the
expression and how many rows match. Click Apply to
insert the expression into the text input.
Date columns must use as.Date("...") or
as.POSIXct("...") wrappers in manual expressions. The Build
Filter dialog handles this automatically.
Subset filtering is non-destructive — excluded rows remain in the dataset and receive predictions in the Excel export.
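The idea of a non-destructive subset can be illustrated in base R: the expression is evaluated against the data to give a logical mask, and the mask selects fitting rows while the full table stays intact. This is a sketch with made-up data, not earthUI's internal code.

```r
comps <- data.frame(
  sale_age = c(30, 200, 400, 90),
  area_id  = c(460, 460, 460, 455)
)

# The text typed into the subset box, evaluated with the columns in scope.
expr_text <- "sale_age < 365 & area_id == 460"
keep <- eval(parse(text = expr_text), envir = comps)

fit_rows <- comps[keep, ]   # rows that would be used for fitting
nrow(comps)                 # the full dataset is unchanged
```

For a date column the expression would need an explicit conversion, for example contract_date >= as.Date("2024-01-01").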
earthUI displays a recommended value below key parameters in Section 4. These recommendations update reactively based on the number of fitting rows (\(n\)) and selected predictors (\(p\)). The formulas are derived from Friedman’s MARS paper, earth’s internal algorithms, and empirical testing.
nk (max terms before pruning):
\[\text{nk} = \min\!\bigl(100,\; \max(21,\; 2p + 1,\; \lfloor n/10 \rfloor)\bigr)\]
| \(n\) | \(p{=}5\) | \(p{=}10\) |
|---|---|---|
| 30 | 21 | 21 |
| 50 | 21 | 21 |
| 100 | 21 | 21 |
| 200 | 21 | 21 |
| 500 | 50 | 50 |
| 1000 | 100 | 100 |
| 1500 | 100 | 100 |
Earth’s default, \(\min(200, \max(20, 2p)) + 1\), does not account for dataset size and can be too small for large datasets, constraining the forward pass before it explores all predictors adequately. The \(\lfloor n/10 \rfloor\) term allows approximately one term per 10 observations — a standard rule of thumb for avoiding overparameterization. The cap of 100 prevents diminishing returns.
minspan (minimum observations between knots):
\[\text{minspan} = \min\!\bigl(16,\; \lfloor 5 + n/50 \rfloor\bigr)\]
endspan (minimum observations from data boundaries):
\[\text{endspan} = \min\!\bigl(16,\; \lfloor 5 + n/28 \rfloor\bigr)\]
| \(n\) | minspan | endspan |
|---|---|---|
| 30 | 5 | 6 |
| 50 | 6 | 6 |
| 100 | 7 | 8 |
| 200 | 9 | 12 |
| 300 | 11 | 15 |
| 500 | 15 | 16 |
| 1500 | 16 | 16 |
Earth’s auto-calculated minspan uses Friedman’s equation 43: \(\lfloor(-\ln(-\ln 0.95) + \ln(p \cdot n)) / (2.5 \ln 2)\rfloor\), which scales as \(\ln(np)\) and grows too slowly for large datasets. At \(n{=}1500\) it yields only \({\approx}7\), giving roughly 245 candidate knot locations per continuous predictor — too many, allowing the forward pass to fit noise. The recommended formula targets approximately 100 candidates per predictor for large \(n\), while deferring to earth’s auto-calculation for small \(n\) (where it is well-tested). Endspan is set slightly larger than minspan to provide additional boundary protection, which helps when data has thin tails (common in real estate).
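The comparison in this paragraph can be reproduced numerically. The sketch below codes the equation 43 expression exactly as written above alongside the recommended formula; the function names are hypothetical and \(p = 10\) is an assumed predictor count for the example.

```r
# Friedman's equation 43 as quoted in the text (alpha = 0.05 gives ln 0.95).
friedman_minspan <- function(n, p, alpha = 0.05) {
  floor((-log(-log(1 - alpha)) + log(p * n)) / (2.5 * log(2)))
}

# Recommended minspan from the formula given earlier in this section.
recommended_minspan <- function(n) min(16, floor(5 + n / 50))

friedman_minspan(n = 1500, p = 10)  # 7
recommended_minspan(1500)           # 16
```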
penalty (GCV penalty per knot):
\[\text{penalty} = \begin{cases} 3 & \text{if degree} > 1 \\ 2 & \text{if degree} = 1 \end{cases}\]
This is earth’s own default — 2 for additive models, 3 for interaction models (the higher penalty compensates for the larger search space).
newvar.penalty (penalty for introducing a new predictor):
Recommended: 0.1 when predictors are correlated (e.g., living area with bedroom count, bathroom count, lot size). This biases the forward pass toward reusing predictors already in the model rather than introducing correlated alternatives. The mechanism: each new predictor’s RSS improvement is multiplied by \(1/(1 + \text{penalty})\). With \(\text{newvar.penalty} = 0.1\), a new variable must improve RSS by at least 10% more than another knot on an existing variable. This produces simpler models with fewer predictors without affecting final coefficient estimates (the penalty is removed after selection). Leave at 0 if predictors are not correlated.
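A small numeric illustration of the mechanism described above, using made-up RSS improvements: a candidate new predictor only wins if its improvement, after scaling by \(1/(1 + \text{penalty})\), still exceeds that of a knot on an existing predictor.

```r
newvar_penalty <- 0.1
gain_existing  <- 100          # RSS improvement from a knot on an existing predictor
gain_new       <- c(105, 111)  # RSS improvements from two candidate new predictors

# New-variable improvements are scaled down before the comparison.
penalized <- gain_new / (1 + newvar_penalty)
penalized > gain_existing      # FALSE TRUE
```

The first candidate improves RSS by 5% more than the existing-variable knot and loses; the second improves it by 11% more and wins, consistent with the 10% margin stated above.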
pmethod: Recommended: backward. The
default GCV-based backward pruning is deterministic — the same data
always produces the same model, with no dependence on random seeds. This
is important for reproducibility, especially in appraisal work. The
cv method uses cross-validation for pruning which
introduces seed dependence.
nprune: Recommended: leave empty (NULL). Let GCV select the optimal number of terms. Setting nprune imposes a hard cap that overrides GCV’s judgment.
nfold (CV folds):
\[\text{nfold} = \min\!\bigl(15,\; \max(10,\; \lfloor n/100 \rfloor)\bigr)\]
| \(n\) | nfold |
|---|---|
| 30 | 10 |
| 100 | 10 |
| 500 | 10 |
| 1000 | 10 |
| 1500 | 15 |
With pmethod = "backward", cross-validation does not
affect the model — it only computes the diagnostic CVR and provides
residuals for the variance model. The floor of 10 and cap of 15 reflect
this: enough folds for a stable diagnostic without unnecessary
computation.
ncross (CV repetitions):
\[\text{ncross} = \max\!\bigl(3,\; \lceil 100/n \rceil\bigr)\]
| \(n\) | ncross | total residuals |
|---|---|---|
| 30 | 4 | 120 |
| 50 | 3 | 150 |
| 100 | 3 | 300 |
| 1500 | 3 | 4500 |
The variance model (varmod.method = "lm") fits on the
cross-validation residuals. The formula targets at least 100 total
residuals (\(n \times \text{ncross}\))
for a stable variance estimate, with a floor of 3 (required by earth
when a variance model is enabled).
varmod.method: Recommended: lm.
This fits prediction intervals using a linear regression of absolute CV
residuals on the predicted response. It requires
nfold > 0 and ncross >= 3.
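The recommendation formulas in this section can be collected into one helper for checking the tables. recommend_earth_params is a hypothetical name used here for illustration; the arithmetic is taken directly from the formulas above.

```r
# Hypothetical helper gathering the recommended-value formulas in one place.
recommend_earth_params <- function(n, p, degree = 1) {
  list(
    nk      = min(100, max(21, 2 * p + 1, floor(n / 10))),
    minspan = min(16, floor(5 + n / 50)),
    endspan = min(16, floor(5 + n / 28)),
    penalty = if (degree > 1) 3 else 2,
    nfold   = min(15, max(10, floor(n / 100))),
    ncross  = max(3, ceiling(100 / n))
  )
}

str(recommend_earth_params(n = 500, p = 5))
```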
After every successful fit, earthUI writes
<filename>_earth_output_<timestamp>.txt to the
output folder. This file contains the model terms, summary statistics
(R², GRSq, CVRSq), variance model details, and trace log. One file is
created per fit, providing a cumulative record for comparing parameter
configurations.
A radio button at the top of Section 4 controls which defaults are loaded:
Click Save current as default to store the current parameter configuration as your personal defaults.
Section 5 of the sidebar contains a single green button: Fit Earth Model. This button is always visible (not inside a collapsible section). Before fitting, earthUI validates your configuration — a target must be selected and at least one predictor must be included. In appraisal/market modes, latitude and longitude columns are rounded, and sale age is computed from the effective date and contract date (if designated).
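The coordinate rounding applied before fitting, and the Haversine distance used for subject-to-comp proximity, can be sketched in base R. The function names, the Earth radius of 6371 km, and the sample coordinates are assumptions for illustration. Rounding to 3 decimal places corresponds to roughly 111 m of latitude.

```r
round_coords <- function(x) round(x, 3)

# Great-circle distance via the Haversine formula (inputs in degrees).
haversine_km <- function(lat1, lon1, lat2, lon2, radius_km = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * radius_km * asin(pmin(1, sqrt(a)))
}

subject_lat <- round_coords(38.581572); subject_lon <- round_coords(-121.494400)
comp_lat    <- round_coords(38.575764); comp_lon    <- round_coords(-121.478851)

haversine_km(subject_lat, subject_lon, comp_lat, comp_lon)
```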
When you click Fit, a dark modal overlay appears with:
- A live trace log from the earth() function: dataset info, forward pass progress, cross-validation folds, and completion time

earthUI fits models asynchronously using callr::r_bg(),
which runs the computation in a separate R process. This keeps the
application responsive during long-running fits. A 300ms polling
observer reads output from the background process and streams it to the
trace log.
When fitting completes, a “Done in X.Xs” message appears and a close button (X) is added to the modal. The modal does not auto-dismiss — you close it when you are ready to review results.
On success, a green checkmark is appended to the Fit button. If an error occurs, the error message is displayed in the trace log and a notification appears.
Every fit (success or failure) writes a log file to your output folder:
- Filename: <datafile>_earth_log_<YYYYMMDD_HHMMSS>.txt
- Location: the Project Output Folder (default ~/Downloads)

After fitting, download an Excel file with predictions and diagnostics. This output is used in Step 7 (RCA) to assign a CQA (Condition/Quality/Appeal) rating to the subject property. The output is sorted by residual_sf and cqa_sf to help you assess where the subject falls in the ranking. If the model is of good quality, the properties should be ranked from least appealing to most appealing based on residual features that did not go into the regression. The middle value should be approximately 0, with negative values in the lower half and positive values in the upper half. You should find the worst-quality homes, or “fixers,” near the bottom of the ranking and the nicest homes at the top. There will usually be exceptions for anomalies such as foreclosures, short sales, probate (inheritance-related) sales, and quick sales prompted by a job change or other reasons. Investigating an anomaly usually turns up a pertinent reason for the unusual price.
The button label adapts to the purpose mode:
The filename format is
<datafile>_modified_<YYYYMMDD_HHMMSS>.xlsx.
For each target variable, the following columns are appended:
| Column | Description |
|---|---|
| est_<target> | Model prediction, e.g. est_sale_price (1 decimal place) |
| residual | Actual minus predicted (1 dp) |
| cqa | Comparative Quality Analysis score (2 dp, range 0–10) |
| residual_sf | Residual divided by living area (if designated, 1 dp) |
| cqa_sf | CQA calculated from ranking via residual_sf (2 dp) |
| <variable>_contribution | Per-g-function contribution value (1 dp, one column per g-function) |
| basis | Intercept value contribution, same for all properties (1 dp) |
| calc_residual | Actual minus (basis + all contributions), a verification column (1 dp) |
For multi-target models, a _<i> suffix is added to
distinguish columns for each target.
The ranking columns are placed at the leftmost
position in the output file, in this order:
residual_sf, cqa_sf, residual,
cqa. This makes it easy to scan the sorted list and
evaluate where the subject falls in the CQA distribution.
These columns have Excel number formatting applied:
| Column | Format | Example |
|---|---|---|
| residual_sf | Numeric, 2 decimal places | 12.34 |
| cqa_sf | Numeric, 2 decimal places | 7.25 |
| residual | Numeric, 0 decimal places | 15,200 |
| cqa | Numeric, 2 decimal places | 6.80 |
The CQA (Comparative Quality Analysis) score ranks each row’s residual against all other residuals. For a given row, the CQA is the fraction of rows with a smaller signed residual, multiplied by 10. This produces a 0–10 scale where:
When a living_area column is designated,
cqa_sf provides the same ranking based on per-square-foot
residuals.
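A minimal sketch of the CQA ranking as described above, in base R. The function name is hypothetical, and the details are assumptions: the denominator here is the total number of rows and ties receive the same score, which may differ from how earthUI treats ties and the top of the scale.

```r
# Assumed definition: 10 * (fraction of rows with a strictly smaller residual).
cqa_score <- function(residuals) {
  vapply(residuals,
         function(r) 10 * mean(residuals < r),
         numeric(1))
}

residual <- c(-20000, -5000, 0, 8000, 30000)
round(cqa_score(residual), 2)
```

Applying the same function to residual divided by living area gives the per-square-foot variant.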
In appraisal and market modes, comparable rows are sorted by
residual_sf descending (or residual if no
living area is designated). The subject row (row 1) remains in position
1 and is not sorted. In general mode, rows are exported in their
original order.
This sorting, combined with the leftmost ranking columns, allows the appraiser to quickly scan the comparables from most under-predicted (largest positive residual) to most over-predicted, and assess where the subject’s assigned CQA score falls in that distribution.
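The row ordering can be sketched in base R: hold the subject in row 1 and sort only the comparables by residual_sf, largest first. The data frame below is made up for illustration.

```r
out <- data.frame(
  id          = c("subject", "comp_a", "comp_b", "comp_c"),
  residual_sf = c(NA, -4.2, 12.3, 1.5),
  stringsAsFactors = FALSE
)

# Keep row 1 (subject) fixed; order the remaining rows descending.
comps  <- out[-1, ]
sorted <- rbind(out[1, ], comps[order(comps$residual_sf, decreasing = TRUE), ])
sorted$id
```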
Factor levels in the prediction data are aligned with the training
data before calling predict(). Rows with unseen factor
levels will produce NA predictions.
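The effect of aligning factor levels can be seen with a short base R example. The column and level names are illustrative; the point is that a level absent from the training data becomes NA once the prediction data is recoded to the training levels.

```r
train    <- data.frame(style = factor(c("ranch", "colonial", "ranch")))
new_data <- data.frame(style = c("ranch", "tudor"), stringsAsFactors = FALSE)

# Recode using the training levels; "tudor" was never seen, so it becomes NA.
new_data$style <- factor(new_data$style, levels = levels(train$style))
is.na(new_data$style)
```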
On every successful fit with degree \(\leq\) 2, earthUI automatically saves the
full result object as an .rds file to the Project Output
Folder. The filename follows the pattern
<datafile>_earthUI_result_<YYYYMMDD_HHMMSS>.rds.
This file can be loaded by mgcvUI (a companion Shiny app for GAM
modeling) using readRDS(). mgcvUI uses the earth model’s
knot locations and basis functions as starting points for GAM smooth
terms, enabling a seamless transition from MARS to GAM modeling.
Models with degree > 2 are skipped because mgcvUI
only supports pairwise interactions. A manual Export for
mgcvUI button is also available in the sidebar for on-demand
export.
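The naming pattern and the save/load round trip can be sketched with base R's saveRDS() and readRDS(). The result object here is a placeholder list, "mydata" stands in for the data file name, and the file is written to a temporary directory rather than the Project Output Folder.

```r
result <- list(note = "placeholder for a fitted-result object")

# Build <datafile>_earthUI_result_<YYYYMMDD_HHMMSS>.rds
stamp <- format(Sys.time(), "%Y%m%d_%H%M%S")
path  <- file.path(tempdir(),
                   sprintf("%s_earthUI_result_%s.rds", "mydata", stamp))

saveRDS(result, path)
restored <- readRDS(path)
identical(restored, result)
```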
The same .rds file saved for mgcvUI can also be imported
into glmnetUI (a companion Shiny app for elastic net
regression). In glmnetUI, navigate to Section 2: Import from
earthUI and use the Browse button to select the
.rds file.
glmnetUI uses the standard model.matrix() approach (earth provides a
model.matrix method for its fitted models): the earth model’s basis functions (hinges, interactions, and
linear terms) become the columns of glmnet’s design matrix. glmnet then
performs regularized regression on these basis columns, selecting and
shrinking them via lasso/elastic net. This combines earth’s adaptive
basis construction with glmnet’s regularization.
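To make the idea of basis columns concrete, the sketch below builds a small design matrix by hand from hinge functions. The knot at 1800, the variable names, and the choice of terms are all hypothetical; with a fitted earth model `fit`, `model.matrix(fit)` returns the analogous matrix without any manual construction.

```r
# MARS hinge function: max(0, x - knot), or max(0, knot - x) when direction = -1.
hinge <- function(x, knot, direction = 1) pmax(0, direction * (x - knot))

x1 <- c(1000, 1500, 2000, 2500)  # e.g. a size variable (made-up values)
x2 <- c(10, 40, 5, 20)           # e.g. an age variable (made-up values)

# Hypothetical basis: a mirrored hinge pair, a linear term, and an interaction.
bx <- cbind(
  "(Intercept)"   = 1,
  "h(x1-1800)"    = hinge(x1, 1800),
  "h(1800-x1)"    = hinge(x1, 1800, direction = -1),
  "x2"            = x2,
  "h(x1-1800)*x2" = hinge(x1, 1800) * x2
)
bx

# A regularized fit would then use these columns as predictors, for example
# glmnet::glmnet(bx[, -1], y) with y the response (not run here).
```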
The .rds result file is saved automatically to the Project Output Folder.
The Calculate RCA Adjustments & Download button appears in sidebar section 7 and is visible only in appraisal mode after a model has been fitted. RCA (Reconciliation by Comparable Adjustment) produces market-derived, per-comparable adjustments relative to the subject property.
Clicking the button opens a small modal dialog with:
living_area column is designated). If you choose “CQA per SF,” the subject’s residual is its living area multiplied by the residual_sf that corresponds to the CQA_SF score you assign.
When you click Generate, earthUI interpolates the subject’s residual from the comparable CQA/residual pairs:
stats::approx()) maps your entered CQA value to a residual
The RCA Excel file
(<datafile>_adjusted_<YYYYMMDD_HHMMSS>.xlsx)
includes all intermediate output columns plus:
| Column | Description |
|---|---|
| subject_value | Model prediction + interpolated residual (row 1 only) |
| subject_cqa | The CQA score you entered (row 1 only) |
| <variable>_adjustment | Subject contribution minus comp contribution (per g-function) |
| residual_adjustment | Subject residual minus comp residual |
| net_adjustments | Sum of all adjustments (contribution + residual) |
| gross_adjustments | Sum of absolute values of all adjustments |
| adjusted_sale_price | Comp sale price + net adjustments |
The adjustment columns tell you, for each comparable, how much the
model attributes the difference to each variable.
adjusted_sale_price is the comparable’s sale price after
applying all model-derived adjustments — a set of adjusted sale prices
that should cluster around the subject’s estimated value.
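The arithmetic behind these columns can be traced with a small worked example. Everything here is made up for illustration: the three comparables, the single contribution column `living_area_contrib`, and the subject's values. Only `stats::approx()` and the column definitions in the table come from the text.

```r
# Made-up comparables: price = 300000 (base) + contribution + residual.
comps <- data.frame(
  sale_price          = c(490000, 532000, 505000),
  cqa                 = c(2, 5, 8),
  residual            = c(-10000, 2000, 15000),
  living_area_contrib = c(200000, 230000, 190000)  # hypothetical g-function value
)
subject <- list(prediction = 510000, cqa = 6.5, living_area_contrib = 210000)

# Linear interpolation of the residual at the CQA entered for the subject.
subject_residual <- stats::approx(comps$cqa, comps$residual, xout = subject$cqa)$y
subject_value    <- subject$prediction + subject_residual

# Per-comparable adjustments: subject minus comp.
living_area_adjustment <- subject$living_area_contrib - comps$living_area_contrib
residual_adjustment    <- subject_residual - comps$residual
adjustments            <- cbind(living_area_adjustment, residual_adjustment)

net_adjustments     <- rowSums(adjustments)
gross_adjustments   <- rowSums(abs(adjustments))
adjusted_sale_price <- comps$sale_price + net_adjustments
```

Because this toy data is perfectly consistent with a one-variable model, every adjusted sale price equals `subject_value`. With real data and a CQA range that does not bracket the entered score, `approx()` returns NA under its default `rule = 1`, so how earthUI handles that case is not shown here.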
The Sales Comparison Grid is an Excel workbook generated in appraisal mode (sidebar section 8) after computing RCA adjustments. It presents the subject property alongside selected comparables in a structured grid format, with Excel formulas that automatically compute adjustments and an adjusted sale price for each comparable.
The grid is designed for the appraiser’s workfile — it combines the
regression-derived adjustments from the Earth model with editable cells
where the appraiser can allocate the CQA residual to specific property
features. The output filename is
SalesGrid_<YYYYMMDD_HHMMSS>.xlsx.
Clicking “Generate Sales Grid & Download” opens a modal dialog where you select which comparables to include:
After confirming your selection, the selected comps are sorted by gross adjustment percentage ascending, so the most similar comparables appear on Sheet 1.
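A sketch of this ordering, assuming that the gross adjustment percentage is the gross adjustment divided by the comparable's sale price and that each sheet holds three comparables. Both the definition and the sample values are assumptions for illustration.

```r
# Made-up selected comparables.
comps <- data.frame(
  id                = c("A", "B", "C", "D", "E"),
  sale_price        = c(500000, 400000, 450000, 600000, 520000),
  gross_adjustments = c(50000, 20000, 36000, 18000, 78000),
  stringsAsFactors  = FALSE
)

# Assumed definition: gross adjustments as a share of the sale price.
comps$gross_pct <- comps$gross_adjustments / comps$sale_price

# Smallest percentage first, then three comparables per sheet.
comps       <- comps[order(comps$gross_pct), ]
comps$sheet <- (seq_len(nrow(comps)) - 1) %/% 3 + 1

comps[, c("id", "gross_pct", "sheet")]
```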
Each sheet has a 20-column layout accommodating the subject property plus 3 comparables. The columns for each entity are:
The rows from top to bottom are:
Row 5 shows three values for each comparable:
concessions (0 if not designated)
The subject column shows “N/A” for sale price (since it is unknown) and any concessions value if available.
When certain special types are designated and those variables appear in the model, earthUI creates grouped rows that combine related variables:
Loc: Long | Lat | Area — Appears when any of
longitude, latitude, or area are
in the model. Shows factual values for each constituent variable, a
combined Value Contribution (sum of the individual VCs), and for comps,
an adjustment (subject combined VC \(-\) comp combined VC). Styled with light
blue background on VC cells.
Site Size | Dimensions — Appears when
lot_size or site_dimensions are in the model.
Same structure as the Location group.
Actual Age | Effective Age — Appears when
actual_age or effective_age are in the model.
Same structure.
Variables consumed by grouped rows are excluded from the individual model variable rows below, preventing double-counting in the adjustment totals.
Below the grouped rows (or directly below BASE VALUE if no grouped rows), one row per remaining model predictor shows:
The CQA|Residual row shows each property’s CQA score and contains a formula for the remaining residual — the portion of the residual not yet allocated to specific features. The formula is:
\[\text{Remaining Residual} = \text{Total Residual} - \sum(\text{Residual Feature VCs below})\]
Below this are the residual feature rows: 5 named rows (Location, View/Appeal, Condition, Quality, Other) plus 6 blank rows. These are input cells where the appraiser enters value contributions for features not captured by the model. As the appraiser fills in values, the Remaining Residual value updates automatically, shrinking toward zero as more of the total residual is allocated.
For each comp, the adjustment column contains a formula: subject feature VC \(-\) comp feature VC.
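The two spreadsheet formulas above amount to simple arithmetic, shown here in R with made-up numbers. The total residual and the allocations are hypothetical appraiser inputs, not values produced by earthUI.

```r
total_residual <- 8500

# Value contributions the appraiser has entered for the subject so far.
subject_vc <- c(Location = 3000, `View/Appeal` = 0, Condition = 2500,
                Quality = 0, Other = 0)

# Remaining Residual = Total Residual - sum of the residual feature VCs.
remaining_residual <- total_residual - sum(subject_vc)

# For one comparable: adjustment = subject feature VC - comp feature VC.
comp_vc <- c(Location = 1000, `View/Appeal` = 0, Condition = 4000,
             Quality = 0, Other = 0)
feature_adjustment <- subject_vc - comp_vc
```

Here 5,500 of the 8,500 residual has been allocated, leaving 3,000 unexplained.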
The Adjusted Sale Price row contains Excel formulas:
Each sheet is protected to prevent accidental modification of formulas and data. The protection locks:
The only unlocked (editable) cells are the residual feature Value Contribution inputs — the cells under the CQA|Residual row where the appraiser enters breakdowns. These are styled with a light yellow background to indicate they are editable.
After fitting a model, the Download Report section (sidebar section 7, 8, or 9 depending on the purpose mode) lets you generate a comprehensive formatted report.
Three formats are available via the format dropdown:
.html file with KaTeX math
rendering, embedded images, and a table of contents. Uses the Flatly
Bootstrap theme and Roboto Condensed font.
.docx file suitable for
editing and distribution. Includes a table of contents and page numbers,
and uses a custom reference template for consistent styling.
Reports are rendered via Quarto and saved directly to the Project
Output Folder as
<datafile>_report_<YYYYMMDD_HHMMSS>.<format>.
Rendering runs in the background — a modal dialog shows elapsed time and
Quarto progress while the app remains responsive. A notification
confirms the file location on completion.
All operations (model fitting, data downloads, RCA calculations,
sales grid generation, and report rendering) are logged with start/end
timestamps and elapsed times to
<datafile>_earthui_log.txt in the output folder, for
troubleshooting and performance monitoring.
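A timing log of this kind can be sketched with a small wrapper. The function `log_operation()` and the line format are hypothetical; earthUI's actual log layout is not documented here.

```r
# Run an expression, then append one line with start, end, and elapsed time.
log_operation <- function(label, expr, log_file) {
  start <- Sys.time()
  value <- force(expr)  # the expression is evaluated here, after `start`
  end   <- Sys.time()
  line  <- sprintf(
    "%s | start %s | end %s | elapsed %.2f s",
    label,
    format(start, "%Y-%m-%d %H:%M:%S"),
    format(end, "%Y-%m-%d %H:%M:%S"),
    as.numeric(difftime(end, start, units = "secs"))
  )
  cat(line, "\n", file = log_file, append = TRUE, sep = "")
  value
}

log_file <- tempfile(fileext = ".txt")
total    <- log_operation("model fitting", sum(1:10), log_file)
readLines(log_file)
```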
Every report includes the following sections:
earth::summary(), including pruning pass details
For multi-response models, sections 4–8 and 10 are repeated for each target variable.
earthUI includes a demo MLS dataset for exploring the appraisal workflow. Load it programmatically with:
library(earthUI)  # provides import_data()
demo_file <- system.file("extdata", "Appraisal_1.csv", package = "earthUI")
df <- import_data(demo_file)
Or import it directly through the Shiny app file upload.
The file contains 1,502 residential sales (plus 1 subject property in row 1) from a simulated MLS export. The data represents single-family home sales in a multi-area market with a range of property sizes, ages, and locations.
This is not real data, but it is based on a realistic neighborhood in Northern California. All identifying information has been altered or removed.