This vignette introduces CHNOSZ, an R package for thermodynamic calculations relevant to geochemistry and geochemical biology. CHNOSZ provides functions and a thermodynamic database for calculating properties of reactions involving minerals, aqueous species, and gases across a range of temperatures and pressures.
This vignette was compiled on 2026-06-02 with CHNOSZ 2.2.0-33.
After installing CHNOSZ from CRAN, load the package:
## CHNOSZ version 2.2.0-33 (2026-06-02)
## reset: creating "thermo" object
## OBIGT: loading default database with 1833 aqueous, 3499 total species
This makes the thermodynamic database and functions available in your
R session. To restore default settings at any point, use
reset().
CHNOSZ is made up of interrelated functions and supporting data. The major components of the package are shown in the flow diagram. Ellipses represent data sources and rectangles represent functions. Bold arrows and functions show the most common workflows (described in italic legends). Dashed arrows represent internal flows of data.
CHNOSZ offers several primary functions for thermodynamic analysis:
info(): Search the thermodynamic databasesubcrt(): Calculate thermodynamic properties of species
and reactionsaffinity(): Calculate affinities of formation
reactionsequilibrate(): Calculate equilibrium chemical
activitiesdiagram(): Plot the resultsbasis(): Set basis species and their chemical
activitiesspecies(): Set species of interest and their
activitiesreset(): Reset database and system settings to
defaultsThe info() function provides access to the OBIGT
thermodynamic database.
## info.character: found CH4(aq); also available in gas, liq
## [1] 732
Searching by formula returns the aqueous species if it is available.
Use a species name or add the state to get a particular physical state -
aq, cr, gas, or
liq:
## info.character: found methane(gas); also available in liq
## [1] 2701
## [1] 2701
Use info() recursively to return thermodynamic
parameters:
## info.character: found CH4(aq); also available in gas, liq
## check.EOS: calculated Cp° of CH4(aq) differs by -2.61 cal K-1 mol-1 from database value
## name abbrv formula state ref1 ref2 date model E_units G H S Cp V a1 a2 a3 a4 c1 c2 omega Z
## 732 CH4 <NA> CH4 aq PS01 <NA> 2000-10-04 HKF cal -8140 -20930 21 60.23 36 1.769 -1530 -67.88 114700 40.87 64500 -40000 0
You can access fuzzy search functionality by using partial names:
## info.approx: 'ribose+' is ambiguous; has approximate matches to 8 species:
## [1] "ribose" "deoxyribose" "ribose-5-phosphate" "ribose-5-phosphate-1" "ribose-5-phosphate-2" "ribose" "deoxyribose" ## [8] "ribose-5-phosphate"
## [1] NA
The subcrt() function [loosely named after SUPCRT; Johnson et al. (1992)]
calculates standard thermodynamic properties.
The default conditions are 0.01–350 °C along the Psat curve (defined here as the greater of 1 bar or the vapor-liquid saturation pressure for H2O):
## info.character: found CH4(aq); also available in gas, liq
## subcrt: 1 species at 15 values of T (ºC) and P (bar) (wet) [energy units: J]
## $species
## name formula state ispecies model
## 732 CH4 CH4 aq 732 HKF
##
## $out
## $out$CH4
## T P rho logK G H S V Cp
## 1 0.01 1.000000 0.9998289 6.146120 -32141.64 -94362.48 64.02537 28.43371 320.2540
## 2 25.00 1.000000 0.9970614 5.966661 -34057.76 -87571.12 87.86400 36.32929 241.0868
## 3 50.00 1.000000 0.9880295 5.898249 -36490.28 -81887.10 106.18347 40.21802 217.6954
## 4 75.00 1.000000 0.9748643 5.903292 -39346.90 -76565.80 122.04828 42.65169 209.4693
## 5 100.00 1.013220 0.9583926 5.960475 -42580.84 -71362.35 136.48265 44.45100 207.5653
## 6 125.00 2.320144 0.9390726 6.055409 -46157.27 -66152.32 149.98108 45.99448 209.5788
## 7 150.00 4.757169 0.9170577 6.179107 -50057.59 -60843.92 162.88237 47.51660 215.2640
## 8 175.00 8.918049 0.8923427 6.325065 -54267.31 -55334.98 175.48219 49.23207 225.6772
## 9 200.00 15.536499 0.8647434 6.488877 -58778.47 -49481.61 188.11556 51.41530 243.4995
## 10 225.00 25.478603 0.8338733 6.667868 -63591.20 -43048.65 201.24975 54.51539 274.5815
## 11 250.00 39.736493 0.7990719 6.861076 -68717.66 -35603.67 215.66282 59.40754 332.5458
## 12 275.00 59.431251 0.7592362 7.069725 -74191.11 -26241.56 232.89165 68.08973 454.1118
## 13 300.00 85.837843 0.7124075 7.298795 -80088.35 -12702.80 256.64410 86.10141 764.2170
## 14 325.00 120.457572 0.6545772 7.562218 -86598.26 12491.49 298.92128 134.18673 1893.9587
## 15 350.00 165.211289 0.5746875 7.907228 -94333.67 93338.02 429.08080 355.39612 11384.8458
You can customize the T-P grid by passing the appropriate arguments:
# Custom T, P grid for water in supercritical region
subcrt("H2O", T = c(400, 500), P = c(250, 300))## info.character: found H2O(liq) [water]; also available in cr, gas
## subcrt: 1 species at 2 values of T (ºC) and P (bar) (wet) [energy units: J]
## $species
## name formula state ispecies model
## 1 water H2O liq 1 water.SUPCRT92
##
## $out
## $out$water
## T P rho logK G H S V Cp
## 1 400 250 0.1666279 21.47188 -276714.5 -241279.6 155.8919 108.1164 239.06019
## 2 500 300 0.1152589 19.74903 -292320.6 -232176.0 167.6886 156.3021 77.68281
Unit settings for subcrt() are handled by
T.units(), P.units(), and
E.units():
## changed energy units to cal
## info.character: found CH4(aq); also available in gas, liq
## subcrt: 1 species at 25 ºC and 1 bar (wet) [energy units: cal]
## $species
## name formula state ispecies model
## 732 CH4 CH4 aq 732 HKF
##
## $out
## $out$CH4
## T P rho logK G H S V Cp
## 1 25 1 0.9970614 5.966661 -8140 -20930 21 36.32929 57.62112
## reset: resetting "thermo" object
## OBIGT: loading default database with 1833 aqueous, 3499 total species
Define reactions with species names or formulas, states (optional), and coefficients:
NOTE: Reaction coefficients are negative for reactants and positive for products.
## subcrt: 2 species at 25 ºC and 1 bar (wet) [energy units: J]
## $reaction
## coeff name formula state ispecies model
## 2683 -1 carbon dioxide CO2 gas 2683 CGL
## 430 1 CO2 CO2 aq 430 HKF
##
## $out
## T P rho logK G H S V Cp
## 1 25 1 0.9970614 -1.468942 8384.736 -20288.22 -96.16924 32.63308 159.1761
Reactions can be automatically balanced using basis species:
subcrt() for calculating standard thermodynamic
properties.## C H O Z ispecies logact state
## CO2 1 0 2 0 430 0 aq
## H2O 0 2 1 0 1 0 liq
## H+ 0 1 0 1 3 0 aq
## e- 0 0 0 -1 2 0 aq
## subcrt: 2 species at 25 ºC and 1 bar (wet) [energy units: J]
## subcrt: reaction is not balanced; it is missing this composition:
## C H O Z ## 1 -1 2 -1
## subcrt: adding missing composition from basis definition and restarting...
## subcrt: 4 species at 25 ºC and 1 bar (wet) [energy units: J]
## $reaction
## coeff name formula state ispecies model
## 732 1 CH4 CH4 aq 732 HKF
## 922 -1 acetate C2H3O2- aq 922 HKF
## 430 1 CO2 CO2 aq 430 HKF
## 3 -1 H+ H+ aq 3 HKF
##
## $out
## T P rho logK G H S V Cp
## 1 25 1 0.9970614 8.884021 -50710.08 -15355.28 119.244 28.86502 410.4372
species() for calculating affinities and
making diagrams.## C H O Z ispecies logact state
## CO2 1 0 2 0 430 0 aq
## H2O 0 2 1 0 1 0 liq
## H+ 0 1 0 1 3 0 aq
## e- 0 0 0 -1 2 0 aq
## CO2 H2O H+ e- ispecies logact state name
## 1 1 -2 8 8 732 -3 aq CH4
## 2 2 -2 7 8 922 -3 aq acetate
There are some keywords (e.g. CHNOS+,
CHNOSe and QEC) for loading predefined sets of
basis species. See the help page of basis() for more
information.
The affinity() function calculates chemical affinities
over ranges of T, P, and activities:
# Setup the C-H-N-O-S basis system with electron
basis("CHNOSe")
# Define aqueous sulfur species
species(c("H2S", "HS-", "S3-", "SO2", "HSO4-", "SO4-2"))
# Calculate affinities on an Eh-pH grid
a <- affinity(pH = c(0, 14), Eh = c(-0.8, 1.2))
# Create an Eh-pH (Pourbaix) diagram
diagram(a, col = 4, col.names = 4, italic = TRUE)
# Add legend
TC <- convert(a$T, "C")
legend <- c(describe.property("T", TC), quote(italic(P) == "1 bar"))
legend("topright", legend = legend, bty = "n")Eh–pH (Pourbaix) diagram for S
NOTE: diagram() automatically adds shading to regions of
water instability with respect to O2 or H2.
For more sophisticated diagrams involving speciation of basis
species, use the mosaic() function:
# Create a mosaic diagram for Cu-S-Cl-H2O system
basis(c("Cu", "H2S", "Cl-", "H2O", "H+", "e-"))
basis("H2S", -6)
basis("Cl-", -1)
species(c("CuCl", "CuCl2-", "CuCl3-2", "CuCl+", "CuCl2", "CuCl3-", "CuCl4-2"))
species(c("chalcocite", "tenorite", "cuprite", "copper"), add = TRUE)
bases <- c("H2S", "HS-", "S3-", "SO2", "HSO4-", "SO4-2")
m <- mosaic(bases, pH = c(0, 12), Eh = c(-1, 1), T = 200)
a <- m$A.species
diagram(a, lwd = 2, bold = species()$state == "cr")
diagram(m$A.bases, add = TRUE, col = 4, col.names = 4, italic = TRUE)
water.lines(m$A.species, lty = 3, col = 8)
# Add legend
legend <- lTP(convert(a$T, "C"), a$P)
legend("topright", legend = legend, bty = "n")Mosaic diagram showing effect of aqueous S speciation on the relative stabilities of Cu minerals and aqueous species
Here we’ve added dotted lines to help visualize the water stability limits.
Calculate equilibrium distributions of species:
# Carbonate speciation vs pH
basis(c("CO2", "H2O", "H+", "e-"))
species(c("CO2", "HCO3-", "CO3-2"))
# 25 degrees C
a <- affinity(pH = c(0, 14))
e <- equilibrate(a)
diagram(e, alpha = TRUE)
# 100 degrees C
a <- affinity(pH = c(4, 12), T = 100)
e <- equilibrate(a)
diagram(e, alpha = TRUE, add = TRUE, col = 2, names = NA)
# Add legend
legend <- as.expression(list(lT(25), lT(100)))
legend("left", legend = legend, lty = 1, col = c(1, 2))Carbonate speciation as a function of pH and temperature
Calculate solubility of minerals or gases:
# Corundum solubility vs pH
basis(c("Al+3", "H2O", "H+", "e-"))
species("corundum")
iaq <- info(c("Al(OH)4-", "Al(OH)3", "Al(OH)2+", "AlOH+2", "Al+3"))
s <- solubility(iaq, pH = c(2, 10), in.terms.of = "Al+3")
diagram(s, col = "green4", lwd = 2, ylim = c(-10, -2))
diagram(s, type = "loga.equil", add = TRUE)
legend("topright", c("25 °C", "1 bar"), bty = "n")Solubility of corundum (green line) and equilibrium concentrations of aqueous species (black lines)
Incorporate non-ideal behavior using the extended Debye–Hückel
equation by setting the ionic strength parameter IS:
# Corundum solubility again
basis(c("Al+3", "H2O", "H+", "e-"))
species("corundum")
iaq <- info(c("Al(OH)4-", "Al(OH)3", "Al(OH)2+", "AlOH+2", "Al+3"))
# Calculate with ionic strength of 0 and 1 molal
s0 <- solubility(iaq, pH = c(2, 10))
s1 <- solubility(iaq, pH = c(2, 10), IS = 1)
diagram(s1, col = 4, lwd = 3, ylim = c(-8, -2))
diagram(s0, col = "green4", lwd = 2, add = TRUE)
legend("topleft", legend = c(1, 0), lwd = c(3, 2), col = c(4, "green4"), title = "I (mol/kg)", bty = "n")
legend("topright", c("25 °C", "1 bar"), bty = "n")Solubility of corundum dependent on ionic strength
Functions that have the IS parameter are
subcrt(), affinity(), mosaic(),
and solubility(). When a value for IS is
specified, inputs and outputs labeled as activities are actually
interpreted or reported as molalities.
The CHNOSZ package incorporates data and methods from various
sources. Use thermo.refs() to view citation information for
data sources:
# Return data frame with references for one or more species
thermo.refs(info("CH4", c("aq", "gas")))
# View all references in a browser
thermo.refs()For citing CHNOSZ itself, see “How should CHNOSZ be cited?” in the FAQ.
The idea for creating stability diagrams in CHNOSZ came from Prof. Harold Helgeson’s geochemistry course at UC Berkeley. There, the students constructed diagrams that were “balanced on” a metal. For instance, in a reaction between two minerals balanced on aluminum, Al is only present in the minerals on either side of the reaction and not as an ion.
The line-based method, used for making diagrams by hand, looks at reactions between pairs of species (let’s call them transformation reactions), then draws a line between stability fields where the non-standard Gibbs energy of reaction is zero. The grid-based method, used in CHNOSZ, looks at reactions to compose individual species from the basis species (let’s call them formation reactions), then selects the most stable species according to their affinity values.
Affinity is just the opposite of non-standard Gibbs energy of reaction. “Standard Gibbs energy of reaction” and “Gibbs energy of reaction” – which are different concepts – have unfortunately similar names except for an optional overall or non-standard in front of the latter (the word choice varies among authors, e.g. Amend and LaRowe, 2019; Solel et al., 2019). “Non-standard Gibbs energy of reaction” doesn’t lend itself to a short, unambiguous function name, which is why its opposite, affinity, is used in CHNOSZ.
In the line-based method, transformation reactions are said to be balanced on a metal. The grid-based method implements this balancing constraint by dividing the affinities of formation reactions by the stoichiometric coefficients of a basis species. CHNOSZ uses these normalized affinities for making relative stability diagrams, a technique referred to as the maximum affinity method (Dick, 2019). The line- and grid-based methods both have the same limitation: every species considered in the relative stability calculation must have non-zero stoichiometry of the metal the transformation reactions are balanced on (or equivalently of the conserved basis species that has that metal).
A useful technique in geochemical modeling is to construct “composite diagrams” (Garrels and Christ, 1965), where stability fields for minerals and predominance fields for aqueous species are both displayed on the same plot. Because mineral activities are assumed to be unity while aqueous species activities can vary, the choice of aqueous species activity defines a concentration-dependent boundary between mineral and aqueous stability fields on composite diagrams. These solubility boundaries between minerals and aqueous species are referred to as either “equisolubility” (Pourbaix, 1974) or “isosolubility” (Helgeson, 1964; Garrels and Christ, 1965) lines.
CHNOSZ provides two distinct approaches for calculating isosolubility lines:
First approach (quasisolubility contours): This method loads multiple aqueous species with identical activities, then constructs the predominance diagram using the maximum affinity method. However, isosolubility lines calculated this way represent the reaction between the most stable mineral and a single aqueous species at each point on the diagram. These can be called “quasisolubility contours” because they provide only a partial view of mineral solubility.
Second approach (total solubility): This method sums the activities of all candidate aqueous species in equilibrium with the most stable mineral, providing isosolubility lines that represent the contribution from all relevant aqueous species.
The first approach is implemented in CHNOSZ by setting the activities
of formed aqueous species, then creating a predominance diagram with the
maximum affinity method using diagram(). See
below for an example of this method. In essence, quasisolubility
contours on predominance diagrams are equivalent to previously described
“solubility limits” on activity diagrams (Osseo-Asare and Brown, 1979). The term
“quasisolubility” emphasizes that these contours provide only
first-order estimates of solubility, considering just one aqueous
species at a time.
CHNOSZ offers a second approach that improves upon the maximum
affinity method by accounting for contributions from multiple aqueous
species to total solubility. This approach is implemented in the
solubility() function (see
below).
The following example demonstrates the difference between these two approaches:
The total solubility contour (green) at log mFe = -6 lies inside the quasisolubility contour (boundary between tan and blue areas), showing that the latter underestimates solubility; the largest contributions to total solubility are from Fe+2 and FeCl+ at low pH and FeO2- and HFeO2- at high pH
This comparison illustrates how quasisolubility boundaries on
classical predominance diagrams systematically underestimate mineral
solubility. The discrepancy between quasisolubility and total solubility
is minimized when one aqueous species strongly predominates over all
others. However, in systems where multiple aqueous species contribute
significantly to solubility, this mismatch becomes more pronounced. For
such systems, solubility() provides more accurate estimates
of isosolubility contours.
It is important to note that neither approach satisfies overall mass balance constraints in complex multicomponent systems. For applications requiring rigorous mass-balance solutions, more sophisticated techniques not currently implemented in CHNOSZ should be considered (Pelton et al., 2018; De Lucia and Kühn, 2021).
Having seen basic examples of the main features of CHNOSZ, here are some ideas for building more complex calculations and diagrams.
Labeling diagrams is an important but often difficult step for creating publication-ready figures.
CHNOSZ provides the axis.label() and
expr.species() functions to create formatted axis labels
and chemical formulas. Let’s revisit the CO2 dissolution
example seen earlier and add two other gases (carbon monoxide and
methane). This plot is similar to Figure 18 of Manning et al. (2013).
T <- seq(0, 350, 10)
CO2_logK <- subcrt(c("CO2", "CO2"), c("gas", "aq"), c(-1, 1), T = T)$out$logK
CO_logK <- subcrt(c("CO", "CO"), c("gas", "aq"), c(-1, 1), T = T)$out$logK
CH4_logK <- subcrt(c("CH4", "CH4"), c("gas", "aq"), c(-1, 1), T = T)$out$logK
logK <- data.frame(T, CO2_logK, CO_logK, CH4_logK)
matplot(logK[, 1], logK[, -1], type = "l", col = 1, lty = 1,
xlab = axis.label("T"), ylab = axis.label("logK"))
text(80, -1.7, expr.species("CO2"))
text(240, -2.37, expr.species("CO"))
text(300, -2.57, expr.species("CH4"))
# Add legend
legend <- describe.property("P", "Psat")
legend("topright", legend = legend, bty = "n")Equilibrium constants of dissolution reactions
CHNOSZ has some other helper functions for labeling diagrams:
describe.reaction() to format chemical reactions from
the output of subcrt()describe.property() to format property values as
equations (e.g. “T = 25 °C”)describe.basis() to format the activities of basis
specieslT(), lP(), lTP(), and
related functions for more compact representations of conditions
(e.g. “25 °C, 1 bar”)There is no single best approach to formatting text for legends, and sometimes it’s easiest to use basic R functions:
plotmath() for general
information on formatting mathematical expressions in R.bquote() is useful for putting values of variables into
expressions.as.expression() on the combined output from other functions
to produce complex legends for the legend() function.Want to find all the Al complexes in the database? List them by
calling retrieve() with the main element optionally
followed by the ligand elements and state.
# List aqueous Al species in the default database
iaq <- retrieve("Al", ligands = c("F", "H", "O"), state = "aq")
# Print the first few rows and columns
info(iaq)[1:3, 1:5]## name abbrv formula state ref1
## 479 Al+3 <NA> Al+3 aq TS01
## 480 Al(OH)4- <NA> Al(OH)4- aq TS01
## 481 AlOH+2 <NA> AlOH+2 aq TS01
## $species
## name formula state ispecies model
## 479 Al+3 Al+3 aq 479 HKF
## 480 Al(OH)4- Al(OH)4- aq 480 HKF
## 481 AlOH+2 AlOH+2 aq 481 HKF
##
## $out
## $out$`Al+3`
## T P rho logK G H S V Cp
## 1 25 1 0.9970614 85.40242 -487477.8 -538769.6 -339.7534 -45.33933 -119.3444
##
## $out$`Al(OH)4-`
## T P rho logK G H S V Cp
## 1 25 1 0.9970614 228.7613 -1305772 -1503075 103.5456 46.28513 96.5402
##
## $out$`AlOH+2`
## T P rho logK G H S V Cp
## 1 25 1 0.9970614 121.9904 -696322.2 -769864.3 -181.1254 -20.62274 -37.44297
The result of retrieve() can also be used to add species
to a diagram; see the example in #3 below.
Perhaps you’d like to use data from older databases that have been
superseded by later updates. The OBIGT vignette briefly summarizes the
superseded data for SUPCRT92
and SLOP98 (Shock et
al., 1998). Use add.OBIGT() to load these
old data entries.
add.OBIGT("SLOP98")
iaq_all <- retrieve("Al", ligands = c("F", "H", "O"), state = "aq")
# Use difference between sets to get the added species
info(setdiff(iaq_all, iaq))## name abbrv formula state ref1 ref2 date model E_units G H S Cp V a1 a2 a3 a4 c1 c2 omega Z
## 3740 AlO2- AlO2- AlO2-1 aq SSWS97.4 <NA> 1997-11-07 HKF cal -198693 -222125 -7.22 -11.9 10.0 0.37221 399.54 -1.5879 -29441 15.2391 -54585 174180 -1
## 3741 AlO+ AlO+ AlO+1 aq SSWS97.4 <NA> 1997-11-13 HKF cal -158188 -170900 -27.00 -30.0 0.6 0.21705 -248.11 6.7241 -26763 -2.5983 -91455 95700 1
## 3742 HAlO2 HAlO2(AQ) HAlO2 aq SSWS97.4 <NA> 1997-11-13 HKF cal -207700 -227500 5.00 -50.0 13.0 0.35338 84.85 5.4132 -28140 -23.4129 -132195 -3000 0
NOTE: Anhydrous species are commonly used for the revised Helgeson-Kirkham-Flowers (HKF) model. For example, Shock et al. (1997) reported the properties of AlO2−, which is available in the optional SLOP98 database. This species is the anhydrous form of Al(OH)4−, which is present in the default database (see output above) using parameters from a more recent compilation for Al species (Tagirov and Schott, 2001). Because they are effectively the same species, only one form of this species is listed in the default database. Unless you have a specific reason to compare them, redundant species should not be considered together in an equilibrium calculation.
OBIGT() restores the default database without affecting
other settings. reset() restores the default database and
all other settings in CHNOSZ. These functions are useful for both
interactive use and scripts that compare different versions of data or
plots for different systems or conditions.
Let’s put items #1–4 together to remake the corundum solubility plot
using only species available in SLOP98. To do this, we use
add.OBIGT() followed by retrieve() to gather
the species indices for all Al species, then take only those species
sourced from Shock et al. (1997).
# Add superseded species from SLOP98
add.OBIGT("SLOP98")
# List all aqueous Al species
iaq <- retrieve("Al", state = "aq")
# Keep only species from Shock et al. (1997)
iaq <- iaq[grepl("SSWS97", info(iaq)$ref1)]
# Plot corundum solubility vs pH
basis(c("Al+3", "H2O", "H+", "e-"))
species("corundum")
s <- solubility(iaq, pH = c(2, 10), in.terms.of = "Al+3")
## Alternatively, we could use the species names
#s <- solubility(names(iaq), pH = c(2, 10), in.terms.of = "Al+3")
diagram(s, col = "green4", lwd = 2, ylim = c(-10, -2))
diagram(s, type = "loga.equil", add = TRUE)
legend("topright", c("25 °C", "1 bar"), bty = "n")Corundum solubility with species from SLOP98
A common question is: what are the basis species used for? The basis species define the compositional variables of a system. The composition of any species that can possibly be formed in that system can be represented by a linear combination of the basis species. The basis species may also be referred to as thermodynamic components, although a strict definition of the latter does not include charged species.
CHNOSZ requires that the number of basis species is equal to the number of different elements in the basis species (plus charge, if present). If you were studying the relative stability of F- and OH-complexes with Al, you might be tempted to try this basis definition:
## Error in `put.basis()`: ## ! the number of basis species is less than the number of elements and charge
According to the message, we don’t have enough basis species for the number of elements. Since the composition of hydroxide is water minus a proton (i.e., OH− = H2O − H+), we could try this instead:
## Error in `put.basis()`: ## ! the number of basis species is less than the number of elements and charge
That’s still not enough species. As is often the case, we need to include a basis species representing oxidation-reduction (redox) reactions, even if there are no redox reactions between the formed species. Here are two possible basis definitions that do not give an error.
In order to make a diagram with stability fields for different
species, CHNOSZ needs to know about the activities of all the species in
the reaction. The activities of the basis species start with constant
values as shown in the output above (logact column).
Selected basis species can be assigned to plot axes (with a range of
values) in affinity().
NOTE: logact is the logarithm of activity for
aqueous species, solids, and liquids, or logarithm of fugacity
for gases. All logarithms in CHNOSZ are common logarithms (R function:
log10()) unless indicated otherwise.
How about the formed species in the system - that is, the species
whose stability fields we want to visualize? We both list the species
and set their activities using species(). The function
defaults to activities of 10-3 (logact of -3)
for aqueous species and unit activity (logact = 0) for
minerals, gases, and liquids. Let’s change this to activities of
10-6 for the formed species.
basis(c("Al+3", "F-", "H+", "H2O", "e-"))
iaq <- retrieve("Al", ligands = c("F", "H", "O"), state = "aq")
# Check that the data are from the same source
stopifnot(all(info(iaq)$ref1 == "TS01"))
species(iaq, -6)NOTE: The value of logact passed to
species() defines a quasisolubility contour, which is less
than the total solubility to the extent that more than one aqueous
species is abundant (see
above).
There are two places where you might see add = TRUE.
First, in species() to add species not already in the list.
Without add = TRUE, any existing species are discarded.
Second, in diagram() to add to an existing plot.
Let’s put items #5–7 together to make a Pourbaix (Eh–pH) diagram for Al with two quasisolubility contours.
basis(c("Mn+2", "H+", "H2O", "e-"))
icr <- retrieve("Mn", ligands = c("H", "O"), state = "cr")
iaq <- retrieve("Mn", ligands = c("H", "O"), state = "aq")
# First layer, logact(aq) = -3
species(icr)
species(iaq, add = TRUE)
a <- affinity(pH = c(4, 14), Eh = c(-1, 1), T = 100)
# Use names = NA to avoid plotting labels twice
diagram(a, lty = 2, names = NA)
# Second layer, logact(aq) = -4
species(icr)
species(iaq, -4, add = TRUE)
a <- affinity(pH = c(4, 14), Eh = c(-1, 1), T = 100)
d <- diagram(a, bold = species()$state == "cr", add = TRUE)
# Add water stability limits
water.lines(d, lty = 3, col = 8)
# Add legends
legend("topright", legend = c(lT(100), lP("Psat")), bty = "n")
title = as.expression(quote(log~italic(a)*"Mn(aq)"))
legend("bottomleft", legend = c(-3, -4), lty = c(2, 1), title = title, bty = "n")Pourbaix diagram for Mn with two quasisolubility contours
The shaded areas in the diagram represent water instability regions
and are automatically added by diagram(). We use
water.lines() here to plot the water stability limits with
dotted lines.
After defining the basis species and formed species (and their
constant activities), you have some choices about what variables to put
on the plot, the grid resolution, and values for a few other variables.
affinity() accepts one or more named arguments that specify
ranges of variables as c(min, max) using the default grid
resolution of 256, or ranges and a custom grid resolution as
c(min, max, res). The number of such arguments is the
dimensionality of the final plot. The grid resolution (res)
defaults to 256 and can be different for each variable. The names of the
variables can be the formulas of any of the basis species, or
T, P, or IS for temperature,
pressure, or ionic strength. These last three default to 25 °C,
Psat (the greater of 1 bar or the vapor-liquid
saturation pressure for H2O), and 0 mol/kg.
I often start with a low grid resolution to quickly iterate a calculation, then switch to a higher resolution when I’m satisfied with the result.
Sodium chloride (NaCl) solutions are commonly used reference points
for geochemical models. The NaCl() function provides a
quick-and-dirty way to estimate ionic strength and activity of chloride
(Cl−) for a given total amount of NaCl added to 1 kg of
H2O. These values can then be used in setting up a
calculation that involves these variables.
This function does not use either the basis() or
species() definitions. The following example runs a
calculation for 0.8 mol/kg NaCl and given T, P, and
pH. See demo('sum_S') for a more fully worked-out example
that uses this code (based on a diagram in Skirrow and Walshe, 2002).
T <- 300
P <- 1000
pH <- 5
m_NaCl = 0.8
NaCl <- NaCl(m_NaCl = m_NaCl, T = T, P = P, pH = pH)
print(paste("mol NaCl added to 1 kg H2O:", m_NaCl))## [1] "mol NaCl added to 1 kg H2O: 0.8"
## [1] "Ionic strength (mol/kg): 0.607065915465928"
## [1] "Chloride concentration (mol/kg): 0.607061333179561"
It’s possible to to make a series of quasisolubility contours by
setting activities of aqueous species (see the example for Mn above). A more refined
solution that accounts for multiple aqueous species is to use
solubility() to visualize total solubility contours. The
basic idea is to first load the mineral(s) containing a single metal as
the formed species(). Then, list the aqueous species with
that metal as the first argument to solubility(). The
remaining arguments to the function define the plot variables, just as
in affinity() and mosaic().
Let’s put items #8–10 together to make a set of diagrams for a single metal. The example here uses Fe; try changing it to Cu, Zn, Pb, Au, or something else!
CHNOSZ generates warning messages about being above the Cp limits for various iron oxyhydroxides. For the present calculation, the warnings are probably harmless because the predicted set of stable minerals (pyrite, pyrrhotite, magnetite, and hematite) is consistent with published diagrams.
NOTE: If you see warning messages like this, it’s a good idea not to ignore them, but to consider whether you might be pushing the extrapolations of the Cp equation too far.
## Warning in subcrt(species = c(1965L, 65L, 29L, 2691L, 1L, 3L, 1865L, 1873L, : above T limit of 400 K for the Cp equation for hydronium jarosite(cr)
## Warning in subcrt(species = c(1965L, 65L, 29L, 2691L, 1L, 3L, 1865L, 1873L, : above T limit of 375 K for the Cp equation for goethite(cr)
## Warning in subcrt(species = c(1965L, 65L, 29L, 2691L, 1L, 3L, 1865L, 1873L, : above T limit of 390 K for the Cp equation for lepidocrocite(cr)
## Warning in subcrt(species = c(1965L, 65L, 29L, 2691L, 1L, 3L, 1865L, 1873L, : above T limit of 400 K for the Cp equation for hydronium jarosite(cr)
## Warning in subcrt(species = c(1965L, 65L, 29L, 2691L, 1L, 3L, 1865L, 1873L, : above T limit of 375 K for the Cp equation for goethite(cr)
## Warning in subcrt(species = c(1965L, 65L, 29L, 2691L, 1L, 3L, 1865L, 1873L, : above T limit of 390 K for the Cp equation for lepidocrocite(cr)
## Warning in subcrt(species = c(1965L, 65L, 29L, 2691L, 1L, 3L, 1865L, 1873L, : above T limit of 400 K for the Cp equation for hydronium jarosite(cr)
## Warning in subcrt(species = c(1965L, 65L, 29L, 2691L, 1L, 3L, 1865L, 1873L, : above T limit of 375 K for the Cp equation for goethite(cr)
## Warning in subcrt(species = c(1965L, 65L, 29L, 2691L, 1L, 3L, 1865L, 1873L, : above T limit of 390 K for the Cp equation for lepidocrocite(cr)
Stability diagram for minerals; predominance diagram for aqueous species; composite diagram with quasisolubility contour; diagram with total solubility contours in units of log m
NOTE: As explained above, total solubility is higher than quasisolubility.
There are further examples of Eh–pH composite diagrams (denoting
quasisolubility) overlaid with total solubility contours in
demo("Pourbaix").
The convert() function offers several unit conversions.
It implements reciprocal conversion between pairs of units, so only the
destination unit needs to be specified. Some simple uses are to convert
temperature, pressure, or energy values:
# Convert 100 degrees C to value in Kelvin
convert(100, "K")
# Convert 273.15 K to value in degrees C
convert(273.15, "C")
# Convert 1 bar to value in MPa
convert(1, "MPa")
# Convert 100 MPa to value in bar
convert(100, "bar")
# Convert 1 cal/mol to value in J/mol
convert(1, "J")
# Convert 10 J/mol to value in cal/mol
convert(10, "cal")Another use of convert() is to convert the output of
solubility() from log activity to ppm, ppb, log ppm, or
log ppb. The following code continues from the example above:
sppm <- convert(s, "ppm")
levels <- c(1e-6, 1e-3, 1e0, 1e3)
diagram(sppm, levels = levels, col = "green4")Solubility in units of ppm
Specify 4 or more values for one or more variables (each variable
should have the same number of values, or be set to a constant) to
activate the transect mode of affinity(). The
transect mode allows defining an arbitrary path in
multidimensional space.
Here’s a simple example:
basis("CHNOSe")
species(c("NO3-", "NO2-", "NH4+", "NH3"))
a <- affinity(pH = c(6, 8, 6, 8), Eh = c(0.5, 0.5, -0.5, -0.5))## $pH
## [1] 6 8 6 8
##
## $Eh
## [1] 0.5 0.5 -0.5 -0.5
## $`16`
## [1] 10.71165 28.71165 -124.52393 -106.52393
##
## $`17`
## [1] 9.573951 23.573951 -91.852732 -77.852732
##
## $`18`
## [1] 2.2409947 0.2409947 2.2409947 0.2409947
##
## $`64`
## [1] -1 -1 -1 -1
NOTE: affinity() returns dimensionless values of
affinity (i.e., A/2.303RT).
Below we’ll see how to convert these to energetic units.
diagram() looks for the first basis species that has
non-zero coefficients in each of the formed species. This is called the
conserved basis species in the documentation. The affinity values are
then divided by the coefficients of the conserved basis species to give
normalized affinities. This is how “balancing on a metal” is
implemented.
Let’s put items #11–13 together to calculate affinities of organic synthesis reactions in mixed seawater and hydrothermal fluid from the Rainbow vent field using speciation results from Shock and Canovas (2010):
Affinities of organic synthesis reactions per mole of C, H2, or formed species
Although affinity() uses all of the variables in the
transect, diagram() labels the x-axis with only the first
variable (temperature). We obtain three plots:
balance = 1). This just shows
the affinity of each reaction as given (that is, per mole of formed
species), which is how the results were presented by Shock and Canovas (2010).In normal use, subcrt() returns standard molal
properties (G, H, S, Cp, and V) for species in their standard states,
defined as unit activity or fugacity. Two deviations from this default
setting are possible: non-standard properties for specified
activity, and adjusted properties for activity
coefficients.
First let’s look at how adjusted properties depend on activity coefficients. This example uses a particular nonideality model based on Alberty (2003):
# Set the nonideality model
old_ni <- nonideal("Alberty")
# Calculate standard and adjusted Gibbs energy at 25 °C
species <- c("MgH2ATP", "MgHATP-", "MgATP-2")
subcrt(species, T = 25, IS = c(0, 0.25), property = "G")$out## $MgH2ATP
## T P G loggam IS
## 1 25 1 -3289461 0.000000000 0.00
## 2 25 1 -3289472 -0.001951595 0.25
##
## $`MgHATP-`
## T P G loggam IS
## 1 25 1 -3267679 0.0000000 0.00
## 2 25 1 -3268489 -0.1418484 0.25
##
## $`MgATP-2`
## T P G loggam IS
## 1 25 1 -3236780 0.0000000 0.00
## 2 25 1 -3240019 -0.5673936 0.25
Notice how logarithms of the activity coefficients (loggam) are more negative for the higher-charged species. The activity coefficients have a stabilizing effect: adjusted Gibbs energies (at I > 0) are less than the standard Gibbs energies (at I = 0).
Now let’s change the activities to get non-standard properties.
species <- c("Mg+2", "ATP-4", "MgATP-2")
coeffs <- c(-1, -1, 1)
T <- c(25, 50, 100)
# Drop some columns for more compact output
idrop <- c(2:4, 6:9)
# Unit activity: affinity is opposite of standard Gibbs energy
subcrt(species, coeffs, T = T, logact = c(0, 0, 0))$out[, -idrop]## T G logQ A
## 1 25 -33748.14 0 33748.14
## 2 50 -38662.41 0 38662.41
## 3 100 -51468.20 0 51468.20
# Non-unit activity: affinity is opposite of non-standard Gibbs energy
subcrt(species, coeffs, T = T, logact = c(-5, -4, -3))$out[, -idrop]## T G logQ A
## 1 25 -33748.14 6 -499.9146
## 2 50 -38662.41 6 1542.6389
## 3 100 -51468.20 6 8604.9959
NOTE:
logK, G, H,
S, V, and Cp columns in the
output of subcrt() are always standard or
adjusted thermodynamic properties, not non-standard ones.logQ and A are added if the
logact argument is provided.logact argument specifies the activities of species
in the same order as the first argument.A in the output of subcrt() has the same
units as G (J/mol by default); this differs from
affinity(), which outputs dimensionless values
(A/2.303RT).The first call above specifies unit activities of all the species in the reaction.
A in the output) are the opposite
of the standard Gibbs energy (G in the
output).The second call specifies non-unit activities of the species.
Above we used subcrt() to calculate non-standard Gibbs
energy, but doing it with affinity() can be more convenient
for making diagrams. This example plots non-standard Gibbs energies for
hydrogenotrophic methanogenesis, acetate oxidation, and acetate
oxidation, and is based on Fig. 4b of Mayumi et
al. (2013). We combine some of the
features described above with new ones for swapping basis species and
removing formed species:
swap.basis(): Changes one species for another in an
existing basis definitionspecies() with delete = TRUE: Removes one
or more species from the set of formed speciesdescribe.reaction(): Formats reactions for plots using
the output of subcrt()names and srt arguments of
diagram(): Use supplied reaction labels with rotationNon-standard Gibbs energies of organic reactions as a function of CO2 fugacity
After running the code to make the diagram, we can print the formation reactions for each of the species. This shows how the two acetate reactions (acetate oxidation and acetoclastic methanogenesis) are implemented by swapping H2 and CH4 in the basis species.
## CO2 H2 H2O H+ ispecies logact state name
## 1 1 4 -2 0 2701 -0.18 gas methane
## 2 2 4 -2 -1 922 -3.40 aq acetate
## CO2 CH4 H2O H+ ispecies logact state name
## 1 1 1 0 -1 922 -3.4 aq acetate
NOTE: This example uses balance = 1 in the call to
diagram() to prevent normalizating the reactions by a
balancing constraint (i.e., normalization by number of C) in order to
reproduce the calculations of Mayumi et al. (2013). In most other cases (especially for
making relative stability diagrams), this argument should not be
used.
Sometimes it’s useful to make further computations on the results of
a diagram() call. For example, a system might dominated by
a few stable species, but you’d rather visualize the relative
stabilities of less stable (i.e., metastable) species. Here we do this
for all the aqueous S species in the OBIGT database, accessed using
retrieve(). We use plot.it = FALSE to suppress
the first plot (which would look like the Eh–pH diagram above for
S), but save the output with d <- diagram() to
access the identified stable species in d$predominant.
After removing these stable species from the system, we recalculate
affinities for the remaining metastable species and make a diagram for
them.
basis("CHNOSe")
iaq <- retrieve("S", c("H", "O"), state = "aq")
species(iaq)
a <- affinity(pH = c(0, 14), Eh = c(-0.8, 1.2))
# Save results but don't plot the first diagram
d <- diagram(a, plot.it = FALSE)
# Remove the stable species
istable <- unique(as.numeric(d$predominant))
species(istable, delete = TRUE)
# Make a diagram for the metastable species
a <- affinity(pH = c(0, 14), Eh = c(-0.8, 1.2))
d <- diagram(a, col = 4, col.names = 4, italic = TRUE)Eh–pH diagram for metastable S species
Other possibly useful parts of the diagram() output
are:
plotvals: Affinity for each species, after normalizing
by the balancing constraintpredominant.values: Normalized affinity for the
predominant species at each point on the diagramnames: Names used for labeling lines or fields
(includes formatting for chemical formulas)namesx, namesy: Locations for labelslinesout: x, y coordinates of boundary lines between
stability fieldsNOTE: If diagram() was passed the output of
equilibrate() or solubility(), then its output
contains logarithms of activities instead of dimensionless
affinities.
makeup() is used internally by some functions in CHNOSZ
but is also available for the user. It counts the elements (and charge,
if present) in a chemical formula(s) formatted as a character object. If
supplied with a species index in the OBIGT database, it uses their
formulas. Setting sum = TRUE in the function call instructs
makeup() to sum the elemental compositions.
The data frame returned by makeup() can be used in
as.chemical.formula() to generate the character string for
a formula.
## Cu Co S
## 0.92 2.07 4.00
# Sum the elements of formulas supplied as character strings
summed_elements <- makeup(c("CO2", "CH4"), sum = TRUE)
# Use the result to write a new chemical formula
as.chemical.formula(summed_elements)## [1] "C2H4O2"
All the data used by CHNOSZ - from the thermodynamic data in OBIGT to
the basis species defined by the user - are stored in an object named
thermo in the package environment. Sometimes it’s useful to
peek inside CHNOSZ’s memory banks, or more rarely to directly modify
them. The thermo() function returns the current value of
this object and can also update it. Here we display the first level
structure of thermo, then show the structure of the
database (thermo()$OBIGT) in more detail.
## List of 13
## $ opt :List of 18
## $ element :'data.frame': 134 obs. of 6 variables:
## $ OBIGT :'data.frame': 3499 obs. of 22 variables:
## $ refs :'data.frame': 343 obs. of 6 variables:
## $ Berman :'data.frame': 109 obs. of 30 variables:
## $ buffer :'data.frame': 39 obs. of 4 variables:
## $ protein :'data.frame': 505 obs. of 25 variables:
## $ groups :'data.frame': 5 obs. of 22 variables:
## $ stoich : num [1:3499, 1:88] 0 0 0 0 0 0 0 0 0 0 ...
## ..- attr(*, "dimnames")=List of 2
## $ Bdot_acirc: Named num [1:59] 2.5 2.5 2.5 2.5 2.5 3 3 3 3 3 ...
## ..- attr(*, "names")= chr [1:59] "Rb+" "Cs+" "NH4+" "Tl+" ...
## $ opar :List of 66
## $ basis :'data.frame': 6 obs. of 9 variables:
## $ species :'data.frame': 21 obs. of 10 variables:
## 'data.frame': 3499 obs. of 22 variables:
## $ name : chr "water" "e-" "H+" "Li+" ...
## $ abbrv : chr NA NA "H+" "Li+" ...
## $ formula : chr "H2O" "(Z-1)" "H+" "Li+" ...
## $ state : chr "liq" "aq" "aq" "aq" ...
## $ ref1 : chr "JOH92" "OBIGT06" "Den81" "SH88" ...
## $ ref2 : chr "HGK84" NA NA NA ...
## $ date : chr "2006-10-25" "2006-10-28" "1997-11-06" "1997-11-06" ...
## $ model : chr "H2O" "HKF" "HKF" "HKF" ...
## $ E_units : chr "cal" "cal" "cal" "cal" ...
## $ G : num NA 0 0 -69933 -62591 ...
## $ H : num NA 0 0 -66552 -57433 ...
## $ S : num NA 15.6 0 2.7 14 ...
## $ Cp : num NA 0 0 14.2 9.06 1.98 -3 -6.29 -5.34 -7.53 ...
## $ V : num NA 0 0 -0.87 -1.11 ...
## $ a1.a : num NA 0 0 -0.0237 1.839 ...
## $ a2.b : num NA 0 0 -0.069 -2.285 ...
## $ a3.c : num NA 0 0 11.58 3.26 ...
## $ a4.d : num NA 0 0 -2.78 -2.73 ...
## $ c1.e : num NA 0 0 19.2 18.2 ...
## $ c2.f : num NA 0 0 -0.24 -2.98 ...
## $ omega.lambda: num NA 0 0 0.486 0.331 ...
## $ z.T : num NA 0 0 1 1 1 1 1 2 2 ...
Call thermo() with a named argument to assign a value.
In this case we change the temperature units for
subcrt():
## $`opt$T.units`
## [1] "C"
## $`opt$T.units`
## [1] "K"
See the help page for the Berman() function for a
practical example of adding thermodynamic data with the Berman (1988) model,
which are stored outside of the OBIGT database.
The default model for activity coefficients uses the extended
Debye–Hückel equation with parameters for NaCl-dominated solutions from
Helgeson et al. (1981). The species-species parameters are
charge and (for the default Bdot model) ion-size parameters
used in the HCh program (Shvarov and Bastrakov, 1999). By contrast,
the bgamma model uses an extended term parameter that is
derived from data of Helgeson (1969), Helgeson et
al. (1981), and high-P extrapolations of
Manning et al. (2013). The Alberty model uses
parameters listed in Chapter 3 of Alberty (2003), which are applicable to relatively
low temperatures. Choose from these models with
nonideal().
NOTE: By default, H+ is assumed to have unit activity
coefficient for any ionic strength. Enable calculations of activity
coefficients for H+ by running
thermo("opt$ideal.H" = FALSE).
Invoke calculations of activity coefficients by setting the
IS argument in subcrt(),
affinity(), mosaic(), or
solubility(). This has the effect of transforming activity
to molality in the CHNOSZ workflow. A set of calculations demonstrating
this tranformation is in test-logmolality.R in the package
test directory. Key variables affected by this transformation are listed
here:
In subcrt(), the logact argument stands
for log molality of aqueous species and calculated values of
G are the adjusted Gibbs energy at specified ionic strength
(this is written as ΔG°(I) by
Alberty, 1996).
In affinity(), the following stand for log molality
of aqueous species:
logact set by basis()logact set by species()In solubility() and equilibrate(), the
following stand for log molality of aqueous species:
loga.balance (logarithm of total molality of the
conserved basis species)loga.equil (logarithm of molality of each
species).Because function arguments have static names, we’re stuck with
logact even when it means log molality. However,
diagram() automatically changes labels from
“log a” to “log m” when run on the output of
affinity() with a non-NULL value for IS.
Buffers are assemblages of one or more species whose presence constrains the chemical activities (or fugacities) of basis species in a thermodynamic system. Buffers play a critical role in geochemical modeling by providing realistic constraints on system variables like pH and oxidation state. This section explores the implementation and application of buffers in CHNOSZ.
The mod.buffer() function defines or modifies buffers by
specifying the species that constitute the buffer and their
activities:
Use basis() to associate one or more basis species with
a buffer (all the elements in the buffer must be present in the basis
species). Then use affinity(return.buffer = TRUE) to
calculate and retrieve the buffered activities or fugacities of basis
species:
# Specify basis species
basis(c("FeCHNOS"))
# Associate O2 with the PPM buffer
basis("O2", "PPM")
# Calculate and retrieve the buffered fugacity of O2
a <- affinity(T = 200, P = 2000, return.buffer = TRUE)
# Access the buffered O2 fugacity (log fO2)
log_fO2 <- a$O2 # -44.28
# Calculate buffered fugacities across temperature range
a <- affinity(T = c(200, 400, 11), P = 2000, return.buffer = TRUE)Some buffers constrain multiple basis species simultaneously:
# Setup basis species
basis(c("FeS2", "H2S", "O2", "H2O"), c(0, -10, -50, 0))
# Associate both H2S and O2 with the PPM buffer
basis(c("H2S", "O2"), c("PPM", "PPM"))
# Retrieve values for both buffered species
buffer_values <- affinity(T = 300, P = 2000, return.buffer = TRUE)
log_aH2S <- buffer_values$H2S # -2.57
log_fO2 <- buffer_values$O2 # -37.16The diagram() function with the type
argument can solve for and display activities of basis species. This
example reproduces part of Fig. 6 in Schulte and
Shock (1995):
# Setup a system in terms of gases and liquid water
basis(c("hydrogen", "carbon dioxide", "nitrogen", "water"))
# Use 10 bar of CO2 and 1 bar of other gases (default)
basis("CO2", 1)
# Load aqueous species with given log activity
species(c("HCN", "formaldehyde"), c(-6, -6))
# Calculate affinities to form aqueous species from basis species
a <- affinity(T = c(0, 350), P = 300)
# Create diagram showing H2 activity where affinity = 0
d <- diagram(a, type = "H2", lty = c(2, 3))
legend("bottomright", c("HCN", "formaldehyde"), lty = c(2, 3))Equilibrium log H2 fugacity for 10-6 activity of HCN or formaldehyde with water, 1 bar of N2 and 10 bar of CO2
See demo("buffer") for a fully worked-out example based
on the figure in Schulte and Shock (1995).
NOTE: This feature works independently from buffers defined in
thermo()$buffer, but produces equivalent results for
certain systems; see test-diagram.R in the package test
directory.
Redox buffers like QFM, HM, and PPM can be used as inputs for subsequent calculations:
# Setup system for gold solubility calculation
basis(c("Au", "Fe", "H2S", "H2O", "oxygen", "H+"))
# Apply PPM buffer for fO2 and H2S
basis("O2", "PPM")
basis("H2S", "PPM")
# Calculate gold solubility using the buffered values
species("Au")
iaq <- info(c("Au(HS)2-", "AuHS", "AuOH"))
s <- solubility(iaq, pH = c(2, 8), T = 300, P = 1000)
# Create solubility diagram
diagram(s, ylim = c(-10, -5), col = "green4", lwd = 2)
col <- c("#ED4037", "#F58645", "#0F9DE2") # Au(HS)2-, AuHS, AuOH
diagram(s, type = "loga.equil", add = TRUE, col = col)Gold solubility at 300 °C with PPM buffer for fO2 and aH2S
Buffers can be used in transect calculations to model changes across
gradients. An interesting application is to add a delta to values
obtained with return.buffer = TRUE:
# Set basis species
basis(c("Fe", "SiO2", "CO3-2", "H2O", "oxygen", "H+"))
# Calculate log fO2 in QFM buffer across temperature range
basis("O2", "QFM")
T <- seq(600, 1000, 100)
buf <- affinity(T = T, P = 5000, return.buffer = TRUE)
# Use buffered fO2 values in downstream calculations
species(c("CH4", "CO2", "HCO3-", "CO3-2"))
# Set values of pH for transect
pH <- seq(3.8, 4.3, length.out = length(T))
# Adding 2 log units below QFM buffer
a <- affinity(T = T, O2 = buf$O2 - 2, pH = pH, P = 5000)Neutral pH at various temperatures and pressures can be determined from the dissociation constant of water:
Mineral assemblages like K-feldspar–muscovite–quartz (KMQ) can buffer pH:
# Define the KMQ buffer
mod.buffer("KMQ", c("quartz", "muscovite", "K-feldspar"), "cr", 0)
# Setup the system
basis(c("Al2O3", "quartz", "K+", "H2O", "oxygen", "H+"))
# Set K+ molality for KCl solution
basis("K+", log10(0.5))
# Associate H+ with the KMQ buffer
basis("H+", "KMQ")
# Calculate buffered pH
pH_KMQ <- -affinity(T = 300, P = 1000, return.buffer = TRUE)$`H+`Mineral buffers constrain both pH and redox state in geological systems, which in turn control metal solubility and transport in ore-forming fluids. This example illustrates how different buffers affect gold solubility in hydrothermal systems.
Effects of different buffers on gold solubility
The diagrams show:
Note the following limitation:
mosaic() calculations currently aren’t supported for
basis species that are associated with a buffer.basis() to use in the mosaic calculation.See also:
demo("gold") for chloride and ionic strength effects
(plots show molality instead of activity)Proteins in CHNOSZ are treated differently from other chemical species. Instead of direct thermodynamic data, CHNOSZ uses amino acid group additivity to calculate the thermodynamic properties of proteins. This approach requires knowledge of the amino acid composition of each protein.
In CHNOSZ, protein identifiers have a specific format that combines
the protein name and organism with an underscore separator, modeled
after UniProt names (e.g., LYSC_CHICK for chicken lysozyme
C). This naming convention uniquely identifies each protein in the
database.
CHNOSZ has a small built-in database of amino acid compositions for selected proteins, but you can expand this by adding proteins from FASTA or CSV files.
# Reading amino acid compositions from a CSV file
file <- system.file("extdata/protein/POLG.csv", package = "CHNOSZ")
aa <- read.csv(file)
# Add the proteins to CHNOSZ and get their indices
iprotein <- add.protein(aa)
# For FASTA files, use the canprot package
fasta_file <- system.file("extdata/protein/rubisco.fasta", package = "CHNOSZ")
aa <- canprot::read_fasta(fasta_file)
iprotein <- add.protein(aa)The add.protein() function adds the amino acid
compositions to the database and returns the row indices of the added
proteins.
Once proteins are added to the database, you can calculate various properties such as length, formula, and thermodynamic properties.
# Get protein length (number of amino acids)
pl <- protein.length("LYSC_CHICK")
# Get chemical formula
pf <- protein.formula("LYSC_CHICK")
# Display results
list(length = pl, formula = pf)## $length
## [1] 129
##
## $formula
## C H N O S
## LYSC_CHICK 613 959 193 185 10
The average oxidation state of carbon, calculated with
ZC(), provides insight into the redox state of protein
sequences:
CHNOSZ provides several functions for calculating thermodynamic properties of proteins.
Calculate standard thermodynamic properties of non-ionized proteins
using subcrt():
## T P rho logK G H S V Cp
## 1 0.01 1.000000 0.9998289 3465.966 -18125553 -44827095 16093.36 10049.21 18448.59
## 2 25.00 1.000000 0.9970614 3250.225 -18552315 -44238615 18149.61 10420.95 26842.53
## 3 50.00 1.000000 0.9880295 3076.728 -19034574 -43528036 20436.77 10600.23 29597.54
## 4 75.00 1.000000 0.9748643 2936.705 -19573864 -42770559 22693.98 10708.15 30863.62
## 5 100.00 1.013220 0.9583926 2823.190 -20168495 -41989264 24860.88 10782.93 31582.69
## 6 125.00 2.320144 0.9390726 2730.687 -20814620 -41191982 26925.13 10840.94 32071.19
For more accurate calculations, especially in biological systems,
protein ionization must be considered (Dick et al., 2006). CHNOSZ handles this
through the ionize.aa() function, which allows specifying
the temperature, pressure and pH conditions:
To include proteins in a chemical system, first define the basis species, then add proteins to the system:
The affinity() function accounts for ionization effects
when calculating affinities of formation reactions:
# Calculate affinity as a function of pH
basis("CHNOS+")
species(c("LYSC_CHICK", "MYG_HORSE", "RNAS1_BOVIN"))
a1 <- affinity(pH = c(0, 14))For performance optimization, use protein indices directly with the
iprotein argument to affinity(). This doesn’t
require proteins to be added with species():
species(delete = TRUE)
ip <- pinfo(c("LYSC_CHICK", "MYG_HORSE", "RNAS1_BOVIN"))
# Set logarithm of activity with loga.protein
a2 <- affinity(pH = c(0, 14), iprotein = ip, loga.protein = -3)
# Check that both methods produce equivalent results
for(i in 1:3) stopifnot(all.equal(a1$values[[i]], a2$values[[i]]))Calculate the relative abundance of proteins in metastable
equilibrium using equilibrate(). This example uses averaged
amino acid compositions of protein sequences in metagenomes from a
temperature and chemical gradient in a hot spring (Dick and Shock,
2011):
# Calculate equilibrium distribution as a function of Eh
basis("CHNOSe")
proteins <- paste("overall", paste0("bison", c("N", "S", "R", "Q", "P")), sep = "_")
ip <- pinfo(proteins)
a <- affinity(Eh = c(-0.35, -0.15), iprotein = ip)
# Normalize by protein length to get residue-equivalent distribution
# Set loga.balance to distribute proteins with total activity of residues equal to 1
e <- equilibrate(a, normalize = TRUE, loga.balance = 0)
col <- c("darkred", "red", "darkgray", "blue", "darkblue")
diagram(e, ylim = c(-4, -2.5), col = col, lwd = 2)
legend("bottomleft", c("High-T reducing", NA, NA, NA, "Low-T oxidizing"),
lty = 1:5, col = col, title = "Source environment", inset = c(0.1, 0))Metastable equilibrium of proteins from hot-spring metagenomes
The normalize = TRUE option is important for proteins
because their large size leads to extreme separation of activities in
metastable equilibrium. Normalizing by protein length (calculating
per-residue equivalents) compresses the range of relative abundances to
be more experimentally realistic.
Use unitize() to scale abundances of proteins so that
number of residues sums to unity:
# Sample protein data from YeastGFP study
protein <- c("YDL195W", "YHR098C", "YLR208W", "YNL049C", "YPL085W")
abundance <- c(1840, 12200, 21400, 1720, 358)
# Find protein indices in CHNOSZ database
ip <- match(protein, thermo()$protein$protein)
# Get protein lengths
pl <- protein.length(ip)
# Scale protein abundance so total abundance of residues is unity
scaled_abundance <- 10^unitize(log10(abundance), pl)
# Check that sum for residues is unity
stopifnot(all.equal(sum(scaled_abundance * pl), 1))Unit total activity of residues is set by
equilibrate(loga.balance = 0), allowing comparison between
experimental and predicted abundances:
basis("CHNOSe")
# Make a guess for Eh
basis("Eh", -0.5)
a <- affinity(iprotein = ip)
e <- equilibrate(a, normalize = TRUE, loga.balance = 0)
# Check for unit total abundance of residues
predicted_abundance <- 10^unlist(e$loga.equil)
stopifnot(all.equal(sum(predicted_abundance * pl), 1))
plot(log10(scaled_abundance), log10(predicted_abundance), pch = 19)
# Show 1:1 line
lims <- range(par("usr"))
lines(c(lims[1], lims[2]), c(lims[1], lims[2]), lty = 2)
MAE <- mean(abs(log10(scaled_abundance) - log10(predicted_abundance)))
legend("topleft", paste("MAE =", round(MAE, 2)), bty = "n")Unoptimized predicted abundances of proteins compared to experimental abundances, both scaled to unit total activity of residues; dashed line is 1:1 line
The canprot package provides a different interface for calculating ZC and other chemical analyses of proteins from their amino acid composition:
Let’s analyze the relative abundances of proteins from the ER-to-Golgi location in S. cerevisiae (yeast) and compare theoretical predictions with experimental measurements from the YeastGFP study (Ghaemmaghami et al., 2003):
Optimizing redox potential to fit experimental protein abundances
The diagrams show:
normalize = FALSE are highly divergent.The correspondence between theoretical predictions and experimental measurements depends on normalization of protein formulas and optimizing physicochemical parameters. The metastable equilibrium model provides a theoretical framework for predicting how chemical conditions influence relative protein abundances.
Evolution doesn’t happen in a vacuum – organisms and their molecular machinery must cope with changing environmental conditions over geological time. Just as geochemists use relative stability diagrams to predict which minerals are stable under different physicochemical conditions, we can apply similar thermodynamic principles to understand protein evolution. The central hypothesis is that environmental variables, such as pH and redox conditions (Eh), have shaped the amino acid compositions of proteins throughout Earth’s history. This approach becomes especially powerful when comparing not individual proteins, but entire evolutionary lineages.
CRISPR-Cas systems are molecular scissors that bacteria and archaea use as immune systems against viruses, and which biotechnologists have adapted for precise gene editing. These systems evolved into six major types (I–VI), each representing an evolutionary branch with multiple representatives across different genomes (Makarova et al., 2020).
Rather than comparing individual proteins, we can ask: which types of
CRISPR-Cas systems would be most thermodynamically stable under
different environmental conditions? The following diagram introduces the
players by showing the carbon oxidation state (ZC) and
size of the effector modules; the effector module combines with CRISPR
RNA (crRNA) to form the effector complex that does the actual cutting
work on a target DNA sequence. Class 1 systems tend to have larger
effector modules made up of multiple proteins, which were combined with
the sum_aa() function from canprot before
calculating ZC.
Carbon oxidation state and size of CRISPR-Cas effector modules
Each type (I–VI) represents an evolutionary branch containing multiple genome representatives – some branches have just a few members, others have more. This creates a methodological challenge: how do we fairly assess the relative stability of groups with unequal membership?
The solution lies in CHNOSZ’s rank.affinity() function,
which calculates formation energies for all individual proteins, ranks
them, then finds the average rank for each evolutionary group. A
rescaling step ensures that groups with different numbers of members can
be compared fairly. To see why rescaling is necessary, consider that the
average rank of a group with one member is bounded by 1 and 42 (the
total number of genomes in this calculation), but the average rank of a
group with three members is bounded by 2 (the average of 1, 2, and 3)
and 41 (the average of 40, 41, and 42).
Rather than representing maximum affinity as in previous diagrams, the resulting stability fields represent maximum average rank of formation affinity after rescaling. This provides a thermodynamic framework for predicting which CRISPR-Cas types would predominate under different environmental conditions.
NOTE: This code normalizes proteins to single residue equivalents
before calling affinity() by using the
as.residue = TRUE argument in add.protein().
If we didn’t do that, then larger effector complexes would have higher
affinity rankings just because of their size.
Groupwise relative stabilities of effector modules in different types of CRISPR-Cas systems as a function of Eh and pH at 25 °C; dashed lines are water stability limits
The stability diagrams reveal a compelling pattern: Type III systems dominate at reducing conditions (low Eh). This finding gains evolutionary significance when we consider that Type III was likely the first CRISPR-Cas system to evolve in Class 1 (Makarova et al., 2022). The thermodynamic preference for reducing conditions aligns with the hypothesis that these ancient immune systems arose when Earth’s atmosphere and oceans were more reducing than today.
Another interesting result is that Type I systems appear to be less stable compared to others in Class 1 at low pH. This could be due to lower frequencies of acidic amino acids in Type I effector modules. Furthermore, Type II systems (in Class 2) are not visualized as stable relative to other Class 2 systems in this chemical space. Changes to other physicochemical variables – represented by some combination of H2O and the elements C, N, and O, as well as temperature and pressure – might be needed to stabilize this group.
The key innovation here – using rank.affinity() to
compare groups rather than individuals – opens up possibilities for
analyzing any system where evolutionary lineages contain multiple
representatives, from enzyme families to entire metabolic pathways. This
groupwise approach to relative stability analysis enriches the
geobiologist’s toolkit with methods for mapping the environmental niches
where different protein families achieve maximum thermodynamic
stability.
Explore demos with demo(package = "CHNOSZ"). You can
also use demos() to run all the demos without pausing or
just one (e.g. demos("mosaic")).
mosaic: Speciating more than one set of basis
speciessum_S, uranyl: Using summed activities of
speciated basis speciescomproportionation: Non-standard Gibbs energy of
reaction with speciated basis speciesarsenic, copper: More examples of Eh–pH
diagramssphalerite, contour, minsol:
Solubility calculations with speciated basis speciesPourbaix: Isosolubility lines for various metals (try
Fe, Cu, Mn)contour: Solubility contours for goldminsol Solubility contours for multiple mineralssolubility: CO2 and calcitesaturation: Saturation lines (where affinity = 0) and
labels for activity ratiosionize: Protein ionization propertiesTCA: Citric acid cycle energeticscomproportionation: Using a color scale (image
map)buffer: Place labels next to linesMgATP: Calculate number of protons bound per ATP
moleculebuffer: Plotting buffers as a function of
temperatureDEW: Applying calculated values of
log fO2 in affinity()gold: Settting pH and fO2 buffers
in basis() for solubility of goldprotbuff: Using proteins as buffer speciesDEW: Deep Earth Water model (extension of HKF to high
pressures)AD: Akinfiev-Diamond model for aqueous
nonelectrolytesShh: Relative stabilities of transcription factors
along redox and water activity gradientcarboxylase: Predicted rank abundance with varying
temperature and redoxrank.affinity: Average rank of affinity for groupwise
stability comparisonsAdditional vignettes cover:
The FAQ is a non-comprehensive collection of questions and answers about CHNOSZ.
The OBIGT vignette is generated from reference information in the database and lists all literature citations for species arranged by default and optional data files.
The custom_data vignette describes
add.OBIGT() for adding data from files,
mod.OBIGT() for updating or adding parameters of particular
species, and logK.to.OBIGT() for generating parameters from
logK values.
The eos-regress vignette shows how to fit experimental data (volume and heat capacity) using constructed equation-of-state models.
The multi-metal vignette has some techniques for overcoming the limitation of balancing reactions on a single metal.
R provides convenient access to documentation in a local browser:
install.packages("CHNOSZ")help.start() to open a browser window for the R
help systemYou can also:
help(package = "CHNOSZ")
?info,
?subcrt, etc.maintainer("CHNOSZ") to get contact info