Category Archives: Data Analysis

Calculating growth rate from microbial growth curves using MATLAB

Growth curve experiments are used to study the physiology of bacteria, yeast, or other micro-organisms. You inoculate cells in a nutrient medium, let them grow, and record the optical density of the culture over time with a spectrophotometer. Automated plate readers can do thousands of growth curves in a single experiment, giving a detailed view of how environmental conditions affect cells.

I've spent many hours analyzing growth curves during my PhD, and almost as many hours teaching others to do the same, so I am going to describe a basic growth curve analysis here and try to highlight some quantitative principles and programming techniques along the way. Hopefully this can save you some time if you are new to growth curves and/or programming.

Demo code and some links for further reading are provided at the bottom of this post.

Simple exponential growth

To guide our analysis, let’s write down some equations. Say that a culture has an initial cell concentration of N_0. Let t_d be the division time, so that after t_d has elapsed, the cell concentration is 2N_0. After another t_d, the culture has 4N_0 cells per volume, and so on. At this rate, the cell concentration at time t is N(t)=N_0\cdot 2^{t/t_d}. Defining the “growth rate” \lambda=1/t_d, which has units of “doublings per time”, we can write

    \[ N(t)=N_0\cdot 2^{\lambda t}. \]

Real data is measured as optical density OD, which is linear in cell concentration: OD_{raw}(t)=aN(t)+b. The constant b represents the “background” signal on the spectrophotometer when only growth medium and cuvette (or plastic plate) are present. Usually, we measure b empirically and calculate the “background-subtracted” optical density OD=OD_{raw}-b. Because this OD is proportional to cell concentration, it will grow exponentially at the same rate:

    \[ OD(t)=OD_0\cdot 2^{\lambda t}, \]

where OD_0=OD_{raw}(0)-b=aN(0) is the (background-subtracted) initial optical density. We can use this to calculate a growth rate \lambda given OD measurements at 2 timepoints. For example, if OD(t_1)=OD_1 and OD(t_2)=OD_2, then

    \[ \begin{matrix} OD_1=OD_0\cdot 2^{\lambda t_1}\\ OD_2=OD_0\cdot 2^{\lambda t_2} \end{matrix}. \]

Take the log2 of both equations and solving for growth rate, we get:

    \[ \lambda=\frac{\log_2 OD_2-\log_2 OD_1}{t_2-t_1}=\frac{\Delta \log_2 OD}{\Delta t}. \]

This formula defines \lambda over a specific time window, which means you can use it to get the “local exponential growth rate” even when growth rate is changing over time in a complicated way. Basically, the RHS above is a discrete version of the derivative d/dt \log_2 OD and so the growth rate at any moment is the slope of the tangent line to the log-transformed growth curve. This is a helpful intuition to have in the analysis below.

Analyzing real data

Now let’s look at real data. I grew some yeast cells in a nutrient medium with glucose as the carbon source and measured the optical density at 600nm every 15 minutes for 65 hours (using a robot). I load the data into MATLAB arrays odraw and t, representing optical density and time, respectively, and then plot odraw versus t. This is the “raw” growth curve, before background subtraction or log-transform.

%% plot raw growth curve

% load data

% plot growth curve
xlabel('Time (hours)');
ylabel('OD_{raw} (a.u.)','interpreter','tex');

Example growth curve

The plot has an upward-curving exponential phase at the beginning and a “kink” in the curve around OD=0.3. This is a diauxic shift--more on this later. For now, let’s find the growth rate in the initial exponential phase. I measured blank samples to get the background as b=0.028, and subtract this from odraw. I also take the log2, and store the results in logod. Some measurements may be less than  due to noise. To avoid taking the log2 of negative numbers, I enforce a minimum value of \log_2 OD=-10, which is about the limit of instrument resolution (3 decimal digits = 1/1000 \approx 2^{-10}). I plot logod versus t, bounding the x-axis at 35 hours.

%% plot log-transformed, background-subtracted growth curve
logod = log2(max(odraw – 0.028,2^-10));

xlabel('Time (hours)');
ylabel('log_2 OD','interpreter','tex');
xlim([0 35]);

Log-transformed growth curve

After the log-transform, most of the middle portion of the growth curve is a straight line—this is the exponential phase, and its slope is the growth rate. I choose two points (t,log_2⁡OD ) to bracket this range: (5.18,-7.97) and (12.05,-3.04), and calculate a growth rate of \lambda=0.72 doublings / hour, or a doubling time of 83 minutes.
The 2-point method is useful for quickly estimating growth rates, but it doesn’t take advantage of the high time-resolution of the data. Nor is it automated. A better method is to fit a line to the data between 5 and 12 hours. Here this gives a growth rate (i.e. slope) of 0.725, almost identical to the 2-point estimate. However, the line fit is likely to be more robust generally, especially when data is noisy.

% ...plot logod vs t...
% fit line to a time window
idx = t>5 & t<12;
brob = robustfit(t(idx),logod(idx));	% brob(2) is the slope

% overlay fitted line
hold all
x = t(idx);
y = brob(1)+brob(2).*t(idx);
plot(x([1 end]), y([1 end]),'ro-');
ylim([-10 0]);

Line fit to time window

A problem with fitting over a defined time window is that cultures can vary in the timing of exponential phase. This is because initial inoculation density varies due to pipetting error, and less dense initial cultures will reach a given OD at a later time. For example, applying the above code to a different strain causes the line fit to “miss” the range of exponential growth, even though this strain doesn’t necessary grow slower. A simple solution is not to choose a time window, but a window of log_2⁡OD instead. For example, fitting from \log_2 OD=-8 to -4 is robust to translations of the curve on the time axis. (I’ll skip the code here for conciseness, but you can download the full working script + data below.)

logod v t od window compare

Scaling up

To analyze many growth curves, I load OD data from a 96-well plate into a MATLAB array od_all with dimensions 8 rows x 12 columns x 262 timepoints. Another array t holds the sampling times, which are common to all wells on this plate. In this experiment, each well contains a different natural isolate yeast strain, which I expect to have different growth behaviors. I plot logod versus t for a 2x3 subset of wells and fit a line to each curve over the same time window as above. The growth rate calculated from this fit is shown in the bottom right for each strain.

%% plot multiple growth curves
strainNames = readtable('strain_names.csv','readvariablenames',false);
[hf ha] = gridplot(2,3,200,200);    % custom function for making subplots
for r = 3:4
    for c = 2:4
        odraw = squeeze(od_all(r,c,:));
        logod = log2(max(odraw - 0.028,2^-10));
        xlim([0 35]);
        ylim([-10 0]);
        xlabel('Time (hours)');
        ylabel('log_2 OD','interpreter','tex');
        hold all
        idx = logod>-8 & logod<-4;
        brob = robustfit(t(idx),logod(idx));    % brob(2) is the slope
        x = t(idx);
        y = brob(1)+brob(2).*t(idx);
        plot(x([1 end]), y([1 end]),'ro-','linewidth',1.5);
        adjustaxeslabels([2 3],[r-2 c-1]);

grid logod v t odwindow

Code & Further Reading

This zip file contains the code and data used to generate the plots above. The main script is called gc_analysis_code.m and contains each section of code in its own cell. A few utility functions (from my utility MATLAB package) are also included in order for the main code to run.

If you want to learn more about growth curve analysis, this review and this paper can get you started on modern methods. For a more conceptual (and historical) perspective, see this 1949 review by Jacques Monod. For examples of growth rate analysis (and more complex growth parameters) being used to answer biological questions, you can probably just close your eyes and point randomly, but my favorite recent examples are this paper on antibiotic drug interactions or this paper on natural phenotypic variation in yeast. At some point in the future, I will also describe how I analyzed diauxic growth curves for my paper on fitness tradeoffs in mixed-nutrient environments.

The difference between selection coefficient and relative growth rate

If you do experiments on microorganisms, you are probably familiar with fitness assays, where you study a mutant strain by comparing its growth rate to that of the wildtype in various environments. If, like me, you have learned the method by reading papers, you may have missed an important fact: there are two different ways of presenting fitness data. Microbial fitness can be reported as either a selection coefficient or a relative growth rate difference, and although these are mathematically related, they are not equivalent. Moreover, they have different conceptual meanings. This point may be obvious to some, but is subtle enough that I thought the two quantities were equal until I had read (and failed to understand) multiple explanations to the contrary.

Here I will try to help other confused experimentalists by explaining this difference in practical terms, as it arises in the analysis of a competitive growth assay. None of this material is original, but I hope my presentation clarifies ideas that aren't as transparent elsewhere.1

If you’re the impatient type, everything below can be summarized in 3 points:

  • A 1% difference between two strains in the number of offspring per generation (selection coefficient) is not the same as a 1% difference in exponential growth rate (relative growth rate difference).
  • A 1% difference in number of offspring per generation is approximately equal to a 0.69% difference in exponential growth rate.
  • You should report fitness as a selection coefficient if you can. If you must use a different metric, state this prominently so people like me don’t get confused.

Continue reading

  1. I borrow especially heavily from Greg Lang's competition assay protocol, the Crow and Kimura population genetics textbook, and this article by Luis-Miguel Chevin that makes the same point I make here, but at a more conceptual level.