Methodology

Data source used:

A literature search was performed using the PubMed database.

Cancer: For cancer in relation to vegetables & fruits, the following combination of keywords was used:

(vegetable OR vegetables OR fruit OR fruits OR alfalfa OR apple OR apples OR apricot OR apricots OR artichoke OR artichokes OR asparagus OR aubergine OR avocado OR avocados OR bamboo OR banana OR bananas OR bean OR beans OR beet OR beetroot OR beets OR berries OR berry OR broccoli OR cabbage OR cabbages OR cantaloupe OR cantaloupes OR cauliflower OR carrot OR carrots OR celery OR cherries OR cherry OR chicory OR chili OR chilli OR citrus OR coconut OR coconuts OR coleslaw OR corn OR cruciferae OR cucumber OR cucumbers OR currant OR currants OR dates OR eggplant OR eggplants OR endive OR figs OR garlic OR gherkins OR grape OR grapes OR grapefruit OR grapefruits OR greens OR kale OR kiwi OR kohlrabi OR leek OR leeks OR legume OR legumes OR lemon OR lemons OR lentil OR lentils OR lettuce OR lime OR limes OR maize OR mandarin OR mandarins OR mango OR mangos OR melon OR melons OR mushroom OR mushrooms OR nectarine OR nectarines OR okra OR onion OR onions OR oranges OR papaya OR parsley OR parsnips OR pea OR peas OR peach OR peaches OR pear OR pears OR pepper OR peppers OR pickle OR pickles OR pineapple OR pineapples OR plum OR plums OR pomegranate OR potato OR potatoes OR prune OR prunes OR quince OR radish OR radishes OR raisin OR raisins OR raspberries OR rhubarb OR salad OR salads OR sauerkraut OR scallion OR scallions OR shallot OR shallots OR seaweed OR soy OR soya OR soyfoods OR spinach OR sprout OR sprouts OR squash OR strawberries OR strawberry OR tangerine OR tangerines OR tempeh OR tofu OR tomato OR tomatoe OR tomatoes turnip OR watercress OR watermelon OR watermelons OR yams OR zucchini) AND cancer

A search using the words: "AND (neoplasm OR neoplasms)" instead of the word "cancer" resulted in 0 additional relevant articles.

End points other than cancer: For all other end points, such as CVD, in relation to vegetables & fruits, the following combination of keywords was used:

(vegetable OR vegetables OR fruit OR fruits OR alfalfa OR apple OR apples OR apricot OR apricots OR artichoke OR artichokes OR asparagus OR aubergine OR avocado OR avocados OR bamboo OR banana OR bananas OR bean OR beans OR beet OR beetroot OR beets OR berries OR berry OR broccoli OR cabbage OR cabbages OR cantaloupe OR cantaloupes OR cauliflower OR carrot OR carrots OR celery OR cherries OR cherry OR chicory OR chili OR chilli OR coconut OR coconuts OR coleslaw OR corn OR cucumber OR cucumbers OR currant OR currants OR eggplant OR eggplants OR endive OR figs OR garlic OR gherkins OR grape OR grapes OR grapefruit OR grapefruits OR greens OR kale OR kiwi OR kohlrabi OR leek OR leeks OR legume OR legumes OR lemon OR lemons OR lentil OR lentils OR lettuce OR lime OR limes OR maize OR mandarin OR mandarins OR mango OR mangos OR melon OR melons OR mushroom OR mushrooms OR nectarine OR nectarines OR okra OR onion OR onions OR oranges OR papaya OR parsley OR parsnips OR pea OR peas OR peach OR peaches OR pear OR pears OR pepper OR peppers OR pickle OR pickles OR pineapple OR pineapples OR plum OR plums OR pomegranate OR potato OR potatoes OR prune OR prunes OR quince OR radish OR radishes OR raisin OR raisins OR raspberries OR rhubarb OR salad OR salads OR sauerkraut OR scallion OR scallions OR shallot OR shallots OR seaweed OR soy OR soya OR soyfoods OR spinach OR sprout OR sprouts OR squash OR strawberries OR strawberry OR tangerine OR tangerines OR tempeh OR tofu OR tomato OR tomatoe OR tomatoes turnip OR watercress OR watermelon OR watermelons OR yams OR zucchini) AND (prospective OR cohort OR follow-up OR longitudinal)
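
Both search strings share one structure: a parenthesized OR-list of food keywords, combined by AND with an outcome or study-design filter. The following Python sketch shows how such a string could be assembled. The function name `build_query` and the shortened keyword list are illustrative assumptions, not a record of how the searches were actually entered.

```python
def build_query(food_terms, filter_terms):
    """Join food keywords with OR, then AND them with the filter clause."""
    food_clause = "(" + " OR ".join(food_terms) + ")"
    if len(filter_terms) == 1:
        # A single filter word (e.g. "cancer") needs no parentheses.
        filter_clause = filter_terms[0]
    else:
        filter_clause = "(" + " OR ".join(filter_terms) + ")"
    return f"{food_clause} AND {filter_clause}"


# Shortened, illustrative keyword list; the full lists are given above.
foods = ["vegetable", "vegetables", "fruit", "fruits", "tomato", "turnip"]

cancer_query = build_query(foods, ["cancer"])
other_query = build_query(
    foods, ["prospective", "cohort", "follow-up", "longitudinal"]
)
print(cancer_query)
print(other_query)
```

Building the string from a list makes it harder to drop an OR between two keywords by accident.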


Inclusion/exclusion criteria for articles:

Inclusion criteria:
1) Consumption of a dietary variable.
2) Endpoint: cancer risk, disease progression, or cancer mortality/survival risk.
3) Prospective studies (cohort or nested case-control design).
4) The full text article was published in English. Articles excluded because of language restrictions are defined in the related abstracts.

Exclusion criterion:
Data was excluded if risk among cases stratified by genetic polymorphisms was examined instead of risk among a total population.
Example: the article "Red meat intake, CYP2E1 genetic polymorphisms, and colorectal cancer risk" only provides information about the relationship between 2 functional polymorphisms and their modifying effects on the association between diet and cancer. No information is given about the risk related to diet for all cases vs controls.

Perspective/possible limitation:
-Only whole foods were included as dietary variables. Data about, for example, food fiber and food flavonoids/isoflavones/carotenoids was not included as evidence. These nutrients may reflect consumption of food items, but it is difficult to translate possible effects of these nutrients back into recommendations for whole foods, especially when a food item may not even have been included in the model used to examine the effect of a given nutrient.
For reviews published on January 19, 2010 and later, data was included from articles which clearly defined the food sources of the nutrients if this nutrient was correlated directly to a specific food item.
-Since reference lists of included articles were searched for additional articles, the chance of missing relevant data is kept small. Still, Supplements from scientific journals (such as The American Journal of Epidemiology) contain small articles which are not indexed in PubMed, and which may contain relevant data that might change some levels of evidence for a small amount of dietary variables in the reviews. The author of this site has started hand searching these Supplements, and any information retrieved from these sources will be added to the related reviews at the next update.


The way data from the articles was used:

Tables: The systematic reviews contain simple tables defining the author, cohort name, amount of cases, and Relative Risk.
Reproducibility is important in a systematic review, so data must be extracted in a transparent manner. This is why larger "extended tables" were also created for all dietary variables. These include raw data from the articles, extracted using predefined methods for certain variables, described here:


1) FOOD GROUPS.
Data was extracted for all total food groups (e.g. vegetables), subgroups/botanical families (e.g. cruciferous vegetables), and specific food items (e.g. broccoli) related to the food group(s) defined in the review of interest. A definition of the dietary variables was added if directly available from the related publication. If more than 1 article published data from a cohort about the same dietary variable, all data was added to the extended tables.

Indexing vegetables or fruits into botanical families or vegetable/fruit subgroups:
Some articles specify data about certain botanical families. Depending on the FFQ from the cohort, any number of specific vegetables and/or fruits are indexed into these families.
-When a) no definition of a botanical family is given, or b) the definition of a dietary variable included > 1 specific vegetable and/or fruit, data from this variable is indexed under the specific botanical family it belongs to.
-When the definition of the botanical family includes only 1 vegetable or fruit item, data from this variable is indexed under this specific vegetable or fruit item.
-Data about gramineae is added to the variable "corn", unless it came from an Asian cohort.
-Data about musaceae is added to "bananas", and data about vitaceae is added to "grapes", since these are the only fruit items indexed under these families.

  • Allium vegetables: garlic, leek, onions.
  • Chenopodiaceae: beetroot or beet, chard greens/Swiss chard, spinach.
  • Citrus fruit (rutaceae): grapefruit, mandarins, oranges, tangerines.
  • Compositae: endive, lettuce.
  • Convolvulaceae: sweet potatoes, yams.
  • Cruciferous vegetables (cruciferae/brassica vegetables): broccoli, Brussels sprouts, cabbage, coleslaw, cauliflower, kale, mustard greens, rutabaga, sauerkraut.
  • Cucurbitaceae: cantaloupe, cucumber, gherkins, squash, watermelon, zucchini/courgette.
  • Gramineae: bamboo, corn/maize.
  • Green leafy vegetables: beet greens, borage, cabbages, chard, chicory, chingensai, chives, collards, dandelion greens, endive, escarole, garden rocket, garland chrysanthemums, Jew's mallow, kale, lettuce, mugwort, mustard greens, parsley, spinach, seaweed, thistle, turnip greens, watercress.
  • Green vegetables: containing any amount of items from the group of green leafy vegetables & any amount of other green vegetable items, such as broccoli, Brussels sprouts, green peppers, or legumes.
  • Green-yellow vegetables: containing any amount of items from both the green vegetables and yellow vegetables group.
  • Legumes (leguminosae/pulses): alfalfa, beans, lentils, peas, soy.
  • Musaceae: bananas.
  • Root vegetables: beetroot, carrots, celeriac, ginger, parsnip, radish, rutabaga, salsify, swedes, turnip.
  • Rosaceae: apples/pears, apricot, peaches, plums, prunes, strawberries.
  • Solanaceae: aubergine/eggplant, potatoes, peppers, tomatoes.
  • Umbelliferae: carrots, celery.
  • Vitaceae: grapes, raisins.
  • Yellow fruits: apples, apricots, bananas, melons, oranges, peaches.
  • Yellow vegetables: carrots, pumpkin, squash, sweet potatoes, red peppers, tomatoes, yams.
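
The indexing rules above can be read as a small decision procedure. The Python sketch below is one possible reading. The function name `index_variable` and its parameters are hypothetical, and the order in which the special cases are checked is an assumption.

```python
def index_variable(family, defined_items=None, asian_cohort=False):
    """Return the label under which data for a botanical family is indexed.

    family        -- family name as reported in the article
    defined_items -- food items the article lists for the family, or None
                     when the article gives no definition
    asian_cohort  -- True when the data comes from an Asian cohort
    """
    family = family.lower()

    # Families with a single indexed fruit item go to that item.
    if family == "musaceae":
        return "bananas"
    if family == "vitaceae":
        return "grapes"
    # Gramineae is added to "corn", unless it came from an Asian cohort.
    if family == "gramineae" and not asian_cohort:
        return "corn"

    # A definition with exactly 1 item is indexed under that item.
    if defined_items is not None and len(defined_items) == 1:
        return defined_items[0]

    # No definition, or a definition with more than 1 item: keep the family.
    return family


print(index_variable("Cruciferous vegetables"))                # family
print(index_variable("Cruciferous vegetables", ["broccoli"]))  # single item
print(index_variable("Gramineae", asian_cohort=True))          # family
```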

2) CANCER RISK.
Data about dietary variables was extracted regarding the relationship with cancer (including disease stage if data is available), disease progression, cancer mortality, or cancer survival.
Data about modifying effects on risk by potential confounders (e.g. age, sex, obesity, physical activity, menopausal status, hormone replacement therapy, ethnicity, and smoking) was added as well.

3) RELATIVE RISK AS A CATEGORIZED OR CONTINUOUS VARIABLE.
When RRs were available for associations evaluated both as a categorized variable (increasing units of consumption), and as a continuous variable (for an increment of X g or servings/day), the categorized variable was chosen to be included in the review. The categorized variable makes it possible to identify a) possible threshold effects, and b) J-shaped, U-shaped, or otherwise non-linear effects.
In addition, Relative Risks for an increase per 2 units of consumption may not reflect predicted Relative Risks based on increases per 1 unit of consumption, which complicates translating data to recommendations for individuals. For example, Chan JM (2006) found a significant association with prostate cancer progression risk of 4 dietary variables based on an increase of 1 serving/day. When this risk was based on an increase of 2 servings/day, all 4 associations almost disappeared.
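
One way to read "predicted" here is to assume a log-linear dose response, in which the RR for an increase of k units equals the RR per 1 unit raised to the power k. The sketch below only illustrates that assumption. The function name and the RR value are hypothetical and are not taken from Chan JM (2006).

```python
def rr_for_increment(rr_per_unit, units):
    """Predicted RR for an increase of `units`, assuming log(RR) is linear in intake."""
    return rr_per_unit ** units


rr_per_serving = 0.80  # hypothetical RR per 1 serving/day
predicted_rr_2 = rr_for_increment(rr_per_serving, 2)
print(round(predicted_rr_2, 2))
```

When a published RR per 2 units differs clearly from such a prediction, the relationship is unlikely to be log-linear, and a categorized variable shows this more directly.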

Reference: Chan JM, Holick CN, Leitzmann MF, Rimm EB, Willett WC, Stampfer MJ. Diet after diagnosis and the risk of prostate cancer progression, recurrence, and death (United States). Cancer Causes Control. 2006 Mar;17(2):199-208.

4) AMOUNT OF CANCER CASES.
Often, data about the amount of cancer cases related to dietary variables was less than the total amount of cases from the cohort as specified in the abstract. This is due to missing values from incomplete Food Frequency Questionnaires. Data about the amount of cases was collected in the following descending order:
a) Data about the amount of cases was extracted from tables added by the authors to provide information about the dietary variables of interest.
b) If no direct data about the amount of cases for a dietary variable was provided, the amount of cases from the food group the dietary variable belonged to (as specified by the author) was used, with a "?" symbol added to indicate that the true amount of cases may be less.
c) If no direct data about the amount of cases for a dietary variable was provided, and the author provided no data about amounts of cases from total food groups either, the total amount of cases from the cohort was used, with a "?" symbol added to indicate that the true amount of cases may be less.
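
The descending order a) to c) amounts to a simple fallback. A minimal Python sketch follows. The function name `cases_for_variable` and its parameters are hypothetical.

```python
def cases_for_variable(variable_cases=None, food_group_cases=None, cohort_cases=None):
    """Return the case count to report, following the order a), b), c).

    A trailing "?" marks counts taken from a broader level, because the
    true amount of cases for the dietary variable may be less.
    """
    if variable_cases is not None:      # a) reported for the variable itself
        return str(variable_cases)
    if food_group_cases is not None:    # b) fall back to the food group
        return f"{food_group_cases}?"
    if cohort_cases is not None:        # c) fall back to the whole cohort
        return f"{cohort_cases}?"
    raise ValueError("no case count available at any level")


print(cases_for_variable(412, 500, 650))
print(cases_for_variable(None, 500, 650))
print(cases_for_variable(None, None, 650))
```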

5) SIGNIFICANT OR NONSIGNIFICANT ASSOCIATIONS (EFFECTS OR TRENDS).
Though recent articles seem to use the same definition for a "significant association" (based on P-value), no consistent definition can be found for the term "nonsignificant association". In various articles, the term "nonsignificant association" was used for fairly strong associations without a dose response, or for weak associations with a dose response, but with a P-value sometimes far exceeding 0.20 or even 0.40.

For the reviews on this site, associations were defined as effects or trends. The following definitions were chosen:
A significant effect/risk.
This term was used under the following circumstances:
1) The 95% CI does not embrace a RR of one.
2) The article provides no P-value, but uses the term "significant" in relation with the association.
A significant trend.
P (for trend) ≤ 0.05.
A nonsignificant effect.
This term was used under the following circumstances:
1) One value from the 95% CI embraces the one, and the other value is ≥ 10% different from the one (e.g. 95% CI = 0.90-1.00, or 1.00-1.21 would indicate a nonsignificant protective effect, and a nonsignificantly increased risk, respectively).
2) The article provides no P-value, but used the term "nonsignificant" in relation with the association.
A nonsignificant trend.
This term was used under the following circumstances:
1) P (for trend) > 0.05, and ≤ 0.1.
2) For this review, a second circumstance was created when no P-value was given. This was done to a) create transparency in interpreting data, and to b) simplify judging the weight of evidence of the cumulative data from various cohorts.
Three criteria were chosen to carry the term "nonsignificant association":
-A dose response effect. The RR increases/decreases with every increasing unit of consumption consistently (for associations defined in ≥ 2 numbers), or is equal to risk in the unit next to it (only for associations defined in ≥ 3 numbers).
Data must be available from at least 3 units of consumption.
-Risk for the highest vs lowest unit of consumption increases/decreases by a total of > 20%. The value "> 20%" was chosen as a balance to a) diminish the possibility of finding lots of weak "associations" from results presented in tertiles (which may bias judging of evidence), and b) enable the possibility of finding consistent associations from smaller cohorts, which do not reach the level of significance because of limited amounts of cases.
-A minimum of 100 cases. With few cases, the chance increases that cases are spread over the different units of consumption at random, without showing a possible causal effect within the cohort.
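
The definitions in this section can be sketched as three small Python functions. This is an interpretation, not the procedure used to build the reviews: all function names are hypothetical, the nonsignificant trend range is assumed to be 0.05 < P ≤ 0.1, and the dose response check is simplified to a monotone sequence of RRs in which equal neighbouring values are allowed.

```python
import math


def classify_effect(ci_low, ci_high):
    """Classify an RR by its 95% CI limits."""
    if ci_low > 1.0 or ci_high < 1.0:
        return "significant effect"           # CI does not embrace one
    if math.isclose(ci_high, 1.0) and ci_low <= 0.90:
        return "nonsignificant protective effect"
    if math.isclose(ci_low, 1.0) and ci_high >= 1.10:
        return "nonsignificantly increased risk"
    return "no effect by these definitions"


def classify_trend(p_trend):
    """Classify a P for trend (assumed cut-offs: 0.05 and 0.1)."""
    if p_trend <= 0.05:
        return "significant trend"
    if p_trend <= 0.10:
        return "nonsignificant trend"
    return "no trend by these definitions"


def nonsignificant_trend_without_p(rrs, n_cases):
    """Apply the three criteria used when no P-value is given.

    rrs     -- RRs ordered from lowest to highest unit of consumption
    n_cases -- amount of cases behind the association
    """
    if len(rrs) < 3 or n_cases < 100:
        return False
    steps = [b - a for a, b in zip(rrs, rrs[1:])]
    # Simplified dose response: never changes direction, ties allowed.
    monotone = all(s <= 0 for s in steps) or all(s >= 0 for s in steps)
    # Highest vs lowest unit must differ by more than 20%.
    change = abs(rrs[-1] / rrs[0] - 1.0)
    return monotone and change > 0.20


print(classify_effect(0.90, 1.00))
print(classify_trend(0.08))
print(nonsignificant_trend_without_p([1.00, 0.90, 0.75], 150))
```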

6) ADJUSTMENT FOR CONFOUNDERS.
In general, RRs were used from the model which adjusted for the largest number of confounders. However, if a model additionally adjusted for nutrients only, RRs from this model were not used. Such a model might imply that the effect of a nutrient is found equally among all dietary sources of this nutrient. It was decided that this assumption does not reflect current knowledge about nutrients.


Defining the level of consumption at which an effect was found.

Dealing with different units of measure.
When evidence suggested a possible association, results were put into graphics. Results often were defined in grams or servings, but some authors published results in units (= grams/servings)/(X kcal./KJ)/day. The following conversions were used to switch between different units of measure:

  • -One serving of total vegetables = 77 g. One serving of total fruit = 80 g (1).
    -Serving sizes from specific fruit/vegetables items were derived from the USDA (2).
    -No attempt was made to convert different units of measures for vegetables/fruits subgroups, but graphics of these variables were created when results were defined in identical measures for units of consumption.
  • One serving of total vegetables or total fruit = 0.5 cup.
  • When units of consumption were defined in consumption/1000 kcal./day, it was assumed that the average intake of energy can be found at the level of 2,500 kcal among men and 2,000 kcal among women.
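
The conversions above can be written out as follows. This is a minimal Python sketch: the constants come from the list above, while the function names are illustrative assumptions.

```python
# Conversion factors taken from the list above.
GRAMS_PER_SERVING = {"vegetables": 77.0, "fruit": 80.0}
CUPS_PER_SERVING = 0.5
KCAL_PER_DAY = {"men": 2500.0, "women": 2000.0}


def servings_to_grams(servings, group):
    """Convert servings of total vegetables or total fruit to grams."""
    return servings * GRAMS_PER_SERVING[group]


def cups_to_servings(cups):
    """Convert cups of total vegetables or total fruit to servings."""
    return cups / CUPS_PER_SERVING


def per_1000_kcal_to_per_day(amount_per_1000_kcal, sex):
    """Convert consumption per 1000 kcal to consumption per day,
    using the assumed average energy intake for men or women."""
    return amount_per_1000_kcal * KCAL_PER_DAY[sex] / 1000.0


print(servings_to_grams(2, "vegetables"))     # grams of total vegetables
print(cups_to_servings(1.5))                  # servings
print(per_1000_kcal_to_per_day(100, "men"))   # amount per day
```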

Results put into graphics.
An attempt was made to find out if evidence could be found for a possible association at certain levels of consumption. Results were put into graphics when the possibility of weak-strong evidence for an association (see "Judging the evidence") was suggested from the results. Therefore, results were put into graphics when a significant protective effect/increased risk at any level of consumption and/or a significant trend was found in ≥ 2 cohorts of moderate-large size (≥ 20,000 subjects) and including an amount of cases covering ≥ 50% of the total amount of cases in all cohorts combined.
Graphics were made to see if significant results from different cohorts overlapped at any given level of consumption.
Two additional criteria were created to enable quantification of results. Results were included in graphics if:

  • Results were published as a categorized variable, and in at least 3 units of consumption. Study results published as a continuous variable were excluded.
  • Results were published as consumption of a specific amount (grams, servings, or cups) over a given time period. Study results published in frequency/time period, or without a definition for different units of consumption, were excluded.


References:
1) He FJ, Nowson CA, Lucas M, MacGregor GA. Increased consumption of fruit and vegetables is related to a reduced risk of coronary heart disease: meta-analysis of cohort studies. J Hum Hypertens. 2007 Sep;21(9):717-28.
2) USDA Nutrient Data Laboratory.


Judging the evidence:

Motivation for criteria used to create the evidence model.
In 2007, The World Cancer Research Fund (WCRF) published the largest systematic literature review about the relation between diet (+ other factors) and cancer ever. The goal of this report was to review all the relevant research, using the most meticulous methods, in order to generate a comprehensive series of recommendations on food, nutrition, and physical activity, designed to reduce the risk of cancer and suitable for all societies.
The WCRF found that the best evidence does not come from any one type of scientific investigation. It comes from a combination of different types of epidemiological and other studies, supported by evidence of plausible biological mechanisms. Still, for all 3 main levels of evidence, results from ≥ 2 independent cohort studies were required (or ≥ 5 case-control studies for the lower 2 main levels of evidence).

Judging of evidence was inspired by the WCRF model. Of course, the methodology for the systematic reviews on this site cannot compete with the methodology from the WCRF, since reviews were created solely from results of cohort studies. Still, levels of evidence were created with similar goals. For the highest level of evidence this was defined as: To be robust enough to be highly unlikely to be modified in the foreseeable future as new evidence accumulates.
To reach this goal within limits of a single study design, some criteria (beyond little heterogeneity) were created to judge the evidence based on: a) the amount of cohorts, b) the size of the cohorts, and c) a criterion incorporating both variables "cohort size" and "follow-up time" in a single variable, namely the amount of cases.

Judging the evidence.
Levels of evidence were created, based on consistency of effects/trends, and the strength to withstand possible opposite findings from current ongoing cohorts within the next couple of years.
A lot of cohorts are currently providing information about diet & various health outcomes. Some cohorts are of very large size: 200,000 to 1,000,000 subjects (1-4). Each of these cohorts may, on its own, provide information based on an amount of disease cases so large that it may outweigh the associations found in the current combined findings of all cohorts for a large amount of disease outcomes.

Evidence is divided into 3 main levels (possible, probable, convincing) which are based on a) the amount and b) the size, of cohorts in which a significant association was found. In addition, nonsignificant associations are judged as suggestive evidence.

  • In general, any evidence requires:
    -A significant association in ≥ 2 cohorts, or
    -A nonsignificant association in ≥ 4 cohorts (suggestive evidence only).
  • Evidence is based on the consistency of an association. Consistency requires:
    -A (non)significant association in an amount of cases covering ≥ 50% of the total amount of cases in all cohorts required (≥ 25% for suggestive evidence).
    -Little heterogeneity between study results. Defined as: a) no significant association in the opposite direction, and b) no nonsignificant association in the opposite direction in an amount of cases covering ≥ 10% of the total amount of cases in all cohorts combined.

The 4 levels of evidence are described here. A display of the model used can be found at the bottom of this page. For analysis stratified by sex, requirements for cohort size are divided by 2.

Suggestive evidence.
This level of evidence indicates an interesting finding in more than one study.
In addition to the previous criteria, one of the following findings is required to carry the label of "suggestive evidence":

  • A significant association in ≥ 2 cohorts of any size, or
  • A nonsignificant association in ≥ 4 cohorts of any size.

Possible evidence.
This level of evidence indicates a significant association in more than one study of ≥ moderate size.
In addition to the previous criteria, the following finding is required to carry the label of "possible evidence":

  • A significant association found in at least 2 cohorts of moderate-large size (≥ 20,000 subjects each).

Probable evidence.
This level of evidence indicates consistent findings from multiple human studies.
In addition to the previous criteria, the following finding is required to carry the label of "probable evidence":

  • A significant association found in at least 4 cohort studies, including at least one cohort of very large size (≥ 200,000 subjects).

Convincing evidence.
This level of evidence indicates consistent findings from multiple human studies, some of which are of very large size. The goal of this level of evidence is a) to exclude the necessity to include data from case-control studies for judging evidence, and b) to diminish the possibility that future findings from any single cohort of very large size can completely eliminate the evidence. The amount and size of the cohorts should "generate" an amount of cases large enough to reach this goal.
In addition to the previous criteria, the following finding is required to carry the label of "convincing evidence":

  • A significant association found in at least 6 cohort studies, including at least two cohorts of very large size (≥ 200,000 subjects).
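
The four levels can be combined into one grading routine. The Python sketch below is one interpretation of the criteria and not the model displayed at the bottom of this page. The names `Cohort` and `grade_evidence` are hypothetical. It assumes that each level also requires the levels below it, that the ≥ 50% share of cases refers to cohorts with a significant association, and that the ≥ 25% share refers to cohorts with a significant or nonsignificant association.

```python
from dataclasses import dataclass


@dataclass
class Cohort:
    name: str
    subjects: int
    cases: int
    # One of: "sig", "nonsig", "sig_opposite", "nonsig_opposite", "null"
    finding: str


def grade_evidence(cohorts, stratified_by_sex=False):
    """Return "none", "suggestive", "possible", "probable" or "convincing"."""
    # For analysis stratified by sex, size requirements are divided by 2.
    factor = 0.5 if stratified_by_sex else 1.0
    moderate, very_large = 20_000 * factor, 200_000 * factor

    total_cases = sum(c.cases for c in cohorts)
    if total_cases == 0:
        return "none"

    # Little heterogeneity: no significant opposite finding, and
    # nonsignificant opposite findings cover < 10% of all cases.
    if any(c.finding == "sig_opposite" for c in cohorts):
        return "none"
    opposite = sum(c.cases for c in cohorts if c.finding == "nonsig_opposite")
    if opposite >= 0.10 * total_cases:
        return "none"

    sig = [c for c in cohorts if c.finding == "sig"]
    nonsig = [c for c in cohorts if c.finding == "nonsig"]
    sig_share = sum(c.cases for c in sig) / total_cases
    any_share = sum(c.cases for c in sig + nonsig) / total_cases

    # Suggestive: >= 2 significant or >= 4 nonsignificant cohorts, any size.
    if not ((len(sig) >= 2 or len(nonsig) >= 4) and any_share >= 0.25):
        return "none"
    level = "suggestive"

    n_moderate = sum(c.subjects >= moderate for c in sig)
    n_very_large = sum(c.subjects >= very_large for c in sig)
    if sig_share >= 0.50 and n_moderate >= 2:
        level = "possible"
        if len(sig) >= 4 and n_very_large >= 1:
            level = "probable"
            if len(sig) >= 6 and n_very_large >= 2:
                level = "convincing"
    return level


example = [
    Cohort("A", 25_000, 300, "sig"),
    Cohort("B", 30_000, 400, "sig"),
    Cohort("C", 10_000, 100, "null"),
]
print(grade_evidence(example))
```

The cohorts in `example` are invented for illustration only.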

References:
1) The Million Women Study.
2) The NIH-AARP Diet & Health Study.
3) The EPIC Study.
4) The Multiethnic/Minority Cohort Study.