Background: Mental health apps (MHAs) offer accessible, immediate, and innovative ways to better understand and support the treatment of mental health disorders, especially those with a high burden, such as bipolar disorder (BD). Many MHAs have been developed, but the effectiveness of few has been evaluated.

Objective: This systematic scoping review explores the process and outcome measures currently used to evaluate MHAs for BD, with the aim of providing a comprehensive overview of current research. This will help identify best practice for evaluating MHAs for BD and inform future studies.

Methods: A systematic literature search of the health science databases PsycINFO, MEDLINE, Embase, EBSCO, Scopus, and Web of Science was undertaken up to January 2021 (with no start date) to narratively assess how studies had evaluated MHAs for BD.

Results: Of 4051 original search results, 12 articles were included. These 12 studies included 435 participants, of whom 343 had BD type I or II. Eleven of the 12 studies reported participant ages (mean 37 years); one study did not report age data. The male-to-female ratio of the 343 participants was 137:206. The most widely used validated outcome measure was the Young Mania Rating Scale, used 8 times. The Hamilton Depression Rating Scale-17/Hamilton Depression Rating Scale was used 3 times; the Altman Self-Rating Mania Scale, Quick Inventory of Depressive Symptomatology, and Functional Assessment Staging Test were each used twice; and the Coping Inventory for Stressful Situations, EuroQoL 5-Dimension Health Questionnaire, Generalized Anxiety Disorder Scale-7, Inventory of Depressive Symptomatology, Mindfulness Attention Awareness Scale, Major Depression Index, Morisky-Green 8-item, Perceived Stress Scale, and World Health Organization Quality of Life-BREF were each used once. Self-report measures were captured in 9 studies, 6 of which used MONARCA. Mood and energy levels were the most common self-report measures, each used 4 times. Furthermore, 11 of the 12 studies discussed confounding factors and barriers to the use of MHAs for BD.

Conclusions: Low adherence rates, usability challenges, and privacy concerns were reported as barriers to the use of MHAs for BD. Moreover, because MHA evaluation is itself still developing, guidance for clinicians on how to support patients' choices in mobile health also needs to be developed. These obstacles could be ameliorated by incorporating co-production and co-design, using participatory patient approaches, during the development and evaluation stages of MHAs for BD. Further, including qualitative components in trials to examine patients' experience of both their mental ill health and the MHA itself could result in more patient-friendly, fit-for-purpose MHAs for BD.