Introduction

Starbucks’ gargantuan $26 billion of revenue in 2019 is resounding evidence that Americans, above everything, love their coffee. They prove it time and time again, most recently during the current pandemic: some, unable to relinquish their daily ‘bucks, waited hours on end in long drive-thru lines to get their fix when coffee shops were closed for in-person operations.

It is no surprise, then, that Starbucks owns more than 8,000 stores across the US, and continues to grow every day. These sheer numbers reflect a company that is about more than just coffee, having a significant influence on US society and culture with (often controversial) initiatives such as removing religious references from holiday-themed cups. In addition, Starbucks is known for treating its employees extremely well by providing health coverage, tuition coverage, and 401(k) plans, which again reinforces its highly regarded brand.

As such, the astounding popularity of the chain in the country and the prospect of further expansion raise important questions. What can the current distribution of Starbucks stores tell us about societal factors across the country? Which factors should the brand consider when expanding into new locations? How can companies like Starbucks, which pride themselves on a positive social and environmental outlook, incorporate such values into their corporate strategy, especially in the current pandemic?

Overview of Analysis

In order to answer these questions, we sought to understand the relationship between state-level social and economic factors, such as income and unemployment, and the distribution of Starbucks locations in the US. Furthermore, we wanted to understand what company behavior and decision-making could look like with a more socially-minded approach, instead of one that considers profits alone. Starbucks shops, after all, can be significant contributors to social good on a localized level by providing stable jobs that open up avenues for growth and education.

As such, in this project, we conduct a consulting case-study of Starbucks, analysing what insights are gained from the current distribution of stores in the US and providing socially-minded recommendations for new locations in which the brand can expand. We start this endeavor with a spatial analysis of Starbucks stores across the US, exploring the relationship between store density and income/unemployment level in each state. We then complement this analysis with a text-based sentiment analysis of recent tweets mentioning Starbucks, and explore how these sentiments vary across location. Finally, we use these variables to conduct a clustering analysis of states in the US, in order to determine the different “types” of states under which Starbucks operates. We proceed to use these clusters to identify the ideal locations for the brand to build new stores.

Data

In order to conduct this analysis, we collected the following data:

  • The Starbucks Locations Worldwide dataset, available on Kaggle and provided by Starbucks Corporation, including a record for every Starbucks or subsidiary store location that was in operation during February 2017.
  • The US Household Income Statistics dataset, available on Kaggle and provided by the Golden Oak Research Group, containing information on US cities’ average income.
  • The US Unemployment Rate by County dataset, available on Kaggle and provided by the US Department of Labor’s Bureau of Labor Statistics, containing unemployment and population data at the US county level from 1990-2016.
  • The most recent 18,000 tweets containing the string “Starbucks” or “starbucks” in their text, scraped using the rtweet package to interface with the Twitter API. To supplement the Twitter data, we used the NRC Lexicon to find the sentiments of each tweet.

To clarify, when conducting analysis at the state level, we summarized the county-level data by state, so that each state's value reflects the counties it contains. In addition, a factor used repeatedly in our analysis is Income. Income, in this case, represents the mean income of a resident of each county and, when used at the state level, the average income of a resident of each state.
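As a rough illustration of this county-to-state summary, here is a minimal Python sketch. It is not the code used in the project: the county records are made-up values, `summarize_by_state` is a hypothetical helper, and we assume a simple unweighted average of county values (a population-weighted average would be an equally plausible choice).

```python
from collections import defaultdict
from statistics import mean

# Hypothetical county records: (state, county, mean income, unemployment rate).
# These numbers are invented for illustration only.
counties = [
    ("KS", "County A", 90000.0, 3.0),
    ("KS", "County B", 50000.0, 6.0),
    ("VT", "County C", 80000.0, 2.5),
]

def summarize_by_state(rows):
    """Average county-level income and unemployment within each state.

    Assumption: every county counts equally (unweighted mean).
    """
    grouped = defaultdict(list)
    for state, _county, income, unemployment in rows:
        grouped[state].append((income, unemployment))
    return {
        state: {
            "income": mean(v[0] for v in values),
            "unemployment": mean(v[1] for v in values),
        }
        for state, values in grouped.items()
    }

state_summary = summarize_by_state(counties)
```

Swapping `mean` for a weighted average would only require carrying each county's population alongside its values.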

Links to each of the data sources are listed in the Citations section.

Distribution of Starbucks Across the US

The first step of our analysis was to understand how Starbucks stores are distributed across the US:

Looking at the plot above, displaying the distribution and density of Starbucks locations across the US, we can see that Starbucks locations are heavily clustered along the coasts of the US, and less prominent within states in the Rocky Mountain and Midwestern regions of the country. This suggests a correlation between income and Starbucks store location, as we see a concentration of stores in the richer parts of the country, such as several states in the Northeast and California. This is corroborated by the heavy concentration of Starbucks in the major cities within each state, as opposed to being spread out across the state’s counties. Looking at the map, one can easily identify major cities such as New York, Los Angeles, Houston, Miami and Boston as Starbucks hotspots.

With strong evidence of high spatial concentration of Starbucks stores and a correlation between income and number of stores, we wanted to understand if that gave rise to any patterns regarding sentiment towards Starbucks. Specifically, we theorized that areas with more Starbucks stores or higher income should have a more positive sentiment towards the brand. In the following section, we test this hypothesis using text-based sentiment analysis of recent tweets that mention Starbucks.

Consumer Sentiment Towards Starbucks Across the US

After using the rtweet package to scrape text and locations for the latest 18,000 tweets mentioning Starbucks, we calculated the proportion of negative/positive words (as defined by the NRC Lexicon) in each state.
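To make the per-state proportion concrete, the following Python sketch shows one way such a calculation could be set up. It is only an illustration: the word sets are a toy stand-in rather than the NRC Lexicon, the tweets are invented, `sentiment_proportions` is a hypothetical helper, and we assume the denominator is the total number of words tweeted in the state.

```python
import re
from collections import defaultdict

# Toy stand-in for a sentiment lexicon; these are NOT the NRC word lists.
POSITIVE = {"love", "great", "good"}
NEGATIVE = {"hate", "bad", "slow"}

# Hypothetical (state, tweet text) pairs, invented for illustration.
tweets = [
    ("CA", "I hate the slow line at Starbucks"),
    ("CA", "Starbucks is good"),
    ("KS", "love my great Starbucks"),
]

def sentiment_proportions(rows):
    """Share of positive and negative words among all words, per state."""
    counts = defaultdict(lambda: {"pos": 0, "neg": 0, "total": 0})
    for state, text in rows:
        for word in re.findall(r"[a-z']+", text.lower()):
            tally = counts[state]
            tally["total"] += 1
            if word in POSITIVE:
                tally["pos"] += 1
            if word in NEGATIVE:
                tally["neg"] += 1
    return {
        state: {
            "positive": tally["pos"] / tally["total"],
            "negative": tally["neg"] / tally["total"],
        }
        for state, tally in counts.items()
        if tally["total"] > 0  # skip states whose tweets contained no words
    }

proportions = sentiment_proportions(tweets)
```

Using only sentiment-bearing words as the denominator instead would change the scale of the proportions but not the comparison logic.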

As mentioned previously, our initial hypothesis was that states with a higher density of Starbucks stores would tweet more positive and fewer negative opinions. However, when observing the plots below, states with more Starbucks stores, such as California, Texas, Florida, and New York, all have a lower proportion of positive words and a higher proportion of negative words in tweets compared to other states in the US. Additionally, the states with fewer Starbucks stores, such as Wisconsin, Iowa, Kansas, and Mississippi, had a higher proportion of positive words and a lower proportion of negative words in tweets. This suggests that, according to consumer sentiment analysis, it may be more beneficial for Starbucks to build stores in states where it is not currently heavily established.

Clustering by State

Motivated by the potential that Starbucks might want to expand into areas it has not heavily invested in, we sought to understand how we could segment states into categories in order to determine how Starbucks might strategize differently in each of them.

State-Level Factors

We thus applied a k-means clustering algorithm to US states, using the following variables to categorize them:

  • Average Proportion of Negative Sentiment in Tweets per State
  • Average Income of Each Resident per State
  • Average Unemployment Rate per State
  • Number of Starbucks per State

It is important to note that these factors have significantly different scales. Thus, we incorporated standardized values into our cluster analysis so that the clustering algorithm weights all variables equally.
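A minimal Python sketch of this standardization step is shown below, using z-scores. It is an illustration rather than the project's code: `standardize` is a hypothetical helper, the input values are arbitrary, and we assume the population standard deviation (R's `scale()` function, for comparison, divides by the sample standard deviation).

```python
from statistics import mean, pstdev

def standardize(values):
    """Convert raw values to z-scores: (value - mean) / standard deviation.

    Assumption: population standard deviation; a constant column maps to zeros.
    """
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]

# Arbitrary example column (mean 5, population standard deviation 2).
raw = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
z_scores = standardize(raw)
```

After this transformation every variable has mean 0 and unit spread, so a variable measured in dollars no longer dominates one measured in percentage points when distances are computed.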

State-Level Clusters

The total within-cluster sum of squares (tot.withinss) is the metric we use to judge the required number of clusters, or K. We repeat the clustering for a range of K values and record tot.withinss each time to assess which number of clusters is best. This information is displayed in the “Elbow Plot”. From the cluster analysis below, 5 clusters appears to be the point after which additional clusters yield only small further reductions in tot.withinss.
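The procedure behind an elbow plot can be sketched in a few lines of Python. This is not the project's code: the points are toy data, `kmeans` and `elbow_curve` are hypothetical helpers, and for reproducibility we assume a deterministic farthest-first choice of starting centers rather than the random starts that k-means implementations typically use.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def farthest_first_centers(points, k):
    """Deterministic start: first point, then repeatedly the point farthest from all chosen centers."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(squared_distance(p, c) for c in centers)))
    return centers

def kmeans(points, k, iterations=100):
    """Plain Lloyd's algorithm; returns (centers, labels, total within-cluster sum of squares)."""
    centers = farthest_first_centers(points, k)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest center.
        labels = [min(range(k), key=lambda j: squared_distance(p, centers[j])) for p in points]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                new_centers.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centers.append(centers[j])  # keep the center of an empty cluster
        if new_centers == centers:
            break
        centers = new_centers
    wcss = sum(squared_distance(p, centers[label]) for p, label in zip(points, labels))
    return centers, labels, wcss

def elbow_curve(points, max_k):
    """Total within-cluster sum of squares for K = 1..max_k."""
    return {k: kmeans(points, k)[2] for k in range(1, max_k + 1)}

# Toy data: three well-separated groups of three points each.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (20, 0), (20, 1), (21, 0)]
curve = elbow_curve(points, 4)
```

On this toy data the curve drops sharply up to K = 3 and only marginally afterwards, which is the "elbow" one looks for when reading the plot.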

Visualization of Clusters

Cluster Characteristics

The matrix below shows how each cluster scores on the four variables. Cluster 1, which is composed solely of California, is characterized by having the highest number of Starbucks stores by a large margin (a total of 2821!). It’s also interesting to note that California has both a high average income and a high unemployment rate, which is unusual since the two are inversely correlated in other states. Cluster 2 represents the states with lower income than most but also relatively low unemployment, a special case among the less-wealthy states. Cluster 3, composed solely of Vermont, is characterized by having the highest negative consumer sentiment. Cluster 4 represents the wealthiest states in the country, with a high average income and a moderate unemployment rate. Finally, Cluster 5 represents the most impoverished states in the US, with a high unemployment rate, low average income and a low number of Starbucks stores.

Final List of States

Having segmented states by their characteristics, we decided to pick one from each cluster in order to determine what a socially-minded strategy for Starbucks within that group of states might look like. To do so, we picked the centermost state within each cluster as the state “most representative” of that cluster. For California and Vermont, this was trivial, since they were the only states within their respective clusters. For the other three clusters, we calculated the distance from each state to its cluster center and picked the state with the minimum distance. From this analysis, we concluded that the final 5 states, each representative of its own cluster, are:

  • California - Cluster 1
  • Kansas - Cluster 2
  • Vermont - Cluster 3
  • Virginia - Cluster 4
  • Michigan - Cluster 5
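The "centermost state" selection described above can be sketched as follows in Python. This is an illustration under stated assumptions: the item names, feature vectors and cluster labels are made up, `centermost_by_cluster` is a hypothetical helper, and we assume Euclidean distance on the standardized variables with each cluster's center taken as the mean of its members.

```python
from math import dist

def centermost_by_cluster(features, labels):
    """For each cluster, return the member closest to the cluster's mean vector.

    features: {name: tuple of standardized values}
    labels:   {name: cluster id}
    """
    members_by_cluster = {}
    for name, label in labels.items():
        members_by_cluster.setdefault(label, []).append(name)

    representatives = {}
    for label, names in members_by_cluster.items():
        vectors = [features[name] for name in names]
        center = tuple(sum(col) / len(vectors) for col in zip(*vectors))
        representatives[label] = min(names, key=lambda name: dist(features[name], center))
    return representatives

# Hypothetical standardized features and cluster assignments (not real states).
features = {"A": (0.0, 0.0), "B": (1.0, 1.0), "C": (5.0, 5.0), "D": (9.0, 9.0)}
labels = {"A": 1, "B": 1, "C": 1, "D": 2}
representatives = centermost_by_cluster(features, labels)
```

A single-member cluster trivially returns its only member, which mirrors the California and Vermont cases.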

Clustering by County

Expanding on the state-level analysis, we analyzed the counties that Starbucks might want to expand into, using the same factors as we did for the states, except for consumer sentiment. This keeps the analysis consistent: counties are assessed by the same method and the same factors as states, in the hope that this helps Starbucks think differently about where to open its next locations. We followed a three-step approach to form clusters within each state:

  1. Running the cluster analysis to produce an elbow plot
  2. Creating clusters based on the appropriate number of centers
  3. Determining the most suitable cluster of counties

After observing the “elbow plot,” we choose the number of centers and use it to produce a geographical visualization and cluster matrix for that state. Then, after analyzing the three county-level factors, we choose the one cluster of counties that will best support an even stronger Starbucks brand and community.

County-Level Factors

We again employed a k-means clustering algorithm but now to US counties, specifically in the “center” states from our previous findings. For our clusters, the following variables were utilized to categorize them:

  • Mean Income of Each Resident per County
  • Unemployment Rate per County
  • Number of Starbucks per County

Determining the Number of Clusters for Each State

Virginia

Based on the cluster analysis, the reduction in total within-cluster sum of squares appears to level off after 4 clusters. Therefore, in our evaluation of Virginia, the state was divided into 4 clusters. To best contribute to a more socially-minded Starbucks, we believe the counties in Cluster 2 are the best candidates. The counties in Cluster 2 have the lowest household mean income and highest unemployment rates relative to the other clusters, and also have a low number of Starbucks stores. Establishing a Starbucks location in one of the counties of Cluster 2 will provide more job opportunities and support for a community that, according to the factors selected, needs it.

Vermont

Based on the cluster analysis, the reduction in total within-cluster sum of squares appears to level off after 4 clusters. Therefore, in our evaluation of Vermont, the state was divided into 4 clusters. To best contribute to a more socially-minded Starbucks, we believe the counties in Cluster 4 are the best candidates. The counties in Cluster 4 have the lowest household mean income and highest unemployment rates relative to the other clusters, and also have a low number of Starbucks stores. Establishing a Starbucks location in one of the counties of Cluster 4 will provide more job opportunities and support for a community that, according to the factors selected, needs it.

Michigan

Based on the cluster analysis, the reduction in total within-cluster sum of squares appears to level off after 4 clusters. Therefore, in our evaluation of Michigan, the state was divided into 4 clusters. To best contribute to a more socially-minded Starbucks, we believe the counties in Cluster 1 are the best candidates. The counties in Cluster 1 have the lowest household mean income and highest unemployment rates relative to the other clusters, and also have a low number of Starbucks stores. Establishing a Starbucks location in one of the counties of Cluster 1 will provide more job opportunities and support for a community that, according to the factors selected, needs it.

Kansas

Based on the cluster analysis, the reduction in total within-cluster sum of squares appears to level off after 4 clusters. Therefore, in our evaluation of Kansas, the state was divided into 3 clusters. To best contribute to a more socially-minded Starbucks, we believe the counties in Cluster 2 are the best candidates. The counties in Cluster 2 have the lowest household mean income and highest unemployment rates relative to the other clusters, and also have a low number of Starbucks stores. Establishing a Starbucks location in one of the counties of Cluster 2 will provide more job opportunities and support for a community that, according to the factors selected, needs it.

California

Based on the cluster analysis, the reduction in total within-cluster sum of squares appears to level off after 4 clusters. Therefore, in our evaluation of California, the state was divided into 4 clusters. To best contribute to a more socially-minded Starbucks, we believe the counties in Cluster 1 are the best candidates. The counties in Cluster 1 have the lowest household mean income and highest unemployment rates relative to the other clusters, and also have a low number of Starbucks stores. Establishing a Starbucks location in one of the counties of Cluster 1 will provide more job opportunities and support for a community that, according to the factors selected, needs it.
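The selection rule applied to each state above (lowest income, highest unemployment, few stores) was a judgment made by reading each cluster matrix. One way it could be made explicit is a simple rank-sum score, sketched below in Python. The scoring rule, the `rank_positions` and `neediest_cluster` helpers, and the cluster averages are all hypothetical and are not taken from the project.

```python
# Hypothetical cluster averages for one state, invented for illustration:
# cluster id -> (mean income, unemployment rate, Starbucks stores per county)
cluster_means = {
    1: (45000.0, 7.5, 0.5),
    2: (60000.0, 5.0, 3.0),
    3: (85000.0, 3.5, 20.0),
}

def rank_positions(values, reverse=False):
    """Map each key to its position (0 = first) when keys are sorted by value."""
    ordered = sorted(values, key=values.get, reverse=reverse)
    return {key: position for position, key in enumerate(ordered)}

def neediest_cluster(means):
    """Pick the cluster with the lowest combined rank across the three factors.

    Assumption: the three factors are weighted equally.
    """
    income_rank = rank_positions({c: m[0] for c, m in means.items()})  # lowest income first
    unemployment_rank = rank_positions({c: m[1] for c, m in means.items()}, reverse=True)  # highest first
    store_rank = rank_positions({c: m[2] for c, m in means.items()})  # fewest stores first
    return min(means, key=lambda c: income_rank[c] + unemployment_rank[c] + store_rank[c])

target = neediest_cluster(cluster_means)
```

When one cluster ranks first on all three factors, as in the state paragraphs above, any reasonable weighting would select the same cluster.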

Final Recommendations

After selecting the counties in the desired cluster from each state, we compiled a data table that shows the household mean income, unemployment rate, and number of Starbucks stores for each county. We hope this condensed list of counties gives Starbucks the rationale and context behind our findings.

Conclusion and Limitations

The crux of this project is not only these final recommendations for new locations in which to expand, but what we learned in the process of getting to these. Firstly, it is evident from an analysis of spatial distribution of Starbucks stores in the US that these stores are heavily concentrated in populated, rich, urban centers. Furthermore, it is likely that this extends to other restaurants and even different types of brands. Ultimately, this is a testament to how the process of urbanization in the US is characterized by the formation of cities as centers of not only production, but consumption. One cannot blame Starbucks for “following the money”, but this raises questions about equity across urban spaces in the US.

Another interesting finding was that there is no positive correlation between the number of Starbucks stores and positive sentiment towards Starbucks. This ran counter to our initial intuition that Starbucks would concentrate on the places that felt most “positive” towards it. It opens up the exciting possibility that doubling down on “cash cow” locations like New York might not be the best strategy for the company, which should instead consider expanding into locations with positive sentiment in order to bolster its brand image. More broadly, this raises the question of the extent to which companies should trade off potential short-term profit against long-term benefits to brand image.

Our most important finding, and the point we most wish to make, is that companies like Starbucks can play a role in improving social conditions across the country, even though these large corporations are often vilified for their greed. A socially-minded approach to expansion such as the one we have shown above, combined with effective company policies such as extensive health and tuition coverage, can do wonders for the most underprivileged counties in the US, providing employment and growth opportunities for youth. Moving forward, we hope to see companies take a more active role in being mindful of the impact they can have on society, be that positive or negative.

With the takeaways in mind, it is important to consider some of the main limitations of our analysis. An extremely relevant limitation is that we had a limited amount of data to work with and only used a handful of variables in our analysis. Possible data-related extensions we considered include using factors such as Starbucks revenue by state or county, population density, and state-level measures of education and health. We would love to see extensions of this project that take these other facets of society into account, or perhaps apply them to other brands.

Another strong limitation of our analysis is that our scraping of tweets is limited to the past 7 days, as gathering more data required the use of a paid API key. This allowed us to gather a snapshot in time, but we believe it is biased towards momentary whims and are not confident that it is representative of overall sentiment towards Starbucks in each state. A further limitation in this regard is that Twitter users do not need to and often choose not to specify where they live, leading to an overall small amount of usable data. An extension incorporating tweets over a longer period of time would lead to more robust data that is likely more representative of overall consumer sentiment towards Starbucks in each state.

A final limitation of our analysis is that we have simply theorized how Starbucks “might” act if it took on a socially-minded approach to growth, but we haven’t identified any incentives for the company to follow through on this type of behavior. Building stores in the counties we recommend would likely lead to lower profits than the company would otherwise achieve. Subsequent studies that incorporate a profitability analysis could hopefully shed light on the ways in which being socially-minded is not only beneficial to society, but also has the potential to improve company profitability.

Citations

Golden Oak Research Group. (2017, August). US Household Income Statistics, Version 8. Retrieved November 10, 2020 from https://www.kaggle.com/goldenoakresearch/us-household-income-stats-geo-locations.

Kearney, M. W. (2019). rtweet: Collecting and analyzing Twitter data. Journal of Open Source Software, 4(42), 1829. doi:10.21105/joss.01829 (R package version 0.7.0)

Ravaliya, J. (2017, March). US Unemployment Rate by County, 1990-2016, Version 2. Retrieved November 10, 2020 from https://www.kaggle.com/jayrav13/unemployment-by-county-us.

Starbucks. (2017, February). Starbucks Locations Worldwide, Version 1. Retrieved November 10, 2020 from https://www.kaggle.com/starbucks/store-locations.