{"id":2711,"date":"2023-06-24T01:44:00","date_gmt":"2023-06-24T01:44:00","guid":{"rendered":"https:\/\/mondaikaiketsu.net\/?p=2711"},"modified":"2025-05-03T07:31:50","modified_gmt":"2025-05-03T07:31:50","slug":"keypoints-logistics-regression-analysis","status":"publish","type":"post","link":"https:\/\/mondaikaiketsu.net\/en\/keypoints-logistics-regression-analysis\/","title":{"rendered":"Key points for \u201cLogistic Regression Analysis\u201d"},"content":{"rendered":"\n<p>Hello everyone. <a href=\"https:\/\/mondaikaiketsu.net\/en\/utilizing-sleeping-data\/\" target=\"_blank\" rel=\"noopener\" title=\"\">Last time<\/a>, I wrote about \u201cThorough utilization of data that is sleeping in your company\u201d as &#8220;Problem Solving Practice Edition&#8221;. &nbsp;In the post, I wrote that I would like to write about \u201cLogistic Regression Analysis\u201d later.&nbsp; So this time, I would like to write about it.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><strong><strong>What is \u201cLogistic Regression Analysis\u201d?<\/strong><\/strong><\/h2>\n\n\n\n<p>In <a href=\"https:\/\/mondaikaiketsu.net\/en\/utilizing-sleeping-data\/\" target=\"_blank\" rel=\"noopener\" title=\"\">the previous post<\/a>, I was trying to do multiple regression analysis using the following data.<\/p>\n\n\n\n<p>Gender<\/p>\n\n\n\n<p>Age<\/p>\n\n\n\n<p>Age group<\/p>\n\n\n\n<p>Marriage status<\/p>\n\n\n\n<p>Living prefecture<\/p>\n\n\n\n<p>Private brands you are aware of (multiple choice)<\/p>\n\n\n\n<p>Image by private brand (multiple choice)<\/p>\n\n\n\n<p>Change in purchase frequency of private brands in the last 1-2 years (increase\/decrease\/no change)<\/p>\n\n\n\n<p>Reasons for the above (free answer)<\/p>\n\n\n\n<p>Food products that have switched to private brands within the past year (multiple choice)<\/p>\n\n\n\n<p>Seasonings that have switched to private brands within the past year (multiple selections)<\/p>\n\n\n\n<p>Beverages that have switched to private brands within the past 
year (multiple choice) <\/p>\n\n\n\n<p>Drugs that have switched to private brand within the past year (multiple choice), etc. <\/p>\n\n\n\n<p>Looking at the raw data columns above, the most obvious candidate for the outcome variable is \u201cChange in purchase frequency of private brands in the last 1-2 years (increase\/decrease\/no change)\u201d. It seems that \u201cFoods\/seasonings\/beverages\/drugs that have switched to private brand within the past year\u201d can also be used.<\/p>\n\n\n\n<p>\u201cChanges in purchase frequency of private brands in the last 1-2 years (increased\/decreased\/no change)\u201d is coded as (1\/2\/3), and \u201cFoods\/seasonings\/beverages\/drugs that have switched to private brand within the past year\u201d is a multiple-choice answer coded as (0\/1). These types of data are called \u201cdiscrete variables\u201d.<\/p>\n\n\n\n<p><a href=\"https:\/\/mondaikaiketsu.net\/en\/utilizing-sleeping-data\/\" target=\"_blank\" rel=\"noopener\" title=\"\">Last time<\/a>, I wanted to do a <a href=\"https:\/\/mondaikaiketsu.net\/en\/keypoints-regression-analysis\/\" target=\"_blank\" rel=\"noopener\" title=\"\">multiple regression analysis<\/a> in Excel, so I made a column of continuous variables from discrete variables and proceeded with the analysis. Multiple regression analysis in Excel is a form of &#8220;linear regression analysis&#8221;, which can take only continuous variables as outcome variables.<\/p>\n\n\n\n<p>This time, I will take up &#8220;logistic regression analysis&#8221;, which can analyze the above discrete variables as outcome variables as they are.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>Difference between \u201cMultiple Regression Analysis\u201d and \u201cLogistic Regression Analysis\u201d<\/strong><\/h2>\n\n\n\n<p>In multiple regression analysis (linear regression analysis), the independent variable x changes the value of the outcome variable y. 
Therefore, it is possible to predict the &#8220;value&#8221; of the outcome variable from the independent variables. On the other hand, the outcome variable of logistic regression analysis is a discrete variable, so it takes the form &#8220;1\/0&#8221; (presence or absence of a specific phenomenon). That is, the analysis estimates the probability that y is 1. If multiple regression analysis (linear regression analysis) is applied when the outcome variable is a discrete variable, the analysis will still run and will often output results that look plausible, but they are not necessarily correct. So we should be careful.<\/p>\n\n\n\n<p>Now, let&#8217;s perform a logistic regression analysis using &#8220;Changes in purchase frequency of private brand in the last 1-2 years (increased\/decreased\/no change)&#8221; as the outcome variable.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><strong><strong>Procedure of \u201cLogistic Regression Analysis\u201d<\/strong><\/strong><\/h2>\n\n\n\n<p>Unlike multiple regression analysis (linear regression analysis), logistic regression analysis is not included in Excel&#8217;s built-in analysis tools, so this time we will use the free statistical tool &#8220;R&#8221;, which can perform logistic regression analysis. R is also a very useful tool that can handle various data mining (statistical analysis) methods such as multiple regression analysis (linear regression analysis), decision tree analysis, and cluster analysis. If you google R itself, you will find many articles, so I won&#8217;t write about it here (Let me put just one link <a href=\"https:\/\/www.r-project.org\/\" target=\"_blank\" rel=\"noopener\" title=\"\">here<\/a>).<\/p>\n\n\n\n<p>There are various ways to use R, but this time, I would like to run R by using &#8220;<a href=\"https:\/\/colab.research.google.com\/?hl=en\" target=\"_blank\" rel=\"noopener\" title=\"\">Google Colaboratory<\/a> (Hereafter, &#8220;Colabo&#8221;. 
This is also free!)&#8221;. If you google about &#8220;Colabo&#8221; itself, you will find many articles, so I won&#8217;t write about it here (Let me put just one link <a href=\"https:\/\/www.geeksforgeeks.org\/how-to-use-google-colab\/\" target=\"_blank\" rel=\"noopener\" title=\"\">here<\/a>).<\/p>\n\n\n\n<p>After accessing Colabo, first go to \u201cFile\u201d -&gt; \u201cNew Notebook\u201d on the menu bar. &nbsp;*The screenshots are in Japanese, sorry.<\/p>\n\n\n\n<figure class=\"wp-block-image size-medium\"><img decoding=\"async\" width=\"300\" height=\"289\" src=\"https:\/\/mondaikaiketsu.net\/wp-content\/uploads\/2023\/06\/colab1-300x289.png\" alt=\"Fig1. Create Notebook\" class=\"wp-image-2645\"\/><figcaption class=\"wp-element-caption\">Fig1. Create Notebook<\/figcaption><\/figure>\n\n\n\n<p>Then proceed with \u201cFile\u201d -&gt; \u201cDownload\u201d -&gt; \u201cDownload .ipynb\u201d on the menu bar.<\/p>\n\n\n\n<figure class=\"wp-block-image size-medium\"><img decoding=\"async\" width=\"292\" height=\"300\" src=\"https:\/\/mondaikaiketsu.net\/wp-content\/uploads\/2023\/06\/colab2-292x300.png\" alt=\"Fig2. Download &quot;.ipynb&quot; file\" class=\"wp-image-2646\"\/><figcaption class=\"wp-element-caption\">Fig2. Download &#8220;.ipynb&#8221; file<\/figcaption><\/figure>\n\n\n\n<p>Open the downloaded \u201c.ipynb\u201d file on text editors like Notepad, etc.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img decoding=\"async\" width=\"374\" height=\"487\" src=\"https:\/\/mondaikaiketsu.net\/wp-content\/uploads\/2023\/06\/colab3.png\" alt=\"Fig3. Edit &quot;.ipynb&quot; file\" class=\"wp-image-2647\"\/><figcaption class=\"wp-element-caption\">Fig3. 
Edit &#8220;.ipynb&#8221; file<\/figcaption><\/figure>\n\n\n\n<p>In the part outlined in red, change &#8220;name&#8221;: &#8220;python3&#8221;, &#8220;display_name&#8221;: &#8220;Python 3&#8221; to &#8220;name&#8221;: &#8220;ir&#8221;, &#8220;display_name&#8221;: &#8220;R&#8221; and save.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img decoding=\"async\" width=\"342\" height=\"491\" src=\"https:\/\/mondaikaiketsu.net\/wp-content\/uploads\/2023\/06\/colab4.png\" alt=\"Fig4. Edit &quot;.ipynb&quot; file (continued)\" class=\"wp-image-2648\"\/><figcaption class=\"wp-element-caption\">Fig4. Edit &#8220;.ipynb&#8221; file (continued) <\/figcaption><\/figure>\n\n\n\n<p>Go back to the Colabo screen, proceed with \u201cFile\u201d -&gt; \u201cUpload notebook\u201d, and choose the \u201c.ipynb\u201d file saved above.<\/p>\n\n\n\n<figure class=\"wp-block-image size-medium\"><img decoding=\"async\" width=\"300\" height=\"183\" src=\"https:\/\/mondaikaiketsu.net\/wp-content\/uploads\/2023\/06\/colab5-300x183.png\" alt=\"Fig5. Upload Notebook\" class=\"wp-image-2649\"\/><figcaption class=\"wp-element-caption\">Fig5. Upload Notebook<\/figcaption><\/figure>\n\n\n\n<p>Select &#8220;Runtime&#8221; \u2192 &#8220;Change runtime type&#8221; on the menu bar.<\/p>\n\n\n\n<figure class=\"wp-block-image size-medium\"><img decoding=\"async\" width=\"300\" height=\"214\" src=\"https:\/\/mondaikaiketsu.net\/wp-content\/uploads\/2023\/06\/colab6-300x214.png\" alt=\"Fig6. Change Runtime Type\" class=\"wp-image-2650\"\/><figcaption class=\"wp-element-caption\">Fig6. Change Runtime Type<\/figcaption><\/figure>\n\n\n\n<p>In the dialog box that appears, confirm that &#8220;Runtime type&#8221; is &#8220;R&#8221;. 
Now you are ready to run R on Colab.<\/p>\n\n\n\n<figure class=\"wp-block-image size-medium\"><img decoding=\"async\" width=\"300\" height=\"122\" src=\"https:\/\/mondaikaiketsu.net\/wp-content\/uploads\/2023\/06\/colab7-300x122.png\" alt=\"Fig7. Confirm the Runtime Type change\" class=\"wp-image-2651\"\/><figcaption class=\"wp-element-caption\">Fig7. Confirm the Runtime Type change<\/figcaption><\/figure>\n\n\n\n<p>First, load the data file to be analyzed. Click the folder mark on the left side of the screen (callout 1 in Figure 8), then click the up arrow (callout 2 in Figure 8) to select the file.<\/p>\n\n\n\n<figure class=\"wp-block-image size-medium\"><img decoding=\"async\" width=\"300\" height=\"180\" src=\"https:\/\/mondaikaiketsu.net\/wp-content\/uploads\/2023\/06\/colab8-300x180.png\" alt=\"Fig8. Read data file into R\" class=\"wp-image-2652\"\/><figcaption class=\"wp-element-caption\">Fig8. Read data file into R<\/figcaption><\/figure>\n\n\n\n<p>Once the file is loaded, it&#8217;s time to run the logistic regression analysis.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img decoding=\"async\" width=\"1024\" height=\"535\" src=\"https:\/\/mondaikaiketsu.net\/wp-content\/uploads\/2023\/06\/logi_test1-1024x535.png\" alt=\"Fig9. Run Logistic Regression analysis\" class=\"wp-image-2653\"\/><figcaption class=\"wp-element-caption\">Fig9. Run Logistic Regression analysis<\/figcaption><\/figure>\n\n\n\n<p>What I am doing in Figure 9 is as follows.<\/p>\n\n\n\n<p>1. Read the data (CSV file loaded onto R) into the variable &#8220;dat (variable name can be anything)&#8221;<\/p>\n\n\n\n<p>2. Specify the columns to be used for analysis from the data loaded in 1. <\/p>\n\n\n\n<p>3. Execution of logistic regression analysis (&#8220;glm&#8221; on the right is the logistic regression analysis command. 
Here, the execution result is stored in the variable &#8220;ans&#8221;)<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><strong><strong>How to understand the results<\/strong><\/strong><\/h2>\n\n\n\n<p>Now let&#8217;s view the analysis results.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img decoding=\"async\" width=\"609\" height=\"748\" src=\"https:\/\/mondaikaiketsu.net\/wp-content\/uploads\/2023\/06\/logi_test2.png\" alt=\"Fig10. Analysis results - 1\" class=\"wp-image-2654\"\/><figcaption class=\"wp-element-caption\">Fig10. Analysis results &#8211; 1<\/figcaption><\/figure>\n\n\n\n<p>In the middle, there is a table labeled \u201cCoefficients:\u201d. From left to right, the columns are \u201cEstimate\u201d, \u201cStd. Error\u201d, \u201cz value\u201d, and \u201cPr(&gt;|z|)\u201d. The \u201cEstimate\u201d in this table is the same as the \u201cCoefficient\u201d in the <a href=\"https:\/\/mondaikaiketsu.net\/en\/keypoints-regression-analysis\/\" target=\"_blank\" rel=\"noopener\" title=\"\">multiple linear regression analysis<\/a>, and \u201cPr(&gt;|z|)\u201d corresponds to the \u201cP value\u201d. These are read in the same way as in multiple linear regression analysis: variables with P-values below 5% and larger coefficients have a greater impact on the outcome variable. Looking at this result, the impact is largest for \u201cPrice (what to pay attention to when purchasing a private brand &#8211; [price])\u201d, followed by \u201cFoods (the number of foods that have been switched to private brands within the past year)\u201d.<\/p>\n\n\n\n<p>Now for the logistic regression analysis, we need one more step. In the case of multiple regression analysis, you create the formula Y=ax1+bx2+cx3\u2026+d. Since Y was the outcome variable, we were able to predict Y by plugging values of the independent variables (x1, x2, x3 in the above formula) into the formula with the estimated coefficients (a, b, c).<\/p>\n\n\n\n<p>For logistic regression analysis, the outcome variable is a discrete variable and has the form 1\/0. 
So, we have to think about the probability that the result will be 1. Expressed as a formula, it looks like this.<\/p>\n\n\n\n<p>log(p\/(1-p)) = ax1+bx2+cx3\u2026+d<\/p>\n\n\n\n<p>p is the probability that the outcome is 1, and 1-p is the probability that it is not. The ratio p\/(1-p) is called the odds, and its logarithm, the left-hand side of the formula, is the log-odds. I&#8217;m sure many of you have heard of the odds. Logistic regression analysis models the impact of each independent variable on the log-odds.<\/p>\n\n\n\n<p>Now let&#8217;s use R to calculate the impact of each independent variable on the odds.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img decoding=\"async\" width=\"622\" height=\"695\" src=\"https:\/\/mondaikaiketsu.net\/wp-content\/uploads\/2023\/06\/logi_test4.png\" alt=\"Fig11. Analysis results - 2\" class=\"wp-image-2656\"\/><figcaption class=\"wp-element-caption\">Fig11. Analysis results &#8211; 2<\/figcaption><\/figure>\n\n\n\n<p>exp is the exponential function; passing it the results of the earlier logistic regression analysis converts each coefficient into an odds ratio. As before (see Figure 10 above), from the left, there are \u201cEstimate\u201d, \u201cStd. Error\u201d, \u201cz value\u201d, and \u201cPr(&gt;|z|)\u201d, but the numbers are different. The number in this \u201cEstimate\u201d column is the odds ratio, that is, the factor by which the odds are multiplied when the variable increases by 1.<\/p>\n\n\n\n<p>As before, \u201cPrice (what to pay attention to when purchasing a private brand &#8211; [price])\u201d and \u201cFoods (the number of foods that have been switched to private brands within the past year)\u201d have the greatest impact in that order. However, this time we can use these numbers as they are. 
Price is 2.43.., so if this variable increases by 1, the odds of &#8220;whether or not the frequency of private brand purchases has increased&#8221; are multiplied by 2.43, and likewise the odds are multiplied by 2.29 for Foods.<\/p>\n\n\n\n<p>That&#8217;s all for the logistic regression analysis. Although it requires one more task than linear multiple regression analysis, it can be performed with a free tool (Colabo &amp; R), and if you learn this, you will be able to handle both continuous and discrete variables as outcome variables. Please try adding it to your repertoire!<\/p>\n\n\n\n<p>That\u2019s all for this time, and I would like to continue from the next time onwards. Thank you for reading until the end.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Hello everyone. Last time, I wrote about \u201cThorough utilization of data that is sleeping in your company\u201d as &#8220;Problem Solving Practice Edition&#038;#8 [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":563,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_locale":"en_US","_original_post":"https:\/\/mondaikaiketsu.net\/?p=2657","footnotes":""},"categories":[34,37],"tags":[],"class_list":["post-2711","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-as-is-en","category-data-analysis-en","en-US"],"aioseo_notices":[],"_links":{"self":[{"href":"https:\/\/mondaikaiketsu.net\/wp-json\/wp\/v2\/posts\/2711","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/mondaikaiketsu.net\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/mondaikaiketsu.net\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/mondaikaiketsu.net\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/mondaikaiketsu.net\/wp-json\/wp\/v2\/comments?post=2711"}],"version-history":[{"count":51,"href":"https:\/\/mondaikaiketsu.net\/wp-json\/wp\/v2\/posts\/2711\/revi
sions"}],"predecessor-version":[{"id":2763,"href":"https:\/\/mondaikaiketsu.net\/wp-json\/wp\/v2\/posts\/2711\/revisions\/2763"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/mondaikaiketsu.net\/wp-json\/wp\/v2\/media\/563"}],"wp:attachment":[{"href":"https:\/\/mondaikaiketsu.net\/wp-json\/wp\/v2\/media?parent=2711"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/mondaikaiketsu.net\/wp-json\/wp\/v2\/categories?post=2711"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/mondaikaiketsu.net\/wp-json\/wp\/v2\/tags?post=2711"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}