Ave-command in R statistics

Imagine you have some data on unemployment:

unemployment.rate <- c(0.01, 0.17, 0.19, NA, 0.21, 0.14, 0.02,NA, 0.26, 0.27, 0.21, 0.28, 0.23, 0.16, 0.1, NA, 0.23, 0.03, 0.11)
cntry <- c ("SE", "NO", "DK", "SE", "NO", "SE", "DK", "DK", "NO", "DK", "SE", "DK", "DK", "SE", "DK", "SE", "SE", "DK", "NO")
size <- c("Big","Medium","Big","Big","Medium","Small","Big","Medium","Medium","Big","Small","Medium","Medium","Big","Medium","Big","Big","Big","Small")
df <- data.frame(unemployment.rate, cntry, size)

The data contain some NAs. We want to replace each of them with the mean for its country (cntry). This can be done with the ave command from base R:

df$unemployment_rate <- ave(df$unemployment.rate, list(df$cntry), FUN=function(x) {x[is.na(x)] <- mean(x,na.rm=TRUE); x; })
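To see what the function passed to ave does, here is a minimal sketch on a toy vector. The names rate, group and impute_mean, and the values, are made up for illustration; they are not part of the data above.

```r
# Toy data (hypothetical values)
rate  <- c(0.10, NA, 0.30, 0.20, NA)
group <- c("A", "A", "A", "B", "B")

# Replace each NA with the mean of the non-missing values in the same group
impute_mean <- function(x) {
  x[is.na(x)] <- mean(x, na.rm = TRUE)
  x
}

# ave applies impute_mean within each group and keeps the original order
filled <- ave(rate, group, FUN = impute_mean)
filled
```

ave returns a vector of the same length and order as its input, which is why the result can be assigned straight back to a column of the data frame.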

We can also base the replacement on the mean within each combination of cntry and size.

df$unemployment_rate <- ave(df$unemployment.rate, list(df$cntry, df$size), FUN=function(x) {x[is.na(x)] <- mean(x,na.rm=TRUE); x; })

Amazing!
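One thing to watch out for when grouping on two variables: if every value in a cntry/size cell is missing, the mean of that cell is NaN and the gap stays unfilled. A possible workaround, sketched here on made-up toy data (all names and values are hypothetical), is to fall back on the country mean for those cases.

```r
# Toy data (hypothetical): the NO/Big cell contains only a missing value
rate  <- c(0.10, NA, 0.30, NA, 0.20)
cntry <- c("SE", "SE", "SE", "NO", "NO")
size  <- c("Big", "Big", "Small", "Big", "Small")

impute_mean <- function(x) {
  x[is.na(x)] <- mean(x, na.rm = TRUE)
  x
}

# First attempt: mean within each cntry/size cell (NaN where a cell is all NA)
by_cell <- ave(rate, cntry, size, FUN = impute_mean)

# Fallback: mean within country only
by_cntry <- ave(rate, cntry, FUN = impute_mean)

# Use the cell mean where it exists, otherwise the country mean
filled <- ifelse(is.na(by_cell), by_cntry, by_cell)
filled
```

Whether the country mean is a sensible fallback depends on the data; this only shows the mechanics.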

Save plots using ggplot2

This is how you save plots to the hard drive when using ggplot2. Using 300 dpi as resolution should be enough for publication. You have to give the height and width (in inches by default; five is good, but use what fits you). You don’t need to scale it down to 0.8.

ggsave("~/gitlab/moralisk.org/p1.png", dpi = 300, scale = 0.8, height = 5, width = 5)
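By default ggsave saves the last plot that was displayed. As a small sketch, the plot can also be passed explicitly, which is safer in scripts. The plot p1, the mtcars example data and the output path are assumptions made for illustration only.

```r
library(ggplot2)

# Hypothetical example plot built from a data set that ships with R
p1 <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point()

# Hypothetical output path; replace it with your own file name
out <- file.path(tempdir(), "p1.png")

# 5 x 5 inches at 300 dpi; scale is left at its default of 1
ggsave(out, plot = p1, dpi = 300, height = 5, width = 5)
```

With these settings the image is 1500 x 1500 pixels (5 inches times 300 dpi).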

Nextcloud on CentOS — host your own cloud

A lot of us use cloud services. They can be used to sync between devices, and they are also good for cooperating with colleagues. Most of us use clouds owned by others, such as Google Drive, Box or Dropbox. I prefer to handle the cloud myself. In my opinion it should be easy to set up a server on an old computer and run your own cloud. That is what I do, using a server program called Nextcloud, which I like very much. But I must say that, as an amateur, it has not always been easy. There are manuals, and one of the best is of course Nextcloud’s own. The problem is that the manuals are not always easy to follow, and that Nextcloud by itself is not enough. You also need to set up Apache, MySQL/MariaDB and PHP (together with Linux this is called LAMP: Linux, Apache, MySQL, PHP). If your OS uses SELinux you need to handle that as well. For security you should also set up an SSL certificate. Unfortunately, I have not found a tutorial that collects all of this in one place. Quite often tutorials on the internet assume that you already understand, for example, Apache.

In this article I will show you how to set up Nextcloud on CentOS 7. The Nextcloud version used is 16, and things may change between versions. This is in no way a complete tutorial, but it will get you up and running. I would also recommend that anyone who finds this article does not rely on it alone, but also looks at Nextcloud’s manual for administrators.

CentOS

The first thing you need is of course a computer that can be used as a server. An old laptop will most likely be enough, but you can also build one if you want to. Then you have to install CentOS on that computer. This article is not about installing CentOS, but it is really easy. You can download CentOS here. Then you write the image to a USB stick. There are several ways to do this, but if you use a Linux distro with GNOME you can use the Disks application (use the restore image function).

All the commands here require that you are root. To become root you open the terminal and write

su

Enter your password and you should be root.

The first thing you need to do after installing CentOS is to update the system. CentOS uses yum as its package manager. If you use the -y flag you don’t need to confirm the update. If you want to confirm, just skip the -y flag.

yum -y update

To be able to install LAMP you need the yum-utils package, which provides yum-config-manager (see more about it here). For me yum-utils was already installed on the system, but to be on the safe side try to install it:

sudo yum -y install yum-utils

In this article I will show how to use wget to fetch Nextcloud from the internet. Wget is not installed on the system, so you have to install it. I also installed nano, a text editor. But if you know how to use vi and feel comfortable with that text editor, you of course do not need to install nano.

yum -y install nano wget

MariaDB

I use MariaDB instead of MySQL. MariaDB is a community-developed fork of MySQL, and you can use either one; the commands used here are the same. The first thing we need to do is to install MariaDB.

yum -y install mariadb mariadb-server

After installing MariaDB you have to start it and enable it. MariaDB runs as a service: starting it launches the service now, and enabling it ensures that the service starts when the computer boots. Both are handled by systemd.

systemctl start mariadb.service
systemctl enable mariadb.service

To secure the database installation you need to run the mysql_secure_installation script. In the terminal run:

mysql_secure_installation

Use the following answers:

Enter current password for root (enter for none): ENTER
Set root password? [Y/n] Y
Remove anonymous users? [Y/n] Y
Disallow root login remotely? [Y/n] Y
Remove test database and access to it? [Y/n] Y
Reload privilege tables now? [Y/n] Y

After you have done this you can log into MariaDB and create database and user:

mysql -u root -p

You will be prompted to enter your password (which you created when running mysql_secure_installation).

In MariaDB you create the database and the user. Change the user name (nextclouduser below) to whatever you like (admin, for example) and replace YOURPASSWORD with your own password:

MariaDB [(none)]> CREATE DATABASE nextcloud;
MariaDB [(none)]> GRANT ALL PRIVILEGES ON nextcloud.* TO 'nextclouduser'@'localhost' IDENTIFIED BY 'YOURPASSWORD';
MariaDB [(none)]> FLUSH PRIVILEGES;
MariaDB [(none)]> \q

Apache server

Next you have to install the Apache Web Server. In the terminal:

yum install httpd -y

You have to configure a virtual host. Create a new configuration file (here using nano). In the terminal:

nano /etc/httpd/conf.d/nextcloud.conf

The following setup works fine:

<VirtualHost *:80>
  DocumentRoot /var/www/html/
  ServerName  your.server.com

<Directory "/var/www/html/">
  Require all granted
  AllowOverride All
  Options FollowSymLinks MultiViews
</Directory>
</VirtualHost>

Start and enable Apache.

systemctl start httpd.service
systemctl enable httpd.service

PHP

Install PHP 7, and be sure to install at least PHP 7.3. To do that you need to install the epel-release repository.

yum install epel-release

Install Remi and EPEL repository packages:

rpm -Uvh http://rpms.remirepo.net/enterprise/remi-release-7.rpm
rpm -Uvh https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm

Enable Remi PHP 7 repo:

yum-config-manager --enable remi-php73

Install PHP and several PHP modules required by Nextcloud by executing the following command:

yum -y install php php73-php-fpm php-mysql php-pecl-zip php-xml php-mbstring php-gd

It is a good practice to check what version is running:

php -v

Next, open the PHP configuration file and increase the upload file size. You can find the location of the PHP configuration file by executing the following command:

php --ini |grep Loaded

This is what I saw from the command:

Loaded Configuration File:         /etc/php.ini

Thus, we have to make changes to the /etc/php.ini file. Here we increase the default upload limit to 100 MB. You can set the values according to your needs. Run the following commands:

sed -i "s/post_max_size = 8M/post_max_size = 100M/" /etc/php.ini
sed -i "s/upload_max_filesize = 2M/upload_max_filesize = 100M/" /etc/php.ini

Create a file called info.php in the /var/www/html directory with the following content (it lets you check in a browser that PHP is working):

<?php phpinfo(); ?>

Restart the web server:

systemctl restart httpd

Install Nextcloud 16

Download Nextcloud from its official website. Here I fetch version 16.0.3 with wget:

wget https://download.nextcloud.com/server/releases/nextcloud-16.0.3.zip

Unpack the downloaded zip archive to the document root directory on your server

unzip nextcloud-16.0.3.zip -d /var/www/html/

Create a data directory inside the nextcloud directory:

mkdir /var/www/html/nextcloud/data

Set the Apache user to be owner of the Nextcloud files

chown -R apache:apache /var/www/html/*

SELinux

With SELinux enabled, Apache needs permission to write to the Nextcloud directory. Set the SELinux context:

chcon -t httpd_sys_rw_content_t /var/www/html/nextcloud/ -R

SELinux can be totally disabled, but it is not necessary. To disable SELinux, change SELINUX=enforcing to SELINUX=disabled in /etc/selinux/config.

nano /etc/selinux/config

# This file controls the state of SELinux on the system.
# SELINUX= can take one of these three values:
#       enforcing - SELinux security policy is enforced.
#       permissive - SELinux prints warnings instead of enforcing.
#       disabled - No SELinux policy is loaded.
SELINUX=disabled
# SELINUXTYPE= can take one of these two values:
#       targeted - Targeted processes are protected,
#       mls - Multi Level Security protection.
SELINUXTYPE=targeted

Access Nextcloud

Finally, access Nextcloud at http://yourip/nextcloud (to see your IP address, run ip addr in the terminal). The installation wizard will check that all requirements are met, and if everything is OK you will be prompted to create your admin user and select storage and database. Select MySQL/MariaDB as database and enter the details for the database we created earlier in this post:

User: nextclouduser
Database password: YOURPASSWORD
Database name: nextcloud
host: localhost

Family Orientation in eight countries — a moment with R

In the last post I investigated the development of individualism in several countries, with the aim of finding out whether individualism is something recent in Sweden. I used an indicator which I am rather sceptical about: the importance of friends relative to the importance of family. The more important friends are relative to the family, the more individualistic. This indicator is used, among others, to measure individualism. Maybe it is a good one, but used on its own I think the variable rather measures family orientation, which of course can be related to individualism. But it is not the same thing.

Anyway, I looked through the code I used and made it more efficient, and then I found another pattern which I found really interesting. Family orientation is rather stable in Sweden but actually decreasing in most countries. The USA is an exception. However, the USA is not alone: the other Anglo-Saxon countries show the same pattern. Another pattern is that family orientation in the eastern European countries has decreased rather fast.

First I made the code shorter and more efficient. As in the last post I will only show the code for one of the waves, since the code for the other waves is about the same. As in the former post the data come from the World Values Survey.

I use the countrycode package to transform the numeric country codes into country names.

library(countrycode)
wd6$cntry <- countrycode (wd6$v2, origin = "wvs", destination = "un.name.en")

Individualism/family orientation is created from two variables measuring the importance of family and of friends. I first set the negative codes (missing answers) to NA and reverse the scales, and then take the importance of friends minus the importance of family. The higher the value, the more individualistic, and the weaker the family orientation.

wd6$family <- ifelse (wd6$v4 < 0, NA, wd6$v4)
wd6$friends <- ifelse (wd6$v5 < 0, NA, wd6$v5)

wd6$family <- (wd6$family - (max(wd6$family, na.rm = TRUE))) *-1
wd6$friends <- (wd6$friends - (max(wd6$friends, na.rm = TRUE))) *-1

wd6$individualism <- wd6$friends - wd6$family
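To make the recoding concrete, here is a small sketch on made-up answers. The helper reverse_scale and the toy vectors are hypothetical, and the sketch assumes the questions are coded from 1 (very important) to 4 (not at all important), with negative codes for missing answers.

```r
# Hypothetical answers: 1 = very important ... 4 = not at all important,
# negative codes = missing
family  <- c(1, 2, -1, 4)
friends <- c(2, 1, 4, -2)

# Set missing codes to NA, then flip the scale so that higher = more important
reverse_scale <- function(x) {
  x <- ifelse(x < 0, NA, x)
  (x - max(x, na.rm = TRUE)) * -1
}

family_rev  <- reverse_scale(family)    # 3 2 NA 0
friends_rev <- reverse_scale(friends)   # 2 3 0 NA

# Positive = friends matter more than family, negative = family matters more
individualism <- friends_rev - family_rev
individualism                           # -1 1 NA NA
```

Note that the reversal uses the largest observed value, so it assumes that the top category is actually used at least once in the data.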

The next step is to select the countries to use. I use recode from the car package to create a variable from which to select a subset.

library(car)

wd6$cntry_select <- recode (wd6$cntry, '
"Australia" = 1;
"Estonia" = 1;
"Germany" = 1;
"Netherlands" = 1;
"New Zealand" = 1;
"Poland" = 1;
"Romania" = 1;
"Spain" = 1;
"Sweden" = 1;
"United States of America" = 1;
else = 0')

wd6 <- subset (wd6, cntry_select==1)

The next step is to create a data frame with the mean value of individualism/family orientation in each country. To do this I use the aggregate command. The new data frame is in need of some cleaning.

t6b <- aggregate (wd6$individualism, list(wd6$cntry), FUN = mean, na.rm=TRUE)
t6b$Country <- t6b$Group.1
t6b$Individualism <- t6b$x
t6b <- t6b[c(-1,-2)]
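The renaming and cleaning can be avoided. As a sketch on made-up data (wd_toy and its values are hypothetical), the formula interface of aggregate keeps the variable names, and setNames from base R renames the columns in one go. The formula interface drops rows with NA before computing the mean, which here gives the same result as na.rm = TRUE.

```r
# Hypothetical respondent-level data
wd_toy <- data.frame(
  cntry         = c("Sweden", "Sweden", "Spain", "Spain", "Spain"),
  individualism = c(0, -1, -1, NA, -2)
)

# One row per country, with readable column names from the start
t_toy <- setNames(
  aggregate(individualism ~ cntry, data = wd_toy, FUN = mean),
  c("Country", "Individualism")
)
t_toy
```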

This can be plotted using ggplot2.

library(ggplot2)

p6 <- ggplot (data=t6b, aes(x= reorder (Country, -Individualism), y=Individualism)) +
    geom_bar (stat="identity", position=position_dodge()) +
    labs (x="Country", y="Individualism") +
    theme(axis.text.x=element_text(angle=90,hjust=1,vjust=0.5))

p6

After doing this for all the waves I end up with five data frames, named t2b, t3b, t4b, t5b, and t6b. Since the second wave does not include Sweden it is excluded. In each of these data frames I create a variable named Wave. Then I bind them together.

t2b$Wave <- 2
t3b$Wave <- 3
t4b$Wave <- 4
t5b$Wave <- 5
t6b$Wave <- 6

df7 <- rbind (t3b,t4b,t5b, t6b)
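With many waves, adding the Wave variable by hand gets repetitive. A possible shortcut, sketched with two made-up tables (toy3, toy4, add_wave and all values are hypothetical), is to keep the tables in a named list and let Map add the wave number before binding.

```r
# Hypothetical country means for two waves
toy3 <- data.frame(Country = c("Sweden", "Spain"), Individualism = c(-0.2, -0.5))
toy4 <- data.frame(Country = c("Sweden", "Spain"), Individualism = c(-0.3, -0.4))

# Name each list element after its wave
waves <- list("3" = toy3, "4" = toy4)

# Add the wave number as a column
add_wave <- function(tab, wave) {
  tab$Wave <- as.numeric(wave)
  tab
}

# Apply add_wave to every table, then stack the results
stacked <- do.call(rbind, Map(add_wave, waves, names(waves)))
rownames(stacked) <- NULL
stacked
```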

To make the visualisation easier to read I choose eight countries:

df7$country_select <- recode (df7$Country, '
c("Sweden", "United States of America", "Spain", "Poland", "Estonia", "Germany", "Australia", "New Zealand") = 1;
else=0')

df7 <- subset(df7, country_select == 1)

I then use ggplot to produce the graph.

p7 <- ggplot (data=df7, aes(x=Wave, y=Individualism, group=Country)) +
    geom_line (aes(color=Country), size=1.2) +
    geom_point(aes(color=Country), size = 3.1) + 
    scale_color_brewer(palette="Dark2")

As can be seen, individualism/family orientation is rather stable in Sweden, while individualism is decreasing in the USA (as we saw in the previous post). More interesting though is that individualism is decreasing not only in the USA, but also in Australia and New Zealand, two other Anglo-Saxon countries. Another interesting result is the increase in individualism in the eastern European countries. The increase is very strong in Poland and Estonia. But as can be seen, individualism is also increasing in Germany and Spain. Even though we have to be cautious with conclusions, it seems as if the Anglo-Saxon countries and Sweden are outliers. Everywhere else individualism is on the increase. Furthermore, maybe this is not primarily about individualism but about family orientation. Even though the two are probably related, they are not the same thing.

The development of individualism — a moment with R

I recently read a book about the education system in Sweden (”Glädjeparadoxen” [The Paradox of Happiness]). The book is indeed interesting, dealing with the question of why Swedish pupils have fallen behind in the big international tests such as PISA. However, it was another thing I found interesting. It is well known that Swedish society is more individualistic than most other countries (however, I have some objections to what is usually meant by individualism in general; I may write about that in another post). But in the book the authors claim that values in Sweden were more collective than in most other countries at the beginning of the 1980s, and that it is only lately that Sweden has become more individualistic (with reference to Santos et al. (2017)). In this post I will make a brief test with data from the World Values Survey (WVS). The measure I will use is the difference in how important friends are relative to the family. According to Santos et al. (2017) this is a well-known indicator of individualism (I am somewhat sceptical though, but maybe more about this in another post). Of course, to really answer the question we would most likely need more variables. But this is just a small test.

Data

The data used in the study come from the World Values Survey (WVS), which can be found here. Sweden is not part of all the waves, and the questions we need are only included in later data sets. I will therefore use four waves (waves 3, 4, 5, and 6). In years we will be measuring changes in individualism from 1990 to 2014. This is not bad at all, even though we do not go back to the early 1980s. On the other hand, how likely is it that all changes from the early 1980s until today happened during the 1980s?

The code

After reading the data into R I did the following with each wave (here only one wave, wave 6, is shown).

First I converted all variable names to lower case:

names (wd6) <- tolower (names(wd6))

Then I transformed the country codes into the names of the countries using the countrycode package.

library(countrycode)
wd6$cntry <- countrycode (wd6$v2, origin = "wvs", destination = "un.name.en")

Next step is to construct the variable measuring individualism — which is the importance of friends minus the importance of the family. The higher the numbers the more individualized.

wd6$family <- ifelse (wd6$v4 < 0, NA, wd6$v4)
wd6$friends <- ifelse (wd6$v5 < 0, NA, wd6$v5)

wd6$family <- (wd6$family - (max(wd6$family, na.rm = TRUE))) *-1
wd6$friends <- (wd6$friends - (max(wd6$friends, na.rm = TRUE))) *-1

wd6$individualism <- wd6$friends - wd6$family

Now I have the two variables I need. But to make the results easier on the eyes I only selected some of the countries, using recode from the car package and the subset command:

library(car)

wd6$cntry_select <- recode (wd6$cntry, '
"Australia" = 1;
"Estonia" = 1;
"Germany" = 1;
"Netherlands" = 1;
"New Zealand" = 1;
"Poland" = 1;
"Romania" = 1;
"Spain" = 1;
"Sweden" = 1;
"United States of America" = 1;
else = 0')

wd6 <- subset (wd6, cntry_select==1)

Now I have to construct a new data frame with the data needed for the graphs. To do this I first calculated the mean value of the variable individualism in each country using tapply:

t6 <- tapply (wd6$individualism, list(wd6$cntry), FUN = mean, na.rm=TRUE)
round (cbind (t6), 3)

From the result I created two variables: country and individualism — and put them in a new data frame.

country <- c (
"Australia",
"Estonia",                 
"Germany",
"Netherlands",             
"New Zealand",             
"Poland",
"Romania",
"Spain",
"Sweden",
"United States of America")

individualism <- c(
-0.397,
-0.463,
-0.308,
-0.375,
-0.403,
-0.602,
-0.996,
-0.423,
-0.197,
-0.403)

df6 <- data.frame (country, individualism)
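Typing the means in by hand works, but it is easy to make a mistake. As a sketch on made-up data (wd_toy, t_toy, df_toy and the values are hypothetical), the named result of tapply can be turned into a data frame directly.

```r
# Hypothetical respondent-level data
wd_toy <- data.frame(
  cntry         = c("Sweden", "Sweden", "Spain", "Spain", "Spain"),
  individualism = c(0, -1, -1, NA, -2)
)

# Mean per country; the result is a named array
t_toy <- tapply(wd_toy$individualism, list(wd_toy$cntry), FUN = mean, na.rm = TRUE)

# Country names come from the names of the array, the means from its values
df_toy <- data.frame(
  country       = names(t_toy),
  individualism = round(as.numeric(t_toy), 3)
)
df_toy
```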

The last step is to produce the graph. This is done with the ggplot2-package.

library (ggplot2)

p6 <- ggplot (data=df6, aes(x= reorder (country, -individualism), y=individualism)) +
    geom_bar (stat="identity", position=position_dodge()) +
    labs (x="Country", y="Individualism") +
    theme(axis.text.x=element_text(angle=90,hjust=1,vjust=0.5))

The results from the waves can be seen below.

Results/Figures

Figure 1: Wave 2: 1990-1994

As can be seen, Sweden was not part of the second wave (1990-1994). Somewhat surprisingly, the level of individualism was greatest in Turkey, greater than in countries such as Spain and Brazil. But to be fair, none of the countries presented in Figure 1 are known to have high levels of individualism (as far as I know at least).

Figure 2: Wave 3: 1995-1998

Figure 3: Wave 4: 1999-2002

Figure 4: Wave 5: 2005-2006

Figure 5: Wave 6: 2010-2014

From Figure 2 to 5 Sweden is included. As can be seen, Sweden has the highest level of individualism among the included countries in all of the waves. For Sweden the results seem to be stable, while for example the USA seems to be falling behind. Let’s look closer into this. In the graph below the development of individualism is investigated in Sweden, the USA, Germany and Spain.

To do this I created a data set with the values for the four countries, looking like this (Wave 1 to Wave 4 here correspond to WVS waves 3 to 6):

   Country Individualism   Wave
1   Sweden        -0.207 Wave 1
2   Sweden        -0.187 Wave 2
3   Sweden        -0.219 Wave 3
4   Sweden        -0.197 Wave 4
5      USA        -0.291 Wave 1
6      USA        -0.343 Wave 2
7      USA        -0.387 Wave 3
8      USA        -0.403 Wave 4
9    Spain        -0.508 Wave 1
10   Spain        -0.467 Wave 2
11   Spain        -0.450 Wave 3
12   Spain        -0.423 Wave 4
13 Germany        -0.381 Wave 1
14 Germany            NA Wave 2
15 Germany        -0.307 Wave 3
16 Germany        -0.308 Wave 4

Once again using ggplot2:

p7 <- ggplot (data=df7, aes(x=Wave, y=Individualism, group=Country)) +
    geom_line (aes(color=Country), size=1.2) +
    geom_point(aes(color=Country), size = 3.1) +
    scale_color_brewer(palette="Dark2")

And the result…

Figure 6: Development of individualism in Sweden, USA, Germany and Spain

As can be seen in Figure 6, the level of individualism is rather stable in both Sweden and Spain, although at very different levels. Individualism is super stable in Germany. In the USA, on the other hand, individualism seems to be decreasing substantially. From the results here, the claim that individualism is a rather late thing in Sweden does not get support. However, more research needs to be done of course!

Literature

Heller-Sahlgren, G., & Sanandaji, N. (2019). Glädjeparadoxen: Historien om skolans uppgång, fall och möjliga upprättelse. Stockholm: Dialogos förlag.

Santos, H. C., Varnum, M. E. W., & Grossmann, I. (2017). Global Increases in Individualism. Psychological Science, 28(9), 1228–1239. https://doi.org/10.1177/0956797617700622

New film on SPSS

I usually don’t use SPSS. My main program for doing statistics is R. But when teaching I sometimes need to use SPSS. SPSS can be quite hard to use since there are so many options, and you need to use it a lot to become familiar with the program. I therefore made a video showing various kinds of things that can be done in SPSS.

Emacs — how I use it

I use Emacs, every day. Emacs is a text editor that is complex, in fact so complex that it is more of an environment. It is in general an environment for programming. So why do I use Emacs? I am not a programmer, I am a social scientist. Well, I originally used Emacs as an environment to program in R and to do statistics. I rather soon found out that it can be configured. I started to use org-mode as a note taker. Later on I started to use org-mode as a calendar and as a planning tool for my work.

Somewhat later I started to use Emacs when writing in latex, and why not markdown (which I use rather seldom since org-mode is so much better). I discovered mu4e, which handles e-mails inside Emacs. By the way, I use Emacs as a file manager. In fact — I use Emacs for almost everything I do on my computer, except web browsing. I even use Emacs to do my presentations, via ox-reveal.

For me Emacs is super effective, not just for doing statistics in R, but for handling e-mails and organizing my work in general. Would I recommend Emacs to other social scientists? Probably not. Not because it is bad, but because I seldom recommend programs, and also because I don’t think the kind of workflow I have suits everyone. But I don’t know, so I have decided to write about my set up and what Emacs allows me to do. If you are impatient you can check out my config file here.

How to get R up and running in Fedora Linux

It is trivial to install R in Fedora. Just type:

sudo dnf install R

After that you can run R in the terminal. If you want an environment to work in you can use RStudio. I use Emacs and ESS. Everything works nicely up until the point where you want to install a package, say for example car. It won’t work (out of the box). You need to install an additional library. In the terminal:

sudo dnf install libcurl-devel 

libcurl is a client-side URL transfer library that you need to install packages in R.

It is also a good idea to install:

sudo dnf install NLopt

NLopt is a library for nonlinear optimization, callable from R.

A nice theme for the desktop?

A nice theme for the desktop is not super important. OK, it has to look decent, but on my main computer I hardly see the desktop theming, or the icons. I just see the application I use. When the main application is Emacs, and the second a web browser, the desktop theme does not matter much. However I found this, and I must say it looks really good. I checked the config files, and saw that the owner of the desktop has developed the theme used. So I downloaded it on one of my old computers, which has xfce4 as desktop environment.

Here you can see the result — pretty nice.