Merge remote-tracking branch 'LCTT/master'

This commit is contained in:
Xingyu.Wang 2019-05-04 13:15:47 +08:00
commit 439d1c723e
5 changed files with 865 additions and 826 deletions


@@ -1,531 +0,0 @@
[#]: collector: (lujun9972)
[#]: translator: (FSSlc)
[#]: reviewer: ( )
[#]: publisher: ( )
[#]: url: ( )
[#]: subject: (Inter-process communication in Linux: Using pipes and message queues)
[#]: via: (https://opensource.com/article/19/4/interprocess-communication-linux-channels)
[#]: author: (Marty Kalin https://opensource.com/users/mkalindepauledu)
Inter-process communication in Linux: Using pipes and message queues
======
Learn how processes synchronize with each other in Linux.
![Chat bubbles][1]
This is the second article in a series about [interprocess communication][2] (IPC) in Linux. The [first article][3] focused on IPC through shared storage: shared files and shared memory segments. This article turns to pipes, which are channels that connect processes for communication. A channel has a _write end_ for writing bytes, and a _read end_ for reading these bytes in FIFO (first in, first out) order. In typical use, one process writes to the channel, and a different process reads from this same channel. The bytes themselves might represent anything: numbers, employee records, digital movies, and so on.
Pipes come in two flavors, named and unnamed, and can be used either interactively from the command line or within programs; examples are forthcoming. This article also looks at message queues, which have fallen out of fashion, but undeservedly so.
The code examples in the first article acknowledged the threat of race conditions (either file-based or memory-based) in IPC that uses shared storage. The question naturally arises about safe concurrency for channel-based IPC, which is covered in this article. The code examples for pipes and message queues use APIs with the POSIX stamp of approval, and a core goal of the POSIX standards is thread safety.
Consider the [man pages for the **mq_open**][4] function, which belongs to the message queue API. These pages include a section on [Attributes][5] with this small table:
Interface | Attribute | Value
---|---|---
mq_open() | Thread safety | MT-Safe
The value **MT-Safe** (with **MT** for multi-threaded) means that the **mq_open** function is thread-safe, which in turn implies process-safe: A process executes in precisely the sense that one of its threads executes, and if a race condition cannot arise among threads in the _same_ process, such a condition cannot arise among threads in different processes. The **MT-Safe** attribute assures that a race condition does not arise in invocations of **mq_open**. In general, channel-based IPC is concurrent-safe, although a cautionary note is raised in the examples that follow.
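By way of illustration, here is a minimal sketch of opening and then removing a POSIX message queue with **mq_open**. (The queue name _/mqDemo_ is made up for this sketch; on Linux, the program links with **-lrt**. The message queue examples later in this article use the older System V calls such as **msgget** and **msgsnd** rather than this **mq_** family.)
```
#include <fcntl.h>    /* O_* constants */
#include <sys/stat.h> /* mode constants */
#include <mqueue.h>   /* mq_open, mq_close, mq_unlink */
#include <stdio.h>    /* perror */
int main() {
  /* create the queue if necessary; open it for reading and writing */
  mqd_t mqd = mq_open("/mqDemo", O_CREAT | O_RDWR, 0666, NULL); /* NULL: default attributes */
  if ((mqd_t) -1 == mqd) {
    perror("mq_open");
    return -1;
  }
  mq_close(mqd);        /* close this process' descriptor */
  mq_unlink("/mqDemo"); /* remove the queue from the system */
  return 0;
}
```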
### Unnamed pipes
Let's start with a contrived command line example that shows how unnamed pipes work. On all modern systems, the vertical bar **|** represents an unnamed pipe at the command line. Assume **%** is the command line prompt, and consider this command:
```
% sleep 5 | echo "Hello, world!" ## writer to the left of |, reader to the right
```
The _sleep_ and _echo_ utilities execute as separate processes, and the unnamed pipe allows them to communicate. However, the example is contrived in that no communication occurs. The greeting _Hello, world!_ appears on the screen; then, after about five seconds, the command line prompt returns, indicating that both the _sleep_ and _echo_ processes have exited. What's going on?
In the vertical-bar syntax from the command line, the process to the left ( _sleep_ ) is the writer, and the process to the right ( _echo_ ) is the reader. By default, the reader blocks until there are bytes to read from the channel, and the writer—after writing its bytes—finishes up by sending an end-of-stream marker. (Even if the writer terminates prematurely, an end-of-stream marker is sent to the reader.) The unnamed pipe persists until both the writer and the reader terminate.
In the contrived example, the _sleep_ process does not write any bytes to the channel but does terminate after about five seconds, which sends an end-of-stream marker to the channel. In the meantime, the _echo_ process immediately writes the greeting to the standard output (the screen) because this process does not read any bytes from the channel, so it does no waiting. Once the _sleep_ and _echo_ processes terminate, the unnamed pipe—not used at all for communication—goes away and the command line prompt returns.
Here is a more useful example using two unnamed pipes. Suppose that the file _test.dat_ looks like this:
```
this
is
the
way
the
world
ends
```
The command:
```
% cat test.dat | sort | uniq
```
pipes the output from the _cat_ (concatenate) process into the _sort_ process to produce sorted output, and then pipes the sorted output into the _uniq_ process to eliminate duplicate records (in this case, the two occurrences of **the** reduce to one):
```
ends
is
the
this
way
world
```
The scene now is set for a program with two processes that communicate through an unnamed pipe.
#### Example 1. Two processes communicating through an unnamed pipe.
```
#include <sys/wait.h> /* wait */
#include <stdio.h>
#include <stdlib.h> /* exit functions */
#include <unistd.h> /* read, write, pipe, _exit */
#include <string.h>
#define ReadEnd 0
#define WriteEnd 1
void report_and_exit(const char* msg) {
perror(msg);
exit(-1); /** failure **/
}
int main() {
int pipeFDs[2]; /* two file descriptors */
char buf; /* 1-byte buffer */
const char* msg = "Nature's first green is gold\n"; /* bytes to write */
if (pipe(pipeFDs) < 0) report_and_exit("pipeFD");
pid_t cpid = fork(); /* fork a child process */
if (cpid < 0) report_and_exit("fork"); /* check for failure */
if (0 == cpid) { /*** child ***/ /* child process */
close(pipeFDs[WriteEnd]); /* child reads, doesn't write */
while (read(pipeFDs[ReadEnd], &buf, 1) > 0) /* read until end of byte stream */
write(STDOUT_FILENO, &buf, sizeof(buf)); /* echo to the standard output */
close(pipeFDs[ReadEnd]); /* close the ReadEnd: all done */
_exit(0); /* exit and notify parent at once */
}
else { /*** parent ***/
close(pipeFDs[ReadEnd]); /* parent writes, doesn't read */
write(pipeFDs[WriteEnd], msg, strlen(msg)); /* write the bytes to the pipe */
close(pipeFDs[WriteEnd]); /* done writing: generate eof */
wait(NULL); /* wait for child to exit */
exit(0); /* exit normally */
}
return 0;
}
```
The _pipeUN_ program above uses the system function **fork** to create a process. Although the program has but a single source file, multi-processing occurs during (successful) execution. Here are the particulars in a quick review of how the library function **fork** works:
* The **fork** function, called in the _parent_ process, returns **-1** to the parent in case of failure. In the _pipeUN_ example, the call is:
```
pid_t cpid = fork(); /* called in parent */
```
The returned value is stored, in this example, in the variable **cpid** of integer type **pid_t**. (Every process has its own _process ID_, a non-negative integer that identifies the process.) Forking a new process could fail for several reasons, including a full _process table_, a structure that the system maintains to track processes. Zombie processes, clarified shortly, can cause a process table to fill if these are not harvested.
* If the **fork** call succeeds, it thereby spawns (creates) a new child process, returning one value to the parent but a different value to the child. Both the parent and the child process execute the _same_ code that follows the call to **fork**. (The child inherits copies of all the variables declared so far in the parent.) In particular, a successful call to **fork** returns:
* Zero to the child process
* The child's process ID to the parent
* An _if/else_ or equivalent construct typically is used after a successful **fork** call to segregate code meant for the parent from code meant for the child. In this example, the construct is:
```
if (0 == cpid) { /*** child ***/
...
}
else { /*** parent ***/
...
}
```
If forking a child succeeds, the _pipeUN_ program proceeds as follows. There is an integer array:
```
int pipeFDs[2]; /* two file descriptors */
```
to hold two file descriptors, one for writing to the pipe and another for reading from the pipe. (The array element **pipeFDs[0]** is the file descriptor for the read end, and the array element **pipeFDs[1]** is the file descriptor for the write end.) A successful call to the system **pipe** function, made immediately before the call to **fork** , populates the array with the two file descriptors:
```
if (pipe(pipeFDs) < 0) report_and_exit("pipeFD");
```
The parent and the child now have copies of both file descriptors, but the _separation of concerns_ pattern means that each process requires exactly one of the descriptors. In this example, the parent does the writing and the child does the reading, although the roles could be reversed. The first statement in the child _if_ -clause code, therefore, closes the pipe's write end:
```
close(pipeFDs[WriteEnd]); /* called in child code */
```
and the first statement in the parent _else_ -clause code closes the pipe's read end:
```
close(pipeFDs[ReadEnd]); /* called in parent code */
```
The parent then writes some bytes (ASCII codes) to the unnamed pipe, and the child reads these and echoes them to the standard output.
One more aspect of the program needs clarification: the call to the **wait** function in the parent code. Once spawned, a child process is largely independent of its parent, as even the short _pipeUN_ program illustrates. The child can execute arbitrary code that may have nothing to do with the parent. However, the system does notify the parent through a signal—if and when the child terminates.
What if the child terminates before the parent does? In this case, unless precautions are taken, the child becomes and remains a _zombie_ process with an entry in the process table. The precautions are of two broad types. One precaution is to have the parent notify the system that the parent has no interest in the child's termination:
```
signal(SIGCHLD, SIG_IGN); /* in parent: ignore notification */
```
A second approach is to have the parent execute a **wait** on the child's termination, thereby ensuring that the parent outlives the child. This second approach is used in the _pipeUN_ program, where the parent code has this call:
```
wait(NULL); /* called in parent */
```
This call to **wait** means _wait until the termination of any child occurs_ , and in the _pipeUN_ program, there is only one child process. (The **NULL** argument could be replaced with the address of an integer variable to hold the child's exit status.) There is a more flexible **waitpid** function for fine-grained control, e.g., for specifying a particular child process among several.
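As an illustration only, here is a sketch of how the parent could instead wait on this particular child and inspect its exit status (**cpid** is the value returned from **fork**, and the **W** macros come from the **sys/wait.h** header that the program already includes):
```
int status;
if (waitpid(cpid, &status, 0) == cpid && /* wait for this specific child */
    WIFEXITED(status))                   /* did the child exit normally? */
  printf("child exited with status %i\n", WEXITSTATUS(status));
```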
The _pipeUN_ program takes another precaution. When the parent is done waiting, the parent terminates with the call to the regular **exit** function. By contrast, the child terminates with a call to the **_exit** variant, which fast-tracks notification of termination. In effect, the child is telling the system to notify the parent ASAP that the child has terminated.
If two processes write to the same unnamed pipe, can the bytes be interleaved? For example, if process P1 writes:
```
foo bar
```
to a pipe and process P2 concurrently writes:
```
baz baz
```
to the same pipe, it seems that the pipe contents might be something arbitrary, such as:
```
baz foo baz bar
```
The POSIX standard ensures that writes are not interleaved so long as no write exceeds **PIPE_BUF** bytes. On Linux systems, **PIPE_BUF** is 4,096 bytes in size. My preference with pipes is to have a single writer and a single reader, thereby sidestepping the issue.
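The limit also can be queried at runtime for a particular pipe. Here is a small sketch, assuming **fd** is an open file descriptor for the pipe, as in the _pipeUN_ program (the extra header is **limits.h**; **fpathconf** comes from the already included **unistd.h**):
```
#include <limits.h> /* PIPE_BUF */
long limit = fpathconf(fd, _PC_PIPE_BUF); /* atomic-write limit for this pipe */
printf("PIPE_BUF is %i bytes; runtime limit is %li\n", PIPE_BUF, limit);
```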
### Named pipes
An unnamed pipe has no backing file: the system maintains an in-memory buffer to transfer bytes from the writer to the reader. Once the writer and reader terminate, the buffer is reclaimed, so the unnamed pipe goes away. By contrast, a named pipe has a backing file and a distinct API.
Let's look at another command line example to get the gist of named pipes. Here are the steps:
* Open two terminals. The working directory should be the same for both.
* In one of the terminals, enter these two commands (the prompt again is **%**, and my comments start with **##**):
```
% mkfifo tester ## creates a backing file named tester
% cat tester ## type the pipe's contents to stdout
```
At the beginning, nothing should appear in the terminal because nothing has been written yet to the named pipe.
* In the second terminal, enter the command:
```
% cat > tester ## redirect keyboard input to the pipe
hello, world! ## then hit Return key
bye, bye ## ditto
<Control-C> ## terminate session with a Control-C
```
Whatever is typed into this terminal is echoed in the other. Once **Ctrl+C** is entered, the regular command line prompt returns in both terminals: the pipe has been closed.
* Clean up by removing the file that implements the named pipe:
```
% unlink tester
```
As the utility's name _mkfifo_ implies, a named pipe also is called a FIFO because the first byte in is the first byte out, and so on. There is a library function named **mkfifo** that creates a named pipe in programs and is used in the next example, which consists of two processes: one writes to the named pipe and the other reads from this pipe.
#### Example 2. The _fifoWriter_ program
```
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#include <time.h>
#include <stdlib.h>
#include <stdio.h>
#define MaxLoops 12000 /* outer loop */
#define ChunkSize 16 /* how many written at a time */
#define IntsPerChunk 4 /* four 4-byte ints per chunk */
#define MaxZs 250 /* max microseconds to sleep */
int main() {
const char* pipeName = "./fifoChannel";
mkfifo(pipeName, 0666); /* read/write for user/group/others */
int fd = open(pipeName, O_CREAT | O_WRONLY); /* open as write-only */
if (fd < 0) return -1; /* can't go on */
int i;
for (i = 0; i < MaxLoops; i++) { /* write MaxWrites times */
int j;
for (j = 0; j < ChunkSize; j++) { /* each time, write ChunkSize bytes */
int k;
int chunk[IntsPerChunk];
for (k = 0; k < IntsPerChunk; k++)
chunk[k] = rand();
write(fd, chunk, sizeof(chunk));
}
usleep((rand() % MaxZs) + 1); /* pause a bit for realism */
}
close(fd); /* close pipe: generates an end-of-stream marker */
unlink(pipeName); /* unlink from the implementing file */
[printf][10]("%i ints sent to the pipe.\n", MaxLoops * ChunkSize * IntsPerChunk);
return 0;
}
```
The _fifoWriter_ program above can be summarized as follows:
* The program creates a named pipe for writing:
```
mkfifo(pipeName, 0666); /* read/write perms for user/group/others */
int fd = open(pipeName, O_CREAT | O_WRONLY);
```
where **pipeName** is the name of the backing file passed to **mkfifo** as the first argument. The named pipe then is opened with the by-now familiar call to the **open** function, which returns a file descriptor.
* For a touch of realism, the _fifoWriter_ does not write all the data at once, but instead writes a chunk, sleeps a random number of microseconds, and so on. In total, 768,000 4-byte integer values are written to the named pipe.
* After closing the named pipe, the _fifoWriter_ also unlinks the file:
```
close(fd); /* close pipe: generates end-of-stream marker */
unlink(pipeName); /* unlink from the implementing file */
```
The system reclaims the backing file once every process connected to the pipe has performed the unlink operation. In this example, there are only two such processes: the _fifoWriter_ and the _fifoReader_, both of which do an _unlink_ operation.
The two programs should be executed in different terminals with the same working directory. However, the _fifoWriter_ should be started before the _fifoReader_ , as the former creates the pipe. The _fifoReader_ then accesses the already created named pipe.
#### Example 3. The _fifoReader_ program
```
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <fcntl.h>
#include <unistd.h>
unsigned is_prime(unsigned n) { /* not pretty, but efficient */
if (n <= 3) return n > 1;
if (0 == (n % 2) || 0 == (n % 3)) return 0;
unsigned i;
for (i = 5; (i * i) <= n; i += 6)
if (0 == (n % i) || 0 == (n % (i + 2))) return 0;
return 1; /* found a prime! */
}
int main() {
const char* file = "./fifoChannel";
int fd = open(file, O_RDONLY);
if (fd < 0) return -1; /* no point in continuing */
unsigned total = 0, primes_count = 0; /* counters: ints read, primes found */
while (1) {
int next;
ssize_t count = read(fd, &next, sizeof(int));
if (0 == count) break; /* end of stream */
else if (count == sizeof(int)) { /* read a 4-byte int value */
total++;
if (is_prime(next)) primes_count++;
}
}
close(fd); /* close pipe from read end */
unlink(file); /* unlink from the underlying file */
[printf][10]("Received ints: %u, primes: %u\n", total, primes_count);
return 0;
}
```
The _fifoReader_ program above can be summarized as follows:
* Because the _fifoWriter_ creates the named pipe, the _fifoReader_ needs only the standard call **open** to access the pipe through the backing file:
```
const char* file = "./fifoChannel";
int fd = open(file, O_RDONLY);
```
The file opens as read-only.
* The program then goes into a potentially infinite loop, trying to read a 4-byte chunk on each iteration. The **read** call:
```
ssize_t count = read(fd, &next, sizeof(int));
```
returns 0 to indicate end-of-stream, in which case the _fifoReader_ breaks out of the loop, closes the named pipe, and unlinks the backing file before terminating.
* After reading a 4-byte integer, the _fifoReader_ checks whether the number is a prime. This represents the business logic that a production-grade reader might perform on the received bytes. On a sample run, there were 37,682 primes among the 768,000 integers received.
On repeated sample runs, the _fifoReader_ successfully read all of the bytes that the _fifoWriter_ wrote. This is not surprising. The two processes execute on the same host, taking network issues out of the equation. Named pipes are a highly reliable and efficient IPC mechanism and, therefore, in wide use.
Here is the output from the two programs, each launched from a separate terminal but with the same working directory:
```
% ./fifoWriter
768000 ints sent to the pipe.
###
% ./fifoReader
Received ints: 768000, primes: 37682
```
### Message queues
Pipes have strict FIFO behavior: the first byte written is the first byte read, the second byte written is the second byte read, and so forth. Message queues can behave in the same way but are flexible enough that byte chunks can be retrieved out of FIFO order.
As the name suggests, a message queue is a sequence of messages, each of which has two parts:
* The payload, which is an array of bytes ( **char** in C)
* A type, given as a positive integer value; types categorize messages for flexible retrieval
Consider the following depiction of a message queue, with each message labeled with an integer type:
```
+-+ +-+ +-+ +-+
sender--->|3|--->|2|--->|2|--->|1|--->receiver
+-+ +-+ +-+ +-+
```
Of the four messages shown, the one labeled 1 is at the front, i.e., closest to the receiver. Next come two messages with label 2, and finally, a message labeled 3 at the back. If strict FIFO behavior were in play, then the messages would be received in the order 1-2-2-3. However, the message queue allows other retrieval orders. For example, the messages could be retrieved by the receiver in the order 3-2-1-2.
The _mqueue_ example consists of two programs, the _sender_ that writes to the message queue and the _receiver_ that reads from this queue. Both programs include the header file _queue.h_ shown below:
#### Example 4. The header file _queue.h_
```
#define ProjectId 123
#define PathName "queue.h" /* any existing, accessible file would do */
#define MsgLen 4
#define MsgCount 6
typedef struct {
long type; /* must be of type long */
char payload[MsgLen + 1]; /* bytes in the message */
} queuedMessage;
```
The header file defines a structure type named **queuedMessage** , with **payload** (byte array) and **type** (integer) fields. This file also defines symbolic constants (the **#define** statements), the first two of which are used to generate a key that, in turn, is used to get a message queue ID. The **ProjectId** can be any positive integer value, and the **PathName** must be an existing, accessible file—in this case, the file _queue.h_. The setup statements in both the _sender_ and the _receiver_ programs are:
```
key_t key = ftok(PathName, ProjectId); /* generate key */
int qid = msgget(key, 0666 | IPC_CREAT); /* use key to get queue id */
```
The ID **qid** is, in effect, the counterpart of a file descriptor for message queues.
#### Example 5. The message _sender_ program
```
#include <stdio.h>
#include <sys/ipc.h>
#include <sys/msg.h>
#include <stdlib.h>
#include <string.h>
#include "queue.h"
void report_and_exit(const char* msg) {
perror(msg);
exit(-1); /* EXIT_FAILURE */
}
int main() {
key_t key = ftok(PathName, ProjectId);
if (key < 0) report_and_exit("couldn't get key...");
int qid = msgget(key, 0666 | IPC_CREAT);
if (qid < 0) report_and_exit("couldn't get queue id...");
char* payloads[] = {"msg1", "msg2", "msg3", "msg4", "msg5", "msg6"};
int types[] = {1, 1, 2, 2, 3, 3}; /* each must be > 0 */
int i;
for (i = 0; i < MsgCount; i++) {
/* build the message */
queuedMessage msg;
msg.type = types[i];
strcpy(msg.payload, payloads[i]);
/* send the message */
msgsnd(qid, &msg, sizeof(msg), IPC_NOWAIT); /* don't block */
[printf][10]("%s sent as type %i\n", msg.payload, (int) msg.type);
}
return 0;
}
```
The _sender_ program above sends out six messages, two each of a specified type: the first two messages are of type 1, the next two of type 2, and the last two of type 3. The sending statement:
```
msgsnd(qid, &msg, sizeof(msg), IPC_NOWAIT);
```
is configured to be non-blocking (the flag **IPC_NOWAIT** ) because the messages are so small. The only danger is that a full queue, unlikely in this example, would result in a sending failure. The _receiver_ program below also receives messages using the **IPC_NOWAIT** flag.
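A cautious sender could check for that failure explicitly. With the **IPC_NOWAIT** flag, a full queue surfaces as an **EAGAIN** error; here is a sketch of the check (the extra header is **errno.h**):
```
#include <errno.h> /* errno, EAGAIN */
if (msgsnd(qid, &msg, sizeof(msg), IPC_NOWAIT) < 0) {
  if (EAGAIN == errno)
    fprintf(stderr, "queue full: message not sent\n"); /* queue is at capacity */
  else
    perror("msgsnd"); /* some other failure */
}
```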
#### Example 6. The message _receiver_ program
```
#include <stdio.h>
#include <sys/ipc.h>
#include <sys/msg.h>
#include <stdlib.h>
#include "queue.h"
void report_and_exit(const char* msg) {
perror(msg);
exit(-1); /* EXIT_FAILURE */
}
int main() {
key_t key= ftok(PathName, ProjectId); /* key to identify the queue */
if (key < 0) report_and_exit("key not gotten...");
int qid = msgget(key, 0666 | IPC_CREAT); /* access if created already */
if (qid < 0) report_and_exit("no access to queue...");
int types[] = {3, 1, 2, 1, 3, 2}; /* different than in sender */
int i;
for (i = 0; i < MsgCount; i++) {
queuedMessage msg; /* defined in queue.h */
if (msgrcv(qid, &msg, sizeof(msg), types[i], MSG_NOERROR | IPC_NOWAIT) < 0)
[puts][12]("msgrcv trouble...");
[printf][10]("%s received as type %i\n", msg.payload, (int) msg.type);
}
/** remove the queue **/
if (msgctl(qid, IPC_RMID, NULL) < 0) /* NULL = 'no flags' */
report_and_exit("trouble removing queue...");
return 0;
}
```
The _receiver_ program does not create the message queue, although the API suggests as much. In the _receiver_ , the call:
```
int qid = msgget(key, 0666 | IPC_CREAT);
```
is misleading because of the **IPC_CREAT** flag, but this flag really means _create if needed, otherwise access_. The _sender_ program calls **msgsnd** to send messages, whereas the _receiver_ calls **msgrcv** to retrieve them. In this example, the _sender_ sends the messages in the order 1-1-2-2-3-3, but the _receiver_ then retrieves them in the order 3-1-2-1-3-2, showing that message queues are not bound to strict FIFO behavior:
```
% ./sender
msg1 sent as type 1
msg2 sent as type 1
msg3 sent as type 2
msg4 sent as type 2
msg5 sent as type 3
msg6 sent as type 3
% ./receiver
msg5 received as type 3
msg1 received as type 1
msg3 received as type 2
msg2 received as type 1
msg6 received as type 3
msg4 received as type 2
```
The output above shows that the _sender_ and the _receiver_ can be launched from the same terminal. The output also shows that the message queue persists even after the _sender_ process creates the queue, writes to it, and exits. The queue goes away only after the _receiver_ process explicitly removes it with the call to **msgctl** :
```
if (msgctl(qid, IPC_RMID, NULL) < 0) /* remove queue */
```
### Wrapping up
The pipes and message queue APIs are fundamentally _unidirectional_ : one process writes and another reads. There are implementations of bidirectional named pipes, but my two cents is that this IPC mechanism is at its best when it is simplest. As noted earlier, message queues have fallen in popularity—but without good reason; these queues are yet another tool in the IPC toolbox. Part 3 completes this quick tour of the IPC toolbox with code examples of IPC through sockets and signals.
--------------------------------------------------------------------------------
via: https://opensource.com/article/19/4/interprocess-communication-linux-channels
作者:[Marty Kalin][a]
选题:[lujun9972][b]
译者:[译者ID](https://github.com/译者ID)
校对:[校对者ID](https://github.com/校对者ID)
本文由 [LCTT](https://github.com/LCTT/TranslateProject) 原创编译,[Linux中国](https://linux.cn/) 荣誉推出
[a]: https://opensource.com/users/mkalindepauledu
[b]: https://github.com/lujun9972
[1]: https://opensource.com/sites/default/files/styles/image-full-size/public/lead-images/talk_chat_communication_team.png?itok=CYfZ_gE7 (Chat bubbles)
[2]: https://en.wikipedia.org/wiki/Inter-process_communication
[3]: https://opensource.com/article/19/4/interprocess-communication-ipc-linux-part-1
[4]: http://man7.org/linux/man-pages/man2/mq_open.2.html
[5]: http://man7.org/linux/man-pages/man2/mq_open.2.html#ATTRIBUTES
[6]: http://www.opengroup.org/onlinepubs/009695399/functions/perror.html
[7]: http://www.opengroup.org/onlinepubs/009695399/functions/exit.html
[8]: http://www.opengroup.org/onlinepubs/009695399/functions/strlen.html
[9]: http://www.opengroup.org/onlinepubs/009695399/functions/rand.html
[10]: http://www.opengroup.org/onlinepubs/009695399/functions/printf.html
[11]: http://www.opengroup.org/onlinepubs/009695399/functions/strcpy.html
[12]: http://www.opengroup.org/onlinepubs/009695399/functions/puts.html


@@ -1,294 +0,0 @@
[#]: collector: (lujun9972)
[#]: translator: (MjSeven)
[#]: reviewer: ( )
[#]: publisher: ( )
[#]: url: ( )
[#]: subject: (Building scalable social media sentiment analysis services in Python)
[#]: via: (https://opensource.com/article/19/4/social-media-sentiment-analysis-python-scalable)
[#]: author: (Michael McCune https://opensource.com/users/elmiko/users/jschlessman)
Building scalable social media sentiment analysis services in Python
======
Learn how you can use spaCy, vaderSentiment, Flask, and Python to add sentiment analysis capabilities to your work.
![Tall building with windows][1]
The [first part][2] of this series provided some background on how sentiment analysis works. Now let's investigate how to add these capabilities to your designs.
### Exploring spaCy and vaderSentiment in Python
#### Prerequisites
* A terminal shell
* Python language binaries (version 3.4+) in your shell
* The **pip** command for installing Python packages
* (optional) A [Python Virtualenv][3] to keep your work isolated from the system
#### Configure your environment
Before you begin writing code, you will need to set up the Python environment by installing the [spaCy][4] and [vaderSentiment][5] packages and downloading a language model to assist your analysis. Thankfully, most of this is relatively easy to do from the command line.
In your shell, type the following command to install the spaCy and vaderSentiment packages:
```
pip install spacy vaderSentiment
```
After the command completes, install a language model that spaCy can use for text analysis. The following command will use the spaCy module to download and install the English language [model][6]:
```
python -m spacy download en_core_web_sm
```
With these libraries and models installed, you are now ready to begin coding.
#### Do a simple text analysis
Use the [Python interpreter interactive mode][7] to write some code that will analyze a single text fragment. Begin by starting the Python environment:
```
$ python
Python 3.6.8 (default, Jan 31 2019, 09:38:34)
[GCC 8.2.1 20181215 (Red Hat 8.2.1-6)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>>
```
_(Your Python interpreter version print might look different than this.)_
1. Import the necessary modules:
```
>>> import spacy
>>> from vaderSentiment import vaderSentiment
```
2. Load the English language model from spaCy:
```
>>> english = spacy.load("en_core_web_sm")
```
3. Process a piece of text. This example shows a very simple sentence that we expect to return a slightly positive sentiment:
```
>>> result = english("I like to eat applesauce with sugar and cinnamon.")
```
4. Gather the sentences from the processed result. SpaCy has identified and processed the entities within the phrase; this step generates sentiment for each sentence (even though there is only one sentence in this example):
```
>>> sentences = [str(s) for s in result.sents]
```
5. Create an analyzer using vaderSentiment:
```
>>> analyzer = vaderSentiment.SentimentIntensityAnalyzer()
```
6. Perform the sentiment analysis on the sentences:
```
>>> sentiment = [analyzer.polarity_scores(str(s)) for s in sentences]
```
The sentiment variable now contains the polarity scores for the example sentence. Print out the value to see how it analyzed the sentence.
```
>>> print(sentiment)
[{'neg': 0.0, 'neu': 0.737, 'pos': 0.263, 'compound': 0.3612}]
```
What does this structure mean?
On the surface, this is an array with a single dictionary object; had there been multiple sentences, there would be a dictionary for each one. There are four keys in the dictionary that correspond to different types of sentiment. The **neg** key represents negative sentiment, of which none has been reported in this text, as evidenced by the **0.0** value. The **neu** key represents neutral sentiment, which has gotten a fairly high score of **0.737** (with a maximum of **1.0**). The **pos** key represents positive sentiment, which has a moderate score of **0.263**. Last, the **compound** key represents an overall score for the text; this can range from negative to positive scores, with the value **0.3612** representing a sentiment more on the positive side.
To see how these values might change, you can run a small experiment using the code you already entered. The following block demonstrates an evaluation of sentiment scores on a similar sentence.
```
>>> result = english("I love applesauce!")
>>> sentences = [str(s) for s in result.sents]
>>> sentiment = [analyzer.polarity_scores(str(s)) for s in sentences]
>>> print(sentiment)
[{'neg': 0.0, 'neu': 0.182, 'pos': 0.818, 'compound': 0.6696}]
```
You can see that by changing the example sentence to something overwhelmingly positive, the sentiment values have changed dramatically.
### Building a sentiment analysis service
Now that you have assembled the basic building blocks for doing sentiment analysis, let's turn that knowledge into a simple service.
For this demonstration, you will create a [RESTful][8] HTTP server using the Python [Flask package][9]. This service will accept text data in English and return the sentiment analysis. Please note that this example service is for learning the technologies involved and not something to put into production.
#### Prerequisites
* A terminal shell
* The Python language binaries (version 3.4+) in your shell.
* The **pip** command for installing Python packages
* The **curl** command
* A text editor
* (optional) A [Python Virtualenv][3] to keep your work isolated from the system
#### Configure your environment
This environment is nearly identical to the one in the previous section. The only difference is the addition of the Flask package to Python.
1. Install the necessary dependencies:
```
pip install spacy vaderSentiment flask
```
2. Install the English language model for spaCy:
```
python -m spacy download en_core_web_sm
```
#### Create the application file
Open your editor and create a file named **app.py**. Add the following contents to it _(don't worry, we will review every line)_:
```
import flask
import spacy
import vaderSentiment.vaderSentiment as vader
app = flask.Flask(__name__)
analyzer = vader.SentimentIntensityAnalyzer()
english = spacy.load("en_core_web_sm")
def get_sentiments(text):
result = english(text)
sentences = [str(sent) for sent in result.sents]
sentiments = [analyzer.polarity_scores(str(s)) for s in sentences]
return sentiments
@app.route("/", methods=["POST", "GET"])
def index():
if flask.request.method == "GET":
return "To access this service send a POST request to this URL with" \
" the text you want analyzed in the body."
body = flask.request.data.decode("utf-8")
sentiments = get_sentiments(body)
return flask.json.dumps(sentiments)
```
Although this is not an overly large source file, it is quite dense. Let's walk through the pieces of this application and describe what they are doing.
```
import flask
import spacy
import vaderSentiment.vaderSentiment as vader
```
The first three lines bring in the packages needed for performing the language analysis and the HTTP framework.
```
app = flask.Flask(__name__)
analyzer = vader.SentimentIntensityAnalyzer()
english = spacy.load("en_core_web_sm")
```
The next three lines create a few global variables. The first variable, **app** , is the main entry point that Flask uses for creating HTTP routes. The second variable, **analyzer** , is the same type used in the previous example, and it will be used to generate the sentiment scores. The last variable, **english** , is also the same type used in the previous example, and it will be used to annotate and tokenize the initial text input.
You might be wondering why these variables have been declared globally. In the case of the **app** variable, this is standard procedure for many Flask applications. But, in the case of the **analyzer** and **english** variables, the decision to make them global is based on the load times associated with the classes involved. Although the load time might appear minor, when it's run in the context of an HTTP server, these delays can negatively impact performance.
```
def get_sentiments(text):
result = english(text)
sentences = [str(sent) for sent in result.sents]
sentiments = [analyzer.polarity_scores(str(s)) for s in sentences]
return sentiments
```
The next piece is the heart of the service: a function for generating sentiment values from a string of text. You can see that the operations in this function correspond to the commands you ran in the Python interpreter earlier. Here they're wrapped in a function definition, with the source text passed in as the parameter **text** and, finally, the **sentiments** variable returned to the caller.
```
@app.route("/", methods=["POST", "GET"])
def index():
if flask.request.method == "GET":
return "To access this service send a POST request to this URL with" \
" the text you want analyzed in the body."
body = flask.request.data.decode("utf-8")
sentiments = get_sentiments(body)
return flask.json.dumps(sentiments)
```
The last function in the source file contains the logic that will instruct Flask how to configure the HTTP server for the service. It starts with a line that will associate an HTTP route **/** with the request methods **POST** and **GET**.
After the function definition line, the **if** clause will detect if the request method is **GET**. If a user sends this request to the service, the following line will return a text message instructing how to access the server. This is largely included as a convenience to end users.
The next line uses the **flask.request** object to acquire the body of the request, which should contain the text string to be processed. The **decode** function will convert the array of bytes into a usable, formatted string. The decoded text message is now passed to the **get_sentiments** function to generate the sentiment scores. Last, the scores are returned to the user through the HTTP framework.
You should now save the file, if you have not done so already, and return to the shell.
#### Run the sentiment service
With everything in place, running the service is quite simple with Flask's built-in debugging server. To start the service, enter the following command from the same directory as your source file:
```
FLASK_APP=app.py flask run
```
You will now see some output from the server in your shell, and the server will be running. To test that the server is running, you will need to open a second shell and use the **curl** command.
First, check to see that the instruction message is printed by entering this command:
```
curl http://localhost:5000
```
You should see the instruction message:
```
To access this service send a POST request to this URL with the text you want analyzed in the body.
```
Next, send a test message to see the sentiment analysis by running the following command:
```
curl http://localhost:5000 --header "Content-Type: application/json" --data "I love applesauce!"
```
The response you get from the server should be similar to the following:
```
`[{"compound": 0.6696, "neg": 0.0, "neu": 0.182, "pos": 0.818}]`
```
Congratulations! You have now implemented a RESTful HTTP sentiment analysis service. You can find a link to a [reference implementation of this service and all the code from this article on GitHub][10].
### Continue exploring
Now that you have an understanding of the principles and mechanics behind natural language processing and sentiment analysis, here are some ways to further your discovery of this topic.
#### Create a streaming sentiment analyzer on OpenShift
While creating local applications to explore sentiment analysis is a convenient first step, having the ability to deploy your applications for wider usage is a powerful next step. By following the instructions and code in this [workshop from Radanalytics.io][11], you will learn how to create a sentiment analyzer that can be containerized and deployed to a Kubernetes platform. You will also see how Apache Kafka is used as a framework for event-driven messaging and how Apache Spark can be used as a distributed computing platform for sentiment analysis.
#### Discover live data with the Twitter API
Although the [Radanalytics.io][12] lab generated synthetic tweets to stream, you are not limited to synthetic data. In fact, anyone with a Twitter account can access the Twitter streaming API and perform sentiment analysis on tweets with the [Tweepy Python][13] package.
--------------------------------------------------------------------------------
via: https://opensource.com/article/19/4/social-media-sentiment-analysis-python-scalable
作者:[Michael McCune ][a]
选题:[lujun9972][b]
译者:[译者ID](https://github.com/译者ID)
校对:[校对者ID](https://github.com/校对者ID)
本文由 [LCTT](https://github.com/LCTT/TranslateProject) 原创编译,[Linux中国](https://linux.cn/) 荣誉推出
[a]: https://opensource.com/users/elmiko/users/jschlessman
[b]: https://github.com/lujun9972
[1]: https://opensource.com/sites/default/files/styles/image-full-size/public/lead-images/windows_building_sky_scale.jpg?itok=mH6CAX29 (Tall building with windows)
[2]: https://opensource.com/article/19/4/social-media-sentiment-analysis-python-part-1
[3]: https://virtualenv.pypa.io/en/stable/
[4]: https://pypi.org/project/spacy/
[5]: https://pypi.org/project/vaderSentiment/
[6]: https://spacy.io/models
[7]: https://docs.python.org/3.6/tutorial/interpreter.html
[8]: https://en.wikipedia.org/wiki/Representational_state_transfer
[9]: http://flask.pocoo.org/
[10]: https://github.com/elmiko/social-moments-service
[11]: https://github.com/radanalyticsio/streaming-lab
[12]: http://Radanalytics.io
[13]: https://github.com/tweepy/tweepy


@@ -1,5 +1,5 @@
[#]: collector: (lujun9972)
[#]: translator: ( )
[#]: translator: (MjSeven)
[#]: reviewer: ( )
[#]: publisher: ( )
[#]: url: ( )


@@ -0,0 +1,574 @@
[#]: collector: (lujun9972)
[#]: translator: (FSSlc)
[#]: reviewer: ( )
[#]: publisher: ( )
[#]: url: ( )
[#]: subject: (Inter-process communication in Linux: Using pipes and message queues)
[#]: via: (https://opensource.com/article/19/4/interprocess-communication-linux-channels)
[#]: author: (Marty Kalin https://opensource.com/users/mkalindepauledu)
Linux 下的进程间通信:使用管道和消息队列
======
学习在 Linux 中进程是如何与其他进程进行同步的。
![Chat bubbles][1]
本篇是 Linux 下[进程间通信][2](IPC)系列的第二篇文章。[第一篇文章][3] 聚焦于通过共享文件和共享内存段这样的共享存储来进行 IPC。这篇文章的重点将转向管道,它是连接需要通信的进程的通道。管道拥有一个 _写端_ 用于写入字节数据,还有一个 _读端_ 用于按照先入先出的顺序读取这些字节数据。而这些字节数据可能代表任何东西:数字、员工记录、数字电影等等。
管道有两种类型:命名管道和无名管道,既可以在命令行中交互式地使用,也可以在程序中使用;相关的例子在下面展示。这篇文章也将介绍消息队列,尽管它们有些不流行了,但这并非是它们应得的待遇。
第一篇文章中的示例代码承认了在使用共享存储的 IPC 中存在竞争条件(不管是基于文件的还是基于内存的)的威胁。于是自然会产生这样的问题:基于通道的 IPC 的并发安全又如何?这将在本文中讨论。针对管道和消息队列的例子将会使用带有 POSIX 认可的 API,而 POSIX 标准的一个核心目标就是线程安全。
请查看 [**mq_open** 函数的 man 页][4],这个函数属于消息队列的 API。这个 man 页中有一个关于 [属性][5] 的章节,其中有下面这个小表格:
Interface | Attribute | Value
---|---|---
mq_open() | Thread safety | MT-Safe
上面的 **MT-Safe**(**MT** 指的是 multi-threaded,多线程)意味着 **mq_open** 函数是线程安全的,进而暗示它是进程安全的:一个进程的执行,恰恰是指它的某一个线程在执行,假如竞争条件不会发生在处于 _相同_ 进程的线程中,那么这样的条件也不会发生在处于不同进程的线程中。**MT-Safe** 属性保证了调用 **mq_open** 时不会出现竞争条件。一般来说,基于通道的 IPC 是并发安全的,尽管在后面的例子中会有一个警示说明。
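作为说明,下面是一个使用 **mq_open** 打开然后移除一个 POSIX 消息队列的最小示例草图。(队列名 _/mqDemo_ 是为这个草图虚构的;在 Linux 上,程序需要链接 **-lrt**。本文后面的消息队列示例使用的是更老的 System V 调用,例如 **msgget** 和 **msgsnd**,而不是这个 **mq_** 函数族。)
```c
#include <fcntl.h>    /* O_* constants */
#include <sys/stat.h> /* mode constants */
#include <mqueue.h>   /* mq_open, mq_close, mq_unlink */
#include <stdio.h>    /* perror */
int main() {
  /* create the queue if necessary; open it for reading and writing */
  mqd_t mqd = mq_open("/mqDemo", O_CREAT | O_RDWR, 0666, NULL); /* NULL: default attributes */
  if ((mqd_t) -1 == mqd) {
    perror("mq_open");
    return -1;
  }
  mq_close(mqd);        /* close this process' descriptor */
  mq_unlink("/mqDemo"); /* remove the queue from the system */
  return 0;
}
```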
### 无名管道
首先让我们通过一个特意构造的命令行例子来展示无名管道是如何工作的。在所有的现代系统中,符号 **|** 在命令行中都代表一个无名管道。假设我们的命令行提示符为 **%**,接下来考虑下面的命令:
```shell
% sleep 5 | echo "Hello, world!" ## writer to the left of |, reader to the right
```
_sleep_ 和 _echo_ 程序以不同的进程执行,无名管道允许它们进行通信。但是上面的例子被特意设计为没有通信发生。问候语 _Hello, world!_ 出现在屏幕中;然后,在大约 5 秒后,命令行提示符返回,这表明 _sleep_ 和 _echo_ 进程都已经结束了。这期间发生了什么呢?
在命令行中的竖线 **|** 的语法中,左边的进程(_sleep_)是写入方,右边的进程(_echo_)为读取方。默认情况下,读取方将会阻塞,直到能够从通道中读取到字节数据;而写入方在写完它的字节数据后,将发送一个“流已终止”的标志。(即便写入方提前终止了,一个“流已终止”的标志还是会发给读取方。)无名管道将一直持续,直到写入方和读取方都终止。
在上面的例子中_sleep_ 进程并没有向通道写入任何的字节数据,但在 5 秒后就停止了这时将向通道发送一个流已终止的标志。与此同时_echo_ 进程立即向标准输出(屏幕)写入问候语,因为这个进程并不从通道中读入任何字节,所以它并没有等待。一旦 _sleep__echo_ 进程都终止了,不会再用作通信的无名管道将会消失然后返回命令行提示符。
下面这个更加实用的示例将使用两个无名管道。我们假定文件 _test.dat_ 的内容如下:
```
this
is
the
way
the
world
ends
```
下面的命令
```
% cat test.dat | sort | uniq
```
会将 _cat_concatenate 的缩写)进程的输出通过管道传给 _sort_ 进程以生成排序后的输出,然后将排序后的输出通过管道传给 _uniq_ 进程以消除重复的记录(在本例中,会将两次出现的 **the** 缩减为一个):
```
ends
is
the
this
way
world
```
下面的场景将展示一个程序,它的两个进程通过一个无名管道进行通信。
#### 示例 1. 两个进程通过一个无名管道来进行通信
```c
#include <sys/wait.h> /* wait */
#include <stdio.h>
#include <stdlib.h> /* exit functions */
#include <unistd.h> /* read, write, pipe, _exit */
#include <string.h>
#define ReadEnd 0
#define WriteEnd 1
void report_and_exit(const char* msg) {
perror(msg);
exit(-1); /** failure **/
}
int main() {
int pipeFDs[2]; /* two file descriptors */
char buf; /* 1-byte buffer */
const char* msg = "Nature's first green is gold\n"; /* bytes to write */
if (pipe(pipeFDs) < 0) report_and_exit("pipeFD");
pid_t cpid = fork(); /* fork a child process */
if (cpid < 0) report_and_exit("fork"); /* check for failure */
if (0 == cpid) { /*** child ***/ /* child process */
close(pipeFDs[WriteEnd]); /* child reads, doesn't write */
while (read(pipeFDs[ReadEnd], &buf, 1) > 0) /* read until end of byte stream */
write(STDOUT_FILENO, &buf, sizeof(buf)); /* echo to the standard output */
close(pipeFDs[ReadEnd]); /* close the ReadEnd: all done */
_exit(0); /* exit and notify parent at once */
}
else { /*** parent ***/
close(pipeFDs[ReadEnd]); /* parent writes, doesn't read */
write(pipeFDs[WriteEnd], msg, strlen(msg)); /* write the bytes to the pipe */
close(pipeFDs[WriteEnd]); /* done writing: generate eof */
wait(NULL); /* wait for child to exit */
exit(0); /* exit normally */
}
return 0;
}
```
上面名为 _pipeUN_ 的程序使用系统函数 **fork** 来创建一个进程。尽管这个程序只有一个单一的源文件,在它正确执行的情况下将会发生多进程的情况。下面的内容是对库函数 **fork** 如何工作的一个简要回顾:
* **fork** 函数由 _父_ 进程调用,在失败时返回 **-1** 给父进程。在 _pipeUN_ 这个例子中,相应的调用是
```c
pid_t cpid = fork(); /* called in parent */
```
这个调用的返回值保存在了整数类型 **pid_t** 的变量 **cpid** 中。(每个进程有它自己的 _进程 ID_,一个用来标识进程的非负整数。)复刻一个新的进程可能会因为多种原因而失败,其中包括 _进程表_ 满了的情况,这个结构由系统维持,以此来追踪进程。明确地说,僵尸进程(稍后会解释)假如没有被收割,将可能引起进程表被填满。
* 假如 **fork** 调用成功,则它将创建一个新的子进程,向父进程返回一个值,向子进程返回另外的一个值。在调用 **fork** 后父进程和子进程都将执行相同的代码。(子进程继承了到此为止父进程中声明的所有变量的拷贝),特别地,一次成功的 **fork** 调用将返回如下的东西:
* 向子进程返回 0
* 向父进程返回子进程的进程 ID
* 在一次成功的 **fork** 调用后,一个 _if/else_ 或等价的结构将会被用来隔离针对父进程和子进程的代码。在这个例子中,相应的结构为:
```c
if (0 == cpid) { /*** child ***/
...
}
else { /*** parent ***/
...
}
```
假如成功地复刻出了一个子进程,_pipeUN_ 程序将像下面这样继续执行。程序中有一个整数数组:
```c
int pipeFDs[2]; /* two file descriptors */
```
来保存两个文件描述符,一个用来向管道中写入,另一个用来从管道中读取。(数组元素 **pipeFDs[0]** 是读端的文件描述符,元素 **pipeFDs[1]** 是写端的文件描述符。)在调用 **fork** 之前,紧接着对系统函数 **pipe** 的一次成功调用,将用这两个文件描述符填充这个数组:
```c
if (pipe(pipeFDs) < 0) report_and_exit("pipeFD");
```
父进程和子进程现在都有了这两个文件描述符的副本。但 _分离关注点_ 模式意味着每个进程恰好只需要其中一个描述符。在这个例子中,父进程负责写入,而子进程负责读取,尽管这样的角色分配也可以反过来。因此,子进程 _if_ 子句中的第一个语句将关闭管道的写端:
```c
close(pipeFDs[WriteEnd]); /* called in child code */
```
在父进程中的 _else_ 子句将会关闭管道的读端:
```c
close(pipeFDs[ReadEnd]); /* called in parent code */
```
然后,父进程将向无名管道中写入某些字节数据(ASCII 码),子进程读取这些数据,并将它们回显到标准输出。
在这个程序中还需要澄清的一点是父进程代码中的 **wait** 函数。一旦被创建后,子进程很大程度上独立于它的父进程,正如简短的 _pipeUN_ 程序所展示的那样。子进程可以执行任意的代码,而这些代码可能与父进程完全没有关系。但是,当子进程终止时,系统会通过一个信号来通知父进程。
要是子进程在父进程之前终止又该如何呢?在这种情形下,除非采取了预防措施,子进程将会变成并一直是进程表中的一个 _僵尸_ 进程。预防措施有两大类型:第一种是让父进程去通知系统,告诉系统它对子进程的终止没有任何兴趣:
```c
signal(SIGCHLD, SIG_IGN); /* in parent: ignore notification */
```
第二种方法是让父进程在子进程终止时执行一个 **wait**,这样就确保了父进程的存活时间比子进程长。在 _pipeUN_ 程序中使用了第二种方法,其中父进程的代码使用的是下面的调用:
```c
wait(NULL); /* called in parent */
```
这个对 **wait** 的调用意味着 _一直等待,直到任意一个子进程终止_,而在 _pipeUN_ 程序中,只有一个子进程。(其中的 **NULL** 参数可以被替换为一个用来保存子进程退出状态的整数变量的地址。)对于更细粒度的控制,还可以使用更灵活的 **waitpid** 函数,例如指定多个子进程中的某一个。
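仅作为示意,下面的草图展示了父进程如何改用 **waitpid** 专门等待这个特定的子进程,并检查它的退出状态(**cpid** 是 **fork** 的返回值,这些 **W** 宏来自程序中已经包含的 **sys/wait.h** 头文件):
```c
int status;
if (waitpid(cpid, &status, 0) == cpid && /* wait for this specific child */
    WIFEXITED(status))                   /* did the child exit normally? */
  printf("child exited with status %i\n", WEXITSTATUS(status));
```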
_pipeUN_ 将会采取另一个预防措施。当父进程结束了等待,父进程将会调用常规的 **exit** 函数去退出。对应的,子进程将会调用 **_exit** 变种来退出,这类变种将快速跟踪终止相关的通知。在效果上,子进程会告诉系统立刻去通知父进程它的这个子进程已经终止了。
假如两个进程向相同的无名管道中写入内容,字节数据会交错吗?例如,假如进程 P1 向管道写入内容:
```
foo bar
```
同时进程 P2 并发地写入:
```
baz baz
```
到相同的管道,最后的结果似乎是管道中的内容将会是任意错乱的,例如像这样:
```
baz foo baz bar
```
POSIX 标准确保了只要没有单次写入超过 **PIPE_BUF** 字节,写操作就不会交错。在 Linux 系统中,**PIPE_BUF** 的大小是 4096 字节。对于管道,我更喜欢只有一个写方和一个读方,从而绕过这个问题。
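对于某个特定的管道,这个上限也可以在运行时查询。下面是一个小的草图,假定 **fd** 是该管道的一个已打开的文件描述符,就像 _pipeUN_ 程序中那样(额外需要的头文件是 **limits.h**;**fpathconf** 来自已经包含的 **unistd.h**):
```c
#include <limits.h> /* PIPE_BUF */
long limit = fpathconf(fd, _PC_PIPE_BUF); /* atomic-write limit for this pipe */
printf("PIPE_BUF is %i bytes; runtime limit is %li\n", PIPE_BUF, limit);
```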
### 命名管道
无名管道没有备份文件:系统将维持一个内存缓存来将字节数据从写方传给读方。一旦写方和读方终止,这个缓存将会被回收,进而无名管道消失。相反的,命名管道有备份文件和一个不同的 API。
下面让我们通过另一个命令行示例来知晓命名管道的要点。下面是具体的步骤:
* 开启两个终端。这两个终端的工作目录应该相同。
* 在其中一个终端中,键入下面的两个命令(命令行提示符仍然是 **%**,我的注释以 **##** 打头。):
```shell
% mkfifo tester ## creates a backing file named tester
% cat tester ## type the pipe's contents to stdout
```
在最开始,终端中不会出现任何东西,因为到现在为止还没有向命名管道中写入任何东西。
* 在第二个终端中输入下面的命令:
```shell
% cat > tester ## redirect keyboard input to the pipe
hello, world! ## then hit Return key
bye, bye ## ditto
<Control-C> ## terminate session with a Control-C
```
无论在这个终端中输入什么,它都会在另一个终端中回显出来。一旦键入 **Ctrl+C**,两个终端都会回到正常的命令行提示符:管道已经被关闭了。
* 通过移除实现命名管道的文件来进行清理:
```shell
% unlink tester
```
正如这个工具的名字 _mkfifo_ 所暗示的那样,命名管道也被叫做 FIFO,因为第一个写入的字节将第一个被读出,以此类推。有一个名为 **mkfifo** 的库函数,用它可以在程序中创建一个命名管道,它将在下一个示例中被用到,该示例由两个进程组成:一个向命名管道写入,而另一个从该管道读取。
#### 示例 2. _fifoWriter_ 程序
```c
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#include <time.h>
#include <stdlib.h>
#include <stdio.h>
#define MaxLoops 12000 /* outer loop */
#define ChunkSize 16 /* how many written at a time */
#define IntsPerChunk 4 /* four 4-byte ints per chunk */
#define MaxZs 250 /* max microseconds to sleep */
int main() {
const char* pipeName = "./fifoChannel";
mkfifo(pipeName, 0666); /* read/write for user/group/others */
int fd = open(pipeName, O_CREAT | O_WRONLY); /* open as write-only */
if (fd < 0) return -1; /** error **/
int i;
for (i = 0; i < MaxLoops; i++) { /* write MaxWrites times */
int j;
for (j = 0; j < ChunkSize; j++) { /* each time, write ChunkSize bytes */
int k;
int chunk[IntsPerChunk];
for (k = 0; k < IntsPerChunk; k++)
chunk[k] = rand();
write(fd, chunk, sizeof(chunk));
}
usleep((rand() % MaxZs) + 1); /* pause a bit for realism */
}
close(fd); /* close pipe: generates an end-of-file */
unlink(pipeName); /* unlink from the implementing file */
[printf][10]("%i ints sent to the pipe.\n", MaxLoops * ChunkSize * IntsPerChunk);
return 0;
}
```
上面的 _fifoWriter_ 程序可以被总结为如下:
* 首先程序创建了一个命名管道用来写入数据:
```c
mkfifo(pipeName, 0666); /* read/write perms for user/group/others */
int fd = open(pipeName, O_CREAT | O_WRONLY);
```
其中的 **pipeName** 是传递给 **mkfifo** 作为它的第一个参数的备份文件的名字。接着命名管道通过我们熟悉的 **open** 函数调用被打开,而这个函数将会返回一个文件描述符。
* 为了增加一点真实感,_fifoWriter_ 不会一次性将所有的数据都写入,而是写入一个块,然后休眠随机数目的微秒,接着再循环往复。总共有 768000 个 4 字节的整数值被写入到命名管道中。
* 在关闭命名管道后_fifoWriter_ 也将使用 unlink 去掉关联。
```c
close(fd); /* close pipe: generates end-of-stream marker */
unlink(pipeName); /* unlink from the implementing file */
```
一旦连接到管道的每个进程都执行了 unlink 操作后,系统将回收这些备份文件。在这个例子中,只有两个这样的进程 _fifoWriter__fifoReader_,它们都做了 _unlink_ 操作。
这两个程序应该在位于相同工作目录的不同终端中执行。但是 _fifoWriter_ 应该在 _fifoReader_ 之前启动,因为前者需要创建管道。然后 _fifoReader_ 才能访问这个已经创建的命名管道。
#### 示例 3. _fifoReader_ 程序
```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <fcntl.h>
#include <unistd.h>
unsigned is_prime(unsigned n) { /* not pretty, but gets the job done efficiently */
if (n <= 3) return n > 1;
if (0 == (n % 2) || 0 == (n % 3)) return 0;
unsigned i;
for (i = 5; (i * i) <= n; i += 6)
if (0 == (n % i) || 0 == (n % (i + 2))) return 0;
return 1; /* found a prime! */
}
int main() {
const char* file = "./fifoChannel";
int fd = open(file, O_RDONLY);
if (fd < 0) return -1; /* no point in continuing */
unsigned total = 0, primes_count = 0; /* counters: ints read, primes found */
while (1) {
int next;
ssize_t count = read(fd, &next, sizeof(int));
if (0 == count) break; /* end of stream */
else if (count == sizeof(int)) { /* read a 4-byte int value */
total++;
if (is_prime(next)) primes_count++;
}
}
close(fd); /* close pipe from read end */
unlink(file); /* unlink from the underlying file */
[printf][10]("Received ints: %u, primes: %u\n", total, primes_count);
return 0;
}
```
上面的 _fifoReader_ 的内容可以总结为如下:
* 因为 _fifoWriter_ 已经创建了命名管道,所以 _fifoReader_ 只需要利用标准的 **open** 调用来通过备份文件来获取到管道中的内容:
```c
const char* file = "./fifoChannel";
int fd = open(file, O_RDONLY);
```
这个文件的打开是只读的。
* 然后这个程序进入一个潜在的无限循环,在每次循环时,尝试读取一个 4 字节的块。**read** 调用:
```c
ssize_t count = read(fd, &next, sizeof(int));
```
返回 0 来暗示流的结束。在这种情况下_fifoReader_ 跳出循环,关闭命名管道,并在终止前 unlink 备份文件。
* 在读入一个 4 字节整数后,_fifoReader_ 检查这个数是否为质数。这代表了一个生产级别的读取器可能对接收到的字节数据执行的业务逻辑。在一次示例运行中,接收到的 768000 个整数中有 37682 个质数。
在重复的示例运行中,_fifoReader_ 成功地读取了 _fifoWriter_ 写入的所有字节。这并不让人惊讶:这两个进程在相同的主机上执行,从而排除了网络相关的问题。命名管道是一种可靠且高效的 IPC 机制,因此被广泛使用。
下面是这两个程序的输出,在不同的终端中启动,但处于相同的工作目录:
```shell
% ./fifoWriter
768000 ints sent to the pipe.
###
% ./fifoReader
Received ints: 768000, primes: 37682
```
### 消息队列
管道有着严格的先入先出行为:第一个被写入的字节将会第一个被读,第二个写入的字节将第二个被读,以此类推。消息队列可以做出相同的表现,但它又足够灵活,可以使得字节块不以先入先出的次序来接收。
正如它的名字所暗示的那样,消息队列是一系列的消息,每个消息包含两部分:
* 荷载,一个字节数组(在 C 中是 **char**)
* 类型,以一个正整数值的形式给定;类型用来对消息分类,以便更灵活地获取消息
考虑下面对一个消息队列的描述,每个消息被一个整数类型标记:
```
+-+ +-+ +-+ +-+
sender--->|3|--->|2|--->|2|--->|1|--->receiver
+-+ +-+ +-+ +-+
```
在上面展示的 4 个消息中,标记为 1 的消息处于队首,即最接近接收端,接着是两个标记为 2 的消息,最后是一个标记为 3 的消息处于队尾。假如按照严格的 FIFO 行为,这些消息将会以 1-2-2-3 的次序被接收。但是消息队列允许其他获取次序。例如,接收方可以按 3-2-1-2 的次序获取这些消息。
_mqueue_ 示例包含两个程序_sender_ 将向消息队列中写入数据,而 _receiver_ 将从这个队列中读取数据。这两个程序都包含下面展示的头文件 _queue.h_
#### 示例 4. 头文件 _queue.h_
```c
#define ProjectId 123
#define PathName "queue.h" /* any existing, accessible file would do */
#define MsgLen 4
#define MsgCount 6
typedef struct {
long type; /* must be of type long */
char payload[MsgLen + 1]; /* bytes in the message */
} queuedMessage;
```
上面的头文件定义了一个名为 **queuedMessage** 的结构类型,它带有 **payload**(字节数组)和 **type**(整数)这两个域。该文件也定义了一些符号常数(使用 **#define** 语句)。前两个常数被用来生成一个 key而这个 key 反过来被用来获取一个消息队列的 ID。**ProjectId** 可以是任何正整数值,而 **PathName** 必须是一个存在的,可访问的文件,在这个示例中,指的是文件 _queue.h_。在 _sender__receiver_ 中,它们都有的设定语句为:
```c
key_t key = ftok(PathName, ProjectId); /* generate key */
int qid = msgget(key, 0666 | IPC_CREAT); /* use key to get queue id */
```
ID **qid** 在效果上是消息队列文件描述符的对应物。
#### 示例 5. _sender_ 程序
```c
#include <stdio.h>
#include <sys/ipc.h>
#include <sys/msg.h>
#include <stdlib.h>
#include <string.h>
#include "queue.h"
void report_and_exit(const char* msg) {
perror(msg);
exit(-1); /* EXIT_FAILURE */
}
int main() {
key_t key = ftok(PathName, ProjectId);
if (key < 0) report_and_exit("couldn't get key...");
int qid = msgget(key, 0666 | IPC_CREAT);
if (qid < 0) report_and_exit("couldn't get queue id...");
char* payloads[] = {"msg1", "msg2", "msg3", "msg4", "msg5", "msg6"};
int types[] = {1, 1, 2, 2, 3, 3}; /* each must be > 0 */
int i;
for (i = 0; i < MsgCount; i++) {
/* build the message */
queuedMessage msg;
msg.type = types[i];
strcpy(msg.payload, payloads[i]);
/* send the message */
msgsnd(qid, &msg, sizeof(msg), IPC_NOWAIT); /* don't block */
[printf][10]("%s sent as type %i\n", msg.payload, (int) msg.type);
}
return 0;
}
```
上面的 _sender_ 程序将发送出 6 个消息,每种指定的类型各两个:前两个是类型 1,接着的两个是类型 2,最后的两个为类型 3。发送的语句:
```c
msgsnd(qid, &msg, sizeof(msg), IPC_NOWAIT);
```
被配置为非阻塞的(**IPC_NOWAIT** 标志),因为这里的消息体量都很小。唯一的危险在于队列满了会导致发送失败,但在这个例子中不太可能发生。下面的 _receiver_ 程序也将使用 **IPC_NOWAIT** 标志来接收消息。
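一个谨慎的发送方可以显式地检查这种失败。使用 **IPC_NOWAIT** 标志时,队列满了会表现为 **EAGAIN** 错误;下面是这种检查的一个草图(额外需要的头文件是 **errno.h**):
```c
#include <errno.h> /* errno, EAGAIN */
if (msgsnd(qid, &msg, sizeof(msg), IPC_NOWAIT) < 0) {
  if (EAGAIN == errno)
    fprintf(stderr, "queue full: message not sent\n"); /* queue is at capacity */
  else
    perror("msgsnd"); /* some other failure */
}
```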
#### 示例 6. _receiver_ 程序
```c
#include <stdio.h>
#include <sys/ipc.h>
#include <sys/msg.h>
#include <stdlib.h>
#include "queue.h"
void report_and_exit(const char* msg) {
perror(msg);
exit(-1); /* EXIT_FAILURE */
}
int main() {
key_t key= ftok(PathName, ProjectId); /* key to identify the queue */
if (key < 0) report_and_exit("key not gotten...");
int qid = msgget(key, 0666 | IPC_CREAT); /* access if created already */
if (qid < 0) report_and_exit("no access to queue...");
int types[] = {3, 1, 2, 1, 3, 2}; /* different than in sender */
int i;
for (i = 0; i < MsgCount; i++) {
queuedMessage msg; /* defined in queue.h */
if (msgrcv(qid, &msg, sizeof(msg), types[i], MSG_NOERROR | IPC_NOWAIT) < 0)
[puts][12]("msgrcv trouble...");
[printf][10]("%s received as type %i\n", msg.payload, (int) msg.type);
}
/** remove the queue **/
if (msgctl(qid, IPC_RMID, NULL) < 0) /* NULL = 'no flags' */
report_and_exit("trouble removing queue...");
return 0;
}
```
这个 _receiver_ 程序不会创建消息队列,尽管 API 看起来像是那样。在 _receiver_ 中,对
```c
int qid = msgget(key, 0666 | IPC_CREAT);
```
的调用可能因为带有 **IPC_CREAT** 标志而具有误导性,但是这个标志的真实意义是 _如果需要就创建,否则直接获取_。_sender_ 程序调用 **msgsnd** 来发送消息,而 _receiver_ 调用 **msgrcv** 来获取它们。在这个例子中,_sender_ 以 1-1-2-2-3-3 的次序发送消息,但 _receiver_ 获取它们的次序为 3-1-2-1-3-2,这表明消息队列不受严格的 FIFO 行为约束:
```shell
% ./sender
msg1 sent as type 1
msg2 sent as type 1
msg3 sent as type 2
msg4 sent as type 2
msg5 sent as type 3
msg6 sent as type 3
% ./receiver
msg5 received as type 3
msg1 received as type 1
msg3 received as type 2
msg2 received as type 1
msg6 received as type 3
msg4 received as type 2
```
上面的输出显示,_sender_ 和 _receiver_ 可以在同一个终端中启动。输出也显示消息队列是持久的:即便 _sender_ 进程在创建队列、向队列写入数据、然后退出之后,队列仍然存在。只有当 _receiver_ 进程显式地调用 **msgctl** 来移除该队列时,这个队列才会消失:
```c
if (msgctl(qid, IPC_RMID, NULL) < 0) /* remove queue */
```
### 总结
管道和消息队列的 API 从根本上来说都是 _单向的_:一个进程写,另一个进程读。确实存在双向命名管道的实现,但我认为这种 IPC 机制在它最为简单的时候才是最好的。正如前面提到的那样,消息队列已经不太流行了,但并没有什么好的理由;这些队列仍然是 IPC 工具箱中的又一个工具。第 3 部分将通过套接字和信号的示例代码,完成对这个 IPC 工具箱的快速之旅。
--------------------------------------------------------------------------------
via: https://opensource.com/article/19/4/interprocess-communication-linux-channels
作者:[Marty Kalin][a]
选题:[lujun9972][b]
译者:[FSSlc](https://github.com/FSSlc)
校对:[校对者ID](https://github.com/校对者ID)
本文由 [LCTT](https://github.com/LCTT/TranslateProject) 原创编译,[Linux中国](https://linux.cn/) 荣誉推出
[a]: https://opensource.com/users/mkalindepauledu
[b]: https://github.com/lujun9972
[1]: https://opensource.com/sites/default/files/styles/image-full-size/public/lead-images/talk_chat_communication_team.png?itok=CYfZ_gE7 (Chat bubbles)
[2]: https://en.wikipedia.org/wiki/Inter-process_communication
[3]: https://opensource.com/article/19/4/interprocess-communication-ipc-linux-part-1
[4]: http://man7.org/linux/man-pages/man2/mq_open.2.html
[5]: http://man7.org/linux/man-pages/man2/mq_open.2.html#ATTRIBUTES
[6]: http://www.opengroup.org/onlinepubs/009695399/functions/perror.html
[7]: http://www.opengroup.org/onlinepubs/009695399/functions/exit.html
[8]: http://www.opengroup.org/onlinepubs/009695399/functions/strlen.html
[9]: http://www.opengroup.org/onlinepubs/009695399/functions/rand.html
[10]: http://www.opengroup.org/onlinepubs/009695399/functions/printf.html
[11]: http://www.opengroup.org/onlinepubs/009695399/functions/strcpy.html
[12]: http://www.opengroup.org/onlinepubs/009695399/functions/puts.html


@@ -0,0 +1,290 @@
[#]: collector: (lujun9972)
[#]: translator: (MjSeven)
[#]: reviewer: ( )
[#]: publisher: ( )
[#]: url: ( )
[#]: subject: (Building scalable social media sentiment analysis services in Python)
[#]: via: (https://opensource.com/article/19/4/social-media-sentiment-analysis-python-scalable)
[#]: author: (Michael McCune https://opensource.com/users/elmiko/users/jschlessman)
Building scalable social media sentiment analysis services in Python
======
Learn how to use spaCy, vaderSentiment, Flask, and Python to add sentiment analysis to your work.
![Tall building with windows][1]
The [first part][2] of this series provided some background on how sentiment analysis works; now let's investigate how to add those capabilities to your designs.
### Exploring the Python libraries spaCy and vaderSentiment
#### Prerequisites
* A terminal shell
* Python language binaries (version 3.4+) in your shell
* The **pip** command for installing Python packages
* (Optional) A [Python virtual environment][3] to keep your work isolated from the system
#### Configure the environment
Before you begin coding, you will need to set up the Python environment by installing the [spaCy][4] and [vaderSentiment][5] packages and downloading a language model to assist with your analysis. Thankfully, most of this is relatively easy to do from the command line.
In your shell, type the following command to install the spaCy and vaderSentiment packages:
```
pip install spacy vaderSentiment
```
Once the command completes, install the language model that spaCy can use for text analysis. The following command will download and install the English [model][6] using the spaCy module:
```
python -m spacy download en_core_web_sm
```
With those libraries and models installed, it's time to begin coding.
#### Do a simple text analysis
Use the [Python interpreter's interactive mode][7] to write some code that analyzes a single text fragment. Begin by starting the Python environment:
```
$ python
Python 3.6.8 (default, Jan 31 2019, 09:38:34)
[GCC 8.2.1 20181215 (Red Hat 8.2.1-6)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>>
```
_(Your Python interpreter version print might look different from this.)_
1. Import the necessary modules:
```
>>> import spacy
>>> from vaderSentiment import vaderSentiment
```
2. Load the English language model from spaCy:
```
>>> english = spacy.load("en_core_web_sm")
```
3. Process a piece of text. This example shows a very simple sentence that we expect to return a slightly positive sentiment:
```
>>> result = english("I like to eat applesauce with sugar and cinnamon.")
```
4. Gather the sentences from the processed result. SpaCy has identified and processed the entities in the phrase, and this step generates the sentiment for each sentence (even though there is only one sentence in this example):
```
>>> sentences = [str(s) for s in result.sents]
```
5. Create an analyzer using vaderSentiment:
```
>>> analyzer = vaderSentiment.SentimentIntensityAnalyzer()
```
6. Perform the sentiment analysis on the sentences:
```
>>> sentiment = [analyzer.polarity_scores(str(s)) for s in sentences]
```
The `sentiment` variable now contains the polarity scores for the example sentence. Print out the value to see how it analyzed the sentence.
```
>>> print(sentiment)
[{'neg': 0.0, 'neu': 0.737, 'pos': 0.263, 'compound': 0.3612}]
```
What does this structure mean?
On the surface, it is an array with a single dictionary object; had there been more sentences, there would be a dictionary for each of them. There are four keys in the dictionary that correspond to different types of sentiment. The **neg** key represents negative sentiment, of which none has been reported in this text, as evidenced by the **0.0** value. The **neu** key represents neutral sentiment, which has gotten a fairly high score of **0.737** (with a maximum of **1.0**). The **pos** key represents positive sentiment, with a moderate score of **0.263**. Last, the **compound** key represents an overall score for the text; it can range from negative to positive, and the value **0.3612** indicates a sentiment slightly more on the positive side.
To see how these values might change, you can run a small experiment using the code you already entered. The following block demonstrates an evaluation of sentiment scores on a similar sentence.
```
>>> result = english("I love applesauce!")
>>> sentences = [str(s) for s in result.sents]
>>> sentiment = [analyzer.polarity_scores(str(s)) for s in sentences]
>>> print(sentiment)
[{'neg': 0.0, 'neu': 0.182, 'pos': 0.818, 'compound': 0.6696}]
```
You can see that by changing the example sentence to something overwhelmingly positive, the `sentiment` values changed dramatically.
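In practice, you often want to collapse these four numbers into a single label. Here is a minimal sketch that does so, assuming the commonly cited cutoff of ±0.05 on the **compound** score (a convention from the VADER project, not something the library enforces):
```
# Minimal sketch: map a polarity-score dictionary to a label.
# The +/- 0.05 cutoff is a common VADER convention; tune it for your data.
def classify(score, threshold=0.05):
    if score["compound"] >= threshold:
        return "positive"
    if score["compound"] <= -threshold:
        return "negative"
    return "neutral"

print(classify({'neg': 0.0, 'neu': 0.182, 'pos': 0.818, 'compound': 0.6696}))  # positive
```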
### Building a sentiment analysis service
Now that you have assembled the basic building blocks for doing sentiment analysis, let's turn that knowledge into a simple service.
For this demonstration, you will create a [RESTful][8] HTTP server using the Python [Flask package][9]. This service will accept text data in English and return the sentiment analysis. Please note that this example service is for learning the technologies involved, not something to put into production.
#### Prerequisites
* A terminal shell
* Python language binaries (version 3.4+) in your shell
* The **pip** command for installing Python packages
* The **curl** command
* A text editor
* (Optional) A [Python virtual environment][3] to keep your work isolated from the system
#### Configure the environment
This environment is almost identical to the one in the previous section. The only difference is the addition of the Flask package to the Python environment.
1. Install the necessary dependencies:
```
pip install spacy vaderSentiment flask
```
2. Install the English language model for spaCy:
```
python -m spacy download en_core_web_sm
```
#### Create the application file
Open your editor and create a file named **app.py**. Add the following contents to it _(don't worry, we will review every line)_:
```
import flask
import spacy
import vaderSentiment.vaderSentiment as vader

app = flask.Flask(__name__)
analyzer = vader.SentimentIntensityAnalyzer()
english = spacy.load("en_core_web_sm")


def get_sentiments(text):
    result = english(text)
    sentences = [str(sent) for sent in result.sents]
    sentiments = [analyzer.polarity_scores(str(s)) for s in sentences]
    return sentiments


@app.route("/", methods=["POST", "GET"])
def index():
    if flask.request.method == "GET":
        return "To access this service send a POST request to this URL with" \
               " the text you want analyzed in the body."
    body = flask.request.data.decode("utf-8")
    sentiments = get_sentiments(body)
    return flask.json.dumps(sentiments)
```
Although this is not a large source file, it is quite dense. Let's walk through the pieces of this application and describe what they are doing.
```
import flask
import spacy
import vaderSentiment.vaderSentiment as vader
```
The first three lines bring in the packages needed for performing the language analysis and the HTTP framework.
```
app = flask.Flask(__name__)
analyzer = vader.SentimentIntensityAnalyzer()
english = spacy.load("en_core_web_sm")
```
The next three lines create some global variables. The first variable, **app**, is the main entry point that Flask uses for creating HTTP routes. The second variable, **analyzer**, is the same type used in the previous example, and it will be used to generate the sentiment scores. The last variable, **english**, is also the same type used in the previous example, and it will be used to annotate and tokenize the initial text input.
You might be wondering why these variables have been declared globally. In the case of the **app** variable, this is standard procedure for many Flask applications. But in the case of the **analyzer** and **english** variables, the decision to make them global is based on the load times associated with the classes involved. Although those load times might appear minor, when they occur in the context of an HTTP server, the delays can negatively impact performance.
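If you are curious about the cost you avoid by loading once, a rough timing sketch follows; the absolute numbers will vary by machine, and this is meant only to illustrate the one-time startup cost:
```
# Rough timing sketch: measure the one-time cost of loading the spaCy model.
# You pay this once at startup rather than on every HTTP request.
import time

import spacy

start = time.perf_counter()
english = spacy.load("en_core_web_sm")
print(f"spacy.load took {time.perf_counter() - start:.2f} seconds")
```
Returning to the source file, the next piece is the analysis function itself: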
```
def get_sentiments(text):
    result = english(text)
    sentences = [str(sent) for sent in result.sents]
    sentiments = [analyzer.polarity_scores(str(s)) for s in sentences]
    return sentiments
```
This function is the real heart of the service: a function for generating sentiment values from a string of text. You can see that the operations in this function correspond to the commands you ran in the Python interpreter earlier. Here they are wrapped in a function definition, with the source **text** passed in as the text variable and, finally, the **sentiments** variable returned to the caller.
```
@app.route("/", methods=["POST", "GET"])
def index():
    if flask.request.method == "GET":
        return "To access this service send a POST request to this URL with" \
               " the text you want analyzed in the body."
    body = flask.request.data.decode("utf-8")
    sentiments = get_sentiments(body)
    return flask.json.dumps(sentiments)
```
The last function in the source file contains the logic that instructs Flask how to configure the HTTP server for the service. It starts with a line that associates the HTTP route **/** with the request methods **POST** and **GET**.
After the function definition line, the **if** clause detects whether the request method is **GET**. If a user sends this request to the service, the following line returns a text message instructing how to access the server. This is largely included as a convenience to end users.
The next line uses the **flask.request** object to acquire the body of the request, which should contain the text string to be processed. The **decode** function transforms the array of bytes into a usable, formatted string. The decoded text message is then passed to the **get_sentiments** function to generate the sentiment scores. Last, the scores are returned to the user through the HTTP framework.
You should now save this file, if you have not done so already, and return to the shell.
#### Run the sentiment service
With everything in place, running the service is quite simple with Flask's built-in debugging server. To start the service, enter the following command from the same directory as your source file:
```
FLASK_APP=app.py flask run
```
You will now see some output from the server in the shell, and the server will be running. To test that the server is running, you need to open a second shell and use the **curl** command.
First, check to see that the instruction message is printed by entering this command:
```
curl http://localhost:5000
```
You should see the instruction message:
```
To access this service send a POST request to this URI with the text you want analyzed in the body.
```
Next, send a test message with the following command to see the sentiment analysis:
```
curl http://localhost:5000 --header "Content-Type: application/json" --data "I love applesauce!"
```
The response you get from the server should be similar to the following:
```
[{"compound": 0.6696, "neg": 0.0, "neu": 0.182, "pos": 0.818}]
```
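If you would rather exercise the service from Python than from **curl**, a small test client using the third-party **requests** package (an extra dependency, installable with `pip install requests`; the service itself does not need it) might look like this:
```
# Hypothetical test client using the requests package.
import requests

resp = requests.post(
    "http://localhost:5000",
    headers={"Content-Type": "application/json"},
    data="I love applesauce!",
)
print(resp.json())  # e.g. [{'neg': 0.0, 'neu': 0.182, 'pos': 0.818, 'compound': 0.6696}]
```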
Congratulations! You now have a RESTful HTTP sentiment analysis service. You can find a [reference implementation of this service, along with all the code from this article, on GitHub][10].
### Continue exploring
Now that you understand the principles and mechanics behind natural language processing and sentiment analysis, here are some ways to take your discovery of this topic further.
#### Create a streaming sentiment analyzer on OpenShift
While creating local applications to explore sentiment analysis is a convenient first step, having the ability to deploy your applications for wider usage is the natural next step. By following the instructions and code in this [workshop from Radanalytics.io][11], you will learn how to create a sentiment analyzer that can be containerized and deployed to a Kubernetes platform. You will also see how Apache Kafka is used as a framework for event-driven messaging and how Apache Spark can be used as a distributed computing platform for sentiment analysis.
#### Discover live data with the Twitter API
Although the [Radanalytics.io][12] lab can generate a synthetic stream of tweets, you need not be limited to synthetic data. In fact, anyone with a Twitter account can access the Twitter streaming API and perform sentiment analysis on tweets with the [Tweepy Python][13] package.
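As a starting point, here is a sketch of what wiring Tweepy's streaming API to the analyzer from earlier might look like. It assumes the Tweepy 3.x API that was current when this article was written (later versions renamed these classes), and the credential strings are placeholders you must obtain from Twitter's developer portal:
```
# Sketch only: stream live tweets through the VADER analyzer.
# Assumes the tweepy 3.x API; all credentials below are placeholders.
import tweepy
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

analyzer = SentimentIntensityAnalyzer()

class SentimentListener(tweepy.StreamListener):
    def on_status(self, status):
        # Score each tweet as it arrives.
        print(analyzer.polarity_scores(status.text), status.text)

auth = tweepy.OAuthHandler("CONSUMER_KEY", "CONSUMER_SECRET")
auth.set_access_token("ACCESS_TOKEN", "ACCESS_TOKEN_SECRET")

stream = tweepy.Stream(auth=auth, listener=SentimentListener())
stream.filter(track=["applesauce"], languages=["en"])
```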
--------------------------------------------------------------------------------
via: https://opensource.com/article/19/4/social-media-sentiment-analysis-python-scalable
Author: [Michael McCune][a]
Topic selection: [lujun9972][b]
Translator: [MjSeven](https://github.com/MjSeven)
Proofreader: [校对者ID](https://github.com/校对者ID)
This article was translated by [LCTT](https://github.com/LCTT/TranslateProject) and is proudly presented by [Linux中国](https://linux.cn/)
[a]: https://opensource.com/users/elmiko/users/jschlessman
[b]: https://github.com/lujun9972
[1]: https://opensource.com/sites/default/files/styles/image-full-size/public/lead-images/windows_building_sky_scale.jpg?itok=mH6CAX29 (Tall building with windows)
[2]: https://opensource.com/article/19/4/social-media-sentiment-analysis-python-part-1
[3]: https://virtualenv.pypa.io/en/stable/
[4]: https://pypi.org/project/spacy/
[5]: https://pypi.org/project/vaderSentiment/
[6]: https://spacy.io/models
[7]: https://docs.python.org/3.6/tutorial/interpreter.html
[8]: https://en.wikipedia.org/wiki/Representational_state_transfer
[9]: http://flask.pocoo.org/
[10]: https://github.com/elmiko/social-moments-service
[11]: https://github.com/radanalyticsio/streaming-lab
[12]: http://Radanalytics.io
[13]: https://github.com/tweepy/tweepy