Archive for January, 2015

One line awk to find all words with all vowels

Here is a one-line awk command to find all the words that contain all the vowels in a given file.

awk '{c=split($0, s); for(n=1; n<=c; ++n) print s[n] }' $1 | awk 'match ($0, /[e]/)' |awk 'match ($0, /[i]/)' | awk 'match ($0, /[a]/)'| awk 'match ($0, /[o]/)' |awk 'match ($0, /[u]/)'

If you have a file with the following contents


The above command prints

In the above command, the first awk splits each line into words and prints one word per line. Each of the following awk stages keeps only the words that contain one particular vowel, so only the words containing all of a, e, i, o and u are printed in the output.
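For readers who want the same check outside a shell pipeline, here is a minimal sketch of the filter in C. The function name has_all_vowels is hypothetical and not part of the original command. Like the awk patterns, it only looks for lowercase vowels.

```c
#include <stdbool.h>

/* Hypothetical helper: true when `word` contains each of the lowercase
 * vowels a, e, i, o and u at least once (same rule as the awk filters). */
bool has_all_vowels(const char *word)
{
    unsigned seen = 0;                 /* one bit per vowel found so far */

    for (; *word != '\0'; ++word) {
        switch (*word) {
        case 'a': seen |= 1u << 0; break;
        case 'e': seen |= 1u << 1; break;
        case 'i': seen |= 1u << 2; break;
        case 'o': seen |= 1u << 3; break;
        case 'u': seen |= 1u << 4; break;
        default: break;
        }
    }
    return seen == 0x1fu;              /* all five bits set */
}
```

Calling has_all_vowels("education") returns true, while has_all_vowels("vowel") returns false because a, i and u are missing.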

Here is a command that searches all the .txt files recursively for words containing a, e, i, o and u.

find -name "*.txt" |xargs awk '{c=split($0, s); for(n=1; n<=c; ++n) print s[n] }'| awk 'match ($0, /[e]/)' |awk 'match ($0, /[i]/)' | awk 'match ($0, /[a]/)'| awk 'match ($0, /[o]/)' |awk 'match ($0, /[u]/)'|sort|uniq -i|awk '!match ($0, /[_,\-\.\/\|\@\:\]\[\)\(\*\&\=\*\#\!\{\}\"\?]/)'

A quick observation about the above command is that its last stage (awk '!match ($0, /[_,\-\.\/\|\@\:\]\[\)\(\*\&\=\*\#\!\{\}\"\?]/)') removes all the words that contain special characters. If we eliminate these words earlier in the pipeline, the later stages have fewer words to process, which should improve performance.

find -name "*.txt" |xargs awk '{c=split($0, s); for(n=1; n<=c; ++n) print s[n] }'| awk '!match ($0, /[_,\-\.\/\|\@\:\]\[\)\(\*\&\=\*\#\!\{\}\"\?]/)'| awk 'match ($0, /[e]/)' |awk 'match ($0, /[i]/)' | awk 'match ($0, /[a]/)'| awk 'match ($0, /[o]/)' |awk 'match ($0, /[u]/)'|sort|uniq -i

I ran the above two commands with the "time" command, and the results are as follows.

Without optimization:

real 0m0.824s
user 0m1.508s
sys 0m0.076s

With optimization (the last command):

real 0m0.784s
user 0m1.868s
sys 0m0.064s

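Even on a medium-sized input directory, the elapsed (real) time drops slightly, from 0.824s to 0.784s. Note that the user CPU time rises from 1.508s to 1.868s, so a single run like this is only a rough indication of the improvement.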


Program to convert one endianness to another

Here is a program to convert a value from one endianness to the other. If the given value is little endian, it is converted to big endian, and vice versa.
This code has been written assuming sizeof(int) = 4 bytes.

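#include <stdio.h>

int main(void)
{
    /* unsigned avoids shifting bits into the sign bit of an int */
    unsigned int i = 0x12345678;
    unsigned int p = 0;

    printf("Before 0x%x\n", i);
    p |= ((0xff & i) << 24);                    /* byte 0 -> bits 31..24 */
    p |= ((((0xff << 8) & i) >> 8) << 16);      /* byte 1 -> bits 23..16 */
    p |= ((((0xff << 16) & i) >> 16) << 8);     /* byte 2 -> bits 15..8  */
    p |= ((((0xffu << 24) & i) >> 24));         /* byte 3 -> bits 7..0   */
    printf("After 0x%x\n", p);
    return 0;
}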

Thanks for the comments. As suggested, I am adding some more explanation.

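The following code tells you whether your machine stores the least significant byte first (LSB representation, little endian) or the most significant byte first (MSB representation, big endian).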

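#include <stdio.h>

int main(void)
{
    unsigned int i = 0x12345678;
    /* unsigned char avoids sign extension when a byte is passed to printf */
    unsigned char *r = (unsigned char *)&i;

    printf("Internal storage representation: %02x%02x%02x%02x\n", r[0], r[1], r[2], r[3]);
    return 0;
}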

If the above code prints 78563412, your machine uses the LSB-first (little endian) representation. If it prints 12345678, your machine uses the MSB-first (big endian) representation.
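The same check can be wrapped in a function so that other code can branch on the result instead of reading the printed bytes. This is only a sketch: the name is_little_endian is hypothetical, and it uses uint32_t from <stdint.h> so that it does not depend on sizeof(int).

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: returns 1 on a little-endian host, 0 otherwise. */
int is_little_endian(void)
{
    const uint32_t probe = 0x12345678u;
    unsigned char first;

    /* Copy the byte stored at the lowest address of `probe`. */
    memcpy(&first, &probe, 1);
    return first == 0x78;   /* least significant byte first => little endian */
}
```

On a machine where the earlier program prints 78563412, this function returns 1.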

Comparing the above two representations of i = 0x12345678 shows that for any byte sequence p1 p2 p3 p4, the other endian representation is p4 p3 p2 p1.
Hence, to convert a 4-byte integer, we first take the last (least significant) byte and place it in bit positions 31 to 24.

i.e., p |= ((0xff & i) << 24);

Now we need to take bits 15 to 8 (i.e., the second byte) and place them in bit positions 23 to 16.

i.e., p |= ((((0xff << 8) & i) >> 8) << 16);

The same applies to the next two bytes as well.
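The four steps can also be collected into a reusable function. The sketch below is one possible way to write it, not the original program: the name swap32 is hypothetical, and it assumes uint32_t from <stdint.h> so that the result does not depend on sizeof(int) being 4.

```c
#include <stdint.h>

/* Hypothetical helper: reverses the byte order of a 32-bit value. */
uint32_t swap32(uint32_t v)
{
    uint32_t p = 0;

    p |= (v & 0x000000ffu) << 24;   /* byte 0 -> bits 31..24 */
    p |= (v & 0x0000ff00u) << 8;    /* byte 1 -> bits 23..16 */
    p |= (v & 0x00ff0000u) >> 8;    /* byte 2 -> bits 15..8  */
    p |= (v & 0xff000000u) >> 24;   /* byte 3 -> bits 7..0   */
    return p;
}
```

For example, swap32(0x12345678) gives 0x78563412, and applying swap32 twice returns the original value.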

Thanks for reading the article.