Preparing for 4.2 release

khearn · August 1, 2019, 17:41

Originally posted by Derakon

It wouldn't surprise me if modern compilers are capable of saying "Ah, you're invoking this simple function here, we can just inline it" and get the same performance as a macro. In any case, the performance hit of invoking a function is minuscule in this day and age.

Back in the early '90s, I was working in a group doing compiler optimization, and function inlining was definitely a thing even back then.

Pete Mack · August 1, 2019, 15:00

Hmm. Redrawing subwindows when the command line is inactive seems particularly useless. Definitely worth checking.

Diego Gonzalez · August 1, 2019, 14:56

Originally posted by Pete Mack

Fair enough. Running, resting and digging near a vault. They all share the same issue: multiple turns in a single command. Running is the one most used, and digging the one most likely to be slowed by monsters.

The number of visible subwindows (monsters, objects) also affects the performance of running. Sometimes too many redraws are a problem. Perhaps this was solved long ago with an optimization. I don't remember.

Pete Mack · August 1, 2019, 14:44

Fair enough. Running, resting and digging near a vault. They all share the same issue: multiple turns in a single command. Running is the one most used, and digging the one most likely to be slowed by monsters.

Gwarl · August 1, 2019, 10:48

Originally posted by Pete Mack

Yeah, that is sometimes a problem, too. But it was the illumination code that used to have a "show reduced light radius while running" performance optimization. And it is running--and only running--where performance matters.

Not quite true, resting near a pack of hounds that you can't see also takes a while.

But what you're saying makes sense and is interesting.

PowerWyrm · August 1, 2019, 10:46

Originally posted by Kusunose

It's nice, but I think something like this is nicer because you can keep using a for loop, though it is a bit wordy.

Code:

struct loc_iterator {
	struct loc cur;
	struct loc begin;
	struct loc end;
};

struct loc_iterator loc_iterator(struct loc begin, struct loc end)
{
	struct loc_iterator iter;
	iter.cur = iter.begin = begin;
	iter.end = end;
	return iter;
}

bool loc_iterator_test(const struct loc_iterator* iter)
{
	return iter->cur.y != iter->end.y;
}

void loc_iterator_next(struct loc_iterator* iter)
{
	iter->cur.x++;
	if (iter->cur.x == iter->end.x) {
		iter->cur.x = iter->begin.x;
		iter->cur.y++;
	}
}

This code

Code:

	for (y = y1; y < y2; y++) {
		for (x = x1; x < x2; x++) {

can be translated as

Code:

	for (struct loc_iterator iter = loc_iterator(loc(x1, y1), loc(x2, y2));
		loc_iterator_test(&iter); loc_iterator_next(&iter)) {
			/* use iter.cur */
			int x = iter.cur.x;
			int y = iter.cur.y;

and this

Code:

	for (y = y1 - 1; y < y2 + 1; y++) {
		for (x = x1 - 1; x < x2 + 1; x++) {

can be translated as

Code:

	for (struct loc_iterator iter = loc_iterator(loc(x1 - 1, y1 - 1), loc(x2 + 1, y2 + 1));
		loc_iterator_test(&iter); loc_iterator_next(&iter)) {

Values for initializers and conditions can be translated directly, provided the conditions are exclusive (if they are inclusive, add 1 to both x and y of the end loc).

It is indeed much nicer. I've updated my iterator to reflect this, keeping pointers instead of structures because I don't like passing structs to functions.
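A minimal sketch of what a pointer-based variant might look like, assuming the struct loc_iterator layout from the quoted post. The names loc_iterator_init and count_cells are hypothetical and not taken from the actual code, and struct loc is redefined here only so the example compiles on its own. Like the quoted version, it assumes begin.x < end.x and begin.y <= end.y.

```c
#include <stdbool.h>

/* Stand-in for the game's struct loc, redefined so the sketch is standalone. */
struct loc {
	int x;
	int y;
};

struct loc_iterator {
	struct loc cur;
	struct loc begin;
	struct loc end;
};

/* Hypothetical initializer: fills a caller-owned iterator through pointers
 * instead of returning the struct by value. */
void loc_iterator_init(struct loc_iterator *iter, const struct loc *begin,
	const struct loc *end)
{
	iter->cur = *begin;
	iter->begin = *begin;
	iter->end = *end;
}

/* Same test and step as in the quoted post. */
bool loc_iterator_test(const struct loc_iterator *iter)
{
	return iter->cur.y != iter->end.y;
}

void loc_iterator_next(struct loc_iterator *iter)
{
	iter->cur.x++;
	if (iter->cur.x == iter->end.x) {
		iter->cur.x = iter->begin.x;
		iter->cur.y++;
	}
}

/* Hypothetical helper: counts the grids visited in the half-open range
 * [begin, end). */
int count_cells(const struct loc *begin, const struct loc *end)
{
	struct loc_iterator iter;
	int n = 0;

	for (loc_iterator_init(&iter, begin, end);
		loc_iterator_test(&iter); loc_iterator_next(&iter)) {
		n++;
	}
	return n;
}
```

For an inclusive range, the caller would pass an end loc with 1 added to both x and y, as described in the quoted post.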

Pete Mack · August 1, 2019, 02:35

Yeah, that is sometimes a problem, too. But it was the illumination code that used to have a "show reduced light radius while running" performance optimization. And it is running--and only running--where performance matters.

Diego Gonzalez · August 1, 2019, 02:11

Originally posted by Pete Mack

In any case, there is only a single place in the code where performance matters: in determining visible monsters, walls, and objects. No other loop matters.

You are right. Premature optimization has always been a hidden enemy in programming...

I remember that noise and smell tracking was a big bottleneck in NPP. Does Vanilla use those?

Derakon · August 1, 2019, 00:25

Originally posted by Pete Mack

Kusunose--
Notice that the macro definition requires no subroutine calls at all. It is just a transliteration of an existing code idiom into a single location. So there should be no performance hit at all.

It wouldn't surprise me if modern compilers are capable of saying "Ah, you're invoking this simple function here, we can just inline it" and get the same performance as a macro. In any case, the performance hit of invoking a function is minuscule in this day and age.
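To make the comparison concrete, here is a small sketch of the same check written once as a function-like macro and once as a `static inline` function. The names LOC_EQ_MACRO and loc_eq_inline are invented for this example. Whether a given call is actually inlined depends on the compiler and the optimization level, so treat this as an illustration rather than a guarantee.

```c
#include <stdbool.h>

/* Stand-in for the game's struct loc. */
struct loc {
	int x;
	int y;
};

/* Macro version: pure text substitution, so each argument may be
 * evaluated more than once and is not type-checked. */
#define LOC_EQ_MACRO(a, b) ((a).x == (b).x && (a).y == (b).y)

/* Function version: arguments are evaluated exactly once and type-checked.
 * "static inline" lets the compiler substitute the body at the call site. */
static inline bool loc_eq_inline(struct loc a, struct loc b)
{
	return a.x == b.x && a.y == b.y;
}
```

Both forms give the same result for plain struct loc values; the difference only shows up when the arguments have side effects or the wrong type.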

Pete Mack · July 31, 2019, 23:27

Gwarl--
Custom loop iterator functions don't make a lot of sense in C. They are great in C# and other languages with anonymous and/or lambda functions.

Originally posted by Gwarl

For my part, the nested for loops are intuitive and I know what they do, but the custom iterators look unreadable.
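For comparison, a callback-style iterator in plain C might look like the sketch below. Without closures, any state the loop body needs has to be passed through a `void *` context argument, which is part of why this style tends to feel clumsy in C. All names here (loc_for_each, loc_visit_fn, count_visit, count_grids) are hypothetical.

```c
/* Stand-in for the game's struct loc. */
struct loc {
	int x;
	int y;
};

/* Callback type: receives the current grid and a caller-supplied context. */
typedef void (*loc_visit_fn)(struct loc grid, void *ctx);

/* Calls fn once for every grid in the half-open range [begin, end). */
void loc_for_each(struct loc begin, struct loc end, loc_visit_fn fn, void *ctx)
{
	struct loc p;

	for (p.y = begin.y; p.y < end.y; p.y++) {
		for (p.x = begin.x; p.x < end.x; p.x++) {
			fn(p, ctx);
		}
	}
}

/* Example callback: the "loop body" lives in a separate function and
 * reaches its counter through the context pointer. */
static void count_visit(struct loc grid, void *ctx)
{
	(void)grid;
	(*(int *)ctx)++;
}

int count_grids(struct loc begin, struct loc end)
{
	int n = 0;

	loc_for_each(begin, end, count_visit, &n);
	return n;
}
```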

Pete Mack · July 31, 2019, 22:31

Kusunose--
Notice that the macro definition requires no subroutine calls at all. It is just a transliteration of an existing code idiom into a single location. So there should be no performance hit at all. In any case, there is only a single place in the code where performance matters: in determining visible monsters, walls, and objects. No other loop matters.

Diego--
Sigh. Really fixed now.

Gwarl · July 31, 2019, 22:05

For my part, the nested for loops are intuitive and I know what they do, but the custom iterators look unreadable.

Diego Gonzalez · July 31, 2019, 20:46

Originally posted by Pete Mack

Thanks Diego. Fixed the loop in original. And yeah, macros kind of suck as an alternative to more modern techniques. But in C, you get no choice.

p.y = p0.y
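The correction matters whenever the range does not start at row 0: the inner loop has to restart from p0.y, not from 0. Below is a sketch of the macro with that fix applied. The parentheses around the macro arguments are an addition for safety and were not in the original post, struct loc stands in for the `point` type, and count_range is a hypothetical helper.

```c
/* Stand-in for the "point" type used in the original snippet. */
struct loc {
	int x;
	int y;
};

/* Visits every grid in [p0, p1), restarting each column at p0.y. */
#define loc_iterate(p0, p1, p) \
	for ((p) = (p0); (p).x < (p1).x; (p).x++) \
		for ((p).y = (p0).y; (p).y < (p1).y; (p).y++)

/* Hypothetical helper: counts how many grids the macro visits. */
int count_range(struct loc p0, struct loc p1)
{
	struct loc p;
	int n = 0;

	loc_iterate(p0, p1, p)
		n++;
	return n;
}
```

With p0 = (2, 5) and p1 = (4, 8), this visits 2 columns of 3 rows each; starting the inner loop at 0 would instead walk rows 0 through 7 in every column.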

Kusunose · July 31, 2019, 20:16

Originally posted by Pete Mack

Here ya go

Code:

# define loc_iterate(p0, p1, p) \
      for(p = p0; p.x < p1.x; p.x++) \
             for(p.y = p0.y; p.y < p1.y; p.y++)


.....
       point p0, p1, p;
       loc_iterate(p0, p1, p)
           foo(p);

Edit: fixed for correctness, thanks Diego

It's nice and short.

My version of loc_iterator can also sit behind the same macro
interface, though you have to refer to the current loc as "iter.cur"
rather than just "iter".

Code:

#define loc_iterate(begin, end, iter)	\
	for (struct loc_iterator iter = loc_iterator(begin, end); \
		 loc_iterator_test(&iter); loc_iterator_next(&iter))
...
	struct loc p1, p2;
	loc_iterate(p1, p2, iter) {
		foo(iter.cur);
	}
or
	loc_iterate(loc(x1, y1), loc(x2, y2), iter) {
		foo(iter.cur);
	}

An advantage of my version is that "begin" and "end" are
evaluated only once, so using temporaries returned from
loc() is not a performance hit. You can also use the more verbose
compound literals within parentheses instead.

Code:

	loc_iterate(((struct loc) { x1, y1 }), ((struct loc) { x2, y2 }), iter) {
		foo(iter.cur);
	}
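The "evaluated only once" point can be checked with a small experiment. The sketch below counts how often a loc()-like constructor runs under each macro. counted_loc, loc_calls, the two macro names and the two helper functions are hypothetical names made up for this example; the nested-loop macro here uses p0.y as the inner start value and parenthesized arguments.

```c
#include <stdbool.h>

struct loc {
	int x;
	int y;
};

struct loc_iterator {
	struct loc cur;
	struct loc begin;
	struct loc end;
};

static int loc_calls; /* number of counted_loc() calls so far */

/* Hypothetical stand-in for loc() that records every call. */
static struct loc counted_loc(int x, int y)
{
	struct loc l = { x, y };

	loc_calls++;
	return l;
}

static struct loc_iterator loc_iterator(struct loc begin, struct loc end)
{
	struct loc_iterator iter;

	iter.cur = iter.begin = begin;
	iter.end = end;
	return iter;
}

static bool loc_iterator_test(const struct loc_iterator *iter)
{
	return iter->cur.y != iter->end.y;
}

static void loc_iterator_next(struct loc_iterator *iter)
{
	iter->cur.x++;
	if (iter->cur.x == iter->end.x) {
		iter->cur.x = iter->begin.x;
		iter->cur.y++;
	}
}

/* Nested-loop macro: p0 and p1 are re-expanded inside the loop headers,
 * so function-call arguments run again on every test. */
#define loc_iterate_nested(p0, p1, p) \
	for ((p) = (p0); (p).x < (p1).x; (p).x++) \
		for ((p).y = (p0).y; (p).y < (p1).y; (p).y++)

/* Iterator macro: begin and end appear only in the initializer. */
#define loc_iterate_iter(begin, end, iter) \
	for (struct loc_iterator iter = loc_iterator(begin, end); \
		loc_iterator_test(&iter); loc_iterator_next(&iter))

/* Returns how many constructor calls the nested macro makes for a 2x3 range. */
int calls_with_nested(void)
{
	struct loc p;

	loc_calls = 0;
	loc_iterate_nested(counted_loc(0, 0), counted_loc(2, 3), p) {
		(void)p;
	}
	return loc_calls;
}

/* Returns how many constructor calls the iterator macro makes for the same range. */
int calls_with_iterator(void)
{
	loc_calls = 0;
	loc_iterate_iter(counted_loc(0, 0), counted_loc(2, 3), iter) {
		(void)iter;
	}
	return loc_calls;
}
```

The iterator version calls the constructor twice regardless of the range size, while the nested version calls it again on every loop test. For a cheap constructor like loc() the difference is likely negligible once the compiler inlines it, so this is more about predictable semantics than measurable speed.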

Pete Mack · July 31, 2019, 15:58

Thanks Diego. Fixed the loop in original. And yeah, macros kind of suck as an alternative to more modern techniques. But in C, you get no choice.
